doug-swisher.net

September 5, 2008

When are two HashSets equal?

Filed under: Software — Tags: , — Doug @ 8:20 pm

It has been a couple of days since I’ve posted, as I’ve been trying to track down an elusive bug deep in the bowels of the C# port of BioJava.  I don’t have it fixed yet, but I’ve determined the root cause.

There are number of places in BioJava where they use a Set as a key in a dictionary, to handle things such as ambiguity symbols.  For example, in DNA, the symbol ‘N’ represents any base A, G, C, or T.  If a reverse lookup is done, asking for the ambiguity symbol for AGCT, it should always return the same symbol – N.

In the port, I goofed and used a List<> in a couple of places to port a Set.  When I went to fix that and replace it with a .Net HashSet<>, it didn’t resolve the problem, much to my surprise.  I wrote a quick test to try and figure out what was going on:

[Test]
public void HashSetAsKey()
{
    HashSet<int> h1 = new HashSet<int>();
    h1.Add(1);
    h1.Add(2);

    HashSet<int> h2 = new HashSet<int>();
    h2.Add(2);
    h2.Add(1);

    Dictionary<HashSet<int>, string> dict = new Dictionary<HashSet<int>, string>();

    dict.Add(h1, "hello");

    Assert.AreEqual("hello", dict[h2]);
}

This test fails, with a KeyNotFound exception.  The sets are equivalent, so why didn’t this work?  Time for another test.

[Test]
public void HashSetEquality()
{
    HashSet<int> h1 = new HashSet<int>();
    h1.Add(1);
    h1.Add(2);

    HashSet<int> h2 = new HashSet<int>();
    h2.Add(1);
    h2.Add(2);

    Assert.AreEqual(h1, h2);    // Why does this fail???
}

Even adding the items in the same order fails!  In fact, you can remove the four Add calls and compare two empty HashSets – the test will still fail.

After some digging, it turns out that the equality test I was expecting is implemented as a separate method, called SetEquals.  The following test will pass:

[Test]
public void HashSetSpecialEquality()
{
    HashSet<int> h1 = new HashSet<int>();
    h1.Add(1);
    h1.Add(2);

    HashSet<int> h2 = new HashSet<int>();
    h2.Add(1);
    h2.Add(2);

    Assert.IsTrue(h1.SetEquals(h2));
}

So, when are two HashSets equal?  It looks like the answer is: never.

Argh.

That makes the HashSet class useless as the key to a dictionary, unless I’m missing some other way to make it work.

I’m off to write my own Set class.

1 Comment »

  1. You have great blog and this post is good!

    best regards, Greg

    Comment by WillieNY — January 24, 2010 @ 4:05 pm


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: