It has been a couple of days since I’ve posted, as I’ve been trying to track down an elusive bug deep in the bowels of the C# port of BioJava. I don’t have it fixed yet, but I’ve determined the root cause.
There are number of places in BioJava where they use a Set as a key in a dictionary, to handle things such as ambiguity symbols. For example, in DNA, the symbol ‘N’ represents any base A, G, C, or T. If a reverse lookup is done, asking for the ambiguity symbol for AGCT, it should always return the same symbol – N.
In the port, I goofed and used a List<> in a couple of places to port a Set. When I went to fix that and replace it with a .Net HashSet<>, it didn’t resolve the problem, much to my surprise. I wrote a quick test to try and figure out what was going on:
[Test] public void HashSetAsKey() { HashSet<int> h1 = new HashSet<int>(); h1.Add(1); h1.Add(2); HashSet<int> h2 = new HashSet<int>(); h2.Add(2); h2.Add(1); Dictionary<HashSet<int>, string> dict = new Dictionary<HashSet<int>, string>(); dict.Add(h1, "hello"); Assert.AreEqual("hello", dict[h2]); }
This test fails, with a KeyNotFound exception. The sets are equivalent, so why didn’t this work? Time for another test.
[Test] public void HashSetEquality() { HashSet<int> h1 = new HashSet<int>(); h1.Add(1); h1.Add(2); HashSet<int> h2 = new HashSet<int>(); h2.Add(1); h2.Add(2); Assert.AreEqual(h1, h2); // Why does this fail??? }
Even adding the items in the same order fails! In fact, you can remove the four Add calls and compare two empty HashSets – the test will still fail.
After some digging, it turns out that the equality test I was expecting is implemented as a separate method, called SetEquals. The following test will pass:
[Test] public void HashSetSpecialEquality() { HashSet<int> h1 = new HashSet<int>(); h1.Add(1); h1.Add(2); HashSet<int> h2 = new HashSet<int>(); h2.Add(1); h2.Add(2); Assert.IsTrue(h1.SetEquals(h2)); }
So, when are two HashSets equal? It looks like the answer is: never.
Argh.
That makes the HashSet class useless as the key to a dictionary, unless I’m missing some other way to make it work.
I’m off to write my own Set class.
You have great blog and this post is good!
—
best regards, Greg
Comment by WillieNY — January 24, 2010 @ 4:05 pm