It has been a couple of days since I’ve posted, as I’ve been trying to track down an elusive bug deep in the bowels of the C# port of BioJava. I don’t have it fixed yet, but I’ve determined the root cause.
There are number of places in BioJava where they use a Set as a key in a dictionary, to handle things such as ambiguity symbols. For example, in DNA, the symbol ‘N’ represents any base A, G, C, or T. If a reverse lookup is done, asking for the ambiguity symbol for AGCT, it should always return the same symbol – N.
In the port, I goofed and used a List<> in a couple of places to port a Set. When I went to fix that and replace it with a .Net HashSet<>, it didn’t resolve the problem, much to my surprise. I wrote a quick test to try and figure out what was going on:
[Test]
public void HashSetAsKey()
{
HashSet<int> h1 = new HashSet<int>();
h1.Add(1);
h1.Add(2);
HashSet<int> h2 = new HashSet<int>();
h2.Add(2);
h2.Add(1);
Dictionary<HashSet<int>, string> dict = new Dictionary<HashSet<int>, string>();
dict.Add(h1, "hello");
Assert.AreEqual("hello", dict[h2]);
}
This test fails, with a KeyNotFound exception. The sets are equivalent, so why didn’t this work? Time for another test.
[Test]
public void HashSetEquality()
{
HashSet<int> h1 = new HashSet<int>();
h1.Add(1);
h1.Add(2);
HashSet<int> h2 = new HashSet<int>();
h2.Add(1);
h2.Add(2);
Assert.AreEqual(h1, h2); // Why does this fail???
}
Even adding the items in the same order fails! In fact, you can remove the four Add calls and compare two empty HashSets – the test will still fail.
After some digging, it turns out that the equality test I was expecting is implemented as a separate method, called SetEquals. The following test will pass:
[Test]
public void HashSetSpecialEquality()
{
HashSet<int> h1 = new HashSet<int>();
h1.Add(1);
h1.Add(2);
HashSet<int> h2 = new HashSet<int>();
h2.Add(1);
h2.Add(2);
Assert.IsTrue(h1.SetEquals(h2));
}
So, when are two HashSets equal? It looks like the answer is: never.
Argh.
That makes the HashSet class useless as the key to a dictionary, unless I’m missing some other way to make it work.
I’m off to write my own Set class.