I stumbled across the Bioinformatics Group at the UofM and realized that I had met its president at a mutual friend's birthday party a few months ago. I may have the opportunity to contribute to a project or two in the coming semester(s), so I started reading a bit about bioinformatics (again).
I went looking for some code and found BioPerl, a framework that seems fairly popular. My Perl skills have atrophied over the years, though, so I was more excited when I found BioJava. It provides a number of useful functions and seems fairly active. There is also a related database project, BioSQL, for which BioPerl and BioJava (along with BioRuby and BioPython) provide language bindings. BioJava even uses Hibernate as its O/R mapping layer.
Since I like to work in C#, I started playing around with porting BioJava to C#. It’s a huge project, but it’s also a great way to see how BioJava is put together. I’ve managed to get far enough that I can transcribe DNA to RNA using the following code:
private static void TranscribeDNAtoRNA()
{
    try
    {
        // make a DNA SymbolList
        ISymbolList symL = DNATools.CreateDNA("atgccgaatcgtaa");
        Console.WriteLine("DNA: " + symL.SeqString);
        symL = DNATools.ToRNA(symL);
        // just to prove it worked
        Console.WriteLine("RNA: " + symL.SeqString);
    }
    catch (IllegalSymbolException ex)
    {
        // this will happen if you try and make the DNA seq using non-IUB symbols
        Console.WriteLine(ex);
    }
    catch (IllegalAlphabetException ex)
    {
        // this will happen if you try and transcribe a non-DNA SymbolList
        Console.WriteLine(ex);
    }
}
When run, the output is:
DNA: atgccgaatcgtaa
RNA: augccgaaucguaa
Yup. A few dozen classes and a few hundred lines of code, and I can replace t’s with u’s. Pretty exciting, eh?
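Stripped of the framework, transcription itself really is a character substitution. The sketch below is plain C# and does not use the ported BioJava classes; the class `NucleotideSketch` and its `Transcribe` method are hypothetical, and an `ArgumentException` stands in for `IllegalSymbolException`. It shows roughly what the extra machinery buys you: every symbol is checked against the DNA alphabet before anything is converted. Unlike BioJava's DNA alphabet, it accepts only the four unambiguous bases.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of BioJava or the C# port.
public static class NucleotideSketch
{
    // Transcribes DNA to RNA by replacing each t with u (output is lowercase).
    // Only a, c, g and t are accepted; IUB ambiguity codes such as n are
    // rejected here, although BioJava's DNA alphabet allows them.
    public static string Transcribe(string dna)
    {
        var rna = new StringBuilder(dna.Length);
        foreach (char c in dna.ToLowerInvariant())
        {
            switch (c)
            {
                case 'a':
                case 'c':
                case 'g':
                    rna.Append(c);
                    break;
                case 't':
                    rna.Append('u');
                    break;
                default:
                    // Plays the role of IllegalSymbolException in the original code.
                    throw new ArgumentException($"'{c}' is not a DNA base", nameof(dna));
            }
        }
        return rna.ToString();
    }
}
```

For the sequence used above, `NucleotideSketch.Transcribe("atgccgaatcgtaa")` returns `augccgaaucguaa`. The framework's symbol and alphabet objects are heavier than a string, but they let the same validation and conversion logic work across DNA, RNA and protein.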
Actually, I think it is pretty cool. I'm close to having code working that will translate the RNA into a protein sequence and form the complement of a DNA strand. It's not rocket science, but I've only scratched the surface. The framework can also parse sequence files such as FASTA and BLAST reports, edit large sequences efficiently, perform pairwise alignments, and a whole lot more.
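The next two operations are also small at their core. Here is a plain C# sketch of a base-by-base complement, a reverse complement, and translation with the standard genetic code. It does not use the port, and `SequenceOpsSketch` and its method names are hypothetical. Unlike BioJava, it handles only unambiguous bases, marks stop codons with `*`, and silently ignores trailing bases that don't fill a codon.

```csharp
using System;
using System.Text;

// Hypothetical helpers for illustration; not part of BioJava or the C# port.
public static class SequenceOpsSketch
{
    // Standard genetic code, indexed by codon with bases ordered u, c, a, g:
    // index = 16 * first + 4 * second + third. '*' marks a stop codon.
    private const string Bases = "ucag";
    private const string CodonTable =
        "FFLLSSSSYY**CC*W" + // codons starting with u
        "LLLLPPPPHHQQRRRR" + // codons starting with c
        "IIIMTTTTNNKKSSRR" + // codons starting with a
        "VVVVAAAADDEEGGGG";  // codons starting with g

    // Base-by-base complement (a<->t, c<->g), kept in the input's order, not reversed.
    public static string Complement(string dna)
    {
        var sb = new StringBuilder(dna.Length);
        foreach (char c in dna.ToLowerInvariant())
        {
            sb.Append(c switch
            {
                'a' => 't',
                't' => 'a',
                'c' => 'g',
                'g' => 'c',
                _ => throw new ArgumentException($"'{c}' is not a DNA base", nameof(dna)),
            });
        }
        return sb.ToString();
    }

    // Reverse complement: the complementary strand read in its own 5'->3' direction.
    public static string ReverseComplement(string dna)
    {
        char[] chars = Complement(dna).ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }

    // Translates RNA one codon at a time into one-letter amino acid codes.
    public static string Translate(string rna)
    {
        rna = rna.ToLowerInvariant();
        var protein = new StringBuilder(rna.Length / 3);
        for (int i = 0; i + 3 <= rna.Length; i += 3)
        {
            int index = 0;
            for (int j = 0; j < 3; j++)
            {
                int b = Bases.IndexOf(rna[i + j]);
                if (b < 0)
                    throw new ArgumentException($"'{rna[i + j]}' is not an RNA base", nameof(rna));
                index = index * 4 + b; // builds 16 * first + 4 * second + third
            }
            protein.Append(CodonTable[index]);
        }
        return protein.ToString();
    }
}
```

Applied to the sequence from the transcription example, `Translate("augccgaaucguaa")` gives `MPNR` (the last two bases don't form a codon), `Complement("atgccgaatcgtaa")` gives `tacggcttagcatt`, and `ReverseComplement("atgccgaatcgtaa")` gives `ttacgattcggcat`. Packing the 64 codons into one string indexed in base 4 keeps the table compact; a dictionary keyed by codon would be easier to read but more verbose.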
If you’re curious, you can compare the above C# code to the original Java code, which comes from the BioJava cookbook.