I stumbled across the Bioinformatics Group at the UofM, and realized that I had met the president at a birthday party for a mutual friend a few months ago. I may have the opportunity to contribute to a project or two in the coming semester(s), so I started reading a bit about bioinformatics (again).
I went looking for some code and found a framework called BioPerl, which seems fairly popular. My Perl skills have atrophied over the years, so I was a bit more excited when I found BioJava. It provides a number of useful functions and seems fairly active. There is also a related database project, BioSQL, for which both BioPerl and BioJava (along with BioRuby and BioPython) provide language bindings. BioJava even uses Hibernate as its O/R mapping layer.
Since I like to work in C#, I started playing around with porting BioJava to C#. It’s a huge project, but it’s also a great way to see how BioJava is put together. I’ve managed to get far enough that I can transcribe DNA to RNA using the following code:
    private static void TranscribeDNAtoRNA()
    {
        try
        {
            // make a DNA SymbolList
            ISymbolList symL = DNATools.CreateDNA("atgccgaatcgtaa");
            Console.WriteLine("DNA: " + symL.SeqString);

            symL = DNATools.ToRNA(symL);

            // just to prove it worked
            Console.WriteLine("RNA: " + symL.SeqString);
        }
        catch (IllegalSymbolException ex)
        {
            // this will happen if you try and make the DNA seq using non IUB symbols
            Console.WriteLine(ex);
        }
        catch (IllegalAlphabetException ex)
        {
            // this will happen if you try and transcribe a non DNA SymbolList
            Console.WriteLine(ex);
        }
    }
When run, the output is:
    DNA: atgccgaatcgtaa
    RNA: augccgaaucguaa
Yup. A few dozen classes and a few hundred lines of code, and I can replace t’s with u’s. Pretty exciting, eh?
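To be fair to the framework, the substitution itself is tiny; what the extra classes add is the checking hinted at by the two exceptions above. For comparison, here is a rough, framework-free sketch in Java (BioJava's own language). The class `NaiveTranscriber` and its `transcribe` method are made up for illustration and are not part of BioJava or the C# port. The sketch also assumes only the four unambiguous bases are allowed, which is stricter than the IUB symbol set mentioned in the code comments.

```java
// Hypothetical helper for illustration only; not part of BioJava.
class NaiveTranscriber {

    // Builds the RNA string for a DNA strand by replacing every t with u.
    // Assumption: only a, c, g and t are accepted (case-insensitive);
    // the result is always lowercase.
    static String transcribe(String dna) {
        StringBuilder rna = new StringBuilder(dna.length());
        for (char raw : dna.toCharArray()) {
            char base = Character.toLowerCase(raw);
            switch (base) {
                case 'a':
                case 'c':
                case 'g':
                    rna.append(base);
                    break;
                case 't':
                    rna.append('u');
                    break;
                default:
                    // Rough stand-in for the framework's illegal-symbol check.
                    throw new IllegalArgumentException("Not a DNA symbol: " + raw);
            }
        }
        return rna.toString();
    }

    public static void main(String[] args) {
        String dna = "atgccgaatcgtaa";
        System.out.println("DNA: " + dna);
        System.out.println("RNA: " + transcribe(dna));
    }
}
```

Running `main` prints the same two lines as the framework version. The difference is that a plain string carries no notion of an alphabet, so nothing stops you from transcribing a sequence twice or passing in a protein.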
Actually, I think it is pretty cool. I’m pretty close to having the code working that will let me translate the RNA to a protein sequence or form the complement of a DNA strand. Not rocket science, but I’ve only begun to scratch the surface. The framework can read sequence files (BLAST, FASTA), edit large sequences efficiently, do pairwise alignment, and a whole lot more.
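For a sense of what complementing and translating involve underneath, here is a minimal standalone sketch in Java. It is not how BioJava or the port implements them: `NaiveSequenceOps` and its methods are hypothetical names, the input is assumed to be lowercase and unambiguous, translation starts at the first base, stop codons are written as `*` without ending the translation, and a trailing incomplete codon is silently dropped. The codon table is the standard genetic code.

```java
// Hypothetical helpers for illustration only; not part of BioJava.
class NaiveSequenceOps {

    // Standard genetic code, with codons ordered by bases u, c, a, g.
    // '*' marks a stop codon.
    private static final String BASES = "ucag";
    private static final String AMINO_ACIDS =
        "FFLLSSSSYY**CC*W" + "LLLLPPPPHHQQRRRR"
        + "IIIMTTTTNNKKSSRR" + "VVVVAAAADDEEGGGG";

    // Swaps a<->t and c<->g, keeping the original order of the bases.
    static String complement(String dna) {
        StringBuilder out = new StringBuilder(dna.length());
        for (char base : dna.toCharArray()) {
            switch (base) {
                case 'a': out.append('t'); break;
                case 't': out.append('a'); break;
                case 'c': out.append('g'); break;
                case 'g': out.append('c'); break;
                default:
                    throw new IllegalArgumentException("Not a DNA symbol: " + base);
            }
        }
        return out.toString();
    }

    // The complement read in the opposite direction.
    static String reverseComplement(String dna) {
        return new StringBuilder(complement(dna)).reverse().toString();
    }

    // Translates RNA three bases at a time, starting at index 0.
    // Assumption: leftover bases that do not fill a codon are ignored.
    static String translate(String rna) {
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i + 3 <= rna.length(); i += 3) {
            int index = 0;
            for (int j = 0; j < 3; j++) {
                int b = BASES.indexOf(rna.charAt(i + j));
                if (b < 0) {
                    throw new IllegalArgumentException("Not an RNA symbol: " + rna.charAt(i + j));
                }
                index = index * 4 + b; // treat the codon as a base-4 number
            }
            protein.append(AMINO_ACIDS.charAt(index));
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        System.out.println("Complement: " + complement("atgccgaatcgtaa"));
        System.out.println("Protein:    " + translate("augccgaaucguaa"));
    }
}
```

With the sequence from the example above, the complement comes out as `tacggcttagcatt` and the RNA translates to `MPNR`, with the last two bases left over. A real framework has to decide what to do with that remainder, with ambiguity symbols, and with alternative codon tables, which is where the extra classes start to earn their keep.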
If you’re curious, you can compare the above C# code to the original Java code, which comes from the BioJava cookbook.