Bioinformatics | doug-swisher.net

October 7, 2008

First BioSharp App – RestrictionFinder

Filed under: Bioinformatics — Tags: RestrictionFinder BioSharp — Doug @ 7:57 am

I’ve started work on my first application that uses BioSharp. It is called RestrictionFinder, and its purpose is to help find a pair of restriction enzymes that give distinct cleavage patterns when an insert is present in a plasmid in the forward direction, reverse direction, or absent. It also has the ability to limit the search to pairs of enzymes with compatible buffers.

Here is a screen shot of the sequence entry form:

Sequence Entry Form

If the sequence contains uppercase and lowercase, the lowercase is assumed to be the insert, and the start/end positions are set automatically.

As part of the solution, I needed a small database (just a file, really) of enzymes and their buffers. I could not find a readily available file for this, so I wrote a small console app that extracts the data from the REBASE web site.

This is still a work in progress, but the source code is checked into the BioSharp SVN repository.

September 26, 2008

BioSharp web site is live

Filed under: Bioinformatics — Tags: biosharp — Doug @ 4:47 pm

I’ve uploaded the first incarnation of the BioSharp web site. It is still a bit thin, but at least the API docs are available.

The next step in the project will be to work on an application that was the whole motivation for BioSharp. As that progresses, I’m sure I’ll be continuing to port bits. According to my port status page, I’m just under 5% done…only 1413 classes left to port. Even at this point, though, the library has some useful functionality, as demonstrated by a few of my earlier posts.

If you are interested in seeing a specific module ported over, don’t hesitate to add a comment here to post in the forums. I’ll also look at getting a mailing list set up sometime soon.

September 23, 2008

Finding and flipping a DNA sequence with BioSharp

Filed under: Bioinformatics, Software — Tags: biosharp — Doug @ 11:06 am

The BioSharp port is still moving forward. I have enough functionality now to be able to create a DNA sequence, find a subsequence with that sequence, and create a new sequence with the subsequence flipped around.

For example, it can take “aacgaa”, search for “cg”, flip it around, and create the new sequence “aagcaa”. It would be trivial to do this just by string manipulation; hopefully the investment in the library will be worth it.

Here is a bit of sample code to do the search and flip.

private static void FindAndFlip()
{
    // Create our two bits of DNA
    ISymbolList bigDNA = DNATools.CreateDNA("acgatagatagctacgcatagctagctaagctacgactacgctacgctacg");
    ISymbolList subSequence = DNATools.CreateDNA("agctagctaagct");

    // Find the smaller piece within the larger piece
    KnuthMorrisPrattSearch search = new KnuthMorrisPrattSearch(subSequence);

    int[] results = search.FindMatches(bigDNA);

    if (results.Length == 0)
    {
        Console.WriteLine("subSequence not found!");
        return;
    }

    // Reverse the small piece
    ReverseSymbolList reverseSubSequence = new ReverseSymbolList(subSequence);

    // Make a copy of the big sequence that we can play with...
    ISymbolList reverseBigDNA = new SimpleSymbolList(bigDNA);

    // Overwrite the forward sequence with the reverse...
    Edit edit = new Edit(results[0], subSequence.Length, reverseSubSequence);

    reverseBigDNA.Edit(edit);

    // Print out the results...
    Console.WriteLine("subSequence:        " + subSequence.SeqString);
    Console.WriteLine("reverseSubSequence: " + reverseSubSequence.SeqString);
    Console.WriteLine("bigDNA:             " + bigDNA.SeqString);
    Console.WriteLine("reverseBigDNA:      " + reverseBigDNA.SeqString);
}

Here is the output from this snippet, with the flipped sequence highlighted in red:

subSequence:        agctagctaagct
reverseSubSequence: tcgaatcgatcga
bigDNA:             acgatagatagctacgcatagctagctaagctacgactacgctacgctacg
reverseBigDNA:      acgatagatagctacgcattcgaatcgatcgaacgactacgctacgctacg

Note that this is simply the reverse of the subsegment, and not the reverse compliment. The reverse compliment would be just as easy to do, though…

Comments (1)

September 1, 2008

Bioinformatics – BioJava and C#

Filed under: Bioinformatics — Tags: biojava, biosharp — Doug @ 10:36 pm

I stumbled across the Bioinformatics Group at the UofM, and realized that I met the president at a birthday party for a mutual friend a few months ago. I may have the opportunity to contribute to a project or two in the coming semester(s), so I started reading a bit about bioinformatics (again).

I went looking for some code, and found a framework called BioPerl, which seems fairly popular. My perl skills have atrophied over the years, and when I found BioJava, I was a bit more excited. It provides a number of useful functions, and seems fairly active. There is also a related database project, BioSQL, that both BioPerl and BioJava (along with BioRuby and BioPython) have incorporated language bindings. BioJava even uses Hibernate as its O/R mapping layer.

Since I like to work in C#, I started playing around with porting BioJava to C#. It’s a huge project, but it’s also a great way to see how BioJava is put together. I’ve managed to get far enough that I can transcribe DNA to RNA using the following code:

        private static void TranscribeDNAtoRNA()
        {
            try
            {
                //make a DNA SymbolList
                ISymbolList symL = DNATools.CreateDNA("atgccgaatcgtaa");

                Console.WriteLine("DNA: " + symL.SeqString);

                symL = DNATools.ToRNA(symL);

                // just to prove it worked
                Console.WriteLine("RNA: " + symL.SeqString);
            }
            catch (IllegalSymbolException ex)
            {
                // this will happen if you try and make the DNA seq using non IUB symbols
                Console.WriteLine(ex);
            }
            catch (IllegalAlphabetException ex)
            {
                // this will happen if you try and transcribe a non DNA SymbolList
                Console.WriteLine(ex);
            }
        }

When run, the output is:

DNA: atgccgaatcgtaa
RNA: augccgaaucguaa

Yup. A few dozen classes and a few hundred lines of code, and I can replace t’s with u’s. Pretty exciting, eh?

Actually, I think it is pretty cool. I’m pretty close to having the code working that will let me translate the RNA to a protein sequence or form the complement of a DNA strand. Not rocket science, but I’ve only begun to tap the surface. The framework allows reading sequence files (BLAST, FASTA), edit large sequences (efficiently), do pairwise alignment, and a whole lot more.

If you’re curious, you can compare the above C# code to the original Java code, which comes from the BioJava cookbook.

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

doug-swisher.net