doug-swisher.net

September 23, 2008

Finding and flipping a DNA sequence with BioSharp

Filed under: Bioinformatics, Software — Tags: — Doug @ 11:06 am

The BioSharp port is still moving forward.  I have enough functionality now to be able to create a DNA sequence, find a subsequence with that sequence, and create a new sequence with the subsequence flipped around.

For example, it can take “aacgaa”, search for “cg”, flip it around, and create the new sequence “aagcaa”.  It would be trivial to do this just by string manipulation; hopefully the investment in the library will be worth it.

Here is a bit of sample code to do the search and flip.

private static void FindAndFlip()
{
    // Create our two bits of DNA
    ISymbolList bigDNA = DNATools.CreateDNA("acgatagatagctacgcatagctagctaagctacgactacgctacgctacg");
    ISymbolList subSequence = DNATools.CreateDNA("agctagctaagct");

    // Find the smaller piece within the larger piece
    KnuthMorrisPrattSearch search = new KnuthMorrisPrattSearch(subSequence);

    int[] results = search.FindMatches(bigDNA);

    if (results.Length == 0)
    {
        Console.WriteLine("subSequence not found!");
        return;
    }

    // Reverse the small piece
    ReverseSymbolList reverseSubSequence = new ReverseSymbolList(subSequence);

    // Make a copy of the big sequence that we can play with...
    ISymbolList reverseBigDNA = new SimpleSymbolList(bigDNA);

    // Overwrite the forward sequence with the reverse...
    Edit edit = new Edit(results[0], subSequence.Length, reverseSubSequence);

    reverseBigDNA.Edit(edit);

    // Print out the results...
    Console.WriteLine("subSequence:        " + subSequence.SeqString);
    Console.WriteLine("reverseSubSequence: " + reverseSubSequence.SeqString);
    Console.WriteLine("bigDNA:             " + bigDNA.SeqString);
    Console.WriteLine("reverseBigDNA:      " + reverseBigDNA.SeqString);
}

Here is the output from this snippet, with the flipped sequence highlighted in red:

subSequence:        agctagctaagct
reverseSubSequence: tcgaatcgatcga
bigDNA:             acgatagatagctacgcatagctagctaagctacgactacgctacgctacg
reverseBigDNA:      acgatagatagctacgcattcgaatcgatcgaacgactacgctacgctacg

Note that this is simply the reverse of the subsegment, and not the reverse compliment.  The reverse compliment would be just as easy to do, though…

Blog at WordPress.com.