doug-swisher.net

September 23, 2008

Finding and flipping a DNA sequence with BioSharp

Filed under: Bioinformatics, Software — Tags: — Doug @ 11:06 am

The BioSharp port is still moving forward. I now have enough functionality to create a DNA sequence, find a subsequence within that sequence, and create a new sequence with the subsequence flipped around.

For example, it can take “aacgaa”, search for “cg”, flip it around, and create the new sequence “aagcaa”.  It would be trivial to do this just by string manipulation; hopefully the investment in the library will be worth it.
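For comparison, the plain string version might look something like the sketch below. It does not use BioSharp at all; the `StringFlip` class and `FlipFirst` method are hypothetical names made up for this illustration, and it only handles the first match.

```csharp
using System;

public static class StringFlip
{
    // Reverses the first occurrence of `target` inside `source`.
    // Returns `source` unchanged when `target` is empty or does not occur.
    public static string FlipFirst(string source, string target)
    {
        if (target.Length == 0)
        {
            return source;
        }

        int index = source.IndexOf(target, StringComparison.Ordinal);
        if (index < 0)
        {
            return source;
        }

        // Reverse the characters of the matched piece.
        char[] flipped = target.ToCharArray();
        Array.Reverse(flipped);

        // Stitch together: text before the match, the flipped piece, text after it.
        return source.Substring(0, index)
            + new string(flipped)
            + source.Substring(index + target.Length);
    }
}
```

Calling `StringFlip.FlipFirst("aacgaa", "cg")` gives "aagcaa", the same result as the example above. What the string version lacks is any notion of an alphabet, so nothing stops it from happily flipping text that is not DNA.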

Here is a bit of sample code to do the search and flip.

private static void FindAndFlip()
{
    // Create our two bits of DNA
    ISymbolList bigDNA = DNATools.CreateDNA("acgatagatagctacgcatagctagctaagctacgactacgctacgctacg");
    ISymbolList subSequence = DNATools.CreateDNA("agctagctaagct");

    // Find the smaller piece within the larger piece
    KnuthMorrisPrattSearch search = new KnuthMorrisPrattSearch(subSequence);

    int[] results = search.FindMatches(bigDNA);

    if (results.Length == 0)
    {
        Console.WriteLine("subSequence not found!");
        return;
    }

    // Reverse the small piece
    ReverseSymbolList reverseSubSequence = new ReverseSymbolList(subSequence);

    // Make a copy of the big sequence that we can play with...
    ISymbolList reverseBigDNA = new SimpleSymbolList(bigDNA);

    // Overwrite the forward sequence with the reverse...
    Edit edit = new Edit(results[0], subSequence.Length, reverseSubSequence);

    reverseBigDNA.Edit(edit);

    // Print out the results...
    Console.WriteLine("subSequence:        " + subSequence.SeqString);
    Console.WriteLine("reverseSubSequence: " + reverseSubSequence.SeqString);
    Console.WriteLine("bigDNA:             " + bigDNA.SeqString);
    Console.WriteLine("reverseBigDNA:      " + reverseBigDNA.SeqString);
}

Here is the output from this snippet. The flipped region is the 13 bases of reverseBigDNA starting at position 20 (counting from 1):

subSequence:        agctagctaagct
reverseSubSequence: tcgaatcgatcga
bigDNA:             acgatagatagctacgcatagctagctaagctacgactacgctacgctacg
reverseBigDNA:      acgatagatagctacgcattcgaatcgatcgaacgactacgctacgctacg

Note that this is simply the reverse of the subsegment, and not the reverse complement. The reverse complement would be just as easy to do, though…
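To make the difference concrete, here is a rough sketch of a reverse complement on plain lowercase strings. It is not the BioSharp way of doing it; `DnaStrings`, `Complement`, and `ReverseComplement` are hypothetical names for this illustration, and only the four unambiguous bases a, c, g, and t are handled.

```csharp
using System;

public static class DnaStrings
{
    // Maps a base to its Watson-Crick partner (a<->t, c<->g).
    public static char Complement(char dnaBase)
    {
        switch (dnaBase)
        {
            case 'a': return 't';
            case 't': return 'a';
            case 'c': return 'g';
            case 'g': return 'c';
            default:
                throw new ArgumentException("Not a DNA base: " + dnaBase);
        }
    }

    // Complements each base and writes it to the mirrored position,
    // so the result is reversed and complemented in a single pass.
    public static string ReverseComplement(string dna)
    {
        char[] result = new char[dna.Length];
        for (int i = 0; i < dna.Length; i++)
        {
            result[dna.Length - 1 - i] = Complement(dna[i]);
        }
        return new string(result);
    }
}
```

For the subsequence used above, `DnaStrings.ReverseComplement("agctagctaagct")` returns "agcttagctagct", whereas the plain reverse shown in the output is "tcgaatcgatcga".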

1 Comment »

  1. it’s substring not subsequence see http://en.wikipedia.org/wiki/Subsequence#Substring_vs._subsequence

    Comment by hribek — December 21, 2008 @ 4:15 am

