CSC334 Lab3
Contents
- 1 DNA Sequence Alignment with Proce55ing--A First Approach
- 1.1 Methodology
- 1.1.1 Step 1: Display in the Status Box
- 1.1.2 Step 2: Display a vertical bar between exactly matching symbols
- 1.1.3 Step 3: Translate DNA Sequence 1 when the user presses the key 1
- 1.1.4 Step 4: Translate DNA Sequence 2 when the user presses the key 2
- 1.1.5 Step 5: Keep track of longest subsequence of exactly matching symbols
- 1.1.6 Step 6: Return automatically to best match found (Optional)
- 1.1.7 Step 7: Animation: make the program shift both sequences and compute the best exact alignment
- 1.2 Resources and Links
- 1.3 Solution program
DNA Sequence Alignment with Proce55ing--A First Approach
In this lab you will use a DNA alignment program written in Processing to explore the concept of aligning DNA sequences.
You may want to install Processing on your machine if it is not installed yet. You can also work through the very good tutorial listed in the Resources and Links section at the end to learn the basics of Processing. This lab, however, does not use any sophisticated 2-D or 3-D graphics, just moving text.
Methodology
- Open the Processing edit window.
- Copy and paste the following program into the edit window: DNA_Align.pde.
- Save your program as DNA_Align.pde
- Create the font needed by the program: click on Tools, then Create Font, and select Monaco with a size of 12 points.
This creates the file Monaco-12.vlw in the data folder of your sketch. If you cannot find the font Monaco on your computer, select another monospace font such as Prestige Elite or Courier. You may then need to change the font file name that the program loads.
- Run the program.
- Type + or - to make the sequences move left and right.
- Notice that the number of exact matches is printed in the Processing console.
Step 1: Display in the Status Box
Instead of displaying the number of exact matches in the console, let's make it appear in the status box.
The status box contains 5 lines. To display the value of an integer variable, say count, on the first line of the status box, we simply insert the following statement at the right location:
status.print( 0, "count = " + str( count ) );
The first argument, 0, selects the first line of the box. The call str( count ) converts the value of count into a string, which is appended to "count = " to build the text to display.
Go ahead and locate the statement that prints the match counter in the console, and replace it with a statement similar to the one above.
Step 2: Display a vertical bar between exactly matching symbols
The DNAString class can be instructed to display a bar below any of its symbols. All it needs is the index of the symbol. For example, to put a bar under the symbol in position 3 of dna1, we would write:
dna1.setLeg( 3 );
Go ahead and put a bar between each pair of matching symbols of dna1 and dna2, so that every exact match is marked by a vertical bar.
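To find which positions need a bar, the program has to compare the two strands symbol by symbol at the current shift. The Java sketch below collects those positions; Processing sketches are Java underneath. The class and method names are hypothetical and are not part of DNA_Align.pde. The sketch also assumes that shifting dna2 by `shift` places dna2's symbol at index `i - shift` under dna1's symbol at index `i`. Check how DNA_Align.pde actually stores the offset before reusing it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of DNA_Align.pde.
// Assumption: s1.charAt(i) faces s2.charAt(i - shift).
class MatchFinder {

    // Returns the indices i of s1 whose symbol equals the facing symbol of s2.
    static List<Integer> matchPositions(String s1, String s2, int shift) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < s1.length(); i++) {
            int j = i - shift;                 // index of the facing symbol in s2
            if (j < 0 || j >= s2.length()) {
                continue;                      // no symbol of s2 faces s1[i]
            }
            if (s1.charAt(i) == s2.charAt(j)) {
                positions.add(i);              // exact match: needs a bar
            }
        }
        return positions;
    }
}
```

Each returned index could then be passed to dna1.setLeg(), assuming setLeg() uses the same index into the string. The size of the list is the number of exact matches displayed in Step 1.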
Step 3: Translate DNA Sequence 1 when the user presses the key 1
A strand of DNA always has a complementary strand in which each thymine (T) faces an adenine (A) and each guanine (G) faces a cytosine (C). So dna1 might match dna2 better if its complementary strand is used instead, and conversely for dna2.
Modify the keyPressed() function so that if the user types 1 on the keyboard, the sequence dna1 is transformed into its complementary strand.
You will need to add a method to the DNAString class that transposes all the Ts into As and all the Gs into Cs, and conversely.
Make sure that when the user presses 1, a match is performed right after the transposition of nucleotides.
The + and - keys should still work the same way and shift the strands left and right.
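Here is a minimal sketch of the complement operation. It is written as a stand-alone Java method rather than as a method of DNAString, because the class's internal fields are not shown here. The name complement() is hypothetical. The method maps A to T, T to A, C to G, and G to C in a single pass, and copies any other symbol unchanged.

```java
// Hypothetical stand-alone version of the method Step 3 asks you to add
// to DNAString; adapt the loop to however DNAString stores its symbols.
class Complement {

    // Returns the complementary strand: A <-> T and C <-> G.
    // Any other symbol (e.g. a gap or an N) is copied unchanged.
    static String complement(String dna) {
        StringBuilder out = new StringBuilder(dna.length());
        for (int i = 0; i < dna.length(); i++) {
            char c = dna.charAt(i);
            switch (c) {
                case 'A': out.append('T'); break;
                case 'T': out.append('A'); break;
                case 'C': out.append('G'); break;
                case 'G': out.append('C'); break;
                default:  out.append(c);   break;
            }
        }
        return out.toString();
    }
}
```

The single pass matters. Replacing all the Ts by As and then all the As by Ts would leave the strand with no As at all, because every symbol that was a T or an A would end up as a T.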
Step 4: Translate DNA Sequence 2 when the user presses the key 2
Similarly, modify the program so that when the user types 2, dna2 is transformed into its complementary strand.
Step 5: Keep track of longest subsequence of exactly matching symbols
In general, bioinformaticians are not after the total number of exact matches. They want the longest sequence of consecutive matching symbols that can be found.
Modify the program so that it displays the longest sequence of consecutive matching symbols in the status box, on Line 2.
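One way to compute the value for Step 5 is to walk along the aligned strands while keeping two counters: the length of the current run of matches, and the best run seen so far. The Java sketch below is hypothetical, and its names do not come from DNA_Align.pde. It assumes that, for a given `shift`, dna1's symbol at index `i` faces dna2's symbol at index `i - shift`. A position with no facing symbol counts as a mismatch and ends the run.

```java
// Hypothetical helper, not part of DNA_Align.pde.
// Assumption: s1.charAt(i) faces s2.charAt(i - shift).
class LongestRun {

    // Length of the longest run of consecutive exact matches at this shift.
    static int longestRun(String s1, String s2, int shift) {
        int best = 0;     // longest run seen so far
        int current = 0;  // length of the run ending at position i
        for (int i = 0; i < s1.length(); i++) {
            int j = i - shift;
            boolean match = j >= 0 && j < s2.length()
                    && s1.charAt(i) == s2.charAt(j);
            if (match) {
                current++;
                best = Math.max(best, current);
            } else {
                current = 0;  // a mismatch (or no facing symbol) breaks the run
            }
        }
        return best;
    }
}
```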
Step 6: Return automatically to best match found (Optional)
If you are ambitious, add a new key that the program recognizes, say B for Best, which automatically shifts the sequences to the position that yielded the longest sequence recorded so far.
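Step 6 only needs to remember the best result seen so far and the shift that produced it, updating both after every match. The hypothetical Java tracker below sketches that bookkeeping. The run length passed to record() is assumed to come from your Step 5 code. How a shift is actually applied to the strands is left to DNA_Align.pde.

```java
// Hypothetical tracker for Step 6, not part of DNA_Align.pde.
class BestTracker {
    private int bestLength = -1;  // -1 means nothing recorded yet
    private int bestShift = 0;

    // Call after every match with the current shift and its longest run.
    void record(int shift, int runLength) {
        if (runLength > bestLength) {  // strict: ties keep the earlier shift
            bestLength = runLength;
            bestShift = shift;
        }
    }

    int bestShift()  { return bestShift; }
    int bestLength() { return bestLength; }
}
```

Because record() uses a strict comparison, ties keep the first shift that reached the record. Pressing B would then move the strands so that the current shift equals bestShift().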
Step 7: Animation: make the program shift both sequences and compute the best exact alignment
If you are even more ambitious, modify the program and fill in the draw() function so that the sequences automatically shift back and forth and get matched at each step. They should finally stop at the position that yields the longest sequence of consecutive matching symbols.
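Processing calls draw() once per frame, so the animation can be organized as a small state machine that advances one shift per call. The Java sketch below is a hypothetical simplification: it sweeps once from a minimum to a maximum shift instead of moving back and forth. It also takes the run-length computation as a function argument, for example your Step 5 code, rather than assuming any particular method of DNAString.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical helper for Step 7: call step() once per frame from draw().
class ShiftScanner {
    private final IntUnaryOperator runAtShift; // longest run produced by a shift
    private final int maxShift;
    private int shift;            // next shift to evaluate
    private int bestShift;
    private int bestLength = -1;  // -1 means nothing evaluated yet
    private boolean done = false;

    ShiftScanner(int minShift, int maxShift, IntUnaryOperator runAtShift) {
        this.shift = minShift;
        this.bestShift = minShift;
        this.maxShift = maxShift;
        this.runAtShift = runAtShift;
    }

    // Evaluates one shift and returns it so it can be displayed;
    // once every shift has been tried, always returns the best one.
    int step() {
        if (done) {
            return bestShift;
        }
        int evaluated = shift;
        int run = runAtShift.applyAsInt(evaluated);
        if (run > bestLength) {
            bestLength = run;
            bestShift = evaluated;
        }
        if (shift < maxShift) {
            shift++;              // try the next position on the next frame
        } else {
            done = true;          // sweep finished: settle on bestShift
        }
        return evaluated;
    }

    boolean isDone()  { return done; }
    int bestLength()  { return bestLength; }
}
```

In the sketch, one could create the scanner in setup(), call step() from draw() and apply the returned shift to the strands. frameRate() can slow the animation down, and noLoop() can be called once isDone() returns true.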
Resources and Links
- A good tutorial on Processing can be found here: ProcessingTutorial.pdf
- The main page for syntax help on Processing is processing.org/reference. A quick way to find information on a given topic in Processing, say on rectangles, is to enter something like this in the Google search bar: site:processing.org rectangle.
Solution program
© D. Thiebaut 2008