Dot Plot
Goal
• We will take two nucleotide base strings and look for common patterns – stretches where the bases match.
• GAATTCATACCAGATCACCGAAAACTGTCCTCCAAATGTGTCCCCCTCACACTCCCAAAT
• TCGCGGGCTTCTGCTCTTAGACCACTCTACCCTATTCCCCACACTCACCGGAGCCAAAGC
Anatomy of the formula (Part 1)
• =IF(MID($B$1,E$3,$B$4)=MID($B$2,$D4,$B$4),1,0)
• Recall that MID takes a string as its first argument: $B$1 holds the first base sequence and $B$2 holds the second.
• MID then extracts part of that string, beginning at the position given by its second argument.
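For readers working outside a spreadsheet, the behaviour of MID described above can be sketched in Python. The helper name `mid` is hypothetical (it is not part of the original workbook); it only mimics MID's 1-based start position and character count.

```python
def mid(text: str, start: int, length: int) -> str:
    """Mimic the spreadsheet MID function: 1-based start, then a character count."""
    if start < 1 or length < 0:
        raise ValueError("start must be >= 1 and length must be >= 0")
    # Python slices are 0-based, so shift the start position by one.
    return text[start - 1 : start - 1 + length]


# First base sequence from the Goal slide.
seq1 = "GAATTCATACCAGATCACCGAAAACTGTCCTCCAAATGTGTCCCCCTCACACTCCCAAAT"

print(mid(seq1, 1, 3))  # the three bases starting at position 1
print(mid(seq1, 4, 3))  # the three bases starting at position 4
```

Like MID, the sketch returns a shorter piece when the requested length runs past the end of the string.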
Anatomy of the formula (Part 2)
• =IF(MID($B$1,E$3,$B$4)=MID($B$2,$D4,$B$4),1,0)
• The starting point varies.
• E$3 stays in the third row as the formula is copied and uses the various numbers 1 through 60 set up in row 3.
• $D4 stays in column D and uses the various numbers 1 through 60 set up in column D.
Anatomy of the formula (Part 3)
• The third argument, $B$4, is the length of the match we seek. Both MID calls use the same length.
• If the two “substrings” (base mini sequences) match, output a 1, otherwise a zero.
• Then copy the formula throughout the grid.
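The three parts of the formula can be put together in a short Python sketch. This is an illustration under stated assumptions, not the workbook itself: the function name `dot_plot`, the parameter `window` (standing in for $B$4), and the toy sequences are all hypothetical. One difference to be aware of: the spreadsheet's `=` comparison ignores letter case, while this sketch is case-sensitive, which does not matter for all-uppercase base strings.

```python
def dot_plot(seq1: str, seq2: str, window: int) -> list[list[int]]:
    """Return grid[r][c] = 1 when the two length-`window` substrings match.

    Column c plays the role of the start position in row 3 (c + 1),
    row r plays the role of the start position in column D (r + 1).
    """
    grid = []
    for r in range(len(seq2)):
        row = []
        for c in range(len(seq1)):
            a = seq1[c : c + window]  # like MID($B$1, E$3, $B$4)
            b = seq2[r : r + window]  # like MID($B$2, $D4, $B$4)
            row.append(1 if a == b else 0)
        grid.append(row)
    return grid


# Toy sequences (assumed for illustration, not from the slides).
grid = dot_plot("GATTACA", "ATTAC", 2)
for row in grid:
    # Show matches as '#' and non-matches as '.', like colored cells.
    print("".join("#" if cell else "." for cell in row))
```

The nested loops correspond to copying the formula across and down the grid; each cell compares one pair of start positions.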
What are we looking for?
• In dot plots, one looks for dots (for us, colored cells) along diagonals.
• A “long” diagonal means that a stretch of consecutive bases in one sequence matches a stretch in the other.
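Spotting diagonals by eye works in a spreadsheet, but the same search can be sketched in code. The function below is a minimal illustration, assuming a grid of 0s and 1s laid out as in the slides (rows for the second sequence, columns for the first); the name `longest_diagonal` and the toy sequences are hypothetical.

```python
def longest_diagonal(grid: list[list[int]]) -> tuple[int, int, int]:
    """Return (length, row, col) of the longest down-right run of 1s.

    row and col are the 0-based indices of the cell where the run starts;
    an empty or all-zero grid gives (0, -1, -1).
    """
    best = (0, -1, -1)
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    # run[r][c] holds the length of the diagonal run of 1s ending at (r, c).
    run = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c]:
                prev = run[r - 1][c - 1] if r > 0 and c > 0 else 0
                run[r][c] = prev + 1
                if run[r][c] > best[0]:
                    n = run[r][c]
                    best = (n, r - n + 1, c - n + 1)
    return best


# Toy single-base dot plot (match length 1), sequences assumed for illustration.
seq1, seq2 = "GATTACA", "ATTAC"
grid = [[1 if a == b else 0 for a in seq1] for b in seq2]
print(longest_diagonal(grid))  # (5, 0, 1): 5 cells, starting at row 0, column 1
```

Here the run of five cells corresponds to "ATTAC" appearing in both sequences, starting at position 2 of the first and position 1 of the second.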
Problem
• We are looking for diagonal matches; however, increasing the length of the match allows only one of the two diagonal types to survive.
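One plausible reading of this problem, offered as an assumption rather than the author's stated explanation: with a match length of 1, a stretch that appears reversed in the other sequence shows up as a diagonal running the opposite way, but once the match length exceeds 1 both substrings are read left to right, so only same-direction diagonals remain. The sketch below illustrates that reading with hypothetical toy sequences and a hypothetical `dot_plot` helper.

```python
def dot_plot(seq1: str, seq2: str, window: int) -> list[list[int]]:
    """grid[r][c] = 1 when the length-`window` substrings at c and r match."""
    return [
        [1 if seq1[c : c + window] == seq2[r : r + window] else 0
         for c in range(len(seq1))]
        for r in range(len(seq2))
    ]


def count_dots(grid: list[list[int]]) -> int:
    """Total number of matching cells in the grid."""
    return sum(sum(row) for row in grid)


forward = "ACGT"
reverse = forward[::-1]  # "TGCA", the same bases in reverse order

for window in (1, 2):
    same = count_dots(dot_plot(forward, forward, window))
    flipped = count_dots(dot_plot(forward, reverse, window))
    print(f"window={window}: same-direction dots={same}, reversed dots={flipped}")
```

In this toy case the reversed copy produces four dots at a match length of 1 and none at a match length of 2, while the same-direction comparison keeps its diagonal in both cases.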