Billions and Billions of Bases: How does a biologist maintain a grip on reality?


Upload: kristian-bates

Post on 16-Jan-2016


TRANSCRIPT

Page 1:
Page 2:

Billions and Billions of Bases

How does a biologist maintain a grip on reality?

Page 3:

46 chromosomes

~3 billion nucleotides

The Human Genome Project

One millionth of total

Page 4:

The Human Genome Project

[Raw nucleotide sequence fills the slide, beginning TGAGACACATATTTTTGATATTCCAGTTGTTGCAATCGAATGTAAAACAT...]

Page 5:

The Human Genome Project

[Raw nucleotide sequence fills the slide, beginning AATAAAGCTTTACAAACCAAACTCTGGCTTCAATTGTGTAACCC...]

Page 6:

The Human Genome Project

Page 7:
Page 8:
Page 9:
Page 10:

A Walk in the Forest

* Photo courtesy of www.webshots.com

Page 11:

Observation

* Photos courtesy of www.webshots.com and Peter Smallwood

Page 12:

Observation

* Photos courtesy of www.webshots.com and Peter Smallwood

Page 13:

Observation

* Photos courtesy of www.webshots.com and Peter Smallwood

Page 14:

Observation

* Photos courtesy of www.webshots.com and Peter Smallwood

Page 15:

Experiment

* Photos courtesy of www.webshots.com and Peter Smallwood

Page 16:

Filters: Information reducers

Squirrel filter

Page 17:

Filters: Information reducers

Molecule filter

Page 18:

Filters: Information reducers

Sequence filter

How organism is made

How organism works

TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC TTAGATAAAC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCACGCCC CTCCGTAAAC CTCTAACATG ATGTCAGCAA ATATTAAAAA TGAATAAACT TTGTTAAAGG TACAAATGAA AATTAGCAAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT CATTCTAGGG AAACCTGTAT GGTTACATGA ACTGCCTAAA AAACAAGCTA TTATATATTT TAAGAAATTA ATTGCAATTA ATTTCCTGGG CCCCAGCTGT CATTAAAAAG AGGCAAATAC AGCCAAGGAC GACAGCACTG ACCCTCAAGA AGGCACCGGC TGACAGACAG GCTGAAATTC CGCTGAGAGC AGAGTGGTAC ATTGAACCCT CCCTGCACCA GGTCTTTCCT GTGGGCACTG AGTGCAGACA ATGAATGACT GAACGAACGA TTGAATGAAA AGAAATGAGA TATGAGGCAA TCACAGCATC AGGTGACCTT AGTATCTATT CTCGGGAGCG CACGGCTCTA AAGAGGCCCA TATCCAGGCA CCTTTAGATG CAAGAAGGAG GAAACAGCTC GAAATCCCTG AGGCCGGAGG GTCAAGAACT CTCCACCGGC GGCAGCGGCC CCCCGGCCTA AGGCTGCCTG TGCTATAAAT ACGCGGCCCA TTCCCTGGGC TCGGCGGGAC AGATAACATG AATGTGCCCT

CTCCGTAAAC CTCTAAC...

Page 19:

From Sequence to Organism

How does Nature do it?

ATGACTTATGATCAACGCACAGGGCTA Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu...

Genetic code Rules of folding

Active site
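The "Genetic code" step on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code and reading the sequence in frame from its first base; the function name `translate` and the table layout are choices made for this sketch, not part of the talk.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# One-letter to three-letter amino acid names, as used on the slide.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}

def translate(dna):
    """Translate a DNA string codon by codon, starting at its first base."""
    dna = dna.upper()
    residues = []
    for i in range(0, len(dna) - 2, 3):
        residues.append(THREE_LETTER[CODON_TABLE[dna[i:i + 3]]])
    return "-".join(residues)

print(translate("ATGACTTATGATCAACGCACAGGGCTA"))
```

Run on the slide's sequence, this reproduces the peptide shown next to it. The "Rules of folding" step has no comparably simple lookup table, which is part of the talk's point.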

Page 20:

From Sequence to Organism

How does Nature do it?

ATGACTTATGATCAACGCACAGGGCTA Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu...

Active site

Cell interaction

Metabolism, Architecture

Genetic code Rules of folding

Page 21:

From Sequence to Organism

How does Nature do it?

ATGACTTATGATCAACGCACAGGGCTA Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu...

Genetic code

Active site

Gives us:

• Custom antibiotics

Genetic code Rules of folding

Page 22:

From Sequence to Organism

How does Nature do it?

ATGACTTATGATCAACGCACAGGGCTA Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu...

Gives us:

• Custom antibiotics
• Custom antibodies
• Custom enzymes
• New materials

Genetic code Rules of folding

Active site

Page 23:

From Sequence to Organism

How does Nature do it?

ATGACTTATGATCAACGCACAGGGCTA Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu...

Genetic code

Rules of transcriptional and post-transcriptional control

• Begin transcription
• End transcription
• Splice transcript
• Begin translation

ATGACTTATGATCAACGCACAGGGCTA

3%

?

TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC TTAGATAAAC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCACGCCC CTCCGTAAAC CTCTAACATG ATGTCAGCAA ATATTAAAAA TGAATAAACT TTGTTAAAGG TACAAATGAA AATTAGCAAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT CATTCTAGGG AAACCTGTAT GGTTACATGA ACTGCCTAAA AAACAAGCTA TTATATATTT TAAGAAATTA ATTGCAATTA ATTTCCTGGG CCCCAGCTGT CATTAAAAAG AGGCAAATAC AGCCAAGGAC GACAGCACTG ACCCTCAAGA AGGCACCGGC TGACAGACAG GCTGAAATTC CGCTGAGAGC AGAGTGGTAC ATTGAACCCT CCCTGCACCA GGTCTTTCCT GTGGGCACTG AGTGCAGACA ATGAATGACT GAACGAACGA TTGAATGAAA AGAAATGAGA

Page 24:

From Sequence to Organism

How does Nature do it?

ATGACTTATGATCAACGCACAGGGCTA Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu...

Genetic code

Rules of transcriptional and post-transcriptional control

TCTACTTATATTCAATCCACAGGGCTACACCTAGTTCTTGAAGAGTCTGTTGAATGAACACATACATGGTTTATCTGTTTTTCTGTCTGCTCTGACCTCTGGCAGCTT

TAGCCTGCCCCACTCTTAGATAAACGAACCTTAGTGACTTCTGCTATACCAAAGTCTCCACGCCCCTCCGTAAACCTCTAACATGATGTCAGCAAATATTAAAAATGA

97%

TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC TTAGATAAAC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCACGCCC CTCCGTAAAC CTCTAACATG ATGTCAGCAA ATATTAAAAA TGAATAAACT TTGTTAAAGG TACAAATGAA AATTAGCAAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT CATTCTAGGG AAACCTGTAT GGTTACATGA ACTGCCTAAA AAACAAGCTA TTATATATTT TAAGAAATTA ATTGCAATTA ATTTCCTGGG CCCCAGCTGT CATTAAAAAG AGGCAAATAC AGCCAAGGAC GACAGCACTG ACCCTCAAGA AGGCACCGGC TGACAGACAG GCTGAAATTC CGCTGAGAGC AGAGTGGTAC ATTGAACCCT CCCTGCACCA GGTCTTTCCT GTGGGCACTG AGTGCAGACA ATGAATGACT GAACGAACGA TTGAATGAAA AGAAATGAGA

?

• Begin transcription
• End transcription
• Splice transcript
• Begin translation

Page 25:

From Sequence to Organism

How does Nature do it?

Natural filters/transformations

• Selective transcription

• Selective processing

• Translation

• Folding

DNA Functional protein

Page 26:

From Sequence to Organism

How does Nature do it?

Natural filters/transformations

DNA Functional protein

Simulation of Nature Surrogate Processes

From Sequence to Organism

How can WE do it?

Page 27:

From Sequence to Organism

How can WE do it?

Simulation of Nature

Utterance of W Shakespeare

Utterance of George W Bush

“Whether ’tis nobler in the mind to suffer the slings and arrows of outrageous fortune...”

“We must give our military every tool and weapon it needs to prevail...”

???

Page 28:

From Sequence to Organism

How can WE do it?

Surrogate Processes

Utterance of W Shakespeare

Utterance of George W Bush

“Whether ’tis nobler in the mind to suffer the slings and arrows of outrageous fortune...”

“We must give our military every tool and weapon it needs to prevail...”

Word frequency

Page 29:

From Sequence to Organism

How can WE do it?

Surrogate Processes

Utterance of W Shakespeare

Utterance of George W Bush

“Whether ’tis nobler in the mind to suffer the slings and arrows of outrageous fortune...”

“We must give our military every tool and weapon it needs to prevail...”

Word frequency, words/sentence…
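The surrogate idea here is that we cannot simulate the speaker, but we can measure cheap statistics of the text. A rough Python sketch, assuming a deliberately naive tokenizer (lowercase letters and apostrophes) and sentence splitter; the function name `surrogate_features` is made up for this illustration.

```python
import re
from collections import Counter

SHAKESPEARE = ("Whether 'tis nobler in the mind to suffer the slings "
               "and arrows of outrageous fortune...")
BUSH = ("We must give our military every tool and weapon it needs "
        "to prevail...")

def surrogate_features(text):
    """Return word frequencies and mean words per sentence for a text."""
    # Naive assumptions: a sentence ends at a run of . ! or ?,
    # and a word is a run of letters and apostrophes.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "word_frequency": Counter(words),
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
    }

for name, text in [("Shakespeare", SHAKESPEARE), ("Bush", BUSH)]:
    features = surrogate_features(text)
    print(name, features["words_per_sentence"],
          features["word_frequency"].most_common(3))
```

Two short quotations are far too little data to tell authors apart; a real classifier would gather such statistics over large bodies of text, just as a gene finder is trained on many known genes.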

Page 30:

From Sequence to Organism

How can WE do it?

Natural filters/transformations

• Selective transcription

• Selective processing

• Translation

• Folding/function

Surrogate filters

TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC

Characteristics of coding sequences/introns

• Gene finders

Predicted coding regions

My sequence

Page 31:

From Sequence to Organism

How can WE do it?

Natural filters/transformations

• Selective transcription

• Selective processing

• Translation

• Folding/function

Surrogate filters

• Gene finders

Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu...

Function?

Page 32:

From Sequence to Organism

How can WE do it?

Natural filters/transformations

• Selective transcription

• Selective processing

• Translation

• Folding/function

Surrogate filters

• Gene finders

• Similarity finders

My predicted gene

Sequence/motif databases

globin

globin?

Similar genes

Page 33:

Surrogate Filters

Gene finders

Start/Stop codon search

CTCCACGCCCCTCCGTACACCTCTAACATGATCTCAGCAAATATTAAAAATGAATAAACTTTGTGACATGTACAAATGGAAATATGCAA

CT CCA CGC CCC TCC GTA CAC CTC TAA CAT GAT CTC AGC AAA TAT TAA AAA TGA ATA AAC TTT GTG ACA TGT ACA AAT GGA AAT ATG CAA

CTC CAC GCC CCT CCG TAC ACC TCT AAC ATG ATC TCA GCA AAT ATT AAA AAT GAA TAA ACT TTG TGA CAT GTA CAA ATG GAA ATA TGC AA

C TCC ACG CCC CTC CGT ACA CCT CTA ACA TGA TCT CAG CAA ATA TTA AAA ATG AAT AAA CTT TGT GAC ATG TAC AAA TGG AAA TAT GCA A

Look for start codons (ATG) (GTG,TTG)

Look for stop codons (TAA,TAG,TGA)
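The start/stop search can be sketched as a scan of the three forward reading frames. This is a minimal illustration, not the method of any particular gene finder: it assumes ATG as the only start codon by default (the slide also lists GTG and TTG as alternatives), reports the first start in each open stretch, and uses the sequence as spelled out in the codon-by-codon rows of the slide. The name `find_orfs` is made up for this sketch.

```python
START_CODONS = {"ATG"}                  # GTG and TTG can be added as alternatives
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, starts=START_CODONS, stops=STOP_CODONS):
    """Return (frame, start, end) for each start codon followed in-frame by a stop.

    Positions are 0-based; seq[start:end] includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in starts:
                start = i                # open a candidate at the first start codon
            elif start is not None and codon in stops:
                orfs.append((frame, start, i + 3))
                start = None             # close it at the first in-frame stop
    return orfs

# Sequence as it reads in the slide's reading-frame rows.
SLIDE_SEQ = (
    "CTCCACGCCCCTCCGTACACCTCTAAC"
    "ATGATCTCAGCAAATATTAAAAATGAATAA"
    "ACTTTGTGACATGTACAAATGGAAATATGCAA"
)

for frame, start, end in find_orfs(SLIDE_SEQ):
    print(frame, start, end, SLIDE_SEQ[start:end])
```

On this short sequence only one frame contains a start codon followed by an in-frame stop. Any stretch that happens to open with ATG and close with a stop is reported, whether or not it is a real gene.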

Page 34:

CTCCACGCCCCTCCGTACACCTCTAACATGATCTCAGCAAATATTAAAAATGAATAAACTTTGTGACATGTACAAATGGAAATATGCAA

TTGCATATTTCCATTTGTACATGTCACAAAGTTTATTCATTTTTAATATTTGCTGAGATCATGTTAGAGGTGTACGGAGGGGCGTGGAG

Surrogate Filters

Gene finders

Start/Stop codon search

Look for start codons (ATG) (GTG,TTG)

Look for stop codons (TAA,TAG,TGA)

Highly inaccurate
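Because a gene can lie on either strand, a codon search is normally repeated on the reverse complement, which doubles the number of reading frames and the number of chance hits. A small sketch of that transformation, assuming an uppercase or lowercase A/C/G/T alphabet with no ambiguity codes; `reverse_complement` is a name chosen for this illustration.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGAAATAG"))   # the same region read from the other strand
```

Applying the transformation twice returns the original sequence, which is a quick sanity check when scanning all six frames.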

Page 35:

Surrogate Filters

Gene finders

Hidden Markov Model (HMM)-based recognition

Step 1: Create model through extensive training set

AAA AAC AAG AAT ACA . . . TTG TTT

Training Set

AAGCTTGACCAAAAAGTTAAAACACTGACGGCAAATAATCAATGACTATCAGACAGAGAATCATCGTGCTGTCAGTAAAACCTCTGATTTCGATCTTTACCATAATTGTTATGTTGTAATGACTAACCAGACTATCTTTTACAGAGCTTCTGGTTAACACTTGTCTAATTAGACATTGATAATGTTTGTGGGGGTTGGTCATCAGGAATGGTAAATAGCAATTACCCTTCAGACTTTCCTATGAGACGCTCCGCCAACGAGCAGTGTCTCTTAAAGAACGTTATGAGCGCTCAGTTAACTTCAGAAATTCACGGCGGAAATCCATAGTTATTATTACTTATGACTAAAACAAAATTACTATGGCGGCTTGTTTAATATAGATTCTGTGTTCTGAGAAATGACTTTTAAAGTCCCACTAACTTTTTTCTCATCTATTGCTATATTTCGACTTTAAAACTTATAGTAGATGGCTTAATTCTCAAATAACAAACTCATTTTTAGTAGATATTTCATGCAAACTGAGGTTTTTAGTGATATTTTCCCCTTATTGAGTACAGCCACTCCACAAACCTTAGAATGGCTACTCAATATTGCAATTGATCATGAATATCCCACTGGTAGAGCAGTTTTAATGGAAGATGCCTGGGGTAATGCAGTTTATTTCGTTGTATCTGGATGGGTAAAAGTTCGGCGCACCTGTGGAGATGATTCGGTAGCTTT

Page 36:

Step 1: Create model through extensive training set

AAAA: 33%

AAAC: 25%

AAAG: 12%

AAAT: 30%

Surrogate Filters

Gene finders

Class 3: Hidden Markov Model (HMM)-based recognition

AAA AAC AAG AAT ACA . . . TTG TTT

Training Set

AAGCTTGACCAAAAAGTTAAAACACTGACGGCAAATAATCAATGACTATCAGACAGAGAATCATCGTGCTGTCAGTAAAACCTCTGATTTCGATCTTTACCATAATTGTTATGTTGTAATGACTAACCAGACTATCTTTTACAGAGCTTCTGGTTAACACTTGTCTAATTAGACATTGATAATGTTTGTGGGGGTTGGTCATCAGGAATGGTAAATAGCAATTACCCTTCAGACTTTCCTATGAGACGCTCCGCCAACGAGCAGTGTCTCTTAAAGAACGTTATGAGCGCTCAGTTAACTTCAGAAATTCACGGCGGAAATCCATAGTTATTATTACTTATGACTAAAACAAAATTACTATGGCGGCTTGTTTAATATAGATTCTGTGTTCTGAGAAATGACTTTTAAAGTCCCACTAACTTTTTTCTCATCTATTGCTATATTTCGACTTTAAAACTTATAGTAGATGGCTTAATTCTCAAATAACAAACTCATTTTTAGTAGATATTTCATGCAAACTGAGGTTTTTAGTGATATTTTCCCCTTATTGAGTACAGCCACTCCACAAACCTTAGAATGGCTACTCAATATTGCAATTGATCATGAATATCCCACTGGTAGAGCAGTTTTAATGGAAGATGCCTGGGGTAATGCAGTTTATTTCGTTGTATCTGGATGGGTAAAAGTTCGGCGCACCTGTGGAGATGATTCGGTAGCTTT

Page 37:

Step 1: Create model through extensive training set

AACA: 30%

AACC: 20%

AACG: 15%

AACT: 35%

AAA AAC AAG AAT ACA . . . TTG TTT

Surrogate Filters

Gene finders

Class 3: Hidden Markov Model (HMM)-based recognition

Training Set

AAGCTTGACCAAAAAGTTAAAACACTGACGGCAAATAATCAATGACTATCAGACAGAGAATCATCGTGCTGTCAGTAAAACCTCTGATTTCGATCTTTACCATAATTGTTATGTTGTAATGACTAACCAGACTATCTTTTACAGAGCTTCTGGTTAACACTTGTCTAATTAGACATTGATAATGTTTGTGGGGGTTGGTCATCAGGAATGGTAAATAGCAATTACCCTTCAGACTTTCCTATGAGACGCTCCGCCAACGAGCAGTGTCTCTTAAAGAACGTTATGAGCGCTCAGTTAACTTCAGAAATTCACGGCGGAAATCCATAGTTATTATTACTTATGACTAAAACAAAATTACTATGGCGGCTTGTTTAATATAGATTCTGTGTTCTGAGAAATGACTTTTAAAGTCCCACTAACTTTTTTCTCATCTATTGCTATATTTCGACTTTAAAACTTATAGTAGATGGCTTAATTCTCAAATAACAAACTCATTTTTAGTAGATATTTCATGCAAACTGAGGTTTTTAGTGATATTTTCCCCTTATTGAGTACAGCCACTCCACAAACCTTAGAATGGCTACTCAATATTGCAATTGATCATGAATATCCCACTGGTAGAGCAGTTTTAATGGAAGATGCCTGGGGTAATGCAGTTTATTTCGTTGTATCTGGATGGGTAAAAGTTCGGCGCACCTGTGGAGATGATTCGGTAGCTTT
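Step 1 amounts to counting, for every three-base context in the training set, how often each base comes next. The sketch below shows only that counting step for a plain 3rd-order Markov chain; it is a simplification (no hidden states, no pseudocounts, no frame-specific tables), and the function name `train_markov` is an assumption of this illustration.

```python
from collections import Counter, defaultdict

def train_markov(training_seqs, order=3):
    """Estimate P(next base | previous `order` bases) from training sequences."""
    counts = defaultdict(Counter)
    for seq in training_seqs:
        seq = seq.upper()
        for i in range(len(seq) - order):
            context = seq[i:i + order]
            next_base = seq[i + order]
            counts[context][next_base] += 1
    model = {}
    for context, next_counts in counts.items():
        total = sum(next_counts.values())
        # Relative frequency of each base following this context.
        model[context] = {b: next_counts[b] / total for b in "ACGT"}
    return model

# Toy training set: after "AAA" the next base is A three times and C once.
model = train_markov(["AAAAAAC"])
print(model["AAA"])
```

Percentages such as "AAAA: 33%" on the slides are entries of exactly this kind of table: the share of times A follows the context AAA in the training set.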

Page 38:

Step 2: Assess candidate genes

0.12

        A     C     G     T
AAA   0.33  0.25  0.12  0.30
AAC   0.30  0.20  0.15  0.35
AAG   0.35  0.15  0.20  0.30
AAT   0.30  0.15  0.20  0.25
ACA   0.25  0.20  0.15  0.35
. . .
TTG   0.25  0.30  0.15  0.30
TTT   0.30  0.25  0.10  0.35

Candidate gene

AAAGCAA…

3rd order Markov model

Surrogate Filters

Gene finders

Class 3: Hidden Markov Model (HMM)-based recognition

Page 39:

Step 2: Assess candidate genes

AAAGCAA…

0.12 x 0.15

3rd order Markov model

Surrogate Filters

Gene finders

Class 3: Hidden Markov Model (HMM)-based recognition

        A     C     G     T
AAA   0.33  0.25  0.12  0.30
AAC   0.30  0.20  0.15  0.35
AAG   0.35  0.15  0.20  0.30
AAT   0.30  0.15  0.20  0.25
ACA   0.25  0.20  0.15  0.35
. . .
TTG   0.25  0.30  0.15  0.30
TTT   0.30  0.25  0.10  0.35

Candidate gene

Page 40:

Step 2: Assess candidate genes

AAAGCTA…

0.12 x 0.15 . . .

So far, not a good candidate!

3rd order Markov model

Surrogate Filters

Gene finders

Class 3: Hidden Markov Model (HMM)-based recognition

        A     C     G     T
AAA   0.33  0.25  0.12  0.30
AAC   0.30  0.20  0.15  0.35
AAG   0.35  0.15  0.20  0.30
AAT   0.30  0.15  0.20  0.25
ACA   0.25  0.20  0.15  0.35
. . .
TTG   0.25  0.30  0.15  0.30
TTT   0.30  0.25  0.10  0.35

Candidate gene
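The running product on these slides (0.12 x 0.15 . . .) comes from looking up each base of the candidate given the three bases before it. A sketch of that lookup, using only the table rows visible on the slide; rows hidden behind the ". . ." are not filled in, so scoring simply stops at the first context that is missing. The name `score_prefix` is hypothetical.

```python
# Rows of the transition table as shown on the slide (others are elided there).
TABLE = {
    "AAA": {"A": 0.33, "C": 0.25, "G": 0.12, "T": 0.30},
    "AAC": {"A": 0.30, "C": 0.20, "G": 0.15, "T": 0.35},
    "AAG": {"A": 0.35, "C": 0.15, "G": 0.20, "T": 0.30},
    "AAT": {"A": 0.30, "C": 0.15, "G": 0.20, "T": 0.25},
    "ACA": {"A": 0.25, "C": 0.20, "G": 0.15, "T": 0.35},
    "TTG": {"A": 0.25, "C": 0.30, "G": 0.15, "T": 0.30},
    "TTT": {"A": 0.30, "C": 0.25, "G": 0.10, "T": 0.35},
}

def score_prefix(candidate, table, order=3):
    """Multiply P(base | previous `order` bases) until a context is missing."""
    probability = 1.0
    steps = []
    for i in range(order, len(candidate)):
        context, base = candidate[i - order:i], candidate[i]
        if context not in table:
            break                        # row not shown on the slide
        steps.append(table[context][base])
        probability *= table[context][base]
    return probability, steps

probability, steps = score_prefix("AAAGCAA", TABLE)
print(steps, probability)
```

For the candidate AAAGCAA the first two factors are P(G | AAA) = 0.12 and P(C | AAG) = 0.15, matching the slide. Real scorers usually sum log probabilities instead of multiplying, to avoid underflow on long sequences.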

Page 41:

Step 2: Assess candidate genes

3rd order Markov model

Surrogate Filters

Gene finders

Class 3: Hidden Markov Model (HMM)-based recognition

Candidate genes Predicted genes

Page 42:

Predicted genes

Step 2: Assess candidate genes

3rd order Markov model

Surrogate Filters

Gene finders

Class 3: Hidden Markov Model (HMM)-based recognition

Candidate genes Predicted genes

Conform to standard model

Challenge accepted beliefs

Page 43:

Computers are powerful

globin

Highly filtered output
• Easy to grasp
• High-level insights

Unfiltered output
• Confusing
• Basic insights

Page 44:

Computers are tempting

Page 45:

Globin

Computers are tempting

Page 46:

Crisis in Bioinformatics

1. Need high-level filters

2. Need access to raw phenomena

3. Need new tools for new phenomena

4. Need intuitive representation of results

5. Need ability to build new tools

Need a new generation

Page 47:

View of the Future

Page 48:

View of the Future

Integration of information

ATGACTTATGATCAACGCACAGGGCTA Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu...

Cell interaction

Metabolism, Architecture

Genetic code Rules of folding

Active site

Page 49:

Prochlorococcus MED4

Prochlorococcus MIT9313

Page 50:

How do cells control response to light?

What genes are related to the adaptation to high light?

Look for:

• Gene present in Prochlorococcus MED4
  MED4 is naturally adapted to grow in high light.

• Ortholog absent in Prochlorococcus MIT9313
  MIT9313 is naturally adapted to grow in low light.

• Ortholog present in Synechocystis PCC 6803
  Reason will become apparent in a moment.

• Synechocystis PCC 6803 ortholog responds to high light
  Gene turns on by factor > 2 in response to high light.
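The four criteria on this slide combine as set operations. The sketch below uses invented gene names and fold-change values purely for illustration; the data structures (dictionaries mapping a MED4 gene to its ortholog, and an ortholog to its expression ratio) and the function name `high_light_candidates` are assumptions, not the layout of any real database.

```python
def high_light_candidates(med4_genes, mit9313_orthologs, pcc6803_orthologs,
                          fold_change, threshold=2.0):
    """Return MED4 genes meeting all four criteria listed on the slide."""
    present_in_med4 = set(med4_genes)
    # Ortholog absent in MIT9313.
    absent_in_mit9313 = present_in_med4 - set(mit9313_orthologs)
    # Ortholog present in Synechocystis PCC 6803.
    has_6803_ortholog = present_in_med4 & set(pcc6803_orthologs)
    # That ortholog turns on by more than `threshold`-fold in high light.
    induced = {
        gene for gene in has_6803_ortholog
        if fold_change.get(pcc6803_orthologs[gene], 0.0) > threshold
    }
    return absent_in_mit9313 & induced

# Hypothetical toy data.
med4_genes = {"geneA", "geneB", "geneC", "geneD"}
mit9313_orthologs = {"geneA": "mitA", "geneD": "mitD"}
pcc6803_orthologs = {"geneA": "slrA", "geneB": "slrB", "geneC": "slrC"}
fold_change = {"slrA": 3.1, "slrB": 2.6, "slrC": 1.2}

print(high_light_candidates(med4_genes, mit9313_orthologs,
                            pcc6803_orthologs, fold_change))
```

In the toy data only geneB survives: geneA and geneD have MIT9313 orthologs, and geneC's Synechocystis ortholog is induced less than two-fold.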

Page 51:

Build set Display set

Click on Build Set to begin finding orfs with the desired specifications.

HELP Set operation

Page 52:

All items in All open reading frames of

All amino acid sequences of

All intergenic regions of

Human-annotated orfs of

Private set

Public set

All open reading frames of

Build set Display set

Choose set type

The goal is to find all open reading frames within Prochlorococcus MED4 that meet certain specifications, so click on All open reading frames of.

Cancel HELP Set operation

Page 53:

All items in All open reading frames of

Arthrobacter platensis
Gloeobacter violaceus
Microcystis aeruginosa
Nostoc punctiforme
Nostoc PCC 7120
Prochlorococcus MED4
Prochlorococcus MIT9313
Prochlorococcus S120
Synechococcus PCC6301
Synechococcus PCC7942
Synechococcus WH
Synechocystis PCC 6803
Thermosynechococcus
Trichodesmium
Unicellular
Filamentous
All

Prochlorococcus MED4

Build set Display set

Choose set type Choose database

Click on Prochlorococcus MED4

Cancel HELP Set operation

Page 54:

All items in All open reading frames of Prochlorococcus MED4

Display set

such that:

Variable Data Operation Function Done

Choose set type Choose database

Build set

You will ask that an ortholog of each desired MED4 gene exists in Synechocystis PCC 6803. It is convenient to define the ortholog now. Click the Variable button.

Cancel HELP Set operation

Page 55:

All items in All open reading frames of Prochlorococcus MED4

Display set

such that:

Variable Data

Item
New variable

Variable

Choose set type Choose database

New variable

Build set

Item refers to the MED4 orf under consideration. You want to define its ortholog in Synechocystis, so click on New variable.

Operation Function Done

Cancel HELP Set operation

Page 56:

All items in All open reading frames of Prochlorococcus MED4

Display set

such that:

Variable Data

6803 ortholog

Type variable name

=

Choose set type Choose database

Build set

You can name the variable representing the ortholog anything you like. For this simulation, a name is provided. Press the Enter key.

Operation Function Done

Cancel HELP Set operation

Page 57:

All items in All open reading frames of Prochlorococcus MED4

Display set

such that:

Variable Data

6803 ortholog

Type variable name

= Closest ortholog of

Protein product of

Upstream region of

Downstream region of

Ortholog of (item

Choose set type Choose database

Choose function

Build set

One variable can be defined with respect to another in several ways. The relationship you want is Ortholog of.

Operation Function Done

Cancel HELP Set operation

Page 58:

All items in All open reading frames of

Choose set type

Prochlorococcus MED4

Choose database

Display set

such that:

Variable Data

= Ortholog of (item in

Arthrobacter platensis
Gloeobacter violaceus
Microcystis aeruginosa
Nostoc punctiforme
Nostoc PCC 7120
Prochlorococcus MED4
Prochlorococcus MIT9313
Prochlorococcus S120
Synechococcus PCC6301
Synechococcus PCC7942
Synechococcus WH
Synechocystis PCC 6803
Thermosynechococcus
Trichodesmium

Choose database

Synechocystis PCC6803

)Choose function

Build set

Clicking on Synechocystis PCC6803 defines the variable 6803 ortholog as the ortholog in Synechocystis to a given orf of MED4.

6803 ortholog

Type variable name

Operation Function Done

Cancel HELP Set operation

Page 59:

All items in All open reading frames of

Choose set type

Prochlorococcus MED4

Choose database

Display set

such that:

Variable Data

Synechocystis PCC 6803

Build set

)

The first limitation on the MED4 orf is that no ortholog of it exists in MIT9313. To invoke the concept of ortholog, press the Function button.

= Ortholog of (item inChoose databaseChoose function

6803 ortholog

Type variable name

Operation Function Done

Cancel HELP Set operation

Page 60:

All items in All open reading frames of

Choose set type

Prochlorococcus MED4

Choose database

Display set

such that:

Variable Data

Build set

Click on Ortholog of

Closest ortholog of

Protein product of

Upstream region of

Downstream region of

Ortholog of

Choose function

Synechocystis PCC 6803 )= Ortholog of (item inChoose databaseChoose function

6803 ortholog

Type variable name

Operation Function Done

Cancel HELP Set operation

Page 61:

All items in All open reading frames of

Choose set type

Prochlorococcus MED4

Choose database

Display set

such that:

Variable Data

Build set

As always, Item refers to the orf of MED4 that is being defined. You want to specify that an ortholog of it in MIT9313 doesn’t exist, so click on Item.

Item
6803 ortholog

Variable

Item( in

Synechocystis PCC 6803 )= Ortholog of (item inChoose databaseChoose function

6803 ortholog

Type variable name

Ortholog of

Choose function

Operation Function Done

Cancel HELP Set operation

Page 62:

All items in All open reading frames of

Choose set type

Prochlorococcus MED4

Choose database

Display set

such that:

Variable Data

Build set

Clicking on Prochlorococcus MIT9313 defines an ortholog of a MED4 gene in MIT9313 (if such an ortholog exists)

Item

Variable

( in

Synechocystis PCC 6803 )= Ortholog of (item inChoose databaseChoose function

6803 ortholog

Type variable name

Ortholog of

Choose function

Arthrobacter platensis
Gloeobacter violaceus
Microcystis aeruginosa
Nostoc punctiforme
Nostoc PCC 7120
Prochlorococcus MED4
Prochlorococcus MIT9313
Prochlorococcus S120
Synechococcus PCC6301
Synechococcus PCC7942
Synechococcus WH
Synechocystis PCC 6803
Thermosynechococcus
Trichodesmium

Choose database

)

Prochlorococcus MIT9313

Operation Function Done

Cancel HELP Set operation

Page 63:

All items in All open reading frames of

Choose set type

Prochlorococcus MED4

Choose database

Display set

such that:

Variable Data

Build set

You want to keep only those MED4 genes where an ortholog in MIT9313 does NOT exist, so click on doesn’t exist.

Item

Variable

( in

Synechocystis PCC 6803 )= Ortholog of (item inChoose databaseChoose function

6803 ortholog

Type variable name

Ortholog of

Choose function

Prochlorococcus MIT9313

Choose database

) =

exists
doesn’t exist

doesn’t exist

Op

Operation Function Done

Cancel HELP Set operation

Page 64:

All items in All open reading frames of

Choose set type

Prochlorococcus MED4

Choose database

Display set

such that:

Variable Data

Build set

That completes one specification, but there are more. Click on the Operation button to connect one specification to the next.

Item

Variable

( in

Synechocystis PCC 6803 )= Ortholog of (item inChoose databaseChoose function

6803 ortholog

Type variable name

Ortholog of

Choose function

Prochlorococcus MIT9313

Choose database

) doesn’t exist

Op

Operation Function Done

Cancel HELP Set operation

Page 65:

All items in All open reading frames of

Choose set type

Prochlorococcus MED4

Choose database

Display set

such that:

Variable Data

Build set

You want both the first specification AND the second to be true, so click on AND.

Item

Variable

( in

Synechocystis PCC 6803 )= Ortholog of (item inChoose databaseChoose function

6803 ortholog

Type variable name

Ortholog of

Choose function

Prochlorococcus MIT9313

Choose database

) doesn’t exist

Op

AND
OR

AND

Op

Operation Function Done

Cancel HELP Set operation
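The "such that ... AND ..." construction in the set builder behaves like a filter whose condition is assembled from smaller predicates. A rough Python analogue follows; every name in it (`ortholog_of`, `exists`, `doesnt_exist`, `AND`, `OR`, `build_set`) and the tiny ortholog table are invented for this sketch and are not the tool's actual interface. The second specification here is a simple existence test standing in for the microarray criterion that the walkthrough goes on to build.

```python
# Hypothetical ortholog table: (gene, target genome) -> ortholog name.
ORTHOLOGS = {
    ("med4_0001", "Synechocystis PCC 6803"): "slr0001",
    ("med4_0002", "Synechocystis PCC 6803"): "slr0002",
    ("med4_0002", "Prochlorococcus MIT9313"): "mit_0002",
}

def ortholog_of(item, genome):
    """Return the ortholog of `item` in `genome`, or None if there is none."""
    return ORTHOLOGS.get((item, genome))

def exists(function, genome):
    return lambda item: function(item, genome) is not None

def doesnt_exist(function, genome):
    return lambda item: function(item, genome) is None

def AND(*specs):
    return lambda item: all(spec(item) for spec in specs)

def OR(*specs):
    return lambda item: any(spec(item) for spec in specs)

def build_set(items, such_that):
    """All items in `items` such that the combined specification holds."""
    return [item for item in items if such_that(item)]

med4_orfs = ["med4_0001", "med4_0002", "med4_0003"]
query = AND(
    doesnt_exist(ortholog_of, "Prochlorococcus MIT9313"),
    exists(ortholog_of, "Synechocystis PCC 6803"),
)
print(build_set(med4_orfs, query))
```

Keeping each specification as its own small function is what lets a point-and-click interface chain them with AND or OR without the user writing code.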

Page 66:

All items in All open reading frames of

Choose set type

Prochlorococcus MED4

Choose database

Display set

such that:

Variable Data

Build set

The second specification is that microarray data for the 6803 ortholog meets a certain criterion. To get at that data, press the Data button.

Item

Variable

( in

Synechocystis PCC 6803 )= Ortholog of (item inChoose databaseChoose function

6803 ortholog

Type variable name

Ortholog of

Choose function

Prochlorococcus MIT9313

Choose database

) doesn’t exist

Op

AND

Op

[

Operation Function Done

CancelHELPSet operation

Page 67: Billions and Billions of Bases How does a biologist maintain a grip on reality?

The data you want is for the 6803 ortholog. Click on 6803 ortholog.

All items in: All open reading frames of Prochlorococcus MED4, such that:

(6803 ortholog) = Ortholog of (Item in Synechocystis PCC 6803)
Ortholog of (Item in Prochlorococcus MIT9313) doesn’t exist
AND [ data for (

Variable menu: Item / 6803 ortholog / New variable

Controls: Build set, Display set, Variable, Data, Operation, Function, Done, Cancel, HELP, Set operation

Page 68: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Choose the Hihara experiment, which measured expression changes upon shift from low light to high light. If you didn’t know which experiment was appropriate, you could have clicked on Choose data set for a description of the choices.

All items in: All open reading frames of Prochlorococcus MED4, such that:

(6803 ortholog) = Ortholog of (Item in Synechocystis PCC 6803)
Ortholog of (Item in Prochlorococcus MIT9313) doesn’t exist
AND [ data for (6803 ortholog in Microarray:Hihara1(6803))

Data set menu: Microarray:Hihara1(6803) / Microarray:Suzuki1(6803) / Microarray:Yoshimura1(6803) / Microarray:Meeks(Npun) / Microarray:Golden(7120)

Microarray:Hihara1(6803): High light vs low light experiment

Controls: Build set, Display set, Variable, Data, Operation, Function, Done, Cancel, HELP, Set operation

Page 69: Billions and Billions of Bases How does a biologist maintain a grip on reality?

You want the ratio of experimental condition to control to exceed a specified value. Click on >.

All items in: All open reading frames of Prochlorococcus MED4, such that:

(6803 ortholog) = Ortholog of (Item in Synechocystis PCC 6803)
Ortholog of (Item in Prochlorococcus MIT9313) doesn’t exist
AND [ data for (6803 ortholog in Microarray:Hihara1(6803))

Comparison menu: < / < or = / = / > or = / >

Controls: Build set, Display set, Variable, Data, Operation, Function, Done, Cancel, HELP, Set operation

Page 70: Billions and Billions of Bases How does a biologist maintain a grip on reality?

You can type in the value you want. For this simulation a number is supplied. Press the Enter key.

All items in: All open reading frames of Prochlorococcus MED4, such that:

(6803 ortholog) = Ortholog of (Item in Synechocystis PCC 6803)
Ortholog of (Item in Prochlorococcus MIT9313) doesn’t exist
AND [ data for (6803 ortholog in Microarray:Hihara1(6803)) > +2 ]

Controls: Build set, Display set, Variable, Data, Operation, Function, Done, Cancel, HELP, Set operation

Page 71: Billions and Billions of Bases How does a biologist maintain a grip on reality?

No more specifications. Press the Done button.

All items in: All open reading frames of Prochlorococcus MED4, such that:

(6803 ortholog) = Ortholog of (Item in Synechocystis PCC 6803)
Ortholog of (Item in Prochlorococcus MIT9313) doesn’t exist
AND [ data for (6803 ortholog in Microarray:Hihara1(6803)) > +2 ]

Controls: Build set, Display set, Variable, Data, Operation, Function, Done, Cancel, HELP, Set operation

Page 72: Billions and Billions of Bases How does a biologist maintain a grip on reality?

This was a complicated search. If you wanted to do it again, you could save the search description. In this case, just save the results by clicking on Save only results.

All items in: All open reading frames of Prochlorococcus MED4, such that:

(6803 ortholog) = Ortholog of (Item in Synechocystis PCC 6803)
Ortholog of (Item in Prochlorococcus MIT9313) doesn’t exist
AND [ data for (6803 ortholog in Microarray:Hihara1(6803)) > +2 ]

Save menu: Save results and script / Save only results

Controls: Build set, Display set, Variable, Data, Operation, Function, Done, Cancel, HELP, Set operation

Page 73: Billions and Billions of Bases How does a biologist maintain a grip on reality?

All MED4 genes meeting the given specifications will be collected into a set. You can name the set anything you want. For this simulation, a name is provided. Press the Enter key.

All items in: All open reading frames of Prochlorococcus MED4, such that:

(6803 ortholog) = Ortholog of (Item in Synechocystis PCC 6803)
Ortholog of (Item in Prochlorococcus MIT9313) doesn’t exist
AND [ data for (6803 ortholog in Microarray:Hihara1(6803)) > +2 ]

Type name of set: Light-specific genes

Controls: Build set, Display set, Variable, Data, Operation, Function, Done, Cancel, HELP, Set operation

Page 74: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Build set Display set

:all0687 hupL [NiFe] uptake hydrogenase large subunit, C terminus

:all0687 hupL [NiFe] uptake hydrogenase large subunit, N terminus

:all0688 hupS [NiFe] uptake hydrogenase small subunit

:alr0692 similar to nifU

:alr0874 nifH2 dinitrogenase reductase

:asr1309 similar to nifU

:alr1407 nifV1 homocitrate synthase

:asr1408 nifZ iron-sulfur cofactor synthesis

:asr1408 nifT

Set: Light-specific genes

ProcMed4:all0687 hupL [NiFe] uptake hydrogenase large subunit, C terminus

ProcMed4:all0687 hupL [NiFe] uptake hydrogenase large subunit, N terminus

ProcMed4:all0688 hupS [NiFe] uptake hydrogenase small subunit

ProcMed4:alr0692 similar to nifU

ProcMed4:alr0874 psbBX dinitrogenase reductase

ProcMed4:asr1309 similar to nifU

ProcMed4:alr1407 psbY1 homocitrate synthase

ProcMed4:asr1408 psbX iron-sulfur cofactor synthesis

ProcMed4:asr1408 nifT

• The results are displayed as a list of orfs. (Of course, the search capabilities do not now exist, and the results of the described search are unknown.)
• Clicking on the name of any orf brings you to its page (see Scenarios 1 and 2).
• Clicking on circles next to the orf names allows you to modify the set.
• The genetic neighborhood of each orf is shown to the right.

Controls: Done, HELP, Set operation

[WARNING: Fantasy filtration not in effect!]

Page 75: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Prochlorococcus MED4: pll1290

Replicon: Chromosome

Coordinates: 1533026 (stop) <- 1533931 (start-TTG) Human Length = 301 amino acids

Strand: Complementary

Gene name(s): proXM

Function: Putative type II DNA cytosine methyltransferase (CAGCTG-specific) Human Classification: Type II beta (N4) Human

Activity: Protects against: PvuII Experiment In vivo activity: exists Experiment

Cyanobacterial orthologs: none

ProcMED4

Proteus vulgaris

Salmonella paratyphi

Streptomyces spectabilis

Controls: Options, Annotate, Main Menu, History, More, HELP

[WARNING: Fantasy filtration not in effect!]
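
The coordinates and the length shown on the pll1290 page agree with each other, which a little arithmetic can confirm. The sketch below assumes that the coordinates are inclusive and that the span includes the stop codon, which is not counted as an amino acid. Neither assumption is stated on the slide, and the function name `orf_length_aa` is invented for this illustration.

```python
def orf_length_aa(start: int, stop: int) -> int:
    """Amino-acid length of an ORF from inclusive genome coordinates.

    Assumption (not stated on the slide): the span from start to stop
    includes the stop codon, which does not code for an amino acid.
    """
    n_bases = abs(start - stop) + 1  # inclusive coordinates, either strand
    if n_bases % 3 != 0:
        raise ValueError("span is not a whole number of codons")
    n_codons = n_bases // 3
    return n_codons - 1  # drop the stop codon


# Coordinates from the pll1290 page: 1533026 (stop) <- 1533931 (start)
print(orf_length_aa(start=1533931, stop=1533026))
```

Under these assumptions the span is 906 bases, or 302 codons, giving 301 amino acids, the length reported on the page.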

Page 76: Billions and Billions of Bases How does a biologist maintain a grip on reality?

This was a complicated search. If you wanted to do it again, you could save the search description. In this case, just save the results by clicking on Save only results.

All items in: All open reading frames of Prochlorococcus MED4, such that:

(6803 ortholog) = Ortholog of (Item in Synechocystis PCC 6803)
Ortholog of (Item in Prochlorococcus MIT9313) doesn’t exist
AND [ data for (6803 ortholog in Microarray:Hihara1(6803)) > +2 ]

Save menu: Save results and script / Save only results

Controls: Build set, Display set, Variable, Data, Operation, Function, Done, Cancel, HELP, Set operation

Page 77: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Equivalent script that bypasses interface

FOR orf IN (orfs:ProcMED4) {
    6803ortholog = Ortholog(orf,orfs:Syny6803);
    WHEN (NOT Exists(Ortholog(orf,orfs:Proc9313))
          AND Data(6803ortholog,microarray:Hihara1) > +2) {
        COLLECT orf INTO light_specific_genes;
    }
}
DISPLAY (light_specific_genes, “BNC”);

or

MAIL (light_specific_genes,[email protected],“BNC”);

The same search could have been conducted through the script shown above. The script interface makes possible complex

searches beyond the scope of the graphical interface.
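
The logic of that script can be sketched in an ordinary programming language. The Python below is only an illustration: the knowledge base described in these slides does not exist, so the ortholog tables and microarray ratios are passed in as plain dictionaries, and every name here (`light_specific_genes`, the dictionary arguments, the toy ORF labels) is hypothetical. It also assumes the threshold `+2` can be compared directly with the stored ratio.

```python
from typing import Dict, List, Optional


def light_specific_genes(
    med4_orfs: List[str],
    ortholog_in_6803: Dict[str, Optional[str]],
    ortholog_in_9313: Dict[str, Optional[str]],
    hihara_ratio: Dict[str, float],
    threshold: float = 2.0,
) -> List[str]:
    """Collect MED4 ORFs with no MIT9313 ortholog whose 6803 ortholog
    exceeds the threshold in the (hypothetical) Hihara1 data set."""
    selected = []
    for orf in med4_orfs:
        o6803 = ortholog_in_6803.get(orf)
        has_9313 = ortholog_in_9313.get(orf) is not None
        if has_9313 or o6803 is None:
            continue  # fails the first specification, or has no data to test
        ratio = hihara_ratio.get(o6803)
        if ratio is not None and ratio > threshold:
            selected.append(orf)
    return selected


# Toy data, invented purely to exercise the function.
orfs = ["orfA", "orfB", "orfC", "orfD"]
to_6803 = {"orfA": "s1", "orfB": "s2", "orfC": None, "orfD": "s4"}
to_9313 = {"orfA": None, "orfB": "m2", "orfC": None, "orfD": None}
ratios = {"s1": 3.5, "s2": 4.0, "s4": 1.2}
print(light_specific_genes(orfs, to_6803, to_9313, ratios))
```

With the toy data only `orfA` passes both specifications: `orfB` has a MIT9313 ortholog, `orfC` has no 6803 ortholog to look up, and the 6803 ortholog of `orfD` falls below the threshold.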

Page 78: Billions and Billions of Bases How does a biologist maintain a grip on reality?

All items in: All open reading frames of Prochlorococcus MED4, such that:

???

Controls: Build set, Display set, Variable, Data, Operation, Function, Done, Cancel, HELP, Set operation

Page 79: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Cyanobacterial Knowledge Base

Virtual Help Desk

How to search for data?

How to build a new filter?

Page 80: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Cyanobacterial Knowledge Base

Virtual Help Desk

How to...

...I don’t know!

Virtual Help Desk Staff

HELP

Page 81: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Cyanobacterial Knowledge Base

Virtual Help Desk

Upper echelons Staff

You

Virtual Help Desk Staff

HELP

Page 82: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Billions and Billions of Bases

How does a biologist maintain a grip on sanity? reality?

Page 83: Billions and Billions of Bases How does a biologist maintain a grip on reality?

View of the Future Interplay of low- & high-level perception

ProcMED4

Proteus vulgaris

Salmonella paratyphi

Streptomyces spectabilis

Page 84: Billions and Billions of Bases How does a biologist maintain a grip on reality?

View of the Future Interplay of low- & high-level perception

Anab7120

Proteus vulgaris

Salmonella paratyphi

Streptomyces spectabilis

TCTACTTATATTCAATCCACAGGGCTACACCTAGTTCTTGAAGAGTCTGTTGAATGAACACATACATGGTTTATCTGTTTTTCTGTCTGCTCTGACCTCTGGCAGCTT

TAGCCTGCCCCACTCTTAGATAAACGAACCTTAGTGACTTCTGCTATACCAAAGTCTCCACGCCCCTCCGTAAACCTCTAACATGATGTCAGCAAATATTAAAAATGA

97%

TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC TTAGATAAAC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCACGCCC CTCCGTAAAC CTCTAACATG ATGTCAGCAA ATATTAAAAA TGAATAAACT TTGTTAAAGG TACAAATGAA AATTAGCAAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT CATTCTAGGG AAACCTGTAT GGTTACATGA ACTGCCTAAA AAACAAGCTA TTATATATTT TAAGAAATTA ATTGCAATTA ATTTCCTGGG CCCCAGCTGT CATTAAAAAG AGGCAAATAC AGCCAAGGAC GACAGCACTG ACCCTCAAGA AGGCACCGGC TGACAGACAG GCTGAAATTC CGCTGAGAGC AGAGTGGTAC ATTGAACCCT CCCTGCACCA GGTCTTTCCT GTGGGCACTG AGTGCAGACA ATGAATGACT GAACGAACGA TTGAATGAAA AGAAATGAGA

Page 85: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Anabaena Chromosome (6413771 bp): 4001 to 5000

cgcccaacaataacaaatgtgtaatctagaccttctgccttgagttccttggcgcggttttcggcacgacggatgacgttggtattgtaaccgccgcacaaaccacgatcgccagaaataactagcaagcctactgatttaacttcccgttttttcagtagaggtaagtctacatcttcaaaccgtagacgagtttgcaaaccgtataatacttgtgccaaacggtcagcaaaaggacgagtagcgattacttgttcttgggcgcgacgtacacgcgccgccgctaccagccgcatggcttctgtgattttcttggtgtttttgaccgactgaatgcgatcgcgtattgatttgagattaggcataatatttgttgattgtcagttgtcagttgtcagttgtcagttgtcagtgtctattgctactgaccactgaccaatgactaatgactaattacgctgtagctttgaaggtctttttgtagtcttctaaagctgccttcaatgctttttcttcatcatcacccagtgctttcttcgattgtacgtcttggaagtaggggttaacgccggacttcaagtaatctctcaagcctttggtgaaggtggtgactttatcaacagggatatcatctaagtaaccgttgatacctgcgtacagaatggctacttgttcagctacggatagaggctgattttgggactgtttgaggagttcccgcaggcgttgacctcttgccaattggtcttgggtggctttatctaggtcggaagcaaattgcgcgaaggcttggaggtcgtcaaactgtgctagttcgagcttaatcttaccagcaacttttttcatcgctttggtttgtgccgcagaacccacacgggatacagagataccagggtttacagccggacgaataccagcgttaaataagtcagaagataagaatatctgaccgtctgtaatagaaattacgttggtaggaatgtaggcagaaacgtcacca

Typical output of current programs

Page 86: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Future: Sequence plus genetic context

Noncoding region

Page 87: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Future: Both filtered and raw data

Page 88: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Future: Both filtered and raw data

Page 89: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Filters: Information reducers

Build filter to find repeated sequences

TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC TTAGATAAAC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCACGCCC CTCCGTAAAC CTCTAACATG ATGTCAGCAA ATATTAAAAA TGAATAAACT TTGTTAAAGG TACAAATGAA AATTAGCAAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT CATTCTAGGG AAACCTGTAT GGTTACATGA ACTGCCTAAA AAACAAGCTA TTATATATTT TAAGAAATTA ATTGCAATTA ATTTCCTGGG CCCCAGCTGT CATTAAAAAG AGGCAAATAC AGCCAAGGAC GACAGCACTG ACCCTCAAGA AGGCACCGGC TGACAGACAG GCTGAAATTC CGCTGAGAGC AGAGTGGTAC ATTGAACCCT CCCTGCACCA GGTCTTTCCT GTGGGCACTG AGTGCAGACA ATGAATGACT GAACGAACGA TTGAATGAAA AGAAATGAGA TATGAGGCAA TCACAGCATC AGGTGACCTT AGTATCTATT CTCGGGAGCG CACGGCTCTA AAGAGGCCCA TATCCAGGCA CCTTTAGATG CAAGAAGGAG GAAACAGCTC GAAATCCCTG AGGCCGGAGG GTCAAGAACT CTCCACCGGC GGCAGCGGCC CCCCGGCCTA AGGCTGCCTG TGCTATAAAT ACGCGGCCCA TTCCCTGGGC TCGGCGGGAC AGATAACATG AATGTGCCCT

TGGTCTCCGACCGACCGTAGGTCATCG
TGGTCTCCGACCGACCGTAGGTCATCG

CTTGTACTGAGCGAAGTCGAAGTA
CTTGTACTGAGCGAAGTCGAAGTA
CTTGTACTGAGCGTAGCCGAAGTA
GTTCGACTGAGCGTAGTCGAAGTC

...

Entire genome -> Repeat filter -> Repeated sequences
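
The slides do not say how the repeat filter works internally. One simple way to reduce a genome to its repeated sequences is to index every substring of a fixed length k and keep those that occur more than once. The sketch below does exactly that for exact repeats only; the function name `find_exact_repeats`, the parameters `k` and `min_copies`, and the short test sequence are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def find_exact_repeats(genome: str, k: int, min_copies: int = 2) -> Dict[str, List[int]]:
    """Map each length-k substring occurring at least min_copies times
    to the 0-based positions where it starts."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) >= min_copies}


# A short invented sequence in which ACGT occurs twice.
print(find_exact_repeats("AAACGTTTTACGTCC", k=4))
```

A practical filter would also need to merge overlapping k-mers into longer repeats and tolerate mismatches, which this sketch does not attempt.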

Page 90: Billions and Billions of Bases How does a biologist maintain a grip on reality?
Page 91: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Filters: Information reducers

Build repeats filter

TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC TTAGATAAAC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCACGCCC CTCCGTAAAC CTCTAACATG ATGTCAGCAA ATATTAAAAA TGAATAAACT TTGTTAAAGG TACAAATGAA AATTAGCAAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT CATTCTAGGG AAACCTGTAT GGTTACATGA ACTGCCTAAA AAACAAGCTA TTATATATTT TAAGAAATTA ATTGCAATTA ATTTCCTGGG CCCCAGCTGT CATTAAAAAG AGGCAAATAC AGCCAAGGAC GACAGCACTG ACCCTCAAGA AGGCACCGGC TGACAGACAG GCTGAAATTC CGCTGAGAGC AGAGTGGTAC ATTGAACCCT CCCTGCACCA GGTCTTTCCT GTGGGCACTG AGTGCAGACA ATGAATGACT GAACGAACGA TTGAATGAAA AGAAATGAGA TATGAGGCAA TCACAGCATC AGGTGACCTT AGTATCTATT CTCGGGAGCG CACGGCTCTA AAGAGGCCCA TATCCAGGCA CCTTTAGATG CAAGAAGGAG GAAACAGCTC GAAATCCCTG AGGCCGGAGG GTCAAGAACT CTCCACCGGC GGCAGCGGCC CCCCGGCCTA AGGCTGCCTG TGCTATAAAT ACGCGGCCCA TTCCCTGGGC TCGGCGGGAC AGATAACATG AATGTGCCCT

TGGTCTCCGACCGACCGTAGGTCATCG
TGGTCTCCGACCGACCGTAGGTCATCG

CTTGTACTGAGCGAAGTCGAAGTA
CTTGTACTGAGCGAAGTCGAAGTA
CTTGTACTGAGCGTAGCCGAAGTA
GTTCGACTGAGCGTAGTCGAAGTC

...

Entire genome -> Repeat filter -> Repeated sequences

NIS-1: repeat family

Page 92: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Alignment of NIS-1

(…271 more)

Page 93: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Filters: Information reducers

Build secondary repeats filter

A: CTTGTACTGAGCGAAGTCGAAGTA
B: CTTGTACTGAGCGTAGCCGAAGTA

Distance = 2

Subfamily A (copy number = 10): CTTGTACTGAGCGAAGTCGAAGTA
Subfamily B (copy number = 2): CTTGTACTGAGCGTAGCCGAAGTA
Subfamily C (copy number = 1): GTTCGACTGAGCGTAGTCGAAGTC
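
Grouping the members of a repeat family into subfamilies of identical sequence, each with a copy number, amounts to counting exact duplicates. A minimal sketch, assuming the repeat instances have already been extracted as strings (the function name `subfamily_copy_numbers` is invented here):

```python
from collections import Counter
from typing import Dict, Iterable


def subfamily_copy_numbers(instances: Iterable[str]) -> Dict[str, int]:
    """Count how many times each exact sequence occurs among the instances."""
    return dict(Counter(instances))


# The three subfamily sequences and copy numbers shown on the slide.
A = "CTTGTACTGAGCGAAGTCGAAGTA"
B = "CTTGTACTGAGCGTAGCCGAAGTA"
C = "GTTCGACTGAGCGTAGTCGAAGTC"
instances = [A] * 10 + [B] * 2 + [C]

for seq, copies in subfamily_copy_numbers(instances).items():
    print(seq, "copy number =", copies)
```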

Page 94: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Filters: Information reducers

Build secondary repeats filter

A: CTTGTACTGAGCGAAGTCGAAGTA
C: GTTCGACTGAGCGTAGTCGAAGTC

Distance (A, C) = 5

Distance (A, B) = 2

Subfamily A (copy number = 10): CTTGTACTGAGCGAAGTCGAAGTA
Subfamily B (copy number = 2): CTTGTACTGAGCGTAGCCGAAGTA
Subfamily C (copy number = 1): GTTCGACTGAGCGTAGTCGAAGTC

Page 95: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Filters: Information reducers

Build secondary repeats filter

B: CTTGTACTGAGCGTAGCCGAAGTA
C: GTTCGACTGAGCGTAGTCGAAGTC

Distance (B, C) = 5

Do for all pairs of subfamilies

Distance (A, B) = 2
Distance (A, C) = 5

Subfamily A (copy number = 10): CTTGTACTGAGCGAAGTCGAAGTA
Subfamily B (copy number = 2): CTTGTACTGAGCGTAGCCGAAGTA
Subfamily C (copy number = 1): GTTCGACTGAGCGTAGTCGAAGTC
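
The distances on these slides are consistent with a plain count of mismatched positions between two equal-length sequences (the Hamming distance). The sketch below computes it for every pair of subfamilies; the function names `hamming` and `pairwise_distances` are invented for illustration, and nothing in the slides says the eventual tool would be implemented this way.

```python
from itertools import combinations
from typing import Dict, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(subfamilies: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Hamming distance for every unordered pair of named subfamilies."""
    return {
        (name1, name2): hamming(seq1, seq2)
        for (name1, seq1), (name2, seq2) in combinations(subfamilies.items(), 2)
    }


subfamilies = {
    "A": "CTTGTACTGAGCGAAGTCGAAGTA",
    "B": "CTTGTACTGAGCGTAGCCGAAGTA",
    "C": "GTTCGACTGAGCGTAGTCGAAGTC",
}
for pair, distance in pairwise_distances(subfamilies).items():
    print(pair, "distance =", distance)
```

For the three sequences shown, this gives 2 for A and B, and 5 for both A and C and B and C, matching the values on the slides.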

Page 96: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Diameter: Copies of exact repeats

Distance: Number of mismatches

Relationship between related repeats in genome (sequences within NIS-1 repeat family)

Page 97: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Crisis in Bioinformatics

1. Need high-level filters
2. Need access to raw phenomena

Integrated knowledge base

Page 98: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Crisis in Bioinformatics

1. Need high-level filters
2. Need access to raw phenomena

3. Need new tools for new phenomena
4. Need intuitive representation of results

Integrated knowledge base

Tools that bridge levels of perception

Page 99: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Crisis in Bioinformatics

1. Need high-level filters
2. Need access to raw phenomena

Integrated knowledge base

3. Need new tools for new phenomena
4. Need intuitive representation of results

Tools that bridge levels of perception

5. Need ability to build new tools

Short term: Graphical programming, human help

Long term: Need a new generation

Page 100: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Billions and Billions of Bases

How does a biologist maintain a grip on reality?

Filtering reality Raw reality

Real questions with real answers

Page 101: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Pre-genomic Molecular Biology

Page 102: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Pre-genomic Molecular Biology

Page 103: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Pre-genomic Molecular Biology

Page 104: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Pre-genomic Molecular Biology

Page 105: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Pre-genomic Molecular Biology

Page 106: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Pre-genomic Molecular Biology

Page 107: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Pre-genomic Molecular Biology

How do we figure out how cars are made?

Genetic approach Biochemical approach

Page 108: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Pre-genomic Molecular Biology

Geneticist’s Approach

Page 109: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Pre-genomic Molecular Biology

Geneticist’s Approach

Page 110: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Isolation of Defective Gene

Pre-genomic Molecular Biology

Geneticist’s Approach

Page 111: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Pre-genomic Molecular Biology

How do we figure out how cars are made?

Genetic approach Biochemical approach

Page 112: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Pre-genomic Molecular Biology

Biochemist’s Approach

Page 113: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Pre-genomic Molecular Biology

Biochemist’s Approach

Page 114: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Pre-genomic Molecular Biology

Biochemist’s Approach

Page 115: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Pre-genomic Molecular Biology

Biochemist’s Approach

Page 116: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Pre-genomic Molecular Biology

How do we figure out how cars are made?

Genetic approach Biochemical approach

Page 117: Billions and Billions of Bases How does a biologist maintain a grip on reality?

• One component at a time

• Highly filtered perception

• Many local viewpoints

Pre-genomic Molecular Biology

How we viewed the world

Page 118: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Post-genomic Molecular Biology

Page 119: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Post-genomic Molecular Biology

Bioinformaticist’s Approach (long term)

Assemble the whole

Page 120: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Post-genomic Molecular Biology

Bioinformaticist’s Approach (short term)

Identify critical parts

Page 121: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Globin

Current Biology

Page 122: Billions and Billions of Bases How does a biologist maintain a grip on reality?

AATAAAGCTTTACAAACCAAACTCTGGCTTCAATTGTGTAACCCAAGCTTTGATTCTTTCCTCTGTTAAATCGGATTGATTATCTTCATCAAGGGCAAGACCTACAAATTTACCATCACGAACAGCTTTAGACTCACTGAATTCATAACCTTCTGTAGGCCAATAGCCAACTGTTTCACCACCATTTTCTGAAATTTTTTCCTCTAGAATACCGAGGGCATCTTGAAATGTATCAGGATAACCAACCTGGTCTCCAGGAGCAAAATAAGCAACTTTTTTGCCGATGAAGTCAATGTTATCTAACTCATCATAAAAATTTTCCCAATCACTTTGCAATTCTCCAACATTCCAGGTAGGACAACCAACAACGATATAATCGTAGTTATTGAAATCACTTGGTTCAGCTTGTGAAATATCATATAAAGTTACAACACTATCACCACCAAACTCCTTCTGAATTATTTCTGATTCAGTTTGGGTATTGCCTGTTTGAGTACCAAAAAATAAACCAATATTAGACATTTTTACTCCTTTTATGTATTTGCAAAATTATTTCAATTAAAATATTTAGTAATAATTAATTGTTAGCTAGCTAATAATTAAATTTTTATTACAATCATTGTAAAAGGCATTGAAAAAGTAAATAAAAATTTTTATTCTACGTTATTTCAAAAATATTTACTTACATATACTTAACCTTTATAGTGATGTAATATACTCTAATTCCTATTTTACTTATAAATACCATCTCAGCTTAATGTAACGAATTTTTCTGTTTATCTTTAAATACAAAAAATTCAACAAAACTACAGAAAATTAATCTTAATAACACAAAACAAGTATCAATCTGTAATACAACTAAGCTTAAATAAATTAATAGAAAGCTTCATCTATCTAATAGGTTGAGAATAGTTTATGTCTAATGACATAAATTCATTCGTGTTGATTTCATTTGGGTATATTCATCTGATTTAGGATTTACTCCATTAAGTTTGTACTCATCAATGCCCGCCTGTTGGTATCCACAATTCTCATACAGTGCGCGAGCAAAGTAATCAATCGTTCGTCGCCATATCTAACTTTGAGTCAAACAAACCAGTTGGATTACCAACCCTCAACTAATCGCTTCTTTAAGGCGAGCGATCGCACATTTAACTGTTGGTTGTCACAAGAGAACTAATACTACAGCAGTATATTTAACAACTAAGGGTGGTTCAACTTTCGCTGCGACTCCTCCAACGCGCTGAAATACACAGGACTGATGCGATCGCAAACTCTTTGACTAAATTCCATACATTATCATGACCATCTCCCAAACAAACAAGTGGGTTAACCAGATGCTGACTATTAACATCCCCTGAGTTCGGAGTTGTAGGTCTATTTGACTGGTTCAAAGCGATGATGGAACGGCTTTGTTGCATGAATTAAAAAAAGACACACCATCACCTACTTCTAGGATAGACACATCAAACGTCCCACCGCCTAAGTCAAATACCAAGATAATTTCGTTAGTTTTCTTGTCAAGTCCGTAAGCGAGGGCCGCCGCCGTGGGCTAGTTGATAATTCGCAGAACTTTAATCCCGGCAATTCTACTGGCATCTTTGGTAGCCTGCCGTTGAGAGTCATTGAAATAGGCAGGGGTGGTAATTACCGCTTGCCTCACTGGTTCCCCCAGATATGTGCTGGCATCATCTATCAGCTTGCGGACTACCTCATACCATTTCACGAAAAACCTGATACACATGTAAACTCTGAAACCCTTGCTGTATCAAAGTTTTGTAATTACGAATTACGAATTACGAATTGATATCAGCCGAGATTTCTTCGGGTGAAAATTCCTTGTTCAGAGCGGGACAGTGTAGCTTGACATTGCCATTACTGTCACGTACCACTTTGTAAGTAACTTGTTTTGCCTCTTGCGTAACTTCATCATACCTGCGCCCGATGAACCGCTTCACAGAATAAAAAGTGTTTTCTGGGTTCATTACACCCTGGCGCTT

Future Biology

Page 123: Billions and Billions of Bases How does a biologist maintain a grip on reality?

AATAAAGCTTTACAAACCAAACTCTGGCTTCAATTGTGTAACCCAAGCTTTGATTCTTTCCTCTGTTAAATCGGATTGATTATCTTCATCAAGGGCAAGACCTACAAATTTACCATCACGAACAGCTTTAGACTCACTGAATTCATAACCTTCTGTAGGCCAATAGCCAACTGTTTCACCACCATTTTCTGAAATTTTTTCCTCTAGAATACCGAGGGCATCTTGAAATGTATCAGGATAACCAACCTGGTCTCCAGGAGCAAAATAAGCAACTTTTTTGCCGATGAAGTCAATGTTATCTAACTCATCATAAAAATTTTCCCAATCACTTTGCAATTCTCCAACATTCCAGGTAGGACAACCAACAACGATATAATCGTAGTTATTGAAATCACTTGGTTCAGCTTGTGAAATATCATATAAAGTTACAACACTATCACCACCAAACTCCTTCTGAATTATTTCTGATTCAGTTTGGGTATTGCCTGTTTGAGTACCAAAAAATAAACCAATATTAGACATTTTTACTCCTTTTATGTATTTGCAAAATTATTTCAATTAAAATATTTAGTAATAATTAATTGTTAGCTAGCTAATAATTAAATTTTTATTACAATCATTGTAAAAGGCATTGAAAAAGTAAATAAAAATTTTTATTCTACGTTATTTCAAAAATATTTACTTACATATACTTAACCTTTATAGTGATGTAATATACTCTAATTCCTATTTTACTTATAAATACCATCTCAGCTTAATGTAACGAATTTTTCTGTTTATCTTTAAATACAAAAAATTCAACAAAACTACAGAAAATTAATCTTAATAACACAAAACAAGTATCAATCTGTAATACAACTAAGCTTAAATAAATTAATAGAAAGCTTCATCTATCTAATAGGTTGAGAATAGTTTATGTCTAATGACATAAATTCATTCGTGTTGATTTCATTTGGGTATATTCATCTGATTTAGGATTTACTCCATTAAGTTTGTACTCATCAATGCCCGCCTGTTGGTATCCACAATTCTCATACAGTGCGCGAGCAAAGTAATCAATCGTTCGTCGCCATATCTAACTTTGAGTCAAACAAACCAGTTGGATTACCAACCCTCAACTAATCGCTTCTTTAAGGCGAGCGATCGCACATTTAACTGTTGGTTGTCACAAGAGAACTAATACTACAGCAGTATATTTAACAACTAAGGGTGGTTCAACTTTCGCTGCGACTCCTCCAACGCGCTGAAATACACAGGACTGATGCGATCGCAAACTCTTTGACTAAATTCCATACATTATCATGACCATCTCCCAAACAAACAAGTGGGTTAACCAGATGCTGACTATTAACATCCCCTGAGTTCGGAGTTGTAGGTCTATTTGACTGGTTCAAAGCGATGATGGAACGGCTTTGTTGCATGAATTAAAAAAAGACACACCATCACCTACTTCTAGGATAGACACATCAAACGTCCCACCGCCTAAGTCAAATACCAAGATAATTTCGTTAGTTTTCTTGTCAAGTCCGTAAGCGAGGGCCGCCGCCGTGGGCTAGTTGATAATTCGCAGAACTTTAATCCCGGCAATTCTACTGGCATCTTTGGTAGCCTGCCGTTGAGAGTCATTGAAATAGGCAGGGGTGGTAATTACCGCTTGCCTCACTGGTTCCCCCAGATATGTGCTGGCATCATCTATCAGCTTGCGGACTACCTCATACCATTTCACGAAAAACCTGATACACATGTAAACTCTGAAACCCTTGCTGTATCAAAGTTTTGTAATTACGAATTACGAATTACGAATTGATATCAGCCGAGATTTCTTCGGGTGAAAATTCCTTGTTCAGAGCGGGACAGTGTAGCTTGACATTGCCATTACTGTCACGTACCACTTTGTAAGTAACTTGTTTTGCCTCTTGCGTAACTTCATCATACCTGCGCCCGATGAACCGCTTCACAGAATAAAAAGTGTTTTCTGGGTTCATTACACCCTGGCGCTT

Future Biology

Page 124: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Globin

TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC TTAGATAAAC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCACGCCC CTCCGTAAAC CTCTAACATG ATGTCAGCAA ATATTAAAAA TGAATAAACT TTGTTAAAGG TACAAATGAA AATTAGCAAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT CATTCTAGGG AAACCTGTAT GGTTACATGA ACTGCCTAAA AAACAAGCTA TTATATATTT TAAGAAATTA ATTGCAATTA ATTTCCTGGG CCCCAGCTGT CATTAAAAAG AGGCAAATAC AGCCAAGGAC GACAGCACTG ACCCTCAAGA AGGCACCGGC TGACAGACAG GCTGAAATTC CGCTGAGAGC AGAGTGGTAC ATTGAACCCT CCCTGCACCA GGTCTTTCCT GTGGGCACTG AGTGCAGACA ATGAATGACT GAACGAACGA TTGAATGAAA AGAAATGAGA TATGAGGCAA TCACAGCATC AGGTGACCTT AGTATCTATT CTCGGGAGCG CACGGCTCTA AAGAGGCCCA TATCCAGGCA CCTTTAGATG CAAGAAGGAG GAAACAGCTC GAAATCCCTG AGGCCGGAGG GTCAAGAACT CTCCACCGGC GGCAGCGGCC CCCCGGCCTA AGGCTGCCTG TGCTATAAAT ACGCGGCCCA TTCCCTGGGC TCGGCGGGAC AGATAACATG AATGTGCCCT

Current Biology

Current Life

Page 125: Billions and Billions of Bases How does a biologist maintain a grip on reality?

“Axis of Evil...”

Current Life

Page 126: Billions and Billions of Bases How does a biologist maintain a grip on reality?

“No war for oil...”

Globin

Current Life

Page 127: Billions and Billions of Bases How does a biologist maintain a grip on reality?

“No war for oil...”

Globin

Current Life

Page 128: Billions and Billions of Bases How does a biologist maintain a grip on reality?
Page 129: Billions and Billions of Bases How does a biologist maintain a grip on reality?
Page 130: Billions and Billions of Bases How does a biologist maintain a grip on reality?

Contact Information

Jeff Elhai
Department of Biology
Virginia Commonwealth University
Richmond, VA

E-Mail: [email protected]
804-828-0794
Web: www.people.vcu.edu/~elhaij/