Biomedical Informatics Michael D. Kane, Ph.D.

Post on 21-Dec-2015


Page 1: Biomedical Informatics Michael D. Kane, Ph.D.. The Cell is a Living Machine

Biomedical Informatics

Michael D. Kane, Ph.D.


The Cell is a Living Machine


DNA is Information Storage


“Zipped Files”

Decompression

“Executable Files”


DNA is Double Stranded – One strand is the “coding strand” and the other strand is there to stabilize the DNA sequence when not in use. Double-stranded DNA is very durable in our environment.


CAGGACCATGGAACTCAGCGTCCTCCTCTTCCTTGCACTCCTCACAGGACTCTTGCTACTCCTGGTTCAGCGCCACCCTAACACCCATGACCGCCTCCCACCAGGGCCCCGCCCTCTGCCCCTTTTGGGAAACCTTCTGCAGATGGATAGAAGAGGCCTACTCAAATCCTTTCTGAGGTTCCGAGAGAAATATGGGGACGTCTTCACGGTACACCTGGGACCGAGGCCCGTGGTCATGCTGTGTGGAGTAGAGGCCATACGGGAGGCCCTTGTGGACAAGGCTGAGGCCTTCTCTGGCCGGGGAAAAATCGCCATGGTCGACCCATTCTTCCGGGGATATGGTGTGATCTTTGCCAATGGAAACCGCTGGAAGGTGCTTCGGCGATTCTCTGTGACCACTATGAGGGACTTCGGGATGGGAAAGCGGAGTGTGGAGGAGCGGATTCAGGAGGAGGCTCAGTGTCTGATAGAGGAGCTTCGGAAATCCAAGGGGGCCCTCATGGACCCCACCTTCCTCTTCCAGTCCATTACCGCCAACATCATCTGCTCCATCGTCTTTGGAAAACGATTCCACTACCAAGATCAAGAGTTCCTGAAGATGCTGAACTTGTTCTACCAGACTTTTTCACTCATCAGCTCTGTATTCGGCCAGCTGTTTGAGCTCTTCTCTGGCTTCTTGAAATACTTTCCTGGGGCACACAGGCAAGTTTACAAAAACCTGCAGGAAATCAATGCTTACATTGGCCACAGTGTGGAGAAGCACCGTGAAACCCTGGACCCCAGCGCCCCCAAGGACCTCATCGACACCTACCTGCTCCACATGGAAAAAGAGAAATCCAACGCACACAGTGAATTCAGCCACCAGAACCTCAACCTCAACACGCTCTCGCTCTTCTTTGCTGGCACTGAGACCACCAGCACCACTCTCCGCTACGGCTTCCTGCTCATGCTCAAATACCCTCATGTTGCAGAGAGAGTCTACAGGGAGATTGAACAGGTGATTGGCCCACATCGCCCTCCAGAGCTTCATGACCGAGCCAAAATGCCATACACAGAGGCAGTCATCTATGAGATTCAGAGATTTTCCGACCTTCTCCCCATGGGTGTGCCCCACATTGTCACCCAACACACCAGCTTCCGAGGGTACATCATCCCCAAGGACACAGAAGTATTTCTCATCCTGAGCACTGCTCTCCATGACCCACACTA


THEREDCAT_HSDKLSD_WASNOTHOTBUT_WKKNASDNKSAOJ.ASDNALKS_WASWET_ASDFLKSDOFIJEIJKNAWDFN_ANDMAD_WERN.JSNDFJN_YETSAD_MNSFDGPOIJD_BUTTHEFOX_SDKMFIDSJIR.JER_GOTWET_JSN.DFOIAMNJNER_ANDATEHIM.


Start with a thin 2 x 4 lego block…

Add a 2 x 2 lego block…

Add a 2 x 3 lego block…

Add a 2 x 4 lego block…


What are the comparative genome sizes of humans and other organisms being studied?

organism                              estimated size       estimated gene number   average gene density        chromosome number

Homo sapiens (human)                  2900 million bases   ~30,000                 1 gene per 100,000 bases    46
Rattus norvegicus (rat)               2750 million bases   ~30,000                 1 gene per 100,000 bases    42
Mus musculus (mouse)                  2500 million bases   ~30,000                 1 gene per 100,000 bases    40
Drosophila melanogaster (fruit fly)   180 million bases    13,600                  1 gene per 9,000 bases      8
Arabidopsis thaliana (plant)          125 million bases    25,500                  1 gene per 4,000 bases      5
Caenorhabditis elegans (roundworm)    97 million bases     19,100                  1 gene per 5,000 bases      6
Saccharomyces cerevisiae (yeast)      12 million bases     6,300                   1 gene per 2,000 bases      16
Escherichia coli (bacteria)           4.7 million bases    3,200                   1 gene per 1,400 bases      1
H. influenzae (bacteria)              1.8 million bases    1,700                   1 gene per 1,000 bases      1

Genome size does not correlate with evolutionary status, nor is the number of genes proportionate with genome size.
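
As a rough check on the table above, the sketch below divides each genome size by its estimated gene number. The figures are copied from the table; the names GENOMES and bases_per_gene are illustrative only. Simple division does not reproduce every rounded density in the table (the fruit fly comes out near 13,000 bases per gene rather than 9,000), so the slide's densities were presumably derived on a somewhat different basis.

```python
# (organism, genome size in bases, estimated gene number), copied from the table above.
GENOMES = [
    ("Homo sapiens", 2_900_000_000, 30_000),
    ("Rattus norvegicus", 2_750_000_000, 30_000),
    ("Mus musculus", 2_500_000_000, 30_000),
    ("Drosophila melanogaster", 180_000_000, 13_600),
    ("Arabidopsis thaliana", 125_000_000, 25_500),
    ("Caenorhabditis elegans", 97_000_000, 19_100),
    ("Saccharomyces cerevisiae", 12_000_000, 6_300),
    ("Escherichia coli", 4_700_000, 3_200),
    ("H. influenzae", 1_800_000, 1_700),
]


def bases_per_gene(genome_size, gene_count):
    """Average number of bases per gene (hypothetical helper, plain division)."""
    return genome_size / gene_count


for name, size, genes in GENOMES:
    print(f"{name:26s} {size:>13,} bases  {genes:>6,} genes  "
          f"{bases_per_gene(size, genes):>9,.0f} bases per gene")
```

The human genome is more than 1,000 times larger than that of H. influenzae but carries fewer than 20 times as many genes, which is the point the slide makes about gene number not being proportionate to genome size.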


>gi|1924939|emb|X98411.1|HSMYOSIE Homo sapiens partial mRNA for myosin-IF
CAGGAGAAGCTGACCAGCCGCAAGATGGACAGCCGCTGGGGCGGGCGCAGCGAGTCCATCAATGTGACCC
TCAACGTGGAGCAGGCAGCCTACACCCGTGATGCCCTGGCCAAGGGGCTCTATGCCCGCCTCTTCGACTT
CCTCGTGGAGGCCATCAACCGTGCTATGCAGAAACCCCAGGAAGAGTACAGCATCGGTGTGCTGGACATT
TACGGCTTCGAGATCTTCCAGAAAAATGGCTTCGAGCAGTTTTGCATCAACTTCGTCAATGAGAAGCTGC
AGCAAATCTTTATCGAACTTACCCTGAAGGCCGAGCAGGAGGAGTATGTGCAGGAAGGCATCCGCTGGAC
TCCAATCCAGTACTTCAACAACAAGGTCGTCTGTGACCTCATCGAAAACAAGCTGAGCCCCCCAGGCATC
ATGAGCGTCTTGGACGACGTGTGCGCCACCATGCACGCCACGGGCGGGGGAGCAGACCAGACACTGCTGC
AGAAGCTGCAGGCGGCTGTGGGGACCCACGAGCATTTCAACAGCTGGAGCGCCGGCTTCGTCATCCACCA
CTACGCTGGCAAGGTCTCCTACGACGTCAGCGGCTTCTGCGAGAGGAACCGAGACGTTCTCTTCTCCGAC
CTCATAGAGCTGATGCAGTCCAGTGACCAGGCCTTCCTCCGGATGCTCTTCCCCGAGAAGCTGGATGGAG
ACAAGAAGGGGCGCCCCAGCACCGCCGGCTCCAAGATCAAGAAACAAGCCAACGACCTGGTGGCCACACT
GATGAGGTGCACACCCCACTACATCCGCTGCATCAAACCCAACGAGACCAAGCACGCCCGAGACTGGGAG
GAGAACAGAGTCCAGCACCAGGTGGAATACCTGGGCCTGAAGGAAAACATCAGGGTGCGCAGAGCCGGCT
TCGCCTACCGCCGCCAGTTCGCCAAATTCCTGCAGAGGTATGCCATTCTGACCCCCGAGACGTGGCCGCG
GTGGCGTGGGGACGAACGCCAGGGCGTCCAGCACCTGCTTCGGGCGGTCAACATGGAGCCCGACCAGTAC
CAGATGGGGAGCACCAAGGTCTTTGTCAAGAACCCAGAGTCGCTTTTCCTCCTGGAGGAGGTGCGAGAGC
GAAAGTTCGATGGCTTTGCCCGAACCATCCAGAAGGCCTGGCGGCGCCACGTGGCTGTCCGGAAGTACGA
GGAGATGCGGGAGGAAGCTTCCAACATCCTGCTGAACAAGAAGGAGCGGAGGCGCAACAGCATCAATCGG
AACTTCGTCGGGGACTACCTGGGGCTGGAGGAGCGGCCCGAGCTGCGTCAGTTCCTGGGCAAGAAGGAGC
GGGTGGACTTCGCCGATTCGGTCACCAAGTACGACCGCCGCTTCAAGCCCATCAAGCGGGACTTGATCCT
GACGCCCAAGTGTGTGTATGTGATTGGGCGAGAGAAGATGAAGAAGGGACCTGAGAAAGGTCCAGTGTGT
GAAATCTTGAAGAAGAAATTGGACATCCAGGCTCTGCGGGGGGTCTCCCTCAGCACGCGACAGGACGACT
TCTTCATCCTCCAAGAGGATGCCGCCGACAGCTTCCTGGAGAGCGTCTTCAAGACCGAGTTTGTCAGCCT
TCTGTGCAAGCGCTTCGAGGAGGCGACGCGGAGGCCCCTGCCCCTCACCTTCAGCGACACACTACAGTTT
CGGGTGAAGAAGGAGGGCTGGGGCGGTGGCGGCACCCGCAGCGTCACCTTCTCCCGCGGCTTCGGCGACT
TGGCAGTGCTCAAGGTTGGCGGTCGGACCCTCACGGTCAGCGTGGGCGATGGGCTGCCCAAGAACTCCAA
GCCTACCGGAAAGGGATTGGCCAAGGGTAAACCTCGGAGGTCGTCCCAAGCCCCTACCCGGGCGGCCCCT
GGCGCCCCCCAAGGCATGGATCGAAATGGGGCCCCCCTCTGCCCACAGGGGGGGGCCCCCTGCCCCCTGG
AGAAATTCATTTGGCCCAGGGGGCACCCACAGGCCTCCCCGGCCCTCCGTCCACATCCCTGGGATGCCAG
CAGACGACCCCGGGCACGTCCGCCCTCAGAGCACAACACAGAATTCCTCAACGTGCCTGACCAGGGGATG
GCCGGCATGCAGAGGAAGCGCAGCGTGGGGCAACGGCCAGTGCCTGTGGGCCGACCCAAGCCCCAGCCTC
GGACACATGGTCCCAGGTGCCGGGCCCTATACCAGTACGTGGGCCAAGATGTGGACGAGCTGAGCTTCAA
CGTGAACGAGGTCATTGAGATCCTCATGGAAGATCCCTCGGGCTGGTGGAAGGGCCGGCTTCACGGCCAG
GAGGGCCTTTTCCCAGGAAACTACGTGGAGAAGATCTGAGCTGGGCCCTGGGATACTGCCTTCTCTTTCG
CCCGCCTATCTGCCTGCCGGCCTGGTGGGGAGCCAGGCCCTGCCAATGAAAGCCTCGTTTACCTGGGCTG
CAATAGCCTAAAAGTCCAATCCTTTGGCCTCCAGTCCTTGCCCAGGCCCTGGGTCACCAGGTCACTGGTG
CAGCCCCCGCCCCTGGGCCCTGGTTTTCCTCCAACATCACACCTGCTGCCCATTGTCCAAAACTGTGTGT
GTCAAAGGGGACTAACAGCAGAATTTACCTCCCAACTGCCATGTGATTAAGAAATGGGTCTTGAGTCCTG
TGCTGTTGGCAAAGTTCCAGGCACAGTTGGGGAGGGGGGGCCGGAATCCGC

FASTA File Format
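
A FASTA record such as the one above is a ">" header line followed by the sequence, usually wrapped over several lines. The minimal sketch below, which is not any particular library's parser, splits FASTA text into header and sequence pairs. The function name parse_fasta is illustrative, and the sample holds only the first two sequence lines of the record. Splitting the header on "|" to find the accession assumes the older NCBI "gi|...|emb|..." layout seen in this header.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []  # drop the leading ">"
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without separators
    if header is not None:
        yield header, "".join(chunks)


# Sample: header plus the first two sequence lines of the record shown above.
example = """>gi|1924939|emb|X98411.1|HSMYOSIE Homo sapiens partial mRNA for myosin-IF
CAGGAGAAGCTGACCAGCCGCAAGATGGACAGCCGCTGGGGCGGGCGCAGCGAGTCCATCAATGTGACCC
TCAACGTGGAGCAGGCAGCCTACACCCGTGATGCCCTGGCCAAGGGGCTCTATGCCCGCCTCTTCGACTT
"""

records = list(parse_fasta(example))
header, sequence = records[0]
accession = header.split("|")[3]  # assumes the "gi|id|db|accession|..." layout
print(accession, len(sequence))
```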



FASTA File Format…(note: U = T)

>gi|1234|my name from genetic code in DNA
ATGATTTGTCACGCTGAGCTC-AAAGCTAACGAGTAA

>gi|1234|my name translated into protein
MICHAEL-KANE*

A  alanine                    P  proline
B  aspartate or asparagine    Q  glutamine
C  cysteine                   R  arginine
D  aspartate                  S  serine
E  glutamate                  T  threonine
F  phenylalanine              U  selenocysteine
G  glycine                    V  valine
H  histidine                  W  tryptophan
I  isoleucine                 Y  tyrosine
K  lysine                     Z  glutamate or glutamine
L  leucine                    X  any
M  methionine                 “*”  translation stop
N  asparagine                 “-”  gap of indeterminate length
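
The DNA-to-protein example above can be reproduced with the standard genetic code. The sketch below builds the 64-codon table from a compact string and translates three bases at a time. The function name translate is illustrative, and two choices are assumptions made here: a "-" gap is passed through unchanged, and U is treated as T, following the slide's note.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence):
    """Translate a DNA or RNA coding sequence into one-letter amino acid codes."""
    sequence = sequence.upper().replace("U", "T")  # the slide's "U = T" note
    translated = []
    for segment in sequence.split("-"):  # keep gaps as gaps
        codons = (segment[i:i + 3] for i in range(0, len(segment) - 2, 3))
        # Codons with unexpected letters become "X" (any amino acid).
        translated.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return "-".join(translated)


print(translate("ATGATTTGTCACGCTGAGCTC-AAAGCTAACGAGTAA"))  # prints MICHAEL-KANE*
```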


Where do we get DNA sequence information?

DNA Sequencing Methods: conversion of biological/bioanalytical data into sequence information

High-throughput sequencing centers use robotics and information systems to COMPLETELY automate DNA sequencing, preliminary identification, and publishing.


A G C T

5’-AAACCAGGCCGATAAGGTACTACACGAAAAAAA-3’

dATP, dCTP, dTTP, dGTP

+ ddATP32, ddCTP32, ddTTP32, ddGTP32

TTTGGTCCGGCTATTCCATGATGTGCTTTTTTT
 TTGGTCCGGCTATTCCATGATGTGCTTTTTTT
  TGGTCCGGCTATTCCATGATGTGCTTTTTTT
   GGTCCGGCTATTCCATGATGTGCTTTTTTT
    GTCCGGCTATTCCATGATGTGCTTTTTTT
     TCCGGCTATTCCATGATGTGCTTTTTTT
      CCGGCTATTCCATGATGTGCTTTTTTT
       CGGCTATTCCATGATGTGCTTTTTTT
        GGCTATTCCATGATGTGCTTTTTTT
         GCTATTCCATGATGTGCTTTTTTT
          CTATTCCATGATGTGCTTTTTTT
           TATTCCATGATGTGCTTTTTTT
            ATTCCATGATGTGCTTTTTTT

Step 1. Extend complementary sequence using “free” nucleotides with limiting amounts of radioactive “terminating” nucleotides.

Step 2. Run product out on an electrophoresis gel.

Step 3. Place gel against radiographic film, develop.
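
The three steps above can be mimicked in a few lines of code. The sketch below is a toy model, not a description of any laboratory software. It assumes the run of seven T's on the slide is the starting primer, writes the complementary strand left to right beneath the template as the slide does, and uses illustrative function names. Each fragment is the primer extended up to one terminating base, and sorting the fragments by length stands in for the gel.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def terminated_fragments(template, primer_len):
    """All fragments made when extension stops at each possible position."""
    # Complement written left to right under the 5'->3' template (reads 3'->5').
    full = template.translate(COMPLEMENT)
    primer_start = len(full) - primer_len  # assumed primer: the rightmost bases
    # Extension adds bases leftward from the primer, one terminator per fragment.
    return [full[i:] for i in range(primer_start - 1, -1, -1)]


def read_gel(fragments):
    """Read the terminating base of each fragment, shortest fragment first."""
    # The terminating base is the leftmost base of a fragment as written here.
    return "".join(f[0] for f in sorted(fragments, key=len))


template = "AAACCAGGCCGATAAGGTACTACACGAAAAAAA"
fragments = terminated_fragments(template, primer_len=7)

# One gel lane per terminating nucleotide, holding the fragment lengths in it.
lanes = {base: [len(f) for f in fragments if f[0] == base] for base in "AGCT"}

new_strand = read_gel(fragments)  # newly synthesized bases, read 5'->3'
print(new_strand)
```

Reading the bands from shortest to longest gives the newly synthesized strand, and its reverse complement recovers the template sequence upstream of the primer.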

TTTTTTT

AAACCAGGCCGATAAGGTACTACACGAAAAA

DNA Sequencing (old method)


http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/D/DNAsequencing.html

DNA Sequencing (new method)