1
“The Genome Project: On Zoos and Curing Cancer”
Richard K. Wilson, Ph.D.
Genome Sequencing Center
Washington University School of Medicine
The Human Genome
• “Completed” April 2003
• ~3 billion base pairs (bp)
• 24 chromosomes: 1-22, X, Y
• ~25,000-30,000 genes
• Only 2% of the genome is protein-coding sequence
• The sequence represents ~99% of the genome. Approximately 99% of the sequence is in a highly accurate “finished” state. The other ~1% is somewhat lower quality but is ordered & oriented.
2
http://genome.ucsc.edu
http://www.ncbi.nlm.nih.gov
3
What’s next?
“applied genomics”
more genomes…
4
Vertebrate genomes: Completed or in progress
(figure: Chordata tree with branches for fishes, amphibians, reptiles, birds, monotremes, marsupials, rodents, carnivores and primates, including H. sapiens)
WU-GSC: work in progress…
• Human: chr 2/4: manuscript in preparation…
• Mouse: finishing in progress… (~30 Mb/month; target: Apr 2005)
• Chimp:
  - 4X draft: assembly released, analysis in progress…
  - Improved genome sequence: additional WGS, BACs in progress…
  - Finishing: chr 7 & Y (and ENCODE) in progress…
• Chicken:
  - 6.6X draft: assembly released, analysis in progress…
  - ENCODE regions in finishing…
  - Submitted proposal to improve draft…
• S. mediterranea: 6-8X draft: sequencing in progress…
• Drosophila:
  - D. yakuba: 8X WGS: assembly complete, analysis & pre-finishing in progress…
  - D. simulans: 8X WGS: sequencing (multiple strains) in progress…
• Macaque: 7-8X WGS: 7.4M reads scheduled…
• Caenorhabditis:
  - C. remanei: 7X WGS: sequencing complete, assembly in progress…
  - C. japonica: 7X WGS: heterozygosity analysis in progress…
  - CB5161: 7X WGS: heterozygosity analysis in progress…
• Others: Platypus & Lamprey BACs: investigatory sequencing in progress…
5
Why sequence the chicken genome?
• Comparative genomics
• Developmental biology
• Agriculture
6
The evolution of birds, flight and feathers
Why sequence the chicken genome?
Gallus gallus
• Red jungle fowl
• Ancestor of modern domestic breeds
• Genome size: ~1.1 Gb
• Autosomes: ~38
• Chromosome subtypes: macrochromosomes and microchromosomes
• Sex chromosomes: ZW (the female is the heterogametic sex)
The comparative genomics playing field
(figure: species grouped as mammals, birds, fish, amphibians and invertebrates along a divergence-time axis marked at 250, 310, 400 and 900 Myr; species shown: Caenorhabditis elegans, Drosophila melanogaster, Anopheles gambiae, Danio rerio, Xenopus tropicalis, Gallus gallus, Ornithorhynchus anatinus, Bos taurus, Canis familiaris, Rattus norvegicus, Rhesus macaque, Homo sapiens, Pan troglodytes, Mus musculus, Monodelphis domestica, Fugu)
7
G. gallus genome: Strategy…
(figure: whole genome, BACs, fosmids, plasmids)
G. gallus genome: Strategy…
Physical map
• Total clones for FPC: 142,718
• Total contigs: 260
• Total contigs anchored to chicken chromosomes: 202
WGS sequence assembly
• Genome size: 1.06 Gb
• Total Q20 bp: 7.0 Gb
• Sequence coverage: 6.6X
• Total repeat bp: 7.6 Mb (7%)
• Avg. contig length: 10 kb
• Avg. scontig length: 32 kb
• N50 scontig length: 8.2 Mb
Genome assembly
• Ultracontigs: 82
• Avg. ucontig length: 11.4 Mb
• Anchored ucontigs: 60
• Anchored ucontig length: 0.77 Gb
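The coverage figure follows from the other assembly statistics: 7.0 Gb of Q20 bases over a 1.06 Gb genome is ~6.6X. The sketch below shows that arithmetic, the idealized Lander-Waterman estimate of how much of a genome randomly placed reads at that depth should touch (a textbook model swapped in here, not necessarily the GSC's own calculation), and how an N50 length like the one quoted for supercontigs is computed. The function names and the toy scaffold lengths are hypothetical.

```python
import math

def fold_coverage(total_q20_bp: float, genome_size_bp: float) -> float:
    """Average sequence depth: high-quality (Q20) bases divided by genome size."""
    return total_q20_bp / genome_size_bp

def expected_covered_fraction(coverage: float) -> float:
    """Lander-Waterman idealization: about exp(-c) of the genome gets no read at depth c."""
    return 1.0 - math.exp(-coverage)

def n50(lengths):
    """Length L such that pieces of length >= L hold at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    raise ValueError("n50() needs at least one length")

c = fold_coverage(7.0e9, 1.06e9)
print(f"coverage ~ {c:.1f}X")  # coverage ~ 6.6X
print(f"expected fraction covered ~ {expected_covered_fraction(c):.4f}")
print(n50([2, 3, 4, 5, 6, 8, 10]))  # toy scaffold lengths -> 6
```

Real assemblies fall short of the Lander-Waterman figure because repeats and cloning bias make read placement non-random, which is one reason the improvement steps below target difficult and underrepresented regions.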
8
BAC minimal tiling path
Minilda output: Martin Kryswinski
Clone set currently exists as 9,270 clones with 77 kb avg. overlap
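A minimal tiling path is the smallest set of overlapping clones that still spans a mapped region. The sketch below uses a generic greedy interval-cover strategy with a required minimum overlap; it is an illustration under assumed inputs, not the method used by Minilda. The clone coordinates, the `Clone` tuple layout and the `min_overlap` parameter are hypothetical.

```python
from typing import List, Tuple

# Hypothetical clone record: (name, start, end) in contig coordinates (bp).
Clone = Tuple[str, int, int]

def minimal_tiling_path(clones: List[Clone], region_start: int, region_end: int,
                        min_overlap: int = 0) -> List[Clone]:
    """Greedy interval cover: fewest clones spanning [region_start, region_end].

    Each clone after the first must overlap the previously chosen clone by at
    least min_overlap bp. Raises ValueError if the clones leave a gap.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[Clone] = []
    covered_to = region_start
    best = None  # eligible clone reaching furthest to the right so far
    i = 0
    while covered_to < region_end:
        # A clone is eligible if it starts early enough to give the required overlap.
        limit = covered_to - (min_overlap if path else 0)
        while i < len(ordered) and ordered[i][1] <= limit:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"no clone extends coverage past {covered_to}")
        path.append(best)
        covered_to = best[2]
    return path

clones = [("A", 0, 150), ("B", 50, 120), ("C", 100, 260),
          ("D", 140, 300), ("E", 250, 400), ("F", 280, 380)]
path = minimal_tiling_path(clones, 0, 400, min_overlap=10)
print([name for name, _, _ in path])  # ['A', 'D', 'E']
```

Always extending to the furthest-reaching eligible clone is what keeps the clone count minimal; the overlap requirement only delays when a clone becomes eligible.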
Ensembl Chicken Gene Index
• Gene predictions generated using genewise and exonerate based on protein, cDNA and EST evidence
• 17,784 gene structure predictions (including 75 pseudogenes)
• 28,491 transcripts
• 6.5 exons/transcript
• 95% of chicken RefSeq CDS bases covered
• 87% of available chicken cDNAs aligned
• 18,995 EST genes produced from 440,815 ESTs
• http://pre.ensembl.org/chicken
9
Genome sequence
Questions to Ponder
• Why is the chicken genome size constrained?
• How did sex chromosomes evolve?
• How did microchromosomes evolve?
• What genes are lost/retained in the bird and mammal lineages?
• Why are certain repeat classes absent?
• Why is segmental duplication 2x that of human?
• How do we ascertain causative mutations from QTLs?
• What do chicken-to-mammal alignments tell us about evolution?
• Do non-genic conserved sequences exist in chicken?
Improving the G. gallus genome
1. Sequence selected BACs
• Use the physical map
• Sequence/local assembly of difficult/repetitive regions
• Sequence underrepresented regions (e.g., Z & W chr)
2. Pre-finishing
• Primer-directed reads to close gaps & improve low-quality regions
3. Finishing
• If necessary, depending on genome-specific standard…
10
“applied genomics”
Variation and mutation
genome 1: GCA AGA GAT AAT TGT …   Ala Arg Asp Asn Cys …
genome 2: GCG AGG GAT AAT TGT …   Ala Arg Asp Asn Cys …
genome 3: GCA AAA GAT AAT TGT …   Ala Lys Asp Asn Cys …
genome 4: GCA AGC GAT AAT TGT …   Ala Ser Asp Asn Cys …
genome 5: GCA AAA GAT AAT TGA …   Ala Lys Asp Asn STOP
Correlation to disease?
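The comparison in the figure can be reproduced mechanically: translate each genome's codons with the standard genetic code and label every codon that differs from a reference as synonymous (same amino acid), missense (different amino acid) or nonsense (new stop codon). Treating genome 1 as the reference is an assumption for illustration, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order (first base varies slowest).
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(codons):
    """Translate a list of codons into a string of one-letter amino acids."""
    return "".join(CODON_TABLE[c] for c in codons)

def compare(ref_codons, alt_codons):
    """Return (change, effect) for each codon that differs from the reference."""
    calls = []
    for pos, (ref, alt) in enumerate(zip(ref_codons, alt_codons), start=1):
        if ref == alt:
            continue
        ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
        if ref_aa == alt_aa:
            effect = "synonymous"
        elif alt_aa == "*":
            effect = "nonsense"
        else:
            effect = "missense"
        calls.append((f"{ref_aa}{pos}{alt_aa}", effect))
    return calls

GENOMES = {
    "genome 1": "GCA AGA GAT AAT TGT".split(),
    "genome 2": "GCG AGG GAT AAT TGT".split(),
    "genome 3": "GCA AAA GAT AAT TGT".split(),
    "genome 4": "GCA AGC GAT AAT TGT".split(),
    "genome 5": "GCA AAA GAT AAT TGA".split(),
}

reference = GENOMES["genome 1"]
for name, codons in GENOMES.items():
    print(name, translate(codons), compare(reference, codons))
```

Genome 2 differs at two codons yet encodes the same peptide, while genome 5 combines a missense change with a premature stop; only the latter kind of difference is an obvious candidate for a disease correlation.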
11
“Mutational Profiling”
(figure: a gene of interest examined in multiple individuals)
Mutational Profiling
12
Mutational Profiling
(figure: PCR amplification, then DNA sequencing)
High-throughput M.P.
13
High-throughput M.P.: a new pipeline
• High quality data
• Perfect sample tracking
• Low cost
14
Mutational Profiling
• Pulmonary surfactant protein B (SPB) deficiency
• Prostate cancer
• Non-small cell lung cancer
• Acute myelogenous leukemia
Acute myelogenous leukemia
• A group of diseases caused by a variety of inherited and acquired genetic and epigenetic changes
• Most frequently reported form of leukemia among adults. Approximately 10,000 new cases per year in the U.S.
• Generic therapeutics exist, but most patients still die from this disease.
• Risk stratification:
  - Age
  - De novo vs. secondary
  - Cytogenetic and genetic alterations
  - Response to initial therapy, relapse
15
Acute myelogenous leukemia
A variety of genetic and epigenetic events are probably responsible for the heterogeneity of AML.
Acute myelogenous leukemia
• ~450 target genes
• “Discovery Set”: Matched DNA samples from 47 AML patients.
• “Validation Set”: An additional 94 matched patient samples.
• Expression profiling, array CGH, functional genomics/mouse models.
16
Acute myelogenous leukemia
Target genes
• Receptor tyrosine kinases
• Cytoplasmic tyrosine kinases
• HOX genes
• Abundant proteases
• Transcription factors
• Tumor suppressors
• DNA repair
• RAS pathway genes
• Signaling modifiers
• Phosphatases
• Cytokine receptors
• Cell cycle genes
• Immune surveillance
• Apoptosis related
• Drug resistance
• Miscellaneous
Acute myelogenous leukemia
Gene     Mutations
CBF-β    R151C
c-KIT    M541L
c-MYC    N11S, Y32H, V170I
FLT3     D835Y; exon 11 internal tandem duplication (ITD)
NRAS     G13R
PML      R307C
RAR-α    T43I
Legend: nonsynonymous / synonymous
Annotations: 20% of AML patients; 10% of AML patients
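The mutations above use one-letter protein-change shorthand: reference residue, codon position, new residue (D835Y is Asp835Tyr). A minimal parser for that shorthand is sketched below; it is an assumption-level illustration, not a full implementation of a variant-nomenclature standard, and it deliberately returns `None` for events such as the FLT3 ITD that are not single-residue substitutions. The function names are hypothetical.

```python
import re

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}
# Reference residue, position, new residue ("*" allowed as the new residue = stop).
SUBSTITUTION = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY*])$")

def parse_substitution(change):
    """Split e.g. 'D835Y' into ('D', 835, 'Y'); None if not a point substitution."""
    m = SUBSTITUTION.match(change)
    if m is None:
        return None
    ref, pos, alt = m.groups()
    return ref, int(pos), alt

def to_three_letter(change):
    """Rewrite 'D835Y' as 'Asp835Tyr'."""
    parsed = parse_substitution(change)
    if parsed is None:
        return None
    ref, pos, alt = parsed
    return f"{THREE_LETTER[ref]}{pos}{THREE_LETTER[alt]}"

def effect(change):
    """Classify a protein-level change as synonymous, nonsense or missense."""
    parsed = parse_substitution(change)
    if parsed is None:
        return "other"
    ref, _, alt = parsed
    if ref == alt:
        return "synonymous"
    if alt == "*":
        return "nonsense"
    return "missense (nonsynonymous)"

MUTATIONS = {
    "CBF-β": ["R151C"], "c-KIT": ["M541L"], "c-MYC": ["N11S", "Y32H", "V170I"],
    "FLT3": ["D835Y", "exon 11 ITD"], "NRAS": ["G13R"], "PML": ["R307C"],
    "RAR-α": ["T43I"],
}
for gene, changes in MUTATIONS.items():
    for change in changes:
        print(gene, change, to_three_letter(change), effect(change))
```

Every point mutation in the table changes the residue, so each comes out as missense; a synonymous change would only be visible at the DNA level, as in the codon comparison earlier.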
17
Acute myelogenous leukemia
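(figure: FLT3 receptor with a transmembrane (TM) domain and two tyrosine kinase (TK) domains; internal tandem duplication (“expansion”) in ~20-25% and D835Y (*) in ~10%)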
18
Non-small cell lung cancer
• Lung cancer is the leading cause of cancer deaths in men and women (US: 164,100 new cases and 156,900 deaths in 2000).
• Non-small cell lung cancer is the most common type of lung cancer. It typically metastasizes more slowly than small cell lung cancer.
• Cigarette smoking is the most common cause of lung cancer: 87% of all cases are associated with smoking.
• Treatment is typically a combination of surgery, chemotherapy and radiation therapy.
NSC lung cancer
• Mutational profiling is focused on kinase genes.
• Samples include tissue biopsies from patients who have responded well to the kinase inhibitors Iressa and Tarceva.
• Just underway…
19
Non-small cell lung cancer
05.22.04 EGFR Paraffin Extracted Samples and Germline Controls, tested over exons 18-24

Sample         IR18    E19     IR19    IR19    E20     E21     E23
Position       155432  156146  156286  162603  162740  173192  180094
H_AO-0005038   GG      ++      AA      TT      AG      TT      CT
H_AO-0005346   GG      --      AA      TT      GG      TT      TT      E19 156146del TTAAGAGAAGCAACATCT
H_AO-0005039   AG      ++      AA      TT      AA      TT      CC
H_AO-0005344   AG      ++      AA      TT      AA      GT      CC
H_AO-0005040   GG      ++      AG      CT      AG      TT      CC
H_AO-0005342   GG      ++      AA      TT      AG      GT      CC
H_AO-cntr101   GG      ++      NN      CT      AG      TT      CC
H_AO-cntr102   GG      ++      NN      CT      AG      TT      CC
egfrcon.c1     GG      ++      AA      TT      GG      TT      TT

Exons examined: 18, 19, 20, 21, 22, 23, 24
Indels: E19 (++ / --)
Mutational profiling data for three Iressa responders
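Reading a genotype table like this one amounts to comparing each sample's calls against a baseline, site by site. The sketch below transcribes the calls and lists, for every sample, the positions that differ from `egfrcon.c1`; treating that row as the reference, treating `NN` as a no-call, and the function names are all assumptions for illustration, not the pipeline's actual logic.

```python
# Site labels and positions transcribed from the table header.
SITES = [("IR18", 155432), ("E19", 156146), ("IR19", 156286), ("IR19", 162603),
         ("E20", 162740), ("E21", 173192), ("E23", 180094)]

TABLE = """\
H_AO-0005038 GG ++ AA TT AG TT CT
H_AO-0005346 GG -- AA TT GG TT TT E19 156146del TTAAGAGAAGCAACATCT
H_AO-0005039 AG ++ AA TT AA TT CC
H_AO-0005344 AG ++ AA TT AA GT CC
H_AO-0005040 GG ++ AG CT AG TT CC
H_AO-0005342 GG ++ AA TT AG GT CC
H_AO-cntr101 GG ++ NN CT AG TT CC
H_AO-cntr102 GG ++ NN CT AG TT CC
egfrcon.c1 GG ++ AA TT GG TT TT"""

def parse_calls(table):
    """Map sample name -> one genotype call per site (trailing notes are dropped)."""
    rows = {}
    for line in table.splitlines():
        name, *fields = line.split()
        rows[name] = fields[:len(SITES)]
    return rows

def differing_sites(rows, reference="egfrcon.c1", no_call="NN"):
    """For each sample, the positions whose call differs from the reference row."""
    ref_calls = rows[reference]
    result = {}
    for name, calls in rows.items():
        if name == reference:
            continue
        result[name] = [pos for (_, pos), ref, call in zip(SITES, ref_calls, calls)
                        if call != no_call and call != ref]
    return result

rows = parse_calls(TABLE)
for sample, positions in differing_sites(rows).items():
    print(sample, positions)
print("E19 deletion calls:", [s for s, calls in rows.items() if calls[1] == "--"])
```

Most differences are shared single-base calls at the E20 and E23 sites, while only one sample carries the 18 bp exon 19 deletion, which is the kind of event that stands out once the common variation is tabulated.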
Revolutionizing medicine!
• A better understanding of the genetics and the molecular events underlying human disease.
• Diagnostic DNA sequencing
• “Designer therapeutics”
20
Gene-based designer drugs
(figure: bcr-abl target protein)
Challenges!
• DNA samples
  - Collection & records
  - Sample type & purity
  - Informed consent/HIPAA
• Technology & throughput improvements
• Software tools & data management
• Data visualization & statistical analysis
• Correlation of complex datasets
21
Acknowledgements
• WU Genome Sequencing Center
Lucinda Fulton, Bob Fulton, Pat Minx, Tina Graves, Tracy Miner, Kim Delahaunty, Bill Nash, Kym Pepin, Ginger Fewell, Jim Eldred, Dave Dooling, Ph.D., Asif Chinwalla, LaDeana Hillier, Doris Kupfer, Ph.D., John Spieth, Ph.D., Wes Warren, Ph.D., Sandy Clifton, Ph.D., Elaine Mardis, Ph.D., many others…
• Acute myelogenous leukemia
Timothy J. Ley, M.D. - WUSM
• Non-small cell lung cancer
Harold E. Varmus, M.D. & William Pao, M.D., Ph.D. - MSKCC
22