TRANSCRIPT
Session 3: Genetics and Genomics
Magnuson (not available), Mieczkowski, Jones, Berg, Jeck, Rathmell
May 25, 2011
High Throughput Sequencing Facility at UNC
Piotr Mieczkowski
Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill
May 25th 2011
Architecture 2011 (Carolina Crossing)
[Diagram labels: four servers, nitrogen gas, UPS, PacBio RS]
HiSeq 2000
SHORT READS PLATFORM at UNC
Initial capability: up to 200Gb per run (8 days).
Chemistry v3 enabled capability (current): up to 500-600Gb per run.
Capability after upgrade of the system scheduled for end of 2011: 1Tb per run.
Cost of resequencing of one human genome (30x coverage):
• Current: about $6,000
• End of June: $4,000
One run (11 days):
• 5 human genomes
• 48 DNA capture (all exomes)
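The run figures above can be sanity-checked with a little arithmetic. The sketch below is only a rough estimate: the genome size of about 3.1 Gb is an assumption that does not appear on the slide, the function names are illustrative, and the calculation ignores duplicate and unmapped reads.

```python
# Back-of-envelope check of the run figures on this slide.
# ASSUMPTION: a haploid human genome of about 3.1 Gb; this number is not on the slide.
HUMAN_GENOME_GB = 3.1


def gb_needed(coverage, genome_gb=HUMAN_GENOME_GB):
    """Raw sequence (in Gb) needed to cover one genome at the given depth."""
    return coverage * genome_gb


def genomes_per_run(run_yield_gb, coverage=30, genome_gb=HUMAN_GENOME_GB):
    """Whole genomes that fit in one run, ignoring duplicates and unmapped reads."""
    return int(run_yield_gb // gb_needed(coverage, genome_gb))


if __name__ == "__main__":
    # Yields quoted on the slide: initial, current range, and planned upgrade.
    for yield_gb in (200, 500, 600, 1000):
        print(yield_gb, "Gb per run ->", genomes_per_run(yield_gb), "genomes at 30x")
```

Under that assumption a 500 Gb run holds five genomes at 30x, which is in line with the "5 human genomes" per run quoted above.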
Adapter compatibility for HiSeq sequencing system.
“Old” adapters vs “New” adapters:
Adapters for SE (Single End) applications:
• “Old” PE (Paired End)
• “Old” SE (Single End)
• “New” PE TruSeq
Adapters for PE (Paired End) applications:
• Only “New” TruSeq (note: “old” PE adapter chemistry is NOT compatible with HiSeq chemistry)
Adapters for Multiplex applications:
• “Old” PE
• “New” PE (TruSeq)
What did 3rd generation promise to deliver?
Single molecule resolution in real time
• Short time to result and simple workflow
– Base-call generation in <1 day
– Polymerase speed ≥1 base per second
• No amplification required
– Bias not introduced
– More uniform coverage
• Direct observation
– Distinguish heterogeneous samples
– Simultaneous kinetic measurements
• Long reads
– Identify repeats and structural variants
– Less coverage required
• Information content
– One assay, multiple applications: genetic variation (SVs to SNPs), methylation, enzymology
PacBio RT 3rd generation DNA sequencing system
NEXT-GENERATION SEQUENCING (DEEP SEQUENCING) PLATFORMS
o Short reads
1. Genome Analyzer IIx (GAIIx), HiSeq2000, MiSeq – Illumina
2. SOLiD 5500xl System – Applied Biosystems
3. HeliScope™ Single Molecule Sequencer - Helicos
o Long reads
1. Genome Sequencer FLX System (454) – Roche
2. PacBio RS – Pacific Biosciences (commercial release 2011)
3. Personal Genome Machine - Ion Torrent
o Mapping sequences to the large DNA fragments
1. NABsys
2. BioNanomatrix
Clinical samples collection
Processing – DNA capture of XXX genes
Sequence data – treatment options
• 1 million reads
• Read length up to 200bp
• Analysis software – CLC Genomics Workbench
Building a Computational Infrastructure to Support NextGen Sequencing
Corbin Jones
Biology & CCGS
Faculty Director HTSF
The Task
Generate
Process
Analyze
Interpret
HTSF Large & Busy
[Bar chart: y-axis 0 to 6000; periods 2008-2009 total, 2010, 2011 Q1, 2011 projected; series Addiction, TCGA, General]
~1 Trillion nt/week 16-80 Tbytes per week
$1,040,999
• LCCC Server Upgrades, 8%
• KURE Blades, 30%
• KURE Isilon Storage, 27%
• Data Backup, 35%

• RC, 46%
• GGTT, 54%
Sai Balu
Ruth Marinshaw
Results
             New                                     Current
Processors   64x8 cores, 2.93 GHz                    32x8 cores, 2.6 GHz
Memory       72 GB per 8-core node                   26 GB per 8-core node
Disk         195 TB Isilon, Filetek backup           32 TB
             (total of 909 TB available)
[Three bar charts comparing N and C; x-axis ranges 0 to 80, 0 to 80, and 0 to 200; axis labels not recovered]
[Workflow diagram. Labels: PIPE DB, BSP LIMS, Lib Prep, Sample, Flowcell, SeqWare, You, PIPE, Bioinformatics, Tape Archive, TCGA, Analyze; steps marked “By hand” or “Auto”]
SeqWare EveryWare
Generate
Process
Analyze
Interpret
SeqWare Transition Schedule
Jeff Roach & Co. ITS-RC
PIPE SeqWare Sample Submission
The Future
Generate
Process
Analyze
Interpret
Harnessing the power of genetics in whole genome analysis of hereditary cancer susceptibility
Jonathan S. Berg, MD/PhD Department of Genetics
Department of Medicine/Hematology-Oncology
Clinical Cancer Genetics at UNC
• Assessment and genetic testing for suspected hereditary cancer predisposition
• >5000 families evaluated over 15 years
• Majority of patients with breast/ovarian cancer, many with GI cancer, polyposis
• Comprehensive database of pedigrees, risk calculations, and genetic test results
Hypothesis
• Individuals highly suspicious for hereditary cancer, who test negative for known genes, carry mutations of novel cancer susceptibility genes
Plan
• Identify variants using whole genome sequencing in a subset of study participants/families
• Select candidate mutations
• Test other patients for mutations in the same gene
Analytic approaches
• Family-based
– WGS in paired affected family members
– Identification of rare, likely deleterious variants that are shared
– Segregation analysis in other family members
• Phenotype-based
– WGS in multiple unrelated individuals
– Identification of genes in which affected individuals have rare, likely deleterious variants
– Utilize pedigree information regarding inheritance pattern
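The two analytic approaches can be sketched as simple filters over annotated variant lists. This is a minimal illustration, not the pipeline used in the study: the `Variant` record, the 1% frequency cutoff, the `min_carriers` parameter, and all function names are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical annotated variant record (fields are illustrative)."""
    gene: str
    key: str            # e.g. "chrom:pos:ref:alt"
    pop_freq: float     # population allele frequency
    deleterious: bool   # outcome of a protein prediction tool


def rare_deleterious(variants, max_freq=0.01):
    """Keep variants that are rare (assumed cutoff: 1%) and predicted deleterious."""
    return {v for v in variants if v.pop_freq <= max_freq and v.deleterious}


def family_based(relative_a, relative_b, max_freq=0.01):
    """Rare, likely deleterious variants shared by two affected relatives."""
    return rare_deleterious(relative_a, max_freq) & rare_deleterious(relative_b, max_freq)


def phenotype_based(individuals, min_carriers=2, max_freq=0.01):
    """Genes hit by rare deleterious variants in several unrelated individuals.

    `individuals` maps a sample name to that person's list of variants.
    """
    carriers = defaultdict(set)
    for name, variants in individuals.items():
        for v in rare_deleterious(variants, max_freq):
            carriers[v.gene].add(name)
    return {g: sorted(n) for g, n in carriers.items() if len(n) >= min_carriers}
```

The family-based filter compares exact variants, because relatives are expected to share the same allele. The phenotype-based filter groups by gene, because unrelated individuals may carry different mutations in the same gene.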
Enrollment
• Study team identified and triaged >100 probands
– pre-test probability of a mutation
– size of pedigree
– informative relatives available for testing
• Current enrollment:
– 77 breast/ovarian probands, 43 informative family members
– 19 polyposis probands, 3 informative family members
– 5 other cancers
• Consent for whole genome analysis and blood samples obtained
Breast/ovarian study participants are similar to known BRCA1/2 families
[Figure, two panels: BRCAPro scores (total) and age at breast cancer diagnosis, each comparing BRCA1+, BRCA2+, and study probands (BRCA1/2 neg); p < 0.01 in each panel]
Whole genome sequencing
• Breast cancer – 16 individuals from 8 families
• 6 proband/relative pairs, 1 trio of cousins, 1 unpaired
• Polyposis – 8 unrelated individuals
• Mixture of simplex (AR or new mutation dominant) and AD pedigrees
– 2 members of a dominant pedigree
• 4 samples sequenced by UNC HTSF
• 22 samples sequenced by Complete Genomics (10 currently in transit)
Analysis (in progress)
• Collaboration with RENCI
– Pipeline for variant calling and annotation (UNC samples) based on Broad Institute’s GATK
– Database for storage of genomes and cross-comparisons
• 1000 genomes variant frequency data
• Protein prediction tools
• Human Gene Mutation Database
Analysis (in progress)
• Breast cancer:
– Strong candidate gene with frameshift mutation segregating in one family and possible splice site mutation segregating in another family
– Putative function in RAD50 pathway, mice heterozygous for deletion develop cancer
– Collaborating with Chuck Perou’s lab
Analysis (in progress)
• Polyposis patients:
– Strong candidate gene with heterozygous rare missense mutations in two different individuals
– Related to APC (the known cause of FAP)
– Further studies in progress
Next steps
• Analysis of incoming 10 genomes
• Follow-up candidate mutations in families
• Sequence candidate genes in unrelated individuals
• Functional studies
• Next batch of genomes to be sequenced (price coming down rapidly)
Questions?
UNC Cancer Genetics Clinic: Jim Evans, Kristy Lee, Cecile Skrzynia, Catherine Fine, Ofri Leitner, Kate Major
RENCI: Kirk Wilhelmsen, Charles Schmitt, Chris Bizon, Nassib Nassar
Students: Jonathan Mathew, Michael Adams
Targeted Somatic Mutation Discovery for Clinical Care in Cancer
William Jeck
Genetics Curriculum
MD/PhD Program
Cancer is a Genetic Disease:
• “Somatic” mutations occur in cancers and determine the biology of the disease.
• Identifying the pathogenic “driver” mutations that cause the cancer will predict prognosis and response to therapy.
• Identification of all driver mutations is needed, but is not currently done.
[Workflow diagram. Labels: Tumor or Normal DNA, Illumina Libraries, Sequence Capture (UNCeq 3.0), Somatic Calls]
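The tumor/normal comparison behind a somatic call can be sketched as follows. This is a simplified illustration and not the UNCeq implementation: the function name, the tuple layout `(chrom, pos, ref, alt)`, and the minimum normal depth of 10 are all assumptions for the example.

```python
def somatic_calls(tumor_calls, normal_calls, normal_depth, min_normal_depth=10):
    """Return tumor variants that are absent from the matched normal sample.

    tumor_calls / normal_calls: sets of (chrom, pos, ref, alt) tuples.
    normal_depth: dict mapping (chrom, pos) to read depth in the normal sample.
    min_normal_depth: assumed cutoff; below it the normal sample cannot rule
    out a germline variant, so the site is skipped rather than called somatic.
    """
    somatic = []
    for variant in sorted(tumor_calls):
        chrom, pos, _ref, _alt = variant
        if variant in normal_calls:
            continue  # also seen in normal DNA: germline, not somatic
        if normal_depth.get((chrom, pos), 0) < min_normal_depth:
            continue  # too little normal coverage to decide
        somatic.append(variant)
    return somatic
```

Requiring coverage in the normal sample is a conservative design choice: a variant missing from the normal calls only counts as somatic when the normal sample was sequenced deeply enough at that site to have shown it.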
Why Use Sequence Capture:
• Separates diagnostic approach from discovery
• Estimated that > 90% of driver mutations in common solid cancers occur in < 50 genes
• Capture target is flexible
• Capture can save time, effort and money
UNCeq™ Gene List 3.0
Capturing exons of:
AKT1 ALK APC AR ATM BRAF BRCA1 BRCA2 CCND1 CDH1 CDKN2A CDKN2B CTNNB1 EGFR EPHA10 EPHA6 ERBB2 FAM123B FBXW7 FGFR2 FGFR3 FLT1 FRAP1 HECW1 HER4 HRAS IDH1 IDH2 KIT KRAS MET MSH6 MYC NF2 NRAS PAK7 PDGFRA PHF6 PIK3CA PTCH1 PTEN PTK2 RAF1 RB1 SMAD4 STAT3 STK11 TET1 TET2 TP53 UTX UTY
Additional regions:
• All introns of BRCA1/2
• Intron 19 of ALK
• HPV genes E6 and E7
Result of Sequence Capture
Detecting Point Mutations (B-RAF)
[Figure: BRAF V600E site in samples WM2664, Sk-Mel 24, Sk-Mel 28, and Control]
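One common way to detect a point mutation from captured reads is to count the bases observed at the site and compute the fraction supporting the mutant allele. The sketch below assumes that approach; the slide does not say how the calls were made, and the function names and both thresholds are illustrative.

```python
from collections import Counter


def allele_fraction(bases, alt):
    """Fraction of reads at one position that carry the alternate base.

    bases: iterable of base calls observed at the site, one per read.
    Calls other than A, C, G, T (such as N) are ignored.
    """
    counts = Counter(b.upper() for b in bases)
    depth = sum(counts[b] for b in "ACGT")
    return counts[alt.upper()] / depth if depth else 0.0


def has_point_mutation(bases, alt, min_fraction=0.1, min_alt_reads=3):
    """Call the mutation when enough reads support it (assumed thresholds)."""
    alt_reads = sum(1 for b in bases if b.upper() == alt.upper())
    return alt_reads >= min_alt_reads and allele_fraction(bases, alt) >= min_fraction
```

A heterozygous mutation in a pure sample gives a fraction near 0.5. Tumor samples mixed with normal tissue give lower values, which is why the fraction cutoff is kept well below 0.5 here.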
Detecting Deletions (PTEN)
[Figure: log(coverage) across PTEN exons, with Exon 2 and Exon 6 marked, for samples WM2664, Sk-Mel 24, Sk-Mel 28, and Control]
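Deletions show up as exons whose coverage drops relative to a control sample. A minimal sketch of that comparison follows, assuming per-exon read counts are already available; the function names, the pseudocount, and the log2 threshold are assumptions and not values from the talk.

```python
import math


def log2_ratios(sample_cov, control_cov, pseudocount=1.0):
    """Per-exon log2 ratio of sample to control coverage.

    Both inputs map exon name to read count. Each sample is scaled by its
    total coverage so that differences in sequencing depth cancel out, so the
    inputs should include exons outside the gene of interest as a baseline.
    """
    s_total = sum(sample_cov.values())
    c_total = sum(control_cov.values())
    if s_total <= 0 or c_total <= 0:
        raise ValueError("both samples need non-zero total coverage")
    ratios = {}
    for exon, c in control_cov.items():
        s = sample_cov.get(exon, 0.0)
        # The pseudocount avoids log(0) for fully deleted exons.
        ratios[exon] = math.log2(((s + pseudocount) / s_total) /
                                 ((c + pseudocount) / c_total))
    return ratios


def deleted_exons(sample_cov, control_cov, threshold=-0.8):
    """Exons whose log2 ratio falls at or below the (assumed) threshold."""
    ratios = log2_ratios(sample_cov, control_cov)
    return sorted(exon for exon, r in ratios.items() if r <= threshold)
```

Losing one of two copies gives a log2 ratio near -1 in a pure sample, and losing both gives a much lower value, so a cutoff of -0.8 flags both cases. Tumor purity shifts these values toward zero and would need to be accounted for in practice.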
Detecting Haploinsufficiency
[Figure, two per-exon panels for samples Male 1, Male 2, Female 1, Female 2: UTX (chromosome X) and BRCA1 (chromosome 17)]
Detecting Translocations & CNV
Using NextGen for Better Cancer Care:
• We can sequence 50-150 genes in patients’ tumor / germline in <4 weeks for < $1,500 per patient
• We see the vast majority of genetic events (point mutations, deletions, amplifications)
• Expect IRB approval for use in any patient in summer of 2011
• Plan to sequence ~10K patient tumors from the UCRF-funded cancer survivorship cohort
• Validated discoveries will be disclosed to patients and physicians when consistent with treatment guidelines, or if standard of care has been exhausted
Acknowledgements
• Ned Sharpless Lab
– Patrick Dillon
– Christin Burd
– Alex Siebold
– Soren Johnson
– Jessica Sorrentino
– Chad Torrice
• Derek Chiang Lab
– Gleb Savych
• Chuck Perou Lab
– George Chao
• Neil Hayes Lab
– Xiaoying Yin
• Billy Kim Lab
– Jeff Damrauer
• Jonathan Berg
• Nancy Thomas
• Janiel Shields
• Juneko Grilley-Olson
• Jeanne Noe
• Corbin Jones
• Piotr Mieczkowski
Funding by the UCRF
Chromatin Remodeling of RCC
Ian Davis, W. Kimryn Rathmell, William Kim, Terry Furey, Jason Lieb
Nucleosome loss indicates regulatory activity
[Diagram: DNA binding motif, DNA binding protein, and nucleosome shown in three states: Repressed (OFF), Poised, Active (ON)]
Influences on nucleosomal position
Doerr, Nat. Methods, 2007
• Histone modification: methylation, acetylation, phosphorylation
– MLL, SETD2, UTX, JARID1C
• Nucleosome repositioning
– Active: SWI/SNF (PBRM1, ARID1A)
– Passive: nucleotide composition
RCC: HIF
Preliminary analysis
• 4 tumors selected for initial analysis, 2 paired normals for comparison.
• ChIP-seq
– H3K4me1 (poised)
– H3K4me3 (active)
– H3K27me1 (repressed)
• FAIRE
– Global regions of open chromatin
H3K4me1 analysis, one gene, differential states
Predicted cis-regulatory regions (Genomic Regions Enrichment of Annotations Tool, Bejerano)
Does clustering identify anything meaningful?
Gene list, Gene ontology
836 regions associated with 1147 genes
* HIF network
** Hypoxia in cancer cells
** DMOG in cancer
Rose Brannon, Jeremy Simon
UCRF Proposal
• Perform FAIRE and Histone methylation-specific ChIP-seq on increased numbers of selected tumors.
• Selection criteria:
– ccA or ccB subtype
– IHC detection of H3 methylation marks
– PBRM1 mutated vs wild type
Conclusions:
• This analysis will enable us to:
– Determine global chromatin remodeling effect of PBRM1 mutations on RCC.
– Link transcriptional readouts to chromatin patterns
– Identify common elements that underlie the transforming events in RCC
• THANK YOU