A strategy for DNA sequence analysis of genome rearrangements
Piloting an approach to cloning and sequencing genome rearrangements using follicular lymphoma as a case study.
Background
“Low resolution, genome-wide studies are beginning to catalog additional changes, such as small deletions or amplifications, and directed studies of individual genes implicated in tumor biology are steadily increasing our awareness of the variety of mutations underlying cancer. These data make it evident that only a modest fraction of the molecular targets involved in tumorigenesis has been identified and that cancer is a very heterogeneous disease due to many different mutations, environmental factors, and the interaction between the two. To develop targeted interventions, it will be important to identify all or most of these events.”
“These technological improvements make obtaining sequence information increasingly rapid and relatively inexpensive. It is now possible to contemplate something that was previously incomprehensible: obtaining comprehensive sequence information from multiple tumor types, at different stages—in a process that would be unbiased by our currently selective knowledge of the biology of cancer—to catalog all the genomic changes associated with cancer, and to render them accessible to study and intervention.”
Exploring Cancer through Genomic Sequence Comparisons. A NCI – NHGRI Workshop, April 14-15, 2004.
Charge to the workshop
• Which sequencing technologies would be the most appropriate for this application?
• Which tumor(s) should receive initial focus?
• What other data should be collected on selected tumor types?
• How could such an effort be piloted?
Sequencing genome rearrangements could provide a perspective on what could be learned from whole-genome sequencing of cancers.
How can these be found (and sequenced)?
What cancers?
What stages of cancer?
RFI: Large-scale identification of somatic mutations in cancer
Genomic Instability and Disease
• genomic rearrangements play a major role in pathogenesis of human genetic disease
• architectural features of the genome are associated with susceptibility to rearrangements
• segmental duplications are implicated in facilitating recombination events that lead to rearrangements
• rearrangements are not strictly random events but reflect higher order genome architectural features
• alteration of gene dosage or creation of novel fusion genes as a result of recombination
• red-green colour blindness is caused by frequent deletions/duplications between highly similar red and green opsin loci (Xq28)
Stankiewicz, P. and Lupski, J.R. 2002. Molecular-evolutionary mechanisms for genomic disorders. Curr Opin Genet Dev 12: 312-319.
Lymphoma and Leukemia-associated Rearrangements
Janz, S. et al. 2003. Genes Chromosomes Cancer 36: 211-223.
Follicular lymphoma as a test case
The incidence of Follicular Lymphoma in Canada is significant and increasing. New approaches for disease control are required, and these will most likely derive from enhanced understanding of disease biology.
There is a strong lymphoma research group at BCCA and UBC.
FL samples of > 90% “purity” can be obtained in quantities sufficient for BAC library construction.
Published evidence supports the correlation of genome rearrangements to transformation from indolent widespread disease to more aggressive DLBCL.
Rearrangement Detection Methods
Bacterial artificial chromosome (BAC) array CGH: 2400 elements
Albertson and Pinkel, HMG 2003 12 #2 145 – 152.
Bacterial artificial chromosome (BAC) array CGH: 32,000 elements
De Leeuw et al., HMG 2004 13 (17); 1827 - 37
Affymetrix 100 k SNP arrays: Genotyping and Copy number analysis.
Affymetrix 100 K Users Manual
Affymetrix 100 k SNP arrays: Genotyping and Copy number analysis.
SNP_A-1672076 (chr9 : 92630957)
SNP_A-1745782 (chr9 : 92630992)
SNP_A-1651414 (chr9 : 92691075)
SNP_A-1643223 (chr9 : 92731507)
SNP_A-1758738 (chr9 : 92840593)
SNP_A-1707255 (chr9 : 92872748)
SNP_A-1641827 (chr9 : 92872961)
SNP_A-1755966 (chr9 : 92906870)
SNP_A-1709121 (chr9 : 92926427)
SNP_A-1710928 (chr9 : 93104564)
SNP_A-1687608 (chr9 : 93104830)
SNP_A-1737371 (chr9 : 93121772)
Chr 9: 92,630,957 - 93,121,772 (490 kb)
12 SNPs with average Copy Number (CN) = 4.5
Min CN = 4
Max CN = 5
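The region size quoted on this slide can be checked from the SNP positions listed above. A minimal sketch; the dictionary and variable names are illustrative, and the per-SNP copy numbers are not given in the deck, so only the span is computed.

```python
# chr9 positions copied from the SNP list on this slide.
snp_positions = {
    "SNP_A-1672076": 92630957,
    "SNP_A-1745782": 92630992,
    "SNP_A-1651414": 92691075,
    "SNP_A-1643223": 92731507,
    "SNP_A-1758738": 92840593,
    "SNP_A-1707255": 92872748,
    "SNP_A-1641827": 92872961,
    "SNP_A-1755966": 92906870,
    "SNP_A-1709121": 92926427,
    "SNP_A-1710928": 93104564,
    "SNP_A-1687608": 93104830,
    "SNP_A-1737371": 93121772,
}

n_snps = len(snp_positions)
# Span between the first and last SNP in the amplified segment.
span_bp = max(snp_positions.values()) - min(snp_positions.values())
print(f"{n_snps} SNPs spanning {span_bp:,} bp")
```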
Oligonucleotide approaches to detection of genome copy number imbalances (Affymetrix, ROMA, Nimblegen)
Array methods can detect genome amplifications and deletions. Rearrangements that do not produce a “genome imbalance” are cryptic.
Array methods do not yield reagents (clones) that can be sequenced.
>> Experiment with clone-based approaches
• End Sequence Profiling (ESP)
• Fingerprint Profiling (FPP)
Clone fingerprinting applications
• Construction of genome maps
• Assessment of clone sequence assemblies (Redeye, with G. Rubin, R. Hoskins and S. Celniker)
• Selection and validation of tiling sets: BAC array CGH (Human, Mouse, …)
• Cloning (for sequencing) genome rearrangements
Overview of Genome Mapping
Total Maps: 27
Total Organisms: 20
Total Fingerprints: 1.8 M
Total Bases Mapped: 21 Gb
Capacity: 40 human genome equivalents p.a.
• genomic DNA (chromosomes)
• restriction enzyme partial digest or shearing
• DNA fragments of various sizes
• size separation on agarose gels (with marker)
• isolate DNA from appropriate gel size fraction
• ligate into BAC vector (ligase)
• transform clones into E. coli
• array in 384-well plates (library)
fingerprinting
• overnight culture
• purify BAC DNA
• restriction enzyme digest
• agarose gel electrophoresis
• restriction fragment identification
map construction
• compare BAC fingerprint patterns of each clone
• identify clones containing highly related DNA and reconstruct contiguous regions of the genome (FPC)
Clone Fingerprinting
Fingerprint Data Generation and Restriction Fragment Identification
Fingerprinting Gel (image; lane M = marker; size labels 29,950 and 540)
Automated analysis of gel images is performed by BandLeader*, customized Matlab software for accurate identification and sizing of restriction fragments.
BandLeader performance was assessed on fingerprints of 322 fully sequenced BAC clones (human and mouse):
• 96% of real fragments were detected (sensitivity)
• 96% of fragments called were real (specificity)
• 96% accuracy in detection of co-migrating fragments, including a cluster of 11 fragments
*D. Fuhrmann et al. - Genome Research, 2003
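The two measures above can be illustrated with a small sketch. Note that "specificity" is used on this slide for the fraction of called fragments that are real. The matching rule below (greedy one-to-one matching of sorted sizes within a relative tolerance `tol`) and the example sizes are assumptions for illustration, not the BandLeader procedure.

```python
def match_fragments(true_sizes, called_sizes, tol=0.02):
    """Count one-to-one matches between true and called fragment sizes.

    Both lists are sorted and walked in parallel; a pair matches when the
    sizes differ by at most tol (relative to the true size).
    """
    true_sorted = sorted(true_sizes)
    called_sorted = sorted(called_sizes)
    i = j = matched = 0
    while i < len(true_sorted) and j < len(called_sorted):
        t, c = true_sorted[i], called_sorted[j]
        if abs(t - c) <= tol * t:
            matched += 1
            i += 1
            j += 1
        elif c < t:
            j += 1  # called fragment has no true partner
        else:
            i += 1  # true fragment was not called
    return matched


def sensitivity_specificity(true_sizes, called_sizes, tol=0.02):
    """Return (fraction of real fragments detected, fraction of calls that are real)."""
    matched = match_fragments(true_sizes, called_sizes, tol)
    return matched / len(true_sizes), matched / len(called_sizes)


# Hypothetical fragment sizes in bp.
true_sizes = [1000, 2500, 2520, 7400, 12000]
called_sizes = [1005, 2510, 7350, 9000]
sens, spec = sensitivity_specificity(true_sizes, called_sizes)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy case three of five real fragments are detected and three of four calls are real; the co-migrating pair (2500, 2520) is called as a single band, which is the situation the third bullet addresses.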
121 lanes (96 samples + 25 marker lanes); HindIII digest; 1.2% agarose; run for 7 hours at 3.5 volts/cm in 1xTAE; gels stained post electrophoresis with SYBR Green I; images collected on MD Fluorimager 595
Throughput
Period   Fingerprints   Human Genome Equivalents
Day      3,000          0.15
Week     15,000         0.75
Year     800,000        42
Redeye identifies BACs with fingerprints inconsistent with their in-silico restriction maps
localized inconsistency
fingerprints consistent: confirm BAC sequence
BAC A
BAC B
sequence restriction fragments are coloured by the relative size distance to the nearest matching fingerprint fragment
Experimental plan
plan incorporates multiple levels of discovery, validation and comparison
compare
Lymphoma genomic DNA
Choice informed by:
• arrays
• genomics
• literature
• cytogenetics
• relevance to progression
Array methods and ESP / fingerprinting can identify rearrangement-bearing BACs for sequencing
End Sequence Profiling
Volik, S. et al. 2003. End-sequence profiling: sequence-based analysis of aberrant genomes. Proc Natl Acad Sci 100: 7696-7701.
Alignment of Fingerprints to “Electronic Digests” of the reference Human Genome Sequence
[Figure: an experimental fingerprint compared with an electronic fingerprint derived from the genome sequence assembly (shown as raw sequence); Needleman-Wunsch global alignment; panels labelled TUMOR and NORMAL, with the TUMOR fingerprint aligned.]
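The two ingredients named on this slide, an electronic digest of reference sequence and a Needleman-Wunsch global alignment, can be sketched as follows. This is a toy illustration under stated assumptions: the scoring values, the relative size tolerance `tol`, and the choice to align size-sorted fragment lists are all hypothetical and are not taken from the FPP software. The HindIII site (A^AGCTT) is used as the default enzyme.

```python
def electronic_digest(seq, site="AAGCTT", cut_offset=1):
    """Fragment lengths from cutting seq at every occurrence of site.

    Defaults model HindIII, which cuts after the first A of AAGCTT.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def needleman_wunsch(a, b, tol=0.05, match=1.0, mismatch=-1.0, gap=-0.5):
    """Global alignment of two fragment-size lists; returns (score, pairs)."""
    def sub(x, y):
        # Sizes within tol of each other count as the same fragment.
        return match if abs(x - y) <= tol * max(x, y) else mismatch

    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            pairs.append((a[i - 1], None))  # fragment only in a
            i -= 1
        else:
            pairs.append((None, b[j - 1]))  # fragment only in b
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Toy reference sequence with two HindIII sites.
toy_seq = "C" * 10 + "AAGCTT" + "G" * 20 + "AAGCTT" + "T" * 5
toy_fragments = electronic_digest(toy_seq)

# Hypothetical size-sorted fingerprints (bp): electronic vs experimental.
electronic = [540, 1200, 3300, 7400]
experimental = [545, 3250, 7500, 9800]
nw_score, nw_pairs = needleman_wunsch(electronic, experimental)
print(toy_fragments, nw_score, nw_pairs)
```

An experimental fragment with no electronic partner (9800 in the toy data) is the kind of local inconsistency that would flag a clone for closer inspection.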
Potentially confounding issues
Repeated sequences
Sensitivity and specificity of mapping BAC fingerprints to genome
• Multiple enzyme digests optimal
Clone artifacts
• Redundant representation of regions in independently derived clones
Repeats, Sensitivity and Specificity
Approximately 47% of genome sequence is composed of repetitive DNA sequences.
• 3% of repeat content is in blocks > 7.4 kb (2X the average size of a fingerprint fragment)
• 27% of repeat content is in blocks > 1,800 bp (2X the average size of an end sequence read)
ESP is substantially affected by repeats:
• 27% of end reads will have ambiguous alignments to the genome sequence
• 47% of BACs will have one or neither of their end sequences unambiguously aligned, and thus will not be useful
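The 47% figure follows from the 27% figure if the two ends of a BAC are treated as independent, which is an assumption made here for the purpose of the check:

```python
# Probability that a single end read aligns ambiguously (from the slide).
p_end_ambiguous = 0.27

# Assuming independent ends, both ends align unambiguously with probability (1 - p)^2.
p_both_ok = (1 - p_end_ambiguous) ** 2

# A BAC is unusable for ESP when one or neither end is unambiguous.
p_unusable = 1 - p_both_ok
print(f"usable: {p_both_ok:.1%}, unusable: {p_unusable:.1%}")
```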
Is FPP affected by repeat content? The performance of FPP alignments was assessed using a set of 43,000 simulated 130 kb BAC clone fingerprints derived from the sequence assembly.
FPP alignment sensitivity
• 99.8% of all BACs had associated alignments in the correct location
FPP alignment specificity
• 78% of all alignments do not extend past the actual edges of the BAC
• 87% of all alignments do not extend by more than 1 kb (e.g. 500 bp on both ends)
• 99% of all alignments do not extend by more than 10 kb (e.g. 5 kb on both ends)
FPP: less sensitive to repeats (reduced attrition due to repeats) and samples more of clone insert
Double digest fingerprints yield specific patterns
Number of genome fragments matching fragment F (plot)
bin (kb)   total size of duplicated patterns (Mb)
10         170
15         13
20         7
30         2
small regions of the genome produce unique patterns
larger domains yield more specific patterns
EcoRI/NcoI
5-fold fingerprinting provides coverage by at least 2 BACs for 93% of the genome
simulated 50X MboI library 500 iterations
simulation shows 96% coverage at 2X+
fp depth   coverage all (%)   coverage 2X+ (%)
3X         93                 76
4X         97                 87
5X         98                 93
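For comparison with the simulated values, an idealized Poisson (Lander-Waterman style) estimate can be computed. This sketch assumes clones land uniformly at random and ignores restriction-site bias, insert size selection and clone-end effects, so it is not the simulation used for the table; its values come out somewhat higher than the simulated ones.

```python
import math


def poisson_coverage(depth, min_clones=1):
    """Expected genome fraction covered by at least min_clones clones.

    Assumes per-base clone depth is Poisson distributed with mean `depth`.
    """
    below = sum(
        math.exp(-depth) * depth ** k / math.factorial(k)
        for k in range(min_clones)
    )
    return 1.0 - below


for depth in (3, 4, 5):
    cov_all = poisson_coverage(depth, 1)
    cov_2x = poisson_coverage(depth, 2)
    print(f"{depth}X  all={cov_all:.1%}  2X+={cov_2x:.1%}")
```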
Clone redundancy captures rearrangements in more than one BAC
internal validation
Redundancy and sensitivity
pos A     pos B
100 kb    100 kb
50 kb     150 kb
25 kb     175 kb
10 kb     190 kb
what is the smallest mappable fragment?
At ~ 5X, 94% of all breakpoints >20 kb from edge of nearest BAC
FPP Alignment Method (v 0.1)
MCF7 Proof of concept
• 607 clones from MCF7 breast cancer cell line (C. Collins and S. Volik, UCSF) fingerprinted with 5 enzymes: HindIII, EcoRI, BglII, NcoI, PvuII
• all clones subjected to ESP analysis
• enzymes selected to optimize sampling resolution, coverage by sizeable fragments, band spacing, robustness in laboratory
• fingerprints were compared to the reference human genome sequence to map the clones onto the genome and identify BACs containing putative genome rearrangements
• 206/607 BACs were identified as containing candidate rearrangements
• 245/607 BACs identified by ESP
• 148/206 were also identified by ESP as containing rearrangements
complex rearrangements not detected by ESP were found
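The overlap between the two methods can be worked out from the counts on this slide. A short sketch; the variable names are illustrative and only the slide's counts are used.

```python
total_clones = 607
fpp_calls = 206   # BACs flagged by fingerprint profiling
esp_calls = 245   # BACs flagged by end sequence profiling
both_calls = 148  # BACs flagged by both methods

fpp_only = fpp_calls - both_calls
esp_only = esp_calls - both_calls
either = fpp_calls + esp_calls - both_calls
neither = total_clones - either

print(f"FPP only: {fpp_only}, ESP only: {esp_only}, either: {either}, neither: {neither}")
print(f"FPP calls confirmed by ESP: {both_calls / fpp_calls:.1%}")
```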
Complex (potential) rearrangements
M0012O05 localized to chrs 1p13.3 17q23.2 20q13.2
alignments using BglII, EcoRI, HindIII, NcoI, PvuII

1 107.21 T |210000....|219146....|225959....|237503....|246728....|252248....|260278....|270933....|281502....|288004....
1 107.21 n ............,,,,,,,,,,,,S,,,,,,,,,,xxxxxxxXXXXXXXxxxxxXXxxXXXXXxxxXXXXXxxxxxxxxxxxxxxXxxxxxXX...,,,,,,,,,,,,,,
1 107.21 e ......,,,,,.........,,,,,,,.....,,,,,,,xxxxxxxxxxxxxxxxxxxxxXxxxxxxXXXXXSXXXSXXXXXXXxxxxx,,,,,,,,,,,,,,,,,,,,,
1 107.21 p .,,...........s...,,,,,..............XSSXXXXXXxxxxxxxssxxXXxxxxxXxxxxXXXXXXXXXXXXXXXXXXXXXXX..,,,,.........,,,
1 107.21 h ....sSS,,,,,,...,,,,,SXXXX..,S,,,,xxxxxxxxxXXXXX..sxxxxxxxxxxxxxxxxxxxxxxxx,,,xxXxxxxxxx,,.......ssSsS,,,,SS,,
1 107.21 b ..,,,,,,S,.......,,............,,...XXXXXxxxXSSXXXXXxxxxxxxxxxXXXXXXxxXXXXxxxxx,,,xXXXX........,,,,,,,,Ss....,
1 107.21 P ..................................********************************************************....................

17 59.28 T |280000....|292135....|303038....|318237....|328080....|335700....|348112
17 59.28 n .......,,,,,,,,,,,xxXXX....XXXXXxxxxXXXxx,,,,,...ss.......,,,,S,,,,,,
17 59.28 e ..s.sS,,,,....,.,,,,,,,,,,xxxxxxxxxxxxxxxxxxxxx,,,,,,...,,,......,...
17 59.28 p .........,,.............XXXXXXXxxsxxxX......,,,,,,.....,,............
17 59.28 h .............,,,,xxxxXXXXXXXxxXXXXXxxxxxxx,,,...,,,,........,,,,,,,,S
17 59.28 b .,,,,,,,....,,,,,,,xxxXXXxxxxXXXXXXXXXXX...,,,,,,,,,,,.......,,,...,,
17 59.28 P ..................*************************..........................

20 53200000 T |200000....|205896....|215551....|223167....|233993....|240978....|244492....|250169....|259400....|262360....|268239....|278
20 53200000 n ...,,,,,,,,,,.....,,,,,,,,xxxxsS,,,,,,,,,xxxxxxXXxxXxxxxsxxXXXXXXXXXXXXXXXX...XXXXXXXXXXXXXSssxxxxxxxxxxxxxxxxXXXxxxxxxXXXXXX
20 53200000 e SSXXXXXXXXXX........,,,,,,,......sss....ssxxxxxxxxxxxxxxxxxxxxxXXXXxxxxs.........................XXXXXXxxxxxxXXxxxxXXXxxxXXXX
20 53200000 p xxxxsxxxxxxxxx,,S,,,,..sxxxxX.....sxxxxxxxxxxsxxxxXXXXxxxxxxXXXXxsxxxxxxxxXXXXXSSssxxxxXxxxxxxxxxxsSXXxxxXXS,,,,,,..XXXXXXXXX
20 53200000 h S,,,,,,,SX.....,,,,,,,...............XXXXXX.....XXXXXxxxxxXXXXXXXXXXXxxxxXXXSXXXXXXXSsxxxxXXXxxxXXXXXxxxxxxxxxxxXXXXXxxxxxxXX
20 53200000 b ......sxxxx,,,,,,,,xxxxxx,,,,,,,,,,,,,SXXXXXxxxxxxxxxxxXXXXXXS,,,,,,xxXXXXXXXXXXSSXXXXXXXxxxxxxXXXXXXXXXxxXXXXXXXXXXXXXXSSssx
20 53200000 P ........................................****************************************.********************************************

(chr 20, continued; rows in the same order T, n, e, p, h, b, P)
..|278694....|285532....|290615....|296348....|305699....|309888....|315411....|327575....|333647....|336548....|346115.
XXXXXXXXXXxxxXXXXXXXXXXxxxxXXXXXXXXXXXXxxxxxxsss...sSXXXXXxxxxXXXXXXXXX....,,,,,,,.......XXXXXXXXXXXXX..................
xxXXXXxxxxxxxxxxXXXxxxxxxxxxsxxxxxxXXXxxxxxxxxxxx,,,,,,,,xxsxxx,,,,,,,,,,,..,,SsS,,,,,,,...ss....,,,,,,,,,,........sS,,S
XXXXXXXSXXXXXXSssxxxXXxxx,xxxxxxXXxxxxxxxxxxXXxxxx,,,,SSXXXXXXXXSSSXXxxxx,,,,......sSS,,,,.............XX...s.,..,....,,
xxxxXXXXXXXSXXXXXXXXXXXXxxxxxxxxxXXXSXXXxxXXXXXxxxxxxxxxxxxxxXXXXSXXxxxx,,,,,,,,,,,,,.........,,,,Ss......,,,,,,,,,,,,,,
XSSssxxxxXXXXXXXXXxxxXXXXXXXXXSXXXXXXXXXXxxXXXXXXXXXXXXxxxxxxxxxxxxxxx,,,,,,,,,,,,,,,,,.....,,,S,,,,,...,,,,,,,,........
***********************************************************************.................................................
Visual comparison of results obtained from fingerprint and ESP methods.
Both methods are capable of detecting rearrangements not found by the other method.
Not all MCF-7 clones harboured rearrangements. Clones in pilot project were enriched for those with specific rearrangements on chrs 1, 3, 17 and 20.
Rearrangement profile of MCF-7 genome is known to be extremely complex.
Davidson, J.M. et al. 2000. Molecular cytogenetic analysis of breast cancer cell lines. Br J Cancer 83: 1309-1317.
Progress: FPP analysis of FL
fingerprinted clones (EcoRI, NcoI)                 50,300   50% of target
clones with average fp size 60-260 kb              46,501   (92%)
clones 60-260 kb with an alignment >25 kb          45,509   (98%), ~2.1X
unique coverage                                    2.2 Gb   (77%)
clones with multiple alignments (both >25 kb)
  intra-chromosomal                                189      (0.4%)
  inter-chromosomal                                967      (2.1%)
Library #1: patient with cytogenetic profile showing only t(14;18); est. average insert size 135 kb
Library #2: patient showing complex cytogenetic profile in addition to t(14;18); est. average insert size 130 kb
90% of clones are in the range 75-225 kb
Empty wells < 1%
Failures: 3.6% vs. 7% (All)
FPP alignments currently cover 77% of the genome
Redundant coverage
Coverage   bp
>=1        2,216,990,483
>=2        1,344,663,365
>=3        654,423,007
>=4        264,307,507
>=5        90,817,363
>=6        27,173,488
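The cumulative coverage figures can be turned into per-depth counts and a rough redundancy summary. A sketch using only the numbers on this slide; because the list stops at >=6, the mean depth computed here is a lower bound over covered bases, not an exact value.

```python
# Bases covered by at least k aligned clones, copied from the slide.
cumulative = {
    1: 2216990483,
    2: 1344663365,
    3: 654423007,
    4: 264307507,
    5: 90817363,
    6: 27173488,
}

# Bases covered by exactly k clones (not defined for the open-ended last row).
exactly = {
    k: cumulative[k] - cumulative[k + 1]
    for k in cumulative
    if k + 1 in cumulative
}

# Share of covered bases that are seen in two or more clones.
redundant_fraction = cumulative[2] / cumulative[1]

# Summing the ">= k" rows counts each base once per covering clone (up to 6).
min_mean_depth = sum(cumulative.values()) / cumulative[1]

print(exactly)
print(f"redundant: {redundant_fraction:.1%}, mean depth >= {min_mean_depth:.2f}X")
```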
candidate FPP rearrangements
inter-chromosomal n=967
intra-chromosomal n=189
Clone size distribution: AEX HT0001
Clones associated with candidate rearrangements tend to be long
Proximity of BAC alignments to segmental duplications
Shorter clones are more likely to be associated with a segmental duplication
Redundancy of sampling resolves inconsistent alignments due to chimeric clones
• T0049G09 aligns to chrs 5 and 14
• alignment on chr14 is embedded within a contig formed by clones with single alignments
• breakpoint suggested by T0049G09 is inconsistent with neighbouring alignments
Translocation seen in FL t(14;18)(q32.33;q21.33)
BCL-2
IGH
FPP alignments of clone T0099C19
chr 18
cloneregion T0099C19 18 58898198-58941792 size 43595 score 113.6433

18 58880000 T |880000....|906215....|921342....|932071....|951463.
18 58880000 e X,..,S,XXXxXXXXxxXXXxx..xxSsSxxSxXXXsX,,,,.,,.....,. EcoRI
18 58880000 n ...,,,,,XxxxXxXXxxS,,..xxXXXXXxxxxSxxxxSx...,,.xS,,, NcoI
18 58880000 F 1000000147************8****************9*61000010000
18 58880000 F 1000038***************8************85410100000010000
18 58880000 P .......**********************************...........
chr 14
cloneregion T0099C19 14 105464595-105493586 size 28992 score 96.0589
                     14 105454680-105464594 size 9913

14 105440000 T |440000....|451148....|467323....|476363....|491061
14 105440000 e ....,,,,,,,,,,,,,,,SxxxxxxSxxXXsSxxXXsSxxxxxx..,,. EcoRI
14 105440000 n .sS,,.s.,.,.xSxXx.xxxSsX,XXXxxXXXXxxXXXXxSsSxx..,, NcoI
14 105440000 F 00000000000010134013679***********************8400
14 105440000 F 00000000000497**97***********************876410000
14 105440000 P ............**********************************....
chr 18: 3’ BCL-2 BES @ 58,892,771 (5.4 kb away)
chr 14: IgH BES @ 105,494,703 (1.1 kb away)
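The quoted distances can be reproduced from the coordinates on the T0099C19 slides, reading "away" as the distance from each BAC end sequence (BES) to the nearest edge of the FPP alignment on that chromosome. That reading, and the helper name below, are assumptions.

```python
def distance_to_interval(pos, start, end):
    """Distance in bp from pos to the closed interval [start, end]; 0 if inside."""
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0


# FPP alignment regions for T0099C19 (from the cloneregion lines).
chr18_alignment = (58898198, 58941792)
chr14_alignment = (105464595, 105493586)

# End-sequence positions near BCL-2 (chr 18) and IgH (chr 14).
bcl2_bes = 58892771
igh_bes = 105494703

d18 = distance_to_interval(bcl2_bes, *chr18_alignment)
d14 = distance_to_interval(igh_bes, *chr14_alignment)
print(f"chr18: {d18} bp ({d18 / 1000:.1f} kb), chr14: {d14} bp ({d14 / 1000:.1f} kb)")
```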
Double digests:
EcoRI+NcoI double digest, 0.9% Trevi gel, 5.3 V/cm, 4 hour
fragments      >0.43 kb   >1.00 kb   (c.f. single digest)
sensitivity    91.1%      98.1%      96.5%
specificity    88.2%      93.8%      96.2%
gel size labels: 12 kb, 0.43 kb
Conclusions
A BAC fingerprinting – based approach can identify rearranged clones (MCF7 and lymphoma).
BAC libraries can be made from primary tumor material (2 lymphoma libraries).
Such libraries can be subjected to high throughput BAC fingerprinting.
The fingerprints can be scanned for BACs bearing candidate genome rearrangements.
FISH has confirmed ~50% of candidate rearrangements.
Redundancy is an asset!
Acknowledgments
BC Cancer Agency: Joseph Connors, Randy Gascoyne, Doug Horsman
UCSF Comprehensive Cancer Centre: Colin Collins, Stas Volik, Joe Gray
BC Cancer Foundation
Michael Smith Foundation for Health Research
National Human Genome Research Institute (USA)
Genome Canada / Genome BC
BCCA Genome Sciences Centre
Genome Mapping Group: Martin Krzywinski, Jacquie Schein
DNA Sequencing Group: Rob Holt, George Yang