Identifying Susceptibility Genes for Familial Pancreatic Cancer Using Novel High-Resolution Genome
Interrogation Platforms
by
Wigdan Ridha Al-Sukhni
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Institute of Medical Science University of Toronto
© Copyright by Wigdan Ridha Al-Sukhni 2012
Identifying Susceptibility Genes for Familial Pancreatic Cancer Using
Novel High-Resolution Genome Interrogation Platforms
Wigdan Ridha Al-Sukhni
Doctor of Philosophy
Institute of Medical Science
University of Toronto
2012
Abstract
Familial Pancreatic Cancer (FPC) is a cancer syndrome characterized by clustering of pancreatic cancer in
families, but most FPC cases do not have a known genetic etiology. Understanding genetic predisposition
to pancreatic cancer is important for improving screening as well as treatment. The central aim of this
thesis is to identify candidate susceptibility genes for FPC, and I used three approaches of increasing
resolution. First, based on a candidate-gene approach, I hypothesized that BRCA1 is inactivated by loss-
of-heterozygosity in pancreatic adenocarcinoma of germline mutation carriers. I demonstrated that 5/7
pancreatic tumors from BRCA1-mutation carriers show LOH, compared to only 1/9 sporadic tumors,
suggesting that BRCA1 inactivation is involved in tumorigenesis in germline mutation carriers. Second, I
hypothesized that the germline genomes of FPC subjects differ in copy-number profile from healthy
genomes, and that regions affected by rare deletions or duplications in FPC subjects overlap candidate
tumor-suppressors or oncogenes. I found no significant difference in the global copy-number profile of
FPC and control genomes, but I identified 93 copy-number variable genomic regions unique to FPC
subjects, overlapping 88 genes of which several have functional roles in cancer development. I
investigated one duplication to sequence the breakpoints, but I found that this duplication did not
segregate with disease in the affected family. Third, I hypothesized that in a family with multiple
pancreatic cancer patients, genes containing rare variants shared by the affected members constitute
susceptibility genes. Using next-generation sequencing to capture most bases in coding regions of the
genome, I interrogated the germline exome of three relatives who died of pancreatic cancer and a relative
who is healthy at advanced age. I identified a short-list of nine candidate genes with unreported
mutations shared by the three affected relatives and absent in the unaffected relative, of which a few had
functional relevance to tumorigenesis. I performed Sanger sequencing to screen an unrelated cohort of
approximately 70 FPC patients for mutations in the top two candidate genes, but I found no additional
rare variants in those genes. In conclusion, I present a list of candidate FPC susceptibility genes for
further validation and investigation in future studies.
Acknowledgments
My research would not have been possible without the contribution of the following individuals:
A. Borgida, S. Holter, H. Rothenmund, and K. Smith at Ontario Pancreas Cancer Study and Ontario
Familial Gastrointestinal Cancer Registry for patient recruitment and selection. T. Selander of Samuel
Lunenfeld Research Institute Biospecimen Repository for DNA extraction. S. Joe (Gallinger Lab) for
script-writing; N. Zwingerman, A. Gropper, and S. Moore (Gallinger Lab) for assistance with qPCR; A.
Lionel (Scherer Lab) for computational analysis of Affy6.0 data on Birdsuite and iPattern; Q. Trinh
(McPherson Lab) for computational analysis of exome data; R. Grant (Gallinger Lab) for assistance with
exome data interpretation; H. Kim and T. McPherson (Gallinger Lab) for assistance with PCR and Sanger
validation of exome variants. K. Hay, J. Keating, and S. Levitt (Gallinger Lab) for administrative support;
J. McPherson (Ontario Institute for Cancer Research) for exome sequencing data; and C. Marshall, D.
Pinto, D. Merico (The Centre for Applied Genomics), A. Shlien and D. Malkin (Malkin Lab) for their
advice on my data analysis and manuscript preparations.
My sincere gratitude to the Pancreatic Cancer Genetic Epidemiology Consortium (PACGENE) (PI - G
Petersen, Mayo) for being an invaluable source of DNA samples and insight into pancreatic cancer
genetics.
I am very grateful to my Program Advisory Committee (Gary Bader, Steven Narod, Stephen Scherer) for
their insightful feedback and advice throughout the five years of my PhD. In particular, their thoughtful
review of my manuscripts and thesis was most helpful and deeply appreciated.
To my supervisor, Steve Gallinger – I cannot adequately thank you in this crowded page for all that your
mentorship has meant to me since I first met you seven years ago. You pushed me when I needed
pushing and supported me when I was afraid of falling. You listened patiently to my complaints. You
cared about my success. I will always appreciate your open-mindedness, your integrity, and your
compassion. I feel most fortunate that I am able to call you my mentor and friend. Thank you for
everything.
A special thank you to M. Crump for helping me maneuver around some unexpected bumps in the road of
my PhD, and for exemplifying the compassionate clinician.
I dedicate this thesis to my beautiful family:
To Mama and Baba – Your love for me has been the greatest gift and blessing in my life, it is the reason
for who I am today. Thank you for supporting my aspirations even when you did not always understand
where they were taking me.
To Eisar, Mayce, Mohammed, and Bann – Thank you for putting up with me in my worst days… I am
proud of you all.
To my aunts, uncles, and cousins in Iraq and elsewhere – Thank you for keeping me alive in your hearts
despite the long years and oceans separating us. You inspire me.
I am grateful for the financial support received from the CIHR Vanier Doctoral Research Award,
Lustgarten grant, Invest-in-Research grant from Princess Margaret Hospital, Canadian Society for
Surgical Oncology grant, Johnson & Johnson research award, American HepatoPancreaticoBiliary
Association grant, and the Department of Surgery at the University of Toronto.
Table of Contents
Abstract
Acknowledgments
List of Tables
List of Figures
List of Appendices
Abbreviations
Chapter 1 Literature Review
1. Pancreatic Cancer
2. Copy Number Variation
3. Whole-Exome Sequencing
Chapter 2 Loss of Heterozygosity at BRCA1 Locus in Pancreatic Adenocarcinoma
1. Abstract
2. Introduction
3. Materials & Methods
4. Results
5. Discussion
Chapter 3 Germline Genomic Copy Number Variation in Familial Pancreatic Cancer
1. Abstract
2. Introduction
3. Materials & Methods
4. Results
5. Discussion
Chapter 4 Exome Sequencing in a Familial Pancreatic Cancer Kindred
1. Abstract
2. Introduction
3. Materials & Methods
4. Results
5. Discussion
Chapter 5 General Discussion, Conclusions, and Future Directions
References
Appendices
List of Tables
Table 1 Studies estimating risk of pancreatic adenocarcinoma in relatives of affected patients
Table 2 Summary of published studies reporting germline genomic copy-number variation in non-
disease samples
Table 3 Studies using exome-sequencing to identify genetic cause of disease
Table 4 Characteristics of BRCA1 mutation carriers and sporadic pancreatic cancer patients
Table 5 Pedigree summary for BRCA1 mutation carriers
Table 6 LOH results for BRCA1 mutation carriers and sporadic pancreatic cancer cases
Table 7 Proportion of high-confidence losses in cases and controls
Table 8 Proportion of high-confidence gains in cases and controls
Table 9 CNVs called by each of Birdsuite and iPattern in 36 samples on Affymetrix 6.0 array
Table 10 High confidence CNV profile of cases vs. controls (excluding EBV-derived samples and
excluding controls with data from only one chip)
Table 11 FPC specific CNVs
Table 12 Genes whose coding regions are affected by FPC-specific CNVs
Table 13 Summary of raw sequence data from Illumina GAII for each subject
Table 14 Sanger validation data for selected SNVs in each exome subject
Table 15 Sanger validation data for selected indels in each exome subject
Table 16 Number of variants identified in each exome subject
Table 17 Genes containing variants identified by filtration model #1, 2, 3, and/or 4
Table 18 Additional candidate variants in untranslated regions shared by exome subjects
List of Figures
Figure 1 Location of BRCA1 microsatellite markers on chromosome 17
Figure 2 Sample electropherogram of microsatellite marker fragment analysis
Figure 3 Three representative matched-pair electropherograms for microsatellite LOH
Figure 4 Representative sequencing result for an individual with 5382insC germline BRCA1 mutation
Figure 5 Analysis of 500K arrays in FPC cases and controls
Figure 6 Criteria for merging CNVs
Figure 7 CNV prioritization plan
Figure 8 Gains and losses identified in FPC cases by each algorithm/chip
Figure 9 Gains and losses identified in controls by each algorithm/chip
Figure 10 Duplications overlapping TGFBR3 gene
Figure 11 Pedigree of case ID-203, indicating results of qPCR testing for duplication G_97
Figure 12 Fine-mapping the breakpoint of duplication overlapping TGFBR3 using qPCR walk-along
method
Figure 13 PCR gel demonstrating amplification of ~1.5-2kb fragment containing G_97 duplication
breakpoint in case Id_203
Figure 14 G_97 duplication breakpoint mapping by Sanger sequencing
Figure 15 PCR gel illustrating amplification of test regions and duplication breakpoint in case Id-203 and
affected sister
Figure 16 FPC-specific losses and gains on autosomal chromosomes
Figure 17 Pedigree of FPC kindred investigated by exome sequencing
Figure 18 Average coverage of bases in target region of exome per subject
Figure 19 Read-depth per base in target region of exome in each subject
Figure 20 Genome-wide distribution of all SNVs identified in each exome subject
Figure 21 Genome-wide distribution of SNVs excluding synonymous variants in each exome subject
Figure 22 Genome-wide distribution of SNVs not reported in dbSNP131 in each exome subject
List of Appendices
Table S1 Primers for BRCA1 microsatellite markers
Table S2 BRCA1 mutations sequencing primers
Table S3 FPC cases in CNV study
Table S4 Controls (OFCCR and FGICR) in CNV study
Table S5 Primers for qPCR validation of CNVs
Table S6 Primers for qPCR breakpoint mapping of TGFBR3-transecting duplication
Table S7 High- and low-confidence losses on Affy500K array in FPC cases
Table S8 High- and low-confidence gains on Affy500K array in FPC cases
Table S9 High- and low-confidence losses on Affy500K array in controls
Table S10 High- and low-confidence gains on Affy500K array in controls
Table S11 High-confidence CNVs on Affy 6.0 array in FPC cases
Table S12 High-confidence CNVs on Affy 6.0 array in controls
Figure S1 qPCR of region D_180
Figure S2 qPCR of region D_19
Figure S3 qPCR of region D_128
Figure S4 qPCR of region D_152
Figure S5 qPCR of region D_234 (primer A)
Figure S6 qPCR of region D_234 (primer B)
Figure S7 qPCR of region D_143 (primer A)
Figure S8 qPCR of region D_143 (primer B)
Figure S9 qPCR of region D_220
Figure S10 qPCR of region D_30 & D_36
Figure S11 qPCR of region D_40
Figure S12 qPCR of region D_105 (primer A)
Figure S13 qPCR of region D_105 (primer B)
Figure S14 qPCR of region D_83
Figure S15 qPCR of region D_48
Figure S16 qPCR of region D_125
Figure S17 qPCR of region D_134
Figure S18 qPCR of region D_142 (primer A)
Figure S19 qPCR of region D_142 (primer B)
Figure S20 qPCR of region D_56
Figure S21 qPCR of region G_225
Figure S22 qPCR of region G_226
Figure S23 qPCR of region G_365 (primer A)
Figure S24 qPCR of region G_365 (primer B)
Figure S25 qPCR of region G_369
Figure S26 qPCR of region G_380
Figure S27 qPCR of region G_407
Figure S28 qPCR of region G_603/604
Figure S29 qPCR of region G_69
Figure S30 qPCR of region G_88
Figure S31 Region: G_97 (primer A) – ID_27
Figure S32 Region: G_97 (primer B) – ID_27
Figure S33 Region: G_97 (primer A) – ID_203 and family members
Figure S34 Region: G_97 (primer A) – ID_203’s family members
Figure S35 Region: G_97 (primer A) – ID_203 and family members
Figure S36 Region: G_97 (primer A) – ID_203’s family members
Figure S37 Region: G_97 (primer A) – ID_203’s family members
Figure S38 Region: G_97 (primer B) – ID_203 and family members
Figure S39 “T_Out_1” – Fine-mapping G_97 breakpoint in Id_203
Figure S40 “T_Out_2” – Fine-mapping G_97 breakpoint in Id_203
Figure S41 “T_Out_3” – Fine-mapping G_97 breakpoint in Id_203
Figure S42 “T_Out_4” – Fine-mapping G_97 breakpoint in Id_203
Figure S43 “O_In_2” – Fine-mapping G_97 breakpoint in Id_203
Figure S44 “O_Out_1” – Fine-mapping G_97 breakpoint in Id_203
Figure S45 “O_Out_5” – Fine-mapping G_97 breakpoint in Id_203
Abbreviations
AD – autosomal dominant
AGTC - Analytical Genetics Technology Centre
AJ – Ashkenazi Jewish
AML – acute myeloid leukemia
AR – autosomal recessive
BAC – bacterial artificial chromosome
BC – breast cancer
CCDS - Collaborative Consensus Coding Sequence
CGH – comparative genomic hybridization
ChIP-seq - chromatin immunoprecipitation sequencing
CIN – chromosomal instability
CNV – copy number variation
Conc – concordant
COSMIC - Catalogue of Somatic Mutations in Cancer
CRC – colorectal cancer
CSI – chromosomal structure instability
ddNTPs - dideoxynucleotide triphosphates
del - deletion
DGV – Database of Genomic Variants
Disc - discordant
EBV – Epstein-Barr virus
FAMMM - familial atypical multiple mole melanoma
FDR – first degree relative
FFPE – formalin-fixed paraffin-embedded
FGICR – familial gastrointestinal cancer registry
FISH – fluorescence in-situ hybridization
FN – false negative
FoSTeS - fork stalling and template switching
FP – false positive
FPC – familial pancreatic cancer
GB – gallbladder
GDB – human genome database
GST – glutathione-S-transferase
GTC – genotyping console
GWAS – genome wide association study
HBOC - hereditary breast and ovarian cancer
Het - heterozygous
HMM – hidden Markov model
Homo - homozygous
HP – hereditary pancreatitis
HR – hazard ratio
ICGC - International Cancer Genome Consortium
IHGSC - International Human Genome Sequencing Consortium
Ins - insertion
IPMN – intraductal papillary mucinous neoplasm
LCL – lymphoblastoid cell lines
LD – linkage disequilibrium
LOD – logarithm of odds
LOH – loss of heterozygosity
MAF - minor allele frequency
MCN – mucinous cystic neoplasm
MEI – mobile element insertion
MLPA – multiplex ligation probe amplification
MMBIR - microhomology-mediated break-induced replication
MSKCC - Memorial Sloan Kettering Cancer Center
NAHR – nonallelic homologous recombination
NBPF – neuroblastoma breakpoint family
NCBI – National Center for Biotechnology Information
NFPTR - National Familial Pancreas Tumor Registry
NHEJ – nonhomologous end joining
NIH – National Institutes of Health
NGS – next generation sequencing
NK – natural killer cell
nsSNV – nonsynonymous single nucleotide variants
OC – ovarian cancer
OFCCR - Ontario Familial Colon Cancer Registry
OHI – Ottawa Heart Institute
OMIM - Online Mendelian Inheritance in Man
OPCS – Ontario Pancreas Cancer Study
OR – odds ratio
OR genes – olfactory receptor genes
QC – quality control
PACGENE - Pancreatic Cancer Genetic Epidemiology Consortium
PanIN – pancreatic intraepithelial neoplasia
PARP – poly-(ADP-ribose)-polymerase
PC – pancreatic cancer
PCR – polymerase chain reaction
PGFE – pulsed gel field electrophoresis
PJS - Peutz-Jeghers syndrome
qPCR – quantitative polymerase chain reaction
qRT-PCR – quantitative reverse-transcription polymerase chain reaction
ROMA – representational oligonucleotide microarray analysis
RR – relative risk
SDR – second degree relative
SEER – surveillance, epidemiology and end results
SIR – standardized incidence ratio
SNP – single nucleotide polymorphism
SNV – single nucleotide variants
SPC – sporadic pancreatic cancer
TCAG – The Centre for Applied Genomics
TN – true negative
TP – true positive
UCSC - University of California, Santa Cruz
UPD – uniparental disomy
UTR – untranslated region
VNTR - variable number tandem repeat
WT - wildtype
Chapter 1 - Literature Review
1. Pancreatic Cancer
1.1 Pathology and epidemiology
Pancreatic ductal adenocarcinoma (otherwise known as pancreatic cancer) is a highly lethal invasive
epithelial neoplasm with ductal differentiation, obscuring the lobular pattern of normal pancreatic
parenchyma. Pancreatic cancer grossly appears as a firm highly sclerotic mass with poorly circumscribed
borders. Microscopically, infiltrating gland-forming neoplastic cells are commonly surrounded by non-
neoplastic stroma in a characteristically intense desmoplastic reaction which often results in low tumor
cellularity.1
Pancreatic cancer is the fourth leading cause of cancer death in North America. The estimated number of
incident cases and deaths due to pancreatic cancer in the US in 2010 was 43,140 and 36,800,
respectively.2 In Canada, the estimated number of new cases and deaths from pancreatic cancer in 2011
was 4,100 and 3,800, respectively.3 Age-adjusted incidence in the U.S. based on SEER (Surveillance,
Epidemiology and End Results) data between 2004-2008 was 12 per 100,000 men and women; total
lifetime risk was 1.45% (approximately 0.5% by age 70).2
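The estimates above imply that annual deaths are nearly as numerous as annual new cases. As a rough illustration only, the following sketch computes the ratio of deaths to new cases from the figures quoted in this section. This ratio is not reported in the cited sources, the function and variable names are hypothetical, and deaths in a given year do not necessarily occur in patients diagnosed that same year, so the ratio is a crude indicator rather than a case-fatality rate.

```python
def mortality_to_incidence_ratio(deaths: int, new_cases: int) -> float:
    """Ratio of estimated annual deaths to estimated annual new cases."""
    if new_cases <= 0:
        raise ValueError("new_cases must be positive")
    return deaths / new_cases


# Estimates quoted in the text above (US 2010, Canada 2011).
estimates = {
    "US 2010": {"new_cases": 43_140, "deaths": 36_800},
    "Canada 2011": {"new_cases": 4_100, "deaths": 3_800},
}

# Hypothetical summary table: one ratio per region.
ratios = {
    region: mortality_to_incidence_ratio(e["deaths"], e["new_cases"])
    for region, e in estimates.items()
}

for region, ratio in ratios.items():
    print(f"{region}: deaths / new cases = {ratio:.2f}")
```

Both ratios fall between 0.85 and 0.93, consistent with the very low overall survival described in the following paragraph.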
Due to the retroperitoneal location of the pancreas and lack of specific symptoms of early pancreatic
cancer, most patients present with advanced disease that precludes surgical resection. For those patients,
the only treatment option is palliation, and despite many trials of various chemotherapeutic and
molecular-target drugs and/or radiotherapy, median survival is 9-11 months.4 For patients who do
undergo surgical resection of localized pancreatic cancer, 80-85% ultimately recur locally and/or
systemically, resulting in 5-year survival of <20% and overall 5-year survival for all pancreatic cancer
patients of <5%.5
1.2 Molecular biology
Three distinct pre-invasive lesions have been identified as precursors for pancreatic adenocarcinoma:
pancreatic intraepithelial neoplasia (PanIN), intraductal papillary mucinous neoplasms (IPMNs), and
mucinous cystic neoplasms (MCNs). Each of these lesions has been associated with increased risk of
cancer and the arising cancer has been shown to develop from cells within the precursor. PanINs are
microscopic lesions in the smaller pancreatic ducts, and they are associated with a progressive spectrum
of cytologic and architectural atypia (corresponding to the classification of PanIN-1A, PanIN-1B,
PanIN-2, and PanIN-3).6 Mouse models of pancreatic cancer develop lesions very similar to human PanINs, and
molecular analyses have demonstrated that PanINs sequentially accumulate genetic alterations found in
invasive cancer, suggesting an “adenoma-to-carcinoma” progressive model akin to that of colorectal
cancer.7
However, the natural history of PanINs is not yet clear: while it is evident that advanced stage PanIN-3
lesions are tightly associated with cancer8, early-stage PanIN-1 lesions are quite common and are most
prevalent in older subjects.9 Moreover, PanINs are frequently multi-focal, and although endoscopic
ultrasound can detect parenchymal changes associated with PanINs, it does so at less than 100%
specificity.10,11 Therefore, deciding if and when to resect pancreata with suspected PanIN lesions is
contentious. IPMNs are grossly visible cystic lesions with direct communication to the main or branch
pancreatic ducts. The mutational spectrum of IPMNs differs somewhat from that of PanINs and invasive
adenocarcinoma, suggesting an alternate path of development.12 Main-duct IPMNs are associated with up
to 40% risk of malignant transformation and usually are resected, especially if they are growing and/or
larger than 3 cm, demonstrate mural nodularity on imaging, or are associated with main duct dilation.13
However, branch-duct IPMNs are more challenging to manage as their natural history is less clear. They
are associated with up to 15% risk of malignancy, and most authorities recommend resection if the
branch-duct IPMN exceeds 3 cm in size or has mural nodules or other suggestion of malignancy, but it is
unclear what to do with smaller lesions since most branch-duct IPMNs remain unchanged over long-term
follow-up.13,14 Since IPMNs are often multifocal, patients who undergo subtotal pancreatic resections
would need to continue surveillance for potential cancer recurrence. MCNs are rare, mucin-producing
cystic lesions not directly communicating with the pancreatic ducts and with a distinctive ovarian-type
stromal epithelium.15 They account for only approximately 1% of pancreatic cancers, but if detected they
should always be resected because they carry a 40% chance of malignancy: the cure rate is 100% if
the MCN is resected before invasive carcinoma develops, but only 50-60% if cancer is
present at time of resection.15
Molecular analyses have identified a variety of genetic, epigenetic, and genomic alterations in pancreatic
adenocarcinoma. The most common genetic mutation is Kras2 activation, present in 90-95% of cases; it
also appears to be one of the earliest changes that promote tumor development, as evidenced by its
presence in 36% of PanIN-1A and the fact that mice engineered to express the activated KrasG12D mutant
develop PanIN-like lesions and eventually invasive pancreatic carcinoma.7 Kras2 is a well-established
proto-oncogene, part of the RAS family of GTP-binding proteins, which are involved in proliferation, cell
survival, cytoskeletal modeling, motility, and other cellular functions.16 In pancreatic cancer, activating
mutations primarily occurring in codon 12 cause constitutive activation of the intracellular signal
transduction function of the expressed protein. This constitutive signaling appears to be necessary for
maintenance of pancreatic cancer, in addition to initiating its development.17 Other oncogenes activated
in pancreatic cancer include BRAF18, AKT219, cMYC17, and EGFR17. Moreover, constitutive activation of
the Hedgehog developmental signaling pathway has also been implicated in the development of
pancreatic cancer. The mammalian Hedgehog signaling pathway appears to play a critical role in
developmental patterning and mature tissue homeostasis, and it has been observed to be dysregulated in
many cancers, including pancreas.20 In fact, Hedgehog signaling activation appears to be one of the
initiating events in pancreatic cancer, as evidenced by ligand overexpression in PanINs21 and IPMNs22
and the fact that Hedgehog signaling cooperates with KrasG12D mutant in mouse models to promote
development of PanINs.23 Hedgehog signaling also appears to be important in regulating metastases.24
While the KrasG12D mutation is necessary for development of pancreatic cancer in mice, latency to tumor
development is significantly shortened if additional inactivating mutations of the tumor suppressor genes
TP53, p16, or BRCA2 are added.25 Inactivation of all three tumor suppressor genes, along with others, has been
identified in pancreatic adenocarcinoma. Inactivating mutations (homozygous deletions, intragenic
mutations plus loss of second allele, or epigenetic silencing) of p16 are found in approximately 90% of
tumors.26 This gene is a well-known tumor suppressor that codes for a cyclin-dependent kinase inhibitor, which
blocks progression through the G1-S checkpoint of the cell cycle. TP53, the “guardian of the
genome”, is involved in maintenance of genomic stability, apoptosis, and activation of DNA repair
(among its many functions), and is inactivated in 50-75% of pancreatic cancers (almost always via
intragenic mutations coupled with loss of the second allele).27 Another tumor suppressor gene commonly
inactivated in pancreatic cancer (in about 55% of cases) is SMAD4, a critical signaling intermediate in the
transforming growth factor (TGF)-beta pathway, providing selective growth advantage to affected cells.28
Patients who undergo resection and whose pancreatic cancer has loss of SMAD4 function have worse
prognosis than age- and stage-matched patients without SMAD4 mutations.29 Other tumor suppressor
genes inactivated at a lower frequency (5-10%) include BRCA2, STK11, TGFBR1, and TGFBR2.26 Of
note, p16 inactivation appears to be a relatively early event in tumor development, as it is detectable in
PanIN-2 lesions, whereas TP53, SMAD4, and BRCA2 mutations are not seen until the PanIN-3 stage.7
Genomic instability is a hallmark of most solid tumors, including pancreatic cancer. The types of
genomic rearrangements commonly identified in pancreatic adenocarcinoma are reviewed elsewhere (see
“Literature Review - CNVs and Cancer”). Telomere shortening, which predisposes to end-to-end
chromosomal fusions and breakage during anaphase thus generating amplifications and deletions in the
daughter cell genomes, is a very frequent and early event in pancreatic cancer development, demonstrated
in over 90% of the earliest stage PanINs.30 It is believed that the inactivation of TP53 allows the survival
of the pre-invasive cells which develop a heavy burden of genomic instability as a result of telomere
attrition, permitting them to progress through the activation of oncogenes and inactivation of tumor
suppressor genes to invasive status.31 It should be noted that most invasive pancreatic cancers appear to
reactivate telomerase, mitigating the degree of genomic instability and helping to stabilize the neoplastic
cells.32
In addition to genetic and genomic alterations, epigenetic silencing of tumor suppressor genes (via
methylation of CpG islands in the 5’ regulatory regions) is frequently observed in pancreatic
adenocarcinoma.33 Conversely, hypomethylation of candidate oncogenes (which are overexpressed in
pancreatic cancer) has also been observed.34 MicroRNAs have likewise been implicated in pancreatic cancer
tumorigenesis, both as potential tumor suppressors and as oncogenes.35 Furthermore, inflammation and
the tumor micro-environment appear to have a role in pancreatic tumorigenesis.36
Jones et al.37 examined the genomic profile of pancreatic adenocarcinoma in depth by sequencing the
coding regions of 20,661 genes in 24 pancreatic adenocarcinomas as well as hybridizing tumor DNA to a
high-resolution single nucleotide polymorphism (SNP) array to detect genomic rearrangements. The
authors identified 1,562 somatic mutations in 1,007 genes; 74.5% of these mutations were missense,
nonsense, small insertions/deletions, or splice-site/untranslated region (UTR) changes. The average
number of mutated genes per tumor (48) was much lower than the number of mutations discovered in breast
cancer (101) or colorectal cancer (77) in previous studies, and one potential explanation given is that the
cells which initiate pancreatic tumorigenesis are likely to have undergone fewer divisions than tumor
initiating cells in breast or colorectal cancer. Gene-set analyses of the genes mutated in pancreatic cancer
identified 69 gene sets that were altered in most pancreatic tumors, of which 31 gene sets can be grouped
in 12 core signaling pathways with discernible functional relevance to neoplasia, which were affected in
67-100% of the pancreatic tumors. Notably, although the 12 core pathways were altered in almost all
cancers, the specific genes that were mutated in each tumor differed significantly across patients, aside
from the few frequently mutated genes discussed above.
These results emphasize the importance of the pathway approach to understanding tumorigenesis, and
suggest that successful anti-cancer therapy may depend more on targeting pathways rather than individual
genes. A subsequent study applied massively parallel sequencing to sequence the entire genome of
metastases from seven of the subjects included in the previous study.38 On average, two-thirds of
mutations detected in each metastasis were also present in the paired primary tumor and were called
“founders”, while the remaining mutations that were only identified in metastases were termed
“progressors”. Subclones that led to the development of metastases were identified within each primary
tumor. The authors devised a mathematical model for calculating the timing of different stages of
pancreatic cancer development and estimated that it takes an average of 11.7 years from the initiation of
tumorigenesis until the generation of the cell that develops into the parental clone; another 6.8 years were
estimated for the evolution into subclones with metastatic capacity, and 2.7 years until the death of the
patient. It should be noted that most of the tumors in this study were not from familial cases, and tumors
with highly-penetrant germline predisposing mutations may follow a different evolutionary timeline and
pathway. Nonetheless, it appears that a significant window of opportunity for screening and curative
intervention exists, if it is possible to identify tumors before metastatic subclones develop.
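As a rough arithmetic check on the window described above, the three mean intervals quoted from the model can be accumulated into a single timeline. The following is only a minimal sketch in Python: the stage labels and the function name `cumulative_timeline` are illustrative assumptions rather than anything taken from the cited study, and the intervals are the point estimates quoted in the text, without their uncertainty.

```python
# Mean interval estimates (years) as quoted in the text above.
# The stage labels are illustrative, not the cited authors' terminology.
STAGES = [
    ("initiation to parental clone", 11.7),
    ("parental clone to metastatic subclones", 6.8),
    ("metastatic subclones to death", 2.7),
]


def cumulative_timeline(stages):
    """Return (label, duration, cumulative years) for each stage in order."""
    total = 0.0
    rows = []
    for label, years in stages:
        total += years
        rows.append((label, years, round(total, 1)))
    return rows


timeline = cumulative_timeline(STAGES)
premetastatic_years = timeline[1][2]  # time elapsed before metastatic subclones
total_years = timeline[-1][2]         # initiation to death

for label, years, elapsed in timeline:
    print(f"{label}: {years} years (cumulative {elapsed})")
```

Under these point estimates, about 18.5 of the roughly 21.2 years precede the emergence of subclones with metastatic capacity.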
1.3 Risk factors
The list of putative risk factors for pancreatic cancer is long, with wide variability in degree of risk
conferred and strength of evidence for the association. Age is strongly correlated with increased risk of
pancreatic cancer, with the median age for diagnosis at 72 years and more than two-thirds of cases
occurring after age 65.2 Race is also a factor, with African-Americans having substantially higher rates of
pancreatic cancer than white, Asian, or Hispanic Americans.2 Perhaps the strongest association of a risk
factor exists for tobacco use, as numerous studies have demonstrated that smoking can double lifetime
risk and the estimated population attributable risk is 25%.39 Other risk factors with low-to-moderate
contribution to pancreatic cancer include alcohol consumption40, obesity40, occupational exposure to
certain chemicals41, long-standing diabetes mellitus42, and Helicobacter pylori infection43. However, only
smoking has been consistently associated with pancreatic cancer. Chronic pancreatitis is associated with
up to 13-fold increased risk of pancreatic cancer, and even higher risk in patients with hereditary
pancreatitis, caused by genetic mutations (e.g. PRSS1, SPINK1).44 Possible protective factors include
allergies45, Vitamin D intake46 (although this is contentious47), and consumption of citrus fruit48 and
“Mediterranean diet”49.
The role of germline genetic factors predisposing to pancreatic cancer is a subject of numerous studies
and ongoing collaborations. Polymorphisms in the following genes have been associated with increased
or decreased risk of sporadic pancreatic cancer: GCKR (odds ratio (OR) = 2.14)50, IGF1 and IGF1R (OR
= 0.6-0.7)51, IGFBP1 (OR = 1.46)51, SSTR5 (OR = 1.62)52, [MGMT (OR = 0.6), PMS2 (OR = 1.44),
PMS2L3 (OR = 5.54)]53, HNF1A (OR = 1.16-1.22)54, SDF1 (OR = 2.74)55, [FTO (OR = 1.12), MTNR1B
(OR = 1.11), MADD (OR = 1.14)]56, ALDH2 (OR = 1.37)57, HK2 (OR= 0.68 in diabetic/3.69 in non-
diabetic)58, [PPARG (OR = 0.21), NR5A2 (OR = 0.57-0.77), ADIPOQ (OR = 0.67), GGT1 (OR = 1.86)]59,
CASP9 (OR = 4.09-16.26)60, CAPN10 (OR = 1.57)61, p21 (OR = 1.70)62, CYP1B1 (OR = 0.67)63, CFTR
(OR = 1.4; OR = 1.83 if diagnosed under age 60)64, GSTP1 (OR = 3.09 if diagnosed under age 50)65,
CYP17A1 (OR = 0.63-0.77)66, PPARG in conjunction with high-dose Vitamin A (OR = 2.80)67, PTGS2
(OR = 1.34-1.63)68, MMS19L (OR = 0.7/1.34)69, IL1beta (OR = 2.0 for unresectable cancer)70, [LIG3 (OR
= 0.23), ATM (OR = 2.55)]71, IGF2 (OR = 0.07)72, [MTHFR (OR = 4.50), MTR (OR = 2.65), MTRR (OR
= 3.35) in heavy drinkers]73, MTRR (OR = 1.44-1.52)74, [FasL (OR = 0.35-0.73), CASP8 (OR = 0.56-
0.65)]75, NAT2 (slow-type, OR = 5.7)76, XRCC2 in smokers (OR = 2.32)77, ERCC2 in smokers (OR =
0.46)78, [MTHFR (OR = 2.6-5.12), TYMS (OR = 2.19)]79, NAT1-rapid type (OR = 1.5)80, RNASEL (OR =
2.12-3.5)81, UGT1A7 (OR = 1.98-4.7)82, XRCC1 in smokers (OR = 7.0 in women/OR = 2.4 in men)83.
Pathways affected by those genes include diabetes mellitus type II and glucose metabolism, insulin-like
growth factors, somatostatin, DNA repair, tumor growth, alcohol metabolism, obesity, glutathione
metabolism, cytochrome P450, cystic fibrosis transmembrane conductance regulator, fatty acid storage,
cyclooxygenase-2, nucleotide excision repair, inflammation, folate metabolism, cell cycle and cell death,
and detoxification. Many of the aforementioned studies suggest gene-environment interactions.
To date, four genome-wide association studies (GWAS) of pancreatic cancer have been published: two
related GWAS were conducted on subjects drawn from 12 cohort studies and 9 case-control studies
(mostly of European ancestry)84-85, a study performed in a Japanese population86, and, most recently, a
study in a Chinese population.87 While SNPs in several loci were observed to be associated at
sufficiently low p-values to suggest statistical significance (7q36-SHH, 15q14-gene desert)84, (13q22.1-
near KLF5 and KLF12, 1q32.1-NR5A2, 5p15.33-CLPTM1L-TERT)85, (6p25.3-FOXQ1, 12p11.21-BICD1,
7q36.2-DPP6)86, (21q21.3 – BACH1, 5p13.1 – DAB2, 10q26.11 – near PRLHR, 21q22.3 – near TFF1,
22q13.32 – near FAM19A5)87, to date only one association has been successfully replicated in additional
studies: the ABO blood group locus at 9q34. In the GWAS by Amundadottir et al.84, the ABO locus was
identified as a potential associated locus in the initial phase of the study and confirmed in a replication
case-control set (odds ratio (OR) per non-O allele = 1.20). This association of non-O blood group with
pancreatic cancer risk was further replicated in other case-control studies (OR 1.33-2.4288, OR 1.3789, OR
1.4390, protective O-blood type OR 0.5391). Furthermore, Wolpin et al.92 reported a higher risk of
pancreatic cancer for carriers of the A(1) variant of the A-allele, which has a higher glycosyltransferase
activity than the A(2) allele (OR 1.38). In addition, Risch et al.89 observed increased risk of pancreatic
cancer in non-O blood group subjects who are seropositive for H. pylori but negative for its virulence
protein CagA (OR 2.78). Analyses in non-Caucasian populations found similar risk effects of the non-O
alleles (OR 1.37-1.3993; OR 1.67-3.2894). Wang et al.95 also found evidence for an additive effect of A
blood type with Hepatitis B infection. It should be noted that the association of non-O blood type with
pancreatic cancer predates these GWAS; one of the earliest reports suggesting an association was in The
British Medical Journal in 1960.96 How blood type mediates pancreatic cancer risk and tumorigenesis is
unknown97, but it appears that approximately 20% of pancreatic cancers in European populations are
attributable to having a non-O blood type status.88
Higher-penetrant genes may also predispose to pancreatic cancer, as shown by the co-occurrence of
pancreatic cancer with several known cancer syndromes. The highest-known risk is associated with
Peutz-Jeghers syndrome (PJS), caused by germline mutations of STK11. This autosomal dominant
syndrome is associated with melanocytic macules on the lips and buccal mucosa, gastrointestinal
hamartomas, and cancer. The lifetime risk of pancreatic cancer in PJS patients is up to 132-fold relative
to the general population, or about 66% by age 70.98,99 Another condition associated with up to 80-fold
higher risk of pancreatic cancer is hereditary pancreatitis, most commonly caused by mutations in PRSS1
in an autosomal dominant fashion (although SPINK1 mutations have also been implicated).100-101 Familial
atypical multiple mole melanoma (FAMMM) is an autosomal dominant syndrome characterized by
multiple nevi and increased risk of cancers, predominantly melanoma and pancreatic adenocarcinoma.
The primary genetic cause of FAMMM is mutations in CDKN2A/p16, and carriers (particularly of the
p16-Leiden founder) have up to 47-fold increased risk of developing pancreatic cancer.102 Some genes
that cause hereditary breast and ovarian cancer also raise risk of pancreatic cancer. To date, the gene
contributing to the largest proportion of hereditary pancreatic cancer is BRCA2, which is estimated to
raise lifetime risk of pancreatic cancer by 3.5- to 10-fold and accounts for up to 19% of high-risk
families103-107 (although the contribution of BRCA2 may be population dependent, as it appears to be
significantly lower in German, Korean, and Spanish populations108-111). Although most BRCA2 families
with pancreatic cancer also cluster breast and/or ovarian cancer, some families are characterized by
exclusive presence of pancreatic cancer112, and even apparently sporadic cases have been demonstrated to
carry deleterious germline BRCA2 mutations.113 Interestingly, while the BRCA2 locus was first proposed
to contain a cancer-associated gene via linkage to familial breast cancer,114 the localization of the gene
itself and suggestion of its tumor-suppressor role was facilitated by discovery of a homozygous deletion
at 13q12 in a pancreatic adenocarcinoma.115-116 Germline mutations of other Fanconi-anemia pathway
genes have been reported in pancreatic cancer families but the magnitude of risk associated with these
genes is unclear: PALB2 in ~0.9-4% of families117-120, BRCA1 in 2.6-4.4% of families121-122 (although
Axilbund et al. failed to find mutations in a series of 66 familial pancreatic cancer patients123), ATM in
2.4% of families124, and mutations in FANCC and FANCG have been reported in young-onset pancreatic
cancer subjects125 although these genes do not appear to contribute significantly to familial pancreatic
cancer.126-128
Several other syndromes associated with risk of pancreatic cancer include Lynch syndrome (caused by
mutations of the mismatch repair genes MLH1, MSH2, MSH6, PMS2 or TACSTD1-3’ deletion),129-132 Li-
Fraumeni syndrome (caused by mutations of TP53)133, Familial Adenomatous Polyposis (caused by
mutations of APC)134, and cystic fibrosis (caused by mutations of CFTR)135.
However, the contribution of known genetic syndromes to the overall heritability of pancreatic cancer is
limited; approximately 10% of all pancreatic cancer cases appear to be familial or hereditary and most do
not have a known genetic explanation.136 Perhaps the earliest indications that a familial pancreatic cancer
syndrome exists were several case reports and case series in the 1970s and 1980s describing clusters of
pancreatic cancer in first- and second-degree blood relatives.137-143 Subsequently, both retrospective
case-control and prospective cohort studies suggested increased risk of pancreatic cancer in close relatives
of patients compared to the general population (Table 1).
Table 1 - Studies estimating risk of pancreatic adenocarcinoma in relatives of affected patients
Paper | Type of study | Description | Risk of pancreatic cancer in relatives of patients
Ghadirian et al.144 | Case-control | 179 cases vs. 179 controls (French Canadian) | OR in subjects with positive family history = 13 (p<0.001)
Fernandez et al.145 | Case-control | 362 cases vs. 1408 controls (Italian) | OR in FDR of affected cases = 3.0 (95% CI 1.4-6.6)
Silverman et al.146 | Case-control | 484 cases vs. 2099 controls (US) | OR in FDR of affected cases = 3.2 (95% CI 1.8-5.6)
Schenk et al.147 | Case-control | 247 cases vs. 420 controls (US) | OR in FDR of affected cases = 2.49 (95% CI 1.32-4.69)
Ghadirian et al.148 | Case-control | 174 cases vs. 136 controls (Canada) | OR in FDR of affected cases = 5.0 (p=0.01)
Inoue et al.149 | Case-control | 200 cases vs. 2000 controls (Japan) | OR in subjects with positive family history = 2.09 (95% CI 1.01-4.33)
Rulyak et al.150 | Nested case-control | 251 members of 28 families (US) | OR with each affected FDR = 1.8 (95% CI 1.1-2.7)
Cote et al.151 | Case-control | 247 cases vs. 420 controls (US) | OR in subjects with positive family history = 2.49 (95% CI 1.32-4.69)
Hassan et al.152 | Case-control | 808 cases vs. 808 controls (US) | OR in FDR of affected cases = 3.3 (95% CI 1.8-6.1); OR in SDR of affected cases = 2.9 (95% CI 1.3-6.3)
Jacobs et al.153 | Case-control | 1,183 cases vs. 1,205 controls (US, Europe, China) | OR in FDR of affected cases = 1.76 (95% CI 1.19-2.61)
Matsubayashi et al.154 | Case-control | 577 cases vs. 577 controls (Japan) | OR in FDR of affected cases = 2.5 (p=0.02)
Coughlin et al.155 | Cohort | 1.1 million (US) | RR for PC mortality in FDR of affected cases (males) = 1.5 (95% CI 1.1-2.1); (females) = 1.7 (95% CI 1.3-2.3)
Tersmette et al.156 | Cohort | Prospectively followed 150 FPC kindreds and 191 SPC kindreds from NFPTR | SIR in FPC relatives if 2 or more affecteds = 18.3 (95% CI 4.74-44.5); SIR in FPC relatives if 3 or more affecteds = 56.6 (12.4-175) [no significant elevated risk in SPC relatives – SIR in FDRs = 6.5 (0.78-23.3)]
Hemminki et al.157 | Cohort | 10.2 million Swedish (21,000 PC cases) | SIR for children of affected cases = 1.73 (95% CI 1.13-2.54)
Klein et al.158 | Cohort | Prospectively followed 370 FPC kindreds and 468 SPC kindreds from NFPTR | SIR in FDRs of FPC affecteds = 9.0 (4.5-16.1); if 1 FDR affected, SIR = 4.5 (95% CI 0.54-16.3); if 2 FDRs affected, SIR = 6.4 (95% CI 1.8-16.4); if 3 or more FDRs affected, SIR = 32 (95% CI 10.4-74.7) [no significant elevated risk in FDRs of SPC affecteds, SIR = 1.8 (95% CI 0.2-6.42), or spouses/unrelated relatives, SIR = 2.4 (95% CI 0.06-13.5)]
Jacob et al.159 | Cohort | 1.1 million (US) | RR for PC mortality in FDR of affected cases = 1.66 (95% CI 1.43-1.94)
Brune et al.160 | Cohort | Prospectively followed 1,718 kindreds from NFPTR | SIR in FDR of FPC affected = 6.79 (95% CI 4.59-9.75); if 1 FDR affected, SIR = 6.86 (95% CI 3.75-11.04); if 2 FDRs affected, SIR = 3.97 (95% CI 1.59-8.2); if 3 or more FDRs affected, SIR = 17.02 (95% CI 7.34-33.5). Young-onset (< 50 years) in FDR associated with SIR = 9.31 (95% CI 3.42-20.28); late-onset (> 50 years) in FDR associated with SIR = 6.34 (95% CI 4.02-9.51)
OR = odds ratio; 95% CI= 95% confidence interval; FDR= first-degree relative; SDR = second-degree relative; PC = pancreatic cancer; SIR = standardized incidence ratio; RR = relative risk; FPC = familial pancreatic cancer (at least 1 pair of affected FDRs); SPC = sporadic pancreatic cancer (no affected FDR pairs); NFPTR = National Familial Pancreas Tumor Registry at Johns Hopkins University (http://pathology.jhu.edu/pc/nfptr/index.php)
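To make the measures reported in Table 1 concrete, the sketch below shows how an odds ratio with an approximate 95% confidence interval, and a standardized incidence ratio, can be computed from raw counts. It is a minimal illustration in Python, not the method of any cited study: the counts are hypothetical, the function names are assumptions, and the interval uses the simple log-odds (Woolf) approximation, whereas the published estimates may come from adjusted models.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio from a 2x2 table with a Woolf (log-scale) confidence interval.

    "Exposed" here would mean having an affected first-degree relative.
    All four counts must be non-zero for this approximation to be defined.
    """
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                          + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def standardized_incidence_ratio(observed, expected):
    """SIR: observed cases divided by the number expected from reference rates."""
    return observed / expected


# Hypothetical counts, for illustration only (not from any study in Table 1).
or_, ci_low, ci_high = odds_ratio_ci(30, 170, 12, 188)
print(f"OR = {or_:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
print(f"SIR = {standardized_incidence_ratio(9, 1.0):.1f}")
```

An interval whose lower bound lies above 1 corresponds to the "significantly elevated" entries in the table, while intervals spanning 1 correspond to the entries described as showing no significant elevation.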
Segregation analysis of 287 families with an index case of pancreatic cancer recruited by Johns Hopkins
Medical Institutions supports the hypothesis that a major gene is involved in pancreatic cancer risk, with
the most likely model including the autosomal dominant inheritance of a rare allele.161 The degree of risk
is linked to the number of affected relatives, the degree of relation, as well as the age of onset of disease
in relatives. Three large cohort studies following kindreds recruited by the National Familial Pancreas
Tumor Registry (NFPTR) at Johns Hopkins Medical Institutions found that, in families with at least one pair of
affected first-degree relatives (FDRs), the standardized incidence ratio in FDRs of affected patients was 4.5-6.79 if only
one FDR is affected, 3.97-18.3 if two FDRs are affected, and 17.02-56.6 if three or more FDRs are
affected.156,158,160 Moreover, the younger the age of onset of cancer in the affected relative, the higher the
risk in first-degree relatives (hazard ratio (HR) 1.55 per decreased year of onset).160
It is not clear whether the average age of onset of pancreatic cancer is significantly lower in FPC, as many
studies found no difference in age of onset of disease between FPC and sporadic cases143,144,156,162,163 and
even the few studies that identified a difference found it to be rather small (65-68 yrs in FPC vs. 70 yrs in
SEER database).160,164,165 However, there is evidence for genetic anticipation in FPC families, with
members of each successive generation developing cancer on average 6-15 years younger than the
previous generation.166-169 There is strong evidence for gene-environment interaction in FPC,
particularly with respect to tobacco use; FPC kindred smokers developed pancreatic cancer a decade
earlier than non-smokers168 and the relative risk of developing cancer is approximately 19-fold that of the
average population in smokers from FPC families.158
In some cancer syndromes, there is a significant difference in survival between familial and sporadic
cases (e.g. colorectal cancer), but it is not clear that there is such a difference in FPC. Several studies
have found no difference in survival between sporadic and familial pancreatic cancer.143,164,170,171 Ji et
al.172 found that familial cases had worse outcome than sporadic cases (HR=1.37) in a Swedish Family
Cancer database, while Yeo et al.173 identified significantly worse survival in unresected FPC cases
compared to unresected sporadic cases but no significant difference for resected cases. Interestingly,
recent anecdotal reports and small series of FPC patients with mutations in BRCA-related genes who were
treated with platinum-based chemotherapy, topoisomerase inhibitors, or poly-ADP-ribose-polymerase
(PARP1)-inhibitors suggest that this subset of familial cases may have good chemotherapy responses and
improved survival compared to sporadic cases.174-178
Aside from the difference in inactivation of BRCA-related pathway between familial and sporadic cases
(up to a fifth of FPC tumors vs. less than 10% in sporadic cases), there has been limited investigation into
molecular genetic and pathologic differences between familial and sporadic pancreatic cancers. Pancreata
from FPC subjects appear to have increased prevalence of precursor lesions (PanINs and IPMNs)
compared to sporadic pancreatic cancer.179,180 Studies analyzing the rate and genome-wide distribution of
loss-of-heterozygosity (LOH) have shown conflicting results: Abe et al.181 identified LOH at
approximately 50% of informative markers in 20 FPC tumors while a similar study in 82 sporadic tumors
found the average LOH rate to be 25%182, but a third study that used a SNP array to identify LOH in 26
pancreatic cancer cell lines found a rate of LOH similar to that in familial tumors (average 43%).183
Differences in LOH rates aside, the pattern of LOH across the genome appeared similar across all three
studies. Brune et al.184 analyzed familial tumors for KRAS mutations, TP53 and SMAD4 expression, and
methylation rate of seven genes previously shown to be hypermethylated in sporadic tumors, and found
no significant difference between familial and sporadic tumors.
Given all the evidence supporting the existence of at least one major gene explaining the heritability of
pancreatic cancer in high-risk families, much effort has been directed at attempting to identify the
responsible gene, including through genetic linkage analysis. Linkage analysis is a statistical tool which uses family-based
data and the likelihood of recombination between loci on a chromosomal arm to identify genomic regions
that appear to be transmitted to affected members of the family more frequently than by chance alone.
Since linkage analysis was successful in mapping the location of and facilitating the identification of
highly-penetrant genes in many cancer syndromes (e.g. APC in Familial Adenomatous Polyposis185;
BRCA1 and BRCA2 in Hereditary Breast and Ovarian Cancer syndrome114,186), this technique has been
applied to the study of FPC. Familial registries fostered the collection of high-risk families, and a large
North American consortium has pooled the resources of six major sites: the Pancreatic Cancer Genetic
Epidemiology Consortium (PACGENE).165 This National Institutes of Health (NIH)-funded collaboration
includes the University of Toronto, Mayo Clinic, Johns Hopkins University, MD Anderson Cancer
Center, Dana-Farber Cancer Institute, and Karmanos Cancer Institute. Each site prospectively identifies
pancreatic cancer patients with a family history of at least two affected members. If a pedigree is deemed
suitable for linkage analysis (with the help of linkage simulation programs), probands are asked to
consent to contact their relatives for recruitment to the study. Consenting individuals complete
questionnaires about clinical and family history and provide blood samples for DNA extraction.
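The evidence produced by a linkage analysis of this kind is usually summarized as a LOD (logarithm of odds) score, which compares the likelihood of the observed co-segregation at a given recombination fraction with the likelihood under free recombination. The sketch below illustrates the idea in Python for the simplest textbook case only: phase-known, fully informative meioses in which recombinants can be counted directly. The function names and the example counts are hypothetical; real pedigree analyses, including those cited in this chapter, rely on dedicated linkage software that handles unknown phase, reduced penetrance, and missing genotypes.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score at recombination fraction theta for phase-known meioses.

    Compares the likelihood of the observed recombinant count at theta with
    the likelihood under free recombination (theta = 0.5).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(recombinants: int, meioses: int):
    """Grid search over theta = 0.01, 0.02, ..., 0.50; returns (theta, LOD)."""
    thetas = [i / 100 for i in range(1, 51)]
    return max(((t, lod_score(recombinants, meioses, t)) for t in thetas),
               key=lambda pair: pair[1])


# Hypothetical data: 1 recombinant among 20 informative meioses.
best_theta, best_lod = max_lod(1, 20)
print(f"max LOD = {best_lod:.2f} at theta = {best_theta:.2f}")
```

A LOD score of 3 or more is the conventional threshold for declaring significant linkage. The sketch also shows why families with few affected members are weakly informative: every term in the score scales with the number of meioses available.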
Linkage efforts in FPC have yielded limited results. The linkage work by PACGENE is ongoing, but to
date no highly significant loci have emerged. Investigators at the University of Washington (not
connected to PACGENE) published results of a linkage analysis conducted in a single FPC family
(identified as “Family X”) characterized by four generations of affected members with an autosomal
dominant pattern of inheritance suggesting high penetrance, young age of onset (median age 43), and
concomitant endocrine and/or exocrine pancreatic insufficiency.187 Based on a genome-wide screen using
373 microsatellite markers, significant linkage with LOD (logarithm of odds) scores 4.56-5.36 was
identified on chromosome 4q32-34. Although other centres failed to find a significant association at this
locus in European188 or North American189 FPC kindreds, the University of Washington group
subsequently claimed to have pinpointed PALLD, coding for palladin, a cytoskeleton scaffold protein.190
They demonstrated a variant (P239S) that segregated only with the affected members of the family linked
to 4q32-34, and they further presented evidence of PALLD overexpression in premalignant and cancerous
pancreatic tissue. However, significant doubt has been cast on the likelihood that PALLD is the
responsible gene for FPC, or at least that it is a significant cause of this cancer syndrome. Due to the
large number of candidate genes in the 4q32-34 locus, Pogue-Geile et al.187 were unable to screen all
candidates for mutations in Family X. Rather, they used a custom expression microarray to analyze RNA
extracted from whole tissue PanIN in one of the affected members of Family X and in another 10 sporadic
pancreatic cancers. PALLD appeared to have the highest expression, and it was based on this finding that
this gene was sequenced in Family X. However, Salaria et al.191 used immunohistochemistry of 177
pancreatic adenocarcinomas to show that palladin overexpression was primarily localized to non-
neoplastic stroma, with 96.6% of tumors demonstrating overexpression in the stroma and only 12.4%
demonstrating overexpression in neoplastic cells. Furthermore, three studies of Canadian, US, and
European families found no deleterious PALLD mutations in any other FPC families. Zogopoulos et al.192
genotyped the P239S variant in 51 familial cases, 33 early-onset cases, and 555 controls and found it in only
one familial case diagnosed at age 74 (they did not have DNA available for the other family members)
and in one 91-year-old unaffected control. Slater et al.193 sequenced the locus containing the variant in 74
FPC families and found no mutations. Finally, Klein et al.194 performed sequencing on 92% of the coding
region of the entire PALLD gene in 48 FPC cases and found no deleterious mutations.
Since the PACGENE linkage study has not yet been completed, it is not known if any other loci will be
reliably linked to FPC. Some of the challenges associated with applying linkage analysis to FPC are: (1)
small number of affected individuals per family and rapid mortality, precluding recruitment and limiting
the number of meioses available to perform the analysis; (2) penetrance of the FPC gene(s) is likely lower
than in previously mapped hereditary cancer syndromes, reducing the power of linkage analysis; (3) there
is increasing evidence for locus heterogeneity in the etiology of FPC. To date, only BRCA2 has been
shown to account for a substantial portion of familial cases, while all other identified genes appear to be
responsible for fewer than 5% of cases each. Locus heterogeneity is a significant confounder of linkage
analysis, and the lack of distinguishing phenotypic or pedigree characteristics among families makes it
very difficult to confidently separate cases that are likely caused by different genes; (4) reduction of
power in linkage analysis due to phenocopies. Given all these challenges, it is evident that other
techniques are needed in the effort to identify germline genetic alterations that predispose to FPC.
2. Copy Number Variation
2.1 Copy Number Variation – a novel paradigm
Our understanding of the nature and degree of variation in the human genome has accelerated in the past
few years. Until recently, single nucleotide polymorphisms (SNPs) appeared to be the most frequent and
important source of genomic variation in humans. Significant efforts have been directed at identifying
and genotyping SNPs in different populations, and numerous disease association and linkage studies have
been conducted using SNPs as genomic markers. Yet, the development of higher-resolution genomic
scanning technologies has highlighted a previously under-recognized but clearly significant
submicroscopic structural variation in the human genome. Structural variants encompass copy-number
variants (CNVs) (defined as genomic segments which are present in variable copy numbers when
comparing two or more genomes) as well as inversions, novel sequence or mobile element insertions, and
translocations.195 The original definition of CNVs used 1,000 base pairs as a lower-limit size threshold, to
differentiate from smaller “insertions/deletions”. However, more recently the spectrum of CNVs has
been expanded to include any variants larger than 50bp, reflecting the identification of smaller variants
using sequencing technologies.195
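The practical effect of the two size thresholds described above can be illustrated with a short sketch. The Python below is only an illustration of the definitions as stated in the text: the function name `is_cnv` and the constant names are assumptions, and the treatment of the boundaries (at least 1,000 bp under the original definition, strictly larger than 50 bp under the revised one) follows the wording of the paragraph rather than any formal specification.

```python
ORIGINAL_MIN_BP = 1000  # lower-limit size threshold in the original definition
REVISED_MIN_BP = 50     # revised definition: variants larger than 50 bp


def is_cnv(length_bp: int, definition: str = "revised") -> bool:
    """Return True if a copy-number change of this length counts as a CNV.

    Shorter variants would fall into the smaller "insertions/deletions" class.
    """
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    if definition == "original":
        return length_bp >= ORIGINAL_MIN_BP
    if definition == "revised":
        return length_bp > REVISED_MIN_BP
    raise ValueError("definition must be 'original' or 'revised'")


# A hypothetical 750 bp deletion is an indel under the original definition
# but a CNV under the revised one.
for name in ("original", "revised"):
    print(name, is_cnv(750, name))
```

This difference matters when comparing CNV counts across studies, since sequencing-based surveys that report variants well below 1 kb will count many events that array-based surveys using the original threshold would not.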
Although CNVs at certain loci had long been recognized as polymorphisms in normal individuals (e.g.
alpha-globin gene family; Rhesus blood group) as well as the cause of genomic disorders (e.g. Charcot-
Marie-Tooth neuropathy type IA; Williams-Beuren syndrome; Potocki-Lupski syndrome),196 the
ubiquitous presence of CNVs in normal human genomes first became apparent with the publication of
two genome-wide studies in 2004.197-198 Since that time, more CNV-detection surveys, with continually
improving genomic coverage and resolution, have reported thousands of CNVs affecting all human
chromosomes in apparently normal individuals.199-249 (See Table 2) While the number of known SNPs
(~11 million) exceeds that of CNVs, the proportion of genomic sequence that is different between any
two genomes due to indels/CNVs is approximately 12-fold that of SNPs (1.2% vs. 0.1%).238
Table 2 - Summary of published studies reporting germline genomic copy-number variation in non-disease samples
Study (Year Published)
Population Primary CNV detection method
Reference genome
Source of DNA
Number of CNVs
Size of reported CNVs
Proportion of CNVs detected in > 1 sample
Number of CNVs confirmed within same study
CNV confirmation methods
Sebat et al. (2004)197
20 ethnically diverse individuals
aCGH: ROMA (85,000 probes, 35kb apart; Bgl II restricti-on enzyme)
12 samples (mostly from a single male sample); single ref per hybridizati-on experiment
Blood, sperm, cell lines
76 Average = 465kb
41% 11/12 FISH, hybridization to HIND III ROMA platform
Iafrate et al. (2004)198
55 ethnically diverse individuals (39 unrelated healthy controls + 16 individuals with known chromoso-mal imbalances)
aCGH: BAC array (2632 clones, 1Mb apart)
Pooled male or female normal samples
Whole blood + cell lines
255 Average = 150kb
40% 19/19 qPCR, FISH
Sharp et al. (2005)199
47 ethnically diverse individuals
aCGH: BAC array (2194 clones, targeting 130 segment-al duplicat-ion regions)
Single male sample
Cell lines
160 (represe-nt 119 regions if merge BACs <250kb apart)
Average BAC insert size = 164kb, some CNVs involve > 1 clone
55% 7/11 FISH
Tuzun et al. (2005)200
Single female NA15510 (fosmid library)
In-silico Fosmid end sequence pair mapping
NCBI reference human genome Build 35 (hg17)
n/a 297 Median = 15.7 kb (8-329kb)
n/a 16/57 33/40 7/11
BAC array (comparing 97 genomes) Sequencing of fosmid inserts PCR
Conrad et al. (2006)201
30 YRI trios + 30 CEU trios (HapMap)
In-silico: Assessm-ent of Mendeli-an inconsis-tencies in trios
n/a n/a 586 (396 in YRI; 228 in CEU)
YRI median = 8.5kb (0.5-1200kb) CEU median = 10.6 kb (0.3-404kb)
61% 92/105 qPCR, hybridization to custom high-density oligo array
McCarroll et al. (2006)202
269 HapMap individuals (4 ethnic groups)
In-silico: Analysis of Mendeli-an
n/a n/a 541 Median = 7 kb (1-745kb)
51% 90/541 FISH, allele-specific fluorescence measure, PCR, qPCR
14
transmis-sion errors, HW disequili-brium, null genotyp-es
Hinds et al. (2006)203
24 ethnically diverse individuals (Discovery panel)
aCGH: High-density oligo custom array
NCBI reference human genome (build not indicated)
Cell lines
215 Median = 0.75kb (70bp – 10kb)
67% 100/215 PCR
Locke et al. (2006)204
269 HapMap individuals
aCGH: BAC array (2007 clones, targeting 130 segment-al duplicat-ion regions)
Well-characteriz-ed single male sample (GM15724)
Cell lines
384 (in 222 regions, if merge BACs < 250kb apart)
Average = 436kb (145kb-1.4Mb)
67% 136/207 Custom high-density oligo array
Mills et al. (2006)205
36 individuals (different ethnic groups)
In-silico: Computa-tional alignme-nt of DNA reseque-ncing traces from SNP studies to reference genome
NCBI reference human genome Build 35 (hg17)
n/a 294,498 2bp-9989bp
183/189 PCR, sequencing
Redon et al. (2006)206
270 HapMap individuals (4 ethnic groups)
aCGH: Whole Genome Tiling Path array (26,574 BACs) + SNP array intensity comparison: 500K SNP platform
Single male reference (NA10851) for aCGH; pairwise comparison between all samples for 500K
Cell lines
1447 merged CNVRs (913 on WGTP platform; 980 on 500K platform)
Average = 341kb (WGTP) 206kb (500K SNP)
~50% 173/1447 43% of all CNVs
Locus-specific quantitative assay Replicated on both platforms
Simon-Sanchez et al. (2007)207
276 well-phenotyped Cauasians, from NINDS study
SNP array intensity comparison: 1)109,365 gene-centric SNP array
Reference genotyping clusters (used in Illumina-specific CNV-detection algorithms)
Cell lines
340 ~20kb – 3Mb (for non-heteros-omic CNVs)
5 13/24
qPCR replication of CNV detection in DNA from whole blood
15
2) 300K SNP array
Wong et al. (2007)208
95 samples (include healthy blood donors, cancer screening program participants, 16 distinct ethnic groups)
aCGH: BAC array (26,363 clones)
Single male reference
Whole blood, cell lines
3654 >40 kb 22% detected in >2 samples
265 Confirmed in 5 cases on oligo array
Levy et al. (2007)209
Single diploid genome of Craig Venter
In-silico: Random shotgun sequenc-ing, compari-son to NCBI reference genome aCGH: 244K oligo array; 385 oligo array; 2 different SNP array platforms
NCBI reference genome Build 36 for one-to-one mapping of insertions/ deletions Single male reference (NA10851) for aCGH and SNP array compariso-ns
Whole blood
919,584 indels (600 ≥ 1kb in size) + 62 CNVs
Indels = 1-82,711 bp (average 2.4-11.7bp) CNV (~8kb-2Mb)
n/a 37/40 indels
Comparison to fosmid clones from 8 other individuals
Korbel et al. (2007)210
2 previously analyzed female subjects: NA15510 (presumed European ancestry) and NA18505 (YRI)
In-silico: Paired-end sequence mapping (generat-ed by next-generati-on massive parallel sequenc-ing)
NCBI reference human genome Build 36
Cell lines
1175 total (422 in NA15510; 753 in NA18505)
Majority <10kb, but variants up to >1Mb detected
89% of 249 variants tested in individuals from 4 population
132/261 (NA15510) 328/616 (NA18505) 95 (NA15510) 97 (NA18505) 31/48 (NA15510)
PCR (+ sequencing breakpoints in a subset of amplicons) Also present in Celera assembly aCGH with oligo tiling arrays comparing NA15510 to NA18505
Pinto et al. (2007)211
506 controls of North German descent (PopGen study)
SNP array intensity comparison: 500K SNP array
Multiple references
Cell lines
1023 CNVRs (430 high-confiden-ce; i.e. detected by ≥ 2 algorith-ms)
Average size of “high-confiden-ce” CNVRs = 369kb
4% of CNVRs in >2% of population
217/1010 Overlap with CNVRs called in 269 HapMap samples analyzed with identical algorithms to PopGen
Wang et al. (2007)212
112 HapMap individuals (4 ethnic groups)
SNP array intensity comparison: 550K SNP array
Reference genotyping clusters (used in Illumina-specific CNV-detection algorithms)
Cell lines
2633 Average 31.5kb-61.2kb (depending on ethnic group)
52.6-74.8% of CNVs were also detected in parents
Assumes high heritability of CNVs, compares to CNVs called in parents
3 CNVs
PCR, re-sequencing of breakpoints
Zogopoulos et al. (2007)213
1190 controls from Ontario Familial Colorectal Cancer Registry (Canada); mostly Caucasian
SNP array intensity comparison: 100K and 500K arrays
Multiple references
Blood 578 CNVRs
Average = 408kb (12bp – 4.5Mb)
< 7% are detected in >1% of population
4 qPCR
deSmith et al. (2007)214
50 males (north French origin)
aCGH--2-stages: 1) 185K oligo genome-wide array (in 35 individuals) 2) custom high-density 244K array
Pooled references for 185K array; single female reference (NA15510) for 244K array
Blood 9244 multi-probe CNVs (1469 CNVRs) 6089 single-probe CNVs (4705 CNVRs)
Median 4.4kb
45% 90-95% of common CNVRs detected on 185K array 21
Replication on 244K array PCR, MLPA
Jakobsson et al. (2008)215
485 individuals, from 29 populations (Human Genome Diversity Project)
SNP array intensity comparison: Illumina Infinium Human HapMap 500 Beadchip
Reference genotyping clusters (used in Illumina-specific CNV-detection algorithms)
Cell lines
3552 (map to 1428 loci)
Average = 82.7kb (deletion) 130.4kb (duplication) (2kb-998kb)
Perry et al. (2008)216
30 HapMap individuals (4 populations)
aCGH: Custom oligo array (470,163 probes) targeting CNVs previously detected by Redon et al. (2006)
Single male reference (NA10851)
Cell lines
2664 (map to 1153 loci)
15-33% smaller CNVs than detected by Redon et al. (2006) in same sample
50% 23/51 Sequencing over breakpoints
Takahashi et al. (2008)217
80 healthy Japanese offspring of atomic bomb survivors
aCGH: 2238 BAC custom array
One male and one female Japanese
Cell lines
251 (mapping to 30 regions)
Average: 120kb (deletion) 160kb (duplication)
53% 14/14 rare CNV regions
qPCR, FISH, PGFE-Southern Blot, sequencing
Wheeler et al. (2008)218
Single diploid genome of James Watson
In-silico: Next-generation sequencing, comparison to NCBI reference human genome + aCGH: 244K oligo array + 2.1 million probe array (3 experiments with 2 different references)
(sequence mapping) NCBI reference human genome Build 36 (aCGH) a) standard Caucasian male ref and b) NA10851
Blood
163,608 indels (by sequence comparison) 23 CNVs (by aCGH)
(2bp-38,896bp) 26kb-1.6Mb
n/a
Excellent concordance in CNV calls when using same reference on different oligo arrays (data not shown)
aCGH experiments against NA10851 on 244k and 2.1 million probe arrays
McCarroll et al. (2008)219
270 HapMap
SNP microarray (Affy6.0)
270 HapMap
Cell lines
3048 CNVs (1320 CNVRs)
50% 27 loci qPCR
Cooper et al. (2008)220
9 HapMap SNP microarray (Illumina)
Reference genotyping cluster
Cell lines
368 64-67% Fosmid sequence alignment data
Kidd et al. (2008)221
8 HapMap samples (4 ethnic groups)
In-silico: Fosmid-end sequence pair mapping
NCBI reference human genome Build 35
Cell lines
7184 predicted non-redundant CNVs
>6kb 50% 1471 MCD analysis (multiple complete restriction enzyme digest); High-density oligo arrays and SNP arrays; Correlation to SNP genotyping data for 130 deletions; Full-length sequencing of fosmid clones
Bentley et al. (2008)222
Single YRI male (NA18507)
In-silico: Paired reads of massively parallel sequencing
NCBI reference human genome Build 36
Cell line
4116 n/a
Wang et al. (2008)223
Single Asian male (Han Chinese)
In-silico: paired-end reads of massively parallel sequencing
NCBI reference human genome Build 36
blood 2474 Median = 492 bp
n/a
Gusev et al. (2009)224
3000 individuals from Kosrae island (Micronesia)
In-silico: Uses novel algorithm to identify gaps in “identity-by-state” stretches of SNP genotypes
215 52 Used other computational methods and compared to previous reports
Itsara et al. (2009)225
2493 SNP microarrays (Illumina)
Cell lines; blood
13,843 (map to 3476 CNVRs)
77% Cross-platform comparison (to CGH array)
Shaikh et al. (2009)226
2026 (1320 Caucasian; 694 African-American; 12 Asian-American)
SNP microarray (Illumina HumanHap550)
Reference genotyping clusters (used in Illumina-specific CNV-detection algorithms)
Blood 54,462 (non-unique CNVs map to 3272 CNVRs)
Median = 8kb
77.8% 16/20 1753/2409 19/21
qPCR array-based comparison (affy vs illumina) comparison to previously published data of a HapMap samples (Kidd et al)
Kim et al. (2009)227
Single Korean male (AK1)
In-silico: paired-end reads of massively parallel sequencing and end-sequences of BAC clones aCGH: custom 24M microarray; SNP arrays
NCBI reference human genome Build 36 Reference for CGH arrays not identified
Blood, sperm
315 277bp-2Mb
n/a Sequence data complemented microarray data
Ahn et al. (2009)228
Single Korean male
In-silico: paired-end reads of massively parallel sequencing
NCBI reference human genome Build 36
Blood 2920 0.1-100Kb
n/a 2344 Detected in DGV (no direct confirmation)
Matsuzaki et al. (2009)229
90 HapMap YRI samples
aCGH: Custom oligonucleotide microarrays
Signal compared to normalized signal of all 90 samples
Cell lines
6578 Median = 4.9kb
3850 31/40 qPCR (also compared to findings of previous studies – 87-99.97% agreement)
McKernan et al. (2009)230
Single YRI male (NA18507)
In-silico: ABI SOLiD paired-end and split-reads (ligation-based sequencing assay)
NCBI reference human genome
Cell line
565 2-937kb n/a n/a n/a
McElroy et al. (2009)231
385 African Americans and 435 White Americans
SNP array (Affy 500K)
50 African Americans females (derived from blood)
Cell lines + Blood
1362 in African Americans + 1972 in White Americans (map to 412 African-American unique CNVRs; 580 White-unique CNVRs; 76 shared CNVRs)
Mean duplication = 827kb; mean deletion = 703kb
174 CNVRs
3 loci qPCR
Conrad et al. (2009)232
Discovery in 40 females (19 CEU + 20 YRI + 1 diversity panel); genotyping in 450 HapMap
Discovery: NimbleGen 42M arrays Genotyping: Custom Agilent 105k arrays; SNP array (Illumina Infinium Human660W)
Discovery: NA10851 Genotyping: pooled DNA of 10 European samples (9 males + 1 female)
Cell lines
11,700 Median = 2.7kb
49% 79/99 (qPCR) 15% FDR (microarray)
qPCR; other microarrays
Alkan et al. (2009)233
3 individuals
Read-depth of massively parallel sequencing reads
Reference human genome
Cell lines
725 97% of all variants
17/25 aCGH FISH
Lin et al. (2009)234
813 Taiwanese individuals
Illumina 550K Bead-Chip
Reference genotyping cluster
Blood 4452 (map to 1025 CNVRs)
Mean = 497kb
365 CNVRs
279/365 CNVRs
Identified on Affy 500K array
Li et al. (2009)235
1000 Caucasians and 700 Han Chinese
SNP array (Affymetrix 500K)
Half the samples were used as references for the other half and vice-versa
Blood 2381 Median = 195kb
27.6% 680/985 overlap DGV
Compared to DGV No experimental validation
Altshuler et al. (2010)236
1184 (HapMap3-11 populations)
SNP arrays (Affymetrix 6.0 and Illumina 1M arrays)
Reference genotyping clusters
Cell lines
856 Median = 7.2 kb
All CNPs detected in ≥ 1% of population
n/a FDR of algorithms determined by comparing to CGH data for 34 individuals
Ju et al. (2010)237
Single Caucasian male (HapMap NA10851)
Data from previous aCGH studies that used NA10851 as reference + read-depth of NA10851 massively parallel sequencing
73 individuals (from Conrad et al, 2010 and Park et al. 2010)
Cell line
1309 Median = 2.7kb
n/a n/a n/a
Pang et al. (2010)238
Single diploid genome of Craig Venter
In silico: de novo assembly comparison; paired-end reads; split-reads aCGH: Agilent 24M + NimbleGen 42M arrays SNP arrays: Affymetrix 6.0 + Illumina 1M
NA15510 for Agilent 24M and NimbleGen 42M arrays
Whole blood
808,179 insertions or deletions (2641 ≥ 1kb)
(1-1.7Mb)
n/a 89/96 SVs identified by sequence analysis 20/25 CNVs identified by microarrays 11,140 SVs in common to this study and Levy et al
Compared to SVs called in previous analysis of same genome (Levy et al) PCR/qPCR
Park et al. (2010)239
30 females (10 Korean; 10 HapMap Chinese; 10 HapMap Japanese)
aCGH: 24M custom Agilent arrays
Single male reference (NA10851)
Cell lines
20,099 (map to 5177 loci)
Median = 2.7kb (438bp-1.1Mb)
39% 106/116 loci
qPCR
Teague et al. (2010)240
NA15510, NA10860, NA18994
Optical Mapping (single-molecule restriction mapping)
NCBI reference human genome Build 35
Cell lines
5416 3kb-megabases
>1/3 all variants
42-61% (depends on platform being compared against)
Compared to fosmid-end sequencing, paired-end sequencing, SNP array (Affy6.0), tiling array CGH
Kidd et al. (2010)241
9 HapMap individuals
Identifying fosmid-end clones that did not map to reference genome
NCBI reference human genome Build 35
Cell lines
2363 novel insertion sites (correspond to 720 loci)
Median = 1kb (1-20kb)
192 loci Sequencing, genotyping
Kidd et al. (2010)242
17 individuals
Capillary end sequencing of fosmid clones
NCBI reference human genome Build 35
Cell lines
973 n/a n/a n/a n/a
Schuster et al. (2010)243
5 individuals
Read depth aCGH
NCBI reference human genome
Blood 187 n/a n/a n/a n/a
Yim et al. (2010)244
3578 Korean individuals
SNP array (Affy5.0)
NA10851 + pooled 100 Korean females
Blood 144207 (map to 4003 CNVRs)
Median 18.9kb
656 CNVRs in ≥ 1% of samples
14/16 loci qPCR
Gayan et al. (2010)245
801 Spanish individuals
SNP array (Affymetrix 250 NspI array)
25 female samples from other studies
Blood 11,743 Median 150.7kb
623 CNVs present in >2 individuals
519 CNVs previously described
Comparison to DGV (no experimental validation)
The 1000 Genome Project Consortium (2010)246; Mills et al. (2011)247
Three pilots: (1) 3 trios from 2 families – deep sequencing (avg 42x) (2) 179 unrelated – low depth (2-6x) (3) deep sequencing of exons of 1000 genes in 697 individuals (avg >50x)
Paired-end mapping, read-depth analysis, split-read analysis, and sequence assembly of massively parallel sequencing
NCBI reference human genome
Cell lines
14,327 50bp - ~1Mb
<10% FDR PCR aCGH
Chen et al. (2011)248
2789 individuals from three European populations
SNP array (Illumina Infinium HumanHap 300)
Reference genotyping cluster
Blood 4016 (map to 743 CNVRs)
Mean = 205kb
406 649 CNVRs
Overlap with reported CNVs in DGV (no experimental validation done)
Moon et al. (2011)249
Discovery: 100 Korean individuals Genotyping: 8842 Korean individuals
aCGH array (NimbleGen 3 x 720K) + SNP array (Affy 5.0)
NA10851 Blood 8779 (576 CNVRs chosen for frequency analysis)
Median length of 576 CNVRs = 113kb (1kb-4.56Mb)
807 CNVRs (576 chosen for frequency analysis in larger sample set)
66.7%-100% positive predictive values for 20 randomly chosen CNVRs
TaqMan assays
Studies listed in chronological order by publication date. CGH, comparative genomic hybridization; oligo, oligonucleotide; FISH, fluorescence in situ hybridization; ROMA, representational oligonucleotide microarray analysis; qPCR, quantitative polymerase chain reaction; BAC, bacterial artificial chromosome; YRI, Yoruba in Ibadan, Nigeria; CEU, Utah residents with ancestry from northern and western Europe; NCBI, National Center for Biotechnology Information; PGFE, pulsed gel field electrophoresis; MLPA, multiplex ligation-dependent probe amplification
2.2 CNV Databases
The Database of Genomic Variants (DGV) (http://projects.tcag.ca/variation/) was founded in conjunction
with the publication of the first few CNVs in 2004 by Sebat et al.197 and Iafrate et al.198, to catalogue
former and future discoveries of structural variants in the human genome. Curated by The Centre for
Applied Genomics (TCAG) in Toronto, the objective of this database is to summarize published data on
structural variation detected in healthy control samples, and it is periodically updated as new data
becomes available.198 At this time, the DGV presents data from each study separately, only merging
overlapping CNV calls (in the same direction) across samples within the same study. Moreover, calls
made by different platforms in the same study are also presented separately. Regions are displayed in
relation to the human genome reference assembly (Build 35/May 2004 or Build 36/March 2006 or
GRCh37/Feb 2009). The latest version of the DGV (updated Nov 02, 2010) contains 101,923 entries
mapped to the human genome Build 36, corresponding to 66,741 CNVs >1kb (mapping to 15,963
genomic loci), 34,229 InDels (relative gains or losses between 100bp and 1000bp in size), and 953 inversions.
Forty-two published articles are cited as the source of data in the DGV. A beta-version of the database
has been released (October 2011) which provides access to data in partner databases at European
Bioinformatics Institute (DGVa) and National Center for Biotechnology Information (dbVar). The DGVa
repository has been the primary supplier of data to the DGV. dbVar includes structural variants from
multiple species and also includes data from clinical studies (non-healthy populations). Future
submission of CNV data will be managed by DGVa and dbVar, while the role of DGV will be to
manually curate and visualize selected studies to allow better interpretation of the clinical significance of
CNVs.
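The merging rule described above for the DGV (overlapping calls in the same direction, within a single study) can be illustrated with a minimal sketch. The tuple layout, the function name `merge_calls`, and the example coordinates are assumptions made for illustration; this is not the DGV's actual curation code.

```python
from collections import defaultdict

def merge_calls(calls):
    """Merge overlapping CNV calls that share study, chromosome and direction.

    Each call is a tuple (study, chrom, start, end, direction), where
    direction is "gain" or "loss". This layout is hypothetical.
    """
    groups = defaultdict(list)
    for study, chrom, start, end, direction in calls:
        groups[(study, chrom, direction)].append((start, end))

    merged = []
    for key in sorted(groups):
        intervals = sorted(groups[key])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:          # overlaps the region being built
                cur_end = max(cur_end, end)
            else:                         # gap: close the region, start a new one
                merged.append(key + (cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append(key + (cur_start, cur_end))
    return merged

example_calls = [
    ("studyA", "chr1", 100, 500, "loss"),
    ("studyA", "chr1", 400, 900, "loss"),   # overlaps the first loss
    ("studyA", "chr1", 450, 800, "gain"),   # opposite direction: kept apart
    ("studyB", "chr1", 450, 800, "loss"),   # different study: kept apart
]
merged_regions = merge_calls(example_calls)
```

In this example only the two overlapping losses from the same hypothetical study are combined into one region; the gain and the call from the second study remain separate entries.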
Clinically significant CNVs (mainly those linked to genomic syndromes) are catalogued in DECIPHER250
(DatabasE of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources,
https://decipher.sanger.ac.uk) and ECARUCA251 (European Cytogeneticists Association Register of
Unbalanced Chromosome Aberrations, http://umcecaruca01.extern.umcn.nl:8080/ecaruca/ecaruca.jsp).
In addition, there are several data sources for copy number alterations that are detected in tumors or
cancer cell lines. Those include The Wellcome Trust Sanger Institute Cancer Genome Project252
(http://www.sanger.ac.uk/cgi-bin/genetics/CGP/conan/search.cgi) and the Pancreatic Expression
Database253 (http://www.pancreasexpression.org/).
2.3 Discovery and Genotyping of CNVs
A variety of platforms and algorithms have been applied for CNV detection, with a wide range of
resolution, coverage, and signal-to-noise ratio, resulting in significant non-overlap in the CNVs detectable
between different platforms used to study the same samples. The earliest studies mapping CNVs in the
human genome were based on fluorescent in situ hybridization (FISH) and spectral karyotyping and were
limited in resolution to variants of large size (>500kb), most of which were associated with disease.254
Later, genome-wide CNV mapping became possible with array comparative genomic hybridization
(aCGH), a technique involving competitive hybridization of fluorescently labeled DNA samples from two
sources on a single array that contains immobilized target DNA sequences and use of computational
algorithms to analyze the hybridization ratio of the test and reference samples. The DNA targets on the
arrays originally comprised Bacterial Artificial Chromosome (BAC) clones but later were made of long
oligonucleotides.195 Early CGH arrays were of low resolution (typical CNV size detectable by these
platforms was greater than 100kb), and they significantly overestimated the true number of bases affected
by CNVs.197,198 Later, high density oligonucleotide tiling CGH microarrays became available, allowing
more accurate determination of CNV breakpoints and detecting many more CNVs of smaller size.232 One
important consideration in the use of CGH arrays for CNV detection is the reference sample. In any
given aCGH experiment, it is not possible to distinguish between a copy number loss in the test sample
versus a gain in the reference sample in the same region (or vice-versa), since both scenarios would
generate the same hybridization signal ratio. Moreover, a loss or gain present in both samples would be
entirely missed (since the signal ratio would appear to be 1). Ideally, the reference sample genome should
be well characterized using a variety of methods, and the same reference sample should be hybridized
against all test samples in an experiment to allow better comparison of the results. To date, several
individuals have had their genomes extensively mapped and have been used repeatedly in CNV studies
(HapMap NA10851, NA18507, NA15510).
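The reference-sample problem described above can be made concrete with an idealized model in which the aCGH signal is simply the log2 ratio of test to reference copy number. This is a sketch under that assumption (real arrays add probe-level noise and normalization steps); the function name `log2_ratio` and the copy numbers chosen are illustrative only.

```python
import math

def log2_ratio(test_copies, reference_copies):
    """Idealized aCGH signal: log2 of test over reference copy number."""
    return math.log2(test_copies / reference_copies)

# A loss in the test sample and a gain in the reference sample both
# push the ratio below zero, so the array alone cannot say which occurred.
loss_in_test = log2_ratio(1, 2)
gain_in_reference = log2_ratio(2, 3)

# A deletion carried by both samples gives a ratio of 1 (log2 = 0)
# and is therefore invisible.
shared_loss = log2_ratio(1, 1)

# Against a two-copy reference, a one-copy loss moves the signal
# further from zero than a one-copy gain does.
deletion_shift = abs(log2_ratio(1, 2))
duplication_shift = abs(log2_ratio(3, 2))
```

In this model the first two values are both negative and the shared deletion gives exactly zero. The last two values show that a single-copy loss produces a larger signal change than a single-copy gain, which is one reason array platforms tend to detect deletions more readily than duplications.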
Another type of microarray used for CNV detection is the SNP array. Originally designed to genotype
SNPs for genome-wide association studies, these arrays contain multiple probes corresponding to each
selected SNP, and a single test DNA sample is hybridized to each array. Various computational
algorithms have been developed to analyze the hybridization intensity data to estimate copy number at
each SNP location, and the two primary methods are the Hidden-Markov-Model and Segmentation.
Earlier SNP arrays had lower resolution and coverage for CNV detection due to the nature of SNP
selection (focused on “tag SNPs” with minor allele frequencies of ≥ 1% to maximize coverage of the
genome while minimizing cost, and avoiding SNPs in regions that increase genotyping error due to
violation of Hardy-Weinberg Equilibrium or Mendelian inheritance errors).206,213 More recent SNP arrays
from Affymetrix and Illumina not only have a higher density of SNPs distributed genome-wide
(approximately 1 million) but also include probes for known CNV regions, hence allowing discovery of
smaller CNVs and the genotyping of polymorphic CNVs.219,220 Compared to CGH arrays, SNP arrays
have the added advantage of SNP genotype information which can be used to detect CNVs (by analyzing
“B-allele frequency”, which represents the proportion of total allele signal that is represented by a single
allele) as well as provide information on loss-of-heterozygosity (LOH) and uniparental disomy (UPD).
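The B-allele frequency defined above can be sketched in a few lines. The example assumes an idealized array in which each allele's signal is exactly proportional to its copy count; the function names and the A/B labelling of signals are assumptions for illustration, not any vendor's algorithm.

```python
def b_allele_frequency(a_signal, b_signal):
    """Proportion of the total allele signal contributed by the B allele."""
    total = a_signal + b_signal
    if total == 0:
        raise ValueError("no allele signal at this SNP")
    return b_signal / total

def expected_baf_values(copy_number):
    """Idealized BAF values for every possible count of B alleles."""
    if copy_number == 0:
        return []
    return [b / copy_number for b in range(copy_number + 1)]

two_copies = expected_baf_values(2)     # AA, AB, BB
one_copy = expected_baf_values(1)       # A, B
three_copies = expected_baf_values(3)   # AAA, AAB, ABB, BBB
heterozygote = b_allele_frequency(1.0, 1.0)
```

Under these assumptions a two-copy region shows SNPs at BAF 0, 0.5 and 1, a one-copy deletion loses the 0.5 cluster, and a three-copy duplication splits it into clusters at one third and two thirds. A two-copy region that lacks the 0.5 cluster despite normal total intensity is the pattern expected for copy-neutral LOH.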
Both CGH and SNP microarrays are limited to detecting CNVs that map to regions known in the
reference genome that was the basis for the microarray build. Moreover, neither of those platforms
distinguishes between tandem and interspersed duplications, and they tend to be more sensitive in
detecting deletions than duplications (due to a higher signal ratio differential between 2 and 1 copies vs. 2
and 3 copies, for example).195 Furthermore, even the highest resolution arrays available lose sensitivity in
genome-wide detection of CNVs smaller than 10kb.219 Sequence-based methods have been increasingly used
to bridge the gap in mapping the full extent of variability of the genome. Even in the early
days of CNV discovery, several CNV papers were published based on mining of genotyping errors219-220,
fosmid paired-ends200,221, and massively parallel sequencing of paired-ends of 3-kb fragments.210
Since then, many more studies have utilized the data from next-generation sequencing technologies to
identify CNVs, although there remain substantial bioinformatic challenges associated with analyzing this
data. The four main methods of using sequencing data to identify CNVs are255: (1) identifying read-pairs
whose mapping span is inconsistent with the reference genome; (2) identifying regions with significantly
increased or reduced read-depth compared to the distribution of read-depth across the (presumed diploid)
genome; (3) identifying “split-reads”, whereby there is a break in the alignment of a read relative to the
reference genome; (4) sequence assembly. To date the most commonly used method has been read-pair
mapping. All four approaches are limited in their sensitivity, specificity, and breakpoint accuracy
depending on read length, insert size, and physical coverage.
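Of the four sequencing-based approaches listed above, the read-depth method is the simplest to sketch. The example below is a deliberate simplification: it uses the genome-wide median window depth as the diploid baseline and fixed ratio thresholds, whereas published tools model the depth distribution statistically. The function name, the window depths, and the thresholds are all assumptions chosen for illustration.

```python
from statistics import median

def classify_windows(depths, loss_ratio=0.75, gain_ratio=1.25):
    """Label each window by its depth relative to the genome-wide median.

    The median stands in for the depth of a (presumed diploid) region.
    The thresholds are illustrative and sit midway between the ratios
    expected for 1, 2 and 3 copies (0.5, 1.0 and 1.5).
    """
    baseline = median(depths)
    labels = []
    for depth in depths:
        ratio = depth / baseline
        if ratio <= loss_ratio:
            labels.append("loss")
        elif ratio >= gain_ratio:
            labels.append("gain")
        else:
            labels.append("normal")
    return labels

# Hypothetical mean read depth in ten consecutive windows
window_depths = [30, 31, 29, 15, 14, 30, 46, 44, 30, 31]
window_labels = classify_windows(window_depths)
```

Here the two windows at roughly half the median depth are labelled as a loss and the two at roughly one and a half times the median as a gain. Breakpoints can only be placed at window boundaries, which illustrates why breakpoint accuracy in this approach depends on coverage and window size.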
Future directions in CNV detection include nascent technologies like optical mapping256, nanochannel
flow cells257, and emulsion picolitre droplet PCR258 that are being developed to allow high-throughput
detection of CNVs on an individual cellular and/or molecular level.
Multiple studies have demonstrated significant non-overlap between different platforms and algorithms
when analyzing the same samples.211,259 Given the variability in sensitivity and specificity of CNV
detection by the various platforms to date, validation is essential. Validation of detected CNVs has taken
two main forms in most studies: detection of the same (or overlapping) variants by different studies, and
replication within the same study (different array platform, PCR, qPCR, FISH, other experimental
methods). Overlap with regions identified in previous studies lends support to the variability of those
specific regions in the human genome, although many of the non-overlapping regions are also real (as
demonstrated by other replication methods). Similarly, replication on different platforms or with different
calling algorithms adds validity to detected CNVs in any tested sample, but regions identified by a single
approach can also be real. Experimental replication of CNVs provides the highest level of validation, but
those methods are often time-consuming and not optimized for high-throughput testing of multiple
regions and samples. As a result, most studies experimentally validated only a subset of their detected
CNVs (Table 2). However, high-throughput validation techniques have become available (e.g.
Sequenom©)260, so most CNVs published in the future should be confirmed more readily.
While most early CNV studies focused on variant discovery, determination of disease association with
specific CNVs requires accurate genotyping of the CNVs of interest. A number of techniques have been
employed for genotyping, including PCR based (e.g. PCR across breakpoints; quantitative PCR;
multiplex methodologies that assay multiple loci at once), SNP-array based (e.g. customizing arrays using
Illumina GoldenGate© assay for specific CNVs; using tag SNPs to impute common CNVs that are in high
linkage disequilibrium (LD) with the tag SNP), aCGH-based (e.g. customized high-density tiling arrays
with probes for known CNVs), and sequencing-based (e.g. building a library of breakpoints discovered
and validated from previous sequencing-based studies and comparing future de novo sequences against it
to rapidly genotype CNVs in those locations; calibrating aCGH data using sequencing-based data to
obtain absolute copy numbers).195 Accurate genotyping is easier for deletions than duplications, and is
particularly challenging in multi-allelic regions.
2.4 Structure and mechanism of CNV formation
Several mechanisms of genomic rearrangement that predispose to duplications and
deletions, driven by structural motifs in the genome, have been identified. One of the earliest observations in CNV surveys
was the association of CNVs with segmental duplications.197,198,199,200,206,208,209,212 Segmental duplications
(also called low-copy repeats or duplicons) are genomic regions ≥ 1kb in size and with ≥ 90% sequence
homology, present in multiple copies and covering approximately 5% of the human genome.261
Segmental duplications, particularly those with 97% or greater sequence identity and less than 10Mb
distance between them, can cause misalignment of homologous chromosomes or sister chromatids and
mediate non-allelic homologous recombination (NAHR), thus producing genomic duplications and
deletions of regions flanked by the segmental duplications.262 In addition, segmental duplications
themselves may be CNVs if they are not yet fixed in the human genome and they vary in copy number
between individuals.199 Most recurrent CNVs appear to be caused by NAHR mediated by segmental
duplications.
However, not all CNVs are associated with segmental duplications and other mechanisms have been
implicated in CNV formation. Different repetitive elements found in the breakpoint junctions of CNVs
include Alu SINEs, L1 LINEs, and long terminal repeats.210,247 Other mechanisms associated with CNV
formation include non-homologous end-joining (NHEJ), retrotransposition events (otherwise known as
mobile element insertion, or MEI), Variable Number Tandem Repeat (VNTR) expansion/contraction
events, replication Fork Stalling and Template Switching (FoSTeS), and microhomology-mediated break-
induced replication (MMBIR).263 In some cases, a parental inversion may predispose to de novo
unbalanced variants in the children, such as in the example of 17q21.31 microdeletion syndrome.264
Multiple studies have noted certain genomic locations as “hotspots” for CNVs, including 6cen, 8pter,
15q13-14, 11q11, 19q13, and 7q11.197,212,210,221 Some regions, such as 8p23, appear to be hotspots for
recombination as well as sequence variation, containing an enrichment of both structural variants as well
as SNPs.205,221 In a recent report analyzing next-generation sequencing data from the 1000 Genomes Project,
structural variants were found to cluster into hotspots by the mechanism of their formation, with VNTR
clustering near the centromeres and NAHR near the telomeres247. Possible explanations for genomic
variation hotspots include: older evolutionary age of the target genomic segments; biological functional
effect of involved regions driving selective pressure to maintain diverse alleles; or complete lack of
functional importance and selective pressure.205,247
2.5 Population Genetics of CNVs
The population genetics of CNVs is somewhat more complex than that of SNPs. Both forms of variation
may occur de novo or be inherited, but the de novo mutation rate for CNVs has been estimated to be 2-4
orders of magnitude greater than for single base mutations. Certain genomic regions are
susceptible to recurrent rearrangements due to their structure (e.g. flanked by segmental duplications), but
when Mendelian inheritance was specifically investigated, most common CNVs were found to be inherited
from a parent.219
Different studies have been differentially powered to detect common versus rare CNVs, thus yielding
conflicting data on the proportion of CNVs in the genome that are polymorphic (population frequency >1%). Earlier SNP
arrays and lower-resolution CGH arrays tended to be biased against common CNVs, so the majority of
CNVs identified using those platforms were rare in the general population. However, higher resolution
SNP arrays (such as Illumina 1M and Affymetrix 6.0) as well as very high-density CGH custom arrays
succeeded in detecting and genotyping a significant proportion of common CNVs over 1kb, and it is
evident that most of the variation between any two individuals at that resolution is due to common CNVs
that obey Hardy Weinberg Equilibrium.219,232 Sequencing based technologies have been identifying more
CNVs at a smaller size, and the data is a mix of rare and common CNVs.247
Most common copy number polymorphisms (CNPs) are biallelic (with a bias for detecting deletions on the platforms used), and most of
those were found to be tagged well by SNPs of similar frequencies, suggesting that they are ancestral
events.219 CNPs that are in strong LD with tagging SNPs can be easily genotyped in association studies,
thus facilitating their study. However, SNP “taggability” depends on the frequency as well as density of
nearby SNPs, meaning that some CNVs of lower frequency or present in regions not populated by many
SNPs will need to be genotyped directly. The same is true for complex CNVs or CNVs that have
multiple copy number alleles, as those tend to be in poor LD with nearby SNPs as well.
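How well a SNP tags a biallelic CNV can be sketched by correlating the two sets of genotypes across individuals. The example below uses the squared Pearson correlation of genotype dosages as a simple stand-in for LD r²; that choice, the function name, and the genotype vectors are assumptions for illustration, and the data are invented rather than taken from any study.

```python
def squared_correlation(xs, ys):
    """Squared Pearson correlation between two equal-length dosage lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0   # a monomorphic marker cannot tag anything
    return cov * cov / (var_x * var_y)

# Hypothetical genotypes for eight individuals
cnv_copies = [2, 2, 1, 1, 0, 2, 1, 2]   # copy number at a biallelic deletion
tag_snp = [0, 0, 1, 1, 2, 0, 1, 0]      # minor-allele count mirrors the deletion
weak_snp = [0, 1, 0, 1, 0, 1, 0, 1]     # nearby SNP unrelated to the deletion

r2_tag = squared_correlation(cnv_copies, tag_snp)
r2_weak = squared_correlation(cnv_copies, weak_snp)
```

The first SNP tracks the deletion perfectly and would allow it to be imputed from SNP genotypes alone, whereas the second is only weakly correlated, so the CNV would have to be genotyped directly. A multi-allelic CNV has more copy number states than a SNP has genotypes, which is one intuition for why such loci tend to be poorly tagged.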
Studies in populations of different ethnicities have suggested population differentiation in the frequency
of some CNVs, and some CNVs do appear to be population-specific.227-229,232 In keeping with the “out of
Africa” hypothesis, African populations have been found to have a higher number of rare or low-
frequency CNVs than non-African populations.229 These findings emphasize the importance of matching
the ethnicity of cases and controls in association studies to minimize spurious associations of population-
specific CNVs with disease.
2.6 Phenotypic impact of CNVs
The earliest known CNVs, usually large genomic deletions and duplications often encompassing many
genes, were invariably linked to significant genomic disorders. With the discovery of ubiquitous CNVs
in healthy controls, interpreting the functional significance of such genomic alterations became more
complex. Of note, many studies have observed a bias against genic CNVs in general, and large
genic deletions in particular265,232, suggesting that genomic alterations negatively impact fitness and
undergo purifying selection. Interestingly, there is also some evidence of positive selection (or potentially
reduced purifying selection266) acting on some genes, such as the salivary amylase gene AMY1 which
appears in higher copy number in humans than in other primates and which is found in higher copy
number in human populations with high-starch diets relative to populations with traditionally low-starch
diets.267 In contrast, many common CNVs have been identified at high frequencies in all human
populations and appear to have only a modest effect, if any, on phenotype.
Early CNV surveys identified a large number of genes as copy number variable, but care must be
exercised in interpreting those results given the propensity of those early platforms to overestimate the
size of CNVs, and hence the actual number and identity of involved genes reported in earlier studies may
be inaccurate. However, even more recent studies, with the power to identify smaller CNVs with more
accurate breakpoints, have detected thousands of genes that are affected at least in part by deletions or
duplications. For example, Pang et al.238 reported an extensive analysis of the diploid genome of Dr.
Craig Venter based on multiple microarray and sequencing platforms, and they identified 189 genes
completely encompassed by gains or losses and an additional 4,867 genes whose exons were impacted by
CNVs. While they did find an overall paucity of CNVs affecting genes associated with autosomal
dominant or recessive diseases, cancer syndromes, imprinted and dosage-sensitive genes, 573 of the CNV
genes were in the Online Mendelian Inheritance in Man (OMIM) database. Conrad et al.232 used a
discovery cohort of 20 CEU and 20 YRI HapMap individuals to detect common CNVs using a high-
density CGH array, then genotyped 450 HapMap samples at approximately 5,000 common CNVs. On
average, they found 445/1,098 CNVs overlapping 622 genes between any two individuals, and they
identified 2,698 genes affected by CNVs in the total sample set. Over half of partial gene deletions were
predicted to induce frameshifts, and 267 genes appeared to be affected by unambiguous loss of function
CNVs. Genes affected by CNVs appeared to be enriched for extracellular functions such as cell adhesion,
recognition, and communication, whereas they appeared to be biased away from intracellular functions
such as metabolic and biosynthetic pathways. These results extended those of previous as well as
subsequent CNV surveys, which also reported enrichment of immune and defense responses as well as
neurological system processes.239,268,247 Those latter functions are also proposed to have been involved in
the adaptive differentiation of humans and chimpanzees.269
The exact contribution of CNVs to gene expression variability, and how they relate to SNPs, is unclear.
Stranger et al.270 interrogated the contribution of CNVs detected by Redon et al.206 on BAC-CGH array
and Affymetrix 500K array to gene expression variability in lymphoblastoid cell lines from 210 HapMap
samples (within a 2Mb CNV-gene distance), and found that 17.7% of 1,061 genes with expression variability were
associated with CNVs, with over half of the associations appearing to be long-range (i.e. the CNV did not
overlap the gene whose expression it appeared to impact). While 83.6% of variability was attributed to
SNPs, only 1.3% of genes were associated with both CNVs and SNPs. Schlattl et al.271 extended this
analysis of CNV-expression association by comparing normalized transcriptome data for lymphoblastoid
cell lines (LCLs) from 60 CEU and 69 YRI HapMap samples to CNV data published in the same samples
on multiple platforms (high-resolution tiling CGH array232, high-resolution SNP array219, and next-
generation sequencing data247). By concentrating on common CNVs and restricting the effect range to
200kb or less, they found a significant association between CNVs and the expression of 110 genes.
Despite an abundance of deletions in the CNV set, Schlattl et al.271 found enrichment of duplications
among CNVs associated with variable expression, suggesting purifying selection acting against deletions
that impact gene expression. While comparing results from this analysis to previously published studies,
the authors were able to confirm several CNV-gene expression associations, including 6/13 that were
identified by Stranger et al.270 within the same effect range. Most of the CNV associations (70%)
occurred without overlap of the CNV with the respective gene, although the range of effect appeared to be
<100 kb in most cases. Interestingly, several intronic deletions were associated with gene expression, but
expression was decreased in only half of the cases, whereas it was increased in the other half. Such a mix
of positive and negative CNV effect on expression was also observed for the CNVs which did not directly
overlap genes. CNVs that overlapped exons or completely encompassed genes usually affected
expression in the same direction as the copy number change. Unlike Stranger et al.270 , Schlattl et al.271
found that most CNVs associated with gene expression (70%) overlap previously published SNP-
expression associations. This discrepancy in overlap likely reflects the differences in CNV characteristics
detectable by earlier platforms (more rare than common CNVs, biased away from common SNPs) relative
to the platforms used by Schlattl et al.271 Conrad et al.232 proposed that since most common genotyped
CNVs were well tagged by SNPs, it would be expected that SNP-based genome-wide association studies
would have already screened most common CNVs for association with common diseases. Based on the
finding by Conrad et al.232 that less than 5% of trait-associated SNPs in 279 publications were in linkage
disequilibrium > 0.5 with a nearby CNV and the additional finding by the Wellcome Trust Case Control
Consortium that only three CNV loci were reliably associated with one or more of eight common diseases (all
of which are tagged by SNPs that were previously detected in genome-wide association studies), the
authors of those papers argued that common genotyped CNVs do not explain a significant proportion of
heritability in common diseases. Nonetheless, the findings of Schlattl et al.271 indicate that a non-negligible
proportion of CNVs associated with gene expression variability do not link to SNPs, and
moreover 57% of genes with expression associated with CNVs were found to have a greater correlation
with their most strongly associated CNV than with any nearby SNP. This was especially true for CNVs
that overlap exons (10/10). Other studies of CNVs in mice, rats, and Drosophila have observed similar
impact of CNVs on gene expression.272-274
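As an illustration of the comparison described above, in which the expression of a gene is correlated with its most strongly associated CNV and with nearby SNPs, the following minimal Python sketch ranks candidate variants by the absolute Pearson correlation with expression. The data values, the variant names, and the function names `pearson_r` and `best_predictor` are hypothetical, and the use of Pearson correlation is an assumption made for illustration; it is not necessarily the statistic used in the cited studies.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def best_predictor(expression, genotypes_by_variant):
    """Return (variant_name, r) for the variant with the largest |r|."""
    scored = {
        name: pearson_r(genotypes, expression)
        for name, genotypes in genotypes_by_variant.items()
    }
    best = max(scored, key=lambda name: abs(scored[name]))
    return best, scored[best]


# Hypothetical data for six samples: normalized expression of one gene,
# the copy number of an overlapping CNV, and the allele count of a nearby SNP.
expression = [1.0, 1.1, 2.0, 2.1, 3.0, 3.2]
variants = {
    "CNV_copy_number": [1, 1, 2, 2, 3, 3],
    "SNP_allele_count": [0, 1, 1, 2, 1, 2],
}

name, r = best_predictor(expression, variants)
print(name, round(r, 3))
```

In this toy example the copy number tracks expression more closely than the SNP does, which mirrors the situation reported for CNVs that overlap exons. A real analysis would also need to account for multiple testing and population structure.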
Many diseases have been associated with CNVs. Recurrent de novo microdeletions and
microduplications are linked to many sporadic genomic disorders such as Williams-Beuren syndrome,
Angelman syndrome/Prader-Willi syndrome, Charcot-Marie-Tooth disease 1A, and idiopathic mental
retardation.195 Rare CNVs (de novo or heritable) have been associated with neuropsychiatric disorders
such as autism spectrum disorder and schizophrenia; neurodegenerative diseases such as Parkinson
Disease275; and metabolic disorders such as obesity276, among others. Common heritable CNVs have
been associated with autoimmune and infectious diseases such as Crohn’s disease277, rheumatoid
arthritis278, diabetes mellitus278, psoriasis279, lupus280, and susceptibility to HIV infection281. Both rare as
well as common CNVs have also been associated with susceptibility to cancer, as discussed below.
Determining the pathogenicity of CNVs, and delineating the responsible gene(s) or genomic elements,
can be challenging. CNVs may affect phenotype in a number of ways, including: increasing or
decreasing the copy number of dosage-sensitive genes; disrupting genes or producing fusion genes; exerting position effects;
unmasking recessive alleles; and affecting communication between alleles on homologous chromosomes.264
The effect of CNVs is also moderated by variable penetrance and expressivity.264 Some CNVs have been
associated with a wide range of phenotypes (e.g. 1q21.1 has been associated with dysmorphic features,
cardiac abnormalities, learning difficulties, mental retardation, autism, and schizophrenia)282; this may
reflect ascertainment bias due to the study design (e.g. phenotype-driven vs. genotype-driven)264 but may
also reflect variability in expressivity. Some studies have also demonstrated a buffering effect in cells,
whereby the observed expression level of a given gene does not correspond linearly to the expected level
based on copy number.271,272 It should be noted that in addition to copy number, the phase information
and genomic context of CNVs are also important for understanding the potential effect of the variant.264
Other challenges in CNV research include distinguishing germline from somatic alterations. Many
studies used DNA from immortalized lymphoblast cell lines, and it has become apparent that some
structural variants occur exclusively in or may be amplified by the Epstein-Barr virus (EBV)
transformation process.278,283 Moreover, few studies addressed the issue of somatic mosaicism or
heterosomy (variants present in only a fraction of cells in the tissue/blood sample), since most
platforms/algorithms are not designed to identify the “partial” nature of these regions, and few studies
compared the genomes of different tissues from the same individual.207,212,284 One survey of large
structural variations in blood-derived DNA in 957 controls and 1,034 bladder cancer patients identified
mosaic structural variations in 1.7% of all individuals with no significant difference between cases and
controls.285 The regions most commonly found to be somatic or cell-line artifact are T cell receptors or
immunoglobulin genes, including loci at 2q11 [200], 2p11.2 [208,212], 22q11.2 [200,208,212], 14q32.3 [200,208,212], and
14q11.2 [212] as well as chromosomes 9 and 20.285 Interestingly, some studies identified copy-number
variation within monozygotic twin pairs, both phenotypically concordant as well as discordant,
suggesting post-twinning somatic development of CNVs.286,287,288
2.7 CNVs and cancer
Chromosomal aneuploidy, whether involving entire chromosomes, chromosomal arms, or segments of
chromosomes, is a characteristic feature of most solid malignant tumors. Chromosomal instability (CIN)
is the high rate of loss and gain of whole chromosomes and has been attributed to various mechanisms
that interfere with correct segregation of chromosomes during mitotic division.289 Chromosomal structure
instability (CSI) is another hallmark of most solid cancers, involving multiple chromosomal segmental
breakages and fusions associated with telomere shortening, inappropriate DNA repair of double-strand
breaks, and chromosomal fragile sites, resulting in amplifications or deletions of the involved genomic
regions. A “chicken-vs-egg” debate has revolved around the relationship of CIN and CSI with the
development of cancer: not all aneuploid cells are unstable or tumorigenic and certainly many copy
number alterations in tumors appear to be “passengers” rather than driver mutations. Nonetheless, there
is evidence for CIN and CSI in cancer development, such as generating LOH at loci of inactivated tumor
suppressor genes or amplified oncogenes.290 Two decades ago, comparative genomic hybridization
(CGH) was developed to facilitate identifying regions of copy number gain and loss by hybridizing
biotinylated DNA from paired tumor and normal samples to metaphase chromosome spreads. Several
years later, array-based CGH was introduced and became a commonly used tool in the study of cancer
genomes. Later, SNP microarrays also came into use, providing the added advantage of detecting regions
of copy-neutral LOH and uniparental disomy. Very recently, the drop in cost of whole-genome and
exome sequencing has allowed the use of these technologies to identify a wide range of variants in
tumors, from single base to large structural variants.
In keeping with the classical Knudson two-hit hypothesis for inactivation of tumor suppressors, a number
of well-known tumor suppressor genes were first identified by analyzing focal homozygous deletions in
cancer in combination with linkage and/or LOH results (e.g. CDKN2A/B, PTEN, WT1, BRCA2). Those
discoveries spurred the identification of numerous candidate tumor suppressors by characterizing
recurrent deletions in tumors or cancer cell lines. Mouse studies have even suggested that
haploinsufficiency of some cancer genes can be sufficient to cooperate with other oncogenic alterations in
initiating tumor development (e.g. LKB1 and BRCA2 heterozygosity have been reported to accelerate
pancreatic tumor development in mice with activated Kras mutations). Similarly, genomic amplifications
in cancer can help identify candidate oncogenes. Moreover, some deletions and amplifications carry
prognostic significance (e.g. MYCN amplification in neuroblastoma, ERBB2 amplification in breast
cancer, 18q deletion in colon cancer), and whole-genome profiling of copy number alterations in tumors
can be diagnostic or prognostic (e.g. distinguishing gastrointestinal stromal tumors from
leiomyosarcomas291; aCGH classifier based on BRCA1-mutated breast cancer predicting sensitivity to
double-strand-DNA-break-inducing chemotherapy in patients without germline BRCA1/2 mutations292).
Structural rearrangements of pancreatic adenocarcinoma have been described in multiple studies, ranging
from cytogenetic karyotyping293 and microsatellite genotyping12,182,294 to CGH295-306 and SNP
microarrays37,307,308,309 to next-generation sequencing38. Certain patterns have emerged: all chromosomal
arms manifest genomic rearrangements, and the most frequently reported rearrangements are losses on
1p, 3p, 6p, 6q, 8p, 9p, 9q, 17p, 18q, 19p and gains on 8q. Some studies attempted to identify candidate
tumor suppressor genes or oncogenes, and while most results were of insufficient resolution to pinpoint a
target gene, certain genes were highlighted by multiple studies using a combination of genomic and
expression data (e.g. SMURF1 on 7q22.1 [301,303] and GATA6 on 18q11.2 [304,310] were proposed as novel
oncogenes). LOH is a common event across the pancreatic cancer genome, often occurring in the form of
whole chromosome loss, and there was no significant difference in the pattern of LOH between sporadic
and familial tumors.12,182 One recent study that used massively parallel sequencing technology to detect
variants at fine resolution in 3 primary tumors and 10 metastases reported significant inter-patient
heterogeneity in the number, type, and distribution of rearrangements.38 Interestingly, one sixth of all
rearrangements were in a pattern they termed “fold-back inversions”, whereby regions are duplicated but
with the duplications facing in opposite directions. This appeared to be an early event in the development
of pancreatic cancer and is associated with telomere loss. Moreover, sequence analysis of metastases
indicated that this type of rearrangement did not continue occurring later in the pancreatic cancer
developmental pathway, suggesting a reactivation of telomere repair function. Other interesting findings
from this analysis of somatic rearrangements in pancreatic cancer metastases were: evidence of ongoing
clonal evolution in the primary tumor among cells capable of initiating metastases (based on
finding some rearrangements only in some metastases), evidence for driver mutations involved in
metastatic spread (based on finding some rearrangements only in the metastases but not in the primary
tumor), and evidence for differences in evolution of metastases within each organ.
Less well studied than somatic genomic rearrangements in cancer is the relationship between germline
CNVs and cancer susceptibility. It is well known that moderate-to-high-penetrance rare germline CNVs
contribute to the heritability of familial cancer. Large germline genomic rearrangements that are absent
or rare in healthy populations have been reported as the cause of 15% of Familial Adenomatous Polyposis
(APC)311, 19% of Von Hippel Lindau disease (VHL)312, 4% of Hereditary Diffuse Gastric Cancer
(CDH1)313, 2-12% of Hereditary Breast and Ovarian Cancer (BRCA1 and BRCA2)314-320, 6-27% of Lynch
Syndrome (MSH2 & MLH1 genes)321,322, 16% of Peutz-Jeghers Syndrome (STK11)323, and 15% of
juvenile polyposis (SMAD4, BMPR1A, and PTEN)324 cases. Deleterious germline CNVs have also been
reported in non-BRCA1/2 associated familial breast cancer (PALB2325; BARD1326), Hereditary
Leiomyomatosis and Renal Cell Cancer (FH)327, Cowden disease (PTEN)328, Familial Atypical Multiple
Mole Melanoma (CDKN2A)329, Neurofibromatosis Type 1 (NF1)330, Ataxia Telangiectasia (ATM)331, Li
Fraumeni syndrome (TP53)332, familial retinoblastoma (Rb)333, and Multiple Endocrine Neoplasia Type 1
(MEN1)334. Interestingly, there are examples of copy number alterations at a distance from the coding
region of a gene influencing its expression, whether by affecting regulatory elements or by inducing
epigenetic changes that inactivate the gene. For example, in approximately 20% of suspected Lynch
syndrome cases with MSH2 loss but no detectable germline mutations or rearrangements in MSH2335
(about 1-3% of all Lynch Syndrome patients336), the causative mutation is a large heritable deletion at the
3’ end of the TACSTD1 gene, which causes transcriptional read-through and epigenetic silencing of the
adjacent MSH2 gene. In one juvenile polyposis kindred with 10 affected members who had no mutations
or rearrangements in the coding regions of SMAD4 and BMPR1A, Calva-Cerqueira et al.337 identified a
large deletion mapping 119kb upstream of the coding region of BMPR1A segregating with disease. The
deletion affected a promoter of BMPR1A and was demonstrated to diminish expression of the gene.
Common copy number polymorphisms at some genes linked to cancer have also been associated with
modest risk. For example, the glutathione-S-transferases (GSTs) constitute a family of genes involved in
drug and toxin metabolism and are thus hypothesized to protect cells against xenobiotics and oxidative
stress. Two of those genes, GSTT1 and GSTM1, have polymorphic deletions shown to correlate with
lowered enzyme activity. In one recent study that accurately quantified the copy number of those genes
in approximately 2,000 cancer patients and 8,000 controls, a gene dosage effect was demonstrated in
GSTT1 for prostate cancer in men and corpus uteri cancer in women, and in GSTM1 for bladder cancer.338
Another interesting association between a common copy number polymorphism and cancer was identified
in familial breast cancer for a deletion that eliminates exon 4 of MTUS1, a gene implicated as a tumor
suppressor. Interestingly, the common deletion was found to have a protective effect against breast
cancer, suggesting that the exon 4 deletion may paradoxically increase the tumor suppressor activity of
the gene (although this has yet to be demonstrated in functional studies).339
All of the aforementioned germline rearrangements were identified in targeted studies, commonly
utilizing PCR-based assays, which specifically searched for and/or quantified deletions or duplications at
or near known cancer genes in high-risk populations. The discovery of predisposition germline
rearrangements in cancer subjects without a priori knowledge of the region/gene of interest requires a
different approach. Most studies addressing this question have adopted one of two main strategies: genome-wide
CNV surveys in large cohorts of sporadic cancer patients and controls allow the identification of
statistically significant associations between common CNVs and a low-to-modest cancer risk;
alternatively, genome-wide CNV surveys in familial or hereditary cancer patients should facilitate the
detection of rare heritable CNVs (not previously published in controls nor present in a concurrently
studied control cohort) that potentially alter cancer genes and produce a modest-to-high risk of cancer.
Genome-wide case-control CNV association studies have identified candidate risk alleles for several
sporadic cancers: neuroblastoma in a Caucasian population (deletion at 1q21.1, OR=2.49, p=2.97 x 10^-17)340,
aggressive prostate cancer in Caucasian populations (deletion at 2p24.3, OR=1.31, p=0.006;
deletion at 20p13, OR=1.17, p=2.75 x 10^-4)341,342, and nasopharyngeal carcinoma in Han Chinese males
(deletion at 6p21.3, OR=18.92, ).343 Most recently, Huang et al.344 identified a common 10,379bp
deletion at 6q13 that was found at higher frequency in Han Chinese patients with sporadic pancreatic cancer
than in controls, and that was confirmed via a qPCR assay to have an odds ratio of 1.31 for 1-copy
carriers compared to 2-copy carriers. All those studies replicated their results in a confirmation cohort
and used ethnicity-matched cases and controls, and all but Diskin et al.340 used a PCR-based assay as the
confirmation assay; Diskin et al.340 applied multiple-testing correction to verify the statistical significance
of their results. Three of the identified CNVs overlapped genes: The neuroblastoma CNV overlapped a
novel transcript that demonstrated high sequence homology to the neuroblastoma breakpoint family
(NBPF) genes, was shown to correlate in expression with copy number, and was highly expressed in fetal
brains. The prostate cancer CNV at 20p13 differentially affects isoforms of the SIRPB1 gene, which
codes for a signal regulatory protein. The CNV at 6p21.3 encompassed MICA, the major histocompatibility
complex (MHC) class I chain-related gene A, which functions to mediate natural killer (NK) cell activation and T-lymphocyte
costimulation and which has been associated with nasopharyngeal cancer in previous studies.
The pancreatic cancer CNV at 6q13 and the prostate cancer CNV at 2p24.3 are non-genic and are
hypothesized to impact risk through long-range regulatory effects on an unidentified gene. Indeed,
functional analysis of the non-genic deletion associated with pancreatic cancer suggested that it may be
involved in long-range regulation of CDKN2B, an established tumor-suppressor gene. While these results
are interesting, they remain to be further validated in future studies. Some analyses may be confounded
by inaccurate genotyping of the CNV of interest: for example, the Database of Genomic Variants has
reports of gains as well as deletions at several of these putative cancer-associated CNVs, suggesting that
they may not be simple biallelic variants. Moreover, previous studies of CNVs in Asian populations232,239
reported higher frequencies of the deletion at 6p21 in controls than was identified in the population
studied in the nasopharyngeal carcinoma study. This is particularly significant because the odds ratio
identified for the 6p21 deletion (~19) was much higher than for any other common CNV or SNP
associations, and it may in fact be an overestimation if the deletion was undercalled in controls.
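The concern about overestimation can be made concrete with a small worked example. The sketch below computes an odds ratio from a 2x2 table of carrier counts and then recomputes it under the assumption that only a fraction of deletion carriers among controls is detected. All counts, the 50% detection rate, and the function names `odds_ratio` and `observed_odds_ratio` are hypothetical and are not taken from the nasopharyngeal carcinoma study.

```python
def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds ratio for carrying a variant in cases relative to controls."""
    counts = (case_carriers, case_noncarriers, control_carriers, control_noncarriers)
    if min(counts) <= 0:
        raise ValueError("all four cells must be positive")
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


def observed_odds_ratio(case_carriers, case_noncarriers,
                        control_carriers, control_noncarriers,
                        control_sensitivity):
    """Odds ratio seen when only a fraction of control carriers is detected.

    Undetected control carriers are misclassified as non-carriers.
    """
    if not 0 < control_sensitivity <= 1:
        raise ValueError("control_sensitivity must be in (0, 1]")
    detected = control_carriers * control_sensitivity
    missed = control_carriers - detected
    return odds_ratio(case_carriers, case_noncarriers,
                      detected, control_noncarriers + missed)


# Hypothetical cohort: 1,000 cases and 1,000 controls.
true_or = odds_ratio(100, 900, 50, 950)
inflated_or = observed_odds_ratio(100, 900, 50, 950, control_sensitivity=0.5)
print(round(true_or, 2), round(inflated_or, 2))
```

With these illustrative numbers, missing half of the control carriers roughly doubles the apparent odds ratio, which is the direction of bias suggested above for the 6p21 deletion.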
A few studies have been published surveying germline CNVs in familial solid cancer patients, and
although they have proposed several candidate predisposition genes based on overlap with patient-
specific CNVs, none to date have been able to show a significant contribution or segregation with disease
of any one gene to those cancer syndromes. One of the earliest studies analyzed 57 predominantly
Caucasian pancreatic cancer patients from 56 high-risk kindreds (each containing at least a pair of
affected first-degree relatives) using an oligonucleotide-based CGH platform, filtering out losses or gains
that were also identified in 607 mostly Caucasian controls (372 were analyzed in the same study, and 235
were previously reported in two other studies).345 Twenty-five losses overlapping 81 genes and 31 gains
overlapping 425 genes were identified specific to the cancer patients, and those genes were presented as
potential candidate predisposition genes. Due to lack of sufficient related samples, the authors were
unable to demonstrate heritability or segregation with disease of the patient-specific CNVs. Moreover,
the resolution of the CGH array used in this study was lower than that of current platforms
(approximately 30kb), which resulted in relatively large CNV calls that likely overestimated the actual
breakpoint boundaries of rearrangements. Furthermore, the control data available at the time of
publication was limited, so some of the supposedly familial pancreatic cancer (FPC)-specific CNVs were
identified in control populations in subsequent studies. The abstract of the paper refers to two deletions
that were observed in two different patients and one deletion that was observed in three different
individuals, yet no discussion of these regions is found in the main text of the manuscript. If such regions
were truly found to be recurrent in patients and absent in controls, they would be of particular interest as
candidate predisposition CNVs, but we cannot draw any conclusions given the paucity of information
provided.
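The filtering step described above, in which losses or gains seen in patients are discarded if they were also identified in controls, can be sketched as an interval-overlap test. The following Python example is a minimal illustration under stated assumptions: a patient CNV is discarded if it overlaps any control CNV of the same type on the same chromosome, coordinates are half-open, and the `CNV` record, the function names, and the coordinates are all hypothetical. The cited study may have used different overlap criteria.

```python
from typing import NamedTuple


class CNV(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # exclusive
    kind: str   # "loss" or "gain"


def overlap_fraction(a, b):
    """Fraction of a's length that is covered by b (0.0 on different chromosomes)."""
    if a.end <= a.start:
        raise ValueError("CNV must have positive length")
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    return max(shared, 0) / (a.end - a.start)


def patient_specific(patient_cnvs, control_cnvs, max_overlap=0.0):
    """Keep patient CNVs not overlapped by a same-type control CNV.

    max_overlap is the largest tolerated overlap fraction; the default of 0.0
    removes a patient CNV on any overlap at all (an assumed, strict criterion).
    """
    return [
        p for p in patient_cnvs
        if all(p.kind != c.kind or overlap_fraction(p, c) <= max_overlap
               for c in control_cnvs)
    ]


# Hypothetical calls.
patients = [
    CNV("chr1", 100, 200, "loss"),  # overlaps a control loss: removed
    CNV("chr1", 500, 600, "gain"),  # overlaps only a control loss: kept
    CNV("chr2", 100, 200, "loss"),  # no control CNV on chr2: kept
]
controls = [
    CNV("chr1", 150, 400, "loss"),
    CNV("chr1", 500, 600, "loss"),
]

specific = patient_specific(patients, controls)
print(specific)
```

Because lower-resolution arrays tend to overestimate CNV boundaries, the choice of `max_overlap` matters: a strict threshold may discard patient CNVs that only appear to overlap a control call.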
Two other studies similarly provided a list of candidate genes in familial cancer. Yoshihara et al.346
compared 68 Japanese subjects with germline BRCA1 mutations (including 51 subjects with ovarian
cancer), 34 sporadic ovarian cancer patients, and 47 healthy controls, and they identified 31 CNVs
specific to the BRCA1-mutation group. All 31 CNVs overlapped genes, and three CNVs segregated with
ovarian cancer in affected members of the same family (of which two CNVs were present in two different
families each). No significant difference was found in the per-genome total number of CNVs between
BRCA1-mutation carriers and controls, although the number of deletions was higher in the BRCA1-
mutation subjects. Otherwise, they found no evidence for differential clustering of the global CNV data
between groups, and no correlation of age at diagnosis with CNV frequency. Since the BRCA1 gene was
already identified as the primary genetic mutation in this study, the list of genes overlapped by CNVs
represented potential modifying genes that may contribute to the unique biological characteristics of
BRCA1-mutated ovarian cancer. Venkatachalam et al.347 studied 41 patients with young-onset and/or familial
colorectal cancer and microsatellite-stable tumors and identified four losses and three gains in six
patients (one patient had a loss and a gain) which were not present in a large control cohort nor reported
in previous control studies. Each CNV overlapped at least one gene and each was detected in a single
patient only.
A study by Shlien et al.348 presented an intriguing perspective of the connection between germline CNVs
and somatic tumor development in TP53 germline mutation carriers. They studied 53 Li-Fraumeni family
members (20 with wildtype TP53, 23 with TP53 mutations and history of cancer, and 8 with TP53
mutations and no cancer) and 70 unrelated healthy controls, and demonstrated a significantly elevated
frequency of germline CNVs in the TP53 mutation carriers relative to controls with wild-type TP53.
There was also a trend for a higher frequency of germline CNVs in cancer patients carrying TP53
mutations relative to mutation carriers without a history of cancer, but this did not reach statistical
significance possibly due to the small sample size. Furthermore, not only was the number of individual
CNVs elevated in mutation carriers but the number of copy-number variable bases was also higher, even
when the absolute number of CNVs was not, due to a tendency toward larger CNVs in the TP53 mutation
cohort. Comparison between germline and choroid plexus tumor DNA in four patients identified 15/21
loci overlapping germline CNVs that became substantially larger in the paired tumors, and three of four
tumors had loci at which a germline hemizygous deletion had progressed to homozygous deletion. These
findings suggested a model of tumor development in Li-Fraumeni syndrome in which germline genomic
instability (manifested as a higher than average CNV frequency) predisposes to additional genomic
rearrangements and/or expansion of germline CNVs in somatic tissue, affecting genes that drive the
development of cancer. The authors also report a list of cancer-related genes overlapped by germline
CNVs in the TP53-mutation carriers which may act synergistically with the TP53 mutation in promoting
cancer development. Of course, the role of TP53 in maintaining the genome is well known349, and it is
not surprising to find that even non-malignant cells exhibit increased genomic instability in Li-Fraumeni
patients. However, it is unclear if this phenomenon applies to other tumor suppressor genes that
predispose to familial cancer. Future surveys of CNV burden in other cancer syndromes would shed
more light on this question.
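The distinction drawn above between the number of CNVs in a genome and the number of copy-number variable bases can be illustrated with a short calculation. The sketch below is a hypothetical example, not the method of Shlien et al.: CNVs are given as (chromosome, start, end) tuples with half-open coordinates, overlapping calls are merged so that each base is counted once, and the function name `variable_bases` and all coordinates are invented for illustration.

```python
def variable_bases(cnvs):
    """Total bases covered by (chrom, start, end) CNV calls, counting overlaps once."""
    by_chrom = {}
    for chrom, start, end in cnvs:
        if end <= start:
            raise ValueError("CNV must have positive length")
        by_chrom.setdefault(chrom, []).append((start, end))

    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or adjacent call: extend the current merged interval.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


# Hypothetical genomes: the first has more CNVs, the second has larger ones.
genome_a = [("chr1", 0, 10_000), ("chr2", 0, 10_000), ("chr3", 0, 10_000)]
genome_b = [("chr1", 0, 300_000), ("chr1", 200_000, 500_000)]

print(len(genome_a), variable_bases(genome_a))
print(len(genome_b), variable_bases(genome_b))
```

Here the second genome carries fewer CNVs but far more variable bases, which is the pattern described for the TP53 mutation cohort when CNVs tend to be larger.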
3. Whole-Exome Sequencing
The human genome is comprised of approximately 3 billion base pairs, of which less than 2% code for
proteins. The release of the first reference build of the human genome in 2003, after a 13-year
collaborative international effort, opened the door to significant advancements in understanding the
genetic and genomic makeup of individuals, populations, and cancers. The Human Genome Project
expanded understanding of the identity and population frequency of SNPs, the most frequently occurring
variant in the human genome, and efforts to determine haplotype structure (blocks of SNPs present in
different combinations and segregating in populations) have accelerated progress in the fields of
population genetics, human evolution, and disease-gene associations.
The original sequencing effort was based on the technique developed by Frederick Sanger in the 1970s,
utilizing labeled dideoxynucleotide triphosphates (ddNTPs) as DNA chain terminators and separating
terminated chains of various lengths by gel electrophoresis to determine base order in the sequence.
High-throughput requirements of the DNA sequencing effort drove the development of automated
capillary electrophoresis and other laboratory process automation. The International Human Genome
Sequencing Consortium (IHGSC) employed a “hierarchical shotgun sequencing” approach that involved
fragmenting and cloning DNA (initially using yeast artificial chromosomes, then subsequently bacterial
artificial chromosomes), mapping clones on the physical map of the genome with the help of established
genomic markers, shotgun sequencing clones, and finally aligning sequenced fragments to the
developing map.350 In the last few years of the IHGSC project, a competing effort undertaken by Craig
Venter’s company CELERA utilized a “whole genome shotgun sequencing” approach which was
considered by Venter to be more efficient and faster, although CELERA did end up incorporating
publicly available data that was generated by the IHGSC to allow accurate mapping of sequenced
fragments due to the difficulty of mapping to highly repetitive regions of the genome (which constitute a
large portion of the human genome) without the use of additional genome map information.350,351 The
approximate cost of sequencing the first reference human genome was $3 billion. Importantly, neither the
IHGSC nor the CELERA genomes was the sequence of a single diploid genome but rather each was a
haploid consensus sequence of DNA derived from several anonymous individuals of different ancestries
(although the IHGSC sequence was primarily based on a single male individual, and the CELERA
reference sequence may have included Craig Venter’s genome). Building on the data discovered from the
reference human genome, the International HapMap Project set out to identify common SNPs (defined as
having a minor allele frequency (MAF) >1%, although most identified by this project have a MAF >5%) and
their haplotype structure in members of different populations.352 This important source of information
allowed the development of genotyping arrays for genome-wide association studies.
Only four years after the release of the nearly complete human reference genome, the first diploid human
genome sequence was published. It belonged to Craig Venter and was generated using the CELERA whole-genome shotgun
sequencing method at a cost of $70-100 million over about 4 years (the cost estimate
incorporates costs incurred during the development of the CELERA reference genome).209 While this
sequence presented an interesting perspective on the makeup of individual genomes, it is also clear that
many more genomes need to be sequenced before the full potential of genomic analysis and comparisons
among individuals can be realized.
Making whole-genome sequencing possible for many genomes required a dramatic reduction in cost and
increase in the speed of the process. To that end, the development of massively-parallel next-generation
technologies presented a breakthrough in genomics. Since publication of the first sequencing-by-
synthesis technology in 2005353, a number of different platforms have been developed. While they
employ different techniques of sequencing (Illumina and Roche/454 use DNA polymerase-based
sequencing-by-synthesis approaches while ABI SOLiD uses DNA ligase-based sequencing by ligation),
all are based on clonal cluster amplification of target molecules to generate a sufficiently strong signal.354
The first human genome to be fully sequenced by a massively-parallel platform belonged to James
Watson, co-discoverer of the DNA double helix.218 In a demonstration of the significantly increased
power of next-generation sequencers, the Watson genome was sequenced in 4.5 months and this effort
cost less than $1.5 million.355 Since then, many other individuals of different ancestries have been
sequenced.209,218,222,223,227,228,230,239,243,356,357,358,359 The 1000 Genomes project is an endeavour to sequence
the genomes of 2,500 unidentified individuals from 29 populations to discover, genotype, and accurately
identify haplotypes, with the overarching goal of characterizing 95% of variants with allele frequency of
1% or greater in genomic regions that can be sequenced by the most recently available next-generation
platforms.246 To date, three pilot projects have been completed: (1) low-coverage sequencing (2-4x) of
the whole genomes of 180 individuals, providing data on SNPs of 1% or higher frequency; (2) deep
sequencing (20-60x) of the whole genomes of two mother-father-adult child trios, allowing quality control of data
from pilot project (1) and inference of haplotypes; and (3) targeted capture and deep sequencing (50x) of ~8,000
exons from approximately 900 randomly selected genes, to test the effectiveness of targeted capture
sequencing in identifying common, low-frequency, and rare variants in protein-coding regions of the
genome. The main project involves low-depth sequencing (4x) of the whole genome of 2,500 individuals
as well as deeper sequencing of their exomes by the target-enrichment method (see below for more detail
on exome sequencing).
Whereas the Sanger-based automated sequencers generated approximately 100 kbp of data per day on a
single machine, the earliest next-generation platform increased the output by two orders of magnitude. This
was very quickly surpassed by other platforms with larger output, and by 2011 a
single sequencer produced around 40 Gbp per day.360,361 An important distinction between
Sanger-based and next-generation sequencers is the read length: 700-1000 bp for capillary Sanger
sequencers compared to 75-400bp in next-generation sequencers, depending on the platform. The cost of
whole-genome sequencing has dropped significantly, currently as low as $5,000-$10,000. Interestingly,
while the cost of generating a genome sequence has dropped dramatically, the capacity to analyze the data
has advanced less rapidly. Some challenges have included the inadequate adaptation of software
originally designed for alignment and variant calling of Sanger sequencing and the need for newer
validated software packages that can handle the significantly larger quantity of data that is generated with
newer platforms.362 The relatively short reads have also posed a problem for de novo genome assembly
and correct alignment to repetitive or highly homologous regions. In recent years, “third-generation”
sequencing methodologies have been introduced, characterized by the ability to directly sequence single
molecules without needing to amplify the template.363 Those newest methods of sequencing may address
some of the limitations of next-generation sequencers (e.g. they appear to generate longer reads
approximating the length obtainable by the Sanger capillary sequencers) but they have their own
challenges, such as a higher raw read error rate from the single-molecule sequencing approach. As such,
ongoing improvements in both sequencing technologies and bioinformatic tools will be necessary
to achieve the most cost-effective means of sequencing large numbers of genomes for disease gene
discovery and clinical diagnostic purposes. (I am not addressing other applications of next-generation
sequencing such as transcriptomics, epigenomics, and chromatin immunoprecipitation sequencing (ChIP-seq),
as they are outside the scope of this thesis.)
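The scale of the change in throughput can be illustrated with a short calculation. The sketch below uses the per-day output figures quoted above; the genome size (about 3.1 Gbp) and the 30x target depth are assumptions chosen only for illustration.

```python
# Back-of-the-envelope throughput comparison using the figures quoted in the text.
SANGER_BP_PER_DAY = 100e3    # ~100 kbp/day per Sanger instrument (from the text)
NGS_2011_BP_PER_DAY = 40e9   # ~40 Gbp/day per sequencer in 2011 (from the text)

GENOME_SIZE_BP = 3.1e9       # assumption: approximate haploid human genome size
TARGET_DEPTH = 30            # assumption: illustrative mean coverage


def days_to_sequence(bp_per_day, genome_size_bp=GENOME_SIZE_BP, depth=TARGET_DEPTH):
    """Instrument-days needed to generate `depth`-fold raw coverage of one genome."""
    return genome_size_bp * depth / bp_per_day


sanger_days = days_to_sequence(SANGER_BP_PER_DAY)
ngs_days = days_to_sequence(NGS_2011_BP_PER_DAY)

print(f"Sanger instrument-days:   {sanger_days:,.0f}")
print(f"2011 NGS instrument-days: {ngs_days:.1f}")
print(f"Fold difference:          {sanger_days / ngs_days:,.0f}")
```

The calculation ignores library preparation, run set-up, and analysis time, which is where much of the remaining cost lies.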
The cost of whole-genome sequencing has not yet reached the promised “$1,000-genome” level that has
been identified as a goal for the genomic community, particularly if post-sequencing analysis cost is taken
into consideration; moreover, much of the information identified in a whole genome remains difficult to
evaluate in terms of functional impact on disease or phenotype since only 1-2% of the entire genome has
been annotated as protein-coding. Indeed, to date, several reports of whole-genome sequencing in disease
cases have been published but invariably they focus on coding region variants to identify candidate
causative genes.364-371 These two current limitations of whole-genome sequencing (cost and functional
annotation of the genome) have made exome sequencing an attractive alternative for researchers. Exome
sequencing is based on capturing and subsequently amplifying and sequencing the coding region of the
genome using massively parallel sequencing. Since the target region in exome sequencing is less than
2% of that in whole-genome sequencing, it is possible to obtain much greater read depth per base per run.
This means that more samples can be sequenced in the same amount of time and for the same price as a
single whole genome. A number of methods of target enrichment have been introduced, including both
solid-phase (e.g. Nimblegen Sequence Capture Human Exome 2.1M array) and in-solution
oligonucleotide arrays (e.g. Agilent SureSelect System).372,373 The latest arrays can capture up to 44-50 Mb
of genomic sequence, encompassing most of the annotation of the Collaborative Consensus Coding
Sequence (CCDS 2009)374 database and flanking base pairs of target regions as well as microRNAs and
other non-coding RNAs. It should be noted that, although the coverage of exome sequencing for coding
regions and adjacent regulatory sequences is excellent, it is not perfect, and the success of capture varies
to some extent between arrays as well as with sequence-specific characteristics such as high GC content.375
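The relationship between target size and read depth can be made concrete with a rough calculation. This is a minimal sketch: the run output (40 Gbp) and target size (50 Mb) come from the figures quoted in this section, while the genome size and the on-target fraction of 0.6 are hypothetical values used only to illustrate the arithmetic.

```python
def mean_depth(total_bases, target_size_bp, on_target_fraction=1.0):
    """Expected mean read depth: usable sequenced bases divided by target size."""
    return total_bases * on_target_fraction / target_size_bp


RUN_OUTPUT_BP = 40e9      # sequence output of one run (figure quoted in the text)
GENOME_BP = 3.1e9         # assumption: approximate human genome size
EXOME_TARGET_BP = 50e6    # upper end of the 44-50 Mb capture size quoted above
ON_TARGET = 0.6           # hypothetical fraction of reads mapping to the target

wgs_depth = mean_depth(RUN_OUTPUT_BP, GENOME_BP)
exome_depth = mean_depth(RUN_OUTPUT_BP, EXOME_TARGET_BP, ON_TARGET)

print(f"Whole-genome mean depth from one run: {wgs_depth:.1f}x")
print(f"Single-exome mean depth from one run: {exome_depth:.0f}x")
print(f"Exomes per run at 50x each:           {int(exome_depth // 50)}")
```

Mean depth is only a summary: as noted above, capture efficiency varies by array and by sequence context, so individual targets can fall well below the mean.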
The first description of a human exome was based on the coding variants identified in the previously
published diploid genome of Craig Venter (HuRef).376 The authors reported that most nonsynonymous
SNPs were common (15-20% were rare, and ~95% of the rare variants were heterozygous). They also
identified 105 premature-terminating codons, many of which were common and did not appear to be under
negative selection. They noted that many of these variants were present in duplicated genes and
hypothetical genes, suggesting that their impact in this setting may be less deleterious. They also noted
that half of all coding indels occurred in tandem repeats, and tended to occur at the C and N termini of
genes and/or near exon boundaries (which in some cases were considered likely mapping errors in the
reference genome). There was a bias toward indels composed of multiples of 3 bases (3n) in coding
regions that are likely to be functionally significant, suggesting purifying selection acting on frameshift
indels in those regions. Of additional importance, the authors noted that the Venter genome contained at
least 680 nonsynonymous SNPs affecting 443 genes with some association with disease, including 7 that
were in the dbSNP and OMIM databases, which foreshadowed the challenge that would be encountered in
interpreting the clinical significance of coding variants as more genomes and exomes are sequenced.
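The distinction between 3n and non-3n indels underlying this observation reduces to a simple length check. The sketch below is illustrative only; the function name and the example alleles are hypothetical, and it considers nothing beyond the net change in length.

```python
def indel_effect(ref, alt):
    """Classify a coding indel by whether it preserves the reading frame.

    A net length change that is a multiple of 3 adds or removes whole codons
    (in-frame); any other change shifts the downstream reading frame.
    """
    length_change = len(alt) - len(ref)
    if length_change == 0:
        raise ValueError("not an indel: ref and alt alleles have the same length")
    return "in-frame" if length_change % 3 == 0 else "frameshift"


# Hypothetical alleles for illustration
print(indel_effect("ACTG", "A"))   # 3-bp deletion
print(indel_effect("A", "AT"))     # 1-bp insertion
```

A real annotation would also have to account for indels that span exon boundaries or splice sites, which this length check ignores.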
The first report of target-captured exome sequencing using next-generation sequencing was published in
2009 by Ng et al.377, describing the exomes of 8 HapMap individuals whose genomes were previously
characterized by sequencing fosmid-clones to identify structural variants. In addition, in a proof of
concept experiment, the exomes of four unrelated individuals with a rare autosomal dominant disorder
(Freeman-Sheldon Syndrome) caused by MYH3 mutations were sequenced to demonstrate a filtering
strategy that would identify the causative gene. The average depth of coverage was 51x, translating into
95% of coding bases in 78% of genes being successfully called (based on a threshold of ≥ 8x depth per
base required to reliably call a heterozygous variant). The estimated average number of truncating single
base variants per genome was higher in African than non-African genomes (20/African vs. 10/non-
African), and a similar ratio was observed for rare frameshift indels (17/African vs. 8/non-African). As
was observed in the Venter exome, most indels in coding regions were non-frameshift. To identify the
causative gene in the four Freeman-Sheldon Syndrome patients, the authors filtered variants to focus on
non-synonymous and/or splice-site variants or indels that were not previously reported in dbSNP or found
in the 8 HapMap exomes, and which were in the same gene in all four affected patients. This approach
reduced the number of candidate genes to precisely one, namely MYH3. A subsequent study applied the
same filtering strategy to successfully identify the unknown genetic cause of a rare autosomal recessive
Mendelian disorder (Miller Syndrome), the first of approximately 90 such studies to be published in quick
succession over a period of 24 months (Table 3). Currently ongoing large-scale projects employing
exome sequencing include the 1000 Genomes Project (which aims to sequence the exomes of 2,500
anonymous individuals) as well as the Exome Sequencing Project, which aims to discover variants
relevant to heart, lung, and blood diseases and has to date sequenced the exomes of nearly 5,400
individuals from multiple study cohorts (the project plans to sequence approximately 7,000 exomes).
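The filtering strategy described above for the Freeman-Sheldon Syndrome exomes can be sketched as a sequence of set operations. This is an illustration under stated assumptions rather than the pipeline used by Ng et al.: the `Variant` record, the effect labels, the placeholder gene names, and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    chrom: str
    pos: int
    ref: str
    alt: str
    effect: str  # hypothetical labels: "missense", "nonsense", "splice_site", "indel", "synonymous"

    @property
    def key(self):
        return (self.chrom, self.pos, self.ref, self.alt)


PROTEIN_ALTERING = {"missense", "nonsense", "splice_site", "indel"}


def candidate_genes(affected_exomes, known_variant_keys):
    """Genes with a novel protein-altering variant in every affected exome."""
    per_sample_genes = []
    for variants in affected_exomes:
        genes = {
            v.gene
            for v in variants
            if v.effect in PROTEIN_ALTERING and v.key not in known_variant_keys
        }
        per_sample_genes.append(genes)
    return set.intersection(*per_sample_genes) if per_sample_genes else set()


# Toy data: coordinates and the GENE_* names are made up for illustration.
known = {("chr1", 100, "A", "G")}  # stands in for dbSNP plus control exomes
affected = [
    [Variant("MYH3", "chr17", 1000, "C", "T", "missense"),
     Variant("GENE_A", "chr1", 100, "A", "G", "missense"),
     Variant("GENE_B", "chr2", 200, "G", "A", "synonymous")],
    [Variant("MYH3", "chr17", 1050, "G", "A", "missense"),
     Variant("GENE_A", "chr1", 100, "A", "G", "missense")],
    [Variant("MYH3", "chr17", 1000, "C", "T", "missense"),
     Variant("GENE_C", "chr3", 300, "T", "C", "nonsense")],
    [Variant("MYH3", "chr17", 1100, "A", "C", "missense"),
     Variant("GENE_B", "chr2", 200, "G", "A", "synonymous")],
]
print(candidate_genes(affected, known))
```

The intersection is taken at the gene level rather than the variant level, so unrelated patients carrying different mutations in the same gene still converge on that gene.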
Table 3 – Studies using exome-sequencing to identify genetic cause of disease
Authors | Year | Journal | Disease | Inheritance (AD = autosomal dominant, AR = autosomal recessive) | Description
Vissers et al.378 | 2010 | Nat Genet | Mental Retardation | Sporadic | Studied 10 trios; identified de novo mutations as potential cause for unexplained mental retardation
Walsh et al.379 | 2010 | Am J Hum Genet | Nonsyndromic Hearing Loss | AR | Combined homozygosity mapping in consanguineous family with exome sequencing to identify DFNB82 as cause
Lalonde et al.380 | 2010 | Hum Mutat | Fowler Syndrome | AR | Identified compound hets in FLVCR2 in two fetuses from consanguineous families
Pierce et al.381 | 2010 | Am J Hum Genet | Perrault Syndrome | AR | Identified compound hets in HSD17B4 in two sisters
Ng et al.382 | 2010 | Nat Genet | Kabuki Syndrome | AD | Studied 10 unrelated affected subjects; identified MLL2 as cause
Bilguvar et al.383 | 2010 | Nature | Malformation of Cortical Development | AR | Combined homozygosity mapping and exome sequencing in family with two affected members; identified WDR62 as cause
Gilissen et al.384 | 2010 | Am J Hum Genet | Sensenbrenner Syndrome | AR | Identified compound hets in WDR35 in two unrelated affected subjects
Krawitz et al.385 | 2010 | Nat Genet | Hyperphosphatasia Mental Retardation Syndrome | AR | Performed identity-by-descent filtering on exome data to identify PIGV as cause in 3 affected siblings of nonconsanguineous family
Anastasio et al.386 | 2010 | Am J Hum Genet | Van Den Ende-Gupta Syndrome | AR | Combined homozygosity mapping with exome sequencing to identify SCARF2 as cause in 4 affecteds from 3 consanguineous families
Johnson et al.387 | 2010 | Am J Hum Genet | Brown-Vialetto-van Laere Syndrome | AR | Identified C20orf54 as cause in three affected siblings
Sirmaci et al.388 | 2010 | Am J Hum Genet | Michels Syndrome | AR | Combined homozygosity mapping with exome sequencing to identify MASP1 as cause in 3 individuals from 2 consanguineous families
Haack et al.389 | 2010 | Nat Genet | Isolated complex I deficiency | AR | Identified compound hets in ACAD9 in single affected individual
Wang et al.390 | 2010 | Brain | Spinocerebellar ataxia | AD | Combined linkage analysis with exome sequencing in a Chinese family with 4 affecteds; identified TGM6 as cause
Musunuru et al.391 | 2010 | NEJM | Combined hypolipidemia | AR | Identified compound hets in ANGPTL3 in 2 affected sibs
Johnson et al.392 | 2010 | Neuron | ALS | AD | Combined linkage analysis with exome sequencing in 2 affected relatives; identified VCP as cause
Bolze et al.393 | 2010 | Am J Hum Genet | Autoimmune lymphoproliferative syndrome (ALPS) | AR | Found homozygous variants in FADD
Liu et al.394 | 2011 | PLoS One | Moyamoya disease | AD | Combined linkage analysis with exome sequencing to identify RNF213
Zuchner et al.395 | 2011 | Am J Hum Genet | Retinitis pigmentosa | AR | Identified homozygous variants in DHDDS
Glazov et al.396 | 2011 | PLoS Genet | Anauxetic dysplasia-like condition | AR | Identified compound hets in POP1
Worthey et al.397 | 2011 | Genet Med | Inflammatory bowel disease | AR | Identified hemizygous variant on X chromosome (XIAP)
Simpson et al.398 | 2011 | Nat Genet | Hajdu-Cheney Syndrome | AD | Exome sequencing of 3 unrelated affecteds identified NOTCH2
Becker et al.399 | 2011 | Am J Hum Genet | Osteogenesis imperfecta | AR | Identified homozygous variants in SERPINF1 in 2 affected sibs
Ostergaard et al.400 | 2011 | J Med Genet | Primary lymphoedema | AD | Combined linkage analysis with exome sequencing to identify GJC2
Caliskan et al.401 | 2011 | Hum Mol Genet | Non-syndromic mental retardation | AR | Combined homozygosity mapping with exome sequencing to identify TECR
Erlich et al.402 | 2011 | Genome Res | Hereditary spastic paraparesis | AR | Combined homozygosity mapping with exome sequencing to identify KIF1A
Sundaram et al.403 | 2011 | Ann Neurol | Tourette syndrome/chronic tic phenotype | AD | Identified OFCC1 as cause
Puente et al.404 | 2011 | Am J Hum Genet | Hereditary Progeroid Syndrome | AR | Identified homozygous mutations in BANF1
Vissers et al.405 | 2011 | Am J Hum Genet | Chondrodysplasia and abnormal joint development syndrome | AR | Identified homozygous variants in IMPAD1 in three affected unrelated individuals
O’Sullivan et al.406 | 2011 | Am J Hum Genet | Amelogenesis imperfecta and gingival hyperplasia syndrome | AR | Combined homozygosity mapping with exome sequencing to identify FAM20A
Gotz et al.407 | 2011 | Am J Hum Genet | Infantile hypertrophic mitochondrial cardiomyopathy | AR | Identified compound heterozygous mutations in mtAlaRS
Shi et al.408 | 2011 | PLoS Genet | Myopia | AD | Identified mutations in ZNF644 in 2 relatives
Klein et al.409 | 2011 | Nat Genet | Hereditary sensory neuropathy with dementia and hearing loss | AD | Combined linkage with exome data to identify mutations in DNMT1
Barak et al.410 | 2011 | Nat Genet | Malformations of occipital cortical development | AR | Identified homozygous mutation in single affected child of consanguineous parents
O’Roak et al.411 | 2011 | Nat Genet | Autism | Sporadic | Identified 11 de novo protein-altering mutations, some genes previously connected to autism
Alvarado et al.412 | 2011 | J Bone Joint Surg Am | Distal arthrogryposis type 1 | AD | Identified MYH3 as cause
De Greef et al.413 | 2011 | Am J Hum Genet | Immunodeficiency, centromeric instability, and facial anomalies | AR | Combined homozygosity mapping with exome sequencing to identify ZBTB24
Yamaguchi et al.414 | 2011 | J Bone Miner Res | Primary failure of tooth eruption | AD | Combined linkage with exome sequencing to identify PTH1R as cause
Zhou et al.415 | 2011 | Hum Mutat | Hereditary hypotrichosis simplex | AD | Combined linkage with exome sequencing to identify RPL21 as cause
Le Goff et al.416 | 2011 | Am J Hum Genet | Geleophysic and acromicric dysplasia | AD | Identified FBN1 as candidate gene in 5 patients
Hanson et al.417 | 2011 | Am J Hum Genet | 3-M syndrome | AR | Combined homozygosity mapping with exome sequencing to identify mutation in CCDC8
Vilarino-Guell et al.418 | 2011 | Am J Hum Genet | Late-onset Parkinson | AD | Identified mutation in VPS35
Zimprich et al.419 | 2011 | Am J Hum Genet | Late-onset Parkinson | AD | Identified VPS35 as cause (different patients from Vilarino-Guell)
Sergouniotis et al.420 | 2011 | Am J Hum Genet | Leber congenital amaurosis | AR | Combined homozygosity mapping with exome sequencing to identify KCNJ13 as cause
Albers et al.421 | 2011 | Nat Genet | Gray Platelet Syndrome | AR | Identified NBEAL2 as cause
Sanna-Cherchi et al.422 | 2011 | Kidney Int | Steroid-resistant nephrotic syndrome | AR | Combined homozygosity mapping with exome sequencing in 3 affected sibs of consanguineous parents to identify homozygous mutations in MYO1E and NEIL1
Liu et al.423 | 2011 | J Exp Med | Chronic mucocutaneous candidiasis disease | AD | Identified mutations in STAT1 as cause
Yariz et al.424 | 2011 | Fertil Steril | Empty Follicle Syndrome | AR | Identified homozygous mutation in LHCGR in 2 sisters
Xu et al.425 | 2011 | Nat Genet | Schizophrenia | Sporadic | Identified 40 rare de novo protein-altering mutations in 40 genes (in 27 cases), including DGCR2, a gene in schizophrenia-predisposing region 22q11.2
Sirmaci et al.426 | 2011 | Am J Hum Genet | KBG syndrome | AD | Identified ANKRD11 as cause
Shaheen et al.427 | 2011 | Am J Hum Genet | Adams-Oliver syndrome | AR | Combined homozygosity mapping with exome sequencing to identify homozygous mutations in DOCK6
Noskova et al.428 | 2011 | Am J Hum Genet | Adult-onset neuronal ceroid lipofuscinosis | AD | Identified 5 unrelated individuals with mutations in DNAJC5
Weedon et al.429 | 2011 | Am J Hum Genet | Charcot-Marie-Tooth | AD | Found DYNC1H1 as cause in 3 relatives
Ozgul et al.430 | 2011 | Am J Hum Genet | Retinitis pigmentosa | AR | Identified homozygous mutation in MAK as cause
Doi et al.431 | 2011 | Am J Hum Genet | Cerebellar ataxia | AR | Identified mutation in SYT14 as cause
Sloan et al.432 | 2011 | Nat Genet | Malonic and methylmalonic aciduria | AR | Identified mutation in ACSF3 as cause
Aldahmesh et al.433 | 2011 | J Med Genet | Knobloch Syndrome | AR | Identified ADAMTS18 as cause
Murdock et al.434 | 2011 | Am J Med Genet A | Recurrent polymicrogyria | AR | Identified compound het mutations in WDR62 as cause in 2 sibs
Regalado et al.435 | 2011 | Circ Res | Thoracic aortic aneurysms leading to acute aortic dissection | AD | Identified SMAD3 as cause
Dickinson et al.436 | 2011 | Blood | Dendritic cell, monocyte, B and NK lymphoid deficiency | AD | Identified GATA2 as cause in 4 unrelated affecteds
Hor et al.437 | 2011 | Am J Hum Genet | Familial narcolepsy with cataplexy | AR | Combined linkage with exome sequencing to identify MOG as cause
Marti-Masso et al.438 | 2011 | Hum Genet | Early-onset generalized dystonia | AR | Identified GCDH as cause in 2 affected siblings
Tariq et al.439 | 2011 | Genome Biol | Heterotaxy | AR | Combined homozygosity mapping with exome sequencing to identify SHROOM3 as candidate cause
Takata et al.440 | 2011 | Genome Biol | Progressive external ophthalmoplegia | AR | Combined homozygosity mapping with exome sequencing to identify RRM2B as cause in patient from consanguineous family
Theis et al.441 | 2011 | Circ Cardiovasc Genet | Dilated cardiomyopathy | AR | Combined homozygosity mapping with exome sequencing to identify GATAD1 mutations in 2 affected sisters
Pierson et al.442 | 2011 | PLoS Genet | Spastic ataxia-neuropathy syndrome | AR | Identified AFG3L2 as cause in 2 brothers of consanguineous family
Al Badr et al.443 | 2011 | J Pediatr Urol | Ochoa (urofacial) syndrome | AR | Combined homozygosity mapping with exome sequencing to identify HPSE2 as cause in child of consanguineous parents
Cullinane et al.444 | 2011 | J Invest Dermatol | Oculocutaneous albinism and neutropenia | AR | Combined homozygosity mapping with exome sequencing to identify two candidate genes (SLC45A2 and G6PC3)
Ovunc et al.445 | 2011 | J Am Soc Nephrol | Intermittent nephrotic-range proteinuria | AR | Identified CUBN as cause in 2 sibs of consanguineous parents
Bowne et al.446 | 2011 | Eur J Hum Genet | Retinitis pigmentosa with choroidal involvement | AD | Combined linkage analysis with exome sequencing to identify RPE65 as cause
Kitamura et al.447 | 2011 | J Clin Invest | Autoinflammation and lipodystrophy | AR | Identified PSMB8 as cause in patients from 2 consanguineous families
Tyynismaa et al.448 | 2011 | Hum Mol Genet | Progressive external ophthalmoplegia with multiple mitochondrial DNA deletions | AR | Identified TK2 as cause
Bjursell et al.449 | 2011 | Am J Hum Genet | Hypermethioninemia | AR | Identified ADK as cause
Zangen et al.450 | 2011 | Am J Hum Genet | XX female gonadal dysgenesis | AR | Combined homozygosity mapping with exome sequencing to identify PSMC3IP/HOP2 as cause
Galmiche et al.451 | 2011 | Hum Mutat | Mitochondrial cardiomyopathy | AR | Identified compound hets in MRPL3 as cause in 4 affected sibs
Bredrup et al.452 | 2011 | Am J Hum Genet | Ciliopathies with skeletal anomalies with renal insufficiency | AR | Identified compound hets in WDR19 as cause
Saitsu et al.453 | 2011 | Am J Hum Genet | Hypomyelinating leukoencephalopathy | AR | Identified POLR3A and POLR3B as cause
Clayton-Smith et al.454 | 2011 | Am J Hum Genet | Say-Barber-Biesecker variant of Ohdo syndrome | Sporadic | Identified KAT6B as cause in 4 individuals
Aldahmesh et al.455 | 2011 | Am J Hum Genet | Ichthyosis, intellectual disability, and spastic quadriplegia | AR | Combined homozygosity mapping with exome sequencing to identify ELOVL4 as cause in 2 individuals
Chen et al.456 | 2011 | Nat Genet | Paroxysmal kinesigenic dyskinesia | AD | Identified PRRT2 as cause in 8 families
Logan et al.457 | 2011 | Nat Genet | Early onset myopathy, areflexia, respiratory distress and dysphagia (EMARDD) | AR | Identified MEGF10 as cause
Dauber et al.458 | 2011 | J Clin Endocrinol Metab | Severe infantile hypercalcemia | AR | Identified CYP24A1 as cause
Shamseldin et al.459 | 2011 | J Med Genet | Split hand and foot malformation | AR | Combined homozygosity mapping with exome sequencing in consanguineous family to identify DLX5 as cause
Sergouniotis et al.460 | 2011 | Am J Hum Genet | Benign Fleck Retina | AR | Combined homozygosity mapping with exome sequencing to identify PLA2G5 as cause
Berger et al.461 | 2011 | Mol Genet Metab | Early prenatal ventriculomegaly | AD | Combined linkage with exome sequencing to identify AIFM1 as cause
Bhat et al.462 | 2011 | Clin Genet | Primary microcephaly | AR | Identified WDR62 as cause
Wang et al.463 | 2011 | Hum Mutat | Leber congenital amaurosis | AR | Identified ALMS1, IQCB1, CNGA3, MYO7A as candidates
To date, most successful exome-based studies have been in monogenic Mendelian disorders. The first filtering
step in most studies was to exclude variants reported in dbSNP and any other exome data available to the
investigators. Depending on the version of dbSNP used and the number of available exomes, this step
usually eliminates at least half of the called variants. Furthermore, only variants that cause potential
protein change or truncation are included in the analysis (i.e. nonsynonymous single nucleotide variants;
splice-site variants; nonsense variants; and indels). At this point, studies diverge in their strategies,
depending on the nature of the condition being studied and the available samples for sequencing. A
notable characteristic of most exome studies published to date is that the diseases being investigated are
recessive (Table 3). This allows the application of homozygosity mapping or identity-by-descent analysis
to family data, or even simply filtering out all genes except those that have homozygous variants or
compound heterozygous variants in the exome samples. If multiple affected relatives and/or more than
one family are available for a rare, fairly homogeneous condition, this strategy is very successful at
narrowing down the list of candidate genes to just one or at most a few genes. Even if only one sample is
available, it is possible to identify the causative gene for an autosomal recessive condition using this
method. For autosomal dominant conditions, where the causative variant is heterozygous, the use of
family linkage data can aid in significantly reducing the number of candidate genes. Alternatively, for
diseases caused by mutations in a single gene in most affected cases, identifying genes with novel
variants in more than one subject also helps pinpoint the causal gene. Additional filtering by predicted
effect of variants (using such tools as Polyphen-2464 (http://genetics.bwh.harvard.edu/pph2/index.shtml)
and SIFT465 (http://sift.jcvi.org/)) and/or conservation scores (using PhyloP and GERP) may help in
ranking multiple candidate genes. However, those latter tools have their limitations and are often not
consistent in ascribing functional importance to the same variant. Some investigators have presented
statistical approaches to ranking variants and genes identified in such exome studies, but their applicability
and success rates are not yet known.468-470 Regardless, almost all studies provide further evidence in
support of the identified gene, by sequencing it in other patients with the disease and/or by presenting
functional analysis of the gene in the disease process.
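The recessive-model filter described above (retaining only genes with a homozygous variant or with two heterozygous variants) can be sketched as follows. This is a minimal illustration, assuming the input has already been restricted to rare protein-altering variants; the function names, genotype labels, and placeholder gene names are hypothetical.

```python
from collections import defaultdict


def recessive_candidates(variants):
    """Genes compatible with recessive inheritance in a single exome.

    `variants` is an iterable of (gene, genotype) pairs for rare protein-altering
    variants, where genotype is "hom" or "het". A gene qualifies if it carries a
    homozygous variant or at least two heterozygous variants (a possible compound
    heterozygote; without parental data the two variants may lie on the same allele).
    """
    hom_genes = set()
    het_counts = defaultdict(int)
    for gene, genotype in variants:
        if genotype == "hom":
            hom_genes.add(gene)
        elif genotype == "het":
            het_counts[gene] += 1
        else:
            raise ValueError(f"unknown genotype: {genotype!r}")
    compound_het = {gene for gene, n in het_counts.items() if n >= 2}
    return hom_genes | compound_het


def shared_recessive_candidates(exomes):
    """Genes that are recessive candidates in every affected individual."""
    per_sample = [recessive_candidates(v) for v in exomes]
    return set.intersection(*per_sample) if per_sample else set()


# Toy data for two affected siblings (gene names are placeholders)
sib1 = [("GENE_X", "het"), ("GENE_X", "het"), ("GENE_Y", "hom"), ("GENE_Z", "het")]
sib2 = [("GENE_X", "het"), ("GENE_X", "het"), ("GENE_Z", "het"), ("GENE_W", "hom")]
print(shared_recessive_candidates([sib1, sib2]))
```

Requiring the gene to qualify in every affected individual is what makes additional relatives or families so effective at shortening the candidate list.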
The somatic genomes of many cancers have been sequenced, shedding light on important genes and
pathways involved in driving tumorigenesis and/or metastasis. The earliest of those involved a laborious
approach of sequencing coding regions exon-by-exon using the conventional Sanger method.37,471-472 The
first cancer genome to be sequenced using next-generation platforms was that of a cytogenetically normal
acute myeloid leukemia (AML)473; subsequently, additional genomes have been sequenced for AML474-475, breast
cancer476-477, lung cancer478-479, uveal melanoma480, colorectal cancer481, multiple myeloma482, hepatocellular
carcinoma483, hairy cell leukemia484, diffuse large B-cell lymphoma485, pancreatic neuroendocrine
tumor486, and gastric cancer487. An international collaboration under the auspices of the International
Cancer Genome Consortium (ICGC)488 is currently undertaking a large-scale integrative analysis of 50
different cancer types and/or subtypes at the genomic, epigenomic, and transcriptomic levels.
In addition to investigating the somatic genome of cancer, germline sequencing can help identify genes
that predispose to Mendelian cancer syndromes and/or familial cancer clustering. The first such study
used paired germline-tumor exome data to identify PALB2 as a new FPC gene in a patient who did not
carry mutations in known predisposition genes.117 The paired tumor variants allowed Jones et al.117 to
narrow the search down to genes that had a germline truncating mutation as well as a somatic “second-
hit” deleterious mutation, thus excluding all but three genes, two of which were previously reported to
have truncating mutations in healthy controls. Resequencing the full PALB2 coding region in a cohort of
96 FPC subjects identified an additional three families with protein-truncating mutations in the gene,
whereas truncating mutations in PALB2 are rare in control populations, further supporting PALB2 as an
FPC predisposition gene. In addition, the function of PALB2, a partner of BRCA2, which is already
implicated in pancreatic tumorigenesis, lent further weight to this discovery.
Despite the success of this initial report, few familial and/or syndromic cancer exome studies have been
published to date. Two studies, investigating the cause of childhood classic Kaposi Sarcoma489 and
mosaic variegated aneuploidy490, were able to take advantage of apparently recessive inheritance to filter
the exome data and identify the causative genes. In the case of Kaposi Sarcoma, variants were filtered for
homozygosity, protein-altering effect, and absence in dbSNP129, 1000 Genomes, or 49 in-house exomes,
leaving only 1 splice-site variant and 11 missense variants. The splice-site variant affects a gene (STIM1)
that is also mutated in a recessive immunodeficiency syndrome, and given the previous link of Kaposi
Sarcoma to immunodeficiency, this was considered a strong candidate. The investigators of mosaic
variegated aneuploidy sequenced two siblings of non-consanguineous parents and attempted to identify a
gene with two loss-of-function mutations shared by both siblings (as compound heterozygotes).
Interestingly, they did not initially identify a single causal gene, but rather identified 12 genes with a
single loss-of-function mutation common to the siblings. Focusing on a gene with a putative functional
connection to the disease (CEP57, which has centrosomal localization), Snape et al. sequenced its full coding
region in both siblings and identified a second mutation, an 11-bp deletion that was not called in the exome data.
This highlights current limitations of sensitivity and specificity of exome analysis. Two additional
unrelated patients were also found to carry compound heterozygous mutations in CEP57.
Two studies of autosomal dominant hereditary cancer were able to harness the power of sequencing
multiple unrelated individuals or linkage analysis to narrow down the list of susceptibility gene
candidates. In a study of hereditary pheochromocytoma491, three unrelated patients were sequenced and
the variants filtered to only include heterozygous protein-altering mutations shared by all three subjects
and absent in dbSNP and 1000 Genomes data. This reduced the list of candidates to just two genes, of
which only one segregated with disease in the respective families (MAX). By demonstrating LOH at the
MAX locus and absence of MAX expression in tumors from the affected families, Comino-Mendez et al.491
presented strong evidence for the role of MAX as a tumor suppressor gene in pheochromocytoma.
Moreover, they identified five additional unrelated patients with mutations in this gene (2 truncating and 3
missense). To identify susceptibility genes for familial nodular Hodgkin’s lymphoma, Saarinen et al.492
used information from linkage analysis of a large family in conjunction with exome sequencing of one
family member to narrow the list of candidates with a deleterious mutation segregating in the affected
family members and not present in controls to one gene, NPAT, which carried a 2-bp deletion. Further
sequencing of this gene in other unrelated patients identified no other rare deleterious mutations in NPAT,
but they did find a common amino-acid deletion that seemed to be significantly more frequent in Hodgkin’s
patients than controls (4.2% vs. 1.1%, OR 4.11, p=0.018). Gene expression array demonstrated decreased NPAT
mRNA in carriers of the 2-bp deletion. These findings, in addition to the fact that NPAT shares a putative
promoter with another known tumor suppressor gene (ATM) and is thought to have a role in cell cycle
regulation, suggest that NPAT germline mutations predispose to nodular Hodgkin’s lymphoma.
One of the promises of whole-genome and exome sequencing is the power to bridge the gap occupied by
low-frequency, moderately penetrant variants in explaining disease heritability. Until recently, such variants
could not be identified by family-based studies (because they usually do not segregate with disease) or by
genome-wide association studies based on common SNPs.493 Such variants have been identified in the
past through candidate gene sequencing in cases, and require relatively large case-control studies to
demonstrate significant enrichment in the disease population (e.g. BRIP1 in prostate cancer494; CHEK2 in
breast cancer495). With the increasing number of exomes or whole genomes being sequenced, it is
possible to capture those functional variants on a genome-wide level. For example, a recent report
described whole-genome sequencing of approximately 450 Icelandic individuals followed by imputation of the
genotypes of detected variants in a large cohort of Icelandic ovarian cancer cases and controls, which
identified the most significant association to be with an intronic SNP in BRIP1. Subsequent fine-mapping
of the associated regions revealed a 2-bp deletion in exon 14 of BRIP1 that was in partial linkage
disequilibrium with the intronic SNP, and which had an odds ratio > 8 for ovarian cancer. Alternatively,
exome or whole-genome data itself may reveal the functional variant directly in family-based studies,
although the challenge lies in determining which non-segregating rare/low-frequency variant is causally
important. In a recent study by Yokoyama et al.496, whole-genome sequencing of a single member of a
large familial melanoma kindred identified over 400 germline variants, one of which was a missense
variant in a gene called MITF. Genotyping of this variant in the remaining family members demonstrated
non-segregation (only three of eight affected members carried the variant). However, due to interest in
the previously reported role of MITF in development of melanoma, the investigators genotyped this
variant in two large case-control cohorts and identified a significantly elevated frequency of the MITF
variant in cases, with an odds ratio of approximately 2, supporting the hypothesis that this low-frequency
variant is enriched in familial cases and confers a moderate risk of melanoma. In a similar study by Park
et al.497 in which members of four early-onset, multiple-case breast cancer pedigrees underwent exome
sequencing, a functionally interesting gene (FAN1) with two deleterious-predicted missense variants in
two families (the variant segregated with disease in one family but not in the other) was identified, but
Park et al.497 reported no statistically significant association of the variant with breast cancer in two
case-control analyses.
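The case-control comparisons discussed in this section rest on the odds ratio for carrying a variant. The sketch below shows the standard calculation with a Woolf (log-scale) confidence interval; the function name and the counts are hypothetical, chosen to give an odds ratio near 2, and are not taken from any of the cited studies.

```python
import math


def odds_ratio_with_ci(case_carriers, case_total, control_carriers, control_total, z=1.96):
    """Odds ratio for carrier status in cases vs. controls, with a Woolf interval."""
    a = case_carriers
    b = case_total - case_carriers
    c = control_carriers
    d = control_total - control_carriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts: 40 carriers among 2,000 cases, 20 among 2,000 controls
estimate, lower, upper = odds_ratio_with_ci(40, 2000, 20, 2000)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

With carrier frequencies of only 1-2%, the width of the interval is driven almost entirely by the small carrier counts, which is why low-frequency variants of moderate effect require large cohorts.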
Chapter 2 - Loss of Heterozygosity at BRCA1 Locus in Pancreatic Adenocarcinoma
The contents of this chapter have been published in Human Genetics 2008 Oct;124(3):271-8.
PMID: 18762988 [http://www.springerlink.com/content/9723278j89678256/] The final publication is
available at www.springerlink.com. (I am first author).
1. Abstract
Although the association of germline BRCA2 mutations with pancreatic adenocarcinoma is well
established, the role of BRCA1 mutations is less clear. We hypothesized that loss of heterozygosity at the
BRCA1 locus occurs in pancreatic cancers of germline BRCA1 mutation carriers, acting as a “second-hit”
that contributes to tumorigenesis. Seven germline BRCA1 mutation carriers with pancreatic
adenocarcinoma and 9 patients with sporadic pancreatic cancer were identified from clinic- and
population-based registries. DNA was extracted from paraffin-embedded tumor and non-tumor samples.
Three polymorphic microsatellite markers for the BRCA1 gene, and an internal control marker on
chromosome 16p, were selected to test for loss of heterozygosity. Tumor DNA demonstrating loss of
heterozygosity in BRCA1 mutation carriers was sequenced, to identify the retained allele. The loss of
heterozygosity rate for the control marker was 20%, an expected baseline frequency. Loss of
heterozygosity at the BRCA1 locus was 5/7 (71%) in BRCA1 mutation carriers; tumor DNA was available
for sequencing in 4/5 cases, and three demonstrated loss of the wild-type allele. Only 1/9 (11%) sporadic
cases demonstrated loss of heterozygosity at the BRCA1 locus. Loss of heterozygosity occurs frequently
in pancreatic cancers of germline BRCA1 mutation carriers, with loss of the wild-type allele, and
infrequently in sporadic cancer cases. Therefore, BRCA1 germline mutations likely predispose to the
development of pancreatic cancer, and individuals with these mutations may be considered for pancreas
cancer screening programs.
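The contrast between the two LOH rates (5/7 in carriers vs. 1/9 in sporadic cases) can be examined with Fisher's exact test, which suits 2x2 tables with small counts. The sketch below is an illustrative calculation on the counts quoted above; the choice of test and the function name are assumptions, not a description of the published analysis.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_observed * tolerance)


# Rows: BRCA1 mutation carriers, sporadic cases; columns: LOH, no LOH
p_value = fisher_exact_two_sided(5, 2, 1, 8)
print(f"two-sided p = {p_value:.3f}")
```

For these counts the sketch gives a two-sided p of about 0.035. With only 16 tumors in total, a change in a single LOH call would alter this value considerably.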
2. Introduction
As discussed in the Literature Review section of the thesis, identifying genes implicated in predisposition
to FPC is important for developing early-detection and prevention strategies as well as more effective
therapeutic options. Several hereditary syndromes due to mutations in tumor suppressor/caretaker genes
cause an elevated risk of pancreatic cancer. These syndromes contribute to a small proportion of familial
cases, and it is expected that other genes play an important role136. Both BRCA1 and BRCA2 were
initially identified as highly penetrant genes in familial breast and ovarian cancer, but germline mutations
of these genes are also associated with several other malignancies498. Studies of cancer risks in BRCA2
germline carriers have reported a relative risk of 3.51 – 6.61 for pancreatic cancer498-500, and it is
estimated that BRCA2 mutations contribute to 6-19% of FPC cases103,121,501,502. Molecular genetic studies
have confirmed the role of BRCA2 inactivation in the development of pancreatic cancer115,503-507.
As with BRCA2, clinic-based studies have suggested an increased risk of pancreatic cancer in germline
BRCA1 mutation carriers508,509. There is also evidence for downregulation of BRCA1 expression in
sporadic pancreatic cancer tumors510. However, the aforementioned levels of evidence are much weaker
for BRCA1 compared to BRCA2. Inactivation of the wild-type BRCA1 allele in breast and ovarian cancer
most commonly occurs by loss of heterozygosity (LOH)511. We hypothesized that LOH at the BRCA1
locus occurs in pancreatic cancers of germline BRCA1 mutation carriers, acting as a “second-hit” event
contributing to pancreatic tumorigenesis. In this study, we compared the rate of LOH at BRCA1 in
pancreatic tumors in mutation-carriers and patients with sporadic pancreatic cancers.
3. Materials & Methods
Ethical approval for this study was obtained from the Mount Sinai Hospital Research Ethics Board.
Microdissection and DNA extraction from formalin-fixed paraffin-embedded (FFPE) tissue, primer
design and optimization for sequencing, PCR amplification, and interpretation of genotyping and
sequencing results were performed by W. Al-Sukhni. Microsatellite genotyping and Sanger sequencing
were performed by the Analytical Genetics Technology Centre (AGTC) at Princess Margaret Hospital,
Toronto.
3.1 Tissue Specimens
Germline BRCA1 mutation carriers were identified by: (1) clinic-based recruitment of incident cases of
pancreatic cancer at the University of Toronto, as described in a previous report by our group121; and (2)
population-based recruitment of pancreatic cancer cases through the Ontario Pancreas Cancer Study
(OPCS)45. BRCA1 testing was performed at provincial labs in most cases due to a strong history of
breast/ovarian cancer; in one case, a BRCA1 mutation was identified by our research group in a series of 102
unselected hereditary pancreatic cancer patients screened for several germline mutations. This latter
mutation was subsequently confirmed by testing in an offsite provincial lab121. All seven mutation
carriers included in this study had pathologically-confirmed adenocarcinoma of the pancreas. Pancreatic
tumor resection or biopsy specimens were obtained for all patients. Non-tumor tissue and/or blood
samples were also obtained for each patient. Microdissected, formalin-fixed paraffin-embedded samples
were prepared from each tumor (≥ 70% cellularity) and non-tumor specimen, and DNA was extracted
using the QIAamp DNA FFPE Tissue Kit, as per the manufacturer’s recommendations (QIAGEN Inc.,
Mississauga, Ontario, Canada). Blood lymphocyte DNA was extracted using standard Ficoll-Paque
technique, as per the manufacturer’s recommendations (Amersham Biosciences, Baie d’Urfe, Quebec,
Canada).
Nine patients recruited through the clinic-based Familial Gastrointestinal Cancer Registry (FGICR)121
with newly-diagnosed pancreatic cancer and no known BRCA1 germline mutations or family history of
breast/ovarian cancer syndrome were selected for comparison. Tumor and non-tumor/lymphocyte DNA was
similarly extracted for each patient.
All patients were deceased before this study was performed; tissue specimens were previously banked for
research after obtaining consent from patients or from family members.
3.2 LOH Assay
Three microsatellite markers linked to the BRCA1 locus were used for LOH analysis: D17S855,
D17S1322, and D17S579. The first two markers are intragenic. (See Figure 1 for locations of
microsatellite markers on chromosome 17.)
Figure 1 - Location of BRCA1 microsatellite markers on chromosome 17
Figure 1 Legend: D17S1322 and D17S855 are intragenic (in introns 19 and 20, respectively), while
D17S579 is distal to BRCA1. The distance in base pairs between markers is identified.
Primer pair sequences were published in previous studies576-578, and primers were purchased from
Invitrogen Canada Inc. (Burlington, Ontario, Canada). Primer sequences are listed in Appendix Table S1.
A microsatellite marker on 16p (D16S2616) was selected as an internal control. The expected allelic loss
rate on this chromosomal arm in sporadic pancreatic cancer and FPC is 20-25%.181,182
For each primer pair, a 5’ 6-FAM-labeled forward primer and an unlabeled reverse primer were used.
Platinum Taq DNA Polymerase from Invitrogen was used for polymerase chain reaction amplification.
For each reaction, 20-25ng of genomic DNA were amplified in 25 µL reaction volume containing 10X
PCR buffer (Invitrogen Canada Inc.), 2mM MgCl2, 0.5µL of 10mM dNTP, 1-1.5µL of 10mM primers,
and 0.2µL of Invitrogen Platinum Taq DNA Polymerase. Initial denaturation was performed at 95°C x 2
minutes; followed by 35 cycles of (a) 94°C x 30 seconds, (b) primer-specific annealing temperature x 30
seconds, and (c) 72°C x 30 seconds; and final extension at 72°C x 5 minutes.
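The cycling conditions above can be written down as a small data structure, which makes the programmed hold time easy to check. This is only an illustrative sketch: the names `Step`, `CYCLE`, and `total_hold_seconds` are hypothetical, the annealing temperature is left unset because it is primer-specific, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: Optional[float]  # None = primer-specific annealing temperature
    seconds: int


# Values taken from the protocol text; the structure itself is an assumption.
INITIAL_DENATURATION = Step("initial denaturation", 95.0, 2 * 60)
CYCLE = (
    Step("denaturation", 94.0, 30),
    Step("annealing", None, 30),
    Step("extension", 72.0, 30),
)
N_CYCLES = 35
FINAL_EXTENSION = Step("final extension", 72.0, 5 * 60)


def total_hold_seconds() -> int:
    """Sum of programmed hold times, excluding ramping between steps."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL_DENATURATION.seconds + N_CYCLES * per_cycle + FINAL_EXTENSION.seconds


if __name__ == "__main__":
    print(f"Programmed hold time: {total_hold_seconds() / 60:.1f} min")
```

Actual run time on a thermal cycler would be longer, since ramp rates depend on the instrument.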
Automated DNA fragment analysis was performed using the ABI 3100 Prism sequencer (Applied
Biosystems), and GeneMapper Software version 3.7 was used to measure the allelic peak intensities. A
case was informative for a particular marker if two distinct alleles were amplified in the non-
tumor/lymphocyte DNA. Allelic peak ratio was calculated in informative cases as (T1/T2)/(N1/N2),
where T1, N1 = peak intensities for larger alleles; T2, N2 = peak intensities for smaller alleles; T = tumor
DNA; N = non-tumor or lymphocyte DNA (Figure 2).
Figure 2 - Sample electropherogram of microsatellite marker fragment analysis
Figure 2 Legend: T=tumor DNA; N=non-tumor/lymphocyte DNA; T1,N1=peak intensities of larger alleles;
T2,N2=peak intensities of smaller alleles; Allelic peak ratio = (T1/T2)/(N1/N2); LOH = allelic peak ratio < 0.70 or > 1.43
An allelic ratio of < 0.70 or > 1.43 was considered evidence of LOH in tumor DNA. Results were
confirmed with at least 2 separate PCRs.
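The allelic peak ratio and the LOH thresholds can be expressed in a few lines of code. The sketch below is illustrative only: the function names and the peak intensities in the usage example are hypothetical, and the replicate rule in `confirmed_loh` (at least two PCRs, all calling LOH) is an assumption about how "confirmed" might be operationalized, not a description of the GeneMapper workflow. Note that the upper threshold 1.43 is approximately 1/0.70, so the cut-offs are symmetric with respect to which allele is lost.

```python
def allelic_peak_ratio(t1: float, t2: float, n1: float, n2: float) -> float:
    """Return (T1/T2)/(N1/N2) for an informative (heterozygous) case.

    t1, n1: peak intensities of the larger allele in tumor / non-tumor DNA
    t2, n2: peak intensities of the smaller allele in tumor / non-tumor DNA
    """
    if min(t1, t2, n1, n2) <= 0:
        raise ValueError("all four peak intensities must be positive")
    return (t1 / t2) / (n1 / n2)


def shows_loh(ratio: float, lower: float = 0.70, upper: float = 1.43) -> bool:
    """LOH is called when the ratio falls outside the [lower, upper] window."""
    return ratio < lower or ratio > upper


def confirmed_loh(replicate_ratios: list) -> bool:
    """Assumed rule: at least two separate PCRs, each of which calls LOH."""
    return len(replicate_ratios) >= 2 and all(shows_loh(r) for r in replicate_ratios)


if __name__ == "__main__":
    # Hypothetical peak heights: the larger allele is reduced in the tumor.
    ratio = allelic_peak_ratio(t1=1200, t2=3000, n1=2500, n2=2600)
    print(round(ratio, 2), shows_loh(ratio))
```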
3.3 Tumor DNA Sequencing in BRCA1 Mutation Carriers
For carriers of germline BRCA1 mutations who demonstrated LOH in their pancreatic tumors, the DNA
of the pancreatic cancer tissue was sequenced to determine if the wild-type or mutated allele was retained.
Since paraffin-extracted DNA was being amplified, unique primers were designed for each BRCA1
mutation to obtain amplification products < 110 bp. Appendix Table S2 lists primer sequences. Non-
tumor/lymphocyte DNA was sequenced for comparison for each case. Unlabeled primers were purchased
from Invitrogen. The ABI Prism 3130 XL Genetic Analyzer (Applied Biosystems) was used to perform
automated sequencing. The forward primer was used for sequencing, and results were confirmed by
sequencing two independently amplified PCR products for each sample.
4. Results
4.1 Patient Characteristics
Table 4 compares the characteristics of BRCA1 mutation carriers and sporadic pancreatic cancer patients.
Table 4 - Characteristics of BRCA1 mutation carriers and sporadic pancreatic cancer patients
Patient Characteristic | BRCA1 Mutation Carriers (N=7) | Sporadic Pancreatic Cancer (N=9)
Gender (F:M) | 0:7 | 4:5
Age at diagnosis with pancreatic cancer, years (mean +/- SD) | 65.4 +/- 12.2 | 63.6 +/- 10.9
Ethnicity, n (%) | |
  Ashkenazi Jewish | 5 (71%) | 0
  Caucasian | 2 (29%) | 8 (89%)
  Other | 0 | 1 (11%)
Source of specimen, n (%) | |
  Whipple resection | 2 (29%) | 6 (67%)
  Biopsy | 4 (57%) | 3 (33%)
  Autopsy | 1 (14%) | 0
BRCA1 mutation, n | |
  5382insC | 3 | N/A
  185delAG | 3 | N/A
  2318delG | 1 | N/A
Families with BRCA1 mutations demonstrated a history of breast +/- ovarian cancer, and four families
also had ≥ 2 pancreatic cancer cases (one of these cases has been previously reported)121. Most BRCA1
mutation carriers were of Ashkenazi Jewish descent, whereas we excluded patients with Jewish ancestry
from the sporadic cancer group due to the elevated prevalence of BRCA1 mutations in this population.
The two founder Ashkenazi Jewish BRCA1 mutations, 5382insC and 185delAG, were present in the
majority of mutation carriers (6/7 families). Table 5 summarizes the pedigree information for the seven
mutation carriers.
Table 5 - Pedigree summary for BRCA1 mutation carriers
BRCA1 mutation carrier ID | Ethnicity | Mutation | Age at diagnosis of PC (years) | Number of relatives with PC | Number of relatives with BC and/or OC | Tumors at other sites
BRC-1 | AJ | 5382insC | 79 | 2 (brother, 1st cousin) | 6 | CRC
BRC-2 | Caucasian | 5382insC | 57 | 1 (1st cousin) | 5 | -
BRC-3* | AJ | 5382insC | 52 | 1 (son) | 1 (sister; dx age 42) | -
BRC-4 | AJ | 185delAG | 77 | 0 | 1 (daughter; dx age 39) | Prostate
BRC-5 | AJ | 185delAG | 76 | 0 | 3 | Prostate
BRC-6 | Caucasian | 2318delG | 51 | 0 | 6 | -
BRC-7 | AJ | 185delAG | 66 | 2 (sister, 1st cousin) | 3 | -
AJ = Ashkenazi Jewish; PC = pancreatic cancer; BC = breast cancer; OC = ovarian cancer; CRC = colorectal cancer
*This patient did not have molecular testing to confirm mutation; his brother and son both have confirmed 5382insC mutation
The mean age at diagnosis was similar for the two groups: 65.4 years in mutation-carriers vs. 63.6 years
in sporadic patients. Three BRCA1 mutation carriers had a history of other malignancies: two prostate
cancer and one colorectal cancer. No sporadic cancer patient had a history of multiple primary tumors.
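As a quick consistency check, the mean and standard deviation reported for the mutation carriers can be recomputed from the ages at diagnosis listed in Table 5. The sketch below assumes the reported SD is the sample (n-1) standard deviation; the variable names are illustrative.

```python
from statistics import mean, stdev

# Ages at diagnosis of pancreatic cancer for BRC-1 to BRC-7 (Table 5).
CARRIER_AGES = [79, 57, 52, 77, 76, 51, 66]

carrier_mean = mean(CARRIER_AGES)
carrier_sd = stdev(CARRIER_AGES)  # sample standard deviation (n - 1 denominator)

if __name__ == "__main__":
    print(f"{carrier_mean:.1f} +/- {carrier_sd:.1f} years")
```

The individual ages of the sporadic patients are not listed in this chapter, so the same check cannot be repeated for that group.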
4.2 LOH Analysis
All cases (BRCA1 mutation carriers and sporadic cancers) were informative for at least one BRCA1
marker. D17S855 was informative in 11/16 (69%) cases; D17S1322 and D17S579 were each informative
in 13/16 (81%) cases. The internal control marker D16S2616 was informative in 10/16 (63%) of all
cases. Two BRCA1 mutation carriers did not have enough tumor DNA to test for LOH with D16S2616;
tumor DNA from one sporadic cancer patient could not be amplified when testing for LOH with
D17S855.
Table 6 shows the LOH results for each case with each marker.
Table 6 - LOH results for BRCA1 mutation carriers and sporadic pancreatic cancer cases
Marker | BRC 1 | BRC 2 | BRC 3 | BRC 4 | BRC 5 | BRC 6 | BRC 7 | SPR 1 | SPR 2 | SPR 3 | SPR 4 | SPR 5 | SPR 6 | SPR 7 | SPR 8 | SPR 9
D17S855 | + | + | + | U | + | + | U | + | U | * | - | - | U | - | - | -
D17S1322 | - | + | U | - | U | + | - | + | - | - | - | - | - | U | - | -
D17S579 | - | U | U | - | + | + | - | + | U | - | - | - | - | - | - | -
D16S2616 | U | + | * | - | * | U | - | - | - | + | - | - | - | - | U | U
BRC 1-7 = BRCA1 mutation carriers; SPR 1-9 = sporadic pancreatic cancer cases
(+) = LOH [allelic peak ratio < 0.70 or > 1.43]
(-) = No LOH [0.70 ≤ allelic peak ratio ≤ 1.43]
(U) = uninformative sample (homozygous at the tested microsatellite marker in germline DNA)
(*) = DNA unavailable for amplification/DNA did not amplify
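The per-marker and per-group counts quoted in the text can be tallied directly from the calls in Table 6. The sketch below encodes the table as strings (7 carriers followed by 9 sporadic cases per marker); the names and layout are illustrative, and a call is treated as usable only if it is "+" or "-", which is an assumption about how informative cases were counted.

```python
# Calls copied from Table 6: BRC 1-7 first, then SPR 1-9.
TABLE6 = {
    "D17S855":  "+ + + U + + U + U * - - U - - -",
    "D17S1322": "- + U - U + - + - - - - - U - -",
    "D17S579":  "- U U - + + - + U - - - - - - -",
    "D16S2616": "U + * - * U - - - + - - - - U U",
}
BRCA1_MARKERS = ("D17S855", "D17S1322", "D17S579")
CONTROL_MARKER = "D16S2616"
N_CARRIERS = 7


def calls(marker: str) -> list:
    return TABLE6[marker].split()


def usable_count(marker: str) -> int:
    """Cases with a scorable result (heterozygous and amplified)."""
    return sum(c in ("+", "-") for c in calls(marker))


def loh_count(marker: str) -> int:
    return calls(marker).count("+")


def cases_with_brca1_loh() -> tuple:
    """Number of (carriers, sporadic cases) with LOH at one or more BRCA1 markers."""
    n_cases = len(calls(BRCA1_MARKERS[0]))
    flags = [any(calls(m)[i] == "+" for m in BRCA1_MARKERS) for i in range(n_cases)]
    return sum(flags[:N_CARRIERS]), sum(flags[N_CARRIERS:])


if __name__ == "__main__":
    for marker in TABLE6:
        print(marker, usable_count(marker), loh_count(marker))
    print(cases_with_brca1_loh())
```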
Ten cases in total were successfully tested with D16S2616, and only 2/10 (20%) demonstrated LOH.
Five of seven (71%) BRCA1 mutation carriers demonstrated LOH with at least one marker, whereas only
one of nine (11%) sporadic cancer cases demonstrated LOH with any BRCA1 marker (p = 0.035, 2-tailed
Fisher’s Exact test). In four of the five BRCA1-mutated cases with LOH, the allelic peak ratio was < 0.5
or > 2.0. (See Figure 3 for representative genotyping results).
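The comparison of 5/7 versus 1/9 can be reproduced with a two-tailed Fisher's Exact test on the corresponding 2x2 table. The sketch below is a minimal standard-library implementation written for illustration (the function name is hypothetical, and it is not the software used in the study); it sums the hypergeometric probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's Exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x events in row 1 with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Rows: BRCA1 mutation carriers, sporadic cases; columns: LOH, no LOH.
    p = fisher_exact_two_sided(5, 2, 1, 8)
    print(f"p = {p:.3f}")
```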
Figure 3 - Three representative matched-pair electropherograms for microsatellite LOH
Figure 3 Legend: T=tumor DNA; N=non-tumor DNA. (a) and (b) represent LOH; (c) represents no LOH
The histopathologies of pancreatic tumors from BRCA1 mutation carriers were moderately- and poorly-
differentiated ductal adenocarcinoma, with no distinguishing pathologic characteristics of tumors with
LOH compared to tumors without LOH.
4.3 Sequencing to Identify Retained Allele in LOH Tumors
Four of five BRCA1-mutation carriers demonstrating LOH had sufficient tumor DNA for sequencing.
Three cases (BRC-1, BRC-2, and BRC-3) had the 5382insC mutation, and one (BRC-6) the 2318delG
mutation. Three of four sequenced cases (BRC-2, BRC-3, and BRC-6) demonstrated loss of or decrease
in the wild-type allele, while BRC-1 was inconclusive. (Figure 4 demonstrates a sample sequencing result.)
Figure 4 - Representative sequencing result for an individual with 5382insC germline BRCA1 mutation
Figure 4 Legend: T=tumor DNA; N=non-tumor/lymphocyte DNA. The top panel demonstrates
sequencing of two alleles in non-tumor DNA (mutant and wild-type allele); the bottom panel demonstrates only the mutant allele sequence in tumor DNA of the same individual.
Of note, patient BRC-3, who did not have prior molecular confirmation of the germline mutation, was
successfully sequenced for the 5382insC mutation carried by his brother and son, confirming that he is a
carrier.
5. Discussion
This analysis sheds light, at the molecular level, on the putative role of BRCA1 in pancreatic cancer
tumorigenesis. The importance of LOH as a “second-hit” in tumorigenesis is well-established in many
cancers. Since BRCA1 inactivation occurs via LOH in the majority of breast and ovarian tumors in
BRCA1-mutation carriers, we hypothesized that LOH also plays a primary role in inactivation of BRCA1
in mutation-positive pancreatic cancer. Indeed, we found that the majority of our mutation-positive
pancreatic cancer subjects (5/7) did demonstrate LOH in tumor DNA. In comparison, we found that only
1/9 sporadic cancer patients demonstrated LOH at the BRCA1 locus in tumor DNA. It is possible that the
two mutation carriers without LOH had inactivation of their wild-type allele by epigenetic methylation of the
promoter; promoter hypermethylation of the wild-type allele in a minority of BRCA1 mutation-positive
breast tumors has been previously reported512. Due to the limitations of quantity and quality of our
paraffin-embedded specimens, we were not able to correlate LOH with decreased BRCA1 expression.
However, our sequencing results did confirm loss of wild-type in most of the cases with LOH, suggesting
that only the truncated protein product from the mutated allele would be expressed in those cases.
The link between BRCA2 mutations and pancreatic cancer is well-established, and most authors recommend
including this gene in mutational screening for high-risk pancreatic cancer individuals and their relatives.
However, the contribution of germline BRCA1 mutations to increased risk of pancreatic cancer is less
clear. Both BRCA1 and BRCA2 have important roles in the repair of double-stranded DNA breaks.513 A
number of anecdotal reports have described pancreatic cancer in association with BRCA1 mutations.514,515
Our group previously identified 38 individuals from a group of 102 pancreatic cancer patients who were
considered to have intermediate/high-risk families, of whom one Ashkenazi Jewish patient screened
positive for a deleterious BRCA1 mutation.121 A study by Tonin et al.516 screened 220 Ashkenazi Jewish
breast cancer families for BRCA1 and BRCA2 mutations, and reported pancreatic cancer in 11/91 families
with a BRCA1 mutation compared to 5 cases in 120 families without BRCA1 mutations. More recently,
Skudra et al.122 screened 90 consecutive Latvian patients presenting with pancreatic cancer and 640
controls for several germline BRCA1 mutations, including two Latvian founder mutations (5382insC,
4154delA) and two less common mutations (300T>G, 185delAG) in the BRCA1 gene. Four of 90 (4.4%)
pancreatic cancer patients were found to carry a BRCA1 mutation compared to 1/640 (0.15%) controls. It
was noted, however, that the rate of mutation in controls likely underestimates the true prevalence of the
founder mutations in the general Latvian population since control subjects were relatively older, hence
selecting against highly penetrant mutations.
Two large studies used family-based designs to study cancer risk at sites other than breast or ovary in
families with multiple breast/ovarian cancers or with young age of onset of breast cancer. There was some
overlap in the families used between the two studies, but different analytical methods were used.508,509,517
Both studies found a statistically significant association for pancreatic cancer, albeit lower than the
association with BRCA2: Brose et al.509 reported a three-fold increase in pancreatic cancer risk among
BRCA1 carriers (3.6%, compared to 1.3% estimated general population risk); Thompson et al.508 reported
a relative risk of 2.26 (95% CI 1.26-4.06) for developing pancreatic cancer in BRCA1 mutation carriers,
with a greater association in individuals diagnosed under age 65 (RR 3.10, 95% CI 1.43-6.70). One
limitation of these studies was the family-based design, which may overestimate cancer risks due to
possible confounding effects of other genetic and/or environmental factors shared by members of a
family. To circumvent this problem, Risch et al.498 performed a population-based study of 1171
unselected women from Ontario, Canada who presented with new-onset ovarian carcinoma. Subjects
were screened for BRCA1 and BRCA2 mutations, and information about other cancers in their first-degree
relatives was used to estimate cancer risk at other sites in mutation carriers, and compared to estimated
cancer incidence rates in Ontario. Seventy-five BRCA1 mutation carriers were identified, and a relative
risk of 3.1 was calculated for pancreatic cancer; however, this was not statistically significant (95% CI
0.45-21).
More recently (and subsequent to completion of our study), Ferrone et al.502 published an analysis of
unselected Ashkenazi Jewish patients who underwent pancreatic cancer resection and found no significant
increase in BRCA1 frequency relative to the general Ashkenazi population (1.3% vs. 1.1%); however, the
BRCA1 mutation rate was based on previous reports and not directly assessed in a control cohort in this
study, and the authors acknowledged that the small size (145 subjects) may have resulted in insufficient
power to detect a statistically significant difference. Axilbund et al.123 did not find carriers of BRCA1
mutations in 66 FPC patients (defined as having at least two additional relatives with pancreatic cancer),
but most of the subjects did not report Ashkenazi Jewish ancestry. In the non-Jewish North American
population, the estimated frequency of BRCA1 mutations is 1/500 to 1/800;518,519 this suggests that Axilbund
et al.’s study was underpowered to identify an association of BRCA1 with FPC unless the effect size was
at least 15-fold, a value exceeding the estimated risk of BRCA2. Kim et al.520 reported a statistically
significantly lower age of onset for pancreatic cancer in BRCA1-mutation carriers than in non-carriers.
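The intuition behind the power argument for a 66-patient series can be sketched numerically. The code below is a rough illustration under stated assumptions, not a reconstruction of any published power calculation: it treats patients as independent, applies the fold-enrichment directly to the background carrier frequency (a reasonable approximation only for rare variants), and uses the probability of observing at least one carrier as a simple stand-in for power. The function name is hypothetical.

```python
def prob_at_least_one_carrier(n_patients: int, base_freq: float, fold: float) -> float:
    """P(>= 1 carrier among n patients) if carrier frequency is fold * base_freq."""
    carrier_freq = min(1.0, fold * base_freq)
    return 1.0 - (1.0 - carrier_freq) ** n_patients


if __name__ == "__main__":
    n = 66  # FPC patients in the series discussed above
    for base in (1 / 500, 1 / 800):
        for fold in (1, 3, 15):
            p = prob_at_least_one_carrier(n, base, fold)
            print(f"base 1/{round(1 / base)}, {fold:>2}-fold: {p:.2f}")
```

Under these assumptions, a series of this size would usually contain no carriers at all at background frequency, and only a large enrichment makes observing one or more carriers likely.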
For our study, we identified seven unrelated individuals with pathologically-confirmed pancreatic
adenocarcinoma whose families have BRCA1 mutations. In all but one of these cases, a molecular
confirmation of the mutation was previously available. The patient without molecular confirmation had a
brother and son who carried the identical 5382insC mutation; we later confirmed the presence of the same
mutation in this patient when we sequenced his tumor DNA to identify the remaining allele. The age at
diagnosis of pancreatic cancer did not differ significantly between the mutation carriers and sporadic
cases; this is similar to findings of other studies.515,521 Though further studies are needed to definitively
determine if BRCA1 is associated with increased pancreatic cancer risk, current data suggest that the
penetrance of BRCA1 mutations for pancreatic cancer is lower than that of BRCA2.498 Moreover, some
studies have suggested that some pancreatic cancer patients with BRCA2 mutations may not have a family
history of breast or ovarian cancers.501,522 It is not clear if the same may be true for pancreatic cancer
patients with BRCA1 mutations; most studies to date have characterized families selected for breast or
ovarian cancer.
Possible sources of experimental artifact include contamination of microdissected tumor cells with
adjacent stromal cells and potential bias from the PCR-based microsatellite assay. Measures to reduce the
impact of such bias included using microdissected tumor samples with minimum 70% cellularity (as
identified by an experienced pathologist), and confirming PCR-based results with at least two separate
PCR experiments. Since FFPE-specimens often yield DNA of variable quality as a result of nucleic acid
cross-linking by the fixation process, we minimized potential bias from degraded DNA by selecting
primers for microsatellite markers that amplify small fragments (125-150bp). Due to the limitation of
available DNA, and the amplicon size restriction in selecting microsatellite markers, we were limited to
just three BRCA1 markers for our experiments. However, every sample produced informative results for
at least one marker, and most generated results for two or more markers. We also attempted to include an
internal control, an unrelated microsatellite marker at chromosome 16 with a previously reported LOH
frequency of 20-25%. Due to technical reasons and inadequate DNA for further testing, only three of the
seven familial samples successfully amplified this marker, with 1/3 demonstrating LOH. In comparison,
seven of nine sporadic cases amplified this internal control marker, with 1/7 showing LOH. Overall, 2/10
(20%) of samples showed LOH at this locus, consistent with previous reports. Although the inadequate
number of informative samples among the familial cases reduced the value of this control in our
comparison, our results remain valid given the confirmatory Sanger sequencing that demonstrated
decreased signal for the functional allele in tumors from samples that demonstrated LOH.
Our small sample size (seven germline BRCA1 mutation carriers with pancreatic cancer) reflects the
challenges inherent in studying a malignancy as lethal as pancreatic cancer, in which only 15% of cases
are resectable. To our knowledge, this is the first molecular genetic study investigating BRCA1 LOH in
pancreatic cancer of germline BRCA1 mutation carriers. Two previous studies have investigated BRCA1
in sporadic pancreatic tumors. Beger et al.510 used quantitative reverse-transcription PCR (qRT-PCR) and
immunohistochemistry antibody staining to analyze BRCA1 and BRCA2 gene expression in 13 normal
pancreas samples, 30 chronic pancreatitis samples, and 53 sporadic pancreatic adenocarcinomas. They
found decreased BRCA1, but not BRCA2, mRNA and protein expression in 50% of pancreatic cancer
samples, and also found decreased BRCA1 mRNA expression in chronic pancreatitis samples, whereas
normal expression was observed in normal pancreatic tissue. Correlation of these findings with clinical
information demonstrated worse 1-year survival in patients whose tumors had reduced BRCA1
expression, compared to patients with normal BRCA1 expression. Another study by Peng et al.523 found
that BRCA1 was frequently methylated in sporadic pancreatic adenocarcinoma as well as in ductal cells
showing inflammatory background without histologic change. The authors suggested that promoter
methylation of the BRCA1 gene may be the mechanism explaining the reduced gene expression reported
by Beger et al.510 in pancreatic cancer and in chronic pancreatitis. However, they noted heterogeneity of
methylation in different sections of the same tumor, and they did not directly measure gene expression
level, so it is not clear how promoter methylation impacted expression. Moreover, they found
methylation of BRCA1 even in normal ductal cells. Our study adds to the evidence for BRCA1 in
pancreatic tumorigenesis by specifically demonstrating an inactivating mechanism in the pancreatic tumor
DNA of BRCA1 mutation carriers, likely akin to the role of BRCA1 in breast and ovarian cancer
tumorigenesis.
Determining the association between BRCA1 and pancreatic cancer has diagnostic and therapeutic
implications. The implication of BRCA2 in pancreatic cancer has allowed incorporation of this gene in
mutational screening panels and identification of kindreds at risk; the same can be done for BRCA1. As
for treatment, current chemotherapeutic protocols for pancreatic cancer are based on 5-FU and
gemcitabine.524 Interestingly, in-vitro and in-vivo studies have found BRCA1-deficient tumors to be
particularly sensitive to certain chemotherapeutic agents that take advantage of the impaired DNA repair
mechanism that characterizes these tumors, such as cross-linking agents (e.g. Mitomycin C), type II
topoisomerase inhibitors (e.g. etoposide), and PARP1 (Poly ADP-ribose polymerase family, member 1)
inhibitors.525-527 Recently, case reports and small series have shown that patients with BRCA1 or BRCA2
mutations respond to such therapies.174,178,528,529,530
In conclusion, we demonstrate that LOH occurs at the BRCA1 locus in pancreatic cancers of BRCA1-
mutation carriers, suggesting that this gene is inactivated in these tumors and may play a role in
pancreatic tumorigenesis. Further research into the role of BRCA1 in pancreatic cancer is needed to
assess the expression of this gene in pre-invasive and invasive pancreatic lesions. Subjects with germline
BRCA1 mutations should be considered for inclusion in pancreas cancer screening programs, and they
may benefit from chemotherapies that target the DNA repair pathway.
Chapter 3 - Germline Genomic Copy Number Variation in Familial Pancreatic Cancer
The contents of this chapter have been published in Human Genetics 2012 Jun 5 (Epub ahead of print).
PMID: 22665139 [http://www.springerlink.com/content/6665070t28854647/]. The final publication is
available at www.springerlink.com. (I am first author).
1. Abstract
Adenocarcinoma of the pancreas is a significant cause of cancer mortality, and up to 10% of cases appear
to be familial. Heritable genomic copy number variants (CNVs) can modulate gene expression and
predispose to disease. We hypothesized that genes overlapped by rare germline genomic losses or gains
identified exclusively in pancreatic cancer patients from high-risk families are candidate FPC genes. A
total of 120 FPC cases and 1194 controls were genotyped on the Affymetrix 500K array, and 36 cases and
2357 controls were genotyped on the Affymetrix 6.0 array. Detection of CNVs was performed by
multiple computational algorithms and partially validated by quantitative PCR. We found no significant
difference in the germline CNV profiles of cases and controls. A total of 93 non-redundant FPC-specific
CNVs (53 losses and 40 gains) were identified in 50 cases, each CNV present in a single individual.
FPC-specific CNVs overlapped the coding region of 88 RefSeq genes. Several of these genes have been
reported to be differentially expressed and/or affected by copy number alterations in pancreatic
adenocarcinoma. Further investigation in high-risk subjects may elucidate the role of one or more of these
genes in genetic predisposition to pancreatic cancer.
2. Introduction
As illustrated in Chapter 1 of this thesis, a small proportion of familial pancreatic cancer cases can be
attributed to known hereditary cancer syndromes and their associated genes: Hereditary Breast and
Ovarian Cancer (HBOC; BRCA2/BRCA1/PALB2), Peutz-Jeghers Syndrome (PJS; STK11), Familial Atypical
Multiple Mole Melanoma (FAMMM; p16/CDKN2A), and Hereditary Pancreatitis (HP; PRSS1). However, most cases of
Familial Pancreatic Cancer (FPC) have an unknown genetic etiology.136 Segregation analysis of families
with multiple affected members suggests that FPC is caused by heritable alterations in at least one rare
“major gene”, likely in an autosomal dominant manner.161 Moreover, multiple case-control and cohort
studies have demonstrated that members of FPC families, particularly those with an affected first-degree
relative, have a significantly elevated lifetime risk of developing the disease (up to 32-56 fold).156,158,160
However, to date traditional methods of linkage analysis for identifying predisposition genes have met
with challenges in studying FPC, due in part to probable genetic heterogeneity as well as difficulty in
collecting DNA specimens on multiple affected members in a family due to the rapid mortality of the
disease.
Recently, it has become clear that submicroscopic copy number variants (CNVs) are prevalent throughout
all genomes, accounting for at least 1.2% of nucleotide variation between any two individuals.238 CNVs
have been linked to rare genomic disorders531 as well as common neurodevelopmental196, psychiatric532,
autoimmune533 and metabolic534 diseases. Some studies have suggested an association between common
CNVs and sporadic cancers (e.g. pancreatic cancer (6q13)344, neuroblastoma (1q21.1)340, prostate cancer
(2p24.3; 20p13; GSTT1)338,341,342, nasopharyngeal carcinoma (6p21.3)343, and endometrial cancer
(GSTT1)535). The recent paper by Huang et al.344 is the first to describe an association of a germline CNV
with pancreatic cancer risk: a common 10,379bp deletion at 6q13 was found to be higher in frequency in
sporadic pancreatic cancer patients compared to controls, with an odds ratio of 1.31 for 1-copy carriers
compared to 2-copy carriers. Interestingly, functional analysis of this non-genic deletion suggested that it
may be involved in long-range regulation of CDKN2B, an established tumor-suppressor gene.
In addition, it is well known that rare germline CNVs contribute to the genetic basis of familial cancer.
Indeed, large germline genomic rearrangements cause 15% of Familial Adenomatous Polyposis (APC
gene)311, 2% of breast and ovarian cancer (BRCA1 gene)536, and 5% of Lynch Syndrome (MSH2 & MLH1
genes)321 cases. In 1-3% of Lynch Syndrome patients, the causative mutation is a large heritable deletion
at the 3’ end of the TACSTD1 gene, which causes transcriptional read-through and epigenetic silencing of
the adjacent MSH2 gene.336 Furthermore, a report by Shlien et al.348 identified an elevated frequency of
germline CNVs in individuals with Li-Fraumeni syndrome (TP53 mutation), and suggested that the
increased predisposition to cancer in this syndrome may be proportional to the frequency of germline
CNVs, many of which overlap known cancer genes.
Since germline CNVs implicated in familial cancers to date are rare with relatively high penetrance, we
hypothesized that familial and young-onset pancreatic cancer patients have a distinctive germline
genomic copy number variation (CNV) profile compared to non-cancer controls and that tumor
suppressor genes or oncogenes predisposing to pancreatic cancer may be overlapped by one or more
CNVs that are detected exclusively in patients. Here we present an analysis of germline CNVs detected
in 120 high-risk pancreatic cancer patients and compare them to CNVs in a large cohort of unaffected
controls.
3. Materials & Methods
This study was approved by the Research Ethics Boards at Mount Sinai Hospital and University Health
Network in Toronto, Canada; Office for Human Research Studies at Dana Farber/Harvard Cancer Centre
in Boston, Massachusetts; Institutional Review Board at Mayo Clinic in Rochester, Minnesota;
Institutional Review Board at M.D. Anderson Cancer Centre in Houston, Texas; Office of Human
Subjects Research at Johns Hopkins University in Baltimore, Maryland; and Human Investigation
Committee at Karmanos Cancer Institute, Wayne State University in Detroit, Michigan.
DNA extraction from blood or EBV-transformed cell lines was performed by technicians at each
participating site and provided to W. Al-Sukhni. Genotyping of samples and ancestry verification on
STRUCTURE was performed by W. Al-Sukhni. Computational analysis of Affy 500K data on dChip,
CNAG, and Partek was performed by W. Al-Sukhni, with assistance from S. Joe in script-writing for
organization and filtration of data (as directed by W. Al-Sukhni). To standardize the analysis of Affy6.0
chips in the same manner used for the POPGEN and OHI controls, computational analysis of Affy6.0
data on Birdsuite and iPattern was performed by A. Lionel at TCAG. Filtration and annotation of all
CNV data was performed by W. Al-Sukhni. Validation of CNVs by qPCR was performed by W. Al-Sukhni
with technical assistance from N. Zwingerman, A. Gropper, and S. Moore. Breakpoint-mapping
of CNVs by qPCR and Sanger sequencing was performed entirely by W. Al-Sukhni. Comparison of case and
control CNVs and statistical analysis were performed by W. Al-Sukhni.
3.1 DNA extraction
DNA was extracted at each centre from either whole blood (white blood cells/lymphocytes) or EBV-transformed
cell lines. Cells were purified from whole blood using Ammonium Chloride-Tris lysis of red
blood cells. DNA was extracted using MaXtract Low Density tubes, which is an adaptation of the
standard organic solvent method of DNA extraction using phenol and chloroform. Purified DNA was
precipitated with 95% ethanol and dissolved in low TE buffer.
3.2 FPC cases recruitment
Genomic DNA was extracted from peripheral blood or EBV-transformed cell lines of 133 pancreatic
cancer patients from 131 high-risk families recruited by PACGENE (Pancreatic Cancer Genetic
Epidemiology Consortium; PI, G Petersen, Mayo)165, a six-centre consortium that recruits kindreds
containing two or more blood relatives affected with pancreatic cancer for genetic studies. Inclusion
criteria in the current study included: subjects with two or more affected relatives (“3+ FPC”; N=79);
subjects with only one affected relative diagnosed at age 49 years or younger (“2 FPC”; N=22); and
subjects without affected relatives who were diagnosed at age 49 years or younger (“single young”;
N=32). (Some of the families were reassigned based on updated information after analysis – see Results
section). We included young cases with no family history of pancreatic cancer because they may have de
novo mutations in the gene(s) of interest, although we acknowledge that the definition of FPC involves
more than one affected member in the family. Subjects were excluded if they carried known mutations or
were in families with syndromes which predispose to pancreatic cancer (BRCA2, BRCA1, p16/FAMMM,
STK11/PJS, PRSS1/HP, Lynch Syndrome). The majority of DNA samples were extracted from blood
(N=97) and the remaining samples were from EBV-transformed lymphoblast cell lines. (See Appendix Table
S3 (Excel sheet on attached CD) for details.)
3.3 Controls recruitment
Control samples of matched ancestry (> 95% of cases and controls reported Caucasian ancestry) were
obtained from two sources: 45 samples were healthy controls recruited by the Familial Gastrointestinal
Cancer Registry (FGICR)537 at Mount Sinai Hospital, Toronto, and 1,153 samples were recruited by the
Ontario Familial Colon Cancer Registry (OFCCR)538. Almost all control DNA samples were extracted
from blood (only 12 OFCCR controls were from lymphoblasts). (See Appendix Table S4 (Excel sheet on
attached CD) for details.)
In addition, we had access to CNV data for 1,234 controls recruited through the Ottawa Heart Institute
(OHI)539 and 1,123 controls of German descent recruited by the POPGEN project540. Most of the OHI
and POPGEN DNA samples were extracted from blood, and the platform for CNV detection was the
Affymetrix 6.0 array.
3.4 SNP genotyping
For primary CNV discovery, 128 cases and all 1,198 FGICR + OFCCR controls were genotyped at
approximately 500,000 genome-wide SNPs on the Affymetrix GeneChip Human Mapping 500K Array
(NspI and StyI chips) according to Affymetrix standard protocol. Genotyping of the cases and the 45 FGICR
controls was performed at The Centre for Applied Genomics (TCAG) in Toronto, while the 1,153
OFCCR controls were previously genotyped at Genome Quebec Innovation Centre as part of the
ARCTIC case-control colorectal cancer GWAS study. Briefly, whole genomic DNA was digested with
restriction enzyme (NspI or StyI) and ligated to universal adaptors, and adaptor-ligated fragments were
PCR-amplified with preference for 200bp-1,100bp size range. Subsequently, PCR amplicons were
fragmented, labeled, and hybridized to NspI or StyI chips. Chips were scanned using GeneChip Scanner
3000 7G, and Affymetrix GeneChip Command Console (AGCC) files were produced for further
processing. Intensity files (CEL) and genotype files (CHP) were converted from AGCC files using
GeneChip Operating Software (GCOS) and GeneChip Genotyping Analysis (GTYPE) software,
respectively. Genotype calls were made by Affymetrix Genotyping Console (GTC 2.1), which
implements the BRLMM genotype calling algorithm (Bayesian Robust Linear Model with Mahalanobis
distance classifier), using default settings (Score Threshold = 0.5, Block Size = 0, Prior Size = 10,000,
DM Threshold = 0.7).
GTC 2.1 performs a quality control (QC) analysis of the SNP genotype call rate, to estimate overall
quality of the chip hybridization, based on the Dynamic Model genotype calling algorithm. For 500K
arrays, Affymetrix considers QC < 93% call rate to suggest poor hybridization. However, QC call rate in
the range of 88-93% can also produce useable data for CNV analysis, in the experience of collaborators at
TCAG. Therefore, when we were unable to obtain rehybridized chips for a sample, we retained arrays
with QC call rate > 88% in the CNV analysis but inspected the raw calls made from those arrays to check
whether they appeared to be false.
A subset of the original FPC cohort (33 samples) plus five new cases (Appendix Table S3) were
genotyped on the Affymetrix 6.0 array according to standard protocol to validate CNVs detected on the
Affymetrix 500K array as well as detect new CNVs. Arrays meeting Affymetrix quality control
guidelines of Contrast QC > 0.4 were used for further analysis. The Affymetrix Power Tools platform
was used to extract normalized intensities for each array and inter-array intensity correlation was
calculated; arrays with average correlation of > 0.9 were considered suitable for joint analysis.
3.5 Ancestry verification
Subject ancestry was verified using STRUCTURE software
(http://pritch.bsd.uchicago.edu/structure.html), which infers population structure using genotype data of
unlinked markers541. We used 1,089 unlinked genome-wide autosomal SNPs that map to the Affymetrix
500K array (NspI and StyI chips), with differing minor allele frequencies across three major HapMap
populations (Caucasian (CEU), African (YRI), and Asian (CHB/JPT)). The observed alleles (major and
minor) at each SNP in HapMap populations were obtained using UCSC genome browser “Tables”
function. To determine the population cluster (assuming three ancestral populations), 270 unrelated
HapMap samples were used (90 CEU, 90 YRI, 90 CHB/JPT) as reference of known ancestry. Ancestries
were assigned using a coefficient of ancestry threshold > 0.9.
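The threshold rule above can be sketched as follows. This is a minimal illustration, not the STRUCTURE software itself: the function name, the dictionary layout, and the "admixed" label for samples below the threshold are assumptions made for the example, and the coefficients shown are hypothetical.

```python
def assign_ancestry(coefficients, threshold=0.9):
    """Assign a sample to the cluster whose ancestry coefficient exceeds
    the threshold; otherwise label it 'admixed' (assumed label)."""
    label, value = max(coefficients.items(), key=lambda item: item[1])
    return label if value > threshold else "admixed"


# Hypothetical coefficients for two samples (not data from this study).
sample_a = {"CEU": 0.97, "YRI": 0.02, "CHB/JPT": 0.01}
sample_b = {"CEU": 0.60, "YRI": 0.05, "CHB/JPT": 0.35}
print(assign_ancestry(sample_a), assign_ancestry(sample_b))
```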
3.6 CNV discovery
Figure 5 is a summary flow chart of the primary CNV discovery on the Affy500K arrays.
Figure 5 – Analysis of 500K arrays in FPC cases and controls
[Flow chart: 500K array analysis pipeline. 128 FPC cases and 45 FGICR controls (Affymetrix 500K SNP arrays, TCAG) and 1153 OFCCR controls (Affymetrix 500K SNP arrays, Genome Quebec) were each analyzed on dChip, CNAG, and Partek Genomics Suite (HMM). Overlapping CNVs were merged per sample and divided into low-confidence CNVs (single algorithm/chip) and high-confidence CNVs (≥2 algorithms or chips). 8 cases were excluded (noise, no longer FPC), leaving 120 cases; 4 controls were excluded (personal PC or family history suggests FPC), leaving 1194 controls. FPC-specific CNVs were identified by comparing the high-confidence sets of cases vs. controls.]
Figure 5 Legend: Cases and controls were analyzed in a parallel fashion on three independent computational algorithms. A high-confidence CNV set (based on support by at least two algorithms or chips) was obtained for each of cases and controls and compared.
Copy number at each SNP position was estimated using three validated Hidden Markov Model (HMM)-
based CNV-calling algorithms (dChip 2006542, CNAG 2.0543, and Partek Genomics Suite v6.3©). NspI
and StyI chips were analyzed separately for each individual. After conducting several trials of different
analysis approaches, we identified the following as the method that best addresses the noise level in our
data: for dChip and Partek, samples were analyzed in batches corresponding to the grouping of samples
during chip hybridization (to minimize “batch effect” differences in hybridization that may lead to false
differences in intensity between samples): FPC cases and FGICR controls were analyzed in two batches
(batch 1 contained 47 cases and 22 controls; batch 2 contained 81 cases and 23 controls); OFCCR
controls were analyzed on dChip and Partek in 10 batches of approximately 100 samples each. For
CNAG, use of a maximum number of samples improves CNV detection, so the full group of FPC cases
and FGICR controls (173 samples) was analyzed concurrently, while the OFCCR (ARCTIC) controls were analyzed in 6
random batches of approximately 200 samples each. Default analysis settings were used for each of the
computational programs: invariant-set probe normalization and hidden Markov model copy inference
method for dChip; “non-paired reference/test sample” category and “automated analysis” option for
CNAG; 2-probe minimum used for calling CNV on Partek Suite (HMM method). The Partek CNV
coordinates were based on hg18 genome build and were converted to hg17 to merge with dChip and/or
CNAG.
A loss was defined by two or more consecutive SNPs with estimated copy number of < 2; a gain was
defined by two or more consecutive SNPs with estimated copy number of > 2. CNVs whose size was
less than 1,100bp were excluded to avoid the bias of PCR artifact causing false calls (since the fragment
size of amplified fragments was 200-1,100bp). Losses larger than 2 Mb and gains larger than 7 Mb were
also excluded (the cut-off was based on the largest CNVs seen in cases, with intention of maximizing
sensitivity in detecting case CNVs while removing excessively large CNVs in controls that are likely
false calls and/or represent somatic events). CNVs that crossed the centromere were removed because
they were incompatible with chromosomal stability and expected to be false calls. For any given chip and
algorithm, if the number of CNVs (losses + gains) called in a sample exceeded 40 (after above filters),
that sample was eliminated from the analysis for that given algorithm and chip (i.e. considered too noisy).
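The filters described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `CNV` record, the function names, and the centromere coordinates are hypothetical (the coordinates are placeholders, not hg17 values), and "crossing the centromere" is read as a call that starts before the centromere and ends after it.

```python
from dataclasses import dataclass

MIN_SIZE_BP = 1_100        # calls smaller than this are excluded
MAX_LOSS_BP = 2_000_000    # losses larger than 2 Mb are excluded
MAX_GAIN_BP = 7_000_000    # gains larger than 7 Mb are excluded
MAX_CALLS = 40             # per sample, per chip, per algorithm


@dataclass(frozen=True)
class CNV:
    chrom: str
    start: int
    end: int
    kind: str  # "loss" or "gain"

    @property
    def size(self):
        return self.end - self.start


def passes_filters(cnv, centromeres):
    """Apply the size and centromere filters to a single call."""
    if cnv.size < MIN_SIZE_BP:
        return False
    max_size = MAX_LOSS_BP if cnv.kind == "loss" else MAX_GAIN_BP
    if cnv.size > max_size:
        return False
    cen_start, cen_end = centromeres[cnv.chrom]
    if cnv.start < cen_start and cnv.end > cen_end:
        return False  # call spans the centromere
    return True


def filter_sample(cnvs, centromeres):
    """Return the filtered calls, or None if the sample is too noisy
    (more than MAX_CALLS calls remain after filtering)."""
    kept = [c for c in cnvs if passes_filters(c, centromeres)]
    return None if len(kept) > MAX_CALLS else kept


# Placeholder centromere interval, for illustration only.
centromeres = {"chr1": (121_000_000, 124_000_000)}
calls = [CNV("chr1", 1_000_000, 1_050_000, "loss"),
         CNV("chr1", 5_000_000, 5_000_500, "gain")]
print(filter_sample(calls, centromeres))
```

Returning `None` for a noisy sample, rather than an empty list, keeps "no calls" distinct from "sample dropped for this algorithm and chip".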
For each sample on a given chip, CNVs identified by two or more algorithms with overlapping
breakpoints (same direction on all algorithms) were merged if the length of the overlap area corresponded to
at least 20% of the length of any of the overlapping CNVs (Figure 6).
Figure 6 – Criteria for merging CNVs
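The 20% overlap criterion can be illustrated with a short sketch. The function names and the interval representation are assumptions for the example. It reads "any of the overlapping CNVs" as "at least one of the two calls", and it takes the merged region to be the outermost boundaries of the two calls, which the text does not state explicitly.

```python
def overlap_length(a, b):
    """Number of shared base pairs between two (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def should_merge(a, b, min_fraction=0.2):
    """True if the overlap covers at least min_fraction of either call.
    Both calls are assumed to be on the same chromosome and in the same
    direction (both losses or both gains)."""
    shared = overlap_length(a, b)
    if shared == 0:
        return False
    return (shared >= min_fraction * (a[1] - a[0])
            or shared >= min_fraction * (b[1] - b[0]))


def merge(a, b):
    """Assumed merge rule: span the outermost boundaries of both calls."""
    return (min(a[0], b[0]), max(a[1], b[1]))


call_1 = (100_000, 200_000)
call_2 = (180_000, 1_000_000)
if should_merge(call_1, call_2):
    print(merge(call_1, call_2))
```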
For each sample, CNVs identified on both chips of the 500K array with overlapping breakpoints (same
direction on both chips) were merged if the length of the overlap area corresponded to at least 20% of the
length of either of the overlapping CNVs (Figure 6). “High-confidence calls” were identified as CNVs
called by at least two different algorithms and/or on both chips. Note that if a CNV was called by a different
algorithm on each chip, it was not considered “high-confidence”. For the purpose of identifying “CNV
loci”, overlapping CNVs in multiple samples were merged (using the above-described 20%
threshold).
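One way to read the high-confidence rule is sketched below. The function name and the representation of supporting calls as (algorithm, chip) pairs are assumptions for the example. A merged CNV is treated as high-confidence when two different algorithms support it on the same chip, or when the same algorithm supports it on both chips. Support from a different algorithm on each chip does not qualify.

```python
from collections import defaultdict


def is_high_confidence(calls):
    """calls: iterable of (algorithm, chip) pairs supporting one merged
    CNV in one sample, e.g. ("dChip", "NspI")."""
    algorithms_by_chip = defaultdict(set)
    chips_by_algorithm = defaultdict(set)
    for algorithm, chip in calls:
        algorithms_by_chip[chip].add(algorithm)
        chips_by_algorithm[algorithm].add(chip)
    two_algorithms_one_chip = any(
        len(algos) >= 2 for algos in algorithms_by_chip.values())
    one_algorithm_both_chips = any(
        len(chips) >= 2 for chips in chips_by_algorithm.values())
    return two_algorithms_one_chip or one_algorithm_both_chips


print(is_high_confidence([("dChip", "NspI"), ("CNAG", "NspI")]))
print(is_high_confidence([("dChip", "NspI"), ("CNAG", "StyI")]))
```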
CNV calling on Affy6.0 arrays was performed using the Birdsuite tools (Canary + Birdseye algorithms)544
and iPattern545 algorithms, using a reference set that included the 38 FPC cases in addition to 100 other
closely-correlated Affy6 arrays previously analyzed at TCAG (based on correlation coefficient > 0.9).
(Samples were also analyzed on GTC 4.1, but this data was only used to support calls made on Birdsuite
or iPattern). For each of these algorithms, we required CNVs to span 5 or more consecutive array probes
and be at least 20 kb in length. Detection by either Birdsuite or iPattern was sufficient for the purpose of
validating 500K array CNVs. Only “high-confidence” calls (i.e. called by at least two of Birdsuite,
iPattern, and/or GTC 4.1 software – boundaries of overlapping regions were determined in the same
manner as for 500K data) were included as novel FPC-specific CNVs. Samples with number of calls
greater than three times the standard deviation from the mean number of calls for an analysis batch were
excluded from the study. The combined results of Birdsuite (Canary and Birdseye) were filtered to
remove the following: CNVs crossing the centromere; X chromosome variants; and CNVs tagged as
“loss” with a copy number of > 1 or tagged as “gain” with a copy number of < 3. The iPattern results were
filtered to remove CNVs on the X chromosome and CNVs tagged as “complex”.
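The per-batch noise exclusion for Affy6.0 samples can be sketched as below. This is an illustration under assumptions: the function name is hypothetical, the sample standard deviation is used (the text does not say whether the sample or population form was applied), only samples above the mean are flagged, and the call counts are invented for the example.

```python
from statistics import mean, stdev


def noisy_samples(call_counts, n_sd=3.0):
    """Return the IDs of samples whose number of CNV calls exceeds the
    batch mean by more than n_sd standard deviations."""
    counts = list(call_counts.values())
    cutoff = mean(counts) + n_sd * stdev(counts)
    return {sample for sample, n in call_counts.items() if n > cutoff}


# Hypothetical batch: 19 typical samples and one with excess calls.
batch = {f"S{i}": 60 for i in range(19)}
batch["S19"] = 400
print(noisy_samples(batch))
```

With very small batches a single outlier can never exceed three sample standard deviations, so a rule of this kind only makes sense for batches of reasonable size.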
3.7 PCR validation of CNVs
Quantitative PCR validation of a subset of CNVs was performed using Invitrogen Platinum SYBR Green
qPCR Supermix – UDG, with primers designed within the CNV of interest, and MSH2-exon2 used as a
reference gene. (Appendix Table S5 for primer sequences). Standard PCR conditions were used: (50C x
2mins; 95C x 2mins; (95C x 15sec; 60C x 32sec) x 40 cycles). Reactions were performed in replicates of
4-8x per sample. A standard curve was performed on each plate using control DNA (from a single
sample for all experiments) to ensure that primer efficiency was between 90% and 110% (slope between -3.6 and -3.1) and that the
correlation coefficient (R2) of the standard curve samples was > 0.99. The dissociation curve was checked for a
single peak (indicating a single product). Data was analyzed on the ABI 7500 real-time machine, setting
the baseline and threshold manually to reflect the exponential phase of amplification. Finally, data from
each plate was analyzed using the ddCt method546: for each sample with at least 4 replicates, one replicate
may be excluded from the calculation if it falls outside the range of Mean +/- 2*SD of Ct values (range
calculated after removal of uppermost or lowermost value); a “validation” curve of dCt vs. log input DNA
amount was done for each primer set to prove that the absolute slope is <0.1, signifying that the
efficiencies of the test gene and reference gene primer sets are approximately equal. The calculations for
ddCt are made as follows:
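$\mathrm{dCt} = \overline{\mathrm{Ct}}(\text{test gene}) - \overline{\mathrm{Ct}}(\text{control gene, MSH2})$
$\mathrm{SD}(\mathrm{dCt}) = \sqrt{\mathrm{SD}(\mathrm{Ct}_{\text{test gene}})^{2} + \mathrm{SD}(\mathrm{Ct}_{\text{MSH2}})^{2}}$
$\mathrm{ddCt} = \mathrm{dCt}(\text{test sample}) - \mathrm{dCt}(\text{control sample})$
$\text{Fold difference in copy number} = 2^{-\mathrm{ddCt}}$
$\mathrm{SD}(\text{fold difference in copy number}) = \ln(2) \cdot \mathrm{SD}(\mathrm{dCt}) \cdot 2^{-\mathrm{ddCt}}$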
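The calculation can be sketched as follows, using the standard 2^(-ddCt) form of the ddCt method with dCt defined as test gene minus reference gene. The function names and Ct values are hypothetical. The efficiency helper uses the usual relation between standard-curve slope and amplification efficiency, E = 10^(-1/slope) - 1, which is what places the 90%-110% efficiency window at slopes of roughly -3.6 to -3.1.

```python
import math
from statistics import mean, stdev


def efficiency_from_slope(slope):
    """Amplification efficiency implied by a standard-curve slope
    (Ct vs. log10 input DNA); 1.0 corresponds to 100%."""
    return 10 ** (-1.0 / slope) - 1.0


def ddct_fold_difference(test_ct, ref_ct, control_test_ct, control_ref_ct):
    """Fold difference in copy number of the test sample relative to the
    control sample, with its propagated standard deviation.
    Each argument is a list of replicate Ct values."""
    dct_sample = mean(test_ct) - mean(ref_ct)
    dct_control = mean(control_test_ct) - mean(control_ref_ct)
    sd_dct = math.sqrt(stdev(test_ct) ** 2 + stdev(ref_ct) ** 2)
    ddct = dct_sample - dct_control
    fold = 2 ** (-ddct)
    sd_fold = math.log(2) * sd_dct * fold
    return fold, sd_fold


# Hypothetical replicates: the test gene amplifies one cycle later in the
# test sample than in the control, as expected for a heterozygous loss.
fold, sd_fold = ddct_fold_difference(
    test_ct=[26.0, 26.1, 25.9, 26.0],
    ref_ct=[25.0, 25.1, 24.9, 25.0],
    control_test_ct=[25.0, 25.1, 24.9, 25.0],
    control_ref_ct=[25.0, 25.1, 24.9, 25.0],
)
print(round(fold, 2), round(sd_fold, 3))
```

For a diploid control, the estimated copy number in the test sample is twice the fold difference, so a fold difference near 0.5 suggests one copy and a value near 1.5 suggests three.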
3.8 Prioritization of CNVs
Figure 7 illustrates the priority order for investigating CNVs detected in cases.
Figure 7 – CNV prioritization plan
Figure 7 Legend: CNVs segregating with disease in a family or de novo in single case are highest priority,
followed by recurrent CNVs in unrelated affected individuals that are not found in unaffected controls. Single-affected disease-specific CNVs are lower in priority, and least likely to yield candidate genes are CNVs found in both affecteds and unaffecteds.
We defined “FPC-specific CNVs” as losses or gains detected in FPC cases on the 500K or Affymetrix 6.0
array that did not overlap (by 20% or more) with losses or gains in FGICR, OFCCR, OHI, or
POPGEN controls, or with CNVs reported from non-BAC based platforms in the Database of
Genomic Variants (DGV)547 (http://projects.tcag.ca/variation - updated Nov 2010). Although we did not
control for ancestry in this analysis, we did note which FPC-specific CNVs were detected in non-
Caucasian samples.
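The definition of an FPC-specific CNV amounts to a set-difference filter, sketched below. The function names and the (chromosome, start, end) tuples are assumptions for the example, as is the choice to measure the 20% overlap against either of the two CNVs, mirroring the merging rule. The reference list stands in for the combined control and DGV calls, and the coordinates are invented.

```python
def overlaps_by_fraction(a, b, min_fraction=0.2):
    """True if CNVs a and b, given as (chrom, start, end), share at least
    min_fraction of the length of either one."""
    if a[0] != b[0]:
        return False
    shared = max(0, min(a[2], b[2]) - max(a[1], b[1]))
    if shared == 0:
        return False
    return (shared >= min_fraction * (a[2] - a[1])
            or shared >= min_fraction * (b[2] - b[1]))


def case_specific(case_cnvs, reference_cnvs, min_fraction=0.2):
    """Keep case CNVs that overlap no reference CNV (controls or DGV)."""
    return [c for c in case_cnvs
            if not any(overlaps_by_fraction(c, r, min_fraction)
                       for r in reference_cnvs)]


cases = [("chr1", 100_000, 200_000), ("chr2", 100_000, 200_000)]
reference = [("chr1", 150_000, 400_000)]
print(case_specific(cases, reference))
```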
3.9 Annotation of CNVs
Affymetrix 500K and Affymetrix 6.0 array coordinates were aligned to the NCBI hg17 and NCBI hg18
human genome builds, respectively. Genes overlapped by CNVs were identified through the University
of California, Santa Cruz (UCSC) genome browser (http://genome.ucsc.edu/), using the respective human
genome build. Information about CNV-overlapped genes was obtained from Entrez Gene
(http://www.ncbi.nlm.nih.gov/gene) and Pubmed (http://www.ncbi.nlm.nih.gov/pubmed/). The Memorial
Sloan Kettering Cancer Centre (MSKCC) CancerGenes database
(http://cbio.mskcc.org/CancerGenes/Select.action)548 was used to identify genes with reported pathways
or functions linked to cancer development. The Wellcome-Trust Sanger Catalogue of Somatic Mutations
in Cancer (COSMIC version 55) database (http://www.sanger.ac.uk/genetics/CGP/cosmic)549 and the
Pancreatic Expression Database – version 2.0 (http://www.pancreasexpression.org)253 were used to
identify genes that had previously reported point mutations or
copy number alterations in tumors or cancer cell lines, or which were reported to be differentially
expressed in pancreatic cancer according to published gene expression studies. (For COSMIC, Biomart
was used to identify all genes with mutation type “complex-compound substitution; complex – frameshift;
deletion-frameshift; insertion-frameshift; substitution-missense; substitution – nonsense; unknown”. To
get all COSMIC genes fitting these categories, the “gene” field was left empty; otherwise the desired gene
lists were used.)
3.10 Comparing Affy500K CNV profile between cases and controls
Only “high-confidence” CNVs from non-EBV samples were included in the CNV profile comparison to
minimize potential cell line artifacts and false calls.278 As well, only controls with data available for both
NspI and StyI chips were included in this comparison to minimize bias of undercalling CNVs in single-
chip samples. To minimize CNV calling errors for “complex” CNVs (i.e. losses and gains in different
samples overlapping the same region), we performed the “rare CNV” analysis only on regions reported as
either losses or gains only. CNV loci that are present in fewer than 1% of the total number of samples
(cases + controls) were considered “rare”, excluding EBV samples and the complex CNVs. For losses,
32 cases and 235 controls (total 267 samples) were included in the “rare loss” analysis, so a rare loss was
defined as present in fewer than 3 individuals. For gains, 56 cases and 551 controls (total 607 samples)
were included in the “rare gain” analysis, so a rare gain was defined as present in fewer than 7 samples.
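The two rare-CNV cut-offs follow directly from the 1% rule, as the short sketch below shows. The function names are assumptions for the example, and integer arithmetic is used to avoid floating-point rounding at the boundary.

```python
def is_rare(carriers, n_samples, percent=1):
    """True if the locus is present in fewer than `percent` percent of
    the samples."""
    return carriers * 100 < percent * n_samples


def max_rare_carriers(n_samples, percent=1):
    """Largest carrier count that still qualifies as rare."""
    return (percent * n_samples - 1) // 100


# Sample totals taken from the text: 267 for losses, 607 for gains.
for label, total in (("losses", 267), ("gains", 607)):
    print(label, total, "rare if carriers <=", max_rare_carriers(total))
```

With 267 samples, 1% corresponds to 2.67 individuals, so a rare loss is one carried by at most 2 (fewer than 3). With 607 samples, 1% corresponds to 6.07, so a rare gain is one carried by at most 6 (fewer than 7).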
3.11 Statistical analysis
Comparison of medians was performed using the Mann Whitney U test and comparison of means was
performed using the two-tailed Student’s t-test with Levene’s test for equal variance. Testing for
significant difference in proportions was performed with the two-tailed Fisher’s exact test. A p-value <
0.05 was considered significant. Statistical testing was performed using the SPSS© software package
(version 17).
For comparing differences in proportions of cases and controls at each CNV locus, we only considered
regions containing only losses or only gains (in cases and/or controls) for non-EBV samples, and we
excluded samples with only a single chip in the analysis. After calculating two-tailed Fisher’s exact test
p-values for each loss and gain locus, we performed a Bonferroni correction to account for multiple-
testing. The number of multiple tests was defined as the total number of loss or gain loci in the above
comparison (losses and gains were assessed separately).
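The per-locus testing with Bonferroni correction can be sketched as follows. The analysis itself was run in SPSS; this standard-library sketch only illustrates the logic. The function names and the locus counts are hypothetical, and the two-tailed p-value is computed in the common way, by summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]],
    e.g. a = cases with the CNV, b = cases without,
         c = controls with the CNV, d = controls without."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    p_value = sum(prob(x) for x in range(low, high + 1)
                  if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical loss loci: (cases with, cases without,
#                          controls with, controls without).
loci = [(3, 88, 2, 948), (1, 90, 12, 938), (0, 91, 5, 945)]
raw = [fisher_exact_two_sided(*counts) for counts in loci]
print(bonferroni(raw))
```

Losses and gains would each be passed through `bonferroni` separately, so that the number of tests equals the number of loss loci or gain loci respectively.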
3.12 Breakpoint Mapping and Sequencing
To precisely identify the CNV breakpoints, qPCR was performed at several positions near the estimated
breakpoints (based on the SNP microarray results), narrowing down the estimated location of the
breakpoint to a region approximately 1,000 bp in length. (See Appendix Table S6 for primer sequences;
standard PCR conditions were used as described previously). Primers were designed to PCR-amplify the
region estimated to contain the breakpoint (see Appendix Table S6) and Sanger sequencing was used to
identify the exact base pairs delineating the breakpoint. Products were cleaned up using Qiagen MinElute
PCR purification kit. Sanger sequencing was performed by the AGTC service lab.
4. Results
4.1 Affymetrix 500K results
Of the original 128 FPC cases genotyped on the Affymetrix 500K array, eight were subsequently
excluded (two subjects had excessively noisy data based on CNV count > 40 per analysis run; one subject
was discovered to have had chronic lymphocytic leukemia at the time of blood sample donation, making
it difficult to distinguish germline from somatic CNVs detected in the sample; and five subjects no longer
met inclusion criteria in light of new information that became available after the start of the study),
leaving 120 cases in the final analysis with both NspI and StyI chips represented for each sample. Some
of the subjects were reassigned to different inclusion criteria after updated information became available,
resulting in 68 “3+ FPC” subjects, 28 “2 FPC” subjects, and 24 “single young” subjects contributing to
the final set of case CNVs detected on Affymetrix 500K array. Two controls were discovered to have a
history of sporadic pancreatic cancer (no affected relatives), and two other controls each reported having
two relatives with pancreatic cancer, suggesting potential FPC kindreds. After excluding those four
samples, 1,194 controls remained in the final analysis. For 236 of those controls, only one chip was
included in the analysis (137 NspI only; 99 StyI only) due to inadequate hybridization of the second chip.
STRUCTURE software was used for estimating population ancestry of the 120 FPC cases and 958
controls that had NspI + StyI chips available for analysis: 89.2% of cases and 94.8% of controls were
Caucasian; 1.7% of cases and 2.1% of controls were Asian; and 9.2% of cases and 3.1% of controls were
of admixed background.
Figures 8 and 9 summarize the number of gains and losses called by each algorithm on each chip in cases
and controls.
Figure 8 – Gains and losses identified in FPC cases by each algorithm/chip
Figure 8 Legend: Number of losses and gains identified by each algorithm and resultant number of losses and gains
after merging overlapping CNVs.
Figure 9 - Gains and losses identified in controls by each algorithm/chip
Figure 9 Legend: Number of losses and gains identified by each algorithm and resultant number of losses and gains
after merging overlapping CNVs.
The total number of autosomal CNVs identified in cases and controls was 873 and 10,794 respectively, of
which 382 CNVs (123 losses + 259 gains) in cases and 3,115 CNVs (805 losses + 2,310 gains) in controls
were considered high confidence calls (corresponding to 66 loss loci + 105 gain loci in cases and 313 loss
loci + 467 gain loci in controls). (Appendix Tables S7 to S10 for high- and low-confidence CNVs in cases
and controls (available as excel files on attached CD)). The proportion of losses and gains considered
“high-confidence” was significantly larger in cases than in controls (losses: 48% cases vs. 33% controls,
p<0.001; gains: 42% cases vs. 28% controls, p<0.001). As well, the percentage of cases with at least one
high-confidence loss was significantly greater than controls (68% vs 47%, p<0.001), but no significant
difference existed between cases and controls in the percentage of samples with high-confidence gains
(85% vs. 80%, p=0.227). Significance testing results were the same whether or not the 236 controls with
only one chip in the analysis were included, or whether the denominator is all samples vs. only samples
that had at least one CNV call. We note that no significant difference was observed between cases and
controls when restricting the analysis to FGICR controls that were genotyped at the same centre (TCAG).
(Tables 7 and 8)
Table 7 - Proportion of high-confidence losses in cases and controls

| Measure | Cases | All controls | Fisher's exact (cases vs. all controls) | FGICR controls | Fisher's exact (cases vs. FGICR controls) |
| --- | --- | --- | --- | --- | --- |
| % of losses that are high-confidence (HC) | 48 | 33 | p < 0.001 | 41 | p=0.303 |
| % of HC losses if remove controls with only 1 chip | 48 | 35 | p < 0.001 | 43 | p=0.512 |
| % of samples with HC losses | 68 | 47 | p < 0.001 | 51 | p=0.070 |
| % of samples with HC losses if remove controls with only 1 chip | 68 | 53 | p=0.002 | 55 | p=0.190 |
| % of samples with HC losses among 2-chip samples with at least one loss call | 76 | 63 | p=0.009 | 64 | p=0.190 |
Table 8 - Proportion of high-confidence gains in cases and controls

| Measure | Cases | All controls | Fisher's exact (cases vs. all controls) | FGICR controls | Fisher's exact (cases vs. FGICR controls) |
| --- | --- | --- | --- | --- | --- |
| % of gains that are high-confidence (HC) | 42 | 28 | p < 0.001 | 49 | p=0.109 |
| % of HC gains if remove controls with only 1 chip | 42 | 29 | p < 0.001 | 50 | p=0.086 |
| % of samples with HC gains | 85 | 80 | p=0.227 | 80 | p=0.227 |
| % of samples with HC gains if remove controls with only 1 chip | 85 | 86 | p=0.782 | 81 | p=0.626 |
| % of samples with HC gains among 2-chip samples with at least one gain call | 87 | 88 | p=0.882 | 85 | p=0.789 |
4.2 Affymetrix 6.0 results
In 36 cases genotyped on the Affymetrix 6.0 array (two of the original 38 samples were excluded due to
excess noise – see methods), a total of 3,364 autosomal CNVs (2,665 losses and 699 gains) were
identified using Birdsuite, and 3,266 autosomal CNVs were identified using iPattern (1,975 losses and
1,291 gains). Table 9 summarizes some key parameters of CNVs identified by each algorithm.
Table 9 - CNVs called by each of Birdsuite and iPattern in 36 samples on Affymetrix 6.0 array

| Parameter | Birdsuite | iPattern |
| --- | --- | --- |
| # losses | 2,665 | 1,975 |
| # gains | 699 | 1,291 |
| median size losses (bp) | 7,793 | 10,388 |
| median size gains (bp) | 60,599 | 19,857 |
| # genic losses (% of all losses) | 969 (36%) | 693 (35%) |
| # genic gains (% of all gains) | 512 (73%) | 690 (53%) |
| # losses called as HC losses in 500K array (in same sample) | 33 | 35 |
| # losses called as LC losses in 500K array (in same sample) | 20 | 20 |
| # gains called as HC gains in 500K array (in same sample) | 70 | 70 |
| # gains called as LC gains in 500K array (in same sample) | 33 | 38 |
| mean # losses per sample/mean # gains per sample | 74/19 | 55/36 |

HC = high-confidence; LC = low-confidence on 500K array
The high-confidence set of Affy6 CNVs (incorporating GTC-supported CNVs) comprised 2,187 CNVs
(1,656 losses + 531 gains). (Appendix Tables S11 to S12 for high-confidence CNVs on Affy6 array in
FPC cases and controls (available as excel files on attached CD)). The median size of high-confidence
losses and gains was 12.7kb (1kb-1.4Mb) and 48.9kb (1kb-1.6Mb), respectively, and the average number
of losses and gains per genome was 46 and 15, respectively.
4.3 CNV validation
Quantitative PCR was used to attempt validation of 18 losses (13 high-confidence and 5 low-confidence)
and 10 gains (all high-confidence) in FPC cases, of which all the high-confidence CNVs validated and 4/5
low-confidence CNVs validated. (Appendix Figures S1 to S32 for qPCR results). Of the 33 FPC cases
that were hybridized to both Affy 500K and Affy6.0 arrays, 31 yielded useable results on both arrays.
For those 31 cases, 113 high-confidence CNVs and 142 low-confidence CNVs were called on the 500K
array, of which 107 (95%) high-confidence CNVs and 63 (44%) low-confidence CNVs were validated on
the Affy6 array. The combined results of qPCR validation and Affy6 genotyping demonstrated a
validation rate of 95% (121/127) for high-confidence CNVs but only 45% (66/146) for low-confidence
CNVs. Therefore, the remainder of this analysis was limited to high-confidence CNVs in cases and
controls. Approximately one third (121/382) of all high-confidence case CNVs identified on the 500K
array, corresponding to half (88/171) of all high-confidence CNV loci in cases, have been confirmed by
either the Affymetrix 6.0 array and/or qPCR.
4.4 Comparing CNV profile of cases and controls
We compared several characteristics of CNVs identified on the 500K array between FPC cases and
FGICR/OFCCR controls. Table 10 compares several key CNV attributes between cases and controls
(based on high-confidence CNVs and excluding EBV-derived samples and controls with only one chip in
the analysis).
Table 10 - High confidence CNV profile of cases vs. controls (excluding EBV-derived samples and excluding controls with data from only one chip)

| Measure | FPC cases | Controls | p-value |
| --- | --- | --- | --- |
| # Lymphocyte samples | 91 | 950 | |
| # High-confidence losses/high-confidence gains | 91/190 | 731/2,059 | |
| Median CNV size (range) | 219.5kb (1.2kb-6.4Mb) | 219.5kb (1.2kb-6.8Mb) | 0.439 |
| Median CNV SNP count (range) | 42 (2-417) | 40 (2-1318) | 0.578 |
| # Genic CNVs/all CNVs: losses | 52/91 (57%) | 400/731 (55%) | 0.738 |
| # Genic CNVs/all CNVs: gains | 153/190 (81%) | 1,646/2,059 (80%) | 0.850 |
| # Samples with genic CNVs/samples with any CNVs: losses | 43/59 (73%) | 327/500 (65%) | 0.309 |
| # Samples with genic CNVs/samples with any CNVs: gains | 70/75 (93%) | 765/816 (94%) | 0.805 |
| # CNV genes identified as “Cancer Genes” in MSKCC CancerGenes database/all CNV genes recognized by the MSKCC database: losses | 8/36 (22%) | 35/264 (13%) | 0.201 |
| # CNV genes identified as “Cancer Genes” in MSKCC CancerGenes database/all CNV genes recognized by the MSKCC database: gains | 53/335 (16%) | 507/2940 (17%) | 0.541 |
| # CNV loci included in rare analysis/all CNV loci: losses | 36/52 (69%) | 203/290 (70%) | 1.000 |
| # CNV loci included in rare analysis/all CNV loci: gains | 65/83 (78%) | 349/428 (82%) | 0.541 |
| # CNVs that are part of rare loci/all CNVs: losses | 23/91 (25%) | 199/731 (27%) | 0.802 |
| # CNVs that are part of rare loci/all CNVs: gains | 47/190 (25%) | 461/2,059 (22%) | 0.469 |
| # Samples with CNVs included in rare analysis/samples with any CNV: losses | 32/59 (54%) | 235/500 (47%) | 0.335 |
| # Samples with CNVs included in rare analysis/samples with any CNV: gains | 56/75 (75%) | 551/816 (68%) | 0.244 |
| # Samples with rare CNVs/samples with any CNV: losses | 21/59 (36%) | 169/500 (34%) | 0.773 |
| # Samples with rare CNVs/samples with any CNV: gains | 37/75 (49%) | 348/816 (43%) | 0.275 |
| # Genic rare CNVs/all rare CNVs: losses | 10/23 (43%) | 69/199 (35%) | 0.491 |
| # Genic rare CNVs/all rare CNVs: gains | 33/47 (70%) | 330/461 (72%) | 0.866 |
| # Samples with genic rare CNVs/samples with rare CNVs: losses | 10/21 (48%) | 63/169 (37%) | 0.476 |
| # Samples with genic rare CNVs/samples with rare CNVs: gains | 27/37 (73%) | 267/348 (77%) | 0.684 |
| Mean CNVs per genome*: losses | 1.5 | 1.5 | 0.443 |
| Mean CNVs per genome*: gains | 2.5 | 2.5 | 0.956 |
| Mean rare CNVs per genome*: losses | 0.4 | 0.4 | 0.919 |
| Mean rare CNVs per genome*: gains | 0.6 | 0.6 | 0.498 |

*mean and t-test calculated for losses and gains based only on samples with at least one high-confidence loss or gain, respectively (to avoid the bias of samples which didn’t get a high-confidence CNV call due to noise)
Overall, no significant difference was observed in the CNV profile of cases and controls, including such
parameters as CNV size, proportion of genic CNVs, proportion of rare CNVs, and average number of
CNVs per individual genome. In both groups, gains were larger than losses (median size - cases: 228.7kb
vs. 176.6kb, p=0.016; controls: 224.4kb vs. 168.0kb, p<0.001) and were more likely to overlap genes
(cases: 153/190 gains vs. 52/91 losses are genic, p<0.001; Controls: 1,641/2,059 gains vs. 400/731 losses
are genic, p<0.001).
4.5 CNVs of interest
Figure 7 summarizes the CNV prioritization plan that we applied to our data. The highest priority is
assigned to CNVs that segregate with disease status in blood relatives, or alternatively de novo CNVs in
singleton young affected subjects.
Since no trios were available for analysis, we could not determine which CNVs were de novo. Only two
pairs of siblings were genotyped, while the remaining were all unrelated subjects. In one pair of siblings
whose parents are not consanguineous, only a single gain was shared by the two siblings and this CNV was
also identified in many other cases and controls. In the second pair of siblings whose parents are first-
cousins, one loss and three gains were shared by the two siblings but all the CNVs were also shared by
controls. Hence, no FPC-specific CNVs were found to segregate in either of the two pairs of siblings.
Next in priority are CNVs that overlap in two or more unrelated cases and are absent in controls. We also
considered CNVs present in cases and controls if they met the following conditions: (1) CNV present in
two or more cases; (2) CNV overlaps gene(s) in cases; (3) the genic portion of the region is not
overlapped by control CNVs or DGV CNVs. (To ensure that we are not missing anything significant, we
assessed the data for loci overlapping two or more cases and no controls even if reported in the DGV, but
none fit these criteria). A total of 64 FPC CNVs (27 losses and 37 gains) detected on the 500K array were
not identified in FGICR or OFCCR controls. After further excluding regions that overlapped POPGEN
or OHI controls or were reported in the DGV, the number of FPC-specific CNVs identified on the 500K
array is 37 CNVs (16 losses and 21 gains). On the Affymetrix 6.0 array, 119 FPC CNVs (71 losses and 48
gains) were not identified in POPGEN or OHI controls, and after further excluding regions which
overlapped FGICR and OFCCR controls or were in the DGV, 73 FPC-specific CNVs (45 losses and 28
gains) remained. Combining results from the two arrays (including regions identified on both platforms)
yielded a total of 93 non-redundant FPC-specific CNVs (53 losses and 40 gains), each CNV present in a
single individual only (a total of 50 FPC cases, including 7 EBV-derived samples); 13 losses and 8 gains
were in non-Caucasian individuals.
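The filtering logic described above (retain case CNVs that are not overlapped by control or DGV CNVs) can be sketched as a simple interval comparison. This is a minimal illustration under stated assumptions: any shared base counts as an overlap, gain/loss type is ignored when matching, and all coordinates and names are hypothetical rather than taken from the study data.

```python
from typing import NamedTuple


class CNV(NamedTuple):
    chrom: str
    start: int
    end: int
    kind: str  # "gain" or "loss"


def overlaps(a: CNV, b: CNV) -> bool:
    """True if two CNVs are on the same chromosome and share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def case_specific(cases: list[CNV], references: list[CNV]) -> list[CNV]:
    """Keep case CNVs that overlap no reference CNV (controls or DGV entries)."""
    return [c for c in cases if not any(overlaps(c, r) for r in references)]


# Hypothetical coordinates, for illustration only.
cases = [
    CNV("chr1", 1_000, 5_000, "gain"),
    CNV("chr2", 10_000, 20_000, "loss"),
    CNV("chr3", 7_000, 9_000, "loss"),
]
controls = [CNV("chr1", 4_500, 8_000, "gain")]
dgv = [CNV("chr3", 100, 6_999, "loss")]

specific = case_specific(cases, controls + dgv)
print(specific)  # the chr2 and chr3 CNVs remain; the chr1 gain overlaps a control
```

A stricter criterion (for example a minimum overlap fraction) would change only the `overlaps` function.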
One duplication (G_97) appeared to affect the same gene (TGFBR3) in two unrelated cases, albeit with
different breakpoints in each case (Figure 10). This gene codes for a receptor of TGF-beta, a signaling
molecule with an important role in pancreatic cancer initiation and progression, and decreased expression
of TGFBR3 has been observed in various cancers suggesting that it behaves as a tumor-suppressor. Given
the potential significance of this gene for pancreatic cancer, we aimed to investigate this duplication
further.
Figure 10 – Duplications overlapping TGFBR3 gene
Figure 10 Legend: TGFBR3 transcripts circled; red bars represent breakpoints of CNVs identified on SNP arrays
Although an overlapping duplication was also present in one POPGEN control, the control duplication
only overlapped the beginning of one of the multiple isoforms of this gene. (There was also a large low-
confidence duplication called in one of our ARCTIC controls, but this appeared to be a false call as
demonstrated by qPCR – see Appendix Figure S33). The duplication in case ID-27 was validated by
qPCR using two different primer sets. We validated the duplication in case ID-203 using those same
primer sets, and additionally tested those family members of this subject for whom DNA was available
(Figure 11; Appendix Figures S33-S38).
Figure 11 – Pedigree of case ID-203, indicating results of qPCR testing for duplication G_97
Figure 11 Legend: GB = gallbladder; PC = Pancreas cancer; dup = duplication identified; no dup = no
duplication identified; blood = source of DNA is lymphocytes; tissue = source of DNA is FFPE resected specimen
At this point, we observed that the mother of the proband did not carry the duplication, which weakened
the argument for this CNV being causative for pancreatic cancer (since the pancreatic cancer was
considered matrilineal in this family, with a maternal grandmother reported to have died of the disease).
However, we considered the possibility of the disease being inherited from the paternal side, particularly
since the paternal grandmother was reported to have died of “gallbladder cancer”, which could have been
a misdiagnosis of pancreatic cancer. We did not have access to DNA from the father or paternal
grandmother, but as noted in the pedigree, a sister of the proband had also died of pancreatic cancer.
We wished to test for segregation of the duplication with the disease, but only formalin-fixed paraffin-
embedded (FFPE) tissue was available for DNA extraction from this sister. Due to the fragmented nature
of FFPE-derived DNA (caused by cross-linking and degradation of nucleic acid by formalin
preservation), qPCR performed on FFPE-DNA can be biased and difficult to verify. Therefore, we
decided to fine-map the breakpoints of the duplication to allow Sanger sequencing of the tandem
duplication point. Our fine-mapping method involved designing qPCR probes at several positions falling
within as well as outside the array-defined boundaries of the duplication (Figure 12; Appendix Figures
S39 to S45 for qPCR results).
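The walk-along idea can be summarized as finding, at each end of the CNV, the pair of neighbouring qPCR probes whose duplication status differs; the breakpoint must lie between them. The sketch below is illustrative only: the probe positions are hypothetical and the function name is not from the thesis.

```python
def bracket_breakpoint(probes: list[tuple[int, bool]]) -> tuple[int, int]:
    """Return the (position, position) window that must contain the breakpoint.

    `probes` holds (genomic position, duplicated?) pairs along one edge of the
    CNV. The breakpoint lies between the two adjacent probes whose duplication
    status differs.
    """
    ordered = sorted(probes)
    windows = [
        (left_pos, right_pos)
        for (left_pos, left_dup), (right_pos, right_dup) in zip(ordered, ordered[1:])
        if left_dup != right_dup
    ]
    if len(windows) != 1:
        raise ValueError("expected exactly one duplicated/non-duplicated transition")
    return windows[0]


# Hypothetical probe positions at one end of a duplication.
probes = [
    (92_000_000, False),
    (92_010_000, False),
    (92_014_000, True),
    (92_020_000, True),
]
window = bracket_breakpoint(probes)
print(window)  # (92010000, 92014000)
```

Adding probes inside the returned window and repeating the call narrows the interval until it is small enough to span with a single PCR amplicon.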
Figure 12 – Fine-mapping the breakpoint of duplication overlapping TGFBR3 using qPCR walk-along method
Figure 12 Legend: Panel [A] depicts the array-based estimation of the duplication breakpoints; panels [B] and
[C] indicate the locations of the qPCR probes at either end of the duplication (shown as small vertical black bars). In panels [B] and [C], the red arrows indicate the area between the confirmed duplicated and non-duplicated positions at either end of the CNV.
At this point, we selected two primers used for qPCR analysis (O_Out_5 and T_Out_3) to attempt PCR
amplification of the region containing the duplication breakpoint. Although we did not know at this point
the exact size of the duplication, we were able to amplify a fragment approximately 1.5-2kb in size (see
Figure 13), whereas a control sample not containing the duplication failed to amplify anything using these
primers (as would be expected).
Figure 13 – PCR gel demonstrating amplification of ~1.5-2kb fragment containing G_97 duplication breakpoint in case ID-203
Figure 13 Legend: Each well represents a separate PCR reaction (three for duplication-carrying sample and
three for non-duplication control)
We submitted the fragment for Sanger sequencing from both ends; although the size of the fragment was
too large to read completely from either primer, we obtained sufficient length of reads from each primer
such that they overlapped at the breakpoint of the duplication, thus allowing us to pinpoint the exact
location of the breakpoint (see Figure 14).
Figure 14 – G_97 duplication breakpoint mapping by Sanger sequencing
Figure 14 Legend: Sequence [A] is located at the end of G_97 that does not transect TGFBR3; the purple-highlighted
portion is seen in Sanger sequence reads from the forward primer (O_Out_5) located at that end of the duplication. Sequence [C] is located at the end of G_97 that transects TGFBR3; the yellow-highlighted portion is seen in Sanger sequence reads from the reverse primer (T_Out_3) located at that end of the duplication. The non-highlighted portion of each of those reads represents the sequence normally expected at each location if no duplication were present. The red-highlighted sequence is the region of the tandem duplication breakpoint that was observed in each of the Sanger sequence reads from the above-described primers; note the insertion of “TAT” at the point of duplication.
Based on this information, we designed a primer set to amplify a smaller fragment encompassing the
breakpoint (~100 bp), to allow amplification of FFPE-derived DNA (obtained from non-tumor region of
the specimen block) from the affected sister of the proband. We also performed PCR amplification of
several other amplicons of similar size to control for DNA degradation, and we used case ID-203 as a
positive control for the duplication. As Figure 15 illustrates, although the FFPE DNA appeared to
amplify the four other test amplicons well, no amplification of the duplication breakpoint region was
observed in the affected sister, indicating that she did not inherit the duplication.
Figure 15 - PCR gel illustrating amplification of test regions and duplication breakpoint in case ID-203 and affected sister
Figure 15 Legend: Wells within the blue boxes belong to sister of ID-203 (source of FFPE DNA); wells
outside blue boxes belong to case ID-203 (blood-derived DNA); every fifth column is water control
4.6 FPC-specific CNVs
Since the TGFBR3 duplication did not segregate with pancreatic cancer in the family we studied, and no
FPC-specific CNV occurred in more than one case, we proceeded to annotate the FPC-specific CNVs and
to prioritize them based on gene content and their association with cancer. (Figure 16 illustrates the
distribution of FPC-specific CNVs across the genome).
Figure 16 - FPC-specific losses and gains on autosomal chromosomes
Figure 16 Legend: Red box = loss; Green box = gain
Twenty-three FPC-specific losses and 23 FPC-specific gains overlapped introns, exons, and/or
untranslated regions of 104 RefSeq genes (Table 11).
Table 11 – FPC-specific CNVs
CNV type | CNV Id | Sample Id | Coordinates (hg18) | Size (kb) | RefSeq Genes | Overlaps Pancreatic Expression Database CNVs?
Gain | Affy6.0_G_11 | 127 | chr1:49856085-50089082 | 233.0 | AGBL4 | no
Gain | Affy500K_G_280 & Affy6_G_298 | 62 | chr18:6838462-7291170 | 452.7 | ARHGAP28, LAMA1, LRRC30, LOC400643 | High-level amplification
Gain | Affy500K_G_380 | 82 | chr3:143693491-143928895 | 235.4 | ATR, PLS1, TRPC1 | no
Gain | Affy6.0_G_324 | 20 (Admixed) | chr19:60436319-60696243 | 259.9 | BRSK1, UBE2S, SHISA7, TMEM190, COX6B2, FAM71E2, HSPBP1, TMEM150B, ISOC2, IL11, RPL28, TMEM238, ZNF628, SUV420H2, NAT14, PPP6R1, SSC5D | no
Gain | Affy500K_G_136 | 37 (EBV) | chr16:78810438-79258408 | 448.0 | DYNLRB2, CDYL2, MIR548H4 | High-level amplification
Gain | Affy500K_G_615 & Affy6_G_77 | 125 | chr7:133223330-133393933 | 170.6 | EXOC4 | no
Gain | Affy6.0_G_235 | 99 | chr15:32814039-32848252 | 34.2 | GJD2 | no
Gain | Affy500K_G_365 | 79 | chr4:93344017-93591992 | 248.0 | GRID2 | no
Gain | Affy6.0_G_226 | 44 | chr15:70381008-70436843 | 55.8 | HEXA, CELF6 | no
Gain | Affy500K_G_603/604 & Affy6_G_93 | 123 (Admixed) | chr8:39935640-39943638 | 8.0 | IDO2 | no
Gain | Affy6.0_G_39 | 123 (Admixed) | chr3:161448573-161518365 | 69.8 | IFT80 | no
Gain | Affy6.0_G_143 | 17 | chr10:71778181-71797516 | 19.3 | LRRC20 | no
Gain | Affy6.0_G_170 | 20 (Admixed) | chr11:65027491-65201466 | 174.0 | LTBP3, PCNXL3, MAP3K11, MIR4489, MALAT1, RELA, SIPA1, SSSCA1, FAM89B, KCNK7, MIR4690, EHBP1L1, LOC254100, SCYL1 | no
Gain | Affy500K_G_176 & Affy6_G_301 | 44 | chr18:2254263-2555103 | 300.8 | METTL4 | no
Gain | Affy6.0_G_33 | 69 | chr2:216465517-216485115 | 19.6 | none | no
Gain | Affy500K_G_88 | 24 | chr4:26691114-26985948 | 294.8 | none (mRNA present) | no
Gain | Affy500K_G_369 | 80 | chr4:29195980-29209908 | 13.9 | none | no
Gain | Affy500K_G_602 & Affy6_G_50 | 123 (Admixed) | chr4:72734028-72817447 | 83.4 | none | no
Gain | Affy500K_G_511 | 107 (EBV) | chr4:105853937-106127766 | 273.8 | none | no
Gain | Affy500K_G_407 | 86 | chr6:48829836-49492706 | 662.9 | none | no
Gain | Affy6.0_G_70 | 44 | chr6:132466247-132479169 | 12.9 | none | no
Gain | Affy6.0_G_95 | 99 | chr8:83294045-83332227 | 38.2 | none | no
Gain | Affy500K_G_49 | 12 (Admixed) | chr9:81978854-82021829 | 43.0 | none | no
Gain | Affy6.0_G_138 | 54 | chr10:4497158-4555255 | 58.1 | none | no
Gain | Affy6.0_G_152 | 54 | chr11:41420026-41456633 | 36.6 | none (mRNA present) | no
Gain | Affy500K_G_622 & Affy6_G_158 | 126 | chr11:81521790-81598468 | 76.7 | none (mRNA present) | no
Gain | Affy500K_G_502 | 106 (EBV) | chr12:57378034-57482408 | 104.4 | none (mRNA present) | no
Gain | Affy6.0_G_194 | 69 | chr13:86091484-86118457 | 27.0 | none | no
Gain | Affy6.0_G_326 | 202 | chr20:46926869-46943223 | 16.4 | none | no
Gain | Affy500K_G_225 | 58 | chr21:28431800-28667362 | 235.6 | none (mRNA present) | no
Gain | Affy500K_G_226 | 58 | chr21:35973166-36013145 | 40.0 | none (mRNA present) | no
Gain | Affy500K_G_105 & Affy6_G_283 & Affy6_G_284 | 28 | chr17:2919396-3184579 | 265.2 | OR1D2, OR1G1, OR1A2, OR1A1, OR1D4, OR3A2, OR3A1, OR3A4P | no
Gain | Affy500K_G_95 | 26 | chr10:19849680-20589237 | 739.6 | PLXDC2 | High-level amplification
Gain | Affy6.0_G_90 | 202 | chr8:49008716-49049657 | 40.9 | PRKDC, MCM4 | no
Gain | Affy6.0_G_3 | 123 (Admixed) | chr1:157133096-157188413 | 55.3 | PYHIN1 | no
Gain | Affy500K_G_69 & Affy6_G_87 | 18 | chr8:108696004-109010881 | 314.9 | RSPO2 | High-level amplification
Gain | Affy500K_G_303 | 65 | chr2:230753632-230823051 | 69.4 | SP110, SP140 | no
Gain | Affy6.0_G_179 | 11 (Asian) | chr12:81711207-81762121 | 50.9 | TMTC2 | no
Gain | Affy6.0_G_212 | 67 | chr14:73405361-73432688 | 27.3 | ZNF410, PTGR2 | no
Gain | Affy6.0_G_315 | 62 | chr19:60824299-60923809 | 99.5 | ZNF784, NLRP9, EPN1, CCDC106, ZNF580, U2AF2, ZNF581 | no
Loss | Affy500K_D_125 & Affy6_D_1246 | 68 | chr12:39394850-39501843 | 107.0 | CNTN1 | High-level amplification
Loss | Affy6.0_D_870 | 123 (Admixed) | chr5:11220277-11229088 | 8.8 | CTNND2 | no
Loss | Affy6.0_D_1507 | 11 (Asian) | chr18:3670476-3715553 | 45.1 | DLGAP1 | no
Loss | Affy6.0_D_1127 | 123 (Admixed) | chr10:128752241-128780181 | 27.9 | DOCK1 | no
Loss | Affy6.0_D_637 | 204 | chr2:55010996-55019655 | 8.7 | EML6 | no
Loss | Affy500K_D_24 & Affy6_D_1342 | 11 (Asian) | chr13:93544008-93670507 | 126.5 | GPC6 | High-level amplification
Loss | Affy6.0_D_739 | 69 | chr4:70867305-70952889 | 85.6 | HTN1, HTN3, STATH | no
Loss | Affy500K_D_152 | 85 | chr3:125676839-125815545 | 138.7 | KALRN | no
Loss | Affy6.0_D_477 | 97 | chr1:62528216-62538049 | 9.8 | KANK4 | no
Loss | Affy6.0_D_1548 | 61 | chr19:61684427-61697318 | 12.9 | LOC100128252 | no
Loss | Affy6.0_D_844 | 40 | chr4:178997998-179018809 | 20.8 | LOC285501 | no
Loss | Affy6.0_D_911 | 123 (Admixed) | chr6:119578774-119604698 | 25.9 | MAN1A1 | no
Loss | Affy500K_D_220 | 112 (EBV) | chr8:6371546-6430547 | 59.0 | MCPH1, ANGPT2 | no
Loss | Affy500K_D_142 | 77 (Admixed) | chr8:17998784-18145035 | 146.3 | NAT1 | no
Loss | Affy6.0_D_535 | 62 | chr2:41356049-41390177 | 34.1 | none | no
Loss | Affy500K_D_114 & Affy6_D_74 | 62 | chr2:41474986-41608172 | 133.2 | none | no
Loss | Affy6.0_D_677 | 20 (Admixed) | chr3:22405124-22481450 | 76.3 | none | no
Loss | Affy6.0_D_671 | 30 | chr3:192351519-192375879 | 24.4 | none | no
Loss | Affy6.0_D_769 | 28 | chr4:123803190-123806840 | 3.7 | none (mRNA present) | no
Loss | Affy6.0_D_930 | 35 | chr6:142219243-142324891 | 105.6 | none (mRNA present) | no
Loss | Affy6.0_D_992 | 64 (Admixed) | chr7:23094182-23110722 | 16.5 | none (mRNA present) | no
Loss | Affy6.0_D_1029 | 64 (Admixed) | chr8:2578046-2587479 | 9.4 | none | no
Loss | Affy6.0_D_1650 | 20 (Admixed) | chr8:58080498-58091757 | 11.3 | none | no
Loss | Affy6.0_D_1069 | 125 | chr8:88575501-88585299 | 9.8 | none | no
Loss | Affy500K_D_93 | 48 | chr8:89782116-89849946 | 67.8 | none (mRNA present) | no
Loss | Affy6.0_D_1644 | 91 | chr8:131657747-131683625 | 25.9 | none (mRNA present) | no
Loss | Affy6.0_D_1024 | 16 | chr8:138328381-138425832 | 97.5 | none | no
Loss | Affy500K_D_134 | 74 | chr9:2235919-2351848 | 115.9 | none | no
Loss | Affy500K_D_43 & Affy6_D_1112 | 17 | chr9:75525136-75638229 | 113.1 | none | no
Loss | Affy6.0_D_1109 | 64 (Admixed) | chr9:75637796-75657448 | 19.7 | none | no
Loss | Affy6.0_D_1103 | 27 | chr9:95533791-95585819 | 52.0 | none | no
Loss | Affy6.0_D_1108 | 4 | chr9:102517861-102553347 | 35.5 | none (mRNA present) | no
Loss | Affy500K_D_40 & Affy6_D_1198 | 16 | chr11:39882017-40010124 | 128.1 | none | no
Loss | Affy500K_D_6 | 2 (EBV) | chr11:89730130-89888327 | 158.2 | none (mRNA present) | no
Loss | Affy6.0_D_1205 | 204 | chr11:104741261-104793318 | 52.1 | none (mRNA present) | no
Loss | Affy500K_D_83 & Affy6_D_1253 | 40 | chr12:130382166-130686668 | 304.5 | none (mRNA present) | no
Loss | Affy6.0_D_1336 | 101 | chr13:39389124-39515818 | 126.7 | none | no
Loss | Affy6.0_D_1377 | 54 | chr14:42513084-42541303 | 28.2 | none | no
Loss | Affy500K_D_121 & Affy6_D_1383 | 64 (Admixed) | chr14:85216336-85436133 | 219.8 | none | no
Loss | Affy6.0_D_1679 | 68 | chr15:57862260-57891107 | 28.8 | none | no
Loss | Affy6.0_D_1428 | 35 | chr15:60314660-60333770 | 19.1 | none (mRNA present) | no
Loss | Affy6.0_D_1467 | 67 | chr16:54046835-54056160 | 9.3 | none (mRNA present) | no
Loss | Affy6.0_D_1601 | 11 (Asian) | chr20:50766640-50780316 | 13.7 | none | no
Loss | Affy500K_D_225 | 114 (EBV) | chr21:23160325-23267106 | 106.8 | none | no
Loss | Affy6.0_D_542 | 61 | chr2:148426768-148464448 | 37.7 | ORC4 | no
Loss | Affy6.0_D_925 | 101 | chr6:162342089-162365931 | 23.8 | PARK2 | no
Loss | Affy500K_D_234 | 117 (EBV) | chr5:95640616-96152064 | 511.4 | PCSK1, ERAP1, CAST | High-level amplification
Loss | Affy6.0_D_1065 | 61 | chr8:85558196-85579549 | 21.4 | RALYL | no
Loss | Affy6.0_D_1527 | 203 | chr18:38603464-38605275 | 1.8 | RIT2 | no
Loss | Affy6.0_D_1484 | 35 | chr17:75852813-75870192 | 17.4 | RNF213 | no
Loss | Affy6.0_D_741 | 28 | chr4:53829489-53875712 | 46.2 | SCFD2 | no
Loss | Affy6.0_D_549 | 99 | chr2:78025162-78059816 | 34.7 | SNAR-H | no
Loss | Affy500K_D_98 & Affy6_D_743 | 54 | chr4:147802903-148190197 | 387.3 | TTC29 | no
Fourteen genes (including one small nuclear RNA) had at least part of their coding regions affected by
FPC-specific losses, and 74 genes (including 3 microRNAs) had at least part of their coding regions
affected by FPC-specific gains (Table 12).
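The "full" and "partial" labels used for the extent of each gene affected can be read as an interval comparison between a gene and the CNV that overlaps it. The sketch below is a hedged illustration: the CNV coordinates are those reported for the chr18 gain in sample 62, but the gene intervals are hypothetical, and whether the thesis compared transcript or coding-region boundaries is not stated in this passage.

```python
def extent_affected(gene: tuple[int, int], cnv: tuple[int, int]) -> str:
    """Classify how much of a gene interval a CNV on the same chromosome covers."""
    gene_start, gene_end = gene
    cnv_start, cnv_end = cnv
    if cnv_end < gene_start or cnv_start > gene_end:
        return "none"
    if cnv_start <= gene_start and cnv_end >= gene_end:
        return "full"
    return "partial"


# CNV coordinates from the chr18 gain (hg18); gene intervals are hypothetical.
cnv = (6_838_462, 7_291_170)
inside_gene = (6_900_000, 7_100_000)
edge_gene = (6_800_000, 6_900_000)
distant_gene = (7_300_000, 7_400_000)

for name, gene in [("inside", inside_gene), ("edge", edge_gene), ("distant", distant_gene)]:
    print(name, extent_affected(gene, cnv))
```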
Table 12 – Genes whose coding regions are affected by FPC-specific CNVs
CNV type | Gene | Entrez Id | Official full name | Position (hg18) | Array | Sample | Extent of gene affected
Gain | OR1A1 | 8383 | olfactory receptor, family 1, subfamily A, member 1 | chr17:2932535-3161719 | 500K | 28 | full
Gain | OR1A2 | 26189 | olfactory receptor, family 1, subfamily A, member 2 | chr17:2932535-3161719 | 500K | 28 | full
Gain | OR1D2 | 4991 | olfactory receptor, family 1, subfamily D, member 2 | chr17:2919396-3019805 | 500K & Affy6 | 28 | full
Gain | OR1G1 | 8390 | olfactory receptor, family 1, subfamily G, member 1 | chr17:2919396-3019805 | 500K & Affy6 | 28 | full
Gain | OR1D4 | 653166 | olfactory receptor, family 1, subfamily D, member 4 (gene/pseudogene) | chr17:2932535-3184579 | 500K & Affy6 | 28 | full
Gain | OR3A1 | 4994 | olfactory receptor, family 3, subfamily A, member 1 | chr17:2932535-3184579 | 500K & Affy6 | 28 | full
Gain | OR3A2 | 4995 | olfactory receptor, family 3, subfamily A, member 2 | chr17:2932535-3184579 | 500K & Affy6 | 28 | full
Gain | OR3A4 | 390756 | olfactory receptor, family 3, subfamily A, member 4 | chr17:2932535-3184579 | 500K & Affy6 | 28 | full
Gain | CDYL2 | 124359 | chromodomain protein, Y-like 2 | chr16:78810438-79258408 | 500K | 37 | partial
Gain | DYNLRB2 | 83657 | dynein, light chain, roadblock-type 2 | chr16:78810438-79258408 | 500K | 37 | full
Gain | MIR548H4 | 100313884 | microRNA 548h-4 | chr16:78810438-79258408 | 500K | 37 | partial
Gain | METTL4 | 64863 | methyltransferase like 4 | chr18:2254263-2555103 | 500K & Affy6 | 44 | partial
Gain | ARHGAP28 | 79822 | Rho GTPase activating protein 28 | chr18:6838462-7291170 | 500K & Affy6 | 62 | partial
Gain | LAMA1 | 284217 | laminin, alpha 1 | chr18:6838462-7291170 | 500K & Affy6 | 62 | full
Gain | LOC400643 | 400643 | hypothetical LOC400643 | chr18:6838462-7291170 | 500K & Affy6 | 62 | full
Gain | LRRC30 | 339291 | leucine rich repeat containing 30 | chr18:6838462-7291170 | 500K & Affy6 | 62 | full
Gain | SP110 | 3431 | SP110 nuclear body protein | chr2:230753632-230823051 | 500K | 65 | partial
Gain | SP140 | 11262 | SP140 nuclear body protein | chr2:230753632-230823051 | 500K | 65 | partial
Gain | GRID2 | 2895 | glutamate receptor, ionotropic, delta 2 | chr4:93344017-93591992 | 500K | 79 | partial
Gain | ATR | 545 | ataxia telangiectasia and Rad3 related | chr3:143693491-143928895 | 500K | 82 | partial
Gain | PLS1 | 5357 | plastin 1 | chr3:143693491-143928895 | 500K | 82 | full
Gain | TRPC1 | 7220 | transient receptor potential cation channel, subfamily C, member 1 | chr3:143693491-143928895 | 500K | 82 | partial
Gain | IDO2 | 169355 | indoleamine 2,3-dioxygenase 2 | chr8:39935640-39943638 | 500K & Affy6 | 123 | partial
Gain | EXOC4 | 60412 | exocyst complex component 4 | chr7:133223330-133393933 | 500K & Affy6 | 125 | partial
Gain | RSPO2 | 340419 | R-spondin 2 homolog (Xenopus laevis) | chr8:108696004-108994913 | 500K & Affy6 | 18 | partial
Gain | PLXDC2 | 84898 | plexin domain containing 2 | chr10:19849680-20589237 | 500K | 26 | partial
Gain | AGBL4 | 84871 | ATP/GTP binding protein-like 4 | chr1:49856085-50089082 | Affy6 | 127 | partial
Gain | EHBP1L1 | 254102 | EH domain binding protein 1-like 1 | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | FAM89B | 23625 | family with sequence similarity 89, member B | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | KCNK7 | 10089 | potassium channel, subfamily K, member 7 | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | LOC254100 | 254100 | hypothetical LOC254100 | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | LTBP3 | 4054 | latent transforming growth factor beta binding protein 3 | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | MALAT1 | 378938 | metastasis associated lung adenocarcinoma transcript 1 (non-protein coding) | chr11:65027491-65201466 | Affy6 | 20 | partial
Gain | MAP3K11 | 4296 | mitogen-activated protein kinase kinase kinase 11 | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | MIR4489 | 100616284 | microRNA 4489 | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | MIR4690 | 100616292 | microRNA 4690 | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | PCNXL3 | 399909 | pecanex-like 3 (Drosophila) | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | RELA | 164014 | v-rel reticuloendotheliosis viral oncogene homolog A (avian) | chr11:65027491-65201466 | Affy6 | 20 | partial
Gain | SCYL1 | 57410 | SCY1-like 1 (S. cerevisiae) | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | SIPA1 | 602180 | signal-induced proliferation-associated 1 | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | SSSCA1 | 10534 | Sjogren syndrome/scleroderma autoantigen 1 | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | PTGR2 | 145482 | prostaglandin reductase 2 | chr14:73405361-73432688 | Affy6 | 67 | partial
Gain | ZNF410 | 57862 | zinc finger protein 410 | chr14:73405361-73432688 | Affy6 | 67 | partial
Gain | CELF6 | 60677 | CUGBP, Elav-like family member 6 | chr15:70381008-70436843 | Affy6 | 44 | partial
Gain | HEXA | 3073 | hexosaminidase A (alpha polypeptide) | chr15:70381008-70436843 | Affy6 | 44 | partial
Gain | GJD2 | 57369 | gap junction protein, delta 2, 36kDa | chr15:32814039-32848252 | Affy6 | 99 | full
Gain | CCDC106 | 29903 | coiled-coil domain containing 106 | chr19:60824299-60923809 | Affy6 | 62 | full
Gain | EPN1 | 29924 | epsin 1 | chr19:60824299-60923809 | Affy6 | 62 | full
Gain | NLRP9 | 338321 | NLR family, pyrin domain containing 9 | chr19:60824299-60923809 | Affy6 | 62 | partial
Gain | U2AF2 | 11338 | U2 small nuclear RNA auxiliary factor 2 | chr19:60824299-60923809 | Affy6 | 62 | full
Gain | ZNF580 | 51157 | zinc finger protein 580 | chr19:60824299-60923809 | Affy6 | 62 | full
Gain | ZNF581 | 51545 | zinc finger protein 581 | chr19:60824299-60923809 | Affy6 | 62 | full
Gain | ZNF784 | 147808 | zinc finger protein 784 | chr19:60824299-60923809 | Affy6 | 62 | partial
Gain | BRSK1 | 84446 | BR serine/threonine kinase 1 | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | COX6B2 | 125965 | cytochrome c oxidase subunit VIb polypeptide 2 (testis) | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | FAM71E2 | 284418 | family with sequence similarity 71, member E2 | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | HSPBP1 | 612939 | HSPA (heat shock 70kDa) binding protein, cytoplasmic cochaperone 1 | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | IL11 | 3589 | interleukin 11 | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | ISOC2 | 79763 | isochorismatase domain containing 2 | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | NAT14 | 57106 | N-acetyltransferase 14 (GCN5-related, putative) | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | PPP6R1 | 22870 | protein phosphatase 6, regulatory subunit 1 | chr19:60436319-60696243 | Affy6 | 20 | partial
Gain | RPL28 | 6158 | ribosomal protein L28 | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | SHISA7 | 729956 | shisa homolog 7 (Xenopus laevis) | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | SSC5D | 284297 | scavenger receptor cysteine rich domain containing (5 domains) | chr19:60436319-60696243 | Affy6 | 20 | partial
Gain | SUV420H2 | 84787 | suppressor of variegation 4-20 homolog 2 (Drosophila) | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | TMEM150B | 284417 | transmembrane protein 150B | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | TMEM190 | 147744 | transmembrane protein 190 | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | TMEM238 | 388564 | transmembrane protein 238 | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | UBE2S | 27338 | ubiquitin-conjugating enzyme E2S | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | ZNF628 | 89887 | zinc finger protein 628 | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | IFT80 | 57560 | intraflagellar transport 80 homolog (Chlamydomonas) | chr3:161448573-161518365 | Affy6 | 123 | partial
Gain | PYHIN1 | 149628 | pyrin and HIN domain family, member 1 | chr1:157133096-157188413 | Affy6 | 123 | partial
Gain | MCM4 | 4173 | minichromosome maintenance complex component 4 | chr8:49008716-49049657 | Affy6 | 202 | partial
Gain | PRKDC | 5591 | protein kinase, DNA-activated, catalytic polypeptide | chr8:49008716-49049657 | Affy6 | 202 | partial
Loss | NAT1 | 9 | N-acetyltransferase 1 (arylamine N-acetyltransferase) | chr8:17998784-18145035 | 500K | 77 | full
Loss | KALRN | 8997 | kalirin, RhoGEF kinase | chr3:125676839-125815545 | 500K | 85 | partial
Loss | ANGPT2 | 285 | angiopoietin 2 | chr8:6371546-6430547 | 500K | 112 | partial
Loss | CAST | 831 | calpastatin | chr5:95640616-96152064 | 500K | 117 | full
Loss | ERAP1 | 51752 | endoplasmic reticulum aminopeptidase 1 | chr5:95640616-96152064 | 500K | 117 | full
Loss | PCSK1 | 5122 | proprotein convertase subtilisin/kexin type 1 | chr5:95640616-96152064 | 500K | 117 | partial
Loss | TTC29 | 83894 | tetratricopeptide repeat domain 29 | chr4:147802903-148190197 | 500K & Affy6 | 54 | full
Loss | RNF213 | 57674 | ring finger protein 213 | chr17:75852813-75870192 | Affy6 | 35 | partial
Loss | ORC4 | 5000 | origin recognition complex, subunit 4 | chr2:148426768-148464448 | Affy6 | 61 | partial
Loss | SNAR-H | 100170221 | small ILF3/NF90-associated RNA H | chr2:78025162-78059816 | Affy6 | 99 | full
Loss | HTN1 | 3346 | histatin 1 | chr4:70867305-70952889 | Affy6 | 69 | partial
Loss | HTN3 | 3347 | histatin 3 | chr4:70867305-70952889 | Affy6 | 69 | full
Loss | STATH | 6779 | statherin | chr4:70867305-70952889 | Affy6 | 69 | full
Loss | SCFD2 | 152579 | sec1 family domain containing 2 | chr4:53829489-53875712 | Affy6 | 28 | partial
Fifty-five percent of the genes in Table 12 (48/88) have reported non-silent mutations (missense or
nonsense variants; insertions/deletions; gene fusions) in different cancers according to the COSMIC v.55
database, whereas only 37% of genes in all 500K + Affymetrix 6.0 FPC CNVs (p=0.002) and only 42%
of genes in all 500K + Affymetrix 6.0 control CNVs (p=0.022) had such mutations. None of the genes
overlapped by FPC-specific losses were reported to have downregulated expression in pancreatic cancer
in the Pancreatic Expression Database, whereas six genes overlapped by gains had reports of upregulation
in pancreatic adenocarcinoma and three genes were reported to be upregulated in intraductal papillary
mucinous neoplasm, a pre-invasive lesion. Furthermore, four FPC-specific gains overlapped regions
reported to have high-level amplification in pancreatic adenocarcinoma in the Pancreatic Expression
Database. The four gains overlap eight genes, of which four genes (LOC400643, DYNLRB2, LRRC30,
and LAMA1) are entirely encompassed by their respective gains. LOC400643 is a non-coding RNA and
has no known association with cancer. There are no reports of differential expression in pancreatic cancer
or of somatic mutations in DYNLRB2, which codes for a light chain component of the cytoplasmic dynein 1
complex; however, this gene is reported to be involved in TGF-beta/SMAD3 signaling550 and to be
downregulated in hepatocellular carcinoma551. LRRC30, which codes for leucine-rich repeat-containing
protein 30, has no reports of differential expression in pancreatic cancer or other association with
tumorigenesis, but does have two reported mutations in the COSMIC database (one nonsense mutation in
ovarian serous carcinoma and one missense mutation in hepatocellular carcinoma). LAMA1 codes for
laminin, an extracellular matrix component that binds to cells via high-affinity receptors and mediates
attachment, migration, and organization of cells into tissues during embryogenesis.552 The COSMIC v.55
database reports 18 protein-altering or truncating somatic mutations in this gene in tumors of the
pancreas, ovary, central nervous system, large intestine, breast, upper aerodigestive tract, and skin. In
comparison, for 10,849 COSMIC v.55 database genes that had at least one non-silent/non-intronic
mutation, the average number of mutations per gene is 3.7. A similar average number of reported somatic
mutations is observed in genes affected by CNVs in our study (determined from the compiled data of
500K and Affymetrix 6.0 arrays): 3.6 mutations per gene for FPC-specific genes (p=0.983), 3.4 mutations
per gene for all FPC genes (p=0.821), and 3.7 mutations per gene for all control genes (p=0.955). There is
also evidence for differential expression of LAMA1 in tumors of sites other than the pancreas: one study
reported hypermethylation and under-expression of LAMA1 in colorectal cancer553, while another study
reported overexpression of this gene in glioblastoma554.
Lastly, for non-complex CNV loci (i.e. only losses or only gains per locus), we performed Fisher’s exact
testing to determine if any loci had a significantly different proportion in cases relative to controls. After
correction for multiple testing, no loss or gain locus demonstrated a significant difference.
5. Discussion
Identifying predisposition genes associated with FPC has been challenging due to the rapid lethality of the
disease, low rate of tumor resection (resulting in paucity of tissue specimens for analysis), and probable
genetic heterogeneity. An estimated 20% of hereditary cases are linked to cancer syndromes caused by
alterations in known genes. However, most families that demonstrate clustering of pancreatic cancer do
not meet criteria for known cancer syndromes.161 We performed an analysis of germline CNVs in
pancreatic cancer patients suspected to have a heritable genetic cause for their disease. These primarily
included members of families with three or more affected cases, but also included families with only one
or two affected cases if at least one of the cases was under age 50 at diagnosis. Three different
computational algorithms were used for CNV identification in each array to identify high confidence
CNVs, an approach that is commonly used in CNV studies. One advantage to utilizing different
algorithms is improved sensitivity for detecting CNVs, since multiple studies have illustrated
significant non-overlap between algorithms. For our purpose, the use of multiple CNV-calling algorithms
identified variants with a very high likelihood of validation (the “high-confidence” set), as verified by
qPCR and/or second-array hybridization. This allowed us to focus our downstream analysis on these
high-confidence CNVs, whose expected validation rate was > 95%, rather than low-confidence CNVs
(meaning those called by only a single algorithm on a single chip), of which only half appeared to be true
genomic alterations. Those results are in keeping with a recently published comparative assessment of
CNV-calling algorithms and platforms.555 While we acknowledge that our CNV list is not exhaustive,
this is a logistical limitation of the field, as it is feasible neither to genotype hundreds of samples on
multiple platforms nor to perform qPCR validation on hundreds of CNVs. Thus, our approach at least
ensured that we were working with a highly valid set of data.
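The distinction between high- and low-confidence calls can be sketched as counting how many algorithms support each candidate CNV. The example below is a simplified illustration, not the pipeline used in this study: the algorithm names, the 50% reciprocal-overlap threshold, and the rule that support from two or more algorithms means high confidence are all assumptions, and the intervals are hypothetical.

```python
def reciprocal_overlap(a: tuple[int, int], b: tuple[int, int]) -> float:
    """Smaller of the two fractions of each interval covered by their intersection."""
    shared = min(a[1], b[1]) - max(a[0], b[0])
    if shared <= 0:
        return 0.0
    return min(shared / (a[1] - a[0]), shared / (b[1] - b[0]))


def support(call, calls_by_algorithm, min_overlap=0.5) -> int:
    """Count the algorithms that made at least one call matching `call`."""
    return sum(
        any(reciprocal_overlap(call, other) >= min_overlap for other in calls)
        for calls in calls_by_algorithm.values()
    )


# Hypothetical calls on one chromosome of one sample from three algorithms.
calls_by_algorithm = {
    "algo_A": [(100, 200), (500, 600)],
    "algo_B": [(110, 210)],
    "algo_C": [(900, 950)],
}

candidates = sorted({c for calls in calls_by_algorithm.values() for c in calls})
labels = {
    c: "high" if support(c, calls_by_algorithm) >= 2 else "low"
    for c in candidates
}
print(labels)
```

Only the two overlapping calls near positions 100-210 are labelled high-confidence here; calls made by a single algorithm stay in the low-confidence set.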
Interestingly, we noted a discrepancy in the proportion of high-confidence CNVs between samples
genotyped at TCAG in Toronto (all the cases and a small subset of controls) and those genotyped in
Quebec (most controls). We attributed this difference to an apparently higher level of noise in control
arrays genotyped in Quebec. Pinto et al.555 commented on the effect of inter-laboratory variability on
CNV validation rate, finding it to be less important than reproducibility of the chosen platform or calling
algorithm. However, they do note that Affymetrix arrays (the platform used in our study) are an
exception to this, being highly dependent on the reference data set used for the analysis. Since we used
the total number of samples within each group (i.e. those genotyped at each centre constituted a group) as
reference, a noisier set of data from the Quebec samples would be expected to result in a greater
proportion of noisy and/or unreliable calls. We expect that some of the control low-confidence CNVs
would in fact be real calls, so we advocate that CNVs of interest that are to be investigated further should
be checked for CNV calls in controls and those should be validated before further analysis. (We
performed such validation for the CNV G_97 that overlapped TGFBR3; it appeared to overlap a low-
confidence duplication in an ARCTIC control but this putative gain was demonstrated by qPCR to be a
false call).
To date, this is the largest study of germline CNVs in unrelated cancer patients from high-risk families. A
previous study of 57 pancreatic cancer patients from 56 high-risk kindreds (each containing at least a pair
of affected first-degree relatives) used an oligonucleotide-based CGH platform to identify FPC-specific
germline CNVs, filtering out losses or gains that were also identified in 607 controls (372 were analyzed
in the same study, and 235 were previously reported in two other studies).345 Twenty-five FPC-specific
losses overlapping 81 genes and 31 FPC-specific gains overlapping 425 genes were identified. In our
study, we investigated 133 members of 131 high-risk kindreds, of whom 17 subjects were part of the
previous CGH study, and we identified 93 FPC-specific CNVs using a combination of Affymetrix 500K
and Affymetrix 6.0 arrays. The median size of FPC-specific CNVs in the CGH study was larger than in
our FPC-specific CNVs (losses: 151kb vs. 35.5kb; gains: 379kb vs. 73kb). This may be due, in part, to
the lower resolution of the CGH platform (mean inter-marker distance = 30kb) compared to the
Affymetrix 500K array (median inter-marker distance = 2.5kb) and Affymetrix 6.0 array (median inter-
marker distance = 0.7kb) used in our study. It may also reflect enrichment for somatic CNVs caused by
EBV-transformation, since all FPC DNA samples in the CGH study were extracted from EBV-
transformed cells whereas only 29 samples in our population were EBV lymphoblasts. The size of
control populations used to filter CNVs was larger in our study and the number of control CNVs from
non-BAC studies currently catalogued in the DGV is greater than was available at the time of publication
of the previous FPC CNV study. As a result, some of the CNVs identified as “FPC-specific” in the
previous study overlapped CNVs in our controls and/or in the DGV. This may explain the slightly higher
(FPC-specific CNVs)-to-sample ratio observed in the CGH study (approximately 1 CNV per sample)
compared to our study (0.8 CNV per sample).
It is difficult to estimate concordance in CNV calling between the two studies, as we do not know how
many of the 56 FPC-specific CNVs reported in the CGH study were identified in samples that were also
used in our study. Only 1/25 loss and 3/31 gain loci reported in the CGH study were also observed in our
analysis in samples common to both studies, and all of these overlapped CNVs in our controls and/or in
the DGV. Interestingly, multiple reports have demonstrated generally low concordance for CNV calling
on different platforms/algorithms when analyzing the same DNA source.259,555 In addition to CNVs
identified in cases common to both studies, there was one FPC-specific loss locus which was identified in
two different subjects (one in each study). The region overlapped a gene, DOCK1 (dedicator of
cytokinesis 1), but in our study the loss only encompassed an intronic portion of the gene. This gene may
have a role in cellular proliferation and migration556,557, and it has been reported to be overexpressed in
high-grade dysplastic lesions (PanIN3), suggesting that it may be important in advancing
tumorigenesis.558
A number of other genome-wide germline CNV analyses have been reported for various cancers, but only
a few have studied familial cancers. In addition to the aforementioned familial pancreatic cancer study,
microarray-based germline CNV studies have been reported for Li-Fraumeni syndrome348, young-onset
and/or familial colorectal cancer in families without mutations in known predisposition genes347, and
BRCA1-associated ovarian cancer.346 Shlien et al.348 described an increased frequency of germline CNVs
in 33 Li-Fraumeni family members carrying mutations in the TP53 gene (of which 23 were affected by
cancer), compared to 20 Li-Fraumeni family members with wildtype TP53 and 70 healthy controls. Since
many of the CNVs overlapped or were near important cancer genes, the authors proposed a model
whereby baseline genomic instability in these patients progresses over time, leading to more frequent and
larger copy number alterations affecting genes that contribute to tumorigenesis. In our study, patients and
controls had a similar number of alterations per genome, with similar CNV size, ratios of losses to gains,
likelihood of CNVs to overlap genes, and proportion of genic CNVs that were associated with cancer.
The lack of significant difference in the germline CNV profile between cases and controls suggests that
causative genes for pancreatic cancer do not significantly impact genomic stability in non-tumor cells.
Our results are similar to those of Yoshihara et al.346 who compared 68 Japanese subjects with germline
BRCA1 mutations (of whom 51 had ovarian cancer), 34 sporadic ovarian cancer patients, and 47 healthy
controls. They reported no significant difference in the per-genome total number of CNVs between
BRCA1-mutation carriers and controls, although the number of deletions was higher in the BRCA1
subjects. Otherwise, they found no evidence for differential clustering of the global CNV data between
groups, and no correlation of age at diagnosis with CNV frequency.
Our proposal for CNV prioritization emphasized regions that segregate with disease in the same family
and/or overlapping CNVs in multiple unrelated cases (and absent in controls). We only had access to
CNV data for two sets of relatives (two sibling pairs), neither of which demonstrated evidence of FPC-
specific CNVs that were co-inherited within the same family. When looking at overlapping CNVs in
cases, one region that caught our interest contained two overlapping duplications in two unrelated cases,
both of which intersected the TGFBR3 gene. While none of the ARCTIC controls had a validated CNV
in this region, a single POPGEN control from the Affy6 dataset contained a duplication that overlapped
the cases’ duplications. However, the control’s duplication did not intersect the gene to the same extent
as the cases, and in fact only appeared to transect the 5’ end of one of multiple isoforms of the gene
(whereas the cases intersected all isoforms). The significance of the TGF-beta pathway in cancer
initiation and progression in general, and in pancreatic cancer in particular, made this duplication
especially interesting to us. We successfully validated this CNV in both affected cases, and we further
demonstrated that it was heritable in one of the two families for which we had access to DNA from
multiple relatives. Furthermore, we successfully identified the exact breakpoint of the duplication,
proving in the process that it is a tandem duplication, by a combined approach of qPCR walk-along and
Sanger-sequencing of a PCR-amplified fragment. This breakpoint contained three base pairs that do not
appear to be derived from the sequence of either end of the duplication (“TAT”), which is a common
finding at the breakpoints of duplications caused by non-homologous end-joining (NHEJ).559 However,
once we were able to design a sufficiently small fragment containing the region of the breakpoint to test
its presence in FFPE-derived tissue from an affected sister of the proband, we found that this duplication
does not cosegregate with pancreatic cancer in that family. This effectively refuted the implication of this
duplication as a cause for familial pancreatic cancer. (We also note that the breakpoints of both case
duplications fell within intronic regions of TGFBR3, further decreasing the likelihood of disrupting the
gene). While this direction in our investigation ultimately proved fruitless, it confirmed the challenge of
interpreting the impact of CNVs, an aspect of CNV research that has lagged behind the ability to detect
CNVs or statistical methods for performing genome-wide association studies using CNVs as disease
markers. As illustrated by our effort, the process of fine-mapping CNV breakpoints is painstaking but
necessary to understanding the precise region that is transected by a duplication or deletion. And even
that alone is not sufficient to prove that a CNV causes a particular phenotype; for that, further functional
work would be required such as demonstration of expression correlation to copy number, and impact of
altered expression on cellular function.
Next in priority for our analysis were single-case FPC-specific CNVs that overlapped genic regions. We
identified 88 genes whose coding regions were partially or completely encompassed by FPC-specific
CNVs, and although some are unlikely to be candidate FPC genes (e.g. olfactory receptor genes), many
are functionally relevant to carcinogenesis, and are differentially expressed and/or overlap regions that are
reported deleted or amplified in pancreatic adenocarcinoma. Moreover, the proportion of genes that were
reported in COSMIC v.55 to have protein-altering mutations in tumors or malignant cell lines was
significantly higher in FPC-specific genes than in either the full population of cases or in controls. This
further suggests that FPC-specific CNVs are enriched for cancer-associated genes. In the report by
Yoshihara et al.346, the primary genetic etiology for the hereditary cancer was already known (BRCA1),
and the authors presented genes overlapped by BRCA1-specific CNVs as potential modifiers to the
development of cancer. Alternatively, the study by Venkatachalam et al.347 identified seven genic CNVs
specific to patients with familial colorectal cancer who have no known genetic mutation, each CNV found
in a single individual only. In that study, like ours, each gene is considered a potential causative gene for
familial colorectal cancer. None of the genes overlapped by cancer patient CNVs reported by Shlien et
al.348, Yoshihara et al.346, or Venkatachalam et al.347 were part of our FPC-specific gene list. It should be
noted that, in addition to the RefSeq genes we highlighted in this paper, 6 FPC-specific gains and 11 FPC-
specific losses that did not overlap RefSeq genes did overlap expressed human mRNA. While these
regions are of lower interest relative to bona fide genes, some published CNV studies have reported
associations of non-genic regions with disease, demonstrating evidence for hitherto unidentified genes
and/or regulatory elements.340,344
The final stage of our CNV prioritization involved calculating the difference in proportion of cases vs.
controls containing each simple loss or simple gain locus. This approach would theoretically identify
CNVs that are detected in both cases and controls but at a higher frequency in cases. No locus achieved a
statistically significant p-value after multiple-testing correction. This was not unexpected, since the
number of cases included in our analysis was too small for the purpose of identifying a significant
genome-wide association result, unless a very high effect size was associated with a CNV. Furthermore,
the biases inherent in the design of our study (e.g. the Affymetrix 500K array is suboptimal for detecting
recurrent CNVs relative to rare CNVs, the differences in noise level and high-confidence CNV calling
between cases and controls) meant that such an analysis would be inappropriate with our dataset. A
properly designed genome-wide association CNV study requires a well-validated platform for genotyping
CNVs, such as the Affymetrix 6.0 array, and the necessary sample size for achieving sufficient power in
the statistical analysis. Alternatively, we note that some of the loci in our study had a nominally significant p-value
with a higher case frequency before multiple-testing correction, and those regions can be selected for
further testing in an independent case-control study that directly genotypes the CNVs of interest (for
example using a PCR-based approach). Such a technique was in fact utilized by Huang et al.344 for
identifying the 6q13 deletion associated with pancreatic cancer.
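The per-locus comparison of case and control frequencies, followed by multiple-testing correction, can be illustrated with a minimal sketch. The thesis does not name the statistical test or the correction method here, so the one-sided Fisher's exact test and the Bonferroni adjustment below are assumptions, and the locus names and counts are hypothetical values chosen only to show the mechanics.

```python
from math import comb


def fisher_one_sided(case_with, case_total, ctrl_with, ctrl_total):
    """One-sided Fisher's exact p-value: probability of seeing at least
    `case_with` CNV carriers among the cases under the hypergeometric null."""
    carriers = case_with + ctrl_with
    total = case_total + ctrl_total
    denom = comb(total, case_total)
    upper = min(carriers, case_total)
    # Sum over all tables with the same margins that are at least as case-enriched.
    num = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_with, upper + 1)
    )
    return num / denom


def bonferroni(pvalues):
    """Bonferroni adjustment (assumed method): multiply by the number of tests, cap at 1."""
    n_tests = len(pvalues)
    return {locus: min(1.0, p * n_tests) for locus, p in pvalues.items()}


# Hypothetical loci: (cases with CNV, total cases, controls with CNV, total controls)
loci = {
    "locus_A": (6, 120, 5, 1000),
    "locus_B": (3, 120, 20, 1000),
    "locus_C": (1, 120, 9, 1000),
}
raw = {name: fisher_one_sided(*counts) for name, counts in loci.items()}
adjusted = bonferroni(raw)
for name in loci:
    print(name, f"raw p = {raw[name]:.4g}", f"Bonferroni p = {adjusted[name]:.4g}")
```

With a genome-wide number of loci rather than three, the Bonferroni multiplier grows accordingly, which is why a small case cohort can only reach significance for CNVs with very large effect sizes.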
In conclusion, we have presented a list of candidate predisposition genes for FPC overlapped by germline
CNVs that are specific to the largest cohort of high-risk pancreatic cancer patients published to date. One
limitation of our analysis is the coverage and resolution of the platform we used for primary CNV
discovery (i.e. Affymetrix 500K array). Since the completion of our study, novel methods of CNV
detection have become available, including very high resolution tiling microarrays and next-generation
sequencing. We expect future studies using these methods to independently test our findings and detect
additional FPC candidate genes. Some of the samples containing FPC-specific CNVs in our study
differed in ancestry from the majority of controls, raising the possibility that these CNVs are specific to
the respective ancestry group rather than to pancreatic cancer risk. Those CNVs should be investigated
further in a larger ethnicity-matched control cohort. Despite these limitations, our list of FPC-specific
genes contains several interesting candidates and further screening for mutations in other high-risk
pancreatic cancer subjects, along with investigation of the functional role of these genes, would add
support to the role of one or more genes in predisposition to FPC.
Chapter 4 - Exome Sequencing in a Familial Pancreatic Cancer Kindred
1. Abstract
In recent years, the significant drop in cost of next-generation sequencing and target-region enrichment
has enabled researchers to use whole-exome sequencing for identification of predisposition genes for a
variety of Mendelian disorders, a few of which have been familial cancer syndromes. In this study, we
aimed to apply this novel method to investigate the genetics of a family containing four relatives affected
by pancreatic cancer. Blood-derived DNA was available from three affected relatives (two siblings and
their maternal uncle), and we also included an unaffected maternal aunt as a control. Target-enrichment
was performed using Nimblegen in-solution array and sequencing was performed by Illumina GAII
parallel sequencer. We present two alternative hypotheses: (1) in this family, rare variants that are
inherited by the three affected individuals and not inherited by the unaffected aunt are candidate
susceptibility genes for familial pancreatic cancer; and (2) in this family, rare variants that are inherited
by the three affected individuals, whether or not they are present in the unaffected aunt, are candidate
susceptibility genes for familial pancreatic cancer. We present four potential variant filtration models to
develop a list of candidate genes for further investigation, but we focus our downstream analysis on one
model. The validation rate for heterozygous single nucleotide variants and indels was high (> 80%) but
significantly lower for homozygous variants. In Model#1 of our analysis, we identify 9 candidate genes
with heterozygous single nucleotide variants in the three affected family members and absent in the
unaffected aunt, of which we further investigate the two top-ranked genes using Sanger sequencing in a
cohort of unrelated high-risk pancreatic cancer patients. We do not identify further subjects with
unreported variants in those genes. Further investigation of other genes in this model and the other three
filtration models will be possible in future exome sequencing studies on other pancreatic cancer patients.
2. Introduction
In the previous chapter, we performed a genome-wide analysis of germline CNVs in pancreatic cancer
patients from high-risk families to identify candidate susceptibility genes. As was discussed, this was
based on the hypothesis that a proportion of syndromic cancer cases occur due to large rearrangements
affecting the causative gene. It remains, though, that most variants which cause hereditary cancer are
point mutations, most commonly occurring in coding regions or splice-sites, thus altering the encoded
protein or causing premature termination. Until recently, such variants could only be identified by a
candidate-gene approach and laborious Sanger sequencing. However, the development of target-capture
arrays for building DNA libraries enriched for coding regions (“the exome”), in combination with
decreasing cost of massively parallel next-generation sequencing, has enabled interrogation of entire
genomes for susceptibility variants. Indeed, over the past couple of years, a large number of exome-based
studies have been published identifying causative genes for heretofore unexplained Mendelian diseases.
(See Literature Search for more details).
While the number of studies specifically addressing cancer syndromes has been small, it is evident that a
similar strategy can be applied to identifying susceptibility genes in individuals or families who appear to
inherit the disease in a Mendelian fashion (dominant or recessive). Therefore, we chose a family
consisting of two affected siblings, their affected mother, and an affected maternal uncle to investigate
using exome sequencing. The CNV profile of the two siblings was already characterized in Chapter III
(CNV-case ID-89 here identified as ID-001 and CNV-case ID-30 here identified as ID-006), and all
deletions and gains segregating in the two siblings were also found in controls. (Indeed, only one deletion
was FPC-specific, found exclusively in sibling ID-006, and it occurred in a non-genic region. This CNV
was not identified in sibling ID-001). For the study described in this chapter, blood-derived DNA was
available for the two siblings and their affected maternal uncle (but not their mother). We chose to also
include DNA from an unaffected maternal aunt to act as a control for filtering out candidate variants, with
the hypothesis that all three affecteds would be carriers of a high-penetrance variant and that the 80-year-
old unaffected aunt is unlikely to be a carrier. However, we acknowledge that, since we do not know the
penetrance of the gene in question, the aunt may also be an unaffected carrier. For that reason, we also
present an alternate hypothesis that considers the aunt a possible carrier of the variant of interest, and thus
identifies variants shared among the affected members whether or not they are present in the aunt.
In the methods below, filtration models #1 and #3 fall under the first hypothesis: variants inherited by the
three affected relatives and absent in the unaffected relative are in candidate susceptibility genes for FPC;
filtration models #2 and #4 are based on the second hypothesis: variants inherited by the three affected
relatives are in candidate susceptibility genes for FPC, regardless of inheritance in the unaffected family
member. As described in this chapter, we only focus our downstream investigation and candidate gene
screening on results from model#1, pertaining to the first hypothesis.
3. Materials & Methods
3.1 Description of Family C
We identified a consanguineous family of Maltese ancestry with a strong history of pancreatic cancer: the
proband was a male (ID-001) who presented with metastatic pancreatic cancer at age 42 years; soon after,
one of his sisters (ID-006) also presented with metastatic pancreatic cancer at age 34 years. Neither
patient had a resectable tumor and both subjects died within one to two years of diagnosis. The two
siblings were part of a sibship of seven (in addition, their mother had two miscarriages); a brother was
affected with low-grade B-cell follicular lymphoma at age 45 and remains alive and free of disease today
at age 49. Their mother had previously undergone a pancreaticoduodenectomy for pancreatic cancer at
age 58 but also died soon after from disease recurrence. Several years later, a maternal uncle (ID-011)
developed metastatic pancreatic cancer at age 80 while enrolled in an MRI-based screening program and
died of his disease. Figure 17 illustrates the pedigree of the family.
Figure 17 – Pedigree of FPC kindred investigated by exome sequencing
Figure 17 Legend: Large red box indicates affected mother without available DNA for sequencing; blue circles
indicate family members on whom exome sequencing was performed. Filled box = affected male; filled circle = affected female; unfilled box = unaffected male; unfilled circle = unaffected female. (“affected” refers to pancreatic cancer)
Blood samples were taken from all seven siblings (including the two pancreatic cancer patients before
they died), as well as from the affected maternal uncle and an unaffected maternal aunt (ID-010). No
blood sample was available for the mother. DNA was extracted from blood samples as per previously
described protocol (see Chapter II of this thesis).
3.2 Target-capture, next generation sequencing, and raw-data analysis
[Note: DNA library preparation and sequencing, alignment of reads, and variant calling were performed by
members of Dr. John McPherson's lab at the Ontario Institute for Cancer Research (Quang Trinh). Data was
provided to W. Al-Sukhni for validation and downstream variant filtration and subsequent Sanger
sequencing in other patients. Most PCR amplification for variant validation and screening in other
patients described in this chapter was performed by W. Al-Sukhni, with assistance from H. Kim and T.
McPherson.]
DNA samples from the siblings, uncle, and aunt were enriched for exomic regions using Nimblegen
SeqCap EZ Human Exome Library v2.0, as per industry protocol. This in-solution array contains 2.1
million empirically optimized oligonucleotide probes targeting approximately 300,000 exons based on
annotation of consensus coding sequence (CCDS) project (Sep 2009)374, RefSeq database (Jan 2010)560
and miRBase database (v.14, Sep 2009)561, with a total target size of approximately 35Mb. Resulting
DNA libraries were sequenced using the Illumina GAII next-generation sequencer using paired-end
2x101 standard sequencing procedure provided by Illumina, generating 101-bp reads to align against the
reference genome. For ID-001 and ID-006, the data in this analysis were generated by 6 sequencing lanes
each; for ID-010, two lanes were used, and for ID-011, three lanes were used.
Raw data was processed through an empirically-validated workflow: First, basic quality controls (QC)
such as number of reads, average base quality per cycle, and percentage of bases with their corresponding
Phred quality values were examined on each lane of raw data. Next, raw reads were aligned to the
reference human genome (GRCh37) using Novoalign562, and only uniquely aligned reads were included
for downstream analysis. After documenting several QC parameters (e.g. % of reads aligned, % of reads
aligned in correct orientation, % of reads aligned only as singletons), duplicated fragments that had
exactly the same start and end points were presumed to be PCR artifacts and were removed (“collapsing”)
using Picard command-line tools (http://picard.sourceforge.net). Further QC parameters assessed at
this point included comparing percent of reads aligned before and after collapsing, proportion of target
region that is covered at least once by sequencing, percent of bases covered at incrementally higher depth
of coverage, and average depth of coverage across the captured target region.
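The idea behind "collapsing" can be shown with a short sketch. This is not Picard's implementation: the `Fragment` record, the inclusion of strand in the duplicate key, and the rule of keeping the fragment with the highest mean base quality are all assumptions made for illustration. Only the criterion of identical start and end points comes from the text above.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    # Hypothetical minimal record for an aligned paired-end fragment.
    chrom: str
    start: int
    end: int
    strand: str
    mean_quality: float


def collapse_duplicates(fragments):
    """Keep one fragment per (chrom, start, end, strand) key.

    Fragments with exactly the same start and end points are presumed to be
    PCR artifacts; the copy with the highest mean quality is retained
    (an assumed tie-breaking rule).
    """
    best = {}
    for frag in fragments:
        key = (frag.chrom, frag.start, frag.end, frag.strand)
        kept = best.get(key)
        if kept is None or frag.mean_quality > kept.mean_quality:
            best[key] = frag
    return list(best.values())


fragments = [
    Fragment("chr1", 1000, 1250, "+", 31.0),
    Fragment("chr1", 1000, 1250, "+", 35.5),  # same start/end: presumed PCR duplicate
    Fragment("chr1", 1000, 1260, "+", 30.0),  # different end point: kept separately
]
collapsed = collapse_duplicates(fragments)
print(len(fragments), "fragments before collapsing,", len(collapsed), "after")
```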
At this point, the data was processed through GATK563 software for quality recalibrations, local
realignments, and variant/indel calling. Variants passing a minimum quality score threshold of 30 were
considered reliable. A minimum read depth of 8x was considered necessary to call a variant, and the
maximum allowable number of single nucleotide variants (SNVs) in a 10-base window was two.
Heterozygosity/homozygosity for each variant was also estimated by GATK.
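The three thresholds stated above (quality score of at least 30, read depth of at least 8x, and no more than two SNVs in a 10-base window) can be expressed as a simple filter. The sketch below is an illustration only and does not reproduce GATK's own filtering code; the `Call` record is hypothetical, and applying the cluster rule after the quality and depth thresholds, and discarding every SNV in an over-dense window, are assumptions.

```python
from collections import defaultdict
from typing import NamedTuple


class Call(NamedTuple):
    # Hypothetical minimal SNV call record.
    chrom: str
    pos: int
    qual: float
    depth: int


MIN_QUAL = 30            # minimum variant quality score
MIN_DEPTH = 8            # minimum read depth (8x)
WINDOW = 10              # window size in bases
MAX_SNVS_PER_WINDOW = 2  # more than this within one window is treated as a cluster


def filter_calls(calls):
    """Apply quality and depth thresholds, then drop SNVs in dense clusters."""
    kept = [c for c in calls if c.qual >= MIN_QUAL and c.depth >= MIN_DEPTH]
    by_chrom = defaultdict(list)
    for c in kept:
        by_chrom[c.chrom].append(c)

    result = []
    span = MAX_SNVS_PER_WINDOW + 1
    for chrom_calls in by_chrom.values():
        chrom_calls.sort(key=lambda c: c.pos)
        clustered = set()
        # Any run of `span` consecutive SNVs fitting inside one window is a cluster.
        for i in range(len(chrom_calls) - span + 1):
            if chrom_calls[i + span - 1].pos - chrom_calls[i].pos < WINDOW:
                clustered.update(range(i, i + span))
        result.extend(c for j, c in enumerate(chrom_calls) if j not in clustered)
    return result


calls = [
    Call("chr1", 100, 50, 20), Call("chr1", 103, 45, 18), Call("chr1", 108, 60, 25),
    Call("chr1", 500, 50, 20),   # passes all filters
    Call("chr1", 600, 20, 20),   # fails the quality threshold
    Call("chr2", 50, 40, 7),     # fails the depth threshold
    Call("chr2", 60, 40, 8),     # passes all filters
]
passing = filter_calls(calls)
print([(c.chrom, c.pos) for c in passing])
```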
3.3 Validation of variants
Validation of exome sequencing data was performed by two approaches. First, we took advantage of the
fact that the two siblings were previously genotyped on Affymetrix 500K array for the CNV project (see
Chapter 3 of this thesis). We identified SNPs common to both platforms for each sample and
checked the concordance rate in genotype call between the two platforms. The microarray genotype calls
were determined using the Affymetrix Genotyping Console (GTC 2.1), which uses the BRLMM564
algorithm for assigning genotypes. This algorithm has >99% accuracy in detecting homozygous and
heterozygous variant alleles. (Note that we were not able to directly identify SNPs carrying only the wildtype
(i.e. reference) allele in the exome data since only variants were called and provided to us. As a result, we
can only comment on the concordance of heterozygous and homozygous variant calls in exome data in
relation to the microarrays.)
Second, for variants that did not appear in the dbSNP database at the time of initial sequence results
(identifying the variant as “novel”), we performed Sanger sequencing to validate the variants.
Sequencing was performed in both the sense and anti-sense directions to confirm each variant. We
calculated specificity and sensitivity of heterozygous variant calling in exome data as follows:
Specificity = TN/(TN+FP), where TN = true negative (no variant is called in either the exome data or
Sanger sequencing) and FP = false positive (variant called by exome data but not validated by Sanger
sequencing)
Sensitivity = TP/(TP+FN), where TP = true positive (variant called by exome data and validated by
Sanger sequencing) and FN = false negative (variant not called by exome data but identified by Sanger
sequencing)
For the above definitions, we excluded homozygous calls and calls where the exome data indicates that
the allele is different from the reference genome but misidentifies the allele (e.g. exome analysis calls
G>A variant, but Sanger proves the variant to be G>T).
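The definitions above translate directly into a small calculation. In the sketch below, the call labels ("het", "wt", "homo", "noisy") and the paired calls are hypothetical illustrations, not the thesis data; homozygous calls and uninformative Sanger results are skipped, in line with the exclusions just described.

```python
def classify(ngs_call, sanger_call):
    """Classify one site for heterozygous variant calling.

    Returns "TP", "FP", "FN", "TN", or None when the site is excluded
    (homozygous calls, or Sanger results that are not informative).
    """
    table = {
        ("het", "het"): "TP",  # called by exome data and validated by Sanger
        ("het", "wt"): "FP",   # called by exome data but not validated
        ("wt", "het"): "FN",   # not called by exome data but found by Sanger
        ("wt", "wt"): "TN",    # no variant in either data set
    }
    return table.get((ngs_call, sanger_call))


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


# Hypothetical (exome call, Sanger call) pairs, for illustration only.
pairs = (
    [("het", "het")] * 8
    + [("wt", "het")] * 2
    + [("het", "wt")]
    + [("wt", "wt")] * 9
    + [("homo", "homo"), ("het", "noisy")]  # excluded from both measures
)
counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
for ngs_call, sanger_call in pairs:
    label = classify(ngs_call, sanger_call)
    if label is not None:
        counts[label] += 1

print("sensitivity =", sensitivity(counts["TP"], counts["FN"]))
print("specificity =", specificity(counts["TN"], counts["FP"]))
```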
3.4 Filtering strategy
All SNVs and indels within the exome-capture target regions were identified, and SIFT565 was used to
annotate the synonymous/non-synonymous/frameshift/non-frameshift nature of each SNV or indel.
Synonymous variants (i.e. no alteration in amino acid) were identified and removed. In addition, variants
reported in dbSNP131 were removed. Only coding region and/or splicing-site variants (up to +/- 3bp
from exons) were included in the final list per subject. We screened the excluded variants for those with very low
minor allele frequencies (< 0.2%) or those that are somatic variants in cancer, which should be re-included
in our list (since dbSNP does contain some somatic variants).
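The per-subject filtering rules in this section can be summarized as a single predicate. The sketch below is illustrative only: the `Variant` record and its field names are hypothetical, the `region` field is assumed to have been annotated already (with "splice" meaning within +/- 3bp of an exon), and the re-inclusion rule is written as we read it from the paragraph above.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    # Hypothetical annotated variant record.
    name: str
    effect: str                        # e.g. "synonymous", "nonsynonymous", "frameshift"
    region: str                        # e.g. "coding", "splice", "intronic"
    in_dbsnp: bool                     # reported in dbSNP131
    minor_allele_freq: Optional[float] # None when no frequency is available
    somatic_in_cancer: bool            # dbSNP entry flagged as a somatic variant


MAF_CUTOFF = 0.002  # 0.2%


def keep_variant(v):
    """Return True if the variant stays in the per-subject candidate list."""
    if v.effect == "synonymous":
        return False
    if v.region not in ("coding", "splice"):
        return False
    if v.in_dbsnp:
        # Re-include dbSNP entries that are very rare or somatic in cancer.
        rare = v.minor_allele_freq is not None and v.minor_allele_freq < MAF_CUTOFF
        return rare or v.somatic_in_cancer
    return True


variants = [
    Variant("v1", "nonsynonymous", "coding", False, None, False),
    Variant("v2", "synonymous", "coding", False, None, False),
    Variant("v3", "nonsynonymous", "coding", True, 0.05, False),
    Variant("v4", "nonsynonymous", "splice", True, 0.001, False),
    Variant("v5", "frameshift", "coding", True, None, True),
    Variant("v6", "nonsynonymous", "intronic", False, None, False),
]
kept_names = [v.name for v in variants if keep_variant(v)]
print(kept_names)
```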
To identify candidate susceptibility genes for the pancreatic cancer in this family, we adopted four
filtering approaches:
Model#1 - Assuming the two siblings and the uncle are all carriers of the responsible variant, and that the
unaffected aunt is not a carrier, we identified variants in common to the siblings + uncle and absent in the
aunt.
Model#2 - To account for incomplete penetrance of the susceptibility gene, we assumed the aunt may or
may not be a carrier and identified variants in common to the siblings and uncle, whether or not present in
the aunt as well.
Model#3 - To account for the lower coverage in the uncle, we assumed the two siblings are carriers and
the aunt is not a carrier and identified variants in common to the siblings and absent in the aunt, whether
or not called in the uncle.
Model#4 - To account for lower coverage in the uncle and incomplete penetrance of the gene, we
assumed the two siblings are carriers and identified variants in common to the siblings, whether or not
present in the aunt and/or uncle.
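The four models reduce to set operations on the per-subject variant lists. The following sketch assumes each subject's filtered variants are held as a set of hashable keys; the key strings used here are hypothetical placeholders, not variants from this family.

```python
def filtration_models(sib1, sib2, uncle, aunt):
    """Return the candidate variant set under each of the four filtration models."""
    shared_sibs = sib1 & sib2
    shared_affected = shared_sibs & uncle
    return {
        "Model#1": shared_affected - aunt,  # siblings + uncle, absent in aunt
        "Model#2": shared_affected,         # siblings + uncle, aunt ignored
        "Model#3": shared_sibs - aunt,      # siblings, absent in aunt, uncle ignored
        "Model#4": shared_sibs,             # siblings only, aunt and uncle ignored
    }


# Hypothetical variant keys, for illustration only.
sib1 = {"varA", "varB", "varC", "varD", "varE"}
sib2 = {"varA", "varB", "varC", "varD", "varF"}
uncle = {"varA", "varB", "varG"}
aunt = {"varB", "varD"}

models = filtration_models(sib1, sib2, uncle, aunt)
for name, candidates in models.items():
    print(name, sorted(candidates))
```

By construction, the Model#1 list is contained in the lists of Models #2 and #3, and all three are contained in the Model#4 list, so Model#1 is the most stringent of the four.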
For each model, the final list of variants was manually curated by screening in dbSNP135
(http://www.ncbi.nlm.nih.gov/snp) which includes results from the first phase of the 1000 Genomes246
project (low-coverage genome-wide sequencing of 180 samples, sufficient to call most variants ≥ 1%
minor allele frequency, and deep-sequencing of exons captured for 1000 genes in 900 individuals,
sufficient to call rare and low-frequency variants in the coding region of these exons). We also screened
the variants in the Exome Sequencing Project566, a collaborative project that is sequencing thousands of
genomes from large, well-phenotyped cohorts. To date, data for approximately 5,400 samples are
available online. For the purpose of this analysis, since cancer syndromes are typically caused by high-
penetrance, rare variants, we removed variants that appear with a frequency >0.2% in the 1000 Genomes
or Exome Sequencing Project.572 For indels, we individually inspected the region of the genome near the
putative variant to verify that it is indeed novel based on the latest information in dbSNP135, since in
some repetitive regions, the exact position of the indel can be called differently by different algorithms.
For the remaining variants under each model, we identified the predicted effect of variants using SIFT565
and Polyphen-2.464 We also determined if the genes containing the variants have been reported to be
differentially expressed in pancreatic adenocarcinoma or pre-invasive lesions (in Pancreatic Expression
Database)253, as well as whether they have reported somatic mutations in cancer (as catalogued in
COSMIC database549). We also compared our list of genes generated from this analysis with the list of
genes affected by coding-region CNVs, reported in Chapter III of this thesis.
3.5 Screening candidate genes
We performed PCR amplification and Sanger sequencing to validate top candidates and also performed
Sanger sequencing to screen candidate genes in a cohort of 70 familial and young-onset pancreatic cancer
cases. (Primer sequences were previously published by Jones et al.37).
4. Results
Table 13 summarizes the number of raw reads generated per sample and the percentage of reads that were
aligned after collapsing PCR artifacts.
Table 13 – Summary of raw sequence data from Illumina GAII for each subject
Subject | N raw reads | N reads aligned marked as PCR | % reads aligned marked as PCR | N reads aligned | % reads aligned after collapsing | N reads aligned in + strand | % reads aligned in + strand | N reads aligned in - strand | % reads aligned in - strand
ID-001 (sibling) | 255729700 | 93291690 | 36.48 | 100526497 | 39.31 | 50238199 | 49.98 | 50288298 | 50.02
ID-006 (sibling) | 286351706 | 74640172 | 26.07 | 145229860 | 50.72 | 72609270 | 50 | 72620590 | 50
ID-010 (aunt) | 125185468 | 16889185 | 13.49 | 98153318 | 78.41 | 49068884 | 49.99 | 49084434 | 50.01
ID-011 (uncle) | 122363100 | 71869013 | 58.73 | 22965430 | 18.77 | 11455807 | 49.88 | 11509623 | 50.12
Although the two siblings generated approximately twice as many raw reads as the aunt and uncle, only
40-50% of the siblings’ reads were ultimately aligned after excluding PCR artifacts while nearly 80% of
the reads for the aunt were aligned. This resulted in approximately an equivalent number of reads for
those three samples contributing to the final alignment of each genome. Fewer than 20% of the raw reads
generated for the uncle were aligned after excluding PCR artifacts, resulting in significantly lower
coverage for the uncle’s genome: while each of the four samples had the majority of the target region
bases (~35Mb) covered by at least one read (1x), the exome-wide average read-depth for the uncle was
roughly one-tenth of the average coverage of the other three samples (~20x vs. 186x; Figures 18 and 19).
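The percentages in Table 13 follow from the read counts, and the short check below recomputes them. The counts are copied from Table 13; the dictionary layout and the `percent` helper are only illustrative, and the two depth figures are the approximate values quoted in the text.

```python
# Per subject: (N raw reads, N reads aligned marked as PCR, N reads aligned after collapsing)
TABLE_13 = {
    "ID-001 (sibling)": (255729700, 93291690, 100526497),
    "ID-006 (sibling)": (286351706, 74640172, 145229860),
    "ID-010 (aunt)": (125185468, 16889185, 98153318),
    "ID-011 (uncle)": (122363100, 71869013, 22965430),
}


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


for subject, (raw, pcr, aligned) in TABLE_13.items():
    print(
        subject,
        f"marked as PCR: {percent(pcr, raw)}%",
        f"aligned after collapsing: {percent(aligned, raw)}%",
    )

# Ratio of the approximate average read depths quoted in the text (186x vs. ~20x).
depth_ratio = 186 / 20
print(f"other samples vs. uncle: about {depth_ratio:.1f}-fold")
```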
Figure 18 – Average coverage of bases in target region of exome per subject
Figure 18 Legend: ID-011 (uncle) had lower average read depth for target exome than the other 3 subjects.
Figure 19 – Read-depth per base in target region of exome in each subject
Of note, an accepted minimum threshold for accurate identification of a heterozygous variant (in previous
papers and by the lab performing sequencing) is 8x coverage: at this threshold, the algorithm can reliably
call a heterozygous variant at approximately 94-95% of the target region of the siblings and aunt but at
only 82% of the uncle's.
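The proportion of the target region that is callable at a given depth threshold is simply the fraction of target bases whose read depth meets that threshold. A minimal sketch follows, assuming per-base depths are available as a list; the depth values shown are a hypothetical toy example and not data from this study.

```python
def fraction_covered(depths, min_depth=8):
    """Fraction of target bases with read depth >= min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(d >= min_depth for d in depths) / len(depths)


# Hypothetical per-base read depths over a ten-base target, for illustration only.
toy_depths = [0, 5, 8, 12, 30, 7, 9, 100, 8, 3]
for threshold in (1, 8, 20):
    share = fraction_covered(toy_depths, threshold)
    print(f">= {threshold}x: {share:.0%} of target bases")
```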
4.1 Validation
For siblings ID-001 and ID-006, a total of 1,985 and 1,995 SNPs, respectively, were identified as having
a heterozygous or homozygous non-reference allele in the exome data and which were genotyped on the
Affymetrix 500K array. Of those, 473 variants in ID-001 and 439 variants in ID-006 were discordant
between the exome data and the microarray genotypes; 318 of those were discordant in both siblings, the
majority of which were identified as wildtype on the microarray and homozygous variant on the exome
data. For ID-001, 1,086/1,103 (98.5%) SNPs identified as heterozygous in the exome data were
concordant with the microarray results, while only 426/882 (48.3%) SNPs identified as homozygous
variant in the exome were concordant with the microarray results (p<0.0001). The results for ID-006
were nearly identical: 1,122/1,141 (98.3%) heterozygous SNPs and 434/854 (50.8%) homozygous
variant SNPs called by the exome data were concordant with microarray genotypes (p<0.0001).
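The concordance rates above can be recomputed from the reported counts. The thesis does not state which test produced the p-values, so the pooled two-proportion z-test below is an assumption, used only to show that the difference between heterozygous and homozygous concordance is far beyond the p<0.0001 level.

```python
from math import erfc, sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, erfc(abs(z) / sqrt(2))


# (concordant het, total het, concordant homozygous variant, total homozygous variant)
counts = {
    "ID-001": (1086, 1103, 426, 882),
    "ID-006": (1122, 1141, 434, 854),
}
results = {}
for subject, (het_ok, het_n, hom_ok, hom_n) in counts.items():
    z, p = two_proportion_z(het_ok, het_n, hom_ok, hom_n)
    results[subject] = (round(100 * het_ok / het_n, 1), round(100 * hom_ok / hom_n, 1), p)
    print(subject, f"het {results[subject][0]}%", f"hom {results[subject][1]}%", f"z = {z:.1f}")
```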
We also performed Sanger sequencing on 38 SNVs that were unreported in dbSNP131, including eight
putatively novel homozygous variants. (Table 14)
Table 14 – Sanger validation data for selected SNVs in each exome subject
(Each subject column gives: NGS call / Sanger call / result.)
Gene | Variant (hg19) | Sib ID-001 (affected) | Sib ID-006 (affected) | Uncle ID-011 (affected) | Aunt ID-010 (unaffected)
ABCC12 | chr16:48177891 G/A | het / het / conc | het / het / conc | wt / het / disc | wt / wt / conc
ADAMTS20 | chr12:43886974 T/G | het / het / conc | het / het / conc | wt / het / disc | wt / wt / conc
APLF | chr2:68717350 A/T | het / het / conc | het / het / conc | het / het / conc | wt / wt / conc
ASTN2 | chr9:119802196 A/G | het / het / conc | het / het / conc | het / het / conc | wt / wt / conc
AZI1 | chr17:79169727 T/C | het / het / conc | het / not done / n/a | wt / not done / n/a | wt / wt / conc
C14orf102 | chr14:90752754 G/A | het / het / conc | het / het / conc | wt / het / disc | wt / noisy / n/a
C1orf65 | chr1:223568054 G/A | het / het / conc | het / not done / n/a | het / not done / n/a | wt / wt / conc
CCDC141 | chr2:179702237 G/A | het / het / conc | het / het / conc | het / het / conc | wt / wt / conc
CEP110 | chr9:123886284 A/T | het / het / conc | het / het / conc | het / het / conc | wt / wt / conc
CREBBP | chr16:3820773 G/A | het / het / conc | het / not done / n/a | wt / not done / n/a | wt / wt / conc
MUC7 | chr4:71346606 C/T | het / het / conc | het / het / conc | het / het / conc | wt / wt / conc
PCYOX1 | chr2:70503881 T/A | het / het / conc | het / het / conc | het / het / conc | wt / wt / conc
RASSF6 | chr4:74442178 A/C | het / het / conc | het / het / conc | het / het / conc | wt / wt / conc
SEZ6L2 | chr16:29896915 C/T | het / het / conc | het / het / conc | wt / het / disc | wt / wt / conc
SFRS2IP | chr12:46322436 C/T | het / het / conc | het / het / conc | wt / het / disc | wt / wt / conc
TAF5L | chr1:229745873 C/T | het / het / conc | het / het / conc | het / het / conc | wt / wt / conc
CYP2C9 | chr10:96741007 C/A | het / het / conc | het / het / conc | wt / het / disc | wt / wt / conc
AGL | chr1:100318245 T/G | het / wt / disc | het / not done / n/a | wt / not done / n/a | wt / wt / conc
ARAP1 | chr11:72406442 T/C | het / het / conc | het / noisy / n/a | wt / wt / conc | wt / not done / n/a
RPA1 | chr17:1800470 G/C | het / het / conc | het / het / conc | wt / wt / conc | wt / not done / n/a
AKAP7 | chr6:131571655 A/G | het / het / conc | het / het / conc | wt / wt / conc | wt / not done / n/a
NEIL3 | chr4:178283483 G/A | het / het / conc | het / het / conc | wt / wt / conc | wt / not done / n/a
C9 | chr5:39331804 A/G | het / het / conc | het / het / conc | wt / wt / conc | wt / not done / n/a
RAPGEF3 | chr12:48131310 G/A | het / het / conc | het / het / conc | wt / wt / conc | wt / wt / conc
SERPINB3 | chr18:61323259 A/T | het / het / conc | het / het / conc | wt / wt / conc | wt / wt / conc
C2orf24 | chr2:220037608 A/C | het / het / conc | het / het / conc | wt / wt / conc | wt / wt / conc
KDM4C | chr9:7103702 A/C | het / het / conc | het / wt / disc | wt / wt / conc | wt / not done / n/a
EXPH5 | chr11:108389007 G/A | het / het / conc | het / het / conc | wt / not done / n/a | wt / wt / conc
MSH6 | chr2:48027541 G/A | het / het / conc | het / het / conc | het / het / conc | het / het / conc
PCSK9 | chr1:55524237 G/T | homo / homo (diff variant: A) / disc | homo / homo (diff variant: A) / disc | wt / het (diff variant: G/A) / disc | het / het (diff variant: G/A) / disc
ANKRD11 | chr16:89350038 G/T | homo / homo (diff variant: A) / disc | homo / homo (diff variant: A) / disc | wt / homo (diff variant: A) / disc | het / het (diff variant: G/A) / disc
KIAA0020 (1) | chr9:2811505 G/T | homo / homo (diff variant: A) / disc | homo / homo (diff variant: A) / disc | wt / homo (diff variant: A) / disc | het / het (diff variant: G/A) / disc
KIAA0020 (2) | chr9:2828765 C/T | homo / homo (diff variant: G) / disc | homo / homo (diff variant: G) / disc | homo / het (diff variant: C/G) / disc | het / het (diff variant: C/G) / disc
USP6 | chr17:5037281 T/C | homo / homo / conc | homo / homo / conc | wt / wt / conc | wt / wt / conc
CHRNE | chr17:4802829 G/T | homo / homo (diff variant: A) / disc | homo / homo (diff variant: A) / disc | wt / wt / conc | wt / wt / conc
TXNDC17 | chr17:6544421 G/A | homo / homo / conc | homo / homo / conc | wt / wt / conc | wt / het / disc
MYH2 | chr17:10432311 C/T | homo / homo / conc | homo / homo / conc | wt / wt / conc | wt / wt / conc
NGS = next-generation sequencing; het = heterozygous variant; homo = homozygous variant; wt = wildtype (i.e. homozygous reference allele); conc = concordant results between next-generation and Sanger sequencing; disc = discordant results between next-generation and Sanger sequencing
For heterozygous exome variants, 53/57 (93%) of calls in the siblings and aunt and 9/9 (100%) of calls in
the uncle were concordant with Sanger sequencing (p=1.000); for homozygous exome variants, 6/16
(37.5%) of calls in the siblings and aunt and 0/1 (0%) of calls in the uncle were concordant with Sanger
sequencing (p=1.000); for wildtype alleles in the exome data, 24/25 (96%) of calls in the siblings and aunt
and 13/22 (59%) of calls in the uncle were concordant with Sanger sequencing (p=0.003). Of the eight
homozygous variants called in the two siblings, only three validated as called in the exome data; the
remaining five were discovered to be a different homozygous allele by Sanger sequencing. Notably, the
three accurately called homozygous variants were all novel, whereas the five inaccurately identified
variants were at positions of reported SNPs (i.e. the Sanger-sequence allele is the same as that reported in
dbSNP). Based on the Sanger sequencing results, the specificity for heterozygous variant calling in our
exome data in the siblings and aunt was 24/(24+2)=92% and in the uncle was 13/(13+0)=100%
(p=0.544); the sensitivity in the siblings and aunt was 52/(52+1)=98% and in the uncle 9/(9+6)=60%
(p<0.001).
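The sensitivity and specificity comparisons above can be reproduced from the stated counts. The sketch below assumes that the p-values come from a two-sided Fisher's exact test (the test is not named in this section); the helper functions are illustrative and written in pure Python rather than taken from the original analysis.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def rate(correct, incorrect):
    return correct / (correct + incorrect)


# Specificity: (true negatives, false positives), as given in the text.
spec_sibs_aunt, spec_uncle = (24, 2), (13, 0)
# Sensitivity: (true positives, false negatives), as given in the text.
sens_sibs_aunt, sens_uncle = (52, 1), (9, 6)

p_spec = fisher_exact_two_sided(*spec_sibs_aunt, *spec_uncle)
p_sens = fisher_exact_two_sided(*sens_sibs_aunt, *sens_uncle)

print(f"specificity {rate(*spec_sibs_aunt):.0%} vs {rate(*spec_uncle):.0%}, p={p_spec:.3f}")
print(f"sensitivity {rate(*sens_sibs_aunt):.0%} vs {rate(*sens_uncle):.0%}, p={p_sens:.2g}")
```

Under this assumption the specificity comparison gives p=0.544 and the sensitivity comparison gives p<0.001, in agreement with the values reported above.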
In addition, we performed Sanger sequencing on 15 indels called in the exome data (Table 15).
Table 15 – Sanger validation data for selected indels in each exome subject
(Each subject column gives: NGS call / Sanger call / result.)
Gene | Variant | Position | Sib ID-001 (affected) | Sib ID-006 (affected) | Uncle ID-011 (affected) | Aunt ID-010 (unaffected)
TUB | ins GAGGATGAG | chr11:8118257 | y / y / conc | n / y / disc | n / n / conc | n / n / conc
C22orf40 | del T | chr22:46643104 | y / y / conc | y / y / conc | n / n / conc | n / n / conc
WDR92 | ins C | chr2:68384601 | y / n / disc | n / not tested / n/a | n / not tested / n/a | n / not tested / n/a
KCNMB3 | del T | chr3:178960766 | y / y / conc | n / n / conc | n / n / conc | n / n / conc
C4orf35 | del A | chr4:71201064 | y / y / conc | n / n / conc | n / n / conc | n / n / conc
FAM53C | del CCTCAGGCCTGAGCCTGCA | chr5:137680588 | y / y / conc | n / n / conc | n / n / conc | n / n / conc
STAG3 | ins G | chr7:99797230 | y / n / disc | n / not tested / n/a | n / not tested / n/a | n / not tested / n/a
ARHGAP36 | ins C | chrX:130217764 | y / n / disc | n / not tested / n/a | n / not tested / n/a | n / not tested / n/a
NBPF3 | del GTCTCCCAG | chr1:21801435 | n / n / conc | y / y / conc | n / n / conc | n / n / conc
ZNF683 | del CCACCGAGCGCTGGGGTGCCCCAG | chr1:26691286 | n / n / conc | y / y / conc | n / y / disc | n / n / conc
CLSPN | del TTC | chr1:36203659 | n / n / conc | y / y / conc | n / n / conc | n / n / conc
FLVCR2 | del CCCAGCGTCTCGGTCCAT | chr14:76045387 | n / n / conc | y / y / conc | n / n / conc | n / n / conc
NUCB1 | del AGCAGC | chr19:49425108 | n / n / conc | y / y / conc | n / y / disc | n / n / conc
MNDA | del AGAA | chr1:158817614 | n / n / conc | y / y / conc | n / n / conc | n / n / conc
PCDHGA2 | del C | chr5:140719334 | n / n / conc | y / y / conc | y / y / conc | y / y / conc
NGS = next-generation sequencing; Sang = Sanger sequencing; y = indel identified; n = indel not identified; conc = concordant results between NGS and Sanger; disc = discordant results between NGS and Sanger
Thirty-nine sequencing reactions were conducted in the siblings and aunt: 14/17 (82%) of indels called in
exome data were validated and only a single indel in one individual was missed on exome sequencing.
There were too few tests in the uncle to identify a significant difference (only one indel was called in this
sample, which validated, and of the 11 indels that were not called in the uncle, 9 were also not observed
on Sanger sequencing). The specificity of indel calling in the sibs and aunt was 21/(21+3) = 88% and in
the uncle was 9/(9+0)=100% (p=0.545); the sensitivity in the sibs and aunt was 14/(14+1)=93% and in
the uncle was 1/(1+2)=33% (p=0.056).
4.2 Filtration results
Table 16 summarizes the number of variants identified in each subject.
Table 16 – Number of variants identified in each exome subject
Variant category | Sibling – 001 (affected) | Sibling – 006 (affected) | Uncle – 011 (affected) | Aunt – 010 (unaffected)
All SNVs in-target (autosomes + X chr) | 20,665 | 20,822 | 10,815 | 21,930
In-target SNVs excluding synonymous SNVs | 13,551 | 13,413 | 7,267 | 14,328
In-target nsSNVs that are nonsense, missense or splice-site, excluding any corresponding to a position in dbSNP131 (het/homozygous; % homozygous) [% of all nsSNVs] | 298 (282/16; 5.4%) [2.2% of all nsSNVs] | 306 (289/17; 5.6%) [2.3% of all nsSNVs] | 146 (144/2; 1.4%) [2.0% of all nsSNVs] | 325 (319/6; 1.9%) [2.3% of all nsSNVs]
All indels (intronic + exonic) | 713 | 726 | 456 | 741
Exonic and splice-site indels not in dbSNP131 | 68 | 69 | 46 | 59
Filtration model | Rare variants* [truncating mutations (splice-site/nonsense/fs indels)]
Model #4 - Rare variants in common to siblings (+/- uncle +/- aunt) | 98 SNVs + 5 indels [9 truncating (3/1/5)]
Model #3 - Rare variants in common to siblings +/- uncle (- aunt) | 68 SNVs + 1 indel [4 truncating (3/0/1)]
Model #2 - Rare variants in common to siblings + uncle (+/- aunt) | 14 SNVs + 2 indels [2 truncating (0/0/2)]
Model #1 - Rare variants in common to siblings + uncle (- aunt) | 9 SNVs + 0 indels [0 truncating]
*Number of combined variants in each model given after excluding olfactory receptor genes and pseudogenes; fs = frameshift; nsSNV = non-synonymous single nucleotide variant
For each of the siblings and the aunt, approximately 20,000-21,000 SNVs in the autosomes and X-
chromosome were identified within the target region of the exome, at ≥ 8x depth of coverage and passing
the quality thresholds of the alignment and variant-calling algorithms. For the uncle, the number of
variants called under the same threshold parameters was only about half that of the other samples. For
each of the four samples, approximately one-third of called variants were synonymous and were filtered
out. Further removal of variants reported in dbSNP131 and of variants in untranslated regions or in introns
beyond +/- 3bp from exons (i.e. not splice-site variants) reduced the number of variants per sample to
approximately 300 in the siblings and aunt and approximately 150 in the uncle, corresponding to
approximately 2% of all nonsynonymous variants (nsSNVs) in each subject. We noted that each sibling
had a higher proportion of unreported homozygous variants compared to the uncle and aunt (Sib 001 and
Sib 006 = 5.5% vs. Aunt = 1.9%, p=0.03 and p=0.02 respectively; Sib 001 and Sib 006 = 5.5% vs. Uncle
= 1.3%, p=0.07 and p=0.04 respectively). While it is possible that some of these may be false calls, this
higher degree of homozygosity in the siblings is expected since their parents are first cousins. Figures 20
to 22 illustrate the distribution of SNVs across the 22 autosomes and X chromosome in each subject; the
pattern of distribution is nearly identical in the siblings and aunt, and fairly similar to the uncle, for the
total SNV group and there was no significant difference in the pattern after excluding synonymous SNVs.
However, SNVs not reported in dbSNP131 took on a differing chromosomal distribution, and while the
new pattern remained consistent across the siblings and aunt, the uncle displayed a visibly differing
pattern of variant distribution.
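The per-sample filtration described above can be sketched as a sequence of simple predicates. The record layout, field names, and effect labels below are hypothetical and chosen only for illustration; they do not correspond to the output format of the alignment or variant-calling software actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; field names are illustrative only."""
    gene: str
    depth: int          # depth of coverage at the variant position
    effect: str         # e.g. "synonymous", "missense", "nonsense", "splice_site", "utr", "intronic"
    in_dbsnp131: bool   # True if the position is reported in dbSNP131


# Effects retained after filtering: protein-altering SNVs, with "splice_site"
# standing for variants within +/- 3bp of an exon.
PROTEIN_ALTERING = {"missense", "nonsense", "splice_site"}


def passes_filters(variant, min_depth=8):
    """Apply the depth, dbSNP131, and functional-class filters in turn."""
    return (
        variant.depth >= min_depth
        and not variant.in_dbsnp131
        and variant.effect in PROTEIN_ALTERING
    )


def filter_candidates(variants, min_depth=8):
    return [v for v in variants if passes_filters(v, min_depth)]


# Toy input with placeholder gene names.
toy = [
    Variant("GENE_A", 25, "missense", False),     # kept
    Variant("GENE_B", 30, "synonymous", False),   # dropped: synonymous
    Variant("GENE_C", 40, "missense", True),      # dropped: in dbSNP131
    Variant("GENE_D", 5, "nonsense", False),      # dropped: depth below 8x
    Variant("GENE_E", 12, "splice_site", False),  # kept
    Variant("GENE_F", 18, "utr", False),          # dropped: untranslated region
]
kept_genes = [v.gene for v in filter_candidates(toy)]
print(kept_genes)
```

Because each filter is an independent predicate, the order of application does not change the final list, although it does change the intermediate counts of the kind reported in Table 16.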
Figure 20 – Genome-wide distribution of all SNVs identified in each exome subject
Figure 21 – Genome-wide distribution of SNVs excluding synonymous variants in each exome subject
Figure 22 – Genome-wide distribution of SNVs not reported in dbSNP131 in each exome subject
Around 715-740 indels were identified in each of the siblings and the aunt, and about 450 in the uncle, but
most of those were intronic. Combining unreported protein-altering SNVs and indels, each of the siblings
and aunt had approximately 370 potentially significant variants, and the uncle had approximately 200.
4.3 Candidate genes
Table 17 lists the genes identified by each filtering model described in the methods section.
Table 17 – Genes containing variants identified by filtration model #1, 2, 3, and/or 4
Filtering Model Variant VariantType GeneName SIFT Polyphen-2
Model#1/2/3/4 chr4#71346606#C#T# nonsynonymous_SNV MUC7 DAMAGING unknown
Model#1/2/3/4 chr1#223568054#G#A# nonsynonymous_SNV C1orf65 TOLERATED benign
Model#1/2/3/4 chr2#68717350#A#T# nonsynonymous_SNV APLF DAMAGING probably damaging
Model#1/2/3/4 chr4#74442178#A#C# nonsynonymous_SNV RASSF6 DAMAGING probably damaging
Model#1/2/3/4 chr9#119802196#A#G# nonsynonymous_SNV ASTN2 DAMAGING possibly damaging
Model#1/2/3/4 chr2#179702237#G#A# nonsynonymous_SNV CCDC141 TOLERATED benign
Model#1/2/3/4 chr9#123886284#A#T# nonsynonymous_SNV CEP110 DAMAGING probably damaging
Model#1/2/3/4 chr2#70503881#T#A# nonsynonymous_SNV PCYOX1 TOLERATED benign
Model#1/2/3/4 chr1#229745873#C#T# nonsynonymous_SNV TAF5L TOLERATED possibly damaging
Model#2/4 chr2#48027541#G#A# nonsynonymous_SNV MSH6 TOLERATED benign
Model#2/4 chr9#711331#G#A# nonsynonymous_SNV KANK1 DAMAGING probably damaging
Model#2/4 chr13#25839959#C#T# nonsynonymous_SNV MTMR6 TOLERATED benign
Model#2/4 chr21#37586387#A#C# nonsynonymous_SNV DOPEY2 TOLERATED possibly damaging
Model#2/4 chr10#17726708#G#A# nonsynonymous_SNV STAM DAMAGING probably damaging
Model#2/4 chr10#7774320#(+G) frameshift_indel ITIH2 FRAMESHIFT n/a
Model#2/4 chr3#182737989#(-T) frameshift_indel MCCC1 FRAMESHIFT n/a
Model#3/4 chr16#48177891#G#A# nonsynonymous_SNV ABCC12 DAMAGING probably damaging
Model#3/4 chr12#43886974#T#G# nonsynonymous_SNV ADAMTS20 DAMAGING benign
Model#3/4 chr4#178283483#G#A# nonsynonymous_SNV NEIL3 DAMAGING probably damaging
Model#3/4 chr16#29896915#C#T# nonsynonymous_SNV SEZ6L2 DAMAGING probably damaging
Model#3/4 chr16#3820773#G#A# nonsynonymous_SNV CREBBP TOLERATED unknown
Model#3/4 chr17#79169727#T#C# nonsynonymous_SNV AZI1 TOLERATED benign
Model#3/4 chr11#72406442#T#C# nonsynonymous_SNV ARAP1 DAMAGING possibly damaging
Model#3/4 chr5#39331804#A#G# nonsynonymous_SNV C9 DAMAGING probably damaging
Model#3/4 chr10#96741007#C#A# nonsynonymous_SNV CYP2C9 TOLERATED possibly damaging
Model#3/4 chr11#108389007#G#A# nonsynonymous_SNV EXPH5 DAMAGING probably damaging
Model#3/4 chr17#10432311#C#T# nonsynonymous_SNV MYH2 DAMAGING probably damaging
Model#3/4 chr8#24339740#T#C# nonsynonymous_SNV ADAM7 TOLERATED possibly damaging
Model#3/4 chr22#24939978#C#A# nonsynonymous_SNV C22orf13 TOLERATED benign
Model#3/4 chr16#85813453#G#A# nonsynonymous_SNV COX4NB TOLERATED benign
Model#3/4 chr2#233546356#G#A# nonsynonymous_SNV EFHD1 DAMAGING probably damaging
Model#3/4 chr9#36148648#G#A# splice-site GLIPR2 Not_scored not given
Model#3/4 chr12#44161949#G#A# nonsynonymous_SNV IRAK4 DAMAGING probably damaging
Model#3/4 chr6#39602710#G#A# nonsynonymous_SNV KIF6 DAMAGING probably damaging
Model#3/4 chr18#39542537#A#G# nonsynonymous_SNV PIK3C3 TOLERATED possibly damaging
Model#3/4 chr11#65618312#A#C# nonsynonymous_SNV SNX32 TOLERATED benign
Model#3/4 chr9#7103702#A#C# nonsynonymous_SNV KDM4C DAMAGING probably damaging
Model#3/4 chr14#51347190#G#A# nonsynonymous_SNV ABHD12B TOLERATED possibly damaging
Model#3/4 chr9#399245#C#A# nonsynonymous_SNV DOCK8 TOLERATED benign
Model#3/4 chr8#11142438#A#G# nonsynonymous_SNV MTMR9 TOLERATED benign
Model#3/4 chr11#61015898#A#C# nonsynonymous_SNV PGA5 TOLERATED benign
Model#3/4 chr12#112194215#G#A# nonsynonymous_SNV ACAD10 DAMAGING probably damaging
Model#3/4 chr12#124104074#G#A# nonsynonymous_SNV DDX55 TOLERATED benign
Model#3/4 chr5#79809468#G#A# nonsynonymous_SNV FAM151B TOLERATED benign
Model#3/4 chr1#230895257#G#C# splice-site CAPN9 Not_scored not given
Model#3/4 chr2#84670490#C#T# nonsynonymous_SNV SUCLG1 DAMAGING probably damaging
Model#3/4 chr7#6474403#G#A# nonsynonymous_SNV DAGLB TOLERATED benign
Model#3/4 chr14#68060533#G#A# nonsynonymous_SNV PIGH DAMAGING probably damaging
Model#3/4 chr9#99522502#C#T# nonsynonymous_SNV ZNF510 TOLERATED benign
Model#3/4 chr2#211521333#A#G# nonsynonymous_SNV CPS1 DAMAGING benign
Model#3/4 chr8#22211863#G#A# nonsynonymous_SNV PIWIL2 TOLERATED benign
Model#3/4 chr17#14139702#G#A# nonsynonymous_SNV CDRT15 TOLERATED benign
Model#3/4 chr12#21014025#A#G# nonsynonymous_SNV SLCO1B3 TOLERATED possibly damaging
Model#3/4 chr10#123845149#C#T# nonsynonymous_SNV TACC2 DAMAGING probably damaging
Model#3/4 chr12#27571118#A#G# nonsynonymous_SNV ARNTL2 TOLERATED benign
Model#3/4 chr12#27059313#T#C# nonsynonymous_SNV ASUN TOLERATED probably damaging
Model#3/4 chr7#2472653#G#A# nonsynonymous_SNV CHST12 TOLERATED benign
Model#3/4 chr17#76562706#G#A# nonsynonymous_SNV DNAH17 TOLERATED probably damaging
Model#3/4 chr6#6146007#T#C# splice-site F13A1 Not_scored not given
Model#3/4 chr9#72006662#G#A# nonsynonymous_SNV FAM189A2 DAMAGING probably damaging
Model#3/4 chr13#42404723#C#T# nonsynonymous_SNV KIAA0564 TOLERATED probably damaging
Model#3/4 chr9#86482712#C#G# nonsynonymous_SNV KIF27 TOLERATED benign
Model#3/4 chr12#96412989#G#C# nonsynonymous_SNV LTA4H TOLERATED possibly damaging
Model#3/4 chr9#100423254#C#G# nonsynonymous_SNV NCBP1 TOLERATED benign
Model#3/4 chr6#126236528#G#C# nonsynonymous_SNV NCOA7 DAMAGING possibly damaging
Model#3/4 chr11#77781071#A#G# nonsynonymous_SNV NDUFC2-KCTD14 DAMAGING probably damaging
Model#3/4 chr14#73717738#G#T# nonsynonymous_SNV PAPLN DAMAGING probably damaging
Model#3/4 chr11#70184526#A#T# nonsynonymous_SNV PPFIA1 TOLERATED benign
Model#3/4 chr12#114352786#C#T# nonsynonymous_SNV RBM19 TOLERATED benign
Model#3/4 chr14#81743769#T#A# nonsynonymous_SNV STON2 DAMAGING possibly damaging
Model#3/4 chr12#10959195#C#T# nonsynonymous_SNV TAS2R8 DAMAGING possibly damaging
Model#3/4 chr6#54173656#G#A# nonsynonymous_SNV TINAG TOLERATED benign
Model#3/4 chr12#29904627#A#G# nonsynonymous_SNV TMTC1 TOLERATED benign
Model#3/4 chr14#74824559#G#A# nonsynonymous_SNV VRTN DAMAGING possibly damaging
Model#3/4 chr16#3142568#C#G# nonsynonymous_SNV ZSCAN10 DAMAGING possibly damaging
Model#3/4 chr22#46643104#(-T) frameshift_indel C22orf40 FRAMESHIFT n/a
Model#4 chr2#21256234#T#C# nonsynonymous_SNV APOB TOLERATED benign
Model#4 chr5#141033932#A#G# nonsynonymous_SNV ARAP3 DAMAGING possibly damaging
Model#4 chr2#127953008#G#A# nonsynonymous_SNV CYP27C1 DAMAGING probably damaging
Model#4 chr6#10704808#A#G# nonsynonymous_SNV PAK1IP1 TOLERATED benign
Model#4 chrX#153609141#C#T# nonsynonymous_SNV EMD DAMAGING benign
Model#4 chr16#81045669#G#A# nonsynonymous_SNV CENPN TOLERATED probably damaging
Model#4 chr14#37737909#C#G# nonsynonymous_SNV MIPOL1 TOLERATED probably damaging
Model#4 chr5#149753777#C#T# nonsynonymous_SNV TCOF1 TOLERATED probably damaging
Model#4 chr6#151626965#T#A# nonsynonymous_SNV AKAP12 TOLERATED benign
Model#4 chr6#83754169#G#T# nonsynonymous_SNV UBE2CBP DAMAGING probably damaging
Model#4 chr12#9307415#G#A# nonsynonymous_SNV PZP DAMAGING probably damaging
Model#4 chr11#108013182#A#G# nonsynonymous_SNV ACAT1 DAMAGING probably damaging
Model#4 chr8#62578060#C#T# nonsynonymous_SNV ASPH TOLERATED probably damaging
Model#4 chr5#75950796#C#T# nonsynonymous_SNV IQGAP2 DAMAGING probably damaging
Model#4 chr9#34256831#G#A# nonsynonymous_SNV KIF24 TOLERATED benign
Model#4 chr19#10664763#C#T# nonsynonymous_SNV KRI1 DAMAGING probably damaging
Model#4 chr12#52841335#G#T# nonsynonymous_SNV KRT6B DAMAGING benign
Model#4 chr6#90438697#T#C# nonsynonymous_SNV MDN1 DAMAGING possibly damaging
Model#4 chr11#102826185#G#A# nonsynonymous_SNV MMP13 TOLERATED benign
Model#4 chr7#47870890#C#T# stopgain_SNV PKD1L1 N/A not given
Model#4 chr1#204226965#A#G# nonsynonymous_SNV PLEKHA6 DAMAGING possibly damaging
Model#4 chr16#74678573#C#T# nonsynonymous_SNV RFWD3 DAMAGING probably damaging
Model#4 chr17#46000447#T#A# nonsynonymous_SNV SP2 TOLERATED benign
Model#4 chr11#62346444#T#C# nonsynonymous_SNV TUT1 DAMAGING not given
Model#4 chr5#145895520#C#T# nonsynonymous_SNV GPR151 DAMAGING probably damaging
Model#4 chr11#58891961#(-T) frameshift_indel FAM111B FRAMESHIFT n/a
Model#4 chr16#69748923#(-CACT) frameshift_indel NQO1 FRAMESHIFT n/a
Four of the missense variants and two of the indels in the final list of candidates were in olfactory receptor
(OR) genes, which we automatically downgraded on our list because they are functionally unlikely to be
cancer susceptibility genes, are commonly affected by variants, and have many homologous
pseudogenes that may inadvertently be captured and sequenced. A fifth missense variant belonged to a
pseudogene called RPL21P44, and it was also excluded.
Model#1, comprising variants shared by all three affected relatives and absent in the unaffected aunt,
generated the shortest list with only 9 SNVs and zero indels. Model#2, comprising variants shared by the
siblings and uncle without incorporating the aunt in the filtration, generated a final list of 16 genes (14
SNVs + 2 indels). Model#3 contained variants shared by the siblings and absent in the aunt, regardless of
whether they were called in the uncle; the final list consists of 69 genes (68 SNVs + 1 indel). Model#4
in our filtration strategy yielded the longest list of variants, producing 98 SNVs and 5 indels shared by the
two siblings irrespective of their status in the uncle and aunt, including 9 protein-truncating variants. No
gene contained more than one novel/rare variant in any model.
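The four filtration models amount to set operations on the per-subject lists of rare variants. The sketch below is illustrative only: the function name is hypothetical, and the toy carrier sets are not the study data, although the gene names and their resulting model memberships were chosen to agree with Tables 15 and 17.

```python
def build_models(sib1, sib2, uncle, aunt):
    """Return the variant set selected by each filtration model."""
    shared_by_sibs = sib1 & sib2
    return {
        "Model#1": (shared_by_sibs & uncle) - aunt,  # siblings + uncle, absent in aunt
        "Model#2": shared_by_sibs & uncle,           # siblings + uncle, aunt ignored
        "Model#3": shared_by_sibs - aunt,            # siblings, absent in aunt, uncle ignored
        "Model#4": shared_by_sibs,                   # siblings, uncle and aunt ignored
    }


# Toy carrier sets keyed by gene name (illustrative, not the full call sets).
sib_001 = {"APLF", "MSH6", "NEIL3", "APOB", "TUB"}
sib_006 = {"APLF", "MSH6", "NEIL3", "APOB", "CLSPN"}
uncle_011 = {"APLF", "MSH6"}
aunt_010 = {"MSH6", "APOB"}

models = build_models(sib_001, sib_006, uncle_011, aunt_010)
for name, genes in models.items():
    print(name, sorted(genes))
```

Written this way, it is clear that Model#1 is a subset of both Model#2 and Model#3, and that all three are subsets of Model#4, which is why the candidate lists grow from 9 to 103 variants as the constraints are relaxed.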
We also reviewed the variants in untranslated regions that had been filtered out, and found that none
would be added to the Model#1 and #2 lists, whereas 6 additional variants would be added to Model#3 and
11 additional variants to Model#4. These variants are identified separately in Table 18.
Table 18 – Additional candidate variants in untranslated regions shared by exome subjects
Variant | Model | Gene | Position
chr12#48131310#G#A# | Model#3/4 | RAPGEF3 | 3' UTR
chr7#138732309#T#C# | Model#3/4 | ZC3HAV1 | 3' UTR
chr14#75201644#A#T# | Model#3/4 | FCF1 | 3' UTR
chr14#94563251#G#C# | Model#3/4 | IFI27L1 | 5' UTR
chr6#131571655#A#G# | Model#3/4 | AKAP7 | 5' UTR
chr19#5561228#G#A# | Model#3/4 | PLAC2 | predicted noncoding RNA
chr16#69997537#G#C# | Model#4 | CLEC18A | 3' UTR
chr9#40772067#C#T# | Model#4 | ZNF658 | 3' UTR
chr7#74173181#C#T# | Model#4 | GTF2I | 3' UTR
chr7#15240888#G#A# | Model#4 | TMEM195 | 3' UTR
chrX#148627384#A#G# | Model#4 | CXorf40A | 3' UTR
None of the genes with SNVs or indels in our exome data contained coding-region CNVs in the CNV
study, nor were any reported to be associated with pancreatic cancer in published case-control studies (see
Literature Search). Due to time and resource constraints, the focus of the remainder of this chapter is on
discussing the results of model#1, the most stringent and shortest list of candidate susceptibility genes.
Using Sanger sequencing, we validated the missense variants in the 9 genes in the three affecteds and
verified absence in the aunt. Four genes had variants that were identified as damaging by SIFT as well as
Polyphen-2. Moreover, three of those genes have functions that suggest potential importance in tumor
development: APLF (aprataxin and PNKP like factor) has been shown to play a role in DNA single- and
double-strand break repair by interacting with members of the PARP (Poly-ADP-Ribose-Polymerase) family [567],
and APLF undergoes ATM-dependent hyperphosphorylation following ionizing radiation [568]; RASSF6
(Ras association (RalGDS/AF-6) domain family member 6) is a Ras effector and candidate tumor
suppressor that is downregulated in some tumors [569]; and CEP110 (centriolin) encodes a protein required
for centrosome function as a microtubule organizing centre and is associated with centrosomal
maturation [570]. A fourth gene, MUC7 (mucin 7, secreted), is overexpressed in pancreatic
adenocarcinoma [571]; however, we ranked it lower than the other above-mentioned genes since (a) SIFT
and Polyphen-2 did not provide a strong prediction of damaging effect for this variant, likely because it
was poorly conserved, and (b) most hereditary cancer syndromes are caused by inactivating mutations in
tumor suppressor genes that cause decreased expression of the encoded protein, and mucin 7 appeared to
be more of a marker and potential oncogene in pancreatic cancer rather than a tumor suppressor. The
remaining genes (ASTN2, TAF5L, CCDC141, C1orf65, and PCYOX1) were ranked lower on the list of
candidates due to lack of evidence linking them to cancer, and most of these variants were predicted to be
benign.
We PCR-amplified and Sanger sequenced each exon of APLF (10 exons) and RASSF6 (11 exons) in a
cohort of approximately 70 pancreatic cancer cases. No novel variant was identified in either gene in the
screening cohort. CEP110 was not screened in the same manner due to its very large size (42 coding
exons), and instead it will be investigated for variants in other subjects using future data from planned
whole-exome sequencing of 75 additional familial pancreatic cancer patients.
5. Discussion
We have presented a list of candidate susceptibility genes for FPC by performing exome sequencing in a
family with a strong history of pancreatic cancer in two of seven siblings, their mother, and a maternal
uncle. Initially, our plan was to filter variants shared by the three affected members (2 siblings +
maternal uncle) while excluding variants present in the aunt (who was unaffected by age 80). This model
is based on an autosomal dominant mode of inheritance of a relatively high-penetrance gene. However,
since we do not actually know the penetrance of the gene in question, we also decided to account for the
possibility that the unaffected aunt may be a carrier. Thus model#2 comprised genes with variants shared
by the three affecteds irrespective of the status in the aunt. This approximately doubled the number of
candidate genes (16 vs. 9), but the list size remained manageable. Interestingly, the model#1 list, while
containing three functionally interesting genes, did not have any truncating mutations, whereas model#2
yielded two frameshift indels. Most familial cancer syndromes are caused by tumor suppressor genes that
segregate protein-truncating mutations in the affected members of the family. Nonetheless, although
several additional genes in the model#2 group are of potential interest, we elected to focus our
investigation on top candidates in model#1 for the purpose of this thesis due to time and resource
constraints.
One of the most interesting genes in our list is APLF, encoding a protein that has been demonstrated to
participate in DNA repair and is also thought to be a histone chaperone [567,568,573]. The DNA repair genes
BRCA2, PALB2, and ATM have all been linked to FPC in recent years, suggesting the importance of this
pathway in pancreatic tumorigenesis. However, a Sanger-based screen of all 10 exons of APLF in ~70
unrelated pancreatic cancer subjects yielded no novel variants. Similarly, RASSF6 is appealing as a
susceptibility gene in pancreatic cancer due to its regulatory effect on Ras, a protein whose activation has
been demonstrated in the majority of pancreatic adenocarcinomas and is an early event in tumorigenesis.
Sanger sequencing of the 11 exons of RASSF6 also failed to show novel variants in the screening cohort.
We note that the variants affecting each of these genes in Family C are rare (~0.2%), and both were
predicted to be damaging by both SIFT and Polyphen-2. This emphasizes the challenge inherent in using
exome sequencing in a single family, particularly with closely-related relatives, for attempting to identify
the genetic cause of a familial cancer syndrome. The presence of many potentially deleterious variants in
the exome of any individual has been well demonstrated by multiple whole genomes and exomes
published to date (see Literature Search for details). The successful studies that used exome sequencing
to identify high-penetrance cancer genes in autosomal dominant syndromes did so either by accessing
paired-tumor sequence to identify second hits or else by sequencing multiple unaffected individuals.
Whole exome sequencing does not yield good results from formalin-fixed paraffin-embedded (FFPE)
tumors, and the only resected specimen available in Family C belonged to the mother and was indeed
FFPE.
An alternative method of guiding exome data filtering in autosomal dominant syndromes is with linkage
analysis data, as has been demonstrated in several studies in other Mendelian diseases. As described in
the Literature Search, our group is part of a multi-centre consortium that has collected eligible families for
linkage analysis. Unfortunately, to date no useable results have been generated to allow us to guide our
exome sequencing. We also did not find any of our variants among the genes reported to be associated
with pancreatic cancer in case-control studies.
Our study had some technical limitations; perhaps the most significant was the lower depth of coverage in
the uncle’s exome compared to the other sequenced samples, which resulted in only half as many variants
being called in the uncle as in each of the siblings and aunt. Importantly, the distribution of novel SNVs
across chromosomes differed between the uncle and the other three subjects, suggesting that the uncle’s
decreased coverage is not evenly distributed across the genome and that some chromosomes appear to be
particularly under-represented compared to the siblings and aunt (e.g. chromosomes 7 and 12). Sanger
sequencing indicated that the specificity of variant calling in the uncle was equivalent to that of the other
subjects but the sensitivity was significantly lower in the uncle. For this reason, we also considered
models that did not take the uncle’s data into account (#3 and #4). These analyses produced a much
longer list of candidate genes (75-110, depending on whether the aunt’s exome was used to filter out
variants). Those genes are too numerous to be individually screened in other pancreatic cancer patients
using Sanger sequencing. We present those genes here as additional candidates, and anticipate that data
from additional exomes will facilitate variant filtration and allow screening of interesting genes in a more
cost-effective manner.
Another limitation observed in our data is the low specificity of homozygous variant calls. It is not clear
what is causing these erroneous calls, and certainly it raises the importance of individual Sanger
validation of any homozygous variant. However, we note that all the homozygous variants we found to be
inaccurately called were at positions reported as SNPs in dbSNP; the three novel homozygous
variants tested by Sanger were all true calls. This suggests that homozygous calls in our final
filtration models may still have a higher validation rate than observed from the comparison to SNP chips.
Had our analysis been based on an autosomal recessive model of inheritance, this issue would have been
of greater significance (as we would have focused on homozygous variants in the siblings). In any case,
only three variants in any of our models were called as homozygotes (one of which we had successfully
validated by Sanger), and they were only present in the model#3 and #4 lists.
In conclusion, we present a list of candidate susceptibility genes for familial pancreatic cancer based on
exome sequencing of three affected members and one unaffected member of a single family. Our
screening of two top candidates in a cohort of unrelated cases failed to identify novel variants to support
the role of these genes in pancreatic cancer causation. However, other potential candidates remain to be
investigated and further screening of those candidates will be facilitated by large-scale exome sequencing
of other families.
Chapter 5 - General Discussion, Conclusions, and Future Directions
General Discussion
The overall aim of my research has been to better understand genetic susceptibility to pancreatic cancer, a
highly lethal malignancy with a dismal outcome for the majority of affected patients. More specifically,
I am interested in relatively highly-penetrant genetic variants that explain some or most of familial
pancreatic cancer (FPC), the autosomal dominant syndrome that has been proposed to explain clustering
of pancreatic cancer in families, often occurring at a younger age of onset than in sporadic cases. The
benefits of identifying such susceptibility genes include: to facilitate development of early-detection and
intervention by enriching trials with subjects that carry known predisposition genes; to calculate the
attributable risk of a particular variant through case-control and/or cohort studies, allowing more accurate
estimation of individual risk in members of FPC families and providing more informed genetic
counseling to such individuals; to identify individuals who may benefit from specific forms of therapy
that target the specific pathways implicated in tumorigenesis and to enable development of targeted
biological therapies.
To date, only a small proportion of hereditary pancreatic cancer cases is attributable to mutations in
specific genes, almost all of these occurring in the context of rare cancer syndromes such as Peutz-Jeghers
Syndrome or Familial Atypical Multiple Mole Melanoma. The most frequently identified mutated gene
in hereditary pancreatic cancer cases is BRCA2, accounting for up to 19% [103] of pancreatic cancer families
and conferring an estimated lifetime risk of up to 5% [502]. Often, BRCA2 families demonstrate other
associated cancers as well, particularly breast or ovarian cancer; however, a subset of BRCA2-associated
pancreatic cancer patients have no family history of other cancers, and indeed this gene has even been
implicated in apparently sporadic cases [112,113]. Given this well-established link between BRCA2 and
pancreatic cancer, investigators have sought to determine if a similar association exists with BRCA1.
Indeed, as discussed in detail in Chapter 1, multiple studies have suggested that BRCA1 increases risk of
pancreatic cancer, albeit to a lesser extent than BRCA2. However, most previous studies have been
criticized for being biased by their family-based design and population-based studies have produced
conflicting results. Notwithstanding these limitations, I felt that the role of BRCA1 in pancreatic cancer
required further consideration, not only for the value of providing more complete genetic counseling to
affected families and possibly including carriers in screening studies, but also because of the recent
accumulation of anecdotal reports indicating that BRCA1 and BRCA2 mutation carriers respond well to
certain chemotherapies (e.g. platinum-based chemotherapy, PARP-1 inhibitors) which target the
impaired DNA repair system resulting from BRCA1/2 gene inactivation in these tumors. At the time of
conducting my study, our research group had collected seven FFPE-tumor specimens from pancreatic
cancer patients with confirmed germline BRCA1 mutations. Therefore, for the first section of my thesis, I
decided to conduct a loss-of-heterozygosity (LOH) analysis on these samples and compare with nine
sporadic cases that have no known BRCA1 mutations or familial history of breast/ovarian cancer. I
hypothesized that tumors with germline heterozygous inactivating mutations in BRCA1 demonstrate loss
of the remaining functional allele. My analysis indeed demonstrated that LOH at the BRCA1 locus was a
common event in tumors of mutation carriers, with evidence of loss of the functional allele, occurring in
5/7 BRCA1-mutation carriers while only 1/9 sporadic cases demonstrated LOH.
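The comparison of LOH frequencies above (5/7 carrier tumors versus 1/9 sporadic tumors) is the kind of small-sample contrast for which an exact test on a 2x2 table is commonly used. The following is a minimal sketch of a two-sided Fisher exact test using only the Python standard library; it is offered as an illustration of how such counts could be compared and is not a description of the statistical method actually used in this thesis. The function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(k):
        # Hypergeometric probability of k LOH-positive tumors in row 1,
        # with all row and column totals held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_observed = prob(a)
    # Sum over every table that is no more probable than the observed one.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_observed * (1 + 1e-9))

# Counts taken from the text: LOH in 5 of 7 carrier tumors
# and in 1 of 9 sporadic tumors.
p_value = fisher_exact_two_sided(5, 2, 1, 8)
print(f"two-sided p = {p_value:.3f}")
```

With samples this small, the p-value is sensitive to a change in even a single tumor's classification, which is consistent with the caution expressed about sample size in the paragraphs that follow.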
The limitations of my study, namely small sample size and the variable quality of DNA extracted from
FFPE tissue, are challenges that characterize the field of pancreatic cancer research. Due to the rapid
lethality of pancreatic cancer, only a small percentage of patients undergo resection before death.
Moreover, most specimens available for research exist as paraffin blocks of formalin-fixed tissue;
formalin fixation causes cross-linking of nucleic acids, often resulting in degradation of DNA and RNA.
For those reasons, molecular analyses of pancreatic tumors are fraught with difficulties and potential
biases. In my analysis, I attempted to circumvent the potential bias of DNA degradation by selecting
microsatellite markers that generate small amplicons, well below the lower limit of expected DNA
fragments in FFPE tissue (180bp).
To my knowledge, this is the first LOH analysis using familial pancreatic cancer cases with deleterious
BRCA1 mutations. Only two molecular studies previous to mine had investigated BRCA1 in pancreatic
tumors, and both assessed sporadic tumors only. Beger et al.510 found decreased mRNA and protein
expression of BRCA1 in half of 50 pancreatic cancers, with worse 1-year survival in the group with
decreased expression. Peng et al.523 reported frequent BRCA1 methylation in sporadic pancreatic cancers.
No additional studies have since been reported. Interestingly, although sporadic breast and ovarian
cancers do not usually have somatic BRCA1 mutations, they have been reported to have frequent LOH
events at the BRCA1 locus, prompting speculation about potential haploinsufficiency of BRCA1 in these
tumors that drives further genetic alterations.574 My findings suggest that sporadic pancreatic cancer
cases do not have frequent loss at the BRCA1 locus; this would be consistent with Peng et al.’s523 findings
of methylation being a frequent event, since it would function as an alternative to LOH for gene
inactivation. However, I acknowledge that my small sample size, due to the scarcity of resected tumor
samples from pancreatic cancer patients and particularly those with BRCA1 germline mutations, limits the
generalizability of my results. Further investigation of molecular alterations of BRCA1 in pancreatic
tumors is needed on a larger scale before drawing more conclusions regarding its mechanism of action in
the pancreas. Nonetheless, although my findings do not definitively implicate BRCA1 as a familial
pancreatic cancer gene, they certainly suggest such a role for this gene and indicate that larger
epidemiologic studies need to be conducted to establish the risk associated with BRCA1 mutations and
pancreatic cancer.
While my first study contributed toward understanding the role of a specific candidate gene (BRCA1) in
pancreatic tumorigenesis, the expected attributable risk of this particular gene to familial pancreatic
cancer is fairly low. Several approaches can be taken to identify genetic predisposition for the majority of
FPC cases not linked to a known gene. Candidate genes can be identified based either on function or
connection to the pathway of another established susceptibility gene, which was the rationale for pursuing
BRCA1. It is possible to screen high-risk pancreatic cancer patients for mutations in additional genes
associated with BRCA1 or BRCA2, or even other genes in pathways that have been implicated in
pancreatic tumorigenesis from somatic studies37, but performing Sanger sequencing on all coding regions
of each candidate gene is a costly and laborious process. Furthermore, the functional and pathway
properties of many genes are incompletely understood at this time, thus biasing the investigation to the
relatively small proportion of genes that have been well annotated thus far. One can also derive a
candidate gene list for screening in high-risk subjects based on results of genome-wide association studies
conducted on a large number of sporadic cases; as would be expected, these variants are invariably
associated with low odds ratios in sporadic cases, but some may be of greater significance in smaller
populations enriched for familial cases. However, most variants identified by genome-wide association
studies are not within coding sequences, requiring further fine-mapping and delineation of the actual
genes affected.
Under ideal conditions, genetic linkage analysis would be a powerful approach for identifying high-risk
variants segregating with a disease that is inherited in an autosomal dominant fashion in family-based
studies. Indeed, much effort has been invested in collecting families with multiple cases of pancreatic
cancer in closely-related members for the purpose of performing genetic linkage. One of the largest such
projects has been undertaken by the PACGENE consortium (described in the Literature Search), which
has been investigating FPC genetics for about 10 years. Thus far, no linkage results have been released
by PACGENE, and indeed only one FPC linkage analysis has been published by any group to date, in a
single high-risk family that does not resemble most FPC cases.187 The latter found evidence of linkage to
a region on chromosome 4q and proposed the gene of interest to be Palladin; however, multiple
subsequent analyses of Palladin in high-risk populations refuted it as a likely FPC gene. Genetic linkage
analysis is a statistics-based method that requires a sufficient number of genotyped affected and
unaffected members in a family to generate power for detecting regions segregating with disease status.
It is significantly weakened if there is genetic heterogeneity (i.e. multiple loci involved in causing the
same phenotype) or if some of the affected subjects are phenocopies. Moreover, linkage analysis alone
cannot pinpoint the causative gene, as illustrated by the aforementioned 4q linked region and the failure to
determine the responsible gene in that region. For all those reasons, I elected to approach the FPC
question from two novel directions: mapping the copy-number variable portion of the genome in a cohort
of probands from high-risk families and mapping the whole exome (single nucleotide variants and small
indels) of members of a single high-risk family.
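The dependence of linkage analysis on the number of informative family members, and its sensitivity to genetic heterogeneity, can be illustrated with the standard two-point LOD score. The sketch below assumes phase-known, fully informative meioses and complete penetrance, which is a considerable simplification of real pedigree analysis; the meiosis counts for the two families are hypothetical and are not data from this thesis or from any published FPC linkage study.

```python
from math import log10

def lod_score(recombinants, meioses, theta):
    """Two-point LOD score: log10 likelihood ratio of linkage at
    recombination fraction theta versus free recombination (0.5)."""
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    linked = theta ** recombinants * (1 - theta) ** (meioses - recombinants)
    unlinked = 0.5 ** meioses
    if linked == 0:
        return float("-inf")
    return log10(linked / unlinked)

def min_meioses_for_lod(threshold=3.0):
    """Smallest number of non-recombinant meioses whose LOD at
    theta = 0 reaches the threshold."""
    n = 1
    while lod_score(0, n, 0.0) < threshold:
        n += 1
    return n

# Best case: every informative meiosis is non-recombinant.
needed = min_meioses_for_lod(3.0)

# Hypothetical heterogeneity example: one family linked to the locus
# (0 recombinants in 6 meioses) and one unlinked family
# (3 recombinants in 6 meioses), both evaluated at theta = 0.1.
linked_family = lod_score(0, 6, 0.1)
unlinked_family = lod_score(3, 6, 0.1)
combined = linked_family + unlinked_family
print(needed, round(linked_family, 2), round(unlinked_family, 2), round(combined, 2))
```

Under these assumptions, even a perfectly linked locus needs on the order of ten informative meioses to reach the conventional LOD threshold of 3, and pooling a linked family with an unlinked one erodes most of the evidence contributed by the linked family.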
The 2004 seminal papers demonstrating that structural variation of the human genome is detectable in all
individuals, regardless of phenotype or disease status, generated a paradigm shift in the field of
genomics.197,198 After multiple reports established that CNVs are a significant source of genomic
variability, attention turned to investigating their association with disease. To date, the majority of such
studies have been in diseases other than cancer, particularly the neuropsychiatric disorders; however,
copy number alteration is in fact a well-known characteristic of tumor genomes, often causing the
inactivation or amplification of important cancer-suppressing or cancer-driving genes, respectively.
Furthermore, germline genomic rearrangements represent a well-recognized mechanism of heredity in
familial cancer syndromes, usually affecting a small but non-negligible portion of cases. When I
embarked on this study, only two published reports of germline CNVs in familial cancer syndromes were
available. The first was a survey of CNVs in 57 FPC subjects using an oligonucleotide-based CGH
array.345 This study presented several candidate regions, but was limited by array resolution and coverage,
sample size, and the size of the control dataset available for data filtration. The second report was based
on Li-Fraumeni syndrome patients who carry TP53 mutations348: the authors found that patients with
germline TP53 mutations have a significantly more unstable genome, manifested as higher frequency of
germline copy number variation than control genomes. They proposed that the increased frequency of
CNVs in Li-Fraumeni genomes predisposes to somatic expansion of deletions or duplications that affect
cancer-suppressing or cancer-driving genes, respectively. Since pancreatic cancer contains a high degree
of somatic genome instability, I hypothesized that the genomic profile of germline CNVs in FPC patients
may be distinct from that of controls. Furthermore, I hypothesized that identifying germline deletions or
duplications in cases that are not observed in healthy controls would generate a list of candidate
susceptibility genes for FPC.
For the third chapter of my thesis, I focused on a single family that was part of my CNV study. This
family contained two siblings (in a sibship of seven) who had died of pancreatic cancer at young ages
(30s and 40s), and whose mother and maternal uncle also died of the disease. At the time of this study,
the technology for sequencing most of the coding region of the genome (i.e. the exome) had become
accessible for considerably lower expense than in the past. Many studies had been published describing
the use of whole-exome analysis to pinpoint the causative variant in rare Mendelian disorders. Only one
report applying whole-exome sequencing to familial cancer had been published, showing PALB2 to be a
susceptibility gene for FPC. Notably, this latter paper did not use exome-capture and next-generation
sequencing as with all other reports, but was based on a large-scale Sanger-sequencing based effort to
sequence pancreatic tumors and paired blood-derived DNA to identify germline variants. I hypothesized
that whole-exome sequencing would reveal susceptibility genes in this high-risk family by identifying
rare variants shared by affected members.
My CNV results refuted the first part of my hypothesis, indicating that no discernible difference in
genome stability or other CNV characteristics exists between FPC cases and healthy controls. Since
conducting my study, only a couple of other such studies have been published in familial/hereditary
cancer populations.346,347 Neither offered much beyond a list of susceptibility genes, as we have done, and
neither described a significant difference in the frequency of germline CNVs between cases and controls.
While it is difficult to draw firm conclusions based on only a few studies, thus far there is little to suggest
that the phenomenon observed by Shlien et al.348 in Li-Fraumeni patients is replicated in other familial
cancer cases. TP53 is known to act as the “guardian of the genome”.575 Given our observations, we
would conclude that most FPC cases are not caused by mutations in genes with a similar impact on
genomic stability. Furthermore, CNVs in general do not appear to play as significant a role in
susceptibility to most familial cancers as they do in other diseases like neuropsychiatric and
developmental disorders.
Both the CNV study and the whole-exome analysis relied on relatively novel technology and were
significantly dependent on recently developed bioinformatic tools, and as such both had limitations
related to the technology and/or the available resources for analyzing the data. In the CNV study, the
Affymetrix GeneChip Human Mapping 500K SNP array used for CNV detection, consisting of two chips
that together genotype approximately 500,000 genome-wide SNPs, was originally designed for the
purpose of accurate SNP genotyping to enable sufficiently powered SNP-based genome-wide association
studies. As such, SNPs selected for inclusion in the array underwent rigorous validation for accuracy of
genotype, call rate, and linkage disequilibrium in different populations, but the probe design was not
optimized for accurate copy number estimation. The median physical distance between SNPs on the array is 2.5kb,
but the density of genotyped SNPs across the genome is not uniform, resulting in excellent coverage for
some regions and incomplete or entirely absent coverage in others. Nonetheless, at the time of my study
design, this array was one of the highest-resolution and best coverage platforms available for CNV
detection. When subsequent generations of CNV detection platforms were developed, it became evident
that most common CNVs tend to not be captured well by the Affy 500K array, due to this bias of SNP
distribution. However, this was not a significant concern for my analysis since I was specifically
interested in rare or low-frequency deletions or duplications. Since the use of the Affy 500K array for
CNV analysis only began shortly before the design of my study, new algorithms had to be developed to
analyze the data, and these algorithms varied in sensitivity and specificity. Therefore, it was necessary to use multiple
algorithms, and moreover I needed to demonstrate an approach that generates a well-validated set of
CNVs. I utilized qPCR to validate a subset of CNVs, but given the time- and resource-consuming nature
of validating CNVs one at a time in this manner, I also performed a secondary CNV analysis on
a subset of samples using a newer array (the Affy6.0). In this way, I generated a set of high-confidence
CNVs with a validation rate of 95% or higher; due to logistical constraints, I did not address any of the
remaining low-confidence CNVs. Since my validation experiments suggested that approximately half of
the 491 low-confidence case CNVs are likely to be real, it is likely that my approach missed some
additional FPC-specific CNVs containing candidate genes. Future investigations of FPC cases using
newer and higher-resolution platforms would serve to validate my results as well as fill the gaps in
coverage due to limitations of the 500K array and my analysis strategies.
Similarly, the technologies and algorithms used for studying the high-risk Family C were rapidly evolving
even as I was conducting my study. First, no target-capture array available to me at the time of my study
targeted 100% of coding regions in the genome, but rather they aimed to capture most of the well-
annotated coding regions. Even then, technical problems sometimes result in incomplete capture of this
target region. One of my samples, from the uncle in Family C, could not be sequenced to the same depth
of coverage as the remaining samples due to technical problems, resulting in a significantly lower number
of variants called in this individual. Since my hypothesis relied to a greater extent on filtering unshared
variants between the affected cases, and since the uncle’s second-degree relation to the siblings means
that he is expected to share fewer variants with the siblings than they share with each other, the
incomplete variant list generated in the uncle meant that I would almost certainly miss
potential candidate genes if I included the uncle’s exome data. To address this shortcoming, I presented
alternative filtering models that did not necessarily exclude variants shared by the siblings but not called
in the uncle. As expected, these models generate considerably longer variant lists and require other
methods of prioritizing the results for further investigation. Furthermore, since my project was conducted
as a collaboration with the laboratory that performed the whole-exome sequencing, I was not directly
involved in running the analysis pipeline implemented by their group. I was able to validate the resultant
variant calls and determined that the dataset generated by this pipeline was accurate for both heterozygous
single-nucleotide variant and indel calls; the validation rate of homozygous variant calls, however, was significantly lower. I
could not directly assess sensitivity on a large scale and so it is possible that additional true variants were
missed. Therefore, as with the CNV analysis, I needed to prioritize high specificity of variant calling at
the expense of slightly lower sensitivity so that I could work with a reliable dataset for downstream
analysis.
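The interaction between relatedness and incomplete variant calling discussed above can be made concrete with a short calculation. The sketch below rests on two simplifying assumptions that are not taken from the thesis: that a rare heterozygous variant enters the pedigree once, so that a relative of degree d is expected to carry about (1/2)^d of an individual's rare variants, and that calling errors are independent between samples. The sensitivity values are purely hypothetical.

```python
def expected_ibd_sharing(degree):
    """Expected fraction of a person's rare heterozygous variants that a
    relative of the given degree also carries through shared ancestry."""
    return 0.5 ** degree

def retention_under_strict_filter(sensitivities):
    """Chance that a variant truly present in every sample is called in
    all of them, assuming independent calling errors between samples."""
    retained = 1.0
    for s in sensitivities:
        retained *= s
    return retained

sibling_share = expected_ibd_sharing(1)  # first-degree relative
uncle_share = expected_ibd_sharing(2)    # second-degree relative

# Hypothetical per-sample sensitivities (not values from the thesis):
# two well-covered siblings and one poorly covered uncle.
strict = retention_under_strict_filter([0.95, 0.95, 0.60])
siblings_only = retention_under_strict_filter([0.95, 0.95])
print(sibling_share, uncle_share, round(strict, 4), round(siblings_only, 4))
```

Under these assumed values, requiring a call in all three affected relatives would discard nearly half of the variants that are truly shared, whereas a model that does not require a call in the uncle retains most of them at the cost of a longer candidate list.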
Another common theme to both types of analyses is the large number of variants generated for each
sample, even after applying quality controls to ensure maximum validity of data. Clearly, this is a direct
result of the higher resolution and genome-wide coverage of these approaches compared to older
techniques that assess only one or a few genomic regions or genes at a time. While such high coverage is
one of the primary attractive features of these technologies, it also creates significant challenges in
interpretation and prioritization of data. One component of data prioritization in my studies was the focus
on “rare” variants; since I am most interested in identifying variants with a relatively high effect size to
explain familial inheritance of pancreatic cancer, the frequency of such variants in the general population
is expected to be very low. The identification of a rare variant posed some interesting challenges for the
CNV and the exome analyses. To interpret the significance of CNVs, particularly in the context of my
hypothesis, I needed to have a control set for comparison to the cases. Approximately 45 spousal controls
were selected for genotyping alongside the cases; genotyping additional controls was not feasible at the
time due to financial constraints. Instead, I took advantage of a large control cohort that was previously
genotyped on the same Affy 500k array for a genome-wide association study of colorectal cancer
(ARCTIC). Approximately 1,100 controls were genotyped at a different facility from the cases, but I
analyzed these controls in a parallel manner to the cases, applying the same algorithm parameters and
filtering rules. It became evident during analysis that there was a greater level of noise in the ARCTIC
controls, manifesting as a greater proportion of control CNVs that were “low-confidence”. This
highlights the importance of study design in facilitating CNV analysis, which is more sensitive to
“batch-effect” than SNP studies. These data also suggest that some real CNVs in controls may be missed in our
analysis, and if those regions overlap rare CNVs in cases then they would be inaccurately identified as
candidate FPC-specific CNVs under our hypothesis. To address this concern, I noted the FPC-specific
CNVs that overlapped a low-confidence CNV in controls and validated the region before investigating
that region further. Furthermore, I also utilized the Database of Genomic Variants (DGV), but the quality
of data in this resource is directly linked to the limitations of the platform and algorithms used in each
source publication. While I was unable to determine the accuracy of each data source, I chose to exclude
CNVs detected by studies that used BAC clone arrays because those were later demonstrated to greatly
overestimate CNV size.
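The control-based filtering logic described above can be sketched as a simple interval-overlap classification. This is an illustrative sketch only, not the pipeline used in this thesis: the `CNV` record, the category labels, the use of closed coordinate intervals, and all coordinates are assumptions introduced for the example, and the sketch ignores whether a region is a loss or a gain.

```python
from typing import NamedTuple

class CNV(NamedTuple):
    chrom: str
    start: int
    end: int
    confidence: str  # "high" or "low"

def overlaps(a, b):
    """True if two CNVs on the same chromosome share at least one base
    (closed intervals)."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end

def classify_case_cnv(case, controls):
    """Label a case CNV by the control CNVs it overlaps."""
    hits = [c for c in controls if overlaps(case, c)]
    if any(c.confidence == "high" for c in hits):
        return "seen_in_controls"
    if hits:
        # Overlaps only low-confidence control calls: the control region
        # should be validated before the case CNV is pursued further.
        return "case_specific_needs_validation"
    return "case_specific"

# Hypothetical coordinates for illustration.
controls = [
    CNV("chr1", 1_000, 5_000, "high"),
    CNV("chr2", 10_000, 20_000, "low"),
]
cases = [
    CNV("chr1", 4_500, 9_000, "high"),   # overlaps a high-confidence control
    CNV("chr2", 15_000, 30_000, "high"), # overlaps a low-confidence control
    CNV("chr3", 100, 900, "high"),       # no control overlap
]
labels = [classify_case_cnv(c, controls) for c in cases]
print(labels)
```

A linear scan over controls is adequate for a sketch; with a control set on the order of a thousand genomes, sorting the control intervals or using an interval tree would be the more practical design.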
For filtering the exome variants, I turned to the dbSNP database which is continuously updated and
houses a large set of single base as well as indel variants. Older versions of dbSNP were largely
populated by data from the HapMap study, which mostly identified common variants present at a
population frequency of > 1%. However, as more human genomes were being sequenced in their
entirety, including results from the 1000 Genomes Project and the Exome Sequencing Project, the dataset
became more difficult to interpret since most variants were not adequately validated and/or their
population frequencies were not calculated, and many variants had a minor allele frequency < 1%. Indeed,
for my exome analysis, I decided to use a relatively strict definition of “rare” (< 0.2%) since variants with
higher frequencies have been described as “low-frequency” variants and some have been demonstrated to
have an intermediate effect size on disease predisposition rather than the high-penetrance effect in which I
am interested.494,495 It should be noted that indel reporting in dbSNP is significantly less accurate than
the reporting of single nucleotide variants, particularly for data from next-generation sequencing platforms. As such, the accurate
determination of population frequency of indels is even more challenging. Moreover, dbSNP has been
contaminated with somatic variants found in tumors and other potentially pathogenic germline variants in
cancer. Therefore, I performed a careful screen of my final dataset to ensure that I did not filter out a
variant linked to cancer if the frequency of the variant was low.
Beyond filtering by frequency of variants, I attempted to take advantage of common phenotypes. For
CNVs, I attempted to identify CNVs present in multiple cases (but not in controls), but ultimately found
none (except for the TGFBR3 duplication, discussed below). For the exome data, I filtered by shared
variants among the three affected relatives, incorporating an unaffected family member as a negative
control (i.e. to filter out variants identified in this relative). My rationale for doing so was that Family C
had a very strong history of pancreatic cancer occurring at young ages in most of the affecteds, and thus
the unaffected 80-year-old aunt seemed significantly less likely to be a carrier of the putative high-
penetrance variant responsible for the disease in this family. Indeed, I modeled our primary filtering
approach on this premise, and it successfully reduced the number of eligible candidate genes to a
workable size. However, since I do not know the actual penetrance of the variant in question, I risked
losing the actual causal FPC gene by excluding all variants found in the aunt. I offered alternative
filtering models that took this ambiguity into account, and they generated significantly longer lists of
candidate genes.
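The family-based filtering models described above amount to set operations on per-sample variant calls combined with a frequency threshold. The sketch below illustrates the idea under stated assumptions: the function and variable names, the variant identifiers, and the frequencies are hypothetical, and treating a variant that is absent from the frequency table as novel (frequency 0) is a choice made for this example rather than a documented step of the thesis pipeline. Only the threshold of 0.2% for "rare" comes from the text.

```python
RARE_MAF = 0.002  # "rare" defined in the text as a frequency below 0.2%

def candidate_variants(calls, maf, required, excluded=()):
    """Rare variants called in every `required` sample and in no
    `excluded` sample. Variants missing from `maf` are treated as novel."""
    shared = set.intersection(*(calls[s] for s in required))
    for sample in excluded:
        shared -= calls[sample]
    return {v for v in shared if maf.get(v, 0.0) < RARE_MAF}

# Hypothetical calls; v2 is carried by both siblings but not called in
# the uncle, and v6 is shared by all affecteds but is common.
calls = {
    "sib1": {"v1", "v2", "v3", "v4", "v5", "v6"},
    "sib2": {"v1", "v2", "v3", "v4", "v6"},
    "uncle": {"v1", "v4", "v6"},
    "aunt": {"v3", "v4"},
}
maf = {"v2": 0.001, "v3": 0.0005, "v4": 0.0001, "v6": 0.05}

# Strict model: shared by the three affecteds, absent in the unaffected aunt.
strict = candidate_variants(calls, maf, ["sib1", "sib2", "uncle"], ["aunt"])
# Relaxed model: does not require a call in the uncle.
no_uncle = candidate_variants(calls, maf, ["sib1", "sib2"], ["aunt"])
# Relaxed model: does not exclude variants seen in the aunt.
keep_aunt = candidate_variants(calls, maf, ["sib1", "sib2", "uncle"])
print(sorted(strict), sorted(no_uncle), sorted(keep_aunt))
```

In this toy example each relaxed model returns a superset of the strict list, mirroring the trade-off described above between a workable number of candidates and the risk of discarding the causal variant.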
In addition to using other cases (or family members) to filter variants, I turned to functional annotation.
For my CNV data, I focused on coding region variants and turned to available databases containing
somatic cancer variants (COSMIC) and pancreatic expression data (Pancreas Expression Database) to
annotate involved genes. While many genes did have potential connections to pancreatic cancer or
carcinogenesis in general, it was evident that none were immediately obvious candidates. This again
emphasizes the limitations of available functional annotation for most genes, and the challenge in
utilizing this approach to identifying susceptibility genes. Similarly, I attempted to prioritize variants
from my exome analysis based on likelihood to damage protein function (using two well-known
algorithms), as well as referring to the aforementioned databases for gene annotation. However, it is
difficult to be certain of the accuracy of prediction for any one variant, particularly if the prediction is
“benign” or “tolerated”, without adequate functional assays.
In both the CNV and exome analyses, I selected top-prioritized candidate genes for further investigation.
In the CNV study, overlapping duplications in two unrelated cases were found to intersect TGFBR3, a
receptor gene in the TGF-beta pathway that is of importance in the initiation and progression of
pancreatic cancer. This region overlapped only one duplication in controls, but with different breakpoints
from the case CNVs. Importantly, the control CNV did not appear to extend into the gene except for a
small part of one isoform that was longer than most other isoforms. I conducted a series of experiments
to validate the duplications in the cases, demonstrate heritability of the CNV in members of one of the
subjects’ families, delineate the exact location of the CNV breakpoints, and sequence the amplicon
containing tandem duplication breakpoints. However, an affected sister of the proband with only FFPE
tissue available did not harbor the duplication, indicating that it does not segregate with disease in that
family. In my exome analysis, I performed Sanger sequencing of all exons in the two top-ranked genes
identified by filtering Model #1 (rare variants shared by the three affecteds, absent in the unaffected
relative). Each gene had an exome variant predicted to be damaging, and both were reported to have
potential tumor-suppressor roles. Yet, I did not find any other rare variants in the ~70 unrelated cases that
I screened.
These results raise several important issues. First, they highlight the significant challenge associated with
using a limited number of samples in genome-wide analyses such as CNV surveys or exome sequencing.
In the case of CNVs, since only a small percentage of all FPC cases attributed to a particular gene would
be expected to have a genomic rearrangement rather than a single base mutation or indel in that gene, a
small sample size reduces the likelihood of identifying multiple cases with the same affected gene. This
is particularly challenging due to genetic heterogeneity. The fact that linkage analysis on the best
available families to date has failed to generate strong locus-specific linkage scores strongly suggests that
the families included in the analysis have different causal genes. Alternatively, there may be inaccuracy
in identifying FPC families, leading to inclusion of subjects who do not carry a high-penetrance variant.
For the exome analysis, it is evident that every individual genome contains a large number of low-
frequency or rare variants, many of which appear to be potentially damaging. Therefore, in a family-
based design, it is most helpful to sequence multiple affected subjects who have some genetic distance
(i.e. not just first-degree pairs) to maximize the filtering potential of identifying shared variants. Even
then, use of whole-exome data in a single family to identify a dominant-acting variant is difficult. Most
successful exome analyses of dominant Mendelian diseases have used more than one family, or at least in
the case of cancer they have utilized data from paired tumor genomes to identify second-hits in candidate
genes. Genetic heterogeneity may also pose a problem in this setting, since the accepted method of
conclusively demonstrating involvement of a gene in predisposition to familial cancer is by identifying
rare deleterious variants in the same gene in other unrelated cases. However, if there are many different
genes that cause the disease, the possibility exists of “family-specific” genes (or more likely, genes acting
in a small percentage of families). This makes the decision to discard genes that do not demonstrate
variants in other samples difficult. Finally, there always remains the possibility that a non-coding variant
(whether a CNV or SNV/indel) may in fact be the causative agent. The reason for prioritizing the coding
region of the genome in these types of analyses is practical rather than dogmatic: while it is evident
from a number of studies that apparent “gene deserts” or unexpressed regions of genes such as introns can
impact gene expression (short- or long-range), there is little to no annotation of those regions to allow
prioritization and interpretation of the potential variant effect. Given that genic regions alone generate
sufficiently long lists of candidate genes, many studies, including mine, elect to ignore the non-genic
regions. However, should extensive investigations of the exome fail to yield answers, it will become
necessary to cast a wider net and characterize non-coding variants.
Conclusions
I have successfully tested and proven my first hypothesis (that LOH occurs frequently at the BRCA1 locus
in pancreatic tumors from germline BRCA1-mutation carriers), thus contributing novel information to
understanding the role of BRCA1 in susceptibility to pancreatic cancer. For my second hypothesis, I
found no evidence of a distinct CNV profile in high-risk pancreatic cancer cases relative to controls but
demonstrated that FPC-specific losses and gains overlap some genes that have the potential to be involved
in pancreatic tumorigenesis. My data constitute the most comprehensive set of annotated germline CNVs
in high-risk familial pancreatic cancer patients to date. Finally, for the third part of my thesis, I applied a
hierarchical filtering approach to generate a list of candidate susceptibility genes responsible for FPC.
Similar to the list of genes generated by my CNV analysis, the exome candidates include many that have
a potential role in tumorigenesis. The combined list of genes generated by my thesis represents an
important resource for future studies of candidate FPC susceptibility genes.
Future Directions
As discussed above, a number of follow-up investigations flow naturally from the results of my studies,
including: validation of detected variants using more up-to-date, higher-resolution platforms and larger
sample sizes; sequencing the entire coding region of candidate genes identified by the CNV and/or exome
analysis in additional cases; and performing additional exome sequencing on other families to increase the
power to detect additional variants in the same gene(s).
In addition, several new directions may be taken in the future for the investigation of heritable
susceptibility to familial pancreatic cancer. One limitation to my studies was the focus on protein-coding
genes as the causative agent for heritability of pancreatic cancer. In part, this was necessary because of
the relative lack of annotation of non-coding regions of the genome and the challenge of studying such
regions. Another constraint is the single-view approach of each study; only one platform was utilized at a
time, and generalizing the results of different platforms used in different samples is challenging. A more
valuable approach would be to integrate data from multiple profiling techniques (e.g. genomic,
epigenomic, transcriptomic, immunohistochemistry) for specimens from the same individuals, thus
allowing for a more comprehensive assessment of potential heritable factors in disease susceptibility. Of
course, there are practical limitations to such an approach, foremost among them the challenge of
obtaining pancreatic tumors from familial cancer patients due to the high mortality of the disease.
However, the aforementioned ICGC consortium has been addressing this issue by prospectively
collecting tumor specimens and developing xenografts and cell lines to allow further investigations on
recruited subjects.
An important question that arises after considering the results of my studies is whether a significant
portion of familial pancreatic cancer cases can be explained by relatively highly penetrant variants in a
single gene. The fact that I did not find evidence for one gene being affected by deleterious variants in
more than one family suggests the possibility of many private genes contributing to familial pancreatic
cancer in different families. This would make the identification of such susceptibility genes considerably
more difficult. Certainly, functional analyses of genes would become much more important in delineating
the causative agents, but pathway analysis may aid in identifying genes affected in different individuals
that lead to similar outcomes (i.e. pancreatic cancer development).
Another possibility that must be considered is the role of intermediate-effect variants and gene-gene
interactions within the same individual. Recently, our group has found evidence of rare deleterious
variants in cancer-predisposing genes that do not segregate with all pancreatic cancer patients in the same
family. While the non-carriers may be phenocopies, this observation also raises important questions
about the extent of genotyping that should be performed in a given family before attributing familial
cancer to a specific gene, and the importance of more extensive population data in understanding the
effect size of rare variants. Such data are forthcoming from large-scale exome- and genome-sequencing
projects (such as the 1000 Genomes Project and the Exome Sequencing Project), but estimating effect sizes
also requires the assessment of much larger FPC cohorts.
References
1. Hruban RH, Fukushima N. Pancreatic adenocarcinoma: update on the surgical pathology of
carcinomas of ductal origin and PanINs. Mod Pathol. 2007 Feb;20 Suppl 1:S61-70.
2. Howlader N, Noone AM, Krapcho M, et al. (eds). SEER Cancer Statistics Review, 1975-2008,
National Cancer Institute. Bethesda, MD, http://seer.cancer.gov/csr/1975_2008/, based on November
2010 SEER data submission, posted to the SEER web site, 2011.
3. Canadian Cancer Society’s Steering Committee on Cancer Statistics. Canadian Cancer Statistics
2011. Toronto, ON: Canadian Cancer Society; 2011.
4. Tada M, Nakai Y, Sasaki T, et al. Recent progress and limitations of chemotherapy for pancreatic
and biliary tract cancers. World J Clin Oncol. 2011 Mar 10;2(3):158-63.
5. Cleary SP, Gryfe R, Guindi M, et al. Prognostic factors in resected pancreatic adenocarcinoma:
analysis of actual 5-year survivors. J Am Coll Surg. 2004 May;198(5):722-31.
6. Sipos B, Frank S, Gress T, et al. Pancreatic intraepithelial neoplasia revisited and updated.
Pancreatology. 2009;9(1-2):45-54.
7. Hruban RH, Goggins M, Parsons J, et al. Progression model for pancreatic cancer. Clin Cancer Res.
2000 Aug;6(8):2969-72.
8. Yamaguchi K, Yokohata K, Noshiro H, et al. Mucinous cystic neoplasm of the pancreas or intraductal
papillary-mucinous tumour of the pancreas. Eur J Surg 2000;166(2):141–148.
9. Tanaka M, Chari S, Adsay NV, et al. International consensus guidelines for management of
intraductal papillary mucinous neoplasms and mucinous cystic neoplasms of the pancreas.
Pancreatology 2006;6(17):32.
10. Canto MI, Goggins M, Yeo CJ, et al. Screening for pancreatic neoplasia in high-risk individuals: an
EUS-based approach. Clin Gastroenterol Hepatol. 2004 Jul;2(7):606-21.
11. Canto MI, Goggins M, Hruban RH, et al. Screening for early pancreatic neoplasia in high-risk
individuals: a prospective controlled study. Clin Gastroenterol Hepatol. 2006 Jun;4(6):766-81.
12. Abe K, Suda K, Arakawa A, et al. Different patterns of p16INK4A and p53 protein expressions in
intraductal papillary-mucinous neoplasms and pancreatic intraepithelial neoplasia. Pancreas. 2007
Jan;34(1):85-91.
13. Tanno S, Nakano Y, Nishikawa T, et al. Natural history of branch duct intraductal papillary-mucinous
neoplasms of the pancreas without mural nodules: long-term follow-up results. Gut. 2008
Mar;57(3):339-43.
14. Al-Sukhni W, Borgida A, Rothenmund H, et al. Screening for pancreatic cancer in a high-risk cohort:
an eight-year experience. J Gastrointest Surg. 2012 Apr;16(4):771-83.
134
15. Jeurnink SM, Vleggaar FP, Siersema PD. Overview of the clinical problem: facts and current issues
of mucinous cystic neoplasms of the pancreas. Dig Liver Dis. 2008 Nov;40(11):837-46.
16. Maitra A, Hruban RH. Pancreatic cancer. Annu Rev Pathol. 2008;3:157-88.
17. Maitra A, Fukushima N, Takaori K, et al. Precursors to invasive pancreatic cancer. Adv Anat Pathol.
2005 Mar;12(2):81-91.
18. Calhoun ES, Jones JB, Ashfaq R, et al. BRAF and FBXW7 (CDC4, FBW7, AGO, SEL10) mutations
in distinct subsets of pancreatic cancer: potential therapeutic targets. Am J Pathol. 2003
Oct;163(4):1255-60.
19. Cheng JQ, Ruggeri B, Klein WM, et al. Amplification of AKT2 in human pancreatic cells and
inhibition of AKT2 expression and tumorigenicity by antisense RNA. Proc Natl Acad Sci U S A.
1996 Apr 16;93(8):3636-41.
20. Morris JP 4th, Wang SC, Hebrok M. KRAS, Hedgehog, Wnt and the twisted developmental biology
of pancreatic ductal adenocarcinoma. Nat Rev Cancer. 2010 Oct;10(10):683-95.
21. Thayer SP, di Magliano MP, Heiser PW, et al. Hedgehog is an early and late mediator of pancreatic
cancer tumorigenesis. Nature. 2003 Oct 23;425(6960):851-6.
22. Satoh K, Kanno A, Hamada S, et al. Expression of Sonic hedgehog signaling pathway correlates with
the tumorigenesis of intraductal papillary mucinous neoplasm of the pancreas. Oncol Rep. 2008
May;19(5):1185-90.
23. Morton JP, Mongeau ME, Klimstra DS, et al. Sonic hedgehog acts at multiple stages during
pancreatic tumorigenesis. Proc Natl Acad Sci U S A. 2007 Mar 20;104(12):5103-8.
24. Dai J, Ai K, Du Y, et al. Sonic hedgehog expression correlates with distant metastasis in pancreatic
adenocarcinoma. Pancreas. 2011 Mar;40(2):233-6.
25. Feldmann G, Karikari C, dal Molin M, et al. Inactivation of Brca2 cooperates with Trp53(R172H) to
induce invasive pancreatic ductal adenocarcinomas in mice: a mouse model of familial pancreatic
cancer. Cancer Biol Ther. 2011 Jun 1;11(11):959-68.
26. Maitra A, Hruban RH. Pancreatic cancer. Annu Rev Pathol. 2008;3:157-88.
27. Redston MS, Caldas C, Seymour AB, et al. p53 mutations in pancreatic carcinoma and evidence of
common involvement of homocopolymer tracts in DNA microdeletions. Cancer Res. 1994;54:3025–
33.
28. Iacobuzio-Donahue CA, Klimstra DS, et al. Dpc-4 protein is expressed in virtually all human
intraductal papillary mucinous neoplasms of the pancreas: comparison with conventional ductal
carcinomas. Am J Pathol. 2000;157(3):755–761.
29. Blackford A, Serrano OK, Wolfgang CL, et al. SMAD4 gene mutations are associated with poor
prognosis in pancreatic cancer. Clin Cancer Res. 2009 Jul 15;15(14):4674-9.
135
30. van Heek NT, Meeker AK, Kern SE, et al. Telomere shortening is nearly universal in pancreatic
intraepithelial neoplasia. Am J Pathol. 2002;161:1541–47.
31. Siveke JT, Schmid RM. Chromosomal instability in mouse metastatic pancreatic cancer--it's Kras
and Tp53 after all. Cancer Cell. 2005 May;7(5):405-7.
32. Hiyama E, Kodama T, Shinbara K, et al. Telomerase activity is detected in pancreatic cancer but not
in benign tumors. Cancer Res. 1997 Jan 15;57(2):326-31.
33. Sato N, Goggins M. The role of epigenetic alterations in pancreatic cancer. J Hepatobiliary Pancreat
Surg. 2006;13:286–95.
34. Sato N, Maitra A, Fukushima N, et al. Frequent hypomethylation of multiple genes overexpressed in
pancreatic ductal adenocarcinoma. Cancer Res. 2003;63:4158–66.
35. Szafranska AE, Davison TS, John J, et al. MicroRNA expression alterations are linked to
tumorigenesis and non-neoplastic processes in pancreatic ductal adenocarcinoma. Oncogene
2007;26:4442–52.
36. Erkan M, Reiser-Erkan C, Michalski CW, et al. Tumor microenvironment and progression of
pancreatic cancer. Exp Oncol. 2010 Sep;32(3):128-31.
37. Jones S, Zhang X, Parsons DW, et al. Core signaling pathways in human pancreatic cancers revealed
by global genomic analyses. Science. 2008 Sep 26;321(5897):1801-6.
38. Campbell PJ, Yachida S, Mudie LJ, et al. The patterns and dynamics of genomic instability in
metastatic pancreatic cancer. Nature. 2010 Oct 28;467(7319):1109-13.
39. Fuchs CS, Colditz GA, Stampfer MJ, et al. A prospective study of cigarette smoking and the risk of
pancreatic cancer. Arch Intern Med. 1996 Oct 28;156(19):2255-60.
40. Genkinger JM, Spiegelman D, Anderson KE, et al. Alcohol intake and pancreatic cancer risk: a
pooled analysis of fourteen cohort studies. Cancer Epidemiol Biomarkers Prev. 2009 Mar;18(3):765-
76.
41. Santibañez M, Vioque J, Alguacil J, et al. Occupational exposures and risk of pancreatic cancer. Eur J
Epidemiol. 2010 Oct;25(10):721-30.
42. Huxley R, Ansary-Moghaddam A, Berrington de González A, et al. Type-II diabetes and pancreatic
cancer: a meta-analysis of 36 studies. Br J Cancer. 2005;92: 2076–2083.
43. Risch HA, Yu H, Lu L, Kidd MS. ABO blood group, Helicobacter pylori seropositivity, and risk of
pancreatic cancer: a case-control study. J Natl Cancer Inst. 2010 Apr 7;102(7):502-5.
44. Talamini G, Falconi M, Bassi C, et al. Incidence of cancer in the course of chronic pancreatitis. Am J
Gastroenterol. 1999 May;94(5):1253-60.
45. Eppel A, Cotterchio M, Gallinger S. Allergies are associated with reduced pancreas cancer risk: A
population-based case-control study in Ontario, Canada. Int J Cancer. 2007 Nov 15;121(10):2241-5.
136
46. Bao Y, Ng K, Wolpin BM, et al. Predicted vitamin D status and pancreatic cancer risk in two
prospective cohort studies. Br J Cancer. 2010 Apr 27;102(9):1422-7.
47. Stolzenberg-Solomon RZ, Jacobs EJ, Arslan AA, et al. Circulating 25-hydroxyvitamin D and risk of
pancreatic cancer: Cohort Consortium Vitamin D Pooling Project of Rarer Cancers. Am J Epidemiol.
2010 Jul 1;172(1):81-93.
48. Jansen RJ, Robinson DP, Stolzenberg-Solomon RZ, et al. Fruit and vegetable consumption is
inversely associated with having pancreatic cancer. Cancer Causes Control. 2011 Dec;22(12):1613-
25.
49. Jiao L, Mitrou PN, Reedy J, et al. A combined healthy lifestyle score and risk of pancreatic cancer in
a large cohort study. Arch Intern Med. 2009 Apr 27;169(8):764-70.
50. Prizment AE, Gross M, Rasmussen-Torvik L, et al. Genes related to diabetes may be associated with
pancreatic cancer in a population-based case-control study in Minnesota. Pancreas. 2012
Jan;41(1):50-3.
51. Dong X, Li Y, Tang H, et al. Insulin-like growth factor axis gene polymorphisms modify risk of
pancreatic cancer. Cancer Epidemiol. 2012 Apr;36(2):206-11.
52. Li D, Tanaka M, Brunicardi FC, et al. Association between somatostatin receptor 5 gene
polymorphisms and pancreatic cancer risk and survival. Cancer. 2011 Jul 1;117(13):2863-72.
53. Dong X, Li Y, Chang P, et al. DNA mismatch repair network gene polymorphism as a susceptibility
factor for pancreatic cancer. Mol Carcinog. 2011 Jun 16. doi: 10.1002/mc.20817.
54. Pierce BL, Ahsan H. Genome-wide "pleiotropy scan" identifies HNF1A region as a novel pancreatic
cancer susceptibility locus. Cancer Res. 2011 Jul 1;71(13):4352-8.
55. Theodoropoulos GE, Panoussopoulos GS, Michalopoulos NV, et al. Analysis of the stromal cell-
derived factor 1-3'A gene polymorphism in pancreatic cancer. Mol Med Report. 2010 Jul-
Aug;3(4):693-8.
56. Pierce BL, Austin MA, Ahsan H. Association study of type 2 diabetes genetic susceptibility variants
and risk of pancreatic cancer: an analysis of PanScan-I data. Cancer Causes Control. 2011
Jun;22(6):877-83.
57. Mazaki T, Masuda H, Takayama T. Polymorphisms and pancreatic cancer risk: a meta-analysis. Eur
J Cancer Prev. 2011 May;20(3):169-83.
58. Dong X, Li Y, Chang P, et al. Glucose metabolism gene variants modulate the risk of pancreatic
cancer. Cancer Prev Res (Phila). 2011 May;4(5):758-66.
59. Diergaarde B, Brand R, Lamb J, et al. Pooling-based genome-wide association study implicates
gamma-glutamyltransferase 1 (GGT1) gene in pancreatic carcinogenesis. Pancreatology. 2010;10(2-
3):194-200.
137
60. Theodoropoulos GE, Michalopoulos NV, Panoussopoulos SG, et al. Effects of caspase-9 and survivin
gene polymorphisms in pancreatic cancer risk and tumor characteristics. Pancreas. 2010
Oct;39(7):976-80.
61. Fong PY, Fesinmeyer MD, White E, et al. Association of diabetes susceptibility gene calpain-10 with
pancreatic cancer among smokers. J Gastrointest Cancer. 2010 Sep;41(3):203-8.
62. Chen J, Amos CI, Merriman KW, et al. Genetic variants of p21 and p27 and pancreatic cancer risk in
non-Hispanic Whites: a case-control study. Pancreas. 2010 Jan;39(1):1-4.
63. Vrana D, Novotny J, Holcatova I, et al. CYP1B1 gene polymorphism modifies pancreatic cancer risk
but not survival. Neoplasma. 2010;57(1):15-9.
64. McWilliams RR, Petersen GM, Rabe KG, et al. Cystic fibrosis transmembrane conductance regulator
(CFTR) gene mutations and risk for pancreatic adenocarcinoma. Cancer. 2010 Jan 1;116(1):203-9.
65. Vrana D, Pikhart H, Mohelnikova-Duchonova B, et al. The association between glutathione S-
transferase gene polymorphisms and pancreatic cancer in a central European Slavonic population.
Mutat Res. 2009 Nov-Dec;680(1-2):78-81.
66. Duell EJ, Holly EA, Kelsey KT, et al. Genetic variation in CYP17A1 and pancreatic cancer in a
population-based case-control study in the San Francisco Bay Area, California. Int J Cancer. 2010
Feb 1;126(3):790-5.
67. Fesinmeyer MD, Stanford JL, Brentnall TA, et al. Association between the peroxisome proliferator-
activated receptor gamma Pro12Ala variant and haplotype and pancreatic cancer in a high-risk cohort
of smokers: a pilot study. Pancreas. 2009 Aug;38(6):631-7.
68. Zhao D, Xu D, Zhang X, et al. Interaction of cyclooxygenase-2 variants and smoking in pancreatic
cancer: a possible role of nucleophosmin. Gastroenterology. 2009 May;136(5):1659-68.
69. McWilliams RR, Bamlet WR, de Andrade M, et al. Nucleotide excision repair pathway
polymorphisms and pancreatic cancer risk: evidence for role of MMS19L. Cancer Epidemiol
Biomarkers Prev. 2009 Apr;18(4):1295-302.
70. Hamacher R, Diersch S, Scheibel M, et al. Interleukin 1 beta gene promoter SNPs are associated with
risk of pancreatic cancer. Cytokine. 2009 May;46(2):182-6.
71. Li D, Suzuki H, Liu B, et al. DNA repair gene polymorphisms and risk of pancreatic cancer. Clin
Cancer Res. 2009 Jan 15;15(2):740-6.
72. Suzuki H, Li Y, Dong X, et al. Effect of insulin-like growth factor gene polymorphisms alone or in
interaction with diabetes on the risk of pancreatic cancer. Cancer Epidemiol Biomarkers Prev. 2008
Dec;17(12):3467-73.
73. Suzuki T, Matsuo K, Sawaki A, et al. Alcohol drinking and one-carbon metabolism-related gene
polymorphisms on pancreatic cancer risk. Cancer Epidemiol Biomarkers Prev. 2008
Oct;17(10):2742-7.
138
74. Ohnami S, Sato Y, Yoshimura K, et al. His595Tyr polymorphism in the methionine synthase
reductase (MTRR) gene is associated with pancreatic cancer risk. Gastroenterology. 2008
Aug;135(2):477-88.
75. Yang M, Sun T, Wang L, et al. Functional variants in cell death pathway genes and risk of pancreatic
cancer. Clin Cancer Res. 2008 May 15;14(10):3230-6.
76. Ayaz L, Ercan B, Dirlik M, et al. The association between N-acetyltransferase 2 gene polymorphisms
and pancreatic cancer. Cell Biochem Funct. 2008 Apr;26(3):329-33.
77. Jiao L, Hassan MM, Bondy ML, et al. XRCC2 and XRCC3 gene polymorphism and risk of
pancreatic cancer. Am J Gastroenterol. 2008 Feb;103(2):360-7.
78. Jiao L, Hassan MM, Bondy ML, et al. The XPD Asp312Asn and Lys751Gln polymorphisms,
corresponding haplotype, and pancreatic cancer risk. Cancer Lett. 2007 Jan 8;245(1-2):61-8.
79. Wang L, Miao X, Tan W, et al. Genetic polymorphisms in methylenetetrahydrofolate reductase and
thymidylate synthase and risk of pancreatic cancer. Clin Gastroenterol Hepatol. 2005 Aug;3(8):743-
51.
80. Li D, Jiao L, Li Y, et al. Polymorphisms of cytochrome P4501A2 and N-acetyltransferase genes,
smoking, and risk of pancreatic cancer. Carcinogenesis. 2006 Jan;27(1):103-11.
81. Bartsch DK, Fendrich V, Slater EP, et al. RNASEL germline variants are associated with pancreatic
cancer. Int J Cancer. 2005 Dec 10;117(5):718-22.
82. Ockenga J, Vogel A, Teich N, et al. UDP glucuronosyltransferase (UGT1A7) gene polymorphisms
increase the risk of chronic pancreatitis and pancreatic cancer. Gastroenterology. 2003
Jun;124(7):1802-8.
83. Duell EJ, Holly EA, Bracci PM, et al. A population-based study of the Arg399Gln polymorphism in
X-ray repair cross- complementing group 1 (XRCC1) and risk of pancreatic adenocarcinoma. Cancer
Res. 2002 Aug 15;62(16):4630-6.
84. Amundadottir L, Kraft P, Stolzenberg-Solomon RZ, et al. Genome-wide association study identifies
variants in the ABO locus associated with susceptibility to pancreatic cancer. Nat Genet. 2009
Sep;41(9):986-90.
85. Petersen GM, Amundadottir L, Fuchs CS, et al. A genome-wide association study identifies
pancreatic cancer susceptibility loci on chromosomes 13q22.1, 1q32.1 and 5p15.33. Nat Genet. 2010
Mar;42(3):224-8.
86. Low SK, Kuchiba A, Zembutsu H, et al. Genome-wide association study of pancreatic cancer in
Japanese population. PLoS One. 2010 Jul 29;5(7):e11824.
87. Wu C, Miao X, Huang L, et al. Genome-wide association study identifies five loci associated with
susceptibility to pancreatic cancer in Chinese populations. Nat Genet. 2011 Dec 11;44(1):62-6.
139
88. Wolpin BM, Kraft P, Gross M, et al. Pancreatic cancer risk and ABO blood group alleles: results
from the pancreatic cancer cohort consortium. Cancer Res. 2010 Feb 1;70(3):1015-23.
89. Risch HA, Yu H, Lu L, et al. ABO blood group, Helicobacter pylori seropositivity, and risk of
pancreatic cancer: a case-control study. J Natl Cancer Inst. 2010 Apr 7;102(7):502-5.
90. Greer JB, Yazer MH, Raval JS, et al. Significant association between ABO blood group and
pancreatic cancer. World J Gastroenterol. 2010 Nov 28;16(44):5588-91.
91. Iodice S, Maisonneuve P, Botteri E, et al. ABO blood group and cancer. Eur J Cancer. 2010
Dec;46(18):3345-50.
92. Wolpin BM, Kraft P, Xu M, et al. Variant ABO blood group alleles, secretor status, and risk of
pancreatic cancer: results from the pancreatic cancer cohort consortium. Cancer Epidemiol
Biomarkers Prev. 2010 Dec;19(12):3140-9.
93. Ben Q, Wang K, Yuan Y, et al. Pancreatic cancer incidence and outcome in relation to ABO blood
groups among Han Chinese patients: a case-control study. Int J Cancer. 2011 Mar 1;128(5):1179-86.
94. Nakao M, Matsuo K, Hosono S, et al. ABO blood group alleles and the risk of pancreatic cancer in a
Japanese population. Cancer Sci. 2011 May;102(5):1076-80.
95. Wang DS, Chen DL, Ren C, et al. ABO blood group, hepatitis B viral infection and risk of pancreatic
cancer. Int J Cancer. 2011 Aug 19. doi: 10.1002/ijc.26376. [Epub ahead of print]
96. Aird I, Lee DR, Roberts JA. ABO blood groups and cancer of oesophagus, cancerof pancreas, and
pituitary adenoma. Br Med J. 1960 Apr 16;1(5180):1163-6.
97. Lennon AM, Klein AP, Goggins M. ABO blood group and other genetic variants associated with
pancreatic cancer. Genome Med. 2010 Jun 22;2(6):39.
98. Giardiello FM, Welsh SB, Hamilton SR, et al. Increased risk of cancer in the Peutz-Jeghers
syndrome. N Engl J Med. 1987 Jun 11;316(24):1511-4.
99. Giardiello FM, Brensinger JD, Tersmette AC, et al. Very high risk of cancer in familial Peutz-Jeghers
syndrome. Gastroenterology. 2000 Dec;119(6):1447-53.
100. Lowenfels AB, Maisonneuve P, Cavallini G, et al. Pancreatitis and the risk of pancreatic cancer.
International Pancreatitis Study Group. N Engl J Med. 1993 May 20;328(20):1433-7.
101. Lowenfels AB, Maisonneuve P, DiMagno EP, et al. Hereditary pancreatitis and the risk of
pancreatic cancer. International Hereditary Pancreatitis Study Group. J Natl Cancer Inst. 1997 Mar
19;89(6):442-6.
102. de Snoo FA, Riedijk SR, van Mil AM, et al. Genetic testing in familial melanoma: uptake and
implications. Psychooncology. 2008 Aug;17(8):790-6.
103. Hahn SA, Greenhalf B, Ellis I, et al. BRCA2 germline mutations in familial pancreatic
carcinoma. J Natl Cancer Inst. 2003 Feb 5;95(3):214-21.
140
104. Murphy KM, Brune KA, Griffin C, et al. Evaluation of candidate genes MAP2K4, MADH4,
ACVR1B, and BRCA2 in familial pancreatic cancer: deleterious BRCA2 mutations in 17%. Cancer
Res. 2002 Jul 1;62(13):3789-93.
105. Martin ST, Matsubayashi H, Rogers CD, et al. Increased prevalence of the BRCA2 polymorphic
stop codon K3326X among individuals with familial pancreatic cancer. Oncogene. 2005 May
19;24(22):3652-6.
106. Stadler ZK, Salo-Mullen E, Patil SM, et al. Prevalence of BRCA1 and BRCA2 mutations in
Ashkenazi Jewish families with breast and pancreatic cancer. Cancer. 2012 Jan 15;118(2):493-9.
107. Ghiorzo P, Pensotti V, Fornarini G, et al. Contribution of germline mutations in the BRCA and
PALB2 genes to pancreatic cancer in Italy. Fam Cancer. 2012 Mar;11(1):41-47.
108. Schneider R, Slater EP, Sina M, et al. German national case collection for familial pancreatic
cancer (FaPaCa): ten years experience. Fam Cancer. 2011 Jun;10(2):323-30.
109. Slater EP, Langer P, Fendrich V, et al. Prevalence of BRCA2 and CDKN2a mutations in German
familial pancreatic cancer families. Fam Cancer. 2010 Sep;9(3):335-43.
110. Cho JH, Bang S, Park SW, et al. BRCA2 mutations as a universal risk factor for pancreatic
cancer has a limited role in Korean ethnic group. Pancreas. 2008 May;36(4):337-40.
111. Real FX, Malats N, Lesca G, et al. Family history of cancer and germline BRCA2 mutations in
sporadic exocrine pancreatic cancer. Gut. 2002 May;50(5):653-7.
112. Greer JB, Whitcomb DC. Role of BRCA1 and BRCA2 mutations in pancreatic cancer. Gut. 2007
May;56(5):601-5.
113. Goggins M, Schutte M, Lu J, et al. Germline BRCA2 gene mutations in patients with apparently
sporadic pancreatic carcinomas. Cancer Res. 1996 Dec 1;56(23):5360-4.
114. Wooster R, Neuhausen SL, Mangion J, et al. Localization of a breast cancer susceptibility gene,
BRCA2, to chromosome 13q12-13. Science. 1994 Sep 30;265(5181):2088-90.
115. Schutte M, da Costa LT, Hahn SA, et al. Identification by representational difference analysis of
a homozygous deletion in pancreatic carcinoma that lies within the BRCA2 region. Proc Natl Acad
Sci U S A. 1995 Jun 20;92(13):5950-4.
116. Schutte M, Rozenblum E, Moskaluk CA, et al. An integrated high-resolution physical map of the
DPC/BRCA2 region at chromosome 13q12. Cancer Res. 1995 Oct 15;55(20):4570-4.
117. Jones S, Hruban RH, Kamiyama M, et al. Exomic sequencing identifies PALB2 as a pancreatic
cancer susceptibility gene. Science. 2009 Apr 10;324(5924):217.
118. Tischkowitz MD, Sabbaghian N, Hamel N, et al. Analysis of the gene coding for the BRCA2-
interacting protein PALB2 in familial and sporadic pancreatic cancer. Gastroenterology. 2009
Sep;137(3):1183-6.
141
119. Slater EP, Langer P, Niemczyk E, et al. PALB2 mutations in European familial pancreatic cancer
families. Clin Genet. 2010 Nov;78(5):490-4.
120. Adank MA, van Mil SE, Gille JJ, et al. PALB2 analysis in BRCA2-like families. Breast Cancer
Res Treat. 2011 Jun;127(2):357-62.
121. Lal G, Liu G, Schmocker B, et al. Inherited predisposition to pancreatic adenocarcinoma: role of
family history and germ-line p16, BRCA1, and BRCA2 mutations. Cancer Res. 2000 Jan
15;60(2):409-16.
122. Skudra S, Staka A, Pukitis A, et al. Association of genetic variants with pancreatic cancer.
Cancer Genet Cytogenet 2007;179:76-8.
123. Axilbund JE, Argani P, Kamiyama M, et al. Absence of germline BRCA1 mutations in familial
pancreatic cancer patients. Cancer Biol Ther. 2009 Jan;8(2):131-5.
124. Roberts NJ, Jiao Y, Yu J, et al. ATM mutations in patients with hereditary pancreatic cancer.
Cancer Discov. 2012 Jan;2:41-46.
125. van der Heijden MS, Yeo CJ, Hruban RH, et al. Fanconi anemia gene mutations in young-onset
pancreatic cancer. Cancer Res. 2003 May 15;63(10):2585-8.
126. Rogers CD, van der Heijden MS, Brune K, et al. The genetics of FANCC and FANCG in
familial pancreatic cancer. Cancer Biol Ther. 2004 Feb;3(2):167-9.
127. Rogers CD, Couch FJ, Brune K, et al. Genetics of the FANCA gene in familial pancreatic cancer.
J Med Genet. 2004 Dec;41(12):e126.
128. Couch FJ, Johnson MR, Rabe K, et al. Germ line Fanconi anemia complementation group C
mutations and pancreatic cancer. Cancer Res. 2005 Jan 15;65(2):383-6.
129. Gargiulo S, Torrini M, Ollila S, et al. Germline MLH1 and MSH2 mutations in Italian pancreatic
cancer patients with suspected Lynch syndrome. Fam Cancer. 2009;8(4):547-53.
130. Kastrinos F, Mukherjee B, Tayob N, et al. Risk of pancreatic cancer in families with Lynch
syndrome. JAMA. 2009 Oct 28;302(16):1790-5.
131. Kempers MJ, Kuiper RP, Ockeloen CW, et al. Risk of colorectal and endometrial cancers in
EPCAM deletion-positive Lynch syndrome: a cohort study. Lancet Oncol. 2011 Jan;12(1):49-55.
132. Lindor NM, Petersen GM, Spurdle AB, et al. Pancreatic cancer and a novel MSH2 germline
alteration. Pancreas. 2011 Oct;40(7):1138-40.
133. Ruijs MW, Verhoef S, Rookus MA, et al. TP53 germline mutation testing in 180 families
suspected of Li-Fraumeni syndrome: mutation detection rate and relative frequency of cancers in
different familial phenotypes. J Med Genet. 2010 Jun;47(6):421-8.
134. Groen EJ, Roos A, Muntinghe FL, et al. Extra-intestinal manifestations of familial adenomatous
polyposis. Ann Surg Oncol. 2008 Sep;15(9):2439-50.
142
135. Sheldon CD, Hodson ME, Carpenter LM, et al. A cohort study of cystic fibrosis and malignancy.
Br J Cancer. 1993 Nov;68(5):1025-8.
136. Hruban RH, Canto MI, Goggins M, et al. Update on familial pancreatic cancer. Adv Surg.
2010;44:293-311.
137. MacDermott RP, Kramer P. Adenocarcinoma of the pancreas in four siblings. Gastroenterology.
1973 Jul;65(1):137-9.
138. Friedman JM, Fialkow PJ. Carcinoma of the pancreas in four brothers. Birth Defects Orig Artic
Ser. 1976;12(1):145-50.
139. Danes BS, Lynch HT. A familial aggregation of pancreatic cancer. An in vitro study. JAMA.
1982 May 28;247(20):2798-802.
140. Dat NM, Sontag SJ. Pancreatic carcinoma in brothers. Ann Intern Med. 1982 Aug;97(2):282.
141. Grajower MM. Familial pancreatic cancer. Ann Intern Med. 1983 Jan;98(1):111.
142. Ehrenthal D, Haeger L, Griffin T, et al. Familial pancreatic adenocarcinoma in three generations.
A case report and a review of the literature. Cancer. 1987 May 1;59(9):1661-4.
143. Lynch HT, Fitzsimmons ML, Smyrk TC, et al. Familial pancreatic cancer: clinicopathologic
study of 18 nuclear families. Am J Gastroenterol. 1990 Jan;85(1):54-60.
144. Ghadirian P, Boyle P, Simard A, et al. Reported family aggregation of pancreatic cancer within a
population-based case-control study in the Francophone community in Montreal, Canada. Int J
Pancreatol. 1991 Nov-Dec;10(3-4):183-96.
145. Fernandez E, La Vecchia C, D'Avanzo B, et al. Family history and the risk of liver, gallbladder,
and pancreatic cancer. Cancer Epidemiol Biomarkers Prev. 1994 Apr-May;3(3):209-12.
146. Silverman DT, Schiffman M, Everhart J, et al. Diabetes mellitus, other medical conditions and
familial history of cancer as risk factors for pancreatic cancer. Br J Cancer. 1999 Aug;80(11):1830-7.
147. Schenk M, Schwartz AG, O'Neal E, et al. Familial risk of pancreatic cancer. J Natl Cancer Inst.
2001 Apr 18;93(8):640-4.
148. Ghadirian P, Liu G, Gallinger S, et al. Risk of pancreatic cancer among individuals with a family
history of cancer of the pancreas. Int J Cancer. 2002 Feb 20;97(6):807-10.
149. Inoue M, Tajima K, Takezaki T, et al. Epidemiology of pancreatic cancer in Japan: a nested case-
control study from the Hospital-based Epidemiologic Research Program at Aichi Cancer Center
(HERPACC). Int J Epidemiol. 2003 Apr;32(2):257-62.
150. Rulyak SJ, Lowenfels AB, Maisonneuve P, et al. Risk factors for the development of pancreatic
cancer in familial pancreatic cancer kindreds. Gastroenterology. 2003 May;124(5):1292-9.
151. Cote ML, Schenk M, Schwartz AG, et al. Risk of other cancers in individuals with a family
history of pancreas cancer. J Gastrointest Cancer. 2007;38(2-4):119-26.
143
152. Hassan MM, Bondy ML, Wolff RA, et al. Risk factors for pancreatic cancer: case-control study.
Am J Gastroenterol. 2007 Dec;102(12):2696-707.
153. Jacobs EJ, Chanock SJ, Fuchs CS, et al. Family history of cancer and risk of pancreatic cancer: a
pooled analysis from the Pancreatic Cancer Cohort Consortium (PanScan). Int J Cancer. 2010 Sep
1;127(6):1421-8.
154. Matsubayashi H, Maeda A, Kanemoto H, et al. Risk factors of familial pancreatic cancer in
Japan: current smoking and recent onset of diabetes. Pancreas. 2011 Aug;40(6):974-8.
155. Coughlin SS, Calle EE, Patel AV, et al. Predictors of pancreatic cancer mortality among a large
cohort of United States adults. Cancer Causes Control. 2000 Dec;11(10):915-23.
156. Tersmette AC, Petersen GM, Offerhaus GJ, et al. Increased risk of incident pancreatic cancer
among first-degree relatives of patients with familial pancreatic cancer. Clin Cancer Res. 2001
Mar;7(3):738-44.
157. Hemminki K, Li X. Familial and second primary pancreatic cancers: a nationwide epidemiologic
study from Sweden. Int J Cancer. 2003 Feb 10;103(4):525-30.
158. Klein AP, Brune KA, Petersen GM, et al. Prospective risk of pancreatic cancer in familial
pancreatic cancer kindreds. Cancer Res. 2004 Apr 1;64(7):2634-8.
159. Jacobs EJ, Rodriguez C, Newton CC, et al. Family history of various cancers and pancreatic
cancer mortality in a large cohort. Cancer Causes Control. 2009 Oct;20(8):1261-9.
160. Brune KA, Lau B, Palmisano E, et al. Importance of age of onset in pancreatic cancer kindreds. J
Natl Cancer Inst. 2010 Jan 20;102(2):119-26.
161. Klein AP, Beaty TH, Bailey-Wilson JE, et al. Evidence for a major gene influencing risk of
pancreatic cancer. Genet Epidemiol. 2002 Aug;23(2):133-49.
162. Lynch HT, Fusaro L, Lynch JF. Familial pancreatic cancer: a family study. Pancreas.
1992;7(5):511-5.
163. Bartsch DK, Kress R, Sina-Frey M, et al. Prevalence of familial pancreatic cancer in Germany.
Int J Cancer. 2004 Jul 20;110(6):902-6.
164. James TA, Sheldon DG, Rajput A, et al. Risk factors associated with earlier age of onset in
familial pancreatic carcinoma. Cancer. 2004 Dec 15;101(12):2722-6.
165. Petersen GM, de Andrade M, Goggins M, et al. Pancreatic cancer genetic epidemiology
consortium. Cancer Epidemiol Biomarkers Prev. 2006 Apr;15(4):704-10.
166. McFaul CD, Greenhalf W, Earl J, et al. Anticipation in familial pancreatic cancer. Gut. 2006
Feb;55(2):252-8.
167. Rieder H, Sina-Frey M, Ziegler A, et al. German national case collection of familial pancreatic
cancer - clinical-genetic analysis of the first 21 families. Onkologie. 2002 Jun;25(3):262-6.
144
168. Rulyak SJ, Lowenfels AB, Maisonneuve P, et al. Risk factors for the development of pancreatic
cancer in familial pancreatic cancer kindreds. Gastroenterology. 2003 May;124(5):1292-9.
169. Schneider R, Slater EP, Sina M, et al. German national case collection for familial pancreatic
cancer (FaPaCa): ten years experience. Fam Cancer. 2011 Jun;10(2):323-30.
170. Olson SH, Chou JF, Ludwig E, et al. Allergies, obesity, other risk factors and survival from
pancreatic cancer. Int J Cancer. 2010 Nov 15;127(10):2412-9.
171. Barton JG, Schnelldorfer T, Lohse CM, et al. Patterns of pancreatic resection differ between
patients with familial and sporadic pancreatic cancer. J Gastrointest Surg. 2011 May;15(5):836-42.
172. Ji J, Forsti A, Sundquist J, et al. Survival in familial pancreatic cancer. Pancreatology.
2008;8(3):252-6.
173. Yeo TP, Hruban RH, Brody J, et al. Assessment of "gene-environment" interaction in cases of
familial and sporadic pancreatic cancer. J Gastrointest Surg. 2009 Aug;13(8):1487-94.
174. Fogelman DR, Wolff RA, Kopetz S, et al. Evidence for the efficacy of Iniparib, a PARP-1
inhibitor, in BRCA2-associated pancreatic cancer. Anticancer Res. 2011 Apr;31(4):1417-20.
175. Villarroel MC, Rajeshkumar NV, Garrido-Laguna I, et al. Personalizing cancer treatment in the
age of global genomic analyses: PALB2 gene mutations and the response to DNA damaging agents in
pancreatic cancer. Mol Cancer Ther. 2011 Jan;10(1):3-8.
176. James E, Waldron-Lynch MG, Saif MW. Prolonged survival in a patient with BRCA2 associated
metastatic pancreatic cancer after exposure to camptothecin: a case report and review of literature.
Anticancer Drugs. 2009 Aug;20(7):634-8.
177. Sonnenblick A, Kadouri L, Appelbaum L, et al. Complete remission, in BRCA2 mutation carrier
with metastatic pancreatic adenocarcinoma, treated with cisplatin based therapy. Cancer Biol Ther.
2011 Aug 1;12(3):165-8.
178. Lowery MA, Kelsen DP, Stadler ZK, et al. An emerging entity: pancreatic adenocarcinoma
associated with a known BRCA mutation: clinical descriptors, treatment implications, and future
directions. Oncologist. 2011;16(10):1397-402.
179. Shi C, Klein AP, Goggins M, et al. Increased prevalence of precursor lesions in familial
pancreatic cancer patients. Clin Cancer Res. 2009 Dec 15;15(24):7737-43.
180. Brune K, Abe T, Canto M, et al. Multifocal neoplastic precursor lesions associated with lobular
atrophy of the pancreas in patients having a strong family history of pancreatic cancer. Am J Surg
Pathol. 2006 Sep;30(9):1067-76.
181. Abe T, Fukushima N, Brune K, et al. Genome-wide allelotypes of familial pancreatic
adenocarcinomas and familial and sporadic intraductal papillary mucinous neoplasms. Clin Cancer
Res. 2007 Oct 15;13(20):6019-25.
182. Iacobuzio-Donahue CA, van der Heijden MS, Baumgartner MR, et al. Large-scale allelotype of
pancreaticobiliary carcinoma provides quantitative estimates of genome-wide allelic loss. Cancer Res.
2004 Feb 1;64(3):871-5.
183. Calhoun ES, Hucl T, Gallmeier E, et al. Identifying allelic loss and homozygous deletions in
pancreatic cancer without matched normals using high-density single-nucleotide polymorphism
arrays. Cancer Res. 2006 Aug 15;66(16):7920-8.
184. Brune K, Hong SM, Li A, et al. Genetic and epigenetic alterations of familial pancreatic cancers.
Cancer Epidemiol Biomarkers Prev. 2008 Dec;17(12):3536-42.
185. Bodmer WF, Bailey CJ, Bodmer J, et al. Localization of the gene for familial adenomatous
polyposis on chromosome 5. Nature. 1987 Aug 13-19;328(6131):614-6.
186. Hall JM, Lee MK, Newman B, et al. Linkage of early-onset familial breast cancer to
chromosome 17q21. Science. 1990 Dec 21;250(4988):1684-9.
187. Eberle MA, Pfützer R, Pogue-Geile KL, et al. A new susceptibility locus for autosomal dominant
pancreatic cancer maps to chromosome 4q32-34. Am J Hum Genet. 2002 Apr;70(4):1044-8.
188. Earl J, Yan L, Vitone LJ, et al. Evaluation of the 4q32-34 locus in European familial pancreatic
cancer. Cancer Epidemiol Biomarkers Prev. 2006 Oct;15(10):1948-55.
189. Klein AP, de Andrade M, Hruban RH, et al. Linkage analysis of chromosome 4 in families with
familial pancreatic cancer. Cancer Biol Ther. 2007 Mar;6(3):320-3.
190. Pogue-Geile KL, Chen R, Bronner MP, et al. Palladin mutation causes familial pancreatic cancer
and suggests a new cancer mechanism. PLoS Med. 2006 Dec;3(12):e516.
191. Salaria SN, Illei P, Sharma R, et al. Palladin is overexpressed in the non-neoplastic stroma of
infiltrating ductal adenocarcinomas of the pancreas, but is only rarely overexpressed in neoplastic
cells. Cancer Biol Ther. 2007 Mar;6(3):324-8.
192. Zogopoulos G, Rothenmund H, Eppel A, et al. The P239S palladin variant does not account for a
significant fraction of hereditary or early onset pancreas cancer. Hum Genet. 2007 Jun;121(5):635-7.
193. Slater E, Amrillaeva V, Fendrich V, et al. Palladin mutation causes familial pancreatic cancer:
absence in European families. PLoS Med. 2007 Apr;4(4):e164.
194. Klein AP, Borges M, Griffith M, et al. Absence of deleterious palladin mutations in patients with
familial pancreatic cancer. Cancer Epidemiol Biomarkers Prev. 2009 Apr;18(4):1328-30.
195. Alkan C, Coe BP, Eichler EE. Genome structural variation discovery and genotyping. Nat Rev
Genet. 2011 May;12(5):363-76.
196. Morrow EM. Genomic copy number variation in disorders of cognitive development. J Am Acad
Child Adolesc Psychiatry. 2010 Nov;49(11):1091-104.
197. Sebat J, Lakshmi B, Troge J, et al. Large-scale copy number polymorphism in the human
genome. Science. 2004;305:525-8.
198. Iafrate AJ, Feuk L, Rivera MN, et al. Detection of large-scale variation in the human genome.
Nat Genet. 2004;36:949-51.
199. Sharp AJ, Locke DP, McGrath SD, et al. Segmental duplications and copy-number variation in
the human genome. Am J Hum Genet. 2005;77:78-88.
200. Tuzun E, Sharp AJ, Bailey JA, et al. Fine-scale structural variation of the human genome. Nat
Genet. 2005;37:727-32.
201. Conrad DF, Andrews TD, Carter NP, et al. A high-resolution survey of deletion polymorphism
in the human genome. Nat Genet. 2006;38:75-81.
202. McCarroll SA, Hadnott TN, Perry GH, et al. Common deletion polymorphisms in the human
genome. Nat Genet. 2006;38:86-92.
203. Hinds DA, Kloek AP, Jen M, et al. Common deletions and SNPs are in linkage disequilibrium in
the human genome. Nat Genet. 2006;38:82-5.
204. Locke DP, Sharp AJ, McCarroll SA, et al. Linkage disequilibrium and heritability of copy-
number polymorphisms within duplicated regions of the human genome. Am J Hum Genet.
2006;79:275-90.
205. Mills RE, Luttig CT, Larkins CE, et al. An initial map of insertion and deletion (INDEL)
variation in the human genome. Genome Res. 2006;16:1182-90.
206. Redon R, Ishikawa S, Fitch KR, et al. Global variation in copy number in the human genome.
Nature. 2006;444:444-54.
207. Simon-Sanchez J, Scholz S, Fung HC, et al. Genome-wide SNP assay reveals structural genomic
variation, extended homozygosity and cell-line induced alterations in normal individuals. Hum Mol
Genet. 2007;16:1-14.
208. Wong KK, deLeeuw RJ, Dosanjh NS, et al. A comprehensive analysis of common copy-number
variations in the human genome. Am J Hum Genet. 2007;80:91-104.
209. Levy S, Sutton G, Ng PC, et al. The diploid genome sequence of an individual human. PLoS
Biol. 2007;5:e254.
210. Korbel JO, Urban AE, Affourtit JP, et al. Paired-end mapping reveals extensive structural
variation in the human genome. Science. 2007;318:420-6.
211. Pinto D, Marshall C, Feuk L, et al. Copy-number variation in control population cohorts. Hum
Mol Genet. 2007;16 Spec No. 2:R168-73.
212. Wang K, Li M, Hadley D, et al. PennCNV: an integrated hidden Markov model designed for
high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome
Res. 2007;17:1665-74.
213. Zogopoulos G, Ha KC, Naqib F, et al. Germ-line DNA copy number variation frequencies in a
large North American population. Hum Genet. 2007;122:345-53.
214. deSmith AJ, Tsalenko A, Sampas N, et al. Array CGH analysis of copy number variation
identifies 1284 new genes variant in healthy white males: implications for association studies of
complex diseases. Hum Mol Genet. 2007;16:2783-94.
215. Jakobsson M, Scholz SW, Scheet P, et al. Genotype, haplotype and copy-number variation in
worldwide human populations. Nature. 2008;451:998-1003.
216. Perry GH, Ben-Dor A, Tsalenko A, et al. The fine-scale and complex architecture of human
copy-number variation. Am J Hum Genet. 2008;82:685-95.
217. Takahashi N, Tsuyama N, Sasaki K, et al. Segmental copy-number variation observed in
Japanese by array-CGH. Ann Hum Genet. 2008;72:193-204.
218. Wheeler DA, Srinivasan M, Egholm M, et al. The complete genome of an individual by
massively parallel DNA sequencing. Nature. 2008;452:872-6.
219. McCarroll SA, Kuruvilla FG, Korn JM, et al. Integrated detection and population-genetic
analysis of SNPs and copy number variation. Nat Genet. 2008 Oct;40(10):1166-74.
220. Cooper GM, Zerr T, Kidd JM, et al. Systematic assessment of copy number variant detection via
genome-wide SNP genotyping. Nat Genet. 2008 Oct;40(10):1199-203.
221. Kidd JM, Cooper GM, Donahue WF, et al. Mapping and sequencing of structural variation from
eight human genomes. Nature. 2008;453:56-64.
222. Bentley DR, Balasubramanian S, Swerdlow HP, et al. Accurate whole human genome
sequencing using reversible terminator chemistry. Nature. 2008 Nov 6;456(7218):53-9.
223. Wang J, Wang W, Li R, et al. The diploid genome sequence of an Asian individual. Nature. 2008
Nov 6;456(7218):60-5.
224. Gusev A, Lowe JK, Stoffel M, et al. Whole population, genome-wide mapping of hidden
relatedness. Genome Res. 2009 Feb;19(2):318-26.
225. Itsara A, Cooper GM, Baker C, et al. Population analysis of large copy number variants and
hotspots of human genetic disease. Am J Hum Genet. 2009 Feb;84(2):148-61.
226. Shaikh TH, Gai X, Perin JC, et al. High-resolution mapping and analysis of copy number
variations in the human genome: a data resource for clinical and research applications. Genome Res.
2009 Sep;19(9):1682-90.
227. Kim JI, Ju YS, Park H, et al. A highly annotated whole-genome sequence of a Korean individual.
Nature. 2009 Aug 20;460(7258):1011-5.
228. Ahn SM, Kim TH, Lee S, et al. The first Korean genome sequence and analysis: full genome
sequencing for a socio-ethnic group. Genome Res. 2009 Sep;19(9):1622-9.
229. Matsuzaki H, Wang PH, Hu J, et al. High resolution discovery and confirmation of copy number
variants in 90 Yoruba Nigerians. Genome Biol. 2009;10(11):R125.
230. McKernan KJ, Peckham HE, Costa GL, et al. Sequence and structural variation in a human
genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding.
Genome Res. 2009 Sep;19(9):1527-41.
231. McElroy JP, Nelson MR, Caillier SJ, et al. Copy number variation in African Americans. BMC
Genet. 2009 Mar 24;10:15.
232. Conrad DF, Pinto D, Redon R, et al. Origins and functional impact of copy number variation in
the human genome. Nature. 2010 Apr 1;464(7289):704-12.
233. Alkan C, Kidd JM, Marques-Bonet T, et al. Personalized copy number and segmental duplication
maps using next-generation sequencing. Nat Genet. 2009 Oct;41(10):1061-7.
234. Lin CH, Lin YC, Wu JY, et al. A genome-wide survey of copy number variations in Han Chinese
residing in Taiwan. Genomics. 2009 Oct;94(4):241-6.
235. Li J, Yang T, Wang L, et al. Whole genome distribution and ethnic differentiation of copy
number variation in Caucasian and Asian populations. PLoS One. 2009 Nov 23;4(11):e7958.
236. International HapMap 3 Consortium, Altshuler DM, Gibbs RA, et al. Integrating common and
rare genetic variation in diverse human populations. Nature. 2010 Sep 2;467(7311):52-8.
237. Ju YS, Hong D, Kim S, et al. Reference-unbiased copy number variant analysis using CGH
microarrays. Nucleic Acids Res. 2010 Nov;38(20):e190.
238. Pang AW, MacDonald JR, Pinto D, et al. Towards a comprehensive structural variation map of
an individual human genome. Genome Biol. 2010;11(5):R52.
239. Park H, Kim JI, Ju YS, et al. Discovery of common Asian copy number variants using integrated
high-resolution array CGH and massively parallel DNA sequencing. Nat Genet. 2010 May;42(5):400-5.
240. Teague B, Waterman MS, Goldstein S, et al. High-resolution human genome structure by single-
molecule analysis. Proc Natl Acad Sci U S A. 2010 Jun 15;107(24):10848-53.
241. Kidd JM, Sampas N, Antonacci F, et al. Characterization of missing human genome sequences
and copy-number polymorphic insertions. Nat Methods. 2010 May;7(5):365-71.
242. Kidd JM, Graves T, Newman TL, et al. A human genome structural variation sequencing
resource reveals insights into mutational mechanisms. Cell. 2010 Nov 24;143(5):837-47.
243. Schuster SC, Miller W, Ratan A, et al. Complete Khoisan and Bantu genomes from southern
Africa. Nature. 2010 Feb 18;463(7283):943-7.
244. Yim SH, Kim TM, Hu HJ, et al. Copy number variations in East-Asian population and their
evolutionary and functional implications. Hum Mol Genet. 2010 Mar 15;19(6):1001-8.
245. Gayán J, Galan JJ, González-Pérez A, et al. Genetic structure of the Spanish population. BMC
Genomics. 2010 May 25;11:326.
246. 1000 Genomes Project Consortium. A map of human genome variation from population-scale
sequencing. Nature. 2010 Oct 28;467(7319):1061-73.
247. Mills RE, Walter K, Stewart C, et al. Mapping copy number variation by population-scale
genome sequencing. Nature. 2011 Feb 3;470(7332):59-65.
248. Chen W, Hayward C, Wright AF, et al. Copy number variation across European populations.
PLoS One. 2011;6(8):e23087.
249. Moon S, Kim YJ, Hong CB, et al. Data-driven approach to detect common copy-number
variations and frequency profiles in a population-based Korean cohort. Eur J Hum Genet. 2011
Nov;19(11):1167-72.
250. Firth HV, Richards SM, Bevan AP, et al. DECIPHER: Database of Chromosomal Imbalance and
Phenotype in Humans Using Ensembl Resources. Am J Hum Genet. 2009;84(4):524-33.
251. Feenstra I, Fang J, Koolen DA, et al. European Cytogeneticists Association Register of
Unbalanced Chromosome Aberrations (ECARUCA); an online database for rare chromosome
abnormalities. Eur J Med Genet. 2006 Jul-Aug;49(4):279-91.
252. Futreal PA, Coin L, Marshall M, et al. A census of human cancer genes. Nat Rev Cancer. 2004
Mar;4(3):177-83.
253. Cutts RJ, Gadaleta E, Hahn SA, et al. The Pancreatic Expression database: 2011 update. Nucleic
Acids Res. 2011 Jan;39(Database issue):D1023-8.
254. Malcolm S. Microdeletion and microduplication syndromes. Prenat Diagn. 1996
Dec;16(13):1213-9.
255. Medvedev P, Stanciu M, Brudno M. Computational methods for discovering structural
variation with next-generation sequencing. Nat Methods. 2009;6:S13-20.
256. Riley MC, Kirkup BC Jr, Johnson JD, et al. Rapid whole genome optical mapping of
Plasmodium falciparum. Malar J. 2011 Aug 26;10:252.
257. Kim Y, Kim KS, Kounovsky KL, et al. Nanochannel confinement: DNA stretch approaching full
contour length. Lab Chip. 2011 May 21;11(10):1721-9.
258. Xu MY, Aragon AD, Mascarenas MR, et al. Dual primer emulsion PCR for next-generation
DNA sequencing. Biotechniques. 2010 May;48(5):409-12.
259. Winchester L, Yau C, Ragoussis J. Comparing CNV detection methods for SNP arrays. Brief
Funct Genomic Proteomic. 2009 Sep;8(5):353-66.
260. Gautam P, Jha P, Kumar D, et al. Spectrum of large copy number variations in 26 diverse Indian
populations: potential involvement in phenotypic diversity. Hum Genet. 2012 Jan;131(1):131-43.
261. Scherer SW, Lee C, Birney E, et al. Challenges and standards in integrating surveys of structural
variation. Nat Genet. 2007 Jul;39(7 Suppl):S7-15.
262. Stankiewicz P, Pursley AN, Cheung SW. Challenges in clinical interpretation of
microduplications detected by array CGH analysis. Am J Med Genet A. 2010 May;152A(5):1089-100.
263. Hastings PJ, Lupski JR, Rosenberg SM, et al. Mechanisms of change in gene copy number. Nat
Rev Genet. 2009 Aug;10(8):551-64.
264. Lee C, Scherer SW. The clinical context of copy number variation in the human genome. Expert
Rev Mol Med. 2010 Mar 9;12:e8.
265. Schrider DR, Hahn MW. Gene copy-number polymorphism in nature. Proc Biol Sci. 2010 Nov
7;277(1698):3213-21.
266. Nguyen DQ, Webber C, Hehir-Kwa J, et al. Reduced purifying selection prevails over positive
selection in human copy number variant evolution. Genome Res. 2008 Nov;18(11):1711-23.
267. Perry GH, Dominy NJ, Claw KG, et al. Diet and the evolution of human amylase gene copy
number variation. Nat Genet. 2007 Oct;39(10):1256-60.
268. Yim SH, Kim TM, Hu HJ, et al. Copy number variations in East-Asian population and their
evolutionary and functional implications. Hum Mol Genet. 2010 Mar 15;19(6):1001-8.
269. Perry GH, Yang F, Marques-Bonet T, et al. Copy number variation and evolution in humans and
chimpanzees. Genome Res. 2008 Nov;18(11):1698-710.
270. Stranger BE, Forrest MS, Dunning M, et al. Relative impact of nucleotide and copy number
variation on gene expression phenotypes. Science. 2007 Feb 9;315(5813):848-53.
271. Schlattl A, Anders S, Waszak SM, et al. Relating CNVs to transcriptome data at fine resolution:
assessment of the effect of variant size, type, and overlap with functional regions. Genome Res. 2011
Dec;21(12):2004-13.
272. Henrichsen CN, Vinckenbosch N, Zöllner S, et al. Segmental copy number variation shapes
tissue transcriptomes. Nat Genet. 2009 Apr;41(4):424-9.
273. Guryev V, Saar K, Adamovic T, et al. Distribution and functional impact of DNA copy number
variation in the rat. Nat Genet. 2008 May;40(5):538-45.
274. Zhou J, Lemos B, Dopman EB, et al. Copy-number variation: the balance between gene dosage
and expression in Drosophila melanogaster. Genome Biol Evol. 2011;3:1014-24.
275. Nuytemans K, Meeus B, Crosiers D, et al. Relative contribution of simple mutations vs. copy
number variations in five Parkinson disease genes in the Belgian population. Hum Mutat. 2009
Jul;30(7):1054-61.
276. Walters RG, Jacquemont S, Valsesia A, et al. A new highly penetrant form of obesity due to
deletions on chromosome 16p11.2. Nature. 2010 Feb 4;463(7281):671-5.
277. Prescott NJ, Dominy KM, Kubo M, et al. Independent and population-specific association of risk
variants at the IRGM locus with Crohn's disease. Hum Mol Genet. 2010 May 1;19(9):1828-39.
278. Wellcome Trust Case Control Consortium, Craddock N, Hurles ME, et al. Genome-wide
association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls.
Nature. 2010 Apr 1;464(7289):713-20.
279. de Cid R, Riveira-Munoz E, Zeeuwen PL, et al. Deletion of the late cornified envelope LCE3B
and LCE3C genes as a susceptibility factor for psoriasis. Nat Genet. 2009 Feb;41(2):211-5.
280. Morris DL, Roberts AL, Witherden AS, et al. Evidence for both copy number and allelic
(NA1/NA2) risk at the FCGR3B locus in systemic lupus erythematosus. Eur J Hum Genet. 2010
Sep;18(9):1027-31.
281. Gonzalez E, Kulkarni H, Bolivar H, et al. The influence of CCL3L1 gene-containing segmental
duplications on HIV-1/AIDS susceptibility. Science. 2005 Mar 4;307(5714):1434-40.
282. O'Donovan MC, Kirov G, Owen MJ. Phenotypic variations on the theme of CNVs. Nat Genet.
2008 Dec;40(12):1392-3.
283. Itsara A, Wu H, Smith JD, et al. De novo rates and selection of large copy number variation.
Genome Res. 2010 Nov;20(11):1469-81.
284. Piotrowski A, Bruder CE, Andersson R, et al. Somatic mosaicism for copy number variation in
differentiated human tissues. Hum Mutat. 2008 Sep;29(9):1118-24.
285. Rodríguez-Santiago B, Malats N, Rothman N, et al. Mosaic uniparental disomies and
aneuploidies as large structural variants of the human genome. Am J Hum Genet. 2010 Jul
9;87(1):129-38.
286. Bruder CE, Piotrowski A, Gijsbers AA, et al. Phenotypically concordant and discordant
monozygotic twins display different DNA copy-number-variation profiles. Am J Hum Genet. 2008
Mar;82(3):763-71.
287. Sasaki H, Emi M, Iijima H, et al. Copy number loss of (src homology 2 domain containing)-
transforming protein 2 (SHC2) gene: discordant loss in monozygotic twins and frequent loss in
patients with multiple system atrophy. Mol Brain. 2011 Jun 10;4:24.
288. Pamphlett R, Morahan JM. Copy number imbalances in blood and hair in monozygotic twins
discordant for amyotrophic lateral sclerosis. J Clin Neurosci. 2011 Sep;18(9):1231-4.
289. Thompson SL, Bakhoum SF, Compton DA. Mechanisms of chromosomal instability. Curr Biol.
2010 Mar 23;20(6):R285-95.
290. Thompson SL, Compton DA. Chromosomes and cancer cells. Chromosome Res. 2011
Apr;19(3):433-44.
291. Meza-Zepeda LA, Kresse SH, Barragan-Polania AH, et al. Array comparative genomic
hybridization reveals distinct DNA copy number differences between gastrointestinal stromal tumors
and leiomyosarcomas. Cancer Res. 2006 Sep 15;66(18):8984-93.
292. Vollebergh MA, Lips EH, Nederlof PM, et al. An aCGH classifier derived from BRCA1-mutated
breast cancer and benefit of high-dose platinum-based chemotherapy in HER2-negative breast cancer
patients. Ann Oncol. 2011 Jul;22(7):1561-70.
293. Johansson B, Bardi G, Heim S, et al. Nonrandom chromosomal rearrangements in pancreatic
carcinomas. Cancer. 1992 Apr 1;69(7):1674-81.
294. Brat DJ, Hahn SA, Griffin CA, et al. The structural basis of molecular genetic deletions. An
integration of classical cytogenetic and molecular analyses in pancreatic adenocarcinoma. Am J
Pathol. 1997 Feb;150(2):383-91.
295. Heidenblad M, Schoenmakers EF, Jonson T, et al. Genome-wide array-based comparative
genomic hybridization reveals multiple amplification targets and novel homozygous deletions in
pancreatic carcinoma cell lines. Cancer Res. 2004 May 1;64(9):3052-9.
296. Aguirre AJ, Brennan C, Bailey G, et al. High-resolution characterization of the pancreatic
adenocarcinoma genome. Proc Natl Acad Sci U S A. 2004 Jun 15;101(24):9067-72.
297. Holzmann K, Kohlhammer H, Schwaenen C, et al. Genomic DNA-chip hybridization reveals a
higher incidence of genomic amplifications in pancreatic cancer than conventional comparative
genomic hybridization and leads to the identification of novel candidate genes. Cancer Res. 2004 Jul
1;64(13):4428-33.
298. Mahlamäki EH, Kauraniemi P, Monni O, et al. High-resolution genomic and expression profiling
reveals 105 putative amplification target genes in pancreatic cancer. Neoplasia. 2004 Sep-Oct;6(5):432-9.
299. Bashyam MD, Bair R, Kim YH, et al. Array-based comparative genomic hybridization identifies
localized DNA amplifications and homozygous deletions in pancreatic cancer. Neoplasia. 2005
Jun;7(6):556-62.
300. Nowak NJ, Gaile D, Conroy JM, et al. Genome-wide aberrations in pancreatic adenocarcinoma.
Cancer Genet Cytogenet. 2005 Aug;161(1):36-50.
301. Loukopoulos P, Shibata T, Katoh H, et al. Genome-wide array-based comparative genomic
hybridization analysis of pancreatic adenocarcinoma: identification of genetic indicators that predict
patient outcome. Cancer Sci. 2007 Mar;98(3):392-400.
302. Harada T, Baril P, Gangeswaran R, et al. Identification of genetic alterations in pancreatic cancer
by the combined use of tissue microdissection and array-based comparative genomic hybridisation.
Br J Cancer. 2007 Jan 29;96(2):373-82.
303. Suzuki A, Shibata T, Shimada Y, et al. Identification of SMURF1 as a possible target for
7q21.3-22.1 amplification detected in a pancreatic cancer cell line by in-house array-based comparative
genomic hybridization. Cancer Sci. 2008 May;99(5):986-94.
304. Kwei KA, Bashyam MD, Kao J, et al. Genomic profiling identifies GATA6 as a candidate
oncogene amplified in pancreatobiliary cancer. PLoS Genet. 2008 May 23;4(5):e1000081.
305. Harada T, Chelala C, Crnogorac-Jurcevic T, et al. Genome-wide analysis of pancreatic cancer
using microarray-based techniques. Pancreatology. 2009;9(1-2):13-24.
306. Birnbaum DJ, Adélaïde J, Mamessier E, et al. Genome profiling of pancreatic adenocarcinoma.
Genes Chromosomes Cancer. 2011 Jun;50(6):456-65.
307. Calhoun ES, Hucl T, Gallmeier E, et al. Identifying allelic loss and homozygous deletions in
pancreatic cancer without matched normals using high-density single-nucleotide polymorphism
arrays. Cancer Res. 2006 Aug 15;66(16):7920-8.
308. Harada T, Chelala C, Bhakta V, et al. Genome-wide DNA copy number analysis in pancreatic
cancer using high-density single nucleotide polymorphism arrays. Oncogene. 2008 Mar
20;27(13):1951-60.
309. Lin LJ, Asaoka Y, Tada M, et al. Integrated analysis of copy number alterations and loss of
heterozygosity in human pancreatic cancer using a high-resolution, single nucleotide polymorphism
array. Oncology. 2008;75(1-2):102-12.
310. Fu B, Luo M, Lakkur S, et al. Frequent genomic copy number gain and overexpression of
GATA-6 in pancreatic carcinoma. Cancer Biol Ther. 2008 Oct;7(10):1593-601.
311. Michils G, Tejpar S, Thoelen R, et al. Large deletions of the APC gene in 15% of mutation-negative
patients with classical polyposis (FAP): a Belgian study. Hum Mutat. 2005 Feb;25(2):125-34.
312. Richards FM, Crossey PA, Phipps ME, et al. Detailed mapping of germline deletions of the von
Hippel-Lindau disease tumour suppressor gene. Hum Mol Genet. 1994 Apr;3(4):595-8.
313. Oliveira C, Senz J, Kaurah P, et al. Germline CDH1 deletions in hereditary diffuse gastric cancer
families. Hum Mol Genet. 2009 May 1;18(9):1545-55.
314. Palanca Suela S, Esteban Cardeñosa E, Barragán González E, et al. Identification of a novel
BRCA1 large genomic rearrangement in a Spanish breast/ovarian cancer family. Breast Cancer Res
Treat. 2008 Nov;112(1):63-7.
315. Vasickova P, Machackova E, Lukesova M, et al. High occurrence of BRCA1 intragenic
rearrangements in hereditary breast and ovarian cancer syndrome in the Czech Republic. BMC Med
Genet. 2007 Jun 11;8:32.
316. Buffone A, Capalbo C, Ricevuto E, et al. Prevalence of BRCA1 and BRCA2 genomic
rearrangements in a cohort of consecutive Italian breast and/or ovarian cancer families. Breast Cancer
Res Treat. 2007 Dec;106(2):289-96.
317. Smith LD, Tesoriero AA, Ramus SJ, et al. BRCA1 promoter deletions in young women with
breast cancer and a strong family history: a population-based study. Eur J Cancer. 2007
Mar;43(5):823-7.
318. Casilli F, Tournier I, Sinilnikova OM, et al. The contribution of germline rearrangements to the
spectrum of BRCA2 mutations. J Med Genet. 2006 Sep;43(9):e49.
319. Walsh T, Casadei S, Coats KH, et al. Spectrum of mutations in BRCA1, BRCA2, CHEK2, and
TP53 in families at high risk of breast cancer. JAMA. 2006 Mar 22;295(12):1379-88.
320. Gad S, Caux-Moncoutier V, Pagès-Berhouet S, et al. Significant contribution of large BRCA1
gene rearrangements in 120 French breast and ovarian cancer families. Oncogene. 2002 Oct
3;21(44):6841-7.
321. Taylor CF, Charlton RS, Burn J, et al. Genomic deletions in MSH2 or MLH1 are a frequent cause
of hereditary non-polyposis colorectal cancer: identification of novel and recurrent deletions by
MLPA. Hum Mutat. 2003 Dec;22(6):428-33.
322. Gylling A, Ridanpää M, Vierimaa O, et al. Large genomic rearrangements and germline
epimutations in Lynch syndrome. Int J Cancer. 2009 May 15;124(10):2333-40.
323. Hearle NC, Rudd MF, Lim W, et al. Exonic STK11 deletions are not a rare cause of Peutz-
Jeghers syndrome. J Med Genet. 2006 Apr;43(4):e15.
324. van Hattem WA, Brosens LA, de Leng WW, et al. Large genomic deletions of SMAD4,
BMPR1A and PTEN in juvenile polyposis. Gut. 2008 May;57(5):623-7.
325. Blanco A, de la Hoya M, Balmaña J, et al. Detection of a large rearrangement in PALB2 in
Spanish breast cancer families with male breast cancer. Breast Cancer Res Treat. 2012
Feb;132(1):307-15.
326. Sabatier R, Adélaïde J, Finetti P, et al. BARD1 homozygous deletion, a possible alternative to
BRCA1 mutation in basal breast cancer. Genes Chromosomes Cancer. 2010 Dec;49(12):1143-51.
327. Ahvenainen T, Lehtonen HJ, Lehtonen R, et al. Mutation screening of fumarate hydratase by
multiplex ligation-dependent probe amplification: detection of exonic deletion in a patient with
leiomyomatosis and renal cell cancer. Cancer Genet Cytogenet. 2008 Jun;183(2):83-8.
328. Chibon F, Primois C, Bressieux JM, et al. Contribution of PTEN large rearrangements in Cowden
disease: a multiplex amplifiable probe hybridisation (MAPH) screening approach. J Med Genet. 2008
Oct;45(10):657-65.
329. Knappskog S, Geisler J, Arnesen T, et al. A novel type of deletion in the CDKN2A gene
identified in a melanoma-prone family. Genes Chromosomes Cancer. 2006 Dec;45(12):1155-63.
330. Wu R, López-Correa C, Rutkowski JL, et al. Germline mutations in NF1 patients with
malignancies. Genes Chromosomes Cancer. 1999 Dec;26(4):376-80.
331. Broeks A, de Klein A, Floore AN, et al. ATM germline mutations in classical ataxia-
telangiectasia patients in the Dutch population. Hum Mutat. 1998;12(5):330-7.
332. Plummer SJ, Santibáñez-Koref M, Kurosaki T, et al. A germline 2.35 kb deletion of p53
genomic DNA creating a specific loss of the oligomerization domain inherited in a Li-Fraumeni
syndrome family. Oncogene. 1994 Nov;9(11):3273-80.
333. Otterson GA, Chen W, Coxon AB, et al. Incomplete penetrance of familial retinoblastoma linked
to germ-line mutations that result in partial loss of RB function. Proc Natl Acad Sci U S A. 1997 Oct
28;94(22):12036-40.
334. Fukuuchi A, Nagamura Y, Yaguchi H, et al. A whole MEN1 gene deletion flanked by Alu
repeats in a family with multiple endocrine neoplasia type 1. Jpn J Clin Oncol. 2006 Nov;36(11):739-44.
335. Rumilla K, Schowalter KV, Lindor NM, et al. Frequency of deletions of EPCAM (TACSTD1) in
MSH2-associated Lynch syndrome cases. J Mol Diagn. 2011 Jan;13(1):93-9.
336. Kuiper RP, Vissers LE, Venkatachalam R, et al. Recurrence and variability of germline EPCAM
deletions in Lynch syndrome. Hum Mutat. 2011 Apr;32(4):407-14.
337. Calva-Cerqueira D, Dahdaleh FS, Woodfield G, et al. Discovery of the BMPR1A promoter and
germline mutations that cause juvenile polyposis. Hum Mol Genet. 2010 Dec 1;19(23):4654-62.
338. Nørskov MS, Frikke-Schmidt R, Bojesen SE, et al. Copy number variation in glutathione-S-
transferase T1 and M1 predicts incidence and 5-year survival from prostate and bladder cancer, and
incidence of corpus uteri cancer in the general population. Pharmacogenomics J. 2011
Aug;11(4):292-9.
339. Frank B, Bermejo JL, Hemminki K, et al. Copy number variant in the candidate tumor
suppressor gene MTUS1 and familial breast cancer risk. Carcinogenesis. 2007 Jul;28(7):1442-5.
340. Diskin SJ, Hou C, Glessner JT, et al. Copy number variation at 1q21.1 associated with
neuroblastoma. Nature. 2009 Jun 18;459(7249):987-91.
341. Liu W, Sun J, Li G, et al. Association of a germ-line copy number variation at 2p24.3 and risk
for aggressive prostate cancer. Cancer Res. 2009 Mar 15;69(6):2176-9.
342. Jin G, Sun J, Liu W, et al. Genome-wide copy-number variation analysis identifies common
genetic variants at 20p13 associated with aggressiveness of prostate cancer. Carcinogenesis. 2011
Jul;32(7):1057-62.
343. Tse KP, Su WH, Yang ML, et al. A gender-specific association of CNV at 6p21.3 with NPC
susceptibility. Hum Mol Genet. 2011 Jul 15;20(14):2889-96.
344. Huang L, Yu D, Wu C, et al. Copy number variation at 6q13 functions as a long-range regulator
and is associated with pancreatic cancer risk. Carcinogenesis. 2012 Jan;33(1):94-100.
345. Lucito R, Suresh S, Walter K, et al. Copy-number variants in patients with a strong family history
of pancreatic cancer. Cancer Biol Ther. 2007 Oct;6(10):1592-9.
346. Yoshihara K, Tajima A, Adachi S, et al. Germline copy number variations in BRCA1-associated
ovarian cancer patients. Genes Chromosomes Cancer. 2011 Mar;50(3):167-77.
347. Venkatachalam R, Verwiel ET, Kamping EJ, et al. Identification of candidate predisposing copy
number variants in familial and early-onset colorectal cancer patients. Int J Cancer. 2011 Oct
1;129(7):1635-42.
348. Shlien A, Tabori U, Marshall CR, et al. Excessive genomic DNA copy number variation in the
Li-Fraumeni cancer predisposition syndrome. Proc Natl Acad Sci U S A. 2008 Aug
12;105(32):11264-9.
349. Talos F, Moll UM. Role of the p53 family in stabilizing the genome and preventing
polyploidization. Adv Exp Med Biol. 2010;676:73-91.
350. McPherson JD, Marra M, Hillier L, et al. A physical map of the human genome. Nature. 2001
Feb 15;409(6822):934-41.
351. Venter JC, Adams MD, Myers EW, et al. The sequence of the human genome. Science. 2001 Feb
16;291(5507):1304-51.
352. Sachidanandam R, Weissman D, Schmidt SC, et al. A map of human genome sequence variation
containing 1.42 million single nucleotide polymorphisms. Nature. 2001 Feb 15;409(6822):928-33.
353. Margulies M, Egholm M, Altman WE, et al. Genome sequencing in microfabricated high-density
picolitre reactors. Nature. 2005 Sep 15;437(7057):376-80.
354. Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Jan;11(1):31-46.
355. Wadman M. James Watson's genome sequenced at high speed. Nature. 2008 Apr
17;452(7189):788.
356. Kitzman JO, Mackenzie AP, Adey A, et al. Haplotype-resolved genome sequencing of a Gujarati
Indian individual. Nat Biotechnol. 2011 Jan;29(1):59-63.
357. Cirulli ET, Singh A, Shianna KV, et al. Screening the human exome: a comparison of whole
genome and whole transcriptome sequencing. Genome Biol. 2010;11(5):R57.
358. Tong P, Prendergast JG, Lohan AJ, et al. Sequencing and analysis of an Irish human genome.
Genome Biol. 2010;11(9):R91.
359. Fujimoto A, Nakagawa H, Hosono N, et al. Whole-genome sequencing and comprehensive
variant analysis of a Japanese individual using massively parallel sequencing. Nat Genet. 2010
Nov;42(11):931-6.
360. Mardis ER. A decade's perspective on DNA sequencing technology. Nature. 2011 Feb
10;470(7333):198-203.
361. Kahn SD. On the future of genomic data. Science. 2011 Feb 11;331(6018):728-9.
362. McPherson JD. Next-generation gap. Nat Methods. 2009 Nov;6(11 Suppl):S2-5.
363. Schadt EE, Turner S, Kasarskis A. A window into third-generation sequencing. Hum Mol Genet.
2010 Oct 15;19(R2):R227-40.
364. Ashley EA, Butte AJ, Wheeler MT, et al. Clinical assessment incorporating a personal genome.
Lancet. 2010 May 1;375(9725):1525-35.
365. Roach JC, Glusman G, Smit AF, et al. Analysis of genetic inheritance in a family quartet by
whole-genome sequencing. Science. 2010 Apr 30;328(5978):636-9.
366. Lupski JR, Reid JG, Gonzaga-Jauregui C, et al. Whole-genome sequencing in a patient with
Charcot-Marie-Tooth neuropathy. N Engl J Med. 2010 Apr 1;362(13):1181-91.
367. Sobreira NL, Cirulli ET, Avramopoulos D, et al. Whole-genome sequencing of a single proband
together with linkage analysis identifies a Mendelian disease gene. PLoS Genet. 2010 Jun
17;6(6):e1000991.
368. Bainbridge MN, Wiszniewski W, Murdock DR, et al. Whole-genome sequencing for optimized
patient management. Sci Transl Med. 2011 Jun 15;3(87):87re3.
369. Dewey FE, Chen R, Cordero SP, et al. Phased whole-genome genetic risk in a family quartet
using a major allele reference sequence. PLoS Genet. 2011 Sep;7(9):e1002280.
370. Baranzini SE, Mudge J, van Velkinburgh JC, et al. Genome, epigenome and RNA sequences of
monozygotic twins discordant for multiple sclerosis. Nature. 2010 Apr 29;464(7293):1351-6.
371. Rios J, Stein E, Shendure J, et al. Identification by whole-genome resequencing of gene defect
responsible for severe hypercholesterolemia. Hum Mol Genet. 2010 Nov 15;19(22):4313-8.
372. Hodges E, Xuan Z, Balija V, et al. Genome-wide in situ exon capture for selective resequencing.
Nat Genet. 2007 Dec;39(12):1522-7.
373. Garber K. Fixing the front end. Nat Biotechnol. 2008 Oct;26(10):1101-4.
374. Pruitt KD, Harrow J, Harte RA, et al. The consensus coding sequence (CCDS) project:
Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res. 2009
Jul;19(7):1316-23.
375. Asan, Xu Y, Jiang H, et al. Comprehensive comparison of three commercial human whole-exome
capture platforms. Genome Biol. 2011 Sep 28;12(9):R95.
376. Ng PC, Levy S, Huang J, et al. Genetic variation in an individual human exome. PLoS Genet.
2008 Aug 15;4(8):e1000160.
377. Ng SB, Turner EH, Robertson PD, et al. Targeted capture and massively parallel sequencing of
12 human exomes. Nature. 2009 Sep 10;461(7261):272-6.
378. Vissers LE, de Ligt J, Gilissen C, et al. A de novo paradigm for mental retardation. Nat Genet.
2010 Dec;42(12):1109-12.
379. Walsh T, Shahin H, Elkan-Miller T, et al. Whole exome sequencing and homozygosity mapping
identify mutation in the cell polarity protein GPSM2 as the cause of nonsyndromic hearing loss
DFNB82. Am J Hum Genet. 2010 Jul 9;87(1):90-4.
380. Lalonde E, Albrecht S, Ha KC, et al. Unexpected allelic heterogeneity and spectrum of mutations
in Fowler syndrome revealed by next-generation exome sequencing. Hum Mutat. 2010
Aug;31(8):918-23.
381. Pierce SB, Walsh T, Chisholm KM, et al. Mutations in the DBP-deficiency protein HSD17B4
cause ovarian dysgenesis, hearing loss, and ataxia of Perrault Syndrome. Am J Hum Genet. 2010 Aug
13;87(2):282-8.
382. Ng SB, Bigham AW, Buckingham KJ, et al. Exome sequencing identifies MLL2 mutations as a
cause of Kabuki syndrome. Nat Genet. 2010 Sep;42(9):790-3.
383. Bilgüvar K, Oztürk AK, Louvi A, et al. Whole-exome sequencing identifies recessive WDR62
mutations in severe brain malformations. Nature. 2010 Sep 9;467(7312):207-10.
384. Gilissen C, Arts HH, Hoischen A, et al. Exome sequencing identifies WDR35 variants involved
in Sensenbrenner syndrome. Am J Hum Genet. 2010 Sep 10;87(3):418-23.
385. Krawitz PM, Schweiger MR, Rödelsperger C, et al. Identity-by-descent filtering of exome
sequence data identifies PIGV mutations in hyperphosphatasia mental retardation syndrome. Nat
Genet. 2010 Oct;42(10):827-9.
386. Anastasio N, Ben-Omran T, Teebi A, et al. Mutations in SCARF2 are responsible for Van Den
Ende-Gupta syndrome. Am J Hum Genet. 2010 Oct 8;87(4):553-9.
387. Johnson JO, Gibbs JR, Van Maldergem L, Houlden H, Singleton AB. Exome sequencing in
Brown-Vialetto-van Laere syndrome. Am J Hum Genet. 2010 Oct 8;87(4):567-9; author reply 569-
70.
388. Sirmaci A, Walsh T, Akay H, et al. MASP1 mutations in patients with facial, umbilical,
coccygeal, and auditory findings of Carnevale, Malpuech, OSA, and Michels syndromes. Am J Hum
Genet. 2010 Nov 12;87(5):679-86.
389. Haack TB, Danhauser K, Haberberger B, et al. Exome sequencing identifies ACAD9 mutations
as a cause of complex I deficiency. Nat Genet. 2010 Dec;42(12):1131-4.
390. Wang JL, Yang X, Xia K, et al. TGM6 identified as a novel causative gene of spinocerebellar
ataxias using exome sequencing. Brain. 2010 Dec;133(Pt 12):3510-8.
391. Musunuru K, Pirruccello JP, Do R, et al. Exome sequencing, ANGPTL3 mutations, and familial
combined hypolipidemia. N Engl J Med. 2010 Dec 2;363(23):2220-7.
392. Johnson JO, Mandrioli J, Benatar M, et al. Exome sequencing reveals VCP mutations as a cause
of familial ALS. Neuron. 2010 Dec 9;68(5):857-64.
393. Bolze A, Byun M, McDonald D, et al. Whole-exome-sequencing-based discovery of human
FADD deficiency. Am J Hum Genet. 2010 Dec 10;87(6):873-81.
394. Liu W, Morito D, Takashima S, et al. Identification of RNF213 as a susceptibility gene for
moyamoya disease and its possible role in vascular development. PLoS One. 2011;6(7):e22542.
395. Züchner S, Dallman J, Wen R, et al. Whole-exome sequencing links a variant in DHDDS to
retinitis pigmentosa. Am J Hum Genet. 2011 Feb 11;88(2):201-6.
396. Glazov EA, Zankl A, Donskoi M, et al. Whole-exome re-sequencing in a family quartet identifies
POP1 mutations as the cause of a novel skeletal dysplasia. PLoS Genet. 2011 Mar;7(3):e1002027.
397. Worthey EA, Mayer AN, Syverson GD, et al. Making a definitive diagnosis: successful clinical
application of whole exome sequencing in a child with intractable inflammatory bowel disease. Genet
Med. 2011 Mar;13(3):255-62.
398. Simpson MA, Irving MD, Asilmaz E, et al. Mutations in NOTCH2 cause Hajdu-Cheney
syndrome, a disorder of severe and progressive bone loss. Nat Genet. 2011 Mar 6;43(4):303-5.
399. Becker J, Semler O, Gilissen C, et al. Exome sequencing identifies truncating mutations in human
SERPINF1 in autosomal-recessive osteogenesis imperfecta. Am J Hum Genet. 2011 Mar
11;88(3):362-71.
400. Ostergaard P, Simpson MA, Brice G, et al. Rapid identification of mutations in GJC2 in primary
lymphoedema using whole exome sequencing combined with linkage analysis with delineation of the
phenotype. J Med Genet. 2011 Apr;48(4):251-5.
401. Çalışkan M, Chong JX, Uricchio L, et al. Exome sequencing reveals a novel mutation for
autosomal recessive non-syndromic mental retardation in the TECR gene on chromosome 19p13.
Hum Mol Genet. 2011 Apr 1;20(7):1285-9.
402. Erlich Y, Edvardson S, Hodges E, et al. Exome sequencing and disease-network analysis of a
single family implicate a mutation in KIF1A in hereditary spastic paraparesis. Genome Res. 2011
May;21(5):658-64.
403. Sundaram SK, Huq AM, Sun Z, et al. Exome sequencing of a pedigree with Tourette syndrome or
chronic tic disorder. Ann Neurol. 2011 May;69(5):901-4.
404. Puente XS, Quesada V, Osorio FG, et al. Exome sequencing and functional analysis identifies
BANF1 mutation as the cause of a hereditary progeroid syndrome. Am J Hum Genet. 2011 May
13;88(5):650-6.
405. Vissers LE, Lausch E, Unger S, et al. Chondrodysplasia and abnormal joint development
associated with mutations in IMPAD1, encoding the Golgi-resident nucleotide phosphatase, gPAPP.
Am J Hum Genet. 2011 May 13;88(5):608-15.
406. O'Sullivan J, Bitu CC, Daly SB, et al. Whole-Exome sequencing identifies FAM20A mutations as
a cause of amelogenesis imperfecta and gingival hyperplasia syndrome. Am J Hum Genet. 2011 May
13;88(5):616-20.
407. Götz A, Tyynismaa H, Euro L, et al. Exome sequencing identifies mitochondrial alanyl-tRNA
synthetase mutations in infantile mitochondrial cardiomyopathy. Am J Hum Genet. 2011 May
13;88(5):635-42.
408. Shi Y, Li Y, Zhang D, et al. Exome sequencing identifies ZNF644 mutations in high myopia.
PLoS Genet. 2011 Jun;7(6):e1002084.
409. Klein CJ, Botuyan MV, Wu Y, et al. Mutations in DNMT1 cause hereditary sensory neuropathy
with dementia and hearing loss. Nat Genet. 2011 Jun;43(6):595-600.
410. Barak T, Kwan KY, Louvi A, et al. Recessive LAMC3 mutations cause malformations of
occipital cortical development. Nat Genet. 2011 Jun;43(6):590-4.
411. O'Roak BJ, Deriziotis P, Lee C, et al. Exome sequencing in sporadic autism spectrum disorders
identifies severe de novo mutations. Nat Genet. 2011 Jun;43(6):585-9.
412. Alvarado DM, Buchan JG, Gurnett CA, et al. Exome sequencing identifies an MYH3 mutation in
a family with distal arthrogryposis type 1. J Bone Joint Surg Am. 2011 Jun 1;93(11):1045-50.
413. de Greef JC, Wang J, Balog J, et al. Mutations in ZBTB24 are associated with immunodeficiency,
centromeric instability, and facial anomalies syndrome type 2. Am J Hum Genet. 2011 Jun
10;88(6):796-804.
414. Yamaguchi T, Hosomichi K, Narita A, et al. Exome resequencing combined with linkage analysis
identifies novel PTH1R variants in primary failure of tooth eruption in Japanese. J Bone Miner Res.
2011 Jul;26(7):1655-61.
415. Zhou C, Zang D, Jin Y, et al. Mutation in ribosomal protein L21 underlies hereditary
hypotrichosis simplex. Hum Mutat. 2011 Jul;32(7):710-4.
416. Le Goff C, Mahaut C, Wang LW, et al. Mutations in the TGFβ binding-protein-like domain 5 of
FBN1 are responsible for acromicric and geleophysic dysplasias. Am J Hum Genet. 2011 Jul
15;89(1):7-14.
417. Hanson D, Murray PG, O'Sullivan J, et al. Exome sequencing identifies CCDC8 mutations in 3-
M syndrome, suggesting that CCDC8 contributes in a pathway with CUL7 and OBSL1 to control
human growth. Am J Hum Genet. 2011 Jul 15;89(1):148-53.
418. Vilariño-Güell C, Wider C, Ross OA, et al. VPS35 mutations in Parkinson disease. Am J Hum
Genet. 2011 Jul 15;89(1):162-7.
419. Zimprich A, Benet-Pagès A, Struhal W, et al. A mutation in VPS35, encoding a subunit of the
retromer complex, causes late-onset Parkinson disease. Am J Hum Genet. 2011 Jul 15;89(1):168-75.
420. Sergouniotis PI, Davidson AE, Mackay DS, et al. Recessive mutations in KCNJ13, encoding an
inwardly rectifying potassium channel subunit, cause Leber congenital amaurosis. Am J Hum Genet.
2011 Jul 15;89(1):183-90.
421. Albers CA, Cvejic A, Favier R, et al. Exome sequencing identifies NBEAL2 as the causative
gene for gray platelet syndrome. Nat Genet. 2011 Jul 17;43(8):735-7.
422. Sanna-Cherchi S, Burgess KE, Nees SN, et al. Exome sequencing identified MYO1E and NEIL1
as candidate genes for human autosomal recessive steroid-resistant nephrotic syndrome. Kidney Int.
2011 Aug;80(4):389-96.
423. Liu L, Okada S, Kong XF, et al. Gain-of-function human STAT1 mutations impair IL-17
immunity and underlie chronic mucocutaneous candidiasis. J Exp Med. 2011 Aug 1;208(8):1635-48.
424. Yariz KO, Walsh T, Uzak A, et al. Inherited mutation of the luteinizing
hormone/choriogonadotropin receptor (LHCGR) in empty follicle syndrome. Fertil Steril. 2011
Aug;96(2):e125-30.
425. Xu B, Roos JL, Dexheimer P, et al. Exome sequencing supports a de novo mutational paradigm
for schizophrenia. Nat Genet. 2011 Aug 7;43(9):864-8.
426. Sirmaci A, Spiliopoulos M, Brancati F, et al. Mutations in ANKRD11 cause KBG syndrome,
characterized by intellectual disability, skeletal malformations, and macrodontia. Am J Hum Genet.
2011 Aug 12;89(2):289-94.
427. Shaheen R, Faqeih E, Sunker A, et al. Recessive mutations in DOCK6, encoding the guanidine
nucleotide exchange factor DOCK6, lead to abnormal actin cytoskeleton organization and Adams-
Oliver syndrome. Am J Hum Genet. 2011 Aug 12;89(2):328-33.
428. Nosková L, Stránecký V, Hartmannová H, et al. Mutations in DNAJC5, encoding cysteine-string
protein alpha, cause autosomal-dominant adult-onset neuronal ceroid lipofuscinosis. Am J Hum
Genet. 2011 Aug 12;89(2):241-52.
429. Weedon MN, Hastings R, Caswell R, et al. Exome sequencing identifies a DYNC1H1 mutation
in a large pedigree with dominant axonal Charcot-Marie-Tooth disease. Am J Hum Genet. 2011 Aug
12;89(2):308-12.
430. Ozgül RK, Siemiatkowska AM, Yücel D, et al. Exome sequencing and cis-regulatory mapping
identify mutations in MAK, a gene encoding a regulator of ciliary length, as a cause of retinitis
pigmentosa. Am J Hum Genet. 2011 Aug 12;89(2):253-64.
431. Doi H, Yoshida K, Yasuda T, et al. Exome sequencing reveals a homozygous SYT14 mutation in
adult-onset, autosomal-recessive spinocerebellar ataxia with psychomotor retardation. Am J Hum
Genet. 2011 Aug 12;89(2):320-7.
432. Sloan JL, Johnston JJ, Manoli I, et al. Exome sequencing identifies ACSF3 as a cause of
combined malonic and methylmalonic aciduria. Nat Genet. 2011 Aug 14;43(9):883-6.
433. Aldahmesh MA, Khan AO, Mohamed JY, et al. Identification of ADAMTS18 as a gene mutated
in Knobloch syndrome. J Med Genet. 2011 Sep;48(9):597-601.
434. Murdock DR, Clark GD, Bainbridge MN, et al. Whole-exome sequencing identifies compound
heterozygous mutations in WDR62 in siblings with recurrent polymicrogyria. Am J Med Genet A.
2011 Sep;155A(9):2071-7.
435. Regalado ES, Guo DC, Villamizar C, et al. Exome sequencing identifies SMAD3 mutations as a
cause of familial thoracic aortic aneurysm and dissection with intracranial and other arterial
aneurysms. Circ Res. 2011 Sep 2;109(6):680-6.
436. Dickinson RE, Griffin H, Bigley V, et al. Exome sequencing identifies GATA-2 mutation as the
cause of dendritic cell, monocyte, B and NK lymphoid deficiency. Blood. 2011 Sep 8;118(10):2656-
8.
437. Hor H, Bartesaghi L, Kutalik Z, et al. A missense mutation in myelin oligodendrocyte
glycoprotein as a cause of familial narcolepsy with cataplexy. Am J Hum Genet. 2011 Sep
9;89(3):474-9.
438. Marti-Masso JF, Ruiz-Martínez J, Makarov V, et al. Exome sequencing identifies GCDH
(glutaryl-CoA dehydrogenase) mutations as a cause of a progressive form of early-onset generalized
dystonia. Hum Genet. 2012 Mar;131(3):435-42.
439. Tariq M, Belmont JW, Lalani S, et al. SHROOM3 is a novel candidate for heterotaxy identified
by whole exome sequencing. Genome Biol. 2011 Sep 21;12(9):R91.
440. Takata A, Kato M, Nakamura M, et al. Exome sequencing identifies a novel missense variant in
RRM2B associated with autosomal recessive progressive external ophthalmoplegia. Genome Biol.
2011 Sep 28;12(9):R92.
441. Theis JL, Sharpe KM, Matsumoto ME, et al. Homozygosity mapping and exome sequencing
reveal GATAD1 mutation in autosomal recessive dilated cardiomyopathy. Circ Cardiovasc Genet.
2011 Dec;4(6):585-94.
442. Pierson TM, Adams D, Bonn F, et al. Whole-exome sequencing identifies homozygous AFG3L2
mutations in a spastic ataxia-neuropathy syndrome linked to mitochondrial m-AAA proteases. PLoS
Genet. 2011 Oct;7(10):e1002325.
443. Al Badr W, Al Bader S, Otto E, et al. Exome capture and massively parallel sequencing
identifies a novel HPSE2 mutation in a Saudi Arabian child with Ochoa (urofacial) syndrome. J
Pediatr Urol. 2011 Oct;7(5):569-73.
444. Cullinane AR, Vilboux T, O'Brien K, et al. Homozygosity mapping and whole-exome
sequencing to detect SLC45A2 and G6PC3 mutations in a single patient with oculocutaneous
albinism and neutropenia. J Invest Dermatol. 2011 Oct;131(10):2017-25.
445. Ovunc B, Otto EA, Vega-Warner V, et al. Exome sequencing reveals cubilin mutation as a
single-gene cause of proteinuria. J Am Soc Nephrol. 2011 Oct;22(10):1815-20.
446. Bowne SJ, Humphries MM, Sullivan LS, et al. A dominant mutation in RPE65 identified by
whole-exome sequencing causes retinitis pigmentosa with choroidal involvement. Eur J Hum Genet.
2011 Oct;19(10):1074-81.
447. Kitamura A, Maekawa Y, Uehara H, et al. A mutation in the immunoproteasome subunit PSMB8
causes autoinflammation and lipodystrophy in humans. J Clin Invest. 2011 Oct;121(10):4150-60.
448. Tyynismaa H, Sun R, Ahola-Erkkilä S, et al. Thymidine kinase 2 mutations in autosomal
recessive progressive external ophthalmoplegia with multiple mitochondrial DNA deletions. Hum
Mol Genet. 2012 Jan 1;21(1):66-75.
449. Bjursell MK, Blom HJ, Cayuela JA, et al. Adenosine kinase deficiency disrupts the methionine
cycle and causes hypermethioninemia, encephalopathy, and abnormal liver function. Am J Hum
Genet. 2011 Oct 7;89(4):507-15.
450. Zangen D, Kaufman Y, Zeligson S, et al. XX ovarian dysgenesis is caused by a PSMC3IP/HOP2
mutation that abolishes coactivation of estrogen-driven transcription. Am J Hum Genet. 2011 Oct
7;89(4):572-9.
451. Galmiche L, Serre V, Beinat M, et al. Exome sequencing identifies MRPL3 mutation in
mitochondrial cardiomyopathy. Hum Mutat. 2011 Nov;32(11):1225-31.
452. Bredrup C, Saunier S, Oud MM, et al. Ciliopathies with skeletal anomalies and renal
insufficiency due to mutations in the IFT-A gene WDR19. Am J Hum Genet. 2011 Nov
11;89(5):634-43.
453. Saitsu H, Osaka H, Sasaki M, et al. Mutations in POLR3A and POLR3B encoding RNA
Polymerase III subunits cause an autosomal-recessive hypomyelinating leukoencephalopathy. Am J
Hum Genet. 2011 Nov 11;89(5):644-51.
454. Clayton-Smith J, O'Sullivan J, Daly S, et al. Whole-exome-sequencing identifies mutations in
histone acetyltransferase gene KAT6B in individuals with the Say-Barber-Biesecker variant of Ohdo
syndrome. Am J Hum Genet. 2011 Nov 11;89(5):675-81.
455. Aldahmesh MA, Mohamed JY, Alkuraya HS, et al. Recessive mutations in ELOVL4 cause
ichthyosis, intellectual disability, and spastic quadriplegia. Am J Hum Genet. 2011 Dec 9;89(6):745-
50.
456. Chen WJ, Lin Y, Xiong ZQ, et al. Exome sequencing identifies truncating mutations in PRRT2
that cause paroxysmal kinesigenic dyskinesia. Nat Genet. 2011 Nov 20;43(12):1252-5.
457. Logan CV, Lucke B, Pottinger C, et al. Mutations in MEGF10, a regulator of satellite cell
myogenesis, cause early onset myopathy, areflexia, respiratory distress and dysphagia (EMARDD).
Nat Genet. 2011 Nov 20;43(12):1189-92.
458. Dauber A, Nguyen TT, Sochett E, et al. Genetic defect in CYP24A1, the vitamin D 24-
hydroxylase gene, in a patient with severe infantile hypercalcemia. J Clin Endocrinol Metab. 2012
Feb;97(2):E268-74.
459. Shamseldin HE, Faden MA, Alashram W, et al. Identification of a novel DLX5 mutation in a
family with autosomal recessive split hand and foot malformation. J Med Genet. 2012 Jan;49(1):16-
20.
460. Sergouniotis PI, Davidson AE, Mackay DS, et al. Biallelic mutations in PLA2G5, encoding
group V phospholipase A2, cause benign fleck retina. Am J Hum Genet. 2011 Dec 9;89(6):782-91.
461. Berger I, Ben-Neriah Z, Dor-Wolman T, et al. Early prenatal ventriculomegaly due to an AIFM1
mutation identified by linkage analysis and whole exome sequencing. Mol Genet Metab. 2011
Dec;104(4):517-20.
462. Bhat V, Girimaji SC, Mohan G, et al. Mutations in WDR62, encoding a centrosomal and nuclear
protein, in Indian primary microcephaly families with cortical malformations. Clin Genet. 2011
Dec;80(6):532-40.
463. Wang X, Wang H, Cao M, et al. Whole-exome sequencing identifies ALMS1, IQCB1, CNGA3,
and MYO7A mutations in patients with Leber congenital amaurosis. Hum Mutat. 2011
Dec;32(12):1450-9.
464. Adzhubei IA, Schmidt S, Peshkin L, et al. A method and server for predicting damaging
missense mutations. Nat Methods. 2010 Apr;7(4):248-9.
465. Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on
protein function using the SIFT algorithm. Nat Protoc. 2009;4(7):1073-81.
466. Pollard KS, Hubisz MJ, Rosenbloom KR, et al. Detection of nonneutral substitution rates on
mammalian phylogenies. Genome Res. 2010 Jan;20(1):110-21.
467. Cooper GM, Stone EA, Asimenos G, et al. Distribution and intensity of constraint in mammalian
genomic sequence. Genome Res. 2005 Jul;15(7):901-13.
468. Melton PE, Pankratz N. Joint analyses of disease and correlated quantitative phenotypes using
next-generation sequencing data. Genet Epidemiol. 2011;35 Suppl 1:S67-73.
469. Stitziel NO, Kiezun A, Sunyaev S. Computational and statistical approaches to analyzing variants
identified by exome sequencing. Genome Biol. 2011 Sep 14;12(9):227.
470. Ionita-Laza I, Makarov V, Yoon S, et al. Finding disease variants in Mendelian disorders by
using sequence data: methods and applications. Am J Hum Genet. 2011 Dec 9;89(6):701-12.
471. Sjöblom T, Jones S, Wood LD, et al. The consensus coding sequences of human breast and
colorectal cancers. Science. 2006 Oct 13;314(5797):268-74.
472. Parsons DW, Jones S, Zhang X, et al. An integrated genomic analysis of human glioblastoma
multiforme. Science. 2008 Sep 26;321(5897):1807-12.
473. Ley TJ, Mardis ER, Ding L, et al. DNA sequencing of a cytogenetically normal acute myeloid
leukaemia genome. Nature. 2008 Nov 6;456(7218):66-72.
474. Mardis ER, Ding L, Dooling DJ, et al. Recurring mutations found by sequencing an acute
myeloid leukemia genome. N Engl J Med. 2009 Sep 10;361(11):1058-66.
475. Ley TJ, Ding L, Walter MJ, et al. DNMT3A mutations in acute myeloid leukemia. N Engl J Med.
2010 Dec 16;363(25):2424-33.
476. Shah SP, Morin RD, Khattra J, et al. Mutational evolution in a lobular breast tumour profiled at
single nucleotide resolution. Nature. 2009 Oct 8;461(7265):809-13.
477. Ding L, Ellis MJ, Li S, et al. Genome remodelling in a basal-like breast cancer metastasis and
xenograft. Nature. 2010 Apr 15;464(7291):999-1005.
478. Pleasance ED, Stephens PJ, O'Meara S, et al. A small-cell lung cancer genome with complex
signatures of tobacco exposure. Nature. 2010 Jan 14;463(7278):184-90.
479. Lee W, Jiang Z, Liu J, et al. The mutation spectrum revealed by paired genome sequences from a
lung cancer patient. Nature. 2010 May 27;465(7297):473-7.
480. Harbour JW, Onken MD, Roberson ED, et al. Frequent mutation of BAP1 in metastasizing uveal
melanomas. Science. 2010 Dec 3;330(6009):1410-3.
481. Timmermann B, Kerick M, Roehr C, et al. Somatic mutation profiles of MSI and MSS colorectal
cancer identified by whole exome next generation sequencing and bioinformatics analysis. PLoS One.
2010 Dec 22;5(12):e15661.
482. Chapman MA, Lawrence MS, Keats JJ, et al. Initial genome sequencing and analysis of multiple
myeloma. Nature. 2011 Mar 24;471(7339):467-72.
483. Totoki Y, Tatsuno K, Yamamoto S, et al. High-resolution characterization of a hepatocellular
carcinoma genome. Nat Genet. 2011 May;43(5):464-9.
484. Tiacci E, Trifonov V, Schiavoni G, et al. BRAF mutations in hairy-cell leukemia. N Engl J Med.
2011 Jun 16;364(24):2305-15.
485. Pasqualucci L, Trifonov V, Fabbri G, et al. Analysis of the coding genome of diffuse large B-cell
lymphoma. Nat Genet. 2011 Jul 31;43(9):830-7.
486. Jiao Y, Shi C, Edil BH, et al. DAXX/ATRX, MEN1, and mTOR pathway genes are frequently
altered in pancreatic neuroendocrine tumors. Science. 2011 Mar 4;331(6021):1199-203.
487. Wang K, Kan J, Yuen ST, et al. Exome sequencing identifies frequent mutation of ARID1A in
molecular subtypes of gastric cancer. Nat Genet. 2011 Oct 30;43(12):1219-23.
488. International Cancer Genome Consortium, Hudson TJ, Anderson W, et al. International network
of cancer genome projects. Nature. 2010 Apr 15;464(7291):993-8.
489. Byun M, Abhyankar A, Lelarge V, et al. Whole-exome sequencing-based discovery of STIM1
deficiency in a child with fatal classic Kaposi sarcoma. J Exp Med. 2010 Oct 25;207(11):2307-12.
490. Snape K, Hanks S, Ruark E, et al. Mutations in CEP57 cause mosaic variegated aneuploidy
syndrome. Nat Genet. 2011 Jun;43(6):527-9.
491. Comino-Méndez I, Gracia-Aznárez FJ, Schiavi F, et al. Exome sequencing identifies MAX
mutations as a cause of hereditary pheochromocytoma. Nat Genet. 2011 Jun 19;43(7):663-7.
492. Saarinen S, Aavikko M, Aittomäki K, et al. Exome sequencing reveals germline NPAT mutation
as a candidate risk factor for Hodgkin lymphoma. Blood. 2011 Jul 21;118(3):493-8.
493. Bodmer W, Tomlinson I. Rare genetic variants and the risk of cancer. Curr Opin Genet Dev.
2010 Jun;20(3):262-7.
494. Kote-Jarai Z, Jugurnauth S, Mulholland S, et al. A recurrent truncating germline mutation in the
BRIP1/FANCJ gene and susceptibility to prostate cancer. Br J Cancer. 2009 Jan 27;100(2):426-30.
495. Zhang S, Phelan CM, Zhang P, et al. Frequency of the CHEK2 1100delC mutation among
women with breast cancer: an international study. Cancer Res. 2008 Apr 1;68(7):2154-7.
496. Yokoyama S, Woods SL, Boyle GM, et al. A novel recurrent mutation in MITF predisposes to
familial and sporadic melanoma. Nature. 2011 Nov 13;480(7375):99-103.
497. Park DJ, Odefrey FA, Hammet F, et al. FAN1 variants identified in multiple-case early-onset
breast cancer families via exome sequencing: no evidence for association with risk for breast cancer.
Breast Cancer Res Treat. 2011 Dec;130(3):1043-9.
498. Risch HA, McLaughlin JR, Cole DEC, et al. Population BRCA1 and BRCA2 mutation
frequencies and cancer penetrances: a kin-cohort study in Ontario, Canada. J Natl Cancer Inst
2006;98:1694–706.
499. The Breast Cancer Linkage Consortium. Cancer risks in BRCA2 mutation carriers. J Natl Cancer
Inst 1999;91:1310–1316.
500. van Asperen CJ, Brohet RM, Meijers-Heijboer EJ, et al. Cancer risks in BRCA2 families:
estimates for sites other than breast and ovary. J Med Genet 2005;42:711–719.
501. Couch FJ, Johnson MR, Rabe KG, et al. The prevalence of BRCA2 mutations in familial
pancreatic cancer. Cancer Epidemiol Biomarkers Prev. 2007 Feb;16(2):342-6.
502. Ferrone CR, Levine DA, Tang LH, et al. BRCA germline mutations in Jewish patients with
pancreatic adenocarcinoma. J Clin Oncol. 2009 Jan 20;27(3):433-8.
503. Abbott DW, Freeman ML, Holt JT. Double-strand break repair deficiency and radiation
sensitivity in BRCA2 mutant cancer cells. J Natl Cancer Inst. 1998 Jul 1;90(13):978-85.
504. Goggins M, Hruban RH, Kern SE. BRCA2 is inactivated late in the development of pancreatic
intraepithelial neoplasia: evidence and implications. Am J Pathol. 2000 May;156(5):1767-71.
505. Skoulidis F, Cassidy LD, Pisupati V, et al. Germline Brca2 heterozygosity promotes Kras(G12D)-driven
carcinogenesis in a murine model of familial pancreatic cancer. Cancer Cell. 2010 Nov
16;18(5):499-509.
506. Rowley M, Ohashi A, Mondal G, et al. Inactivation of Brca2 promotes Trp53-associated but
inhibits KrasG12D-dependent pancreatic cancer development in mice. Gastroenterology. 2011
Apr;140(4):1303-1313.e1-3.
507. Feldmann G, Karikari C, dal Molin M, et al. Inactivation of Brca2 cooperates with
Trp53(R172H) to induce invasive pancreatic ductal adenocarcinomas in mice: a mouse model of
familial pancreatic cancer. Cancer Biol Ther. 2011 Jun 1;11(11):959-68.
508. Thompson D, Easton DF, the Breast Cancer Linkage Consortium. Cancer incidence in BRCA1
mutation carriers. J Natl Cancer Inst 2002;94:1358-65.
509. Brose MS, Rebbeck TR, Calzone KA, et al. Cancer risk estimates for BRCA1 mutation carriers
identified in a risk evaluation program. J Natl Cancer Inst 2002;94:1365–72.
510. Beger C, Ramadani M, Meyer S, et al. Down-regulation of BRCA1 in chronic pancreatitis and
sporadic pancreatic adenocarcinoma. Clin Cancer Res. 2004;10:3780-3787.
511. Honrado E, Benitez J, Palacios J. The molecular pathology of hereditary breast cancer: genetic
testing and therapeutic implications. Mod Pathol 2005;18:1305-20.
512. Esteller M, Fraga MF, Guo M, et al. DNA methylation patterns in hereditary human cancers
mimic sporadic tumorigenesis. Hum Mol Genet 2001;10:3001-3007.
513. Gudmundsdottir K, Ashworth A. The roles of BRCA1 and BRCA2 and associated proteins in the
maintenance of genomic stability. Oncogene 2006;25:5864-5874.
514. Struewing JP, Hartge P, Wacholder S, et al. The risk of cancer associated with specific mutations
of BRCA1 and BRCA2 among Ashkenazi Jews. N Engl J Med 1997;336:1401-8.
515. Lynch HT, Deters CA, Snyder CL, et al. BRCA1 and pancreatic cancer: pedigree findings and
their causal relationships. Cancer Genet Cytogenet. 2005;158:119-125.
516. Tonin P, Weber B, Offit K, et al. Frequency of recurrent BRCA1 and BRCA2 mutations in
Ashkenazi Jewish breast cancer families. Nat Med. 1996;2:1179-83.
517. Gruber SB, Petersen GM. Cancer risk in BRCA1 carriers: time for the next generation of
studies. J Natl Cancer Inst 2002;94:144-5.
518. Struewing JP, Abeliovich D, Peretz T, et al. The carrier frequency of the BRCA1 185delAG
mutation is approximately 1 percent in Ashkenazi Jewish individuals. Nat Genet. 1995
Oct;11(2):198-200.
519. Ford D, Easton DF, Peto J. Estimates of the gene frequency of BRCA1 and its contribution to
breast and ovarian cancer incidence. Am J Hum Genet. 1995 Dec;57(6):1457-62.
520. Kim DH, Crawford B, Ziegler J, et al. Prevalence and characteristics of pancreatic cancer in
families with BRCA1 and BRCA2 mutations. Fam Cancer. 2009;8(2):153-8.
521. Hall M, Olopade O. Pancreatic cancer and BRCA mutation in familial breast cancer families.
J Clin Oncol. 2005;23(16S):9550.
522. Ozcelik H, Schmoker B, Di Nicola N, et al. Germline BRCA2 6174delT mutations in Ashkenazi
Jewish pancreatic cancer patients. Nat Genet 1997;16:17-8.
523. Peng DF, Kanai Y, Sawada M, et al. DNA methylation of multiple tumor-related genes in
association with overexpression of DNA methyltransferase 1(DNMT1) during multistage
carcinogenesis of the pancreas. Carcinogenesis 2006;27:1160-8.
524. Saif MW. Controversies in adjuvant treatment of pancreatic adenocarcinoma. JOP 2007;8:545-
552.
525. McCabe N, Turner NC, Lord CJ, et al. Deficiency in the repair of DNA damage by homologous
recombination and sensitivity to poly(ADP-ribose) polymerase inhibition. Cancer Res. 2006 Aug
15;66(16):8109-15.
526. Yun J, Zhong Q, Kwak JY, et al. Hypersensitivity of Brca1-deficient MEF to the DNA
interstrand crosslinking agent mitomycin C is associated with defect in homologous recombination
repair and aberrant S-phase arrest. Oncogene 2006;24:4009-16.
527. Treszezamsky AD, Kachnic LA, Feng Z, et al. BRCA1- and BRCA2-deficient cells are sensitive
to etoposide-induced DNA double-strand breaks via topoisomerase II. Cancer Res 2007;67:7078-81.
528. James E, Waldron-Lynch MG, Saif MW. Prolonged survival in a patient with BRCA2 associated
metastatic pancreatic cancer after exposure to camptothecin: a case report and review of literature.
Anticancer Drugs. 2009 Aug;20(7):634-8.
529. Sonnenblick A, Kadouri L, Appelbaum L, et al. Complete remission, in BRCA2 mutation carrier
with metastatic pancreatic adenocarcinoma, treated with cisplatin based therapy. Cancer Biol Ther.
2011 Aug 1;12(3):165-8.
530. Lowery M, Shah MA, Smyth E, et al. A 67-year-old woman with BRCA 1 mutation associated
with pancreatic adenocarcinoma. J Gastrointest Cancer. 2011 Sep;42(3):160-4.
531. Gu W, Lupski JR. CNV and nervous system diseases--what's new? Cytogenet Genome Res.
2008;123(1-4):54-64.
532. Alaerts M, Del-Favero J. Searching genetic risk factors for schizophrenia and bipolar disorder:
learn from the past and back to the future. Hum Mutat. 2009;30:1139-52.
533. Schaschl H, Aitman TJ, Vyse TJ. Copy number variation in the human genome and its
implication in autoimmunity. Clin Exp Immunol. 2009;156:12-6.
534. Lanktree M, Hegele RA. Copy number variation in metabolic phenotypes. Cytogenet Genome
Res. 2008;123:169-75.
535. Karageorgi S, Prescott J, Wong JY, et al. GSTM1 and GSTT1 copy number variation in
population-based studies of endometrial cancer risk. Cancer Epidemiol Biomarkers Prev. 2011
Jul;20(7):1447-52.
536. Engert S, Wappenschmidt B, Betz B, et al. MLPA screening in the BRCA1 gene from 1,506
German hereditary breast cancer cases: novel deletions, frequent involvement of exon 17, and
occurrence in single early-onset cases. Hum Mutat. 2008;29:948-58.
537. Madlensky L, Berk TC, Bapat BV, et al. A preventive registry for hereditary nonpolyposis
colorectal cancer. Can J Oncol. 1995;5:355-60.
538. Cotterchio M, Manno M, Klar N, et al. Colorectal screening is associated with reduced colorectal
cancer risk: a case-control study within the population-based Ontario Familial Colorectal Cancer
Registry. Cancer Causes Control. 2005;16:865-75.
539. Stewart AF, Dandona S, Chen L, et al. Kinesin family member 6 variant Trp719Arg does not
associate with angiographically defined coronary artery disease in the Ottawa Heart Genomics Study.
J Am Coll Cardiol. 2009;53:1471-2.
540. Krawczak M, Nikolaus S, von Eberstein H, et al. PopGen: population-based recruitment of
patients and controls for the analysis of complex genotype-phenotype relationships. Community
Genet. 2006;9:55-61.
541. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus
genotype data. Genetics. 2000;155:945-59.
542. Li C, Hung Wong W. Model-based analysis of oligonucleotide arrays: model validation, design
issues and standard error application. Genome Biol. 2001;2(8):RESEARCH0032.
543. Nannya Y, Sanada M, Nakazaki K, et al. A robust algorithm for copy number detection using
high-density oligonucleotide single nucleotide polymorphism genotyping arrays. Cancer Res.
2005;65:6071-9.
544. Korn JM, Kuruvilla FG, McCarroll SA, et al. Integrated genotype calling and association analysis
of SNPs, common copy number polymorphisms and rare CNVs. Nat Genet. 2008 Oct;40(10):1253-60.
545. Pinto D, Pagnamenta AT, Klei L, et al. Functional impact of global rare copy number variation in
autism spectrum disorders. Nature. 2010 Jul 15;466(7304):368-72.
546. Schmittgen TD, Livak KJ. Analyzing real-time PCR data by the comparative C(T) method. Nat
Protoc. 2008;3:1101-8.
547. Zhang J, Feuk L, Duggan GE, et al. Development of bioinformatics resources for display and
analysis of copy number and other structural variants in the human genome. Cytogenet Genome Res.
2006;115:205-14.
548. Higgins ME, Claremont M, Major JE, et al. CancerGenes: a gene selection resource for cancer
genome projects. Nucleic Acids Res. 2007;35(Database issue):D721-6.
549. Shepherd R, Forbes SA, Beare D, et al. Data mining using the Catalogue of Somatic Mutations in
Cancer BioMart. Database (Oxford) 2011:bar018. Print 2011.
550. Jin Q, Gao G, Mulder KM. Requirement of a dynein light chain in TGFbeta/Smad3 signaling. J
Cell Physiol. 2009 Dec;221(3):707-15.
551. Jiang J, Yu L, Huang X, et al. Identification of two novel human dynein light chain genes,
DNLC2A and DNLC2B, and their expression changes in hepatocellular carcinoma tissues from 68
Chinese patients. Gene. 2001;281:103-13.
552. Malinda KM, Kleinman HK. The laminins. Int J Biochem Cell Biol. 1996 Sep;28(9):957-9.
553. Kim YH, Lee HC, Kim SY, et al. Epigenomic analysis of aberrantly methylated genes in
colorectal cancer identifies genes commonly affected by epigenetic alterations. Ann Surg Oncol.
2011;18:2338-47.
554. Scrideli CA, Carlotti CG Jr, Okamoto OK, et al. Gene expression profile analysis of primary
glioblastomas and non-neoplastic brain tissue: identification of potential target genes by
oligonucleotide microarray and real-time quantitative PCR. J Neurooncol. 2008;88:281-91.
555. Pinto D, Darvishi K, Shi X, et al. Comprehensive assessment of array-based platforms and calling
algorithms for detection of copy number variants. Nat Biotechnol. 2011;29:512-20.
556. Wang H, Linghu H, Wang J, et al. The role of Crk/Dock180/Rac1 pathway in the malignant
behavior of human ovarian cancer cell SKOV3. Tumour Biol. 2010;31:59-67.
557. Sanders MA, Ampasala D, Basson MD. DOCK5 and DOCK1 regulate Caco-2 intestinal
epithelial cell spreading and migration on collagen IV. J Biol Chem. 2009;284:27-35.
558. Buchholz M, Braun M, Heidenblut A, et al. Transcriptome analysis of microdissected pancreatic
intraepithelial neoplastic lesions. Oncogene. 2005;24:6626-36.
559. Gu W, Zhang F, Lupski JR. Mechanisms for human genomic rearrangements. Pathogenetics.
2008 Nov 3;1(1):4.
560. Pruitt KD, Tatusova T, Brown GR, et al. NCBI Reference Sequences (RefSeq): current status,
new features and genome annotation policy. Nucleic Acids Res. 2012 Jan;40(Database issue):D130-5.
561. Griffiths-Jones S. miRBase: microRNA sequences and annotation. Curr Protoc Bioinformatics.
2010 Mar;Chapter 12:Unit 12.9.1-10.
562. Hercus C. 2009 [last accessed date November, 2009]. www.novocraft.com.
563. McKenna A, Hanna M, Banks E, et al. The Genome Analysis Toolkit: a MapReduce framework
for analyzing next-generation DNA sequencing data. Genome Res. 2010 Sep;20(9):1297-303.
564. Affymetrix. BRLMM: An improved genotype calling method for the GeneChip® Mapping 500K
Array Set. http://affymetrix.com/support/technical/whitepapers/brlmm_whitepaper.pdf
565. Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on
protein function using the SIFT algorithm. Nat Protoc. 2009;4(7):1073-81.
566. Exome Variant Server, NHLBI Exome Sequencing Project (ESP), Seattle, WA (URL:
http://evs.gs.washington.edu/EVS/) [last accessed Dec 2011].
567. Ahel I, Ahel D, Matsusaka T, et al. Poly(ADP-ribose)-binding zinc finger motifs in DNA
repair/checkpoint proteins. Nature. 2008 Jan 3;451(7174):81-5.
568. Macrae CJ, McCulloch RD, Ylanko J, et al. APLF (C2orf13) facilitates nonhomologous end-joining
and undergoes ATM-dependent hyperphosphorylation following ionizing radiation. DNA
Repair (Amst). 2008 Feb 1;7(2):292-302.
569. Allen NP, Donninger H, Vos MD, et al. RASSF6 is a novel member of the RASSF family of
tumor suppressors. Oncogene. 2007 Sep 13;26(42):6203-11.
570. Ou YY, Mack GJ, Zhang M, et al. CEP110 and ninein are located in a specific domain of the
centrosome associated with centrosome maturation. J Cell Sci. 2002 May 1;115(Pt 9):1825-35.
571. Carrara S, Cangi MG, Arcidiacono PG, et al. Mucin expression pattern in pancreatic diseases:
findings from EUS-guided fine-needle aspiration biopsies. Am J Gastroenterol. 2011
Jul;106(7):1359-63.
572. Fletcher O, Houlston RS. Architecture of inherited susceptibility to common cancer. Nat Rev
Cancer. 2010 May;10(5):353-61.
573. Mehrotra PV, Ahel D, Ryan DP, et al. DNA repair factor APLF is a histone chaperone. Mol Cell.
2011 Jan 7;41(1):46-55.
574. Okada S, Tokunaga E, Kitao H, et al. Loss of Heterozygosity at BRCA1 Locus Is Significantly
Associated with Aggressiveness and Poor Prognosis in Breast Cancer. Ann Surg Oncol. 2011 Dec 17.
[Epub ahead of print]
575. Lane DP. Cancer. p53, guardian of the genome. Nature. 1992 Jul 2;358(6381):15-6.
576. Chen XR, Zhang WZ, Lin XQ, et al. Genetic instability of BRCA1 gene at locus D17S855 is
related to clinicopathological behaviors of gastric cancer from Chinese population. World J
Gastroenterol. 2006 Jul 14;12(26):4246-9.
577. Pestonjamasp PH, Mittra I. Analysis of BRCA1 involvement in breast cancer in Indian women.
J Biosci. 2000 Mar;25(1):19-23.
578. Garcia-Patiño E, Gomendio B, Lleonart M, et al. Loss of heterozygosity in the region including
the BRCA1 gene on 17q in colon cancer. Cancer Genet Cytogenet. 1998 Jul 15;104(2):119-23.
Appendices
1. Appendix Tables
Table S1: Primers for BRCA1 microsatellite markers
Microsatellite marker | Number of repeats | Expected average amplicon size (bp) | Primer sequences | Annealing temp (°C)
D17S855 | dinucleotide | 151 | F: GGA TGG CCT TTT AGA AAG TGG; R: ACA CAG ACT TGT CCT ACT GCC | 60
D17S1322 | trinucleotide | 130 | F: CTA GCC TGG GCA ACA AAC GA; R: GCA GGA AGC AGG AAT GGA AC | 57
D17S579 | dinucleotide | 123 | F: AGT CCT GTA GAC AAA ACC TG; R: CAG TTT CAT ACC AAG TTC CT | 57
D16S2616 | trinucleotide | 125 | F: TGT GAT TCA GTA GGT CTT GGG; R: GTG ACT AAA CCT GAC ATT GTG C | 62
Table S2: BRCA1 mutations sequencing primers
Mutation | Expected amplicon size (bp) | Primer sequences | Annealing temp (°C)
5382insC | 109 | F: CAG AGG AGA TGT GGT CAA TG; R: GGG GTG AGA TTT TTG TCA AC | 55
185delAG | 91 | F: CGT TGA AGA AGT ACA AAA TGT C; R: CCC AAA TTA ATA CAC TCT TGT G | 59
2318delG | 103 | F: CTA AGT GTT CAA ATA CCA GTG; R: GCA TTA TTA GAC ACT TTA ACT G | 55
Table S3: FPC cases in CNV study
(Table available as excel sheet on attached CD)
Table S4: Controls (OFCCR and FGICR) in CNV study
(Table available as excel sheet on attached CD)
Table S5: Primers for qPCR validation of CNVs
CNV ID | F primer | R primer
D_180 | GGAGGACATGGAATTGATGG | CTGCAAGCAAAGATCACCAA
D_19 | GTAGCAGAGTGGGCCAAAAA | GGGAAAAATTCACCCCTGAT
D_128 | GCAGAATGAAATTTGGCACA | AAGCCACCACTGAGGTTCAC
D_152 | CCAGAGAGGATGGTGAGAGG | GCTTTGGGACTGACTGCTTC
D_234 (primer A) | AAGGAGGCTGAGTGGCTACA | CCTTGAAGACCTGGCTTCTG
D_234 (primer B) | AGGGAAGAACACCTCCACCT | ATCCCTCTTCCTTGCTCCAT
D_143 (primer A) | TGCTCCATGGTGCTGATTTA | CACACATCACTGCCCTTCAC
D_143 (primer B) | TCTGTTCCTATTCGGCCATC | TTCTCCCAAACTCCACAAGC
D_220 | GCTCCAAGATCCGTTCTGAG | TCATTTGACGCATGACCCTA
D_30 & D_36 (same region in two samples) | TACAGGCAACCCCAGGTATC | CACCCAGCCATGTTTTCTTT
D_40 | AAAGAGGCCAACAGGAAACC | TCTGAGAAAGCGTAGACATTTCC
D_105 (primer A) | TTTCTAGCTGGGCTCTCCAA | CCAGCAATGGTAGGGTGAGT
D_105 (primer B) | CTGGCTTTTGTGGATGGTTT | TGCATGCTTGAATCTCCTTG
D_83 | ACAGCCAAGGGTGAAACATC | CTGTGAACCTGGGTGAACCT
D_48 | CACTGGATTGGAGACCAGAA | TTGGAAGAACTCGGCTTGAT
D_125 | ACGGATTCCTCAACACTTGC | CTGTCCTGGCTACTGCATCA
D_134 | GCATCCTTGCACTACCCATT | GGGGGAAAGTGCTGTGTAAA
D_142 (primer A) | CTACCTACTGGGCACCCAAA | TTGATGTTGAAATGGGCTGA
D_142 (primer B) | TGGTGATACCCACTGCTGAA | CCAGCTTGCTTTCTTTGTCC
D_56 | GCAGATTTCAGGTGTGCTGA | AAAGACACCCTGGCAGAGAA
G_225 | TGCCTTGGCTCCACTTCTAT | GTCCAGCTCCACAAGAGAGG
G_226 | TGTGCCAGTGGACTCTGAAC | TTTGTTGACCACTCCCTTCC
G_365 (primer A) | TCCCAACCATATCACCCAGT | AAAACCAACCAAGGCATCAG
G_365 (primer B) | TGCCTGCTGCTTAAAAAGGT | ATATCAACGACTGCCCTTGG
G_369 | GGGGCAGCTGTAAATACCAA | CCCCAGGTCATAGACCAGAA
G_380 | GGCAGGTAGACATGACAGCA | CCATCTCAGCTCCAGTCACA
G_407 | TGCCCCCAAAATGAATGTAT | CAAAAGTGTTGGCTGCTGAA
G_603/604 | TAGGCCTTGGATGGAAATTG | GTGATGAGGGGGTGAAGAGA
G_69 | TGGGAACCCCTGCTATAGTG | TGCTCGCTTTGAATTTGATG
G_88 | AGGTCAGCGCTCCTCAATAA | TGCCCCTGTGCATACAAATA
G_97 (primer A) | CAGCTCTCCAGGTCATCCAT | GAGTTCACCAGGTGGGAAAA
G_97 (primer B) | AGAACCGAGTGGAAAGAGCA | TGAGGCCCAAAGATGGTAAC
Table S6: Primers for qPCR breakpoint mapping of TGFBR3-transecting duplication
CNV ID | F primer | R primer
T_Out_1 | CCAAGGCCTCTGGACTAGGT | AGACTTGGAGCCCTAGGACAA
T_Out_2 | TCACTTGGCTTCATGAAAAGG | AAATAGCCCCAGATGTGTGC
T_Out_3 | AGCCAAGAGCTGTGTTTGTGT | AAATGCAATCAAGGCAGCTT
T_Out_4 | GGCCTCTAGCCCGAAATAAC | GACTGCAAAATGGGTGTGG
O_In_2 | CTTGTGGTTTTGCCTGGAAT | ACCACTGTGCAGCTCCTGA
O_Out_1 | CCAGTTTGGAATGCAATGAA | ACTCTCAGTTGTGGCTTGGAG
O_Out_5 | ACAAATTGCTGTTTCTTTCTACAGC | TTACCTGCGAGCTACTGAATATAGG
Sequencing Primers | CTGGTAGACAGTTGGGGTTTC | ACATCTCTGGTGCCCTTTG
Table S7: High- and low-confidence losses on Affy500K array in FPC cases
(Table available as excel sheet on attached CD)
Table S8: High- and low-confidence gains on Affy500K array in FPC cases
(Table available as excel sheet on attached CD)
Table S9: High- and low-confidence losses on Affy500K array in controls
(Table available as excel sheet on attached CD)
Table S10: High- and low-confidence gains on Affy500K array in controls
(Table available as excel sheet on attached CD)
Table S11: High-confidence CNVs on Affy 6.0 array in FPC cases
(Table available as excel sheet on attached CD)
Table S12: High-confidence CNVs on Affy 6.0 array in controls
(Table available as excel sheet on attached CD)
2. Appendix Figures
One outlier was excluded from each set of sample results if its value fell outside the range of
mean +/- 2*SD (for this purpose, the SD and range were calculated after removing the value in question).
Fold difference was calculated relative to the average dCt of the control samples (i.e. the ddCt for each
sample is dCt(sample) - dCt(control average)).
Error bars = 2*SD of fold difference.
For all figures, the sample with an “Id_” identifier is the FPC case containing the CNV; samples with
“RD-” identifiers are controls.
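The notes above on processing the qPCR results can be illustrated with a minimal sketch. It is not the analysis code used for these figures, and several details are assumptions: the function names are hypothetical, the outlier rule is applied to dCt values, the single most extreme value is the one removed when more than one qualifies, and ddCt is converted to a fold difference as 2^(-ddCt), the standard comparative C(T) conversion, which assumes roughly 100% amplification efficiency.

```python
from statistics import mean, stdev


def drop_one_outlier(values):
    """Remove at most one value lying outside mean +/- 2*SD,
    where mean and SD are computed WITHOUT the value being tested."""
    values = list(values)
    worst_index, worst_excess = None, 0.0
    for i, v in enumerate(values):
        rest = values[:i] + values[i + 1:]
        if len(rest) < 2:  # SD needs at least two remaining values
            continue
        excess = abs(v - mean(rest)) - 2 * stdev(rest)
        # Assumption: if several values qualify, drop the most extreme one.
        if excess > worst_excess:
            worst_index, worst_excess = i, excess
    if worst_index is None:
        return values
    return values[:worst_index] + values[worst_index + 1:]


def fold_difference(sample_dct, control_dcts):
    """ddCt = dCt(sample) - average dCt(controls); fold = 2 ** (-ddCt)."""
    ddct = sample_dct - mean(control_dcts)
    return 2 ** (-ddct)


def summarize(sample_dcts, control_dcts):
    """Mean fold difference and error bar (2*SD of fold difference).
    Needs at least two replicates left after outlier removal."""
    kept = drop_one_outlier(sample_dcts)
    folds = [fold_difference(d, control_dcts) for d in kept]
    return mean(folds), 2 * stdev(folds)


if __name__ == "__main__":
    controls = [5.0, 5.1, 4.9, 5.0]        # hypothetical control dCt values
    case = [6.0, 6.1, 5.9, 9.0]            # hypothetical replicates; 9.0 is an outlier
    print(summarize(case, controls))
```

Under these assumptions a heterozygous deletion (one copy instead of two) gives a ddCt near +1 and a fold difference near 0.5, while a single-copy gain (three copies) gives a fold difference near 1.5.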
Figure S1 – qPCR of region D_180
Figure S2 – qPCR of region D_19
Figure S3 – qPCR of region D_128
Figure S4 – qPCR of region D_152
Figure S5 – qPCR of region D_234 (primer A)
Figure S6 – qPCR of region D_234 (primer B)
Figure S7 – qPCR of region D_143 (primer A)
Figure S8 – qPCR of region D_143 (primer B)
Figure S9 – qPCR of region D_220
Figure S10 – qPCR of region D_30 & D_36
Figure S11 – qPCR of region D_40
Figure S12 – qPCR of region D_105 (primer A)
Figure S13 – qPCR of region D_105 (primer B)
Figure S14 – qPCR of region D_83
Figure S15 – qPCR of region D_48
Figure S16 – qPCR of region D_125
Figure S17 – qPCR of region D_134
Figure S18 – qPCR of region D_142 (primer A)
Figure S19 – qPCR of region D_142 (primer B)
Figure S20 – qPCR of region D_56
Figure S21 – qPCR of region G_225
Figure S22 – qPCR of region G_226
Figure S23 – qPCR of region G_365 (primer A)
Figure S24 – qPCR of region G_365 (primer B)
Figure S25 – qPCR of region G_369
Figure S26 – qPCR of region G_380
Figure S27 – qPCR of region G_407
Figure S28 – qPCR of region G_603/604
Figure S29 – qPCR of region G_69
Figure S30 – qPCR of region G_88
Figure S31 – qPCR of region G_97 (primer A) – ID_27
Figure S32 – qPCR of region G_97 (primer B) – ID_27
Figure S33 - Region G_97 (primer A) – qPCR in ID_203 and family members
Figure S34 - Region G_97 (primer A) – qPCR in ID_203’s family members
Figure S35 - Region G_97 (primer A) – qPCR in ID_203 and family members
Figure S36 - Region G_97 (primer A) – qPCR in ID_203’s family members
Figure S37 - Region G_97 (primer A) – qPCR in ID_203’s family members
Figure S38 - Region G_97 (primer B) – qPCR in ID_203 and family members
Figure S39 – “T_Out_1” – qPCR fine-mapping G_97 breakpoint in Id_203
Figure S40 – “T_Out_2” – qPCR fine-mapping G_97 breakpoint in Id_203
Figure S41 – “T_Out_3” – qPCR fine-mapping G_97 breakpoint in Id_203
Figure S42 – “T_Out_4” – qPCR fine-mapping G_97 breakpoint in Id_203
Figure S43 – “O_In_2” – qPCR fine-mapping G_97 breakpoint in Id_203
Figure S44 – “O_Out_1” – qPCR fine-mapping G_97 breakpoint in Id_203
Figure S45 – “O_Out_5” – qPCR fine-mapping G_97 breakpoint in Id_203