Identifying Susceptibility Genes for Familial Pancreatic Cancer Using Novel High-Resolution Genome

Interrogation Platforms

by

Wigdan Ridha Al-Sukhni

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

Institute of Medical Science University of Toronto

© Copyright by Wigdan Ridha Al-Sukhni 2012

Identifying Susceptibility Genes for Familial Pancreatic Cancer Using

Novel High-Resolution Genome Interrogation Platforms

Wigdan Ridha Al-Sukhni

Doctor of Philosophy

Institute of Medical Science

University of Toronto

2012

Abstract

Familial Pancreatic Cancer (FPC) is a cancer syndrome characterized by clustering of pancreatic cancer in

families, but most FPC cases do not have a known genetic etiology. Understanding genetic predisposition

to pancreatic cancer is important for improving screening as well as treatment. The central aim of this

thesis is to identify candidate susceptibility genes for FPC, and I used three approaches of increasing

resolution. First, based on a candidate-gene approach, I hypothesized that BRCA1 is inactivated by loss of

heterozygosity (LOH) in pancreatic adenocarcinoma of germline mutation carriers. I demonstrated that 5/7

pancreatic tumors from BRCA1-mutation carriers show LOH, compared to only 1/9 sporadic tumors,

suggesting that BRCA1 inactivation is involved in tumorigenesis in germline mutation carriers. Second, I

hypothesized that the germline genomes of FPC subjects differ in copy-number profile from healthy

genomes, and that regions affected by rare deletions or duplications in FPC subjects overlap candidate

tumor-suppressors or oncogenes. I found no significant difference in the global copy-number profile of

FPC and control genomes, but I identified 93 copy-number variable genomic regions unique to FPC

subjects, overlapping 88 genes of which several have functional roles in cancer development. I

characterized one duplication by sequencing its breakpoints, but found that this duplication did not

segregate with disease in the affected family. Third, I hypothesized that in a family with multiple

pancreatic cancer patients, genes containing rare variants shared by the affected members constitute

susceptibility genes. Using next-generation sequencing to capture most bases in coding regions of the

genome, I interrogated the germline exome of three relatives who died of pancreatic cancer and a relative

who is healthy at advanced age. I identified a short-list of nine candidate genes with unreported

mutations shared by the three affected relatives and absent in the unaffected relative, of which a few had

functional relevance to tumorigenesis. I performed Sanger sequencing to screen an unrelated cohort of

approximately 70 FPC patients for mutations in the top two candidate genes, but I found no additional

rare variants in those genes. In conclusion, I present a list of candidate FPC susceptibility genes for

further validation and investigation in future studies.
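The abstract reports LOH in 5/7 tumors from BRCA1-mutation carriers versus 1/9 sporadic tumors but does not name the statistical test behind the comparison. As an illustration only, and assuming a two-sided Fisher's exact test on the corresponding 2x2 table, the exact p-value can be computed from hypergeometric counts with the Python standard library. The function name `fisher_exact_two_sided` is hypothetical and does not come from the thesis.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> Fraction:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups; columns are "outcome present" and "outcome absent".
    With all margins fixed, the top-left count follows a hypergeometric
    distribution. The p-value sums the probabilities of every table that
    is no more likely than the observed one.
    """
    row1 = a + b                      # size of group 1
    col1 = a + c                      # total number with the outcome
    n = a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Integer weights C(col1, k) * C(n - col1, row1 - k) are proportional to
    # the hypergeometric probabilities and sum to C(n, row1).
    weights = {k: comb(col1, k) * comb(n - col1, row1 - k) for k in range(lo, hi + 1)}
    observed = weights[a]
    as_or_more_extreme = sum(w for w in weights.values() if w <= observed)
    return Fraction(as_or_more_extreme, comb(n, row1))


# BRCA1 carriers: 5 of 7 tumors with LOH; sporadic cases: 1 of 9 tumors with LOH.
p = fisher_exact_two_sided(5, 2, 1, 8)
print(p, float(p))
```

With these counts the sketch returns p = 5/143 (about 0.035). The sketch sums integer table weights instead of floating-point probabilities, so deciding which tables are "as or more extreme" than the observed one is not affected by rounding.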
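The family-based exome strategy described above keeps variants that are shared by all affected relatives, absent in the unaffected relative, and previously unreported. The following sketch shows that filtering logic using Python set operations. It assumes each subject's calls have already been reduced to hashable (chromosome, position, ref, alt) tuples and that "unreported" means absent from a catalogue such as dbSNP. The function name and all coordinates are hypothetical, and the thesis's own filtration models (#1-4) are not reproduced. A real pipeline would also normalize indel representation and apply quality filters before comparing call sets.

```python
# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def shared_candidates(
    affected: dict[str, set[Variant]],
    unaffected: dict[str, set[Variant]],
    known: set[Variant],
) -> set[Variant]:
    """Return variants carried by every affected relative, absent from every
    unaffected relative, and not present in a catalogue of known variants."""
    if not affected:
        raise ValueError("need at least one affected subject")
    shared = set.intersection(*affected.values())   # present in all affected subjects
    for calls in unaffected.values():
        shared -= calls                              # drop variants seen in unaffected relatives
    return shared - known                            # keep only unreported variants


# Toy data with invented coordinates, for illustration only.
affected = {
    "affected_1": {("chr1", 1000, "A", "G"), ("chr2", 2000, "C", "T"),
                   ("chr6", 6000, "C", "G"), ("chr3", 3000, "G", "A")},
    "affected_2": {("chr1", 1000, "A", "G"), ("chr2", 2000, "C", "T"),
                   ("chr6", 6000, "C", "G"), ("chr4", 4000, "T", "C")},
    "affected_3": {("chr1", 1000, "A", "G"), ("chr2", 2000, "C", "T"),
                   ("chr6", 6000, "C", "G"), ("chr5", 5000, "G", "T")},
}
unaffected = {"unaffected_1": {("chr2", 2000, "C", "T")}}
known = {("chr6", 6000, "C", "G")}  # stands in for a dbSNP lookup

print(shared_candidates(affected, unaffected, known))
# {('chr1', 1000, 'A', 'G')}
```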

Acknowledgments

My research would not have been possible without the contribution of the following individuals:

A. Borgida, S. Holter, H. Rothenmund, and K. Smith at Ontario Pancreas Cancer Study and Ontario

Familial Gastrointestinal Cancer Registry for patient recruitment and selection. T. Selander of Samuel

Lunenfeld Research Institute Biospecimen Repository for DNA extraction. S. Joe (Gallinger Lab) for

script-writing; N. Zwingerman, A. Gropper, and S. Moore (Gallinger Lab) for assistance with qPCR; A.

Lionel (Scherer Lab) for computational analysis of Affy6.0 data on Birdsuite and iPattern; Q. Trinh

(McPherson Lab) for computational analysis of exome data; R. Grant (Gallinger Lab) for assistance with

exome data interpretation; H. Kim and T. McPherson (Gallinger Lab) for assistance with PCR and Sanger

validation of exome variants; K. Hay, J. Keating, and S. Levitt (Gallinger Lab) for administrative support;

J. McPherson (Ontario Institute for Cancer Research) for exome sequencing data; and C. Marshall, D.

Pinto, D. Merico (The Centre for Applied Genomics), A. Shlien and D. Malkin (Malkin Lab) for their

advice on my data analysis and manuscript preparations.

My sincere gratitude to the Pancreatic Cancer Genetic Epidemiology Consortium (PACGENE) (PI: G.

Petersen, Mayo Clinic) for being an invaluable source of DNA samples and insight into pancreatic cancer

genetics.

I am very grateful to my Program Advisory Committee (Gary Bader, Steven Narod, Stephen Scherer) for

their insightful feedback and advice throughout the five years of my PhD. In particular, their thoughtful

review of my manuscripts and thesis was most helpful and deeply appreciated.

To my supervisor, Steve Gallinger – I cannot adequately thank you in this crowded page for all that your

mentorship has meant to me since I first met you seven years ago. You pushed me when I needed

pushing and supported me when I was afraid of falling. You listened patiently to my complaints. You

cared about my success. I will always appreciate your open-mindedness, your integrity, and your

compassion. I feel most fortunate that I am able to call you my mentor and friend. Thank you for

everything.

A special thank you to M. Crump for helping me maneuver around some unexpected bumps in the road of

my PhD, and for exemplifying the compassionate clinician.

I dedicate this thesis to my beautiful family:

To Mama and Baba – Your love for me has been the greatest gift and blessing in my life; it is the reason

for who I am today. Thank you for supporting my aspirations even when you did not always understand

where they were taking me.

To Eisar, Mayce, Mohammed, and Bann – Thank you for putting up with me in my worst days… I am

proud of you all.

To my aunts, uncles, and cousins in Iraq and elsewhere – Thank you for keeping me alive in your hearts

despite the long years and oceans separating us. You inspire me.

I am grateful for the financial support received from the CIHR Vanier Doctoral Research Award,

Lustgarten grant, Invest-in-Research grant from Princess Margaret Hospital, Canadian Society for

Surgical Oncology grant, Johnson & Johnson research award, American HepatoPancreaticoBiliary

Association grant, and the Department of Surgery at the University of Toronto.

Table of Contents

Abstract

Acknowledgments

List of Tables

List of Figures

List of Appendices

Abbreviations

Chapter 1 Literature Review

1. Pancreatic Cancer

2. Copy Number Variation

3. Whole-Exome Sequencing

Chapter 2 Loss of Heterozygosity at BRCA1 Locus in Pancreatic Adenocarcinoma

1. Abstract

2. Introduction

3. Materials & Methods

4. Results

5. Discussion

Chapter 3 Germline Genomic Copy Number Variation in Familial Pancreatic Cancer

1. Abstract

2. Introduction

3. Materials & Methods

4. Results

5. Discussion

Chapter 4 Exome Sequencing in a Familial Pancreatic Cancer Kindred

1. Abstract

2. Introduction

3. Materials & Methods

4. Results

5. Discussion

Chapter 5 General Discussion, Conclusions, and Future Directions

References

Appendices

List of Tables

Table 1 Studies estimating risk of pancreatic adenocarcinoma in relatives of affected patients

Table 2 Summary of published studies reporting germline genomic copy-number variation in non-

disease samples

Table 3 Studies using exome-sequencing to identify genetic cause of disease

Table 4 Characteristics of BRCA1 mutation carriers and sporadic pancreatic cancer patients

Table 5 Pedigree summary for BRCA1 mutation carriers

Table 6 LOH results for BRCA1 mutation carriers and sporadic pancreatic cancer cases

Table 7 Proportion of high-confidence losses in cases and controls

Table 8 Proportion of high-confidence gains in cases and controls

Table 9 CNVs called by each of Birdsuite and iPattern in 36 samples on Affymetrix 6.0 array

Table 10 High confidence CNV profile of cases vs. controls (excluding EBV-derived samples and

excluding controls with data from only one chip)

Table 11 FPC specific CNVs

Table 12 Genes whose coding regions are affected by FPC-specific CNVs

Table 13 Summary of raw sequence data from Illumina GAII for each subject

Table 14 Sanger validation data for selected SNVs in each exome subject

Table 15 Sanger validation data for selected indels in each exome subject

Table 16 Number of variants identified in each exome subject

Table 17 Genes containing variants identified by filtration model #1, 2, 3, and/or 4

Table 18 Additional candidate variants in untranslated regions shared by exome subjects

List of Figures

Figure 1 Location of BRCA1 microsatellite markers on chromosome 17

Figure 2 Sample electropherogram of microsatellite marker fragment analysis

Figure 3 Three representative matched-pair electropherograms for microsatellite LOH

Figure 4 Representative sequencing result for an individual with 5382insC germline BRCA1 mutation

Figure 5 Analysis of 500K arrays in FPC cases and controls

Figure 6 Criteria for merging CNVs

Figure 7 CNV prioritization plan

Figure 8 Gains and losses identified in FPC cases by each algorithm/chip

Figure 9 Gains and losses identified in controls by each algorithm/chip

Figure 10 Duplications overlapping TGFBR3 gene

Figure 11 Pedigree of case ID-203, indicating results of qPCR testing for duplication G_97

Figure 12 Fine-mapping the breakpoint of duplication overlapping TGFBR3 using qPCR walk-along

method

Figure 13 PCR gel demonstrating amplification of ~1.5-2kb fragment containing G_97 duplication

breakpoint in case Id_203

Figure 14 G_97 duplication breakpoint mapping by Sanger sequencing

Figure 15 PCR gel illustrating amplification of test regions and duplication breakpoint in case Id-203 and

affected sister

Figure 16 FPC-specific losses and gains on autosomal chromosomes

Figure 17 Pedigree of FPC kindred investigated by exome sequencing

Figure 18 Average coverage of bases in target region of exome per subject

Figure 19 Read-depth per base in target region of exome in each subject

Figure 20 Genome-wide distribution of all SNVs identified in each exome subject

Figure 21 Genome-wide distribution of SNVs excluding synonymous variants in each exome subject

Figure 22 Genome-wide distribution of SNVs not reported in dbSNP131 in each exome subject

List of Appendices

Table S1 Primers for BRCA1 microsatellite markers

Table S2 BRCA1 mutations sequencing primers

Table S3 FPC cases in CNV study

Table S4 Controls (OFCCR and FGICR) in CNV study

Table S5 Primers for qPCR validation of CNVs

Table S6 Primers for qPCR breakpoint mapping of TGFBR3-transecting duplication

Table S7 High- and low-confidence losses on Affy500K array in FPC cases

Table S8 High- and low-confidence gains on Affy500K array in FPC cases

Table S9 High- and low-confidence losses on Affy500K array in controls

Table S10 High- and low-confidence gains on Affy500K array in controls

Table S11 High-confidence CNVs on Affy 6.0 array in FPC cases

Table S12 High-confidence CNVs on Affy 6.0 array in controls

Figure S1 qPCR of region D_180

Figure S2 qPCR of region D_19

Figure S3 qPCR of region D_128

Figure S4 qPCR of region D_152

Figure S5 qPCR of region D_234 (primer A)

Figure S6 qPCR of region D_234 (primer B)

Figure S7 qPCR of region D_143 (primer A)

Figure S8 qPCR of region D_143 (primer B)

Figure S9 qPCR of region D_220

Figure S10 qPCR of region D_30 & D_36

Figure S11 qPCR of region D_40

Figure S12 qPCR of region D_105 (primer A)

Figure S13 qPCR of region D_105 (primer B)

Figure S14 qPCR of region D_83

Figure S15 qPCR of region D_48

Figure S16 qPCR of region D_125

Figure S17 qPCR of region D_134

Figure S18 qPCR of region D_142 (primer A)

Figure S19 qPCR of region D_142 (primer B)

Figure S20 qPCR of region D_56

Figure S21 qPCR of region G_225

Figure S22 qPCR of region G_226

Figure S23 qPCR of region G_365 (primer A)

Figure S24 qPCR of region G_365 (primer B)

Figure S25 qPCR of region G_369

Figure S26 qPCR of region G_380

Figure S27 qPCR of region G_407

Figure S28 qPCR of region G_603/604

Figure S29 qPCR of region G_69

Figure S30 qPCR of region G_88

Figure S31 Region: G_97 (primer A) – ID_27

Figure S32 Region: G_97 (primer B) – ID_27

Figure S33 Region: G_97 (primer A) – ID_203 and family members

Figure S34 Region: G_97 (primer A) – ID_203’s family members

Figure S35 Region: G_97 (primer A) – ID_203 and family members

Figure S36 Region: G_97 (primer A) – ID_203’s family members

Figure S37 Region: G_97 (primer A) – ID_203’s family members

Figure S38 Region: G_97 (primer B) – ID_203 and family members

Figure S39 “T_Out_1” – Fine-mapping G_97 breakpoint in Id_203

Figure S40 “T_Out_2” – Fine-mapping G_97 breakpoint in Id_203

Figure S41 “T_Out_3” – Fine-mapping G_97 breakpoint in Id_203

Figure S42 “T_Out_4” – Fine-mapping G_97 breakpoint in Id_203

Figure S43 “O_In_2” – Fine-mapping G_97 breakpoint in Id_203

Figure S44 “O_Out_1” – Fine-mapping G_97 breakpoint in Id_203

Figure S45 “O_Out_5” – Fine-mapping G_97 breakpoint in Id_203

Abbreviations

AD – autosomal dominant

AGTC - Analytical Genetics Technology Centre

AJ – Ashkenazi Jewish

AML – acute myeloid leukemia

AR – autosomal recessive

BAC – bacterial artificial chromosome

BC – breast cancer

CCDS - Collaborative Consensus Coding Sequence

CGH – comparative genomic hybridization

ChIP-seq - chromatin immunoprecipitation sequencing

CIN – chromosomal instability

CNV – copy number variation

Conc – concordant

COSMIC - Catalogue of Somatic Mutations in Cancer

CRC – colorectal cancer

CSI – chromosomal structure instability

ddNTPs - dideoxynucleoside triphosphates

del - deletion

DGV – Database of Genomic Variants

Disc - discordant

EBV – Epstein-Barr virus

FAMMM - familial atypical multiple mole melanoma

FDR – first degree relative

FFPE – formalin-fixed paraffin-embedded

FGICR – familial gastrointestinal cancer registry

FISH – fluorescence in-situ hybridization

FN – false negative

FoSTeS - fork stalling and template switching

FP – false positive

FPC – familial pancreatic cancer

GB – gallbladder

GDB – human genome database

GST – glutathione-S-transferase

GTC – genotyping console

GWAS – genome wide association study

HBOC - hereditary breast and ovarian cancer

Het - heterozygous

HMM – hidden Markov model

Homo - homozygous

HP – hereditary pancreatitis

HR – hazard ratio

ICGC - International Cancer Genome Consortium

IHGSC - International Human Genome Sequencing Consortium

Ins - insertion

IPMN – intraductal papillary mucinous neoplasm

LCL – lymphoblastoid cell lines

LD – linkage disequilibrium

LOD – logarithm of odds

LOH – loss of heterozygosity

MAF - minor allele frequency

MCN – mucinous cystic neoplasm

MEI – mobile element insertion

MLPA – multiplex ligation-dependent probe amplification

MMBIR - microhomology-mediated break-induced replication

MSKCC - Memorial Sloan Kettering Cancer Center

NAHR – nonallelic homologous recombination

NBPF – neuroblastoma breakpoint family

NCBI – National Center for Biotechnology Information

NFPTR - National Familial Pancreas Tumor Registry

NHEJ – nonhomologous end joining

NIH – National Institutes of Health

NGS – next generation sequencing

NK – natural killer cell

nsSNV – nonsynonymous single nucleotide variants

OC – ovarian cancer

OFCCR - Ontario Familial Colon Cancer Registry

OHI – Ottawa Heart Institute

OMIM - Online Mendelian Inheritance in Man

OPCS – Ontario Pancreas Cancer Study

OR – odds ratio

OR genes – olfactory receptor genes

QC – quality control

PACGENE - Pancreatic Cancer Genetic Epidemiology Consortium

PanIN – pancreatic intraepithelial neoplasia

PARP – poly-(ADP-ribose)-polymerase

PC – pancreatic cancer

PCR – polymerase chain reaction

PFGE – pulsed-field gel electrophoresis

PJS - Peutz-Jeghers syndrome

qPCR – quantitative polymerase chain reaction

qRT-PCR – quantitative reverse-transcription polymerase chain reaction

ROMA – representational oligonucleotide microarray analysis

RR – relative risk

SDR – second degree relative

SEER – surveillance, epidemiology and end results

SIR – standardized incidence ratio

SNP – single nucleotide polymorphism

SNV – single nucleotide variants

SPC – sporadic pancreatic cancer

TCAG – The Centre for Applied Genomics

TN – true negative

TP – true positive

UCSC - University of California, Santa Cruz

UPD – uniparental disomy

UTR – untranslated region

VNTR - variable number tandem repeat

WT - wildtype

Chapter 1 - Literature Review

1. Pancreatic Cancer

1.1 Pathology and epidemiology

Pancreatic ductal adenocarcinoma (otherwise known as pancreatic cancer) is a highly lethal invasive

epithelial neoplasm with ductal differentiation that obscures the lobular pattern of normal pancreatic

parenchyma. Pancreatic cancer grossly appears as a firm, highly sclerotic mass with poorly circumscribed

borders. Microscopically, infiltrating gland-forming neoplastic cells are commonly surrounded by

non-neoplastic stroma in a characteristically intense desmoplastic reaction, which often results in low tumor

cellularity.1

Pancreatic cancer is the fourth leading cause of cancer death in North America. The estimated number of

incident cases and deaths due to pancreatic cancer in the US in 2010 was 43,140 and 36,800,

respectively.2 In Canada, the estimated number of new cases and deaths from pancreatic cancer in 2011

was 4,100 and 3,800, respectively.3 Age-adjusted incidence in the U.S., based on SEER (Surveillance,

Epidemiology and End Results) data for 2004-2008, was 12 per 100,000 men and women per year; total

lifetime risk was 1.45% (approximately 0.5% by age 70).2
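The lethality implied by these estimates can be made concrete with a crude mortality-to-incidence ratio, that is, deaths divided by new cases in the same year. The sketch below uses only the US 2010 and Canadian 2011 estimates quoted above, and the function name is hypothetical. The ratio is a cross-sectional approximation: deaths in a given year are not restricted to cases diagnosed in that year.

```python
# Estimates quoted in the text: (new cases, deaths) for the stated year.
ESTIMATES = {
    "US, 2010": (43_140, 36_800),
    "Canada, 2011": (4_100, 3_800),
}


def mortality_to_incidence(cases: int, deaths: int) -> float:
    """Deaths divided by new cases over the same period (a crude lethality index)."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


for label, (cases, deaths) in ESTIMATES.items():
    print(f"{label}: {mortality_to_incidence(cases, deaths):.2f}")
```

The loop prints 0.85 for the US and 0.93 for Canada. Ratios this close to 1 are consistent with the poor survival figures described in this section.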

Due to the retroperitoneal location of the pancreas and lack of specific symptoms of early pancreatic

cancer, most patients present with advanced disease that precludes surgical resection. For those patients,

the only treatment option is palliation, and despite many trials of various chemotherapeutic and

molecular-target drugs and/or radiotherapy, median survival is 9-11 months.4 For patients who do

undergo surgical resection of localized pancreatic cancer, 80-85% ultimately develop local and/or

systemic recurrence, resulting in a 5-year survival of <20% in this group; overall 5-year survival for all

pancreatic cancer patients is <5%.5

1.2 Molecular biology

Three distinct pre-invasive lesions have been identified as precursors of pancreatic adenocarcinoma:

pancreatic intraepithelial neoplasia (PanIN), intraductal papillary mucinous neoplasms (IPMNs), and

mucinous cystic neoplasms (MCNs). Each of these lesions has been associated with increased risk of

cancer, and the resulting cancer has been shown to develop from cells within the precursor. PanINs are

microscopic lesions in the smaller pancreatic ducts, and they are associated with a progressive spectrum

of cytologic and architectural atypia (corresponding to the classification of PanIN-1A, PanIN-1B,

PanIN-2, and PanIN-3).6 Mouse models of pancreatic cancer develop lesions very similar to human PanINs, and

molecular analyses have demonstrated that PanINs sequentially accumulate genetic alterations found in

invasive cancer, suggesting an “adenoma-to-carcinoma” progression model akin to that of colorectal

cancer.7

However, the natural history of PanINs is not yet clear: while it is evident that advanced stage PanIN-3

lesions are tightly associated with cancer8, early-stage PanIN-1 lesions are quite common and are most

prevalent in older subjects.9 Moreover, PanINs are frequently multifocal, and although endoscopic

ultrasound can detect parenchymal changes associated with PanINs, it does so with imperfect

specificity.10,11 Therefore, deciding if and when to resect pancreata with suspected PanIN lesions is

contentious. IPMNs are grossly visible cystic lesions with direct communication to the main or branch

pancreatic ducts. The mutational spectrum of IPMNs differs somewhat from that of PanINs and invasive

adenocarcinoma, suggesting an alternate path of development.12 Main-duct IPMNs are associated with up

to 40% risk of malignant transformation and usually are resected, especially if they are growing and/or

larger than 3 cm, demonstrate mural nodularity on imaging, or are associated with main duct dilation.13

However, branch-duct IPMNs are more challenging to manage as their natural history is less clear. They

are associated with up to 15% risk of malignancy, and most authorities recommend resection if the

branch-duct IPMN exceeds 3 cm in size or has mural nodules or other suggestion of malignancy, but it is

unclear what to do with smaller lesions since most branch-duct IPMNs remain unchanged over long-term

follow-up.13,14 Since IPMNs are often multifocal, patients who undergo subtotal pancreatic resection

require continued surveillance for potential cancer recurrence. MCNs are rare, mucin-producing

cystic lesions that do not communicate directly with the pancreatic ducts and that have a distinctive

ovarian-type stroma.15 They account for only approximately 1% of pancreatic cancers, but if detected they

should always be resected: they carry a 40% risk of malignancy, and resection before invasive carcinoma

develops has a 100% cure rate, whereas the cure rate is only 50-60% if cancer is already present at the time

of resection.15
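The resection criteria summarized above for cystic precursors can be restated as explicit decision rules. The following Python sketch encodes only what the text states. The class and function names are hypothetical, "exceeds 3 cm" is implemented as a strict `> 3` comparison, and small branch-duct IPMNs without worrisome features deliberately return "unclear" to mirror the uncertainty described above.

```python
from dataclasses import dataclass


@dataclass
class CysticLesion:
    kind: str                                # "MCN", "main-duct IPMN", or "branch-duct IPMN"
    size_cm: float
    mural_nodules: bool = False
    other_malignant_features: bool = False   # any other suggestion of malignancy


def suggested_management(lesion: CysticLesion) -> str:
    """Encode only the resection criteria summarized in this section."""
    if lesion.kind == "MCN":
        return "resect"      # text: MCNs should always be resected
    if lesion.kind == "main-duct IPMN":
        return "resect"      # text: usually resected (up to 40% risk of malignancy)
    if lesion.kind == "branch-duct IPMN":
        if lesion.size_cm > 3 or lesion.mural_nodules or lesion.other_malignant_features:
            return "resect"  # text: exceeds 3 cm, mural nodules, or other suggestion of malignancy
        return "unclear"     # text: management of smaller lesions is unsettled
    raise ValueError(f"unknown lesion type: {lesion.kind!r}")


print(suggested_management(CysticLesion("branch-duct IPMN", size_cm=2.0)))
print(suggested_management(CysticLesion("branch-duct IPMN", size_cm=2.0, mural_nodules=True)))
```

The two calls print "unclear" and then "resect". Returning a string rather than a boolean leaves room for the unresolved category that the text describes.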

Molecular analyses have identified a variety of genetic, epigenetic, and genomic alterations in pancreatic

adenocarcinoma. The most common genetic mutation is Kras2 activation, present in 90-95% of cases; it

also appears to be one of the earliest changes that promote tumor development, as evidenced by its

presence in 36% of PanIN-1A and the fact that mice engineered to express the activated KrasG12D mutant

develop PanIN-like lesions and eventually invasive pancreatic carcinoma.7 Kras2 is a well-established

proto-oncogene, part of the RAS family of GTP-binding proteins, which are involved in proliferation, cell

survival, cytoskeletal modeling, motility, and other cellular functions.16 In pancreatic cancer, activating

mutations primarily occurring in codon 12 cause constitutive activation of the intracellular signal

transduction function of the expressed protein. This constitutive signaling appears to be necessary for

maintenance of pancreatic cancer, in addition to initiating its development.17 Other oncogenes activated

in pancreatic cancer include BRAF,18 AKT2,19 cMYC,17 and EGFR.17 Moreover, constitutive activation of

the Hedgehog developmental signaling pathway has also been implicated in the development of

pancreatic cancer. The mammalian Hedgehog signaling pathway appears to play a critical role in

developmental patterning and mature tissue homeostasis, and it has been observed to be dysregulated in

many cancers, including pancreatic cancer.20 In fact, Hedgehog signaling activation appears to be one of the

initiating events in pancreatic cancer, as evidenced by ligand overexpression in PanINs21 and IPMNs22

and the fact that Hedgehog signaling cooperates with KrasG12D mutant in mouse models to promote

development of PanINs.23 Hedgehog signaling also appears to be important in regulating metastases.24
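The codon-12 hotspot described above for Kras2 (KRAS) can be illustrated by translating codon 12 from a coding sequence. The sketch assumes a codon table restricted to the reference glycine codon GGT and the nine codons one substitution away from it, not the full genetic code. It uses codons 1-12 of the KRAS coding sequence, which encode MTEYKLVVVGAG, and builds the G12D allele by the c.35G>A substitution. The function names are hypothetical.

```python
# Codon table restricted to GGT and the nine codons one substitution away from it
# (a deliberate subset of the standard genetic code to keep the sketch short).
CODON_TO_AA = {
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",   # glycine (reference / synonymous)
    "AGT": "S", "CGT": "R", "TGT": "C",               # first-position changes
    "GAT": "D", "GCT": "A", "GTT": "V",               # second-position changes
}

# Codons 1-12 of the KRAS coding sequence (MTEYKLVVVGAG), starting at the ATG.
KRAS_CODONS_1_TO_12 = (
    "ATG" "ACT" "GAA" "TAT" "AAA" "CTT" "GTG" "GTA" "GTT" "GGA" "GCT" "GGT"
)


def codon(cds: str, number: int) -> str:
    """Return codon `number` (1-based) from a coding sequence that starts at the ATG."""
    start = 3 * (number - 1)
    return cds[start:start + 3].upper()


def classify_codon12(cds: str) -> str:
    """Label the amino acid at KRAS codon 12 relative to the reference glycine."""
    aa = CODON_TO_AA.get(codon(cds, 12), "?")  # "?" = codon outside this subset
    return "wild-type" if aa == "G" else f"G12{aa}"


wild_type = KRAS_CODONS_1_TO_12
g12d = wild_type[:34] + "A" + wild_type[35:]   # c.35G>A turns GGT into GAT
print(classify_codon12(wild_type), classify_codon12(g12d))
```

Synonymous third-position changes (for example GGT to GGC) are reported as wild-type because they leave glycine at position 12.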

While the KrasG12D mutation is necessary for development of pancreatic cancer in mice, latency to tumor

development is significantly shortened if additional inactivating mutations of the tumor suppressor genes

TP53, p16, or BRCA2 are added.25 Inactivation of all three tumor suppressor genes, along with others, has

been identified in human pancreatic adenocarcinoma. Inactivating alterations (homozygous deletions, intragenic

mutations plus loss of the second allele, or epigenetic silencing) of p16 are found in approximately 90% of

tumors.26 This well-known tumor suppressor gene encodes a cyclin-dependent kinase inhibitor that blocks

progression through the G1-S checkpoint of the cell cycle. TP53, the “guardian of the

genome”, is involved in maintenance of genomic stability, apoptosis, and activation of DNA repair

(among its many functions), and is inactivated in 50-75% of pancreatic cancers (almost always via

intragenic mutations coupled with loss of the second allele).27 Another tumor suppressor gene commonly

inactivated in pancreatic cancer (in about 55% of cases) is SMAD4, a critical signaling intermediate in the

transforming growth factor (TGF)-beta pathway, whose loss provides a selective growth advantage to affected cells.28

Patients who undergo resection and whose pancreatic cancer has loss of SMAD4 function have worse

prognosis than age- and stage-matched patients without SMAD4 mutations.29 Other tumor suppressor

genes inactivated at a lower frequency (5-10%) include BRCA2, STK11, TGFBR1, and TGFBR2.26 Of

note, p16 inactivation appears to be a relatively early event in tumor development, as it is detectable in

PanIN-2 lesions, whereas TP53, SMAD4, and BRCA2 mutations are not seen until the PanIN-3 stage.7
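The routes to p16 inactivation listed above, and the "intragenic mutation plus loss of the second allele" pattern described for TP53, all fit the two-hit model of tumor suppressor loss. In this model, function is lost only when neither allele remains wild-type. The sketch below is a deliberately simplified model with hypothetical names (`Allele`, `is_inactivated`), and it ignores haploinsufficiency and dominant-negative effects. It also shows why a heterozygous germline carrier still retains a functional copy until a second hit, such as LOH, occurs.

```python
from enum import Enum


class Allele(Enum):
    WILD_TYPE = "wild-type"
    MUTATED = "intragenic mutation"
    DELETED = "deleted"
    SILENCED = "promoter methylation"


def is_inactivated(allele_1: Allele, allele_2: Allele) -> bool:
    """Two-hit model: a tumor suppressor is inactivated only when neither allele is wild-type."""
    return Allele.WILD_TYPE not in (allele_1, allele_2)


# The three routes to p16 inactivation named in the text.
ROUTES = {
    "homozygous deletion": (Allele.DELETED, Allele.DELETED),
    "intragenic mutation plus loss of second allele": (Allele.MUTATED, Allele.DELETED),
    "epigenetic silencing": (Allele.SILENCED, Allele.SILENCED),
}

for route, alleles in ROUTES.items():
    print(f"{route}: inactivated = {is_inactivated(*alleles)}")

# One hit only (e.g. a heterozygous germline carrier before LOH): a functional copy remains.
print(is_inactivated(Allele.MUTATED, Allele.WILD_TYPE))
```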

Genomic instability is a hallmark of most solid tumors, including pancreatic cancer. The types of

genomic rearrangements commonly identified in pancreatic adenocarcinoma are reviewed elsewhere (see

“Literature Review - CNVs and Cancer”). Telomere shortening, which predisposes to end-to-end

chromosomal fusions and breakage during anaphase, thereby generating amplifications and deletions in the

daughter cell genomes, is a very frequent and early event in pancreatic cancer development, demonstrated

in over 90% of the earliest stage PanINs.30 It is believed that inactivation of TP53 allows pre-invasive

cells that have accumulated a heavy burden of genomic instability through telomere attrition to survive,

permitting them to progress to invasive cancer through the activation of oncogenes and inactivation of

tumor suppressor genes.31 It should be noted that most invasive pancreatic cancers appear to

reactivate telomerase, mitigating the degree of genomic instability and helping to stabilize the neoplastic

cells.32

In addition to genetic and genomic alterations, epigenetic silencing of tumor suppressor genes (via

methylation of CpG islands in the 5’ regulatory regions) is frequently observed in pancreatic

adenocarcinoma.33 Conversely, hypomethylation of candidate oncogenes (which are overexpressed in pancreatic cancer) has also been observed.34 MicroRNAs have also been implicated in pancreatic tumorigenesis, acting both as potential tumor suppressors and as oncogenes.35 Furthermore, inflammation and the tumor microenvironment appear to play a role in pancreatic tumorigenesis.36

Jones et al.37 examined the genomic profile of pancreatic adenocarcinoma in depth by sequencing the coding regions of 20,661 genes in 24 pancreatic adenocarcinomas and by hybridizing tumor DNA to a high-resolution single nucleotide polymorphism (SNP) array to detect genomic rearrangements. The authors identified 1,562 somatic mutations in 1,007 genes, of which 74.5% were missense, nonsense, small insertion/deletion, or splice-site/untranslated region (UTR) changes. The average number of mutated genes per tumor (48) was much lower than the corresponding numbers reported for breast cancer (101) and colorectal cancer (77) in previous studies; one potential explanation offered is that the cells which initiate pancreatic tumorigenesis are likely to have undergone fewer divisions than tumor-initiating cells in breast or colorectal cancer. Gene-set analyses of the genes mutated in pancreatic cancer

identified 69 gene sets that were altered in most pancreatic tumors, of which 31 could be grouped into 12 core signaling pathways with discernible functional relevance to neoplasia; these pathways were affected in 67-100% of the pancreatic tumors. Notably, although the 12 core pathways were altered in almost all tumors, the specific genes mutated in each tumor differed substantially across patients, aside from the few frequently mutated genes discussed above.
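The pathway-level view described above can be illustrated with a minimal sketch. It assumes a hypothetical mapping of tumors to mutated genes and of pathways to member genes (toy data, not the Jones et al. results) and counts a pathway as altered in a tumor if any member gene is mutated. This is a simple tally, not the gene-set statistical test used in the original study.

```python
from typing import Dict, Set


def pathway_alteration_rates(
    tumor_mutations: Dict[str, Set[str]],
    pathways: Dict[str, Set[str]],
) -> Dict[str, float]:
    """Return, for each pathway, the fraction of tumors with >= 1 mutated member gene."""
    n_tumors = len(tumor_mutations)
    rates: Dict[str, float] = {}
    for name, members in pathways.items():
        # A tumor counts once per pathway, however many member genes it has mutated
        altered = sum(1 for genes in tumor_mutations.values() if genes & members)
        rates[name] = altered / n_tumors
    return rates


def gene_mutation_rates(tumor_mutations: Dict[str, Set[str]]) -> Dict[str, float]:
    """Return the fraction of tumors in which each individual gene is mutated."""
    n_tumors = len(tumor_mutations)
    counts: Dict[str, int] = {}
    for genes in tumor_mutations.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return {gene: c / n_tumors for gene, c in counts.items()}


# Hypothetical toy data: real gene symbols, invented tumor assignments
tumors = {
    "T1": {"KRAS", "TGFBR2"},
    "T2": {"KRAS", "SMAD4"},
    "T3": {"KRAS", "TGFBR1"},
    "T4": {"KRAS", "CDKN2A"},
}
pathways = {
    "TGF-beta signaling": {"SMAD4", "TGFBR1", "TGFBR2"},
    "G1/S checkpoint": {"CDKN2A"},
}

if __name__ == "__main__":
    print(pathway_alteration_rates(tumors, pathways))
    print(gene_mutation_rates(tumors))
```

In this toy example each TGF-beta pathway gene is mutated in only one of the four tumors, yet the pathway as a whole is altered in three of them, which is the pattern that motivates targeting pathways rather than individual genes.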

These results emphasize the importance of a pathway-based approach to understanding tumorigenesis and suggest that successful anti-cancer therapy may depend more on targeting pathways than individual genes. A subsequent study used massively parallel sequencing to sequence the entire genomes of metastases from seven of the subjects included in the previous study.38 On average, two-thirds of

mutations detected in each metastasis were also present in the paired primary tumor and were called

“founders”, while the remaining mutations that were only identified in metastases were termed

“progressors”. Subclones that led to the development of metastases were identified within each primary

tumor. The authors devised a mathematical model for calculating the timing of different stages of

pancreatic cancer development and estimated that it takes an average of 11.7 years from the initiation of

tumorigenesis until the generation of the cell that develops into the parental clone; another 6.8 years were

estimated for the evolution into subclones with metastatic capacity, and 2.7 years until the death of the

patient. It should be noted that most of the tumors in this study were not from familial cases, and tumors

with highly penetrant predisposing germline mutations may follow a different evolutionary timeline and

pathway. Nonetheless, it appears that a significant window of opportunity for screening and curative

intervention exists, if it is possible to identify tumors before metastatic subclones develop.
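The three average intervals estimated in that study can be laid end to end to show where this window sits. A minimal sketch using only the averages quoted above (11.7, 6.8, and 2.7 years); the phase labels are paraphrases, and individual patients will vary around these averages.

```python
from itertools import accumulate

# Average interval estimates (years) quoted in the text; labels are paraphrased
phases = [
    ("initiation -> cell giving rise to parental clone", 11.7),
    ("parental clone -> subclones with metastatic capacity", 6.8),
    ("metastatic subclones -> death of the patient", 2.7),
]


def cumulative_timeline(intervals):
    """Return (label, years since initiation at the end of that phase) pairs."""
    ends = accumulate(years for _, years in intervals)
    return [(label, round(end, 1)) for (label, _), end in zip(intervals, ends)]


timeline = cumulative_timeline(phases)
window_before_metastasis = timeline[1][1]  # years before metastatic subclones exist
total_years = timeline[-1][1]

if __name__ == "__main__":
    for label, end in timeline:
        print(f"{label}: ends at ~{end} years")
    share = window_before_metastasis / total_years
    print(f"share of the timeline before metastatic capacity: {share:.0%}")
```

On these averages, metastatic subclones appear roughly 18.5 years after initiation, out of a total course of about 21.2 years.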

1.3 Risk factors

The list of putative risk factors for pancreatic cancer is long, with wide variability in the degree of risk conferred and the strength of evidence for each association. Age is strongly correlated with increased risk of pancreatic cancer: the median age at diagnosis is 72 years, and more than two-thirds of cases occur after age 65.2 Race is also a factor, with African-Americans having substantially higher rates of pancreatic cancer than white, Asian, or Hispanic Americans.2 Tobacco use is perhaps the most strongly established risk factor: numerous studies have demonstrated that smoking can double lifetime risk, and the estimated population attributable risk is 25%.39 Other risk factors with a low-to-moderate contribution to pancreatic cancer include alcohol consumption40, obesity40, occupational exposure to certain chemicals41, long-standing diabetes mellitus42, and Helicobacter pylori infection43. However, only smoking has been consistently associated with pancreatic cancer. Chronic pancreatitis is associated with an up to 13-fold increased risk of pancreatic cancer, and the risk is even higher in patients with hereditary pancreatitis, which is caused by germline mutations (e.g., in PRSS1 or SPINK1).44 Possible protective factors include allergies45, vitamin D intake46 (although this is contentious47), and consumption of citrus fruit48 and a “Mediterranean diet”49.
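A population attributable risk combines two quantities: how strongly an exposure raises risk and how common the exposure is. A minimal sketch using Levin's formula, PAR = p(RR - 1) / [1 + p(RR - 1)], where p is the exposure prevalence and RR the relative risk. The prevalence values are illustrative assumptions, not figures from the cited studies, and the formula assumes an unconfounded RR.

```python
def levin_par(prevalence: float, relative_risk: float) -> float:
    """Population attributable risk (as a fraction) by Levin's formula."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


if __name__ == "__main__":
    # Assumed exposure prevalences; RR = 2.0 mirrors "smoking can double lifetime risk"
    for p in (0.20, 1 / 3, 0.50):
        print(f"prevalence {p:.2f}, RR 2.0 -> PAR {levin_par(p, 2.0):.1%}")
```

With RR = 2, a PAR of 25% corresponds to an exposure prevalence of one third; a more common exposure with the same RR would account for a larger share of cases.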

The role of germline genetic factors predisposing to pancreatic cancer is the subject of numerous studies and ongoing collaborations. Polymorphisms in the following genes have been associated with increased or decreased risk of sporadic pancreatic cancer: GCKR (odds ratio (OR) = 2.14)50, IGF1 and IGF1R (OR = 0.6-0.7)51, IGFBP1 (OR = 1.46)51, SSTR5 (OR = 1.62)52, [MGMT (OR = 0.6), PMS2 (OR = 1.44), PMS2L3 (OR = 5.54)]53, HNF1A (OR = 1.16-1.22)54, SDF1 (OR = 2.74)55, [FTO (OR = 1.12), MTNR1B (OR = 1.11), MADD (OR = 1.14)]56, ALDH2 (OR = 1.37)57, HK2 (OR = 0.68 in diabetic/3.69 in non-diabetic subjects)58, [PPARG (OR = 0.21), NR5A2 (OR = 0.57-0.77), ADIPOQ (OR = 0.67), GGT1 (OR = 1.86)]59, CASP9 (OR = 4.09-16.26)60, CAPN10 (OR = 1.57)61, p21 (OR = 1.70)62, CYP1B1 (OR = 0.67)63, CFTR (OR = 1.4; OR = 1.83 if diagnosed under age 60)64, GSTP1 (OR = 3.09 if diagnosed under age 50)65, CYP17A1 (OR = 0.63-0.77)66, PPARG in conjunction with high-dose vitamin A (OR = 2.80)67, PTGS2 (OR = 1.34-1.63)68, MMS19L (OR = 0.7/1.34)69, IL1beta (OR = 2.0 for unresectable cancer)70, [LIG3 (OR = 0.23), ATM (OR = 2.55)]71, IGF2 (OR = 0.07)72, [MTHFR (OR = 4.50), MTR (OR = 2.65), MTRR (OR = 3.35) in heavy drinkers]73, MTRR (OR = 1.44-1.52)74, [FasL (OR = 0.35-0.73), CASP8 (OR = 0.56-0.65)]75, NAT2 (slow type, OR = 5.7)76, XRCC2 in smokers (OR = 2.32)77, ERCC2 in smokers (OR = 0.46)78, [MTHFR (OR = 2.6-5.12), TYMS (OR = 2.19)]79, NAT1 rapid type (OR = 1.5)80, RNASEL (OR = 2.12-3.5)81, UGT1A7 (OR = 1.98-4.7)82, and XRCC1 in smokers (OR = 7.0 in women/OR = 2.4 in men)83.

Pathways affected by these genes include type II diabetes mellitus and glucose metabolism, insulin-like growth factors, somatostatin, DNA repair, tumor growth, alcohol metabolism, obesity, glutathione metabolism, cytochrome P450, the cystic fibrosis transmembrane conductance regulator, fatty acid storage, cyclooxygenase-2, nucleotide excision repair, inflammation, folate metabolism, cell cycle and cell death, and detoxification of toxins. Many of the aforementioned studies suggest gene-environment interactions.
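Each odds ratio listed above summarizes a 2x2 table of genotype (or exposure) by case-control status. A minimal sketch of how a crude OR and its approximate 95% confidence interval (Woolf's log method) are computed; the counts are hypothetical, and published estimates are usually adjusted for covariates by logistic regression rather than computed this simply.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Crude odds ratio with a Woolf (log-based) confidence interval.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls (all counts must be > 0).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


if __name__ == "__main__":
    # Hypothetical: variant carried by 60 of 200 cases and 40 of 200 controls
    or_, lo, hi = odds_ratio_ci(60, 140, 40, 160)
    print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An OR below 1 (as for several genes above) indicates a variant that is less common in cases than in controls, i.e., an apparently protective association.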

To date, four genome-wide association studies (GWAS) of pancreatic cancer have been published: two

related GWAS were conducted on subjects drawn from 12 cohort studies and 9 case-control studies

(mostly of European ancestry)84-85, a study performed in a Japanese population86, and the most recent

study was in a Chinese population.87 While SNPs at several loci were associated with pancreatic cancer at p-values low enough to suggest statistical significance (7q36-SHH, 15q14-gene desert)84, (13q22.1-near KLF5 and KLF12, 1q32.1-NR5A2, 5p15.33-CLPTM1L-TERT)85, (6p25.3-FOXQ1, 12p11.21-BICD1, 7q36.2-DPP6)86, (21q21.3-BACH1, 5p13.1-DAB2, 10q26.11-near PRLHR, 21q22.3-near TFF1, 22q13.32-near FAM19A5)87, to date only one association has been successfully replicated in additional studies: the ABO blood group locus at 9q34. In the GWAS by Amundadottir et al.84, the ABO locus was identified as a potentially associated locus in the initial phase of the study and confirmed in a replication case-control set (OR per non-O allele = 1.20). This association of non-O blood group with pancreatic cancer risk was further replicated in other case-control studies (OR 1.33-2.42)88, (OR 1.37)89, and (OR 1.43)90, and one study reported a protective effect of blood type O (OR 0.53)91. Furthermore, Wolpin et al.92 reported a higher risk of pancreatic cancer for carriers of the A(1) variant of the A allele, which has higher glycosyltransferase activity than the A(2) allele (OR 1.38). In addition, Risch et al.89 observed increased risk of pancreatic cancer in non-O blood group subjects who are seropositive for H. pylori but negative for its virulence protein CagA (OR 2.78). Analyses in non-Caucasian populations found similar risk effects of the non-O alleles (OR 1.37-1.39)93, (OR 1.67-3.28)94. Wang et al.95 also found evidence for an additive effect of A blood type with hepatitis B infection. It should be noted that the association of non-O blood type with pancreatic cancer predates these GWAS; one of the earliest reports suggesting an association appeared in the British Medical Journal in 1960.96 How blood type mediates pancreatic cancer risk and tumorigenesis is unknown97, but approximately 20% of pancreatic cancers in European populations appear to be attributable to non-O blood type.88
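The GWAS findings above hinge on p-value thresholds strict enough to account for testing hundreds of thousands to millions of SNPs. A minimal sketch of the Bonferroni reasoning behind the conventional genome-wide threshold of 5 x 10^-8, assuming roughly one million effectively independent common-variant tests; the "suggestive" cut-off and the SNP p-values are illustrative assumptions, not values from the cited GWAS.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold keeping the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: ~1 million effectively independent common-variant tests genome-wide
GENOME_WIDE = bonferroni_threshold(0.05, 1_000_000)  # 5e-08
SUGGESTIVE = 1e-5  # assumed convention; "suggestive" cut-offs vary between studies


def classify(p_value: float) -> str:
    """Label a single-SNP p-value against the two thresholds above."""
    if p_value < GENOME_WIDE:
        return "genome-wide significant"
    if p_value < SUGGESTIVE:
        return "suggestive"
    return "not significant"


if __name__ == "__main__":
    # Hypothetical SNP p-values for illustration only
    for snp, p in {"snp_a": 3e-9, "snp_b": 2e-6, "snp_c": 0.01}.items():
        print(snp, p, classify(p))
```

Even a genome-wide significant signal is usually treated as provisional until it replicates in independent samples, which is why the ABO locus stands out among the loci listed.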

More highly penetrant genes may also predispose to pancreatic cancer, as shown by the occurrence of pancreatic cancer in several known cancer syndromes. The highest known risk is associated with Peutz-Jeghers syndrome (PJS), caused by germline mutations of STK11. This autosomal dominant syndrome is associated with melanocytic macules on the lips and buccal mucosa, gastrointestinal hamartomas, and cancer. The risk of pancreatic cancer in PJS patients is up to 132-fold that of the general population, corresponding to a cumulative risk of about 66% by age 70.98,99 Another condition associated with up to 80-fold

higher risk of pancreatic cancer is hereditary pancreatitis, most commonly caused by autosomal dominant mutations in PRSS1 (although SPINK1 mutations have also been implicated).100-101 Familial atypical multiple mole melanoma (FAMMM) is an autosomal dominant syndrome characterized by multiple nevi and increased risk of cancers, predominantly melanoma and pancreatic adenocarcinoma. The primary genetic cause of FAMMM is mutations in CDKN2A/p16, and carriers (particularly of the p16-Leiden founder mutation) have up to 47-fold increased risk of developing pancreatic cancer.102 Some genes that cause hereditary breast and ovarian cancer also raise the risk of pancreatic cancer. To date, the gene contributing to the largest proportion of hereditary pancreatic cancer is BRCA2, which is estimated to raise the lifetime risk of pancreatic cancer 3.5- to 10-fold and accounts for up to 19% of high-risk families103-107 (although the contribution of BRCA2 may be population dependent, as it appears to be significantly lower in German, Korean, and Spanish populations108-111). Although most BRCA2 families with pancreatic cancer also show clustering of breast and/or ovarian cancer, some families are characterized by the exclusive presence of pancreatic cancer112, and even apparently sporadic cases have been shown to carry deleterious germline BRCA2 mutations.113 Interestingly, while the BRCA2 locus was first proposed to contain a cancer-associated gene via linkage to familial breast cancer,114 the localization of the gene itself and the suggestion of its tumor-suppressor role were facilitated by the discovery of a homozygous deletion at 13q12 in a pancreatic adenocarcinoma.115-116 Germline mutations of other Fanconi anemia pathway genes have been reported in pancreatic cancer families, but the magnitude of risk associated with these genes is unclear: PALB2 in ~0.9-4% of families117-120, BRCA1 in 2.6-4.4% of families121-122 (although Axilbund et al. failed to find mutations in a series of 66 familial pancreatic cancer patients123), and ATM in 2.4% of families124. Mutations in FANCC and FANCG have been reported in young-onset pancreatic cancer subjects125, although these genes do not appear to contribute significantly to familial pancreatic cancer.126-128

Several other syndromes are associated with risk of pancreatic cancer, including Lynch syndrome (caused by mutations of the mismatch repair genes MLH1, MSH2, MSH6, or PMS2, or by 3′ deletions of TACSTD1),129-132 Li-Fraumeni syndrome (caused by mutations of TP53)133, Familial Adenomatous Polyposis (caused by mutations of APC)134, and cystic fibrosis (caused by mutations of CFTR)135.

However, the contribution of known genetic syndromes to the overall heritability of pancreatic cancer is limited; approximately 10% of all pancreatic cancer cases appear to be familial or hereditary, and most of these do not have a known genetic explanation.136 Perhaps the earliest indications that a familial pancreatic cancer syndrome exists were several case reports and case series in the 1970s and 1980s describing clusters of pancreatic cancer in first- and second-degree blood relatives.137-143 Subsequently, both retrospective case-control and prospective cohort studies suggested increased risk of pancreatic cancer in close relatives of patients compared to the general population (Table 1).

Table 1 - Studies estimating risk of pancreatic adenocarcinoma in relatives of affected patients

| Paper | Type of study | Description | Risk of pancreatic cancer in relatives of patients |
|---|---|---|---|
| Ghadirian et al.144 | Case-control | 179 cases vs. 179 controls (French Canadian) | OR in subjects with positive family history = 13 (p<0.001) |
| Fernandez et al.145 | Case-control | 362 cases vs. 1408 controls (Italian) | OR in FDR of affected cases = 3.0 (95% CI 1.4-6.6) |
| Silverman et al.146 | Case-control | 484 cases vs. 2099 controls (US) | OR in FDR of affected cases = 3.2 (95% CI 1.8-5.6) |
| Schenk et al.147 | Case-control | 247 cases vs. 420 controls (US) | OR in FDR of affected cases = 2.49 (95% CI 1.32-4.69) |
| Ghadirian et al.148 | Case-control | 174 cases vs. 136 controls (Canada) | OR in FDR of affected cases = 5.0 (p=0.01) |
| Inoue et al.149 | Case-control | 200 cases vs. 2000 controls (Japan) | OR in subjects with positive family history = 2.09 (95% CI 1.01-4.33) |
| Rulyak et al.150 | Nested case-control | 251 members of 28 families (US) | OR with each affected FDR = 1.8 (95% CI 1.1-2.7) |
| Cote et al.151 | Case-control | 247 cases vs. 420 controls (US) | OR in subjects with positive family history = 2.49 (95% CI 1.32-4.69) |
| Hassan et al.152 | Case-control | 808 cases vs. 808 controls (US) | OR in FDR of affected cases = 3.3 (95% CI 1.8-6.1); OR in SDR of affected cases = 2.9 (95% CI 1.3-6.3) |
| Jacobs et al.153 | Case-control | 1,183 cases vs. 1,205 controls (US, Europe, China) | OR in FDR of affected cases = 1.76 (95% CI 1.19-2.61) |
| Matsubayashi et al.154 | Case-control | 577 cases vs. 577 controls (Japan) | OR in FDR of affected cases = 2.5 (p=0.02) |
| Coughlin et al.155 | Cohort | 1.1 million (US) | RR for PC mortality in FDR of affected cases: males = 1.5 (95% CI 1.1-2.1); females = 1.7 (95% CI 1.3-2.3) |
| Tersmette et al.156 | Cohort | Prospectively followed 150 FPC kindreds and 191 SPC kindreds from NFPTR | SIR in FPC relatives if 2 or more affected = 18.3 (95% CI 4.74-44.5); SIR in FPC relatives if 3 or more affected = 56.6 (12.4-175) [no significantly elevated risk in SPC relatives: SIR in FDRs = 6.5 (0.78-23.3)] |
| Hemminki et al.157 | Cohort | 10.2 million Swedish (21,000 PC cases) | SIR for children of affected cases = 1.73 (95% CI 1.13-2.54) |
| Klein et al.158 | Cohort | Prospectively followed 370 FPC kindreds and 468 SPC kindreds from NFPTR | SIR in FDRs of FPC affected = 9.0 (4.5-16.1); if 1 FDR affected, SIR = 4.5 (95% CI 0.54-16.3); if 2 FDRs affected, SIR = 6.4 (95% CI 1.8-16.4); if 3 or more FDRs affected, SIR = 32 (95% CI 10.4-74.7) [no significantly elevated risk in FDRs of SPC affected, SIR = 1.8 (95% CI 0.2-6.42), or in spouses/unrelated relatives, SIR = 2.4 (95% CI 0.06-13.5)] |
| Jacob et al.159 | Cohort | 1.1 million (US) | RR for PC mortality in FDR of affected cases = 1.66 (95% CI 1.43-1.94) |
| Brune et al.160 | Cohort | Prospectively followed 1,718 kindreds from NFPTR | SIR in FDR of FPC affected = 6.79 (95% CI 4.59-9.75); if 1 FDR affected, SIR = 6.86 (95% CI 3.75-11.04); if 2 FDRs affected, SIR = 3.97 (95% CI 1.59-8.2); if 3 or more FDRs affected, SIR = 17.02 (95% CI 7.34-33.5); young onset (< 50 years) in FDR associated with SIR = 9.31 (95% CI 3.42-20.28); late onset (> 50 years) in FDR associated with SIR = 6.34 (95% CI 4.02-9.51) |

OR = odds ratio; 95% CI= 95% confidence interval; FDR= first-degree relative; SDR = second-degree relative; PC = pancreatic cancer; SIR = standardized incidence ratio; RR = relative risk; FPC = familial pancreatic cancer (at least 1 pair of affected FDRs); SPC = sporadic pancreatic cancer (no affected FDR pairs); NFPTR = National Familial Pancreas Tumor Registry at Johns Hopkins University (http://pathology.jhu.edu/pc/nfptr/index.php)

Segregation analysis of 287 families with an index case of pancreatic cancer recruited by Johns Hopkins Medical Institutions supports the hypothesis that a major gene is involved in pancreatic cancer risk, with the most likely model being autosomal dominant inheritance of a rare allele.161 The degree of risk is related to the number of affected relatives, the degree of relationship, and the age of onset of disease in relatives. Three large cohort studies following kindreds recruited by the National Familial Pancreas Tumor Registry (NFPTR) at Johns Hopkins Medical Institutions found that, in families with at least one pair of affected first-degree relatives (FDRs), the standardized incidence ratio (SIR) for FDRs of affected patients was 4.5-6.79 if only one FDR is affected, 3.97-18.3 if two FDRs are affected, and 17.02-56.6 if three or more FDRs are affected.156,158,160 Moreover, the younger the age of onset of cancer in the affected relative, the higher the risk in first-degree relatives (hazard ratio (HR) 1.55 per year decrease in age of onset).160

It is not clear whether the average age of onset of pancreatic cancer is significantly lower in FPC: many studies found no difference in age of onset between FPC and sporadic cases,143,144,156,162,163 and even the few studies that identified a difference found it to be rather small (65-68 years in FPC vs. 70 years in the SEER database).160,164,165 However, there is evidence for genetic anticipation in FPC families, with members of each successive generation developing cancer on average 6-15 years younger than the previous generation.166-169 There is strong evidence for gene-environment interaction in FPC, particularly with respect to tobacco use: smokers in FPC kindreds developed pancreatic cancer a decade earlier than non-smokers,168 and smokers from FPC families have approximately 19 times the risk of the general population.158

In some cancer syndromes (e.g., colorectal cancer), there is a significant difference in survival between familial and sporadic cases, but it is not clear that such a difference exists in FPC. Several studies have found no difference in survival between sporadic and familial pancreatic cancer.143,164,170,171 Ji et al.172 found that familial cases had worse outcomes than sporadic cases (HR = 1.37) in the Swedish Family-Cancer Database, while Yeo et al.173 identified significantly worse survival in unresected FPC cases compared to unresected sporadic cases but no significant difference for resected cases. Interestingly,

recent anecdotal reports and small series of FPC patients with mutations in BRCA-related genes who were treated with platinum-based chemotherapy, topoisomerase inhibitors, or poly(ADP-ribose) polymerase (PARP1) inhibitors suggest that this subset of familial cases may have good chemotherapy responses and improved survival compared to sporadic cases.174-178

Aside from the more frequent inactivation of the BRCA-related pathway in familial cases (up to a fifth of FPC tumors vs. less than 10% of sporadic cases), there has been limited investigation into molecular genetic and pathologic differences between familial and sporadic pancreatic cancers. Pancreata from FPC subjects appear to have an increased prevalence of precursor lesions (PanINs and IPMNs) compared to those from patients with sporadic pancreatic cancer.179,180 Studies analyzing the rate and genome-wide distribution of loss-of-heterozygosity (LOH) have shown conflicting results: Abe et al.181 identified LOH at approximately 50% of informative markers in 20 FPC tumors, while a similar study of 82 sporadic tumors found the average LOH rate to be 25%182; however, a third study that used a SNP array to identify LOH in 26 pancreatic cancer cell lines found a rate of LOH similar to that in familial tumors (average 43%).183 Differences in LOH rates aside, the pattern of LOH across the genome appeared similar in all three studies. Brune et al.184 analyzed familial tumors for KRAS mutations, p53 and SMAD4 expression, and methylation of seven genes previously shown to be hypermethylated in sporadic tumors, and found no significant differences between familial and sporadic tumors.
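The LOH rates compared above are expressed relative to informative markers, i.e., markers at which the patient's germline (normal) DNA is heterozygous; a germline-homozygous marker cannot reveal the loss of an allele. A minimal sketch, assuming genotypes are encoded as allele pairs and calling LOH whenever the tumor genotype is homozygous; real analyses must also handle normal-cell contamination, copy-neutral events, and allelic-imbalance thresholds, which this ignores. The genotypes are hypothetical.

```python
from typing import Dict, Tuple

Genotype = Tuple[str, str]


def loh_rate(normal: Dict[str, Genotype], tumor: Dict[str, Genotype]) -> float:
    """Fraction of informative markers that show LOH in the tumor.

    A marker is informative if the germline (normal) genotype is heterozygous;
    LOH is called here, simplistically, when the tumor genotype is homozygous.
    """
    informative = [m for m, (a1, a2) in normal.items() if a1 != a2 and m in tumor]
    if not informative:
        raise ValueError("no informative markers")
    lost = [m for m in informative if tumor[m][0] == tumor[m][1]]
    return len(lost) / len(informative)


# Hypothetical genotypes at four markers (m2 is uninformative: homozygous in normal DNA)
normal = {"m1": ("A", "B"), "m2": ("A", "A"), "m3": ("A", "B"), "m4": ("C", "D")}
tumor = {"m1": ("A", "A"), "m2": ("A", "A"), "m3": ("A", "B"), "m4": ("D", "D")}

if __name__ == "__main__":
    print(f"LOH at {loh_rate(normal, tumor):.0%} of informative markers")
```

Because the denominator depends on marker heterozygosity, studies using different marker sets (microsatellites vs. SNP arrays) can report different LOH rates for similar underlying losses.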

Given all the evidence supporting the existence of at least one major gene explaining the heritability of pancreatic cancer in high-risk families, much effort has been directed at identifying the responsible gene, including through genetic linkage analysis. Linkage analysis is a statistical tool that uses family-based data and the likelihood of recombination between loci on a chromosome to identify genomic regions that appear to be transmitted to affected members of a family more frequently than expected by chance alone.

Since linkage analysis was successful in mapping the location of and facilitating the identification of

highly-penetrant genes in many cancer syndromes (e.g. APC in Familial Adenomatous Polyposis185;

BRCA1 and BRCA2 in Hereditary Breast and Ovarian Cancer syndrome114,186), this technique has been

applied to the study of FPC. Familial registries have fostered the collection of high-risk families, and a large North American consortium, the Pancreatic Cancer Genetic Epidemiology Consortium (PACGENE), has pooled the resources of six major sites.165 This National Institutes of Health (NIH)-funded collaboration includes the University of Toronto, Mayo Clinic, Johns Hopkins University, MD Anderson Cancer Centre, Dana Farber Cancer Institute, and Karmanos Cancer Institute. Each site prospectively identifies pancreatic cancer patients with a family history of at least two affected members. If a pedigree is deemed suitable for linkage analysis (with the help of linkage simulation programs), probands are asked for consent to contact their relatives for recruitment to the study. Consenting individuals complete questionnaires about clinical and family history and provide blood samples for DNA extraction.
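Linkage evidence is summarized as a LOD (logarithm of odds) score: the base-10 logarithm of the likelihood of the observed meioses at recombination fraction θ relative to free recombination (θ = 0.5), with a LOD of 3 or more as the traditional threshold for significant linkage. A minimal sketch for the simplest textbook case, fully informative phase-known meioses counted as recombinant or non-recombinant. Real FPC pedigrees require likelihood calculations over unknown phase, missing genotypes, and penetrance models, which this does not attempt; the counts are hypothetical.

```python
import math
from typing import Tuple


def lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for fully informative, phase-known meioses at recombination fraction theta."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # a single recombinant excludes theta = 0
    rec_term = recombinants * math.log10(theta) if recombinants else 0.0
    nonrec_term = (meioses - recombinants) * math.log10(1.0 - theta)
    # Subtract the log-likelihood under no linkage (theta = 0.5 for every meiosis)
    return rec_term + nonrec_term - meioses * math.log10(0.5)


def max_lod(recombinants: int, meioses: int, step: float = 0.01) -> Tuple[float, float]:
    """Maximize the LOD over a grid of theta values in [0, 0.5); returns (LOD, theta)."""
    n = int(round(0.5 / step))
    return max((lod(recombinants, meioses, i * step), i * step) for i in range(n))


if __name__ == "__main__":
    # Hypothetical: 10 informative meioses
    print(max_lod(0, 10))  # no recombinants
    print(max_lod(1, 10))  # one recombinant
```

Ten informative meioses with no recombinants give a maximum LOD of only about 3.01, and a single recombinant drops it to about 1.6, which illustrates why small families with few surviving affected members provide limited linkage information.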

Linkage efforts in FPC have yielded limited results. The linkage work by PACGENE is ongoing, but to

date no highly significant loci have emerged. Investigators at the University of Washington (not

connected to PACGENE) published results of a linkage analysis conducted in a single FPC family

(identified as “Family X”) characterized by four generations of affected members with an autosomal

dominant pattern of inheritance suggesting high penetrance, young age of onset (median age 43), and

concomitant endocrine and/or exocrine pancreatic insufficiency.187 Based on a genome-wide screen using

373 microsatellite markers, significant linkage with LOD (logarithm of odds) scores 4.56-5.36 was

identified on chromosome 4q32-34. Although other centres failed to find significant linkage at this locus in European188 or North American189 FPC kindreds, the University of Washington group subsequently claimed to have pinpointed PALLD, which codes for palladin, a cytoskeletal scaffold protein.190 They demonstrated a variant (P239S) that segregated only with the affected members of the family linked to 4q32-34, and they further presented evidence of PALLD overexpression in premalignant and cancerous

pancreatic tissue. However, significant doubt has been cast on the likelihood that PALLD is the gene responsible for FPC, or at least that it is a significant cause of this cancer syndrome. Due to the large number of candidate genes in the 4q32-34 locus, Pogue-Geile et al.187 were unable to screen all candidates for mutations in Family X. Rather, they used a custom expression microarray to analyze RNA extracted from whole-tissue PanIN from one of the affected members of Family X and from another 10 sporadic pancreatic cancers. PALLD appeared to have the highest expression, and it was on the basis of this finding that the gene was sequenced in Family X. However, Salaria et al.191 used immunohistochemistry of 177 pancreatic adenocarcinomas to show that palladin overexpression was primarily localized to non-neoplastic stroma: 96.6% of tumors demonstrated overexpression in the stroma, but only 12.4% of tumors overexpressed palladin in neoplastic cells. Furthermore, three studies of Canadian, US, and European families found no deleterious PALLD mutations in any other FPC families. Zogopoulos et al.192 genotyped the P239S variant in 51 familial cases, 33 early-onset cases, and 555 controls and found it only in one familial case diagnosed at age 74 (DNA was not available for the other family members) and in one 91-year-old unaffected control. Slater et al.193 sequenced the region containing the variant in 74 FPC families and found no mutations. Finally, Klein et al.194 sequenced 92% of the coding region of PALLD in 48 FPC cases and found no deleterious mutations.

Since the PACGENE linkage study has not yet been completed, it is not known whether any other loci will be reliably linked to FPC. Some of the challenges associated with applying linkage analysis to FPC are:

(1) the small number of affected individuals per family and their rapid mortality, which preclude recruitment and limit the number of meioses available for the analysis;
(2) the penetrance of the FPC gene(s) is likely lower than in previously mapped hereditary cancer syndromes, reducing the power of linkage analysis;
(3) there is increasing evidence for locus heterogeneity in the etiology of FPC. To date, only BRCA2 has been shown to account for a substantial portion of familial cases, while all other identified genes appear to be responsible for fewer than 5% of cases each. Locus heterogeneity is a significant confounder of linkage analysis, and the lack of distinguishing phenotypic or pedigree characteristics among families makes it very difficult to confidently separate cases that are likely caused by different genes;
(4) phenocopies reduce the power of linkage analysis.

Given all these challenges, it is evident that other techniques are needed in the effort to identify germline genetic alterations that predispose to FPC.
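The effect of locus heterogeneity on linkage can be made concrete with the admixture model commonly used to compute a heterogeneity LOD (HLOD). Each family is assumed to be linked with probability α and unlinked otherwise, so family i contributes log10[α·LR_i + (1 - α)], where LR_i = 10^LOD_i is that family's likelihood ratio at a fixed θ. A minimal sketch with hypothetical per-family LOD scores, showing how a few linked families can be masked when LODs are simply summed across all families.

```python
import math
from typing import List, Tuple


def hlod(family_lods: List[float], alpha: float) -> float:
    """Heterogeneity LOD under the admixture model, for per-family LODs at a fixed theta."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return sum(math.log10(alpha * 10 ** lod_i + (1.0 - alpha)) for lod_i in family_lods)


def max_hlod(family_lods: List[float], step: float = 0.01) -> Tuple[float, float]:
    """Maximize the HLOD over a grid of alpha values; returns (HLOD, alpha)."""
    n = int(round(1 / step))
    return max((hlod(family_lods, i / n), i / n) for i in range(n + 1))


# Hypothetical per-family LODs: three linked families, five unlinked
family_lods = [1.5, 1.2, 1.4, -0.8, -1.0, -0.6, -0.9, -0.7]

if __name__ == "__main__":
    print("homogeneous LOD:", round(sum(family_lods), 2))
    score, alpha = max_hlod(family_lods)
    print(f"max HLOD {score:.2f} at alpha = {alpha:.2f}")
```

Summing these LODs gives essentially no evidence for linkage, whereas allowing a fraction of families to be unlinked recovers a clearly positive HLOD; without phenotypic markers to pre-sort families, however, α must be estimated from the same data, which costs power.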

2. Copy Number Variation

2.1 Copy Number Variation – a novel paradigm

Our understanding of the nature and degree of variation in the human genome has accelerated in the past

few years. Until recently, single nucleotide polymorphisms (SNPs) appeared to be the most frequent and

important source of genomic variation in humans. Significant efforts have been directed at identifying

and genotyping SNPs in different populations, and numerous disease association and linkage studies have

been conducted using SNPs as genomic markers. Yet the development of higher-resolution genomic scanning technologies has highlighted a previously under-recognized but clearly significant form of variation in the human genome: submicroscopic structural variation. Structural variants encompass copy-number

variants (CNVs) (defined as genomic segments which are present in variable copy numbers when

comparing two or more genomes) as well as inversions, novel sequence or mobile element insertions, and

translocations.195 The original definition of CNVs used 1,000 base pairs as a lower size threshold, to differentiate them from smaller “insertions/deletions”. However, more recently the spectrum of CNVs has been expanded to include any variants larger than 50 bp, reflecting the identification of smaller variants using sequencing technologies.195
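The change in size threshold alters which variants are counted as CNVs. A minimal sketch that labels a copy-number-changing variant by length alone under the original (1 kb or larger) and the more recent (larger than 50 bp) conventions described above; treating everything below the threshold as an "indel" is a simplification, and the example lengths are arbitrary.

```python
from enum import Enum


class CnvDefinition(Enum):
    """Minimum length (bp) for a variant to be called a CNV under each convention."""
    ORIGINAL = 1000  # original definition: 1 kb or larger
    EXPANDED = 51    # more recent usage: larger than 50 bp


def classify_variant(length_bp: int, definition: CnvDefinition) -> str:
    """Label a copy-number-changing variant as 'CNV' or 'indel' by length alone."""
    if length_bp <= 0:
        raise ValueError("length must be positive")
    return "CNV" if length_bp >= definition.value else "indel"


if __name__ == "__main__":
    for length in (20, 300, 5_000):
        labels = {d.name: classify_variant(length, d) for d in CnvDefinition}
        print(length, labels)
```

A 300 bp deletion, for example, is an indel under the original definition but a CNV under the expanded one, so CNV counts from studies using different thresholds are not directly comparable.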

Although CNVs at certain loci had long been recognized as polymorphisms in normal individuals (e.g.

alpha-globin gene family; Rhesus blood group) as well as the cause of genomic disorders (e.g. Charcot-

Marie-Tooth neuropathy type IA; Williams-Beuren syndrome; Potocki-Lupski syndrome),196 the

ubiquitous presence of CNVs in normal human genomes first became apparent with the publication of

two genome-wide studies in 2004.197-198 Since that time, more CNV-detection surveys, with continually

improving genomic coverage and resolution, have reported thousands of CNVs affecting all human

chromosomes in apparently normal individuals (Table 2).199-249 While the number of known SNPs

(~11 million) exceeds that of CNVs, the proportion of genomic sequence that is different between any

two genomes due to indels/CNVs is approximately 12-fold that of SNPs (1.2% vs. 0.1%).238

Table 2 - Summary of published studies reporting germline genomic copy-number variation in non-disease samples

Study (Year Published)

Population Primary CNV detection method

Reference genome

Source of DNA

Number of CNVs

Size of reported CNVs

Proportion of CNVs detected in > 1 sample

Number of CNVs confirmed within same study

CNV confirmation methods

Sebat et al. (2004)197

20 ethnically diverse individuals

aCGH: ROMA (85,000 probes, 35kb apart; BglII restriction enzyme)

12 samples (mostly from a single male sample); single ref per hybridization experiment

Blood, sperm, cell lines

76 Average = 465kb

41% 11/12 FISH, hybridization to HindIII ROMA platform

Iafrate et al. (2004)198

55 ethnically diverse individuals (39 unrelated healthy controls + 16 individuals with known chromosomal imbalances)

aCGH: BAC array (2632 clones, 1Mb apart)

Pooled male or female normal samples

Whole blood + cell lines

255 Average = 150kb

40% 19/19 qPCR, FISH

Sharp et al. (2005)199

47 ethnically diverse individuals

aCGH: BAC array (2194 clones, targeting 130 segmental duplication regions)

Single male sample

Cell lines

160 (represent 119 regions if BACs < 250kb apart are merged)

Average BAC insert size = 164kb, some CNVs involve > 1 clone

55% 7/11 FISH

Tuzun et al. (2005)200

Single female NA15510 (fosmid library)

In-silico Fosmid end sequence pair mapping

NCBI reference human genome Build 35 (hg17)

n/a 297 Median = 15.7 kb (8-329kb)

n/a 16/57 33/40 7/11

BAC array (comparing 97 genomes) Sequencing of fosmid inserts PCR

Conrad et al. (2006)201

30 YRI trios + 30 CEU trios (HapMap)

In-silico: Assessment of Mendelian inconsistencies in trios

n/a n/a 586 (396 in YRI; 228 in CEU)

YRI median = 8.5kb (0.5-1200kb) CEU median = 10.6 kb (0.3-404kb)

61% 92/105 qPCR, hybridization to custom high-density oligo array

McCarroll et al. (2006)202

269 HapMap individuals (4 ethnic groups)

In-silico: Analysis of Mendelian transmission errors, Hardy-Weinberg disequilibrium, null genotypes

n/a n/a 541 Median = 7 kb (1-745kb)

51% 90/541 FISH, allele-specific fluorescence measure, PCR, qPCR

Hinds et al. (2006)203

24 ethnically diverse individuals (Discovery panel)

aCGH: High-density oligo custom array

NCBI reference human genome (build not indicated)

Cell lines

215

Median = 0.75kb (70bp – 10kb)

67%

100/215

PCR

Locke et al. (2006)204

269 HapMap individuals

aCGH: BAC array (2007 clones, targeting 130 segmental duplication regions)

Well-characterized single male sample (GM15724)

Cell lines

384 (in 222 regions if BACs < 250kb apart are merged)

Average = 436kb (145kb-1.4Mb)

67%

136/207

Custom high-density oligo array

Mills et al. (2006)205

36 individuals (different ethnic groups)

In silico: computational alignment of DNA resequencing traces from SNP studies to reference genome

NCBI reference human genome Build 35 (hg17)

n/a

294,498

2bp-9989bp

183/189

PCR, sequencing

Redon et al. (2006)206

270 HapMap individuals (4 ethnic groups)

aCGH: Whole Genome Tiling Path array (26,574 BACs) + SNP array intensity comparison: 500K SNP platform

Single male reference (NA10851) for aCGH; pairwise comparison between all samples for 500K

Cell lines

1447 merged CNVRs (913 on WGTP platform; 980 on 500K platform)

Average = 341kb (WGTP) 206kb (500K SNP)

~50%

173/1447; 43% of all CNVs

Locus-specific quantitative assay; replicated on both platforms

Simon-Sanchez et al. (2007)207

276 well-phenotyped Caucasians, from NINDS study

SNP array intensity comparison: (1) 109,365 gene-centric SNP array; (2) 300K SNP array

Reference genotyping clusters (used in Illumina-specific CNV-detection algorithms)

Cell lines

340

~20kb – 3Mb (for non-heterosomic CNVs)

5

13/24

qPCR replication of CNV detection in DNA from whole blood

Wong et al. (2007)208

95 samples (include healthy blood donors, cancer screening program participants, 16 distinct ethnic groups)

aCGH: BAC array (26,363 clones)

Single male reference

Whole blood, cell lines

3654

>40 kb

22% detected in >2 samples

265

Confirmed in 5 cases on oligo array

Levy et al. (2007)209

Single diploid genome of Craig Venter

In silico: random shotgun sequencing, comparison to NCBI reference genome; aCGH: 244K oligo array; 385 oligo array; 2 different SNP array platforms

NCBI reference genome Build 36 for one-to-one mapping of insertions/deletions; single male reference (NA10851) for aCGH and SNP array comparisons

Whole blood

919,584 indels (600 ≥ 1kb in size) + 62 CNVs

Indels = 1-82,711 bp (average 2.4-11.7bp); CNVs ~8kb-2Mb

n/a

37/40 indels

Comparison to fosmid clones from 8 other individuals

Korbel et al. (2007)210

2 previously analyzed female subjects: NA15510 (presumed European ancestry) and NA18505 (YRI)

In silico: paired-end sequence mapping (generated by next-generation massively parallel sequencing)

NCBI reference human genome Build 36

Cell lines

1175 total (422 in NA15510; 753 in NA18505)

Majority <10kb, but variants up to >1Mb detected

89% of 249 variants tested in individuals from 4 populations

132/261 (NA15510) 328/616 (NA18505) 95 (NA15510) 97 (NA18505) 31/48 (NA15510)

PCR (+ sequencing breakpoints in a subset of amplicons) Also present in Celera assembly aCGH with oligo tiling arrays comparing NA15510 to NA18505

Pinto et al. (2007)211

506 controls of North German descent (PopGen study)

SNP array intensity comparison: 500K SNP array

Multiple references

Cell lines

1023 CNVRs (430 high-confidence, i.e. detected by ≥ 2 algorithms)

Average size of “high-confidence” CNVRs = 369kb

4% of CNVRs in >2% of population

217/1010

Overlap with CNVRs called in 269 HapMap samples analyzed with algorithms identical to those used for PopGen

Wang et al. (2007)212

112 HapMap individuals (4 ethnic groups)

SNP array intensity comparison: 550K SNP array

Reference genotyping clusters (used in Illumina-specific CNV-detection algorithms)

Cell lines

2633

Average 31.5kb-61.2kb (depending on ethnic group)

52.6-74.8% of CNVs were also detected in parents

Assumes high heritability of CNVs, compares to CNVs called in parents

3 CNVs

PCR, re-sequencing of breakpoints

Zogopoulos et al. (2007)213

1190 controls from Ontario Familial Colorectal Cancer Registry (Canada); mostly Caucasian

SNP array intensity comparison: 100K and 500K arrays

Multiple references

Blood

578 CNVRs

Average = 408kb (12bp – 4.5Mb)

< 7% are detected in >1% of population

4

qPCR

de Smith et al. (2007)214

50 males (north French origin)

aCGH in 2 stages: (1) 185K oligo genome-wide array (in 35 individuals); (2) custom high-density 244K array

Pooled references for 185K array; single female reference (NA15510) for 244K array

Blood

9244 multi-probe CNVs (1469 CNVRs); 6089 single-probe CNVs (4705 CNVRs)

Median 4.4kb

45%

90-95% of common CNVRs detected on 185K array; 21

Replication on 244K array; PCR, MLPA

Jakobsson et al. (2008)215

485 individuals, from 29 populations (Human Genome Diversity Project)

SNP array intensity comparison: Illumina Infinium Human HapMap 500 Beadchip

Reference genotyping clusters (used in Illumina-specific CNV-detection algorithms)

Cell lines

3552 (map to 1428 loci)

Average = 82.7kb (deletion) 130.4kb (duplication) (2kb-998kb)

Perry et al. (2008)216

30 HapMap individuals (4 populations)

aCGH: Custom oligo array (470,163 probes) targeting CNVs previously detected by Redon et al. (2006)

Single male reference (NA10851)

Cell lines

2664 (map to 1153 loci)

CNVs 15-33% smaller than those detected by Redon et al. (2006) in the same samples

50%

23/51

Sequencing over breakpoints

Takahashi et al. (2008)217

80 healthy Japanese offspring of atomic bomb survivors

aCGH: 2238 BAC custom array

One male and one female Japanese

Cell lines

251 (mapping to 30 regions)

Average: 120kb (deletion) 160kb (duplication)

53%

14/14 rare CNV regions

qPCR, FISH, PFGE-Southern blot, sequencing

Wheeler et al. (2008)218

Single diploid genome of James Watson

In silico: next-generation sequencing, comparison to NCBI reference human genome + aCGH: 244K oligo array + 2.1 million probe array (3 experiments with 2 different references)

(sequence mapping) NCBI reference human genome Build 36; (aCGH) (a) standard Caucasian male reference and (b) NA10851

Blood

163,608 indels (by sequence comparison); 23 CNVs (by aCGH)

2bp-38,896bp (indels); 26kb-1.6Mb (CNVs)

n/a

Excellent concordance in CNV calls when using same reference on different oligo arrays (data not shown)

aCGH experiments against NA10851 on 244K and 2.1 million probe arrays

McCarroll et al. (2008)219

270 HapMap

SNP microarray (Affy6.0)

270 HapMap

Cell lines

3048 CNVs (1320 CNVRs)

50%

27 loci

qPCR

Cooper et al. (2008)220

9 HapMap

SNP microarray (Illumina)

Reference genotyping cluster

Cell lines

368

64-67%

Fosmid sequence alignment data

Kidd et al. (2008)221

8 HapMap samples (4 ethnic groups)

In silico: fosmid-end sequence pair mapping

NCBI reference human genome Build 35

Cell lines

7184 predicted non-redundant CNVs

>6kb

50%

1471

MCD analysis (multiple complete restriction enzyme digest); High-density oligo arrays and SNP arrays; Correlation to SNP genotyping data for 130 deletions; Full-length sequencing of fosmid clones

Bentley et al. (2008)222

Single YRI male (NA18507)

In silico: paired reads of massively parallel sequencing

NCBI reference human genome Build 36

Cell line

4116

n/a

Wang et al. (2008)223

Single Asian male (Han Chinese)

In silico: paired-end reads of massively parallel sequencing

NCBI reference human genome Build 36

Blood

2474

Median = 492 bp

n/a

Gusev et al. (2009)224

3000 individuals from Kosrae island (Micronesia)

In silico: uses novel algorithm to identify gaps in “identity-by-state” stretches of SNP genotypes

215

52

Used other computational methods and compared to previous reports

Itsara et al. (2009)225

2493

SNP microarrays (Illumina)

Cell lines; blood

13,843 (map to 3476 CNVRs)

77%

Cross-platform comparison (to CGH array)

Shaikh et al. (2009)226

2026 (1320 Caucasian; 694 African-American; 12 Asian-American)

SNP microarray (Illumina HumanHap550)

Reference genotyping clusters (used in Illumina-specific CNV-detection algorithms)

Blood

54,462 (non-unique CNVs map to 3272 CNVRs)

Median = 8kb

77.8%

16/20; 1753/2409; 19/21

qPCR; array-based comparison (Affymetrix vs. Illumina); comparison to previously published data on HapMap samples (Kidd et al.)

Kim et al. (2009)227

Single Korean male (AK1)

In silico: paired-end reads of massively parallel sequencing and end-sequences of BAC clones; aCGH: custom 24M microarray; SNP arrays

NCBI reference human genome Build 36; reference for CGH arrays not identified

Blood, sperm

315

277bp-2Mb

n/a

Sequence data complemented microarray data

Ahn et al. (2009)228

Single Korean male

In silico: paired-end reads of massively parallel sequencing

NCBI reference human genome Build 36

Blood

2920

0.1-100kb

n/a

2344

Detected in DGV (no direct confirmation)

Matsuzaki et al. (2009)229

90 HapMap YRI samples

aCGH: custom oligonucleotide microarrays

Signal compared to normalized signal of all 90 samples

Cell lines

6578

Median = 4.9kb

3850

31/40

qPCR (also compared to findings of previous studies: 87-99.97% agreement)

McKernan et al. (2009)230

Single YRI male (NA18507)

In silico: ABI SOLiD paired-end and split-reads (ligation-based sequencing assay)

NCBI reference human genome

Cell line

565

2-937kb

n/a

n/a

n/a

McElroy et al. (2009)231

385 African Americans and 435 White Americans

SNP array (Affy 500K)

50 African American females (derived from blood)

Cell lines + blood

1362 in African Americans + 1972 in White Americans (map to 412 African-American-unique CNVRs; 580 White-unique CNVRs; 76 shared CNVRs)

Mean duplication = 827kb; mean deletion = 703kb

174 CNVRs

3 loci

qPCR

Conrad et al. (2009)232

Discovery in 40 females (19 CEU + 20 YRI + 1 diversity panel); genotyping in 450 HapMap

Discovery: NimbleGen 42M arrays; genotyping: custom Agilent 105k arrays; SNP array (Illumina Infinium Human660W)

Discovery: NA10851; genotyping: pooled DNA of 10 European samples (9 males + 1 female)

Cell lines

11,700

Median = 2.7kb

49%

79/99 (qPCR); 15% FDR (microarray)

qPCR; other microarrays

Alkan et al. (2009)233

3 individuals

Read-depth of massively parallel sequencing reads

Reference human genome

Cell lines

725

97% of all variants

17/25

aCGH, FISH

Lin et al. (2009)234

813 Taiwanese individuals

Illumina 550K BeadChip

Reference genotyping cluster

Blood

4452 (map to 1025 CNVRs)

Mean = 497kb

365 CNVRs

279/365 CNVRs

Identified on Affy 500K array

Li et al. (2009)235

1000 Caucasians and 700 Han Chinese

SNP array (Affymetrix 500K)

Half the samples were used as references for the other half and vice versa

Blood

2381

Median = 195kb

27.6%

680/985 overlap DGV

Compared to DGV; no experimental validation

Altshuler et al. (2010)236

1184 (HapMap3, 11 populations)

SNP arrays (Affymetrix 6.0 and Illumina 1M arrays)

Reference genotyping clusters

Cell lines

856

Median = 7.2 kb

All CNPs detected in ≥ 1% of population

n/a

FDR of algorithms determined by comparing to CGH data for 34 individuals

Ju et al. (2010)237

Single Caucasian male (HapMap NA10851)

Data from previous aCGH studies that used NA10851 as reference + read-depth of NA10851 massively parallel sequencing

73 individuals (from Conrad et al. 2010 and Park et al. 2010)

Cell line

1309

Median = 2.7kb

n/a

n/a

n/a

Pang et al. (2010)238

Single diploid genome of Craig Venter

In silico: de novo assembly comparison; paired-end reads; split-reads; aCGH: Agilent 24M + NimbleGen 42M arrays; SNP arrays: Affymetrix 6.0 + Illumina 1M

NA15510 for Agilent 24M and NimbleGen 42M arrays

Whole blood

808,179 insertions or deletions (2641 ≥ 1kb)

(1-1.7Mb)

n/a

89/96 SVs identified by sequence analysis; 20/25 CNVs identified by microarrays; 11,140 SVs in common to this study and Levy et al.

Compared to SVs called in previous analysis of same genome (Levy et al.); PCR/qPCR

Park et al. (2010)239

30 females (10 Korean; 10 HapMap Chinese; 10 HapMap Japanese)

aCGH: 24M custom Agilent arrays

Single male reference (NA10851)

Cell lines

20,099 (map to 5177 loci)

Median = 2.7kb (438bp-1.1Mb)

39%

106/116 loci

qPCR

Teague et al. (2010)240

NA15510, NA10860, NA18994

Optical Mapping (single-molecule restriction mapping)

NCBI reference human genome Build 35

Cell lines

5416

3kb-megabases

>1/3 of all variants

42-61% (depends on platform being compared against)

Compared to fosmid-end sequencing, paired-end sequencing, SNP array (Affy6.0), tiling array CGH

Kidd et al. (2010)241

9 HapMap individuals

Identifying fosmid-end clones that did not map to reference genome

NCBI reference human genome Build 35

Cell lines

2363 novel insertion sites (corresponding to 720 loci)

Median = 1kb (1-20kb)

192 loci

Sequencing, genotyping

Kidd et al. (2010)242

17 individuals

Capillary end sequencing of fosmid clones

NCBI reference human genome Build 35

Cell lines

973

n/a

n/a

n/a

n/a

Schuster et al. (2010)243

5 individuals

Read depth; aCGH

NCBI reference human genome

Blood

187

n/a

n/a

n/a

n/a

Yim et al. (2010)244

3578 Korean individuals

SNP array (Affy5.0)

NA10851 + pooled 100 Korean females

Blood

144,207 (map to 4003 CNVRs)

Median 18.9kb

656 CNVRs in ≥ 1% of samples

14/16 loci

qPCR

Gayan et al. (2010)245

801 Spanish individuals

SNP array (Affymetrix 250 NspI array)

25 female samples from other studies

Blood

11,743

Median 150.7kb

623 CNVs present in >2 individuals

519 CNVs previously described

Comparison to DGV (no experimental validation)

The 1000 Genomes Project Consortium (2010)246; Mills et al. (2011)247

Three pilots: (1) 3 trios from 2 families – deep sequencing (avg 42x); (2) 179 unrelated individuals – low depth (2-6x); (3) deep sequencing of exons of 1000 genes in 697 individuals (avg >50x)

Paired-end mapping, read-depth analysis, split-read analysis, and sequence assembly of massively parallel sequencing

NCBI reference human genome

Cell lines

14,327

50bp - ~1Mb

<10% FDR

PCR, aCGH

Chen et al. (2011)248

2789 individuals from three European populations

SNP array (Illumina Infinium HumanHap300)

Reference genotyping cluster

Blood

4016 (map to 743 CNVRs)

Mean = 205kb

406

649 CNVRs

Overlap with reported CNVs in DGV (no experimental validation done)

Moon et al. (2011)249

Discovery: 100 Korean individuals Genotyping: 8842 Korean individuals

aCGH array (NimbleGen 3 x 720K) + SNP array (Affy 5.0)

NA10851

Blood

8779 (576 CNVRs chosen for frequency analysis)

Median length of 576 CNVRs = 113kb (1kb-4.56Mb)

807 CNVRs (576 chosen for frequency analysis in larger sample set)

66.7%-100% positive predictive values for 20 randomly chosen CNVRs

TaqMan assays

Studies listed in chronological order by publication date. CGH, comparative genomic hybridization; oligo, oligonucleotide; FISH, fluorescence in situ hybridization; ROMA, representational oligonucleotide microarray analysis; qPCR, quantitative polymerase chain reaction; BAC, bacterial artificial chromosome; YRI, Yoruba in Ibadan, Nigeria; CEU, Utah residents with ancestry from northern and western Europe; NCBI, National Center for Biotechnology Information; PFGE, pulsed-field gel electrophoresis; MLPA, multiplex ligation-dependent probe amplification

2.2 CNV Databases

The Database of Genomic Variants (DGV) (http://projects.tcag.ca/variation/) was founded in 2004, in conjunction

with the first published genome-wide surveys of CNVs by Sebat et al.197 and Iafrate et al.198, to catalogue

past and future discoveries of structural variants in the human genome. Curated by The Centre for

Applied Genomics (TCAG) in Toronto, the objective of this database is to summarize published data on

structural variation detected in healthy control samples, and it is periodically updated as new data

become available.198 At the time of writing, the DGV presents data from each study separately, merging only

overlapping CNV calls (in the same direction) across samples within the same study. Moreover, calls

made by different platforms in the same study are also presented separately. Regions are displayed in


relation to the human genome reference assembly (Build 35/May 2004 or Build 36/March 2006 or

GRCh37/Feb 2009). The latest version of the DGV (updated Nov 02, 2010) contains 101,923 entries

mapped to the human genome Build 36, corresponding to 66,741 CNVs >1kb (mapping to 15,963

genomic loci), 34,229 InDels (relative gains or losses between 100bp-1000bp in size), and 953 inversions.

Forty-two published articles are cited as the source of data in the DGV. A beta-version of the database

has been released (October 2011) which provides access to data in partner databases at European

Bioinformatics Institute (DGVa) and National Center for Biotechnology Information (dbVar). The DGVa

repository has been the primary supplier of data to the DGV. dbVar includes structural variants from

multiple species and also includes data from clinical studies (non-healthy populations). Future

submission of CNV data will be managed by DGVa and dbVar, while the role of DGV will be to

manually curate and visualize selected studies to allow better interpretation of the clinical significance of

CNVs.
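The within-study merging rule described above can be illustrated with a small interval-merging sketch. It is a minimal example under stated assumptions, not the DGV's actual pipeline: calls are assumed to overlap when they share at least one base, "direction" is read as gain versus loss, and the `CNVCall` class and `merge_calls` function are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class CNVCall:
    """Hypothetical CNV call; coordinates are 1-based and inclusive."""
    chrom: str
    start: int
    end: int
    kind: str  # "gain" or "loss"


def merge_calls(calls: list[CNVCall]) -> list[CNVCall]:
    """Merge overlapping calls that share a chromosome and a direction into
    CNV regions (CNVRs). Gains are never merged with losses."""
    regions: list[CNVCall] = []
    # Sorting by (chrom, kind, start) places mergeable calls next to each other.
    for call in sorted(calls, key=lambda c: (c.chrom, c.kind, c.start)):
        last = regions[-1] if regions else None
        if (last is not None and last.chrom == call.chrom
                and last.kind == call.kind and call.start <= last.end):
            last.end = max(last.end, call.end)  # extend the current region
        else:
            # Copy the call so the caller's objects are never mutated.
            regions.append(CNVCall(call.chrom, call.start, call.end, call.kind))
    return regions


calls = [
    CNVCall("chr1", 100, 500, "loss"),
    CNVCall("chr1", 400, 900, "loss"),
    CNVCall("chr1", 450, 600, "gain"),
    CNVCall("chr2", 100, 300, "loss"),
]
for region in merge_calls(calls):
    print(region)
```

Here the two overlapping losses on chr1 collapse into one region spanning 100-900, while the overlapping gain stays a separate region because it points in the opposite direction.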

Clinically significant CNVs (mainly those linked to genomic syndromes) are catalogued in DECIPHER250

(DatabasE of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources,

https://decipher.sanger.ac.uk) and ECARUCA251 (European Cytogeneticists Association Register of

Unbalanced Chromosome Aberrations, http://umcecaruca01.extern.umcn.nl:8080/ecaruca/ecaruca.jsp).

In addition, several data sources catalogue copy number alterations detected in tumors or

cancer cell lines. These include the Wellcome Trust Sanger Institute Cancer Genome Project252

(http://www.sanger.ac.uk/cgi-bin/genetics/CGP/conan/search.cgi) and the Pancreatic Expression

Database253 (http://www.pancreasexpression.org/).

2.3 Discovery and Genotyping of CNVs

A variety of platforms and algorithms have been applied for CNV detection, with a wide range of

resolution, coverage, and signal-to-noise ratio, resulting in significant non-overlap in the CNVs detectable

between different platforms used to study the same samples. The earliest studies mapping CNVs in the

human genome were based on fluorescence in situ hybridization (FISH) and spectral karyotyping and were

limited in resolution to variants of large size (>500kb), most of which were associated with disease.254

Later, genome-wide CNV mapping became possible with array comparative genomic hybridization

(aCGH), a technique in which fluorescently labeled DNA from two sources (test and reference) is competitively

hybridized to a single array of immobilized target DNA sequences, after which computational

algorithms are used to analyze the test-to-reference hybridization ratio. The DNA targets on the

arrays originally comprised Bacterial Artificial Chromosome (BAC) clones but later were made of long

oligonucleotides.195 Early CGH arrays were of low resolution (typical CNV size detectable by these

platforms was greater than 100kb), and they significantly overestimated the true number of bases affected


by CNVs.197,198 Later, high density oligonucleotide tiling CGH microarrays became available, allowing

more accurate determination of CNV breakpoints and detection of many more, smaller CNVs.232 One

important consideration in the use of CGH arrays for CNV detection is the reference sample. In any

given aCGH experiment, it is not possible to distinguish between a copy number loss on the test sample

versus a gain on the reference sample in the same region (or vice-versa), since both scenarios would

generate the same hybridization signal ratio. Moreover, a loss or gain present in both samples would be

entirely missed (since the signal ratio would appear to be 1). Ideally, the reference sample genome should

be well characterized using a variety of methods, and the same reference sample should be hybridized

against all test samples in an experiment to allow better comparison of the results. To date, several

individuals have had their genomes extensively mapped and have been used repeatedly in CNV studies

(HapMap NA10851, NA18507, NA15510).
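The reference-sample ambiguity described above can be made concrete with idealized log2 ratios. The sketch below is a simplification that assumes noise-free data and a hybridization signal exactly proportional to copy number; `log2_ratio` is a hypothetical helper, not part of any aCGH analysis package.

```python
import math


def log2_ratio(test_copies: int, ref_copies: int) -> float:
    """Idealized aCGH log2(test/reference) signal ratio at one probe,
    assuming signal is exactly proportional to copy number (no noise)."""
    return math.log2(test_copies / ref_copies)


scenarios = [
    ("loss in test (1 vs 2 copies)", 1, 2),
    ("gain in reference (2 vs 3 copies)", 2, 3),
    ("two-copy gain in reference (2 vs 4 copies)", 2, 4),
    ("same gain in both samples (3 vs 3 copies)", 3, 3),
    ("gain in test (3 vs 2 copies)", 3, 2),
]
for label, test, ref in scenarios:
    print(f"{label:45s} log2 ratio = {log2_ratio(test, ref):+.3f}")
```

A loss in the test sample and a two-copy gain in the reference give the identical ratio (-1.0), a single-copy gain in the reference still gives a negative ratio, and a gain shared by both samples gives 0 and would be missed. The same numbers show why single-copy losses are easier to detect than single-copy gains: the shift is -1.0 for 1 versus 2 copies but only about +0.585 for 3 versus 2.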

Another type of microarray used for CNV detection is the SNP array. Originally designed to genotype

SNPs for genome-wide association studies, these arrays contain multiple probes corresponding to each

selected SNP, and a single test DNA sample is hybridized to each array. Various computational

algorithms have been developed to analyze the hybridization intensity data to estimate copy number at

each SNP location; the two primary approaches are hidden Markov models and segmentation algorithms.

Earlier SNP arrays had lower resolution and coverage for CNV detection due to the nature of SNP

selection (focused on “tag SNPs” with minor allele frequencies of ≥ 1% to maximize coverage of the

genome while minimizing cost, and avoiding SNPs in regions prone to genotyping error, such as those showing

deviation from Hardy-Weinberg equilibrium or Mendelian inheritance errors).206,213 More recent SNP arrays

from Affymetrix and Illumina not only have a higher density of SNPs distributed genome-wide

(approximately 1 million) but also include probes for known CNV regions, hence allowing discovery of

smaller CNVs and the genotyping of polymorphic CNVs.219,220 Compared to CGH arrays, SNP arrays

have the added advantage of SNP genotype information, which can be used to detect CNVs (by analyzing

“B-allele frequency”, which represents the proportion of total allele signal that is represented by a single

allele) as well as provide information on loss-of-heterozygosity (LOH) and uniparental disomy (UPD).
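The B-allele frequency (BAF) idea can be illustrated by listing the BAF values expected for every genotype at a given copy number. This is an idealized sketch that assumes no noise and identical probe response for the two alleles; `expected_baf` and `has_heterozygous_band` are hypothetical helper names, not functions from a CNV-calling package.

```python
from fractions import Fraction


def expected_baf(copy_number: int) -> list[Fraction]:
    """Idealized BAF = B / (A + B) for every genotype possible at a given
    copy number (e.g. AA, AB, BB when copy_number == 2)."""
    if copy_number == 0:
        return []  # homozygous deletion: no allele signal at all
    return [Fraction(b, copy_number) for b in range(copy_number + 1)]


def has_heterozygous_band(copy_number: int) -> bool:
    """True if some genotype yields a BAF strictly between 0 and 1."""
    return any(0 < f < 1 for f in expected_baf(copy_number))


for cn in range(5):
    bands = ", ".join(str(f) for f in expected_baf(cn)) or "none"
    print(f"copy number {cn}: BAF bands {bands}")
```

A one-copy deletion removes the heterozygous band at 1/2, and a one-copy gain splits it into bands at 1/3 and 2/3. A region of copy-neutral LOH or UPD shows only the 0 and 1 bands, like a deletion, while its total intensity still reflects two copies; combining BAF with intensity is what allows SNP arrays to flag LOH and UPD.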

Both CGH and SNP microarrays are limited to detecting CNVs that map to regions present in the

reference genome that was the basis for the microarray build. Moreover, neither of those platforms

distinguishes between tandem and interspersed duplications, and they tend to be more sensitive in

detecting deletions than duplications (due to a higher signal ratio differential between 2 and 1 copies vs. 2

and 3 copies, for example).195 Furthermore, even the highest resolution arrays available lose sensitivity in

genome-wide detection of CNVs smaller than 10kb.219 Sequence-based methods have been used

increasingly to bridge the gap in mapping the full extent of variability of the genome. Even in the early

days of CNV discovery, several CNV papers were published based on mining of genotyping errors219-220,


fosmid paired-ends200,221, and massively parallel sequencing of the paired ends of 3-kb fragments.210

Since then, many more studies have utilized the data from next-generation sequencing technologies to

identify CNVs, although there remain substantial bioinformatic challenges associated with analyzing this

data. The four main methods of using sequencing data to identify CNVs are:255 (1) identifying read-pairs

whose mapping span is inconsistent with the reference genome; (2) identifying regions with significantly

increased or reduced read-depth compared to the distribution of read-depth across the (presumed diploid)

genome; (3) identifying “split-reads”, whereby there is a break in the alignment of a read relative to the

reference genome; (4) sequence assembly. To date, the most commonly used method has been read-pair

mapping. All four approaches are limited in their sensitivity, specificity, and breakpoint accuracy

depending on read length, insert size, and physical coverage.
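The first of these approaches, read-pair mapping, can be sketched as follows. The example assumes a paired-end library with a roughly normal insert-size distribution and flags a pair as discordant when its mapped span falls more than k standard deviations from the library mean. The 3-kb library, the cutoff k = 3, and the helper names are illustrative assumptions; real callers additionally require several supporting pairs and consider read orientation.

```python
from statistics import mean, stdev


def insert_size_model(concordant_spans: list[int]) -> tuple[float, float]:
    """Estimate the library insert-size mean and SD from properly paired reads."""
    return mean(concordant_spans), stdev(concordant_spans)


def classify_pair(span: int, mu: float, sd: float, k: float = 3.0) -> str:
    """Classify one read pair by how far apart its ends map on the reference."""
    if span > mu + k * sd:
        # Ends map farther apart than the fragment length: the sample lacks
        # reference sequence between them (candidate deletion).
        return "deletion"
    if span < mu - k * sd:
        # Ends map closer together than expected: the sample carries extra
        # sequence absent from the reference (candidate insertion).
        return "insertion"
    return "concordant"


# Hypothetical spans (bp) from a library of ~3-kb fragments.
mu, sd = insert_size_model([2900, 3000, 3100, 2950, 3050])
for span in (3100, 8000, 1500):
    print(span, classify_pair(span, mu, sd))
```

For a deletion-supporting pair, the mapped span minus the mean insert size (here 8000 - 3000 = 5000 bp) gives a rough estimate of the deletion size, whose precision depends on the insert-size spread, one reason breakpoint accuracy varies with library design.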

Future directions in CNV detection include nascent technologies such as optical mapping256, nanochannel

flow cells257, and emulsion picolitre droplet PCR258 that are being developed to allow high-throughput

detection of CNVs on an individual cellular and/or molecular level.

Multiple studies have demonstrated significant non-overlap between different platforms and algorithms

when analyzing the same samples.211,259 Given the variability in sensitivity and specificity of CNV

detection by the various platforms to date, validation is essential. Validation of detected CNVs has taken

two main forms in most studies: detection of the same (or overlapping) variants by different studies, and

replication within the same study (different array platform, PCR, qPCR, FISH, other experimental

methods). Overlap with regions identified in previous studies lends support to the variability of those

specific regions in the human genome, although many of the non-overlapping regions are also real (as

demonstrated by other replication methods). Similarly, replication on different platforms or with different

calling algorithms adds validity to detected CNVs in any tested sample, but regions identified by a single

approach can also be real. Experimental replication of CNVs provides the highest level of validation, but

those methods are often time-consuming and not optimized for high-throughput testing of multiple

regions and samples. As a result, most studies experimentally validated only a subset of their detected

CNVs (Table 2). However, high-throughput validation techniques have become available (e.g.

Sequenom©)260, so most CNVs published in the future should be confirmed more readily.
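Deciding whether two studies detected "the same (or overlapping)" variant requires an explicit overlap rule. A criterion often used in the CNV literature is 50% reciprocal overlap; it is assumed here purely for illustration and is not necessarily the rule applied by the studies in Table 2. The function names are hypothetical.

```python
def overlap_bp(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Bases shared by two closed intervals (start, end) on one chromosome."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def reciprocal_overlap(a: tuple[int, int], b: tuple[int, int],
                       min_fraction: float = 0.5) -> bool:
    """True if the shared segment covers at least min_fraction of BOTH intervals."""
    shared = overlap_bp(a, b)
    len_a = a[1] - a[0] + 1
    len_b = b[1] - b[0] + 1
    return shared >= min_fraction * len_a and shared >= min_fraction * len_b


print(reciprocal_overlap((1000, 2000), (1200, 2100)))  # 801 bp shared of 1001 and 901
print(reciprocal_overlap((1000, 2000), (1500, 3000)))  # 501 bp shared of 1001 and 1501
```

The second pair overlaps but fails, because the 501 shared bases are less than half of the longer 1501-bp call; a reciprocal rule is therefore stricter than simply requiring any overlap.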

While most early CNV studies focused on variant discovery, determination of disease association with

specific CNVs requires accurate genotyping of the CNVs of interest. A number of techniques have been

employed for genotyping, including PCR based (e.g. PCR across breakpoints; quantitative PCR;

multiplex methodologies that assay multiple loci at once), SNP-array based (e.g. customizing arrays using

Illumina GoldenGate© assay for specific CNVs; using tag SNPs to impute common CNVs that are in high

linkage disequilibrium (LD) with the tag SNP), aCGH-based (e.g. customized high-density tiling arrays


with probes for known CNVs), and sequencing-based (e.g. building a library of breakpoints discovered

and validated from previous sequencing-based studies and comparing future de novo sequences against it

to rapidly genotype CNVs in those locations; calibrating aCGH data using sequencing-based data to

obtain absolute copy numbers).195 Accurate genotyping is easier for deletions than duplications, and is

particularly challenging in multi-allelic regions.

2.4 Structure and mechanism of CNV formation

Several mechanisms of genomic rearrangement that predispose to duplications and

deletions have been identified, driven by structural motifs in the genome. One of the earliest observations in CNV surveys

was the association of CNVs with segmental duplications.197,198,199,200,206,208,209,212 Segmental duplications

(also called low-copy repeats or duplicons) are genomic regions ≥ 1kb in size and with ≥ 90% sequence

homology, present in multiple copies and covering approximately 5% of the human genome.261

Segmental duplications, particularly those with 97% or greater sequence identity and less than 10Mb

distance between them, can cause misalignment of homologous chromosomes or sister chromatids and

mediate non-allelic homologous recombination (NAHR), thus producing genomic duplications and

deletions of regions flanked by the segmental duplications.262 In addition, segmental duplications

themselves may be CNVs if they are not fixed in the human genome and vary in copy number

between individuals.199 Most recurrent CNVs appear to be caused by NAHR mediated by segmental

duplications.
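The NAHR criteria quoted above (at least 97% identity between paralogous copies lying less than 10Mb apart) can be expressed as a simple screen over pairs of segmental duplications. This is a minimal sketch: `SegDup` and `nahr_interval` are hypothetical names, the thresholds are taken from the text, and the sketch ignores the relative orientation of the two copies, which in practice determines whether NAHR produces deletions and duplications (direct repeats) or inversions (inverted repeats).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SegDup:
    """One copy of a segmental duplication; coordinates are 1-based, inclusive."""
    chrom: str
    start: int
    end: int


def nahr_interval(a: SegDup, b: SegDup, identity: float,
                  min_identity: float = 0.97,
                  max_distance: int = 10_000_000) -> Optional[tuple[str, int, int]]:
    """Return the unique segment between two paralogous copies that NAHR could
    delete or duplicate, or None if the pair fails the identity/distance criteria."""
    if a.chrom != b.chrom or identity < min_identity:
        return None
    left, right = sorted((a, b), key=lambda s: s.start)
    gap = right.start - left.end - 1  # bases strictly between the two copies
    if gap <= 0 or gap >= max_distance:
        return None
    return (a.chrom, left.end + 1, right.start - 1)


a = SegDup("chr7", 1_000_000, 1_100_000)
b = SegDup("chr7", 2_500_000, 2_600_000)
print(nahr_interval(a, b, identity=0.98))
print(nahr_interval(a, b, identity=0.95))
```

With 98% identity, the ~1.4-Mb segment between the copies is returned as a candidate NAHR-mediated CNV; at 95% identity the pair is rejected. In a real NAHR event the breakpoints fall within the duplicated copies themselves, so the rearranged segment extends into them; the sketch reports only the unique sequence in between.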

However, not all CNVs are associated with segmental duplications and other mechanisms have been

implicated in CNV formation. Different repetitive elements found in the breakpoint junctions of CNVs

include Alu SINEs, L1 LINEs, and long terminal repeats.210,247 Other mechanisms associated with CNV

formation include non-homologous end-joining (NHEJ), retrotransposition events (otherwise known as

mobile element insertion, or MEI), variable number tandem repeat (VNTR) expansion/contraction

events, replication fork stalling and template switching (FoSTeS), and microhomology-mediated break-induced

replication (MMBIR).263 In some cases, a parental inversion may predispose to de novo

unbalanced variants in the children, such as in the example of 17q21.31 microdeletion syndrome.264

Multiple studies have noted certain genomic locations as “hotspots” for CNVs, including 6cen, 8pter,

15q13-14, 11q11, 19q13, and 7q11.197,212,210,221 Some regions, such as 8p23, appear to be hotspots for

recombination as well as sequence variation, being enriched for both structural variants and

SNPs205,221. In a recent report analyzing next-generation sequencing data from the 1000 Genomes Project,

structural variants were found to cluster into hotspots by the mechanism of their formation, with VNTR

clustering near the centromeres and NAHR near the telomeres247. Possible explanations for genomic


variation hotspots include: older evolutionary age of the target genomic segments; biological functional

effect of involved regions driving selective pressure to maintain diverse alleles; or complete lack of

functional importance and selective pressure.205,247

2.5 Population Genetics of CNVs

The population genetics of CNVs is somewhat more complex than that of SNPs. Both forms of variation

may occur de novo or be inherited, but the de novo mutation rate for CNVs has been estimated to be 2-4

orders of magnitude greater than for single base mutations. Certain genomic regions are indeed

susceptible to recurrent rearrangements due to their structure (e.g. flanked by segmental duplications), but

when Mendelian inheritance was specifically investigated, most common CNVs were found to be inherited

from a parent.219

Studies have differed in their power to detect common versus rare CNVs, thus yielding

conflicting data on the proportion of CNVs in the genome that are polymorphic (>1%). Earlier SNP

arrays and lower-resolution CGH arrays tended to be biased against common CNVs, so the majority of

CNVs identified using those platforms were rare in the general population. However, higher resolution

SNP arrays (such as Illumina 1M and Affymetrix 6.0) as well as very high-density CGH custom arrays

succeeded in detecting and genotyping a significant proportion of common CNVs over 1kb, and it is

evident that most of the variation between any two individuals at that resolution is due to common CNVs

that obey Hardy Weinberg Equilibrium.219,232 Sequencing based technologies have been identifying more

CNVs at a smaller size, and the data is a mix of rare and common CNVs.247
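
Checking whether a genotyped CNV "obeys Hardy-Weinberg equilibrium" is a routine quality-control step. The sketch below is a minimal illustration, assuming a simple biallelic deletion whose copy numbers 2, 1 and 0 map to the genotypes ref/ref, ref/del and del/del, and assuming every call is correct; the function name and the genotype counts are hypothetical and not taken from the cited studies.

```python
import math


def hwe_chi_square(n_two_copy: int, n_one_copy: int, n_zero_copy: int) -> tuple[float, float]:
    """Chi-square test of Hardy-Weinberg equilibrium for a biallelic deletion CNV.

    Assumes copy numbers 2, 1 and 0 correspond to ref/ref, ref/del and del/del.
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    n = n_two_copy + n_one_copy + n_zero_copy
    # Deletion-allele frequency: one deletion allele per 1-copy carrier, two per 0-copy carrier.
    q = (n_one_copy + 2 * n_zero_copy) / (2 * n)
    p = 1.0 - q
    observed = (n_two_copy, n_one_copy, n_zero_copy)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts.
print(hwe_chi_square(64, 32, 4))   # exactly at HWE for a deletion frequency of 0.2
print(hwe_chi_square(50, 0, 50))   # no heterozygotes: strong departure from HWE
```

A strong departure from equilibrium does not necessarily mean selection; for CNVs it more often points to genotyping problems, such as heterozygous deletions being missed or a multi-allelic CNV being forced into a biallelic model.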

Most common CNPs are biallelic (with a bias toward detecting deletions on the platforms used), and most of these were found to be well tagged by SNPs of similar frequencies, suggesting that they are ancestral events.219 CNPs that are in strong LD with tagging SNPs can easily be genotyped indirectly through those SNPs in association studies, which facilitates their study. However, SNP “taggability” depends on the frequency of the CNV as well as on the density of nearby SNPs, meaning that some lower-frequency CNVs, or CNVs in regions sparsely populated by SNPs, will need to be genotyped directly. The same is true for complex CNVs and CNVs with multiple copy-number alleles, as these also tend to be in poor LD with nearby SNPs.
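
Whether a SNP "tags" a CNV is usually quantified by the linkage disequilibrium measure r². The sketch below approximates r² as the squared Pearson correlation between per-individual allele dosages, a common shortcut when haplotype phase is unknown (an assumption here; phased or EM-based estimates are more rigorous). It also assumes a biallelic deletion, so the deletion dosage is 2 minus the copy number. Function names, genotypes, and the 0.8 threshold are illustrative assumptions.

```python
def deletion_dosage(copy_number: int) -> int:
    """Number of deletion alleles carried, assuming a biallelic deletion (2 copies = none)."""
    return 2 - copy_number


def genotype_r2(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between two per-individual allele-dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a monomorphic variant has no defined correlation")
    return sxy * sxy / (sxx * syy)


# Hypothetical genotypes for six individuals.
cnv_copy_number = [2, 1, 0, 2, 1, 2]
snp_alt_dosage = [0, 1, 2, 0, 1, 0]
r2 = genotype_r2([deletion_dosage(c) for c in cnv_copy_number], snp_alt_dosage)
# A conventional (assumed) threshold for calling a CNV "well tagged" is r2 >= 0.8.
print(r2, r2 >= 0.8)
```

Because r² is bounded by the difference in allele frequencies between the two variants, a rare CNV cannot be well tagged by a common SNP, which is one reason lower-frequency CNVs need direct genotyping.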

Studies in populations of different ethnicities have suggested population differentiation in the frequency

of some CNVs, and some CNVs do appear to be population-specific.227-229,232 In keeping with the “out of

Africa” hypothesis, African populations have been found to have a higher number of rare or low-

frequency CNVs than non-African populations.229 These findings emphasize the importance of matching

the ethnicity of cases and controls in association studies to minimize spurious associations of population-

specific CNVs with disease.

2.6 Phenotypic impact of CNVs

The earliest known CNVs, usually large genomic deletions and duplications often encompassing many genes, were invariably linked to significant genomic disorders. With the discovery of ubiquitous CNVs in healthy controls, interpreting the functional significance of such genomic alterations became more complex. Of note, many studies have observed a bias against genic CNVs in general, and against large genic deletions in particular265,232, suggesting that such alterations tend to reduce fitness and are subject to purifying selection. Interestingly, there is also some evidence of positive selection (or potentially reduced purifying selection266) acting on some genes, such as the salivary amylase gene AMY1, which is present at higher copy number in humans than in other primates, and at higher copy number in human populations with high-starch diets than in populations with traditionally low-starch diets.267 In contrast, many common CNVs have been identified at high frequencies in all human populations and appear to have only a modest effect, if any, on phenotype.

Early CNV surveys identified a large number of genes as copy-number variable, but these results must be interpreted with care: early platforms tended to overestimate the size of CNVs, so the number and identity of the genes reported as involved in earlier studies may be inaccurate. However, even more recent studies, with the power to identify smaller CNVs with more accurate breakpoints, have detected thousands of genes that are affected at least in part by deletions or duplications. For example, Pang et al.238 reported an extensive analysis of the diploid genome of Dr. Craig Venter based on multiple microarray and sequencing platforms; they identified 189 genes completely encompassed by gains or losses and an additional 4,867 genes whose exons were affected by CNVs. Although they found an overall paucity of CNVs affecting genes associated with autosomal dominant or recessive diseases or cancer syndromes, as well as imprinted and dosage-sensitive genes, 573 of the CNV-affected genes were listed in the Online Mendelian Inheritance in Man (OMIM) database. Conrad et al.232 used a discovery cohort of 20 CEU and 20 YRI HapMap individuals to detect common CNVs using a high-density CGH array, then genotyped 450 HapMap samples at approximately 5,000 common CNVs. On average, any two individuals differed at 1,098 CNVs, of which 445 overlapped a total of 622 genes, and 2,698 genes were affected by CNVs across the whole sample set. Over half of the partial gene deletions were predicted to induce frameshifts, and 267 genes appeared to be affected by unambiguous loss-of-function CNVs. Genes affected by CNVs appeared to be enriched for extracellular functions such as cell adhesion, recognition, and communication, and depleted for intracellular functions such as metabolic and biosynthetic pathways. These results are consistent with those of earlier and subsequent CNV surveys, which also reported enrichment for immune and defense responses as well as neurological system processes.239,268,247 These latter functions have also been proposed to be involved in the adaptive differentiation of humans and chimpanzees.269
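
Surveys such as Pang et al. distinguish genes completely encompassed by a CNV from genes whose exons are only partly affected. The sketch below shows one simple way to make that distinction for a single CNV-gene pair, assuming 0-based half-open coordinates on one chromosome; the `Gene` class, category labels, and coordinates are hypothetical and not taken from any cited study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Gene model with 0-based, half-open [start, end) coordinates on one chromosome."""
    name: str
    start: int
    end: int
    exons: tuple[tuple[int, int], ...]


def _overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    # Half-open intervals overlap if each starts before the other ends.
    return a_start < b_end and b_start < a_end


def classify_cnv(cnv_start: int, cnv_end: int, gene: Gene) -> str:
    """Return 'encompassed', 'exonic', 'intronic' or 'none' for one CNV-gene pair."""
    if cnv_start <= gene.start and cnv_end >= gene.end:
        return "encompassed"          # the whole gene is gained or lost
    if any(_overlaps(cnv_start, cnv_end, s, e) for s, e in gene.exons):
        return "exonic"               # partial overlap that hits at least one exon
    if _overlaps(cnv_start, cnv_end, gene.start, gene.end):
        return "intronic"             # inside the gene, exons untouched
    return "none"


# Hypothetical gene with three exons.
gene = Gene("GENE_X", 1_000, 5_000, ((1_000, 1_200), (3_000, 3_200), (4_800, 5_000)))
for cnv in [(500, 6_000), (2_900, 3_100), (1_500, 2_500), (6_000, 7_000)]:
    print(cnv, classify_cnv(*cnv, gene))
```

Checking full containment first matters: a CNV that encompasses the gene also overlaps its exons, but its likely effect (a change in whole-gene dosage) differs from that of a partial exonic deletion, which may instead cause a frameshift.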

The exact contribution of CNVs to variability in gene expression, and how it relates to that of SNPs, is unclear. Stranger et al.270 examined the contribution of CNVs detected by Redon et al.206 (on a BAC-CGH array and the Affymetrix 500K array) to gene expression variability in lymphoblastoid cell lines from 210 HapMap samples, considering CNVs within 2 Mb of each gene, and found that 17.7% of 1,061 genes with variable expression were associated with CNVs; over half of these associations appeared to be long-range (i.e. the CNV did not overlap the gene whose expression it appeared to affect). SNPs captured 83.6% of the detected variability, and the overlap between the two signals was small: only 1.3% of genes were associated with both CNVs and SNPs. Schlattl et al.271 extended this analysis by comparing normalized transcriptome data for lymphoblastoid cell lines (LCLs) from 60 CEU and 69 YRI HapMap samples with CNV data published for the same samples on multiple platforms (a high-resolution tiling CGH array232, a high-resolution SNP array219, and next-generation sequencing247). Focusing on common CNVs and restricting the effect range to 200 kb or less, they found a significant association between CNVs and the expression of 110 genes. Despite an abundance of deletions in the CNV set, Schlattl et al.271 found an enrichment of duplications among CNVs associated with variable expression, suggesting purifying selection against deletions that affect gene expression. When comparing their results with previously published studies, the authors confirmed several CNV-gene expression associations, including 6 of the 13 identified by Stranger et al.270 within the same effect range. Most CNV associations (70%) occurred without overlap between the CNV and the respective gene, although the range of effect appeared to be <100 kb in most cases. Interestingly, several intronic deletions were associated with gene expression, with expression decreased in half of the cases and increased in the other half. A similar mix of positive and negative effects on expression was observed for CNVs that did not directly overlap genes. CNVs that overlapped exons or completely encompassed genes usually affected expression in the same direction as the copy-number change. Unlike Stranger et al.270, Schlattl et al.271 found that most CNVs associated with gene expression (70%) overlapped previously published SNP-expression associations. This discrepancy likely reflects differences in the CNVs detectable by earlier platforms (more rare than common CNVs, biased away from common SNPs) relative to the platforms used by Schlattl et al.271

Conrad et al.232 proposed that, since most common genotyped CNVs were well tagged by SNPs, SNP-based genome-wide association studies would already have indirectly screened most common CNVs for association with common diseases. Based on the finding by Conrad et al.232 that less than 5% of trait-associated SNPs in 279 publications were in linkage disequilibrium > 0.5 with a nearby CNV, and the additional finding by the Wellcome Trust Case Control Consortium that only three CNV loci were reliably associated with one or more of eight common diseases (all of which are tagged by SNPs previously detected in genome-wide association studies), the authors of those papers argued that common genotyped CNVs do not explain a significant proportion of the heritability of common diseases. Nonetheless, the findings of Schlattl et al.271 indicate that a non-negligible proportion of CNVs associated with gene expression variability are not tagged by SNPs; moreover, 57% of genes whose expression was associated with CNVs showed a stronger correlation with their most strongly associated CNV than with any nearby SNP. This was especially true for CNVs that overlap exons (10/10). Studies of CNVs in mice, rats, and Drosophila have observed a similar impact of CNVs on gene expression.272-274
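
The comparison reported by Schlattl et al. (whether a gene's expression correlates more strongly with a CNV than with any nearby SNP) can be illustrated in a simplified form. The sketch below assumes a single gene, no covariates, no population stratification, and no multiple-testing correction, all of which a real eQTL analysis would need; it is not the authors' pipeline, and the function names and data are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x: list[float], y: list[float]) -> float:
    """Least-squares change in expression per additional copy."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def cnv_vs_snps(expression, copy_number, snp_dosages):
    """Return (slope per copy, |r| for the CNV, best SNP id, |r| for that SNP)."""
    r_cnv = abs(pearson_r(copy_number, expression))
    best_snp, r_snp = max(
        ((snp_id, abs(pearson_r(d, expression))) for snp_id, d in snp_dosages.items()),
        key=lambda item: item[1],
    )
    return ols_slope(copy_number, expression), r_cnv, best_snp, r_snp


expression = [1.0, 2.0, 3.0, 2.1, 0.9]   # hypothetical normalized expression values
copy_number = [1, 2, 3, 2, 1]
snps = {"snp_a": [0, 1, 0, 1, 2], "snp_b": [0, 0, 1, 1, 1]}
slope, r_cnv, best_snp, r_snp = cnv_vs_snps(expression, copy_number, snps)
print(slope, r_cnv, best_snp, r_snp)
```

A positive slope corresponds to expression changing in the same direction as copy number, the pattern described above for CNVs that overlap exons or encompass genes.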

Many diseases have been associated with CNVs. Recurrent de novo microdeletions and microduplications are linked to many sporadic genomic disorders, such as Williams-Beuren syndrome, Angelman syndrome/Prader-Willi syndrome, Charcot-Marie-Tooth disease type 1A, and idiopathic mental retardation.195 Rare CNVs (de novo or heritable) have been associated with neuropsychiatric disorders such as autism spectrum disorder and schizophrenia; neurodegenerative diseases such as Parkinson disease275; and metabolic disorders such as obesity276, among others. Common heritable CNVs have been associated with autoimmune and infectious diseases such as Crohn’s disease277, rheumatoid arthritis278, diabetes mellitus278, psoriasis279, lupus280, and susceptibility to HIV infection281. Both rare and common CNVs have also been associated with susceptibility to cancer, as discussed below.

Determining the pathogenicity of CNVs, and delineating the responsible gene(s) or genomic elements, can be challenging. CNVs may affect phenotype in a number of ways, including: increasing or decreasing the copy number of dosage-sensitive genes; disrupting genes or producing fusion genes; position effects; unmasking recessive alleles; and affecting communication between alleles on homologous chromosomes.264 The effect of CNVs is also moderated by variable penetrance and expressivity.264 Some CNVs have been associated with a wide range of phenotypes (e.g. CNVs at 1q21.1 have been associated with dysmorphic features, cardiac abnormalities, learning difficulties, mental retardation, autism, and schizophrenia)282; this may reflect ascertainment bias due to study design (e.g. phenotype-driven vs. genotype-driven ascertainment)264 but may also reflect variable expressivity. Some studies have also demonstrated a buffering effect in cells, whereby the observed expression level of a given gene does not scale linearly with the level expected from its copy number.271,272 It should be noted that, in addition to copy number, the phase and genomic context of CNVs are also important for understanding the potential effect of a variant.264
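
One way to express the "buffering" described above is as the slope of log-expression on log-copy-number. The sketch below assumes a simple power-law model (expression proportional to copy number raised to some exponent), under which a slope of 1 means fully proportional dosage and a slope below 1 means partial compensation. The model, the function name, and the data are illustrative assumptions, not the approach of the cited studies.

```python
import math


def dosage_elasticity(copy_number: list[float], expression: list[float]) -> float:
    """Slope of log2(expression) on log2(copy_number / 2).

    Under an assumed power-law model, 1 means proportional scaling with copy
    number and values below 1 indicate buffering. Copy number must be > 0,
    because log2(0) is undefined.
    """
    x = [math.log2(c / 2) for c in copy_number]
    y = [math.log2(e) for e in expression]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


copies = [1, 2, 3, 4]
proportional = [5.0, 10.0, 15.0, 20.0]                 # hypothetical: linear dosage
buffered = [10.0 * (c / 2) ** 0.5 for c in copies]     # hypothetical: partial compensation
print(dosage_elasticity(copies, proportional))
print(dosage_elasticity(copies, buffered))
```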

Other challenges in CNV research include distinguishing germline from somatic alterations. Many studies used DNA from immortalized lymphoblastoid cell lines, and it has become apparent that some structural variants occur exclusively in, or may be amplified by, the Epstein-Barr virus (EBV) transformation process.278,283 Moreover, few studies have addressed somatic mosaicism or heterosomy (variants present in only a fraction of the cells in a tissue or blood sample), since most platforms and algorithms are not designed to identify the “partial” nature of these regions, and few studies have compared the genomes of different tissues from the same individual.207,212,284 One survey of large structural variations in blood-derived DNA from 957 controls and 1,034 bladder cancer patients identified mosaic structural variations in 1.7% of all individuals, with no significant difference between cases and controls.285 The regions most commonly found to carry somatic or cell-line-artifact variants are the T-cell receptor and immunoglobulin loci, including 2q11 (ref. 200), 2p11.2 (refs. 208,212), 22q11.2 (refs. 200,208,212), 14q32.3 (refs. 200,208,212), and 14q11.2 (ref. 212); chromosomes 9 and 20 were also frequently affected.285 Interestingly, some studies have identified copy-number differences within monozygotic twin pairs, both phenotypically concordant and discordant, suggesting that CNVs can arise somatically after twinning.286,287,288
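
Why mosaic events are hard to detect can be seen from the B-allele frequency (BAF) at heterozygous SNPs on SNP arrays. Mixing normal A/B cells with a fraction c of abnormal cells gives an expected shift of |BAF - 0.5| equal to c/(2(2 - c)) for a single-copy deletion, c/2 for copy-neutral LOH, and c/(2(2 + c)) for a single-copy gain. The sketch below encodes these mixture formulas and their inverses, assuming noise-free BAF, equal probe response to both alleles, and a single event type; the function names are hypothetical. Under these assumptions, a deletion present in 10% of cells shifts the BAF by only about 0.026.

```python
def expected_baf_shift(c: float, event: str) -> float:
    """Expected |BAF - 0.5| at heterozygous SNPs when a fraction c of cells carries the event.

    Derived from allele counts in a mixture of normal (A/B) cells and abnormal cells:
    a deletion leaves A only, copy-neutral LOH gives A/A, and a gain gives A/A/B.
    """
    if event == "deletion":
        return c / (2 * (2 - c))
    if event == "cn_loh":
        return c / 2
    if event == "gain":
        return c / (2 * (2 + c))
    raise ValueError(f"unknown event type: {event}")


def mosaic_fraction(shift: float, event: str) -> float:
    """Invert expected_baf_shift to estimate the fraction of affected cells."""
    if event == "deletion":
        return 4 * shift / (1 + 2 * shift)
    if event == "cn_loh":
        return 2 * shift
    if event == "gain":
        return 4 * shift / (1 - 2 * shift)
    raise ValueError(f"unknown event type: {event}")


print(expected_baf_shift(0.1, "deletion"))   # small shift for a 10% mosaic deletion
print(mosaic_fraction(0.05, "deletion"))     # hypothetical observed shift of 0.05
```

In practice, BAF shifts are combined with log R ratio to decide between deletion, gain, and copy-neutral LOH before estimating c, because the same shift implies a different cell fraction for each event type.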

2.7 CNVs and cancer

Chromosomal aneuploidy, whether involving entire chromosomes, chromosome arms, or chromosomal segments, is a characteristic feature of most solid malignant tumors. Chromosomal instability (CIN) refers to a high rate of loss and gain of whole chromosomes and has been attributed to various mechanisms that interfere with correct chromosome segregation during mitosis.289 Chromosomal structural instability (CSI) is another hallmark of most solid cancers, involving multiple segmental chromosomal breakages and fusions associated with telomere shortening, inappropriate repair of DNA double-strand breaks, and chromosomal fragile sites, resulting in amplifications or deletions of the involved genomic regions. A “chicken-or-egg” debate has revolved around the relationship of CIN and CSI to the development of cancer: not all aneuploid cells are unstable or tumorigenic, and many copy-number alterations in tumors appear to be “passengers” rather than driver mutations. Nonetheless, there is evidence that CIN and CSI contribute to cancer development, for example by generating LOH at loci of inactivated tumor suppressor genes or by amplifying oncogenes.290 Two decades ago, comparative genomic hybridization (CGH) was developed to identify regions of copy-number gain and loss by hybridizing differentially labeled tumor and normal DNA to metaphase chromosome spreads. Several years later, array-based CGH was introduced and became a commonly used tool in the study of cancer genomes. SNP microarrays subsequently came into use, providing the added advantage of detecting regions of copy-neutral LOH and uniparental disomy. Very recently, the falling cost of whole-genome and exome sequencing has allowed these technologies to be used to identify a wide range of variants in tumors, from single-base changes to large structural variants.
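
Array CGH reports copy number as log2(test/reference) ratios, and contaminating normal cells compress these ratios toward zero. The sketch below computes the expected ratio for a tumor segment, assuming a diploid normal reference, diploid contaminating normal cells, a given tumor purity, and no ploidy normalization; the function name is hypothetical. Under these assumptions, a single-copy loss in a 50%-pure sample gives about -0.415 rather than -1.

```python
import math


def expected_log2_ratio(tumor_copies: int, purity: float) -> float:
    """Expected log2(test/reference) for a segment, assuming a diploid normal
    reference, diploid contaminating normal cells, and no ploidy normalization."""
    mixed_copies = purity * tumor_copies + (1 - purity) * 2
    if mixed_copies == 0:
        return float("-inf")  # homozygous deletion in a perfectly pure sample
    return math.log2(mixed_copies / 2)


for purity in (1.0, 0.5):
    print(purity, [round(expected_log2_ratio(n, purity), 3) for n in range(5)])
```

This compression is one reason focal homozygous deletions, which are central to the tumor-suppressor discoveries described next, can be missed or misclassified as hemizygous in samples with low tumor content.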

In keeping with Knudson's classical two-hit hypothesis for the inactivation of tumor suppressors, a number of well-known tumor suppressor genes were first identified by analyzing focal homozygous deletions in tumors in combination with linkage and/or LOH results (e.g. CDKN2A/B, PTEN, WT1, BRCA2). Those discoveries spurred the identification of numerous candidate tumor suppressors through the characterization of recurrent deletions in tumors or cancer cell lines. Mouse studies have even suggested that haploinsufficiency of some cancer genes can be sufficient to cooperate with other oncogenic alterations in initiating tumor development (e.g. LKB1 and BRCA2 heterozygosity have been reported to accelerate pancreatic tumor development in mice with activated Kras mutations). Similarly, genomic amplifications in cancer can help identify candidate oncogenes. Moreover, some deletions and amplifications carry prognostic significance (e.g. MYCN amplification in neuroblastoma, ERBB2 amplification in breast cancer, 18q deletion in colon cancer), and whole-genome profiling of copy-number alterations in tumors can have diagnostic, prognostic, or predictive value (e.g. distinguishing gastrointestinal stromal tumors from leiomyosarcomas291; an aCGH classifier based on BRCA1-mutated breast cancer that predicts sensitivity to DNA double-strand-break-inducing chemotherapy in patients without germline BRCA1/2 mutations292).
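
Characterizing recurrent deletions across tumors often starts by narrowing overlapping deletions to the segments shared by many samples. The sketch below is a generic sweep-line over deletion intervals on one chromosome, not a specific published method. Statistical frameworks such as GISTIC additionally model amplitude and background rates. It assumes 0-based half-open coordinates and at most one deletion per sample at any position, and the sample data are hypothetical.

```python
def recurrent_segments(deletions, min_samples):
    """Maximal segments deleted in at least `min_samples` samples.

    `deletions` is a list of (sample_id, start, end) on one chromosome with
    half-open [start, end) coordinates; each sample is assumed to contribute at
    most one deletion to any position (merge a sample's overlapping calls first).
    """
    events = []
    for _sample, start, end in deletions:
        events.append((start, +1))
        events.append((end, -1))
    # Sorting puts ends (-1) before starts (+1) at equal positions, so abutting
    # half-open intervals are not counted as overlapping.
    events.sort()
    segments = []
    depth, prev = 0, None
    for pos, delta in events:
        if prev is not None and pos > prev and depth >= min_samples:
            if segments and segments[-1][1] == prev:
                segments[-1][1] = pos          # extend a contiguous recurrent segment
            else:
                segments.append([prev, pos])
        depth += delta
        prev = pos
    return [tuple(s) for s in segments]


# Hypothetical deletions in three tumors.
dels = [("t1", 100, 500), ("t2", 300, 800), ("t3", 350, 450)]
print(recurrent_segments(dels, 2))
print(recurrent_segments(dels, 3))
```

Raising `min_samples` shrinks the output toward the minimal common region, which is where a candidate tumor suppressor would typically be sought.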

Structural rearrangements in pancreatic adenocarcinoma have been described in multiple studies, using methods ranging from cytogenetic karyotyping293 and microsatellite genotyping12,182,294 to CGH295-306, SNP microarrays37,307,308,309, and next-generation sequencing38. Certain patterns have emerged: genomic rearrangements have been observed on every chromosomal arm, and the most frequently reported rearrangements are losses on 1p, 3p, 6p, 6q, 8p, 9p, 9q, 17p, 18q, and 19p and gains on 8q. Some studies attempted to identify candidate tumor suppressor genes or oncogenes, and although most results were of insufficient resolution to pinpoint a target gene, certain genes were highlighted by multiple studies using a combination of genomic and expression data (e.g. SMURF1 on 7q22.1 (refs. 301,303) and GATA6 on 18q11.2 (refs. 304,310) were proposed as novel oncogenes). LOH is a common event across the pancreatic cancer genome, often occurring in the form of whole-chromosome loss, and no significant difference in the pattern of LOH was found between sporadic and familial tumors.12,182 One recent study that used massively parallel sequencing to detect variants at fine resolution in 3 primary tumors and 10 metastases reported significant inter-patient heterogeneity in the number, type, and distribution of rearrangements.38 Interestingly, one sixth of all rearrangements followed a pattern the authors termed “fold-back inversions”, in which regions are duplicated with the duplicated copies facing in opposite directions. This appeared to be an early event in the development of pancreatic cancer and was associated with telomere loss. Moreover, sequence analysis of metastases indicated that this type of rearrangement did not continue to occur later in pancreatic cancer development, suggesting a reactivation of telomere repair function. Other interesting findings from this analysis of somatic rearrangements in pancreatic cancer metastases were: evidence of ongoing clonal evolution in the primary tumor among cells capable of initiating metastases (based on finding some rearrangements in only some metastases); evidence for driver mutations involved in metastatic spread (based on finding some rearrangements in the metastases but not in the primary tumor); and evidence for differences in the evolution of metastases within each organ.

Less well studied than somatic genomic rearrangements in cancer is the relationship between germline CNVs and cancer susceptibility. It is well established that rare germline CNVs of moderate-to-high penetrance contribute to the heritability of familial cancer. Large germline genomic rearrangements that are absent or rare in healthy populations have been reported as the cause of 15% of Familial Adenomatous Polyposis (APC)311, 19% of Von Hippel-Lindau disease (VHL)312, 4% of Hereditary Diffuse Gastric Cancer (CDH1)313, 2-12% of Hereditary Breast and Ovarian Cancer (BRCA1 and BRCA2)314-320, 6-27% of Lynch Syndrome (MSH2 and MLH1)321,322, 16% of Peutz-Jeghers Syndrome (STK11)323, and 15% of juvenile polyposis (SMAD4, BMPR1A, and PTEN)324 cases. Deleterious germline CNVs have also been reported in non-BRCA1/2-associated familial breast cancer (PALB2, ref. 325; BARD1, ref. 326), Hereditary Leiomyomatosis and Renal Cell Cancer (FH)327, Cowden disease (PTEN)328, Familial Atypical Multiple Mole Melanoma (CDKN2A)329, Neurofibromatosis Type 1 (NF1)330, Ataxia Telangiectasia (ATM)331, Li-Fraumeni syndrome (TP53)332, familial retinoblastoma (RB1)333, and Multiple Endocrine Neoplasia Type 1 (MEN1)334. Interestingly, there are examples of copy-number alterations located at a distance from the coding region of a gene that influence its expression, either by affecting regulatory elements or by inducing epigenetic changes that inactivate the gene. For example, in approximately 20% of suspected Lynch syndrome cases with MSH2 loss but no detectable germline mutations or rearrangements in MSH2335 (about 1-3% of all Lynch Syndrome patients336), the causative mutation is a large heritable deletion at the 3’ end of the TACSTD1 gene, which causes transcriptional read-through and epigenetic silencing of the adjacent MSH2 gene. In one juvenile polyposis kindred with 10 affected members who had no mutations or rearrangements in the coding regions of SMAD4 and BMPR1A, Calva-Cerqueira et al.337 identified a large deletion mapping 119 kb upstream of the coding region of BMPR1A that segregated with disease. The deletion affected a promoter of BMPR1A and was shown to diminish expression of the gene.

Common copy-number polymorphisms at some cancer-linked genes have also been associated with modest risk. For example, the glutathione S-transferases (GSTs) constitute a family of genes involved in drug and toxin metabolism and are therefore hypothesized to protect cells against xenobiotics and oxidative stress. Two of these genes, GSTT1 and GSTM1, have polymorphic deletions shown to correlate with lowered enzyme activity. In one recent study that accurately quantified the copy number of these genes in approximately 2,000 cancer patients and 8,000 controls, a gene-dosage effect was demonstrated for GSTT1 in prostate cancer in men and corpus uteri cancer in women, and for GSTM1 in bladder cancer.338 Another notable association between a common copy-number polymorphism and cancer was identified in familial breast cancer for a deletion that eliminates exon 4 of MTUS1, a gene implicated as a tumor suppressor. Surprisingly, the common deletion was found to have a protective effect against breast cancer, suggesting that loss of exon 4 may paradoxically increase the tumor suppressor activity of the gene (although this has yet to be demonstrated in functional studies).339
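
A "gene-dosage effect" means that risk changes steadily across copy-number classes (0, 1, 2 copies) rather than only between carriers and non-carriers. A standard way to test for such a trend is the Cochran-Armitage test, with copy number as the ordered score. The sketch below uses hypothetical counts and is not necessarily the method used in the cited GST study (ref. 338). A positive z means cases tend to have higher copy number, so a risk-increasing deletion would give a negative z.

```python
import math


def cochran_armitage_trend(cases, controls, scores=(0, 1, 2)):
    """Two-sided Cochran-Armitage test for a linear trend in risk across ordered
    groups (here, gene copy number). Returns (z statistic, p-value)."""
    r1, r2 = sum(cases), sum(controls)
    n = r1 + r2
    col = [a + b for a, b in zip(cases, controls)]
    k = len(scores)
    t_stat = sum(t * (a * r2 - b * r1) for t, a, b in zip(scores, cases, controls))
    var = sum(scores[i] ** 2 * col[i] * (n - col[i]) for i in range(k))
    var -= 2 * sum(scores[i] * scores[j] * col[i] * col[j]
                   for i in range(k) for j in range(i + 1, k))
    var *= r1 * r2 / n
    z = t_stat / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts of individuals with 0, 1 and 2 copies.
print(cochran_armitage_trend(cases=(10, 30, 60), controls=(60, 30, 10)))
print(cochran_armitage_trend(cases=(10, 20, 30), controls=(10, 20, 30)))
```

Compared with collapsing genotypes into carriers and non-carriers, the trend test uses the ordering of copy-number classes, which is why accurate copy-number quantification (distinguishing 0 from 1 copy) matters for detecting a dosage effect.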

All of the aforementioned germline rearrangements were identified in targeted studies, commonly using PCR-based assays, that specifically searched for and/or quantified deletions or duplications at or near known cancer genes in high-risk populations. Discovering predisposing germline rearrangements in cancer patients without a priori knowledge of the region or gene of interest requires a different approach. Most studies addressing this question have adopted one of two strategies. Genome-wide CNV surveys in large cohorts of sporadic cancer patients and controls allow the identification of statistically significant associations between common CNVs and a low-to-modest cancer risk. Alternatively, genome-wide CNV surveys in familial or hereditary cancer patients should facilitate the detection of rare heritable CNVs (neither previously reported in controls nor present in a concurrently studied control cohort) that potentially alter cancer genes and confer a modest-to-high risk of cancer.
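
The second strategy depends on deciding when a patient CNV is "the same" as one already seen in controls. A common convention, assumed here rather than taken from the studies discussed, is to require the same type (loss or gain) and a reciprocal overlap of at least 50%: the shared segment must cover at least half of each CNV. The sketch below applies that filter to hypothetical calls given as (chromosome, start, end, type) with half-open coordinates.

```python
def reciprocal_overlap(a: tuple[int, int], b: tuple[int, int]) -> float:
    """Fraction of the shorter coverage: min(overlap/len(a), overlap/len(b))."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[1] - a[0]), overlap / (b[1] - b[0]))


def patient_specific(patient_cnvs, control_cnvs, min_ro=0.5):
    """Keep patient CNVs with no same-type control CNV at >= min_ro reciprocal overlap."""
    return [
        p for p in patient_cnvs
        if not any(
            p[0] == c[0] and p[3] == c[3]
            and reciprocal_overlap((p[1], p[2]), (c[1], c[2])) >= min_ro
            for c in control_cnvs
        )
    ]


patients = [("chr1", 1000, 2000, "loss"), ("chr2", 5000, 6000, "gain"), ("chr3", 100, 200, "loss")]
controls = [("chr1", 1100, 2100, "loss"), ("chr2", 5000, 5100, "gain"), ("chr3", 100, 200, "gain")]
print(patient_specific(patients, controls))
```

The reciprocal requirement prevents a small control CNV from "explaining away" a much larger patient CNV that merely contains it, as with the chr2 call here. The threshold is a tunable choice that trades sensitivity against false "patient-specific" calls.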

Genome-wide case-control CNV association studies have identified candidate risk alleles for several sporadic cancers: neuroblastoma in a Caucasian population (deletion at 1q21.1, OR = 2.49, p = 2.97 × 10^-17)340, aggressive prostate cancer in Caucasian populations (deletion at 2p24.3, OR = 1.31, p = 0.006; deletion at 20p13, OR = 1.17, p = 2.75 × 10^-4)341,342, and nasopharyngeal carcinoma in Han Chinese males (deletion at 6p21.3, OR = 18.92).343 Most recently, Huang et al.344 identified a common 10,379-bp deletion at 6q13 that was more frequent in Han Chinese patients with sporadic pancreatic cancer than in controls, and confirmed by a qPCR assay an odds ratio of 1.31 for 1-copy carriers compared with 2-copy carriers. All of these studies replicated their results in a confirmation cohort and used ethnicity-matched cases and controls; all but Diskin et al.340 used a PCR-based assay for confirmation, whereas Diskin et al.340 applied multiple-testing correction to verify the statistical significance of their results. Three of the identified CNVs overlapped genes. The neuroblastoma CNV overlapped a novel transcript that showed high sequence homology to the neuroblastoma breakpoint family (NBPF) genes, whose expression correlated with copy number, and which was highly expressed in fetal brain. The prostate cancer CNV at 20p13 differentially affects isoforms of the SIRPB1 gene, which encodes a signal-regulatory protein. The CNV at 6p21.3 encompassed MICA (MHC class I polypeptide-related sequence A), which mediates natural killer (NK) cell activation and T-lymphocyte costimulation and has been associated with nasopharyngeal cancer in previous studies. The pancreatic cancer CNV at 6q13 and the prostate cancer CNV at 2p24.3 are non-genic and are hypothesized to affect risk through long-range regulatory effects on an unidentified gene. Indeed, functional analysis of the non-genic deletion associated with pancreatic cancer suggested that it may be involved in long-range regulation of CDKN2B, an established tumor suppressor gene. These results are interesting but remain to be validated in future studies. Some analyses may be confounded by inaccurate genotyping of the CNV of interest: for example, the Database of Genomic Variants contains reports of gains as well as deletions at several of these putative cancer-associated CNVs, suggesting that they may not be simple biallelic variants. Moreover, previous studies of CNVs in Asian populations232,239 reported higher frequencies of the 6p21 deletion in controls than were observed in the controls of the nasopharyngeal carcinoma study. This is particularly significant because the odds ratio reported for the 6p21 deletion (~19) was much higher than that of any other common CNV or SNP association, and it may in fact be an overestimate if the deletion was undercalled in controls.
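
The concern that undercalling a deletion in controls inflates the odds ratio can be made concrete with a small calculation. The sketch below computes an odds ratio with a Woolf (log-based) 95% confidence interval. It then reclassifies a fraction of true control carriers as non-carriers, as a missed deletion would. All counts and the 80% miss rate are hypothetical and do not reanalyze the cited study.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Odds ratio and Woolf 95% CI for a 2x2 table: a = case carriers,
    b = case non-carriers, c = control carriers, d = control non-carriers."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - 1.96 * se), math.exp(math.log(or_) + 1.96 * se)


def undercall(carriers: int, non_carriers: int, miss_rate: float) -> tuple[int, int]:
    """Reclassify a fraction of true carriers as non-carriers (a missed deletion)."""
    missed = round(carriers * miss_rate)
    return carriers - missed, non_carriers + missed


# Hypothetical counts: 200/1,000 carriers among cases, 150/1,000 among controls.
true_or = odds_ratio(200, 800, 150, 850)
biased_or = odds_ratio(200, 800, *undercall(150, 850, miss_rate=0.8))
print(round(true_or[0], 2), round(biased_or[0], 2))
```

The inflation comes from the misclassification being differential, that is, affecting controls more than cases. Comparable undercalling in both groups would instead tend to bias the estimate toward the null.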

A few studies have surveyed germline CNVs in familial solid cancer patients, and although they have proposed several candidate predisposition genes based on overlap with patient-specific CNVs, none to date has demonstrated that any single gene makes a significant contribution to, or segregates with disease in, these cancer syndromes. One of the earliest studies analyzed 57 predominantly Caucasian pancreatic cancer patients from 56 high-risk kindreds (each containing at least one pair of affected first-degree relatives) using an oligonucleotide-based CGH platform, filtering out losses or gains that were also identified in 607 mostly Caucasian controls (372 analyzed in the same study and 235 previously reported in two other studies).345 Twenty-five losses overlapping 81 genes and 31 gains overlapping 425 genes were found to be specific to the cancer patients, and these genes were presented as potential candidate predisposition genes. Owing to a lack of samples from relatives, the authors were unable to demonstrate heritability or segregation with disease of the patient-specific CNVs. Moreover, the resolution of the CGH array used in this study (approximately 30 kb) was lower than that of current platforms, which resulted in relatively large CNV calls that likely overestimated the actual breakpoint boundaries of the rearrangements. Furthermore, the control data available at the time of publication were limited, so some of the supposedly familial pancreatic cancer (FPC)-specific CNVs were identified in control populations in subsequent studies. The abstract of the paper refers to two deletions that were observed in two different patients and one deletion that was observed in three different individuals, yet these regions are not discussed in the main text of the manuscript. If such regions were truly recurrent in patients and absent in controls, they would be of particular interest as candidate predisposition CNVs, but we cannot draw any conclusions given the paucity of information provided.

Two other studies similarly provided lists of candidate genes in familial cancer. Yoshihara et al.346 compared 68 Japanese subjects with germline BRCA1 mutations (including 51 subjects with ovarian cancer), 34 sporadic ovarian cancer patients, and 47 healthy controls, and identified 31 CNVs specific to the BRCA1-mutation group. All 31 CNVs overlapped genes, and three CNVs segregated with ovarian cancer in affected members of the same family (two of these three CNVs were each present in two different families). No significant difference was found in the total number of CNVs per genome between BRCA1-mutation carriers and controls, although the number of deletions was higher in the BRCA1-mutation subjects. Otherwise, they found no evidence for differential clustering of the global CNV data between groups and no correlation of age at diagnosis with CNV frequency. Since BRCA1 had already been identified as the primary genetic mutation in this study, the genes overlapped by CNVs represented potential modifier genes that may contribute to the distinctive biological characteristics of BRCA1-mutated ovarian cancer. Venkatachalam et al.347 studied 41 patients with young-onset and/or familial colorectal cancer and microsatellite-stable tumors, and identified four losses and three gains in six patients (one patient had both a loss and a gain) that were neither present in a large control cohort nor reported in previous control studies. Each CNV overlapped at least one gene, and each was detected in a single patient only.

A study by Shlien et al.348 presented an intriguing perspective of the connection between germline CNVs

and somatic tumor development in TP53 germline mutation carriers. They studied 53 Li-Fraumeni family

members (20 with wildtype TP53, 23 with TP53 mutations and history of cancer, and 8 with TP53

mutations and no cancer) and 70 unrelated healthy controls, and demonstrated a significantly elevated

frequency of germline CNVs in the TP53 mutation carriers relative to controls with wild-type TP53.

There was also a trend for a higher frequency of germline CNVs in cancer patients carrying TP53

mutations relative to mutation carriers without a history of cancer, but this did not reach statistical

significance possibly due to the small sample size. Furthermore, not only was the number of individual

CNVs elevated in mutation carriers but the number of copy-number variable bases was also higher, even

when the absolute number of CNVs was not, due to a tendency toward larger CNVs in the TP53 mutation

cohort. Comparison between germline and choroid plexus tumor DNA in four patients identified 15/21

loci overlapping germline CNVs that became substantially larger in the paired tumors, and three of four

tumors had loci at which a germline hemizygous deletion had progressed to homozygous deletion. These

findings suggested a model of tumor development in Li-Fraumeni syndrome in which germline genomic

instability (manifested as a higher than average CNV frequency) predisposes to additional genomic

rearrangements and/or expansion of germline CNVs in somatic tissue, affecting genes that drive the

development of cancer. The authors also reported a list of cancer-related genes overlapped by germline CNVs in the TP53 mutation carriers, which may act synergistically with the TP53 mutation in promoting cancer development. Of course, the role of TP53 in maintaining genome integrity is well known349, and it is not surprising that even non-malignant cells exhibit increased genomic instability in Li-Fraumeni patients. However, it is unclear whether this phenomenon applies to other tumor suppressor genes that predispose to familial cancer. Future surveys of CNV burden in other cancer syndromes would shed more light on this question.
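The distinction drawn above between the number of CNVs and the number of copy-number-variable bases can be made concrete with a short Python sketch. The `CNV` class, the `cnv_burden` function and all coordinates are hypothetical illustrations. They are not data or code from the study. Merging overlapping calls before summing bases is an assumption about one reasonable way to avoid counting the same base twice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    """A hypothetical CNV call on one chromosome (half-open interval [start, end))."""
    chrom: str
    start: int
    end: int


def cnv_burden(calls: list[CNV]) -> tuple[int, int]:
    """Return (number of CNV calls, number of copy-number-variable bases).

    Overlapping calls on the same chromosome are merged first so that no
    base is counted twice when the copy-number-variable bases are summed.
    """
    by_chrom = {}
    for call in calls:
        by_chrom.setdefault(call.chrom, []).append(call)

    variable_bases = 0
    for chrom_calls in by_chrom.values():
        chrom_calls.sort(key=lambda c: c.start)
        cur_start, cur_end = chrom_calls[0].start, chrom_calls[0].end
        for call in chrom_calls[1:]:
            if call.start <= cur_end:        # overlaps or abuts the current block
                cur_end = max(cur_end, call.end)
            else:                            # gap: close the current block
                variable_bases += cur_end - cur_start
                cur_start, cur_end = call.start, call.end
        variable_bases += cur_end - cur_start
    return len(calls), variable_bases


# Hypothetical individuals with the same CNV count but very different CNV sizes.
carrier = [CNV("chr1", 0, 500_000), CNV("chr3", 1_000_000, 1_800_000), CNV("chr7", 0, 250_000)]
control = [CNV("chr1", 0, 20_000), CNV("chr2", 5_000, 45_000), CNV("chr9", 0, 30_000)]

print(cnv_burden(carrier))  # (3, 1550000)
print(cnv_burden(control))  # (3, 90000)
```

Both hypothetical individuals carry three CNVs. The first, however, has about 17 times as many copy-number-variable bases, which is the kind of pattern described for the TP53 mutation cohort.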

3. Whole-Exome Sequencing

The human genome comprises approximately 3 billion base pairs, less than 2% of which code for proteins. The release of the first reference build of the human genome in 2003, after a 13-year collaborative international effort, opened the door to significant advances in understanding the genetic and genomic makeup of individuals, populations, and cancers. The Human Genome Project expanded understanding of the identity and population frequency of SNPs, the most common type of variant in the human genome. Efforts to determine haplotype structure (blocks of SNPs present in different combinations and segregating in populations) have also accelerated progress in population genetics, human evolution, and disease-gene associations.

The original sequencing effort was based on the technique developed by Frederick Sanger in the 1970s. This technique uses labeled dideoxynucleotide triphosphates (ddNTPs) as DNA chain terminators and separates the terminated chains of various lengths by gel electrophoresis to determine the order of bases in the sequence.

The high-throughput requirements of the DNA sequencing effort drove the development of automated capillary electrophoresis and other laboratory automation. The International Human Genome Sequencing Consortium (IHGSC) used a “hierarchical shotgun sequencing” approach with four steps: fragmenting and cloning DNA (initially in yeast artificial chromosomes, later in bacterial artificial chromosomes); mapping the clones onto the physical map of the genome with the help of established genomic markers; shotgun-sequencing the clones; and finally aligning the sequenced fragments to the developing map.350 In the last few years of the IHGSC project, a competing effort undertaken by Craig

Venter’s company CELERA used a “whole-genome shotgun sequencing” approach, which Venter considered faster and more efficient. Nevertheless, CELERA did end up incorporating publicly available IHGSC data to allow accurate mapping of its sequenced fragments. Without additional genome map information, highly repetitive regions (which make up a large portion of the human genome) are difficult to map.350,351 The approximate cost of sequencing the first reference human genome was $3 billion. Importantly, neither the IHGSC genome nor the CELERA genome was the sequence of a single diploid individual. Rather, each was a haploid consensus sequence of DNA from several anonymous individuals of different ancestries (although the IHGSC sequence was primarily based on a single male individual, and the CELERA reference sequence may have included Craig Venter’s genome). Building on the reference human genome, the International HapMap Project set out to identify common SNPs and their haplotype structure in members of different populations.352 Common SNPs were defined as those with a minor allele frequency (MAF) >1%, although most identified by the project have a MAF >5%. This important source of information enabled the development of genotyping arrays for genome-wide association studies.
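The chain-termination (Sanger) principle described above can be shown with an idealized Python simulation. The simulation assumes one fragment of every possible length, each ending in a labeled ddNTP. It ignores the primer, band-resolution limits and read quality. The function names and the example template are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def synthesized_strand(template_5_to_3: str) -> str:
    """The new strand is the reverse complement of the template (both written 5'->3')."""
    return template_5_to_3.translate(COMPLEMENT)[::-1]


def terminated_fragments(new_strand: str) -> dict[str, list[int]]:
    """Group chain-terminated fragments into four 'lanes', one per ddNTP.

    A fragment of length i + 1 exists whenever synthesis stopped at position i
    because the ddNTP matching new_strand[i] was incorporated.
    """
    lanes = {base: [] for base in "ACGT"}
    for i, base in enumerate(new_strand):
        lanes[base].append(i + 1)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Order all bands by fragment length (shortest first) and read their terminal bases."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


template = "TGTAATC"
new_strand = synthesized_strand(template)
print(new_strand)                                  # GATTACA
print(read_gel(terminated_fragments(new_strand)))  # GATTACA
```

Reading the bands from the shortest to the longest fragment recovers the newly synthesized strand one base at a time. Automated capillary sequencers do the same by detecting the label on each fragment as it passes a detector.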

Only four years after the release of the nearly complete human reference genome, the first diploid human genome sequence was published: that of Craig Venter. It was generated with the CELERA whole-genome shotgun sequencing method, cost $70-100 million and took about 4 years to complete. (The cost estimate includes costs incurred during the development of the CELERA reference genome.)209 This sequence offered an interesting perspective on the makeup of individual genomes. It is also clear, however, that many more genomes need to be sequenced before the full potential of genomic analysis and comparisons among individuals can be realized.

Making whole-genome sequencing possible for many genomes required a dramatic reduction in cost and increase in speed. The development of massively parallel next-generation sequencing technologies was a breakthrough in this respect. Since the first sequencing-by-synthesis technology was published in 2005353, a number of different platforms have been developed. They use different sequencing chemistries: Illumina and Roche/454 use DNA polymerase-based sequencing-by-synthesis, whereas ABI SOLiD uses DNA ligase-based sequencing-by-ligation. All of them, however, rely on clonal amplification of template molecules to generate a sufficiently strong signal.354

The first human genome fully sequenced on a massively parallel platform was that of James Watson, co-discoverer of the DNA double helix.218 The Watson genome was sequenced in 4.5 months at a cost of less than $1.5 million, which demonstrated the greatly increased power of next-generation sequencers.355 Since then, many other individuals of different ancestries have been sequenced.209,218,222,223,227,228,230,239,243,356,357,358,359 The 1000 Genomes Project aims to sequence the genomes of 2,500 anonymous individuals from 29 populations to discover and genotype variants and to identify haplotypes accurately. Its overarching goal is to characterize 95% of variants with an allele frequency of 1% or greater in the genomic regions that the most recent next-generation platforms can sequence.246 To date, three pilot projects have been completed:
(1) low-coverage (2-4x) whole-genome sequencing of 180 individuals, providing data on SNPs with a frequency of 1% or higher;
(2) deep (20-60x) whole-genome sequencing of two mother-father-adult child trios, allowing quality control of data from pilot (1) and inference of haplotypes;
(3) targeted capture and deep (50x) sequencing of ~8,000 exons from approximately 900 randomly selected genes, to test how well targeted capture sequencing identifies common, low-frequency, and rare variants in protein-coding regions of the genome.
The main project involves low-depth (4x) whole-genome sequencing of 2,500 individuals and deeper sequencing of their exomes by target enrichment (see below for more detail on exome sequencing).

Sanger-based automated sequencers generated approximately 100 kbp of data per day on a single machine. The earliest next-generation platform increased this output by two orders of magnitude, and other platforms with larger output quickly surpassed it: by 2011 a single sequencer could produce around 40 Gbp per day.360,361 An important distinction between Sanger-based and next-generation sequencers is read length, which is 700-1000 bp for capillary Sanger sequencers and 75-400 bp for next-generation sequencers, depending on the platform. The cost of whole-genome sequencing has dropped significantly and is currently as low as $5,000-$10,000. Interestingly, the capacity to analyze the data has advanced less rapidly than the cost of generating it has fallen. One challenge has been the inadequate adaptation of software originally designed for alignment and variant calling of Sanger sequencing data. Another has been the need for newer validated software packages that can handle the far larger quantity of data produced by newer platforms.362 The relatively short reads have also been a problem for de novo genome assembly and for correct alignment to repetitive or highly homologous regions. In recent years, “third-generation” sequencing methods have been introduced, which can sequence single molecules directly without amplifying the template.363 These newest methods may address some limitations of next-generation sequencers. For example, they appear to generate longer reads, approaching the length obtainable with Sanger capillary sequencers. They have their own challenges, however, such as a higher raw read error rate caused by the single-molecule approach. Ongoing improvements in both sequencing technologies and bioinformatic tools will therefore be needed to find the most cost-effective way of sequencing large numbers of genomes for disease gene discovery and clinical diagnosis. (I do not address other applications of next-generation sequencing, such as transcriptomics, epigenomics, and chromatin immunoprecipitation sequencing (ChIP-seq), as they are outside the scope of this thesis.)
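To put the throughput figures above in perspective, the arithmetic below estimates how long one machine would need to produce enough raw sequence for a single human genome. The ~3 billion bp genome size and the per-day outputs come from this section. The 30x mean depth is an assumption and does not appear in the text. The calculation ignores read length, run overhead, duplicate reads and failed reads.

```python
GENOME_BP = 3_000_000_000   # ~3 billion bp, as stated earlier in this section
TARGET_DEPTH = 30           # assumed mean depth for one whole genome (not from the text)


def days_for_one_genome(bp_per_day: int, genome_bp: int = GENOME_BP, depth: int = TARGET_DEPTH) -> float:
    """Days of continuous output needed to generate genome_bp * depth bases of raw sequence."""
    return genome_bp * depth / bp_per_day


sanger_days = days_for_one_genome(100_000)       # ~100 kbp/day (capillary Sanger machine)
ngs_days = days_for_one_genome(40_000_000_000)   # ~40 Gbp/day (single 2011 sequencer)

print(f"Sanger: {sanger_days:,.0f} days (~{sanger_days / 365:,.0f} years)")
print(f"Next-generation: {ngs_days:.2f} days")
```

Under these assumptions a single capillary machine would need about 900,000 days, or roughly 2,500 years, whereas a 2011 next-generation sequencer would need about 2.25 days.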

The cost of whole-genome sequencing has not yet reached the promised “$1,000 genome” level that the genomic community has set as a goal, particularly if post-sequencing analysis costs are taken into account. Moreover, the functional impact of much of the information in a whole genome remains difficult to evaluate, since only 1-2% of the genome has been annotated as protein-coding. Indeed, several reports of whole-genome sequencing in disease cases have been published, but they invariably focus on coding-region variants to identify candidate causative genes.364-371 These two current limitations of whole-genome sequencing (cost and functional annotation of the genome) have made exome sequencing an attractive alternative for researchers. Exome sequencing captures the coding regions of the genome and then amplifies and sequences them on a massively parallel platform. Because the exome target is less than 2% of the size of the whole genome, much greater read depth per base can be obtained per run. As a result, several exomes can be sequenced in the same time and for the same price as a single whole genome. A number of target-enrichment methods have been introduced. These include solid-phase arrays (e.g. NimbleGen Sequence Capture Human Exome 2.1M array) and in-solution hybridization capture (e.g. Agilent SureSelect System).372,373 The latest capture products can capture 44-50 Mb of genomic sequence. This covers most of the Consensus Coding Sequence (CCDS 2009)374 database annotation and the base pairs flanking target regions, as well as microRNAs and other non-coding RNAs. Exome sequencing coverage of coding regions and adjacent regulatory sequences is excellent, but it is not perfect. Capture success varies to some extent between arrays and with sequence-specific characteristics such as high GC content.375
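The depth advantage of exome sequencing follows from a simple relation: mean depth equals the number of usable bases sequenced divided by the size of the target. The Python sketch below applies it to a hypothetical 10 Gbp of sequence. It then uses a Poisson model to estimate the fraction of target bases that reach a minimum depth such as 8x. The Poisson model is an idealization that assumes perfectly uniform coverage. The run size, the 60% on-target fraction and the function names are assumptions for illustration. Real capture is uneven (for example in high-GC regions), so the figures are idealized, not expected results.

```python
import math


def mean_depth(sequenced_bp: float, target_bp: float, on_target_fraction: float = 1.0) -> float:
    """Average read depth over the target: usable bases divided by target size."""
    return sequenced_bp * on_target_fraction / target_bp


def fraction_at_least(min_depth: int, mean: float) -> float:
    """P(depth >= min_depth) if per-base depth were Poisson(mean), i.e. perfectly uniform sampling."""
    below = sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(min_depth))
    return 1.0 - below


RUN_BP = 10e9  # hypothetical amount of sequence from one run

genome_depth = mean_depth(RUN_BP, 3e9)                          # whole genome, ~3 Gbp
exome_depth = mean_depth(RUN_BP, 50e6, on_target_fraction=0.6)  # 50 Mb capture, assumed 60% on target

for label, depth in [("genome", genome_depth), ("exome", exome_depth)]:
    print(f"{label}: mean {depth:.1f}x, fraction >= 8x: {fraction_at_least(8, depth):.3f}")
```

With the same hypothetical output, the whole genome reaches a mean of only about 3.3x, and only about 2% of its bases are expected at 8x or more. The exome reaches about 120x, which leaves essentially every targeted base above that threshold in this idealized model.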

The first description of a human exome was based on the coding variants identified in the previously published diploid genome of Craig Venter (HuRef).376 The authors reported that most nonsynonymous SNPs were common: 15-20% were rare, and ~95% of the rare variants were heterozygous. They also identified 105 premature termination codons, many of which were common and did not appear to be under negative selection. Many of these variants were in duplicated genes and hypothetical genes, which suggests that their impact in this setting may be less deleterious. Half of all coding indels occurred in tandem repeats. These indels tended to occur at the C and N termini of genes and/or near exon boundaries, and in some cases they were considered likely mapping errors in the reference genome. In coding regions likely to be functionally significant, indels were biased toward lengths that are multiples of 3 bases (3n) and therefore preserve the reading frame. This suggests purifying selection against frameshift indels in those regions. The authors also noted that the Venter genome contained at least 680 nonsynonymous SNPs in 443 genes with some association with disease, including 7 that were in the dbSNP and OMIM databases. This foreshadowed the challenge of interpreting the clinical significance of coding variants as more genomes and exomes are sequenced.
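Whether a coding indel is in-frame or frameshift depends only on whether its net length change is a multiple of three. The minimal Python sketch below assumes VCF-style REF/ALT alleles that share an anchor base, with the indel lying entirely inside a coding exon (indels that span splice sites would need separate handling). The function name and the example alleles are hypothetical.

```python
from collections import Counter


def classify_coding_indel(ref: str, alt: str) -> str:
    """Return 'in-frame' if the net length change is a multiple of 3, else 'frameshift'."""
    net_change = len(alt) - len(ref)
    if net_change == 0:
        raise ValueError("REF and ALT have equal length; not an indel")
    # Python's % returns a non-negative result, so deletions (negative change) work too.
    return "in-frame" if net_change % 3 == 0 else "frameshift"


# Hypothetical coding indels as (REF, ALT) pairs.
indels = [("A", "ATTG"), ("ACAG", "A"), ("AC", "A"), ("G", "GTTAA"), ("TCTGCTG", "T")]
print(Counter(classify_coding_indel(ref, alt) for ref, alt in indels))
```

Three of these five hypothetical indels change length by a multiple of three (+3, -3 and -6 bases) and are classified as in-frame. The other two (-1 and +4 bases) shift the reading frame.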

Ng et al.377 published the first report of target-captured exome sequencing on a next-generation platform in 2009. It described the exomes of 8 HapMap individuals whose genomes had previously been characterized by sequencing fosmid clones to identify structural variants. In a proof-of-concept experiment, the authors also sequenced the exomes of four unrelated individuals with Freeman-Sheldon Syndrome, a rare autosomal dominant disorder caused by MYH3 mutations. The aim was to demonstrate a filtering strategy that could identify the causative gene. The average depth of coverage was 51x, so that 95% of coding bases were successfully called in 78% of genes. A base was considered callable at ≥8x depth, the depth required to reliably call a heterozygous variant. The estimated average number of truncating single-base variants per genome was higher in African than in non-African genomes (20 vs. 10 per genome), and the ratio was similar for rare frameshift indels (17 vs. 8). As in the Venter exome, most indels in coding regions were non-frameshift. To identify the causative gene in the four Freeman-Sheldon Syndrome patients, the authors kept only nonsynonymous, splice-site, or indel variants that had not been reported in dbSNP or found in the 8 HapMap exomes. They then required these variants to fall in the same gene in all four affected patients. This approach reduced the list of candidate genes to exactly one: MYH3. A subsequent study applied the same filtering strategy to identify the previously unknown genetic cause of Miller Syndrome, a rare autosomal recessive Mendelian disorder. It was the first of approximately 90 such studies published in quick succession over a period of 24 months (Table 3). Ongoing large-scale projects that use exome sequencing include the 1000 Genomes Project, which aims to sequence the exomes of 2,500 anonymous individuals. They also include the Exome Sequencing Project, which aims to discover variants relevant to heart, lung, and blood diseases. That project has so far sequenced the exomes of nearly 5,400 individuals from multiple study cohorts and plans to sequence approximately 7,000 exomes.
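The filtering strategy of Ng et al. can be expressed as a few set operations. The Python sketch below is an illustrative reconstruction, not the authors' code. The `Variant` record, the consequence labels and the `known` set (standing in for dbSNP plus the 8 HapMap exomes) are assumptions. All variants, apart from the gene name MYH3, are hypothetical. The optional `min_samples` parameter is also an added assumption; it relaxes the requirement that a gene be hit in every affected individual.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Assumed consequence labels for variants that may alter the protein.
PROTEIN_ALTERING = {"missense", "nonsense", "splice_site", "indel"}


@dataclass(frozen=True)
class Variant:
    gene: str
    chrom: str
    pos: int
    ref: str
    alt: str
    consequence: str

    def key(self) -> tuple[str, int, str, str]:
        return (self.chrom, self.pos, self.ref, self.alt)


def candidate_genes(affected: list[list[Variant]], known: set[tuple[str, int, str, str]],
                    min_samples: Optional[int] = None) -> set[str]:
    """Genes with a novel protein-altering variant in at least min_samples affected
    individuals (default: all of them). The variants need not be identical across people."""
    required = len(affected) if min_samples is None else min_samples
    counts = Counter()
    for calls in affected:
        novel_genes = {v.gene for v in calls
                       if v.consequence in PROTEIN_ALTERING and v.key() not in known}
        counts.update(novel_genes)  # each gene counted at most once per individual
    return {gene for gene, n in counts.items() if n >= required}


# Hypothetical data: 'known' plays the role of dbSNP plus control exomes.
known = {("chr1", 5000, "C", "T"), ("chr5", 800, "G", "A")}
affected = [
    [Variant("MYH3", "chr17", 100, "C", "T", "missense"), Variant("GENE_A", "chr1", 5000, "C", "T", "missense")],
    [Variant("MYH3", "chr17", 200, "G", "A", "missense"), Variant("GENE_B", "chr2", 300, "C", "G", "missense")],
    [Variant("MYH3", "chr17", 300, "A", "G", "missense"), Variant("GENE_B", "chr2", 310, "T", "C", "nonsense"),
     Variant("GENE_A", "chr1", 7000, "G", "T", "synonymous")],
    [Variant("MYH3", "chr17", 400, "T", "C", "splice_site"), Variant("GENE_C", "chr5", 800, "G", "A", "missense")],
]
print(candidate_genes(affected, known))                 # {'MYH3'}
print(sorted(candidate_genes(affected, known, min_samples=2)))  # ['GENE_B', 'MYH3']
```

Requiring every affected individual to share the gene is very effective for a genetically homogeneous disorder. Relaxing the requirement trades specificity for robustness to locus heterogeneity or missed calls.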

Table 3 – Studies using exome sequencing to identify the genetic cause of disease
Authors | Year | Journal | Disease | Inheritance (AD = autosomal dominant; AR = autosomal recessive) | Description
Vissers et al.378 | 2010 | Nat Genet | Mental retardation | Sporadic | Studied 10 trios; identified de novo mutations as potential cause of unexplained mental retardation
Walsh et al.379 | 2010 | Am J Hum Genet | Nonsyndromic hearing loss | AR | Combined homozygosity mapping with exome sequencing in a consanguineous family to identify GPSM2 as the cause of DFNB82
Lalonde et al.380 | 2010 | Hum Mutat | Fowler syndrome | AR | Identified compound hets in FLVCR2 in two fetuses from consanguineous families
Pierce et al.381 | 2010 | Am J Hum Genet | Perrault syndrome | AR | Identified compound hets in HSD17B4 in two sisters
Ng et al.382 | 2010 | Nat Genet | Kabuki syndrome | AD | Studied 10 unrelated affected subjects; identified MLL2 as cause
Bilguvar et al.383 | 2010 | Nature | Malformation of cortical development | AR | Combined homozygosity mapping and exome sequencing in a family with two affected members; identified WDR62 as cause
Gilissen et al.384 | 2010 | Am J Hum Genet | Sensenbrenner syndrome | AR | Identified compound hets in WDR35 in two unrelated affected subjects
Krawitz et al.385 | 2010 | Nat Genet | Hyperphosphatasia mental retardation syndrome | AR | Performed identity-by-descent filtering on exome data to identify PIGV as cause in 3 affected siblings of a nonconsanguineous family
Anastasio et al.386 | 2010 | Am J Hum Genet | Van Den Ende-Gupta syndrome | AR | Combined homozygosity mapping with exome sequencing to identify SCARF2 as cause in 4 affecteds from 3 consanguineous families
Johnson et al.387 | 2010 | Am J Hum Genet | Brown-Vialetto-van Laere syndrome | AR | Identified C20orf54 as cause in three affected siblings
Sirmaci et al.388 | 2010 | Am J Hum Genet | Michels syndrome | AR | Combined homozygosity mapping with exome sequencing to identify MASP1 as cause in 3 individuals from 2 consanguineous families
Haack et al.389 | 2010 | Nat Genet | Isolated complex I deficiency | AR | Identified compound hets in ACAD9 in a single affected individual
Wang et al.390 | 2010 | Brain | Spinocerebellar ataxia | AD | Combined linkage analysis with exome sequencing in a Chinese family with 4 affecteds; identified TGM6 as cause
Musunuru et al.391 | 2010 | NEJM | Combined hypolipidemia | AR | Identified compound hets in ANGPTL3 in 2 affected sibs
Johnson et al.392 | 2010 | Neuron | ALS | AD | Combined linkage analysis with exome sequencing in 2 affected relatives; identified VCP as cause
Bolze et al.393 | 2010 | Am J Hum Genet | Autoimmune lymphoproliferative syndrome (ALPS) | AR | Found homozygous variants in FADD
Liu et al.394 | 2011 | PLoS One | Moyamoya disease | AD | Combined linkage analysis with exome sequencing to identify RNF213
Zuchner et al.395 | 2011 | Am J Hum Genet | Retinitis pigmentosa | AR | Identified homozygous variants in DHDDS
Glazov et al.396 | 2011 | PLoS Genet | Anauxetic dysplasia-like condition | AR | Identified compound hets in POP1
Worthey et al.397 | 2011 | Genet Med | Inflammatory bowel disease | AR | Identified hemizygous variant on the X chromosome (XIAP)
Simpson et al.398 | 2011 | Nat Genet | Hajdu-Cheney syndrome | AD | Exome sequencing of 3 unrelated affecteds identified NOTCH2
Becker et al.399 | 2011 | Am J Hum Genet | Osteogenesis imperfecta | AR | Identified homozygous variants in SERPINF1 in 2 affected sibs
Ostergaard et al.400 | 2011 | J Med Genet | Primary lymphoedema | AD | Combined linkage analysis with exome sequencing to identify GJC2
Caliskan et al.401 | 2011 | Hum Mol Genet | Non-syndromic mental retardation | AR | Combined homozygosity mapping with exome sequencing to identify TECR
Erlich et al.402 | 2011 | Genome Res | Hereditary spastic paraparesis | AR | Combined homozygosity mapping with exome sequencing to identify KIF1A
Sundaram et al.403 | 2011 | Ann Neurol | Tourette syndrome/chronic tic phenotype | AD | Identified OFCC1 as cause
Puente et al.404 | 2011 | Am J Hum Genet | Hereditary progeroid syndrome | AR | Identified homozygous mutations in BANF1
Vissers et al.405 | 2011 | Am J Hum Genet | Chondrodysplasia and abnormal joint development syndrome | AR | Identified homozygous variants in IMPAD1 in three unrelated affected individuals
O’Sullivan et al.406 | 2011 | Am J Hum Genet | Amelogenesis imperfecta and gingival hyperplasia syndrome | AR | Combined homozygosity mapping with exome sequencing to identify FAM20A
Gotz et al.407 | 2011 | Am J Hum Genet | Infantile hypertrophic mitochondrial cardiomyopathy | AR | Identified compound heterozygous mutations in mtAlaRS
Shi et al.408 | 2011 | PLoS Genet | Myopia | AD | Identified mutations in ZNF644 in 2 relatives
Klein et al.409 | 2011 | Nat Genet | Hereditary sensory neuropathy with dementia and hearing loss | AD | Combined linkage with exome data to identify mutations in DNMT1
Barak et al.410 | 2011 | Nat Genet | Malformations of occipital cortical development | AR | Identified homozygous mutation in a single affected child of consanguineous parents
O’Roak et al.411 | 2011 | Nat Genet | Autism | Sporadic | Identified 11 de novo protein-altering mutations, some in genes previously connected to autism
Alvarado et al.412 | 2011 | J Bone Joint Surg Am | Distal arthrogryposis type 1 | AD | Identified MYH3 as cause
De Greef et al.413 | 2011 | Am J Hum Genet | Immunodeficiency, centromeric instability, and facial anomalies | AR | Combined homozygosity mapping with exome sequencing to identify ZBTB24
Yamaguchi et al.414 | 2011 | J Bone Miner Res | Primary failure of tooth eruption | AD | Combined linkage with exome sequencing to identify PTH1R as cause
Zhou et al.415 | 2011 | Hum Mutat | Hereditary hypotrichosis simplex | AD | Combined linkage with exome sequencing to identify RPL21 as cause
Le Goff et al.416 | 2011 | Am J Hum Genet | Geleophysic and acromicric dysplasia | AD | Identified FBN1 as candidate gene in 5 patients
Hanson et al.417 | 2011 | Am J Hum Genet | 3-M syndrome | AR | Combined homozygosity mapping with exome sequencing to identify mutation in CCDC8
Vilarino-Guell et al.418 | 2011 | Am J Hum Genet | Late-onset Parkinson disease | AD | Identified mutation in VPS35
Zimprich et al.419 | 2011 | Am J Hum Genet | Late-onset Parkinson disease | AD | Identified VPS35 as cause (in different patients from Vilarino-Guell et al.)
Sergouniotis et al.420 | 2011 | Am J Hum Genet | Leber congenital amaurosis | AR | Combined homozygosity mapping with exome sequencing to identify KCNJ13 as cause
Albers et al.421 | 2011 | Nat Genet | Gray platelet syndrome | AR | Identified NBEAL2 as cause
Sanna-Cherchi et al.422 | 2011 | Kidney Int | Steroid-resistant nephrotic syndrome | AR | Combined homozygosity mapping with exome sequencing in 3 affected sibs of consanguineous parents to identify homozygous mutations in MYO1E and NEIL1
Liu et al.423 | 2011 | J Exp Med | Chronic mucocutaneous candidiasis disease | AD | Identified mutations in STAT1 as cause
Yariz et al.424 | 2011 | Fertil Steril | Empty follicle syndrome | AR | Identified homozygous mutation in LHCGR in 2 sisters
Xu et al.425 | 2011 | Nat Genet | Schizophrenia | Sporadic | Identified 40 rare de novo protein-altering mutations in 40 genes (in 27 cases), including DGCR2, a gene in the schizophrenia-predisposing region 22q11.2
Sirmaci et al.426 | 2011 | Am J Hum Genet | KBG syndrome | AD | Identified ANKRD11 as cause
Shaheen et al.427 | 2011 | Am J Hum Genet | Adams-Oliver syndrome | AR | Combined homozygosity mapping with exome sequencing to identify homozygous mutations in DOCK6
Noskova et al.428 | 2011 | Am J Hum Genet | Adult-onset neuronal ceroid lipofuscinosis | AD | Identified mutations in DNAJC5 in 5 unrelated individuals
Weedon et al.429 | 2011 | Am J Hum Genet | Charcot-Marie-Tooth disease | AD | Found DYNC1H1 as cause in 3 relatives
Ozgul et al.430 | 2011 | Am J Hum Genet | Retinitis pigmentosa | AR | Identified homozygous mutation in MAK as cause
Doi et al.431 | 2011 | Am J Hum Genet | Cerebellar ataxia | AR | Identified mutation in SYT14 as cause
Sloan et al.432 | 2011 | Nat Genet | Malonic and methylmalonic aciduria | AR | Identified mutation in ACSF3 as cause
Aldahmesh et al.433 | 2011 | J Med Genet | Knobloch syndrome | AR | Identified ADAMTS18 as cause
Murdock et al.434 | 2011 | Am J Med Genet A | Recurrent polymicrogyria | AR | Identified compound het mutations in WDR62 as cause in 2 sibs
Regalado et al.435 | 2011 | Circ Res | Thoracic aortic aneurysms leading to acute aortic dissection | AD | Identified SMAD3 as cause
Dickinson et al.436 | 2011 | Blood | Dendritic cell, monocyte, B and NK lymphoid deficiency | AD | Identified GATA2 as cause in 4 unrelated affecteds
Hor et al.437 | 2011 | Am J Hum Genet | Familial narcolepsy with cataplexy | AR | Combined linkage with exome sequencing to identify MOG as cause
Marti-Masso et al.438 | 2011 | Hum Genet | Early-onset generalized dystonia | AR | Identified GCDH as cause in 2 affected siblings
Tariq et al.439 | 2011 | Genome Biol | Heterotaxy | AR | Combined homozygosity mapping with exome sequencing to identify SHROOM3 as candidate cause
Takata et al.440 | 2011 | Genome Biol | Progressive external ophthalmoplegia | AR | Combined homozygosity mapping with exome sequencing to identify RRM2B as cause in a patient from a consanguineous family
Theis et al.441 | 2011 | Circ Cardiovasc Genet | Dilated cardiomyopathy | AR | Combined homozygosity mapping with exome sequencing to identify GATAD1 mutations in 2 affected sisters
Pierson et al.442 | 2011 | PLoS Genet | Spastic ataxia-neuropathy syndrome | AR | Identified AFG3L2 as cause in 2 brothers of a consanguineous family
Al Badr et al.443 | 2011 | J Pediatr Urol | Ochoa (urofacial) syndrome | AR | Combined homozygosity mapping with exome sequencing to identify HPSE2 as cause in a child of consanguineous parents
Cullinane et al.444 | 2011 | J Invest Dermatol | Oculocutaneous albinism and neutropenia | AR | Combined homozygosity mapping with exome sequencing to identify two candidate genes (SLC45A2 and G6PC3)
Ovunc et al.445 | 2011 | J Am Soc Nephrol | Intermittent nephrotic-range proteinuria | AR | Identified CUBN as cause in 2 sibs of consanguineous parents
Bowne et al.446 | 2011 | Eur J Hum Genet | Retinitis pigmentosa with choroidal involvement | AD | Combined linkage analysis with exome sequencing to identify RPE65 as cause
Kitamura et al.447 | 2011 | J Clin Invest | Autoinflammation and lipodystrophy | AR | Identified PSMB8 as cause in patients from 2 consanguineous families
Tyynismaa et al.448 | 2011 | Hum Mol Genet | Progressive external ophthalmoplegia with multiple mitochondrial DNA deletions | AR | Identified TK2 as cause
Bjursell et al.449 | 2011 | Am J Hum Genet | Hypermethioninemia | AR | Identified ADK as cause
Zangen et al.450 | 2011 | Am J Hum Genet | XX female gonadal dysgenesis | AR | Combined homozygosity mapping with exome sequencing to identify PSMC3IP/HOP2 as cause
Galmiche et al.451 | 2011 | Hum Mutat | Mitochondrial cardiomyopathy | AR | Identified compound hets in MRPL3 as cause in 4 affected sibs
Bredrup et al.452 | 2011 | Am J Hum Genet | Ciliopathies with skeletal anomalies and renal insufficiency | AR | Identified compound hets in WDR19 as cause
Saitsu et al.453 | 2011 | Am J Hum Genet | Hypomyelinating leukoencephalopathy | AR | Identified POLR3A and POLR3B as cause
Clayton-Smith et al.454 | 2011 | Am J Hum Genet | Say-Barber-Biesecker variant of Ohdo syndrome | Sporadic | Identified KAT6B as cause in 4 individuals
Aldahmesh et al.455 | 2011 | Am J Hum Genet | Ichthyosis, intellectual disability, and spastic quadriplegia | AR | Combined homozygosity mapping with exome sequencing to identify ELOVL4 as cause in 2 individuals
Chen et al.456 | 2011 | Nat Genet | Paroxysmal kinesigenic dyskinesia | AD | Identified PRRT2 as cause in 8 families
Logan et al.457 | 2011 | Nat Genet | Early-onset myopathy, areflexia, respiratory distress and dysphagia (EMARDD) | AR | Identified MEGF10 as cause
Dauber et al.458 | 2011 | J Clin Endocrinol Metab | Severe infantile hypercalcemia | AR | Identified CYP24A1 as cause
Shamseldin et al.459 | 2011 | J Med Genet | Split hand and foot malformation | AR | Combined homozygosity mapping with exome sequencing in a consanguineous family to identify DLX5 as cause
Sergouniotis et al.460 | 2011 | Am J Hum Genet | Benign fleck retina | AR | Combined homozygosity mapping with exome sequencing to identify PLA2G5 as cause
Berger et al.461 | 2011 | Mol Genet Metab | Early prenatal ventriculomegaly | AD | Combined linkage with exome sequencing to identify AIFM1 as cause
Bhat et al.462 | 2011 | Clin Genet | Primary microcephaly | AR | Identified WDR62 as cause
Wang et al.463 | 2011 | Hum Mutat | Leber congenital amaurosis | AR | Identified ALMS1, IQCB1, CNGA3, and MYO7A as candidates

To date, most successful exome-based studies have been in monogenic Mendelian disorders. The first filtering step in most studies was to exclude variants reported in dbSNP and in any other exome data available to the investigators. Depending on the version of dbSNP used and the number of available exomes, this step usually eliminates at least half of the called variants. Next, only variants that may alter or truncate the protein are retained: nonsynonymous single-nucleotide variants, splice-site variants, nonsense variants, and indels. At this point, studies diverge in their strategies, depending on the nature of the condition being studied and the samples available for sequencing. A notable characteristic of most exome studies published to date is that the diseases investigated are recessive (Table 3). This allows homozygosity mapping or identity-by-descent analysis to be applied to family data. Alternatively, investigators can simply filter out all genes except those carrying homozygous or compound heterozygous variants in the exome samples. For a rare, fairly homogeneous condition with multiple affected relatives and/or more than one family available, this strategy is very successful at narrowing the candidates down to one or at most a few genes. Even with only one sample, this method can identify the causative gene for an autosomal recessive condition. For autosomal dominant conditions, where the causative variant is heterozygous, family linkage data can substantially reduce the number of candidate genes. Alternatively, when most affected cases carry mutations in the same single gene, identifying genes with novel variants in more than one subject also helps pinpoint the causal gene. Additional filtering by the predicted effect of variants and/or by conservation scores may help in ranking multiple candidate genes. Tools for predicted effect include PolyPhen-2464 (http://genetics.bwh.harvard.edu/pph2/index.shtml) and SIFT465 (http://sift.jcvi.org/); conservation scores include PhyloP and GERP. However, these tools have their limitations and often disagree about the functional importance of the same variant. Some investigators have proposed statistical approaches for ranking the variants and genes identified in such exome studies, but their applicability and success rates are not yet known.468-470 Regardless, almost all studies support the identified gene with further evidence. They do so by sequencing the gene in other patients with the disease and/or by presenting a functional analysis of the gene in the disease process.
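For recessive conditions, the gene-level filter described above can be sketched in Python as follows. The sketch assumes that each individual's variants have already been reduced to novel, protein-altering calls by the first two filtering steps. Genotypes use VCF-style strings: '0/1' for heterozygous and '1/1' for homozygous alternate. The function names and example data are hypothetical. Two heterozygous variants in one gene are flagged only as a possible compound heterozygote. Confirming that they lie on different alleles (in trans) requires phasing, for example from parental genotypes.

```python
from collections import defaultdict


def recessive_genes(calls: list[tuple[str, str, str]]) -> set[str]:
    """Genes compatible with a recessive model in one individual.

    calls: (gene, variant_id, genotype) tuples for novel protein-altering variants.
    A gene qualifies if it has a homozygous variant, or two or more distinct
    heterozygous variants (a possible, unphased compound heterozygote).
    """
    homozygous = set()
    het_variants = defaultdict(set)
    for gene, variant_id, genotype in calls:
        if genotype == "1/1":
            homozygous.add(gene)
        elif genotype == "0/1":
            het_variants[gene].add(variant_id)
    compound = {gene for gene, ids in het_variants.items() if len(ids) >= 2}
    return homozygous | compound


def shared_recessive_genes(affected: list[list[tuple[str, str, str]]]) -> set[str]:
    """Genes that satisfy the recessive model in every affected individual."""
    per_person = [recessive_genes(calls) for calls in affected]
    return set.intersection(*per_person) if per_person else set()


# Hypothetical affected siblings.
sib1 = [("GENE_A", "var1", "0/1"), ("GENE_A", "var2", "0/1"), ("GENE_B", "var3", "1/1"), ("GENE_C", "var4", "0/1")]
sib2 = [("GENE_A", "var1", "0/1"), ("GENE_A", "var2", "0/1"), ("GENE_C", "var4", "0/1")]
print(shared_recessive_genes([sib1, sib2]))  # {'GENE_A'}
```

A real analysis would also check that affected siblings share the same variants in the gene and, where possible, that unaffected relatives do not carry the same recessive genotype.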

The somatic genomes of many cancers have been sequenced, shedding light on important genes and pathways that drive tumorigenesis and/or metastasis. The earliest of these studies took the laborious approach of sequencing coding regions exon by exon with the conventional Sanger method.37,471-472 The first cancer genome sequenced on a next-generation platform was that of a cytogenetically normal acute myeloid leukemia (AML)473. Genomes of the following cancers have been sequenced since then: additional AML474-475, breast cancer476-477, lung cancer478-479, uveal melanoma480, colorectal cancer481, multiple myeloma482, hepatocellular carcinoma483, hairy cell leukemia484, diffuse large B-cell lymphoma485, pancreatic neuroendocrine tumor486, and gastric cancer487. An international collaboration under the auspices of the International Cancer Genome Consortium (ICGC)488 is currently undertaking a large-scale integrative analysis of 50 different cancer types and/or subtypes at the genomic, epigenomic, and transcriptomic levels.

In addition to investigating the somatic genome of cancer, germline sequencing can help identify genes

that predispose to Mendelian cancer syndromes and/or familial cancer clustering. The first such study

used paired germline-tumor exome data to identify PALB2 as a new FPC gene in a patient who did not

carry mutations in known predisposition genes.117 The paired tumor variants allowed Jones et al.117 to

narrow the search down to genes that had a germline truncating mutation as well as a somatic “second-

hit” deleterious mutation, thus excluding all but three genes, two of which were previously reported to

have truncating mutations in healthy controls. Resequencing the full PALB2 coding region in a cohort of

96 FPC subjects identified an additional three families with protein-truncating mutations in the gene,

whereas truncating mutations in PALB2 are rare in control populations, further supporting PALB2 as an

FPC predisposition gene. In addition, the known function of PALB2 as a partner of BRCA2, a gene already implicated in pancreatic tumorigenesis, lent further weight to this discovery.
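The two-hit filtering logic described for the PALB2 discovery can be sketched as a set intersection. This is a minimal illustration under stated assumptions, not the pipeline used by Jones et al.: the `Variant` record, the `deleterious` flag (assumed to come from an upstream annotation step), and the gene names `GENE_A` to `GENE_D` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    origin: str        # "germline" or "somatic"
    effect: str        # e.g. "truncating", "missense"
    deleterious: bool  # hypothetical flag assumed to come from an annotator


def two_hit_candidates(variants, genes_truncated_in_controls=()):
    """Genes with a germline truncating variant AND a somatic deleterious
    second hit, minus genes already seen with truncating variants in controls."""
    germline_truncating = {
        v.gene for v in variants
        if v.origin == "germline" and v.effect == "truncating"
    }
    somatic_second_hit = {
        v.gene for v in variants
        if v.origin == "somatic" and v.deleterious
    }
    both_hits = germline_truncating & somatic_second_hit
    return sorted(both_hits - set(genes_truncated_in_controls))


# Hypothetical example: three genes pass the two-hit filter, and two of them
# are excluded because truncating variants were also seen in healthy controls.
example = [
    Variant("GENE_A", "germline", "truncating", True),
    Variant("GENE_A", "somatic", "missense", True),
    Variant("GENE_B", "germline", "truncating", True),
    Variant("GENE_B", "somatic", "truncating", True),
    Variant("GENE_C", "germline", "truncating", True),
    Variant("GENE_C", "somatic", "missense", True),
    Variant("GENE_D", "germline", "missense", False),
    Variant("GENE_D", "somatic", "truncating", True),
]
print(two_hit_candidates(example))                                          # ['GENE_A', 'GENE_B', 'GENE_C']
print(two_hit_candidates(example, genes_truncated_in_controls={"GENE_B", "GENE_C"}))  # ['GENE_A']
```

Keeping each hit as a separate set makes it easy to inspect which genes fail at which step, which matters when, as in the PALB2 study, the final short list still has to be judged against control data.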

Despite the success of this initial report, few familial and/or syndromic cancer exome studies have been published to date. Two studies, investigating the causes of childhood classic Kaposi sarcoma489 and mosaic variegated aneuploidy490, were able to take advantage of apparently recessive inheritance to filter the exome data and identify the causative genes. In the case of Kaposi sarcoma, variants were filtered for homozygosity, protein-altering effect, and absence from dbSNP129, 1000 Genomes, and 49 in-house exomes, leaving only 1 splice-site variant and 11 missense variants. The splice-site variant affects a gene (STIM1) that is also mutated in a recessive immunodeficiency syndrome; given the previously established link between Kaposi sarcoma and immunodeficiency, this was considered a strong candidate. The investigators of mosaic variegated aneuploidy syndrome sequenced two siblings born to non-consanguineous parents and attempted to identify a gene with two loss-of-function mutations shared by both siblings (as compound heterozygotes). Interestingly, they did not initially identify a single causal gene; instead, they identified 12 genes, each with a single loss-of-function mutation common to both siblings. Focusing on a gene with a putative functional connection to the disease (CEP57, which localizes to the centrosome), Snape et al. sequenced its full coding region in both siblings and identified a second mutation, an 11-bp deletion that had not been called in the exome data. This highlights the current limitations in the sensitivity and specificity of exome analysis. Two additional unrelated patients were also found to carry compound heterozygous mutations in CEP57.
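The two recessive-model filters described above (homozygous variants in Kaposi sarcoma; shared loss-of-function variants in the sibling pair with mosaic variegated aneuploidy) can be sketched in Python as follows. The call format (dicts with `gene`, `id`, `zygosity`, `effect`), the effect labels, the helper `make_call`, and the example data are hypothetical. The sketch does not phase variants, so two shared heterozygous hits are only a proxy for compound heterozygosity, and a gene whose second hit was missed by exome calling (as with the 11-bp CEP57 deletion) would surface only in the single-hit list.

```python
from collections import defaultdict

# Hypothetical effect labels; real pipelines use annotator-specific terms.
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site"}
LOSS_OF_FUNCTION = {"nonsense", "frameshift", "splice_site"}


def homozygous_candidates(calls, catalogued_ids):
    """Homozygous, protein-altering calls absent from reference catalogues
    (e.g. dbSNP, 1000 Genomes, in-house exomes), given as a set of variant IDs."""
    return [c for c in calls
            if c["zygosity"] == "hom"
            and c["effect"] in PROTEIN_ALTERING
            and c["id"] not in catalogued_ids]


def shared_lof_by_gene(sibling_calls):
    """Map gene -> set of heterozygous LOF variant IDs carried by every sibling."""
    shared = None
    for calls in sibling_calls:
        by_gene = defaultdict(set)
        for c in calls:
            if c["zygosity"] == "het" and c["effect"] in LOSS_OF_FUNCTION:
                by_gene[c["gene"]].add(c["id"])
        if shared is None:
            shared = dict(by_gene)
        else:
            shared = {gene: ids & by_gene.get(gene, set()) for gene, ids in shared.items()}
            shared = {gene: ids for gene, ids in shared.items() if ids}
    return shared or {}


def compound_het_genes(sibling_calls):
    """Genes with >= 2 distinct shared LOF variants (phase is NOT checked)."""
    return sorted(g for g, ids in shared_lof_by_gene(sibling_calls).items() if len(ids) >= 2)


def single_hit_genes(sibling_calls):
    """Genes with exactly one shared LOF variant; a missed second hit may hide here."""
    return sorted(g for g, ids in shared_lof_by_gene(sibling_calls).items() if len(ids) == 1)


def make_call(gene, vid, zygosity, effect):
    """Hypothetical helper building one variant call record."""
    return {"gene": gene, "id": vid, "zygosity": zygosity, "effect": effect}


sib1 = [make_call("GENE_X", "x1", "het", "frameshift"), make_call("GENE_X", "x2", "het", "nonsense"),
        make_call("GENE_Y", "y1", "het", "splice_site"), make_call("GENE_Z", "z1", "hom", "missense")]
sib2 = [make_call("GENE_X", "x1", "het", "frameshift"), make_call("GENE_X", "x2", "het", "nonsense"),
        make_call("GENE_Y", "y1", "het", "splice_site"), make_call("GENE_W", "w1", "het", "nonsense")]

print(compound_het_genes([sib1, sib2]))  # ['GENE_X']
print(single_hit_genes([sib1, sib2]))    # ['GENE_Y']
print([c["id"] for c in homozygous_candidates(sib1, catalogued_ids={"rs_known"})])  # ['z1']
```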

Two studies of autosomal dominant hereditary cancer were able to harness the power of sequencing

multiple unrelated individuals or linkage analysis to narrow down the list of susceptibility gene

candidates. In a study of hereditary pheochromocytoma491, three unrelated patients were sequenced and

the variants filtered to only include heterozygous protein-altering mutations shared by all three subjects

and absent in dbSNP and 1000 Genomes data. This reduced the list of candidates to just two genes, of

which only one segregated with disease in the respective families (MAX). By demonstrating LOH at the

MAX locus and absence of MAX expression in tumors from the affected families, Comino-Mendez et al.491

presented strong evidence for the role of MAX as a tumor suppressor gene in pheochromocytoma.

Moreover, they identified five additional unrelated patients with mutations in this gene (2 truncating and 3

missense). To identify susceptibility genes for familial nodular Hodgkin's lymphoma, Saarinen et al.492 combined linkage analysis of a large family with exome sequencing of one family member, narrowing the candidates (genes with a deleterious mutation that segregated with disease in affected family members and was absent from controls) to one gene, NPAT, carrying a 2-bp deletion. Further sequencing of this gene in other unrelated patients identified no additional rare deleterious mutations in NPAT, but did find a common amino-acid deletion that appeared to be significantly more frequent in Hodgkin's lymphoma patients than in controls (4.2% vs. 1.1%, OR 4.11, p=0.018). Gene expression array analysis demonstrated decreased NPAT

mRNA in carriers of the 2-bp deletion. These findings, in addition to the fact that NPAT shares a putative

promoter with another known tumor suppressor gene (ATM) and is thought to have a role in cell cycle

regulation, suggest that NPAT germline mutations predispose to nodular Hodgkin’s lymphoma.
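The dominant-model strategy used for hereditary pheochromocytoma (genes hit in every unrelated patient, followed by a segregation check in families) might be sketched as below. Gene names, variant IDs, family members, the helper `het`, and the simple segregation rule (every affected relative must carry the variant, while unaffected carriers are tolerated to allow for incomplete penetrance) are illustrative assumptions, not the published analysis.

```python
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site"}


def rare_het_genes(calls, catalogued_ids):
    """Genes carrying a heterozygous protein-altering variant absent from catalogues."""
    return {c["gene"] for c in calls
            if c["zygosity"] == "het"
            and c["effect"] in PROTEIN_ALTERING
            and c["id"] not in catalogued_ids}


def genes_shared_by_all(patients, catalogued_ids):
    """Gene-level sharing across unrelated patients (the variants may differ)."""
    gene_sets = [rare_het_genes(calls, catalogued_ids) for calls in patients]
    return sorted(set.intersection(*gene_sets)) if gene_sets else []


def segregates(carrier, affected):
    """True if every affected relative carries the variant.

    carrier, affected: dicts mapping relative -> bool. Unaffected carriers are
    allowed, reflecting incomplete penetrance under a dominant model.
    """
    return all(carrier[person] for person, is_affected in affected.items() if is_affected)


def het(gene, vid, effect="missense"):
    """Hypothetical helper building one heterozygous call record."""
    return {"gene": gene, "id": vid, "zygosity": "het", "effect": effect}


patients = [
    [het("GENE_M", "m1"), het("GENE_N", "n1"), het("GENE_Q", "q1")],
    [het("GENE_M", "m2", "nonsense"), het("GENE_N", "n1")],
    [het("GENE_M", "m3"), het("GENE_N", "n2"), het("GENE_R", "rs123")],
]
candidates = genes_shared_by_all(patients, catalogued_ids={"rs123"})
print(candidates)  # ['GENE_M', 'GENE_N']

affected = {"proband": True, "sister": True, "uncle": False}
family_carriers = {
    "GENE_M": {"proband": True, "sister": True, "uncle": True},
    "GENE_N": {"proband": True, "sister": False, "uncle": False},
}
print([g for g in candidates if segregates(family_carriers[g], affected)])  # ['GENE_M']
```

Requiring sharing at the gene level rather than the variant level reflects that unrelated families usually carry different mutations in the same gene, as with the truncating and missense MAX mutations described above.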

One of the promises of whole-genome and exome sequencing is the ability to close a gap in explaining disease heritability: low-frequency, moderately penetrant variants, which until recently could be identified neither by family-based studies (because such variants usually do not segregate with disease) nor by genome-wide association studies based on common SNPs.493 Such variants have been identified in the past through candidate gene sequencing in cases, and relatively large case-control studies are required to demonstrate significant enrichment in the disease population (e.g. BRIP1 in prostate cancer494; CHEK2 in breast cancer495). With the increasing number of exomes and whole genomes being sequenced, it is becoming possible to capture such functional variants on a genome-wide level. For example, a recent report describes whole-genome sequencing of approximately 450 Icelandic individuals, followed by imputation of the detected variants into a large cohort of Icelandic ovarian cancer cases and controls; the most significant association was with an intronic SNP in BRIP1. Subsequent fine-mapping of the associated regions revealed a 2-bp deletion in exon 14 of BRIP1 that was in partial linkage disequilibrium with the intronic SNP and had an odds ratio > 8 for ovarian cancer. Alternatively, exome or whole-genome data may reveal the functional variant directly in family-based studies, although the challenge lies in determining which non-segregating rare/low-frequency variant is causally important. In a recent study by Yokoyama et al.496, whole-genome sequencing of a single member of a large familial melanoma kindred identified over 400 germline variants, one of which was a missense variant in a gene called MITF. Genotyping of this variant in the remaining family members demonstrated non-segregation (only three of eight affected members carried the variant). However, given the previously reported role of MITF in melanoma development, the investigators genotyped this variant in two large case-control cohorts and found a significantly elevated frequency of the MITF variant in cases, with an odds ratio of approximately 2, supporting the hypothesis that this low-frequency variant is enriched in familial cases and confers a moderate risk of melanoma. In a similar study by Park et al.497, in which members of four early-onset, multiple-case breast cancer pedigrees underwent exome sequencing, a functionally interesting gene (FAN1) was identified with two missense variants predicted to be deleterious in two families (the variant segregated with disease in one family but not in the other); however, Park et al.497 reported no statistically significant association of the variant with breast cancer in two case-control analyses.

Chapter 2 - Loss of Heterozygosity at BRCA1 Locus in Pancreatic Adenocarcinoma

The contents of this chapter have been published in Human Genetics 2008 Oct;124(3):271-8.

PMID: 18762988 [http://www.springerlink.com/content/9723278j89678256/] The final publication is

available at www.springerlink.com. (I am first author).

1. Abstract

Although the association of germline BRCA2 mutations with pancreatic adenocarcinoma is well

established, the role of BRCA1 mutations is less clear. We hypothesized that loss of heterozygosity at the

BRCA1 locus occurs in pancreatic cancers of germline BRCA1 mutation carriers, acting as a “second-hit”

that contributes to tumorigenesis. Seven germline BRCA1 mutation carriers with pancreatic

adenocarcinoma and 9 patients with sporadic pancreatic cancer were identified from clinic- and

population-based registries. DNA was extracted from paraffin-embedded tumor and non-tumor samples.

Three polymorphic microsatellite markers for the BRCA1 gene, and an internal control marker on

chromosome 16p, were selected to test for loss of heterozygosity. Tumor DNA demonstrating loss of

heterozygosity in BRCA1 mutation carriers was sequenced to identify the retained allele. The loss of

heterozygosity rate for the control marker was 20%, consistent with the expected baseline frequency. Loss of

heterozygosity at the BRCA1 locus was 5/7 (71%) in BRCA1 mutation carriers; tumor DNA was available

for sequencing in 4/5 cases, and three demonstrated loss of the wild-type allele. Only 1/9 (11%) sporadic

cases demonstrated loss of heterozygosity at the BRCA1 locus. Loss of heterozygosity occurs frequently

in pancreatic cancers of germline BRCA1 mutation carriers, with loss of the wild-type allele, and

infrequently in sporadic cancer cases. Therefore, BRCA1 germline mutations likely predispose to the

development of pancreatic cancer, and individuals with these mutations may be considered for pancreas

cancer screening programs.

2. Introduction

As discussed in the Literature Review section of the thesis, identifying genes implicated in predisposition

to FPC is important for developing early-detection and prevention strategies as well as more effective

therapeutic options. Several hereditary syndromes due to mutations in tumor suppressor/caretaker genes

cause an elevated risk of pancreatic cancer. These syndromes contribute to a small proportion of familial

cases, and it is expected that other genes play an important role136. Both BRCA1 and BRCA2 were

initially identified as highly penetrant genes in familial breast and ovarian cancer, but germline mutations

of these genes are also associated with several other malignancies498. Studies of cancer risks in BRCA2

germline carriers have reported a relative risk of 3.51 – 6.61 for pancreatic cancer498-500, and it is

estimated that BRCA2 mutations contribute to 6-19% of FPC cases103,121,501,502. Molecular genetic studies

have confirmed the role of BRCA2 inactivation in the development of pancreatic cancer115,503-507.

As with BRCA2, clinic-based studies have suggested an increased risk of pancreatic cancer in germline

BRCA1 mutation carriers508,509. There is also evidence for downregulation of BRCA1 expression in

sporadic pancreatic cancer tumors510. However, both lines of evidence are much weaker for BRCA1 than for BRCA2. Inactivation of the wild-type BRCA1 allele in breast and ovarian cancer

most commonly occurs by loss of heterozygosity (LOH)511. We hypothesized that LOH at the BRCA1

locus occurs in pancreatic cancers of germline BRCA1 mutation carriers, acting as a “second-hit” event

contributing to pancreatic tumorigenesis. In this study, we compared the rate of LOH at BRCA1 in pancreatic tumors from mutation carriers with that in tumors from patients with sporadic pancreatic cancer.

3. Materials & Methods

Ethical approval for this study was obtained from the Mount Sinai Hospital Research Ethics Board. Microdissection and DNA extraction from formalin-fixed paraffin-embedded (FFPE) tissue, primer design and optimization for sequencing, PCR amplification, and interpretation of genotyping and sequencing results were performed by W. Al-Sukhni. Microsatellite genotyping and Sanger sequencing were performed by the Analytical Genetics Technology Centre (AGTC) at Princess Margaret Hospital, Toronto.

3.1 Tissue Specimens

Germline BRCA1 mutation carriers were identified by: (1) clinic-based recruitment of incident cases of pancreatic cancer at the University of Toronto, as described in a previous report by our group121; and (2) population-based recruitment of pancreatic cancer cases through the Ontario Pancreas Cancer Study (OPCS)45. In most cases, BRCA1 testing was performed at provincial laboratories because of a strong history of breast/ovarian cancer; in one case, the BRCA1 mutation was identified by our research group while screening 102 unselected hereditary pancreatic cancer patients for several germline mutations. This latter mutation was subsequently confirmed by testing in an offsite provincial lab121. All seven mutation

carriers included in this study had pathologically-confirmed adenocarcinoma of the pancreas. Pancreatic

tumor resection or biopsy specimens were obtained for all patients. Non-tumor tissue and/or blood

samples were also obtained for each patient. Microdissected, formalin-fixed paraffin-embedded samples

were prepared from each tumor (≥ 70% cellularity) and non-tumor specimen, and DNA was extracted

using the QIAamp DNA FFPE Tissue Kit, as per the manufacturer's recommendations (QIAGEN Inc.,

Mississauga, Ontario, Canada). Blood lymphocyte DNA was extracted using standard Ficoll-Paque

technique, as per the manufacturer’s recommendations (Amersham Biosciences, Baie d’Urfe, Quebec,

Canada).

Nine patients recruited through the clinic-based Familial Gastrointestinal Cancer Registry (FGICR)121 with newly diagnosed pancreatic cancer and no known BRCA1 germline mutation or family history of breast/ovarian cancer syndrome were selected for comparison. Tumor and non-tumor/lymphocyte DNA were similarly extracted for each patient.

All patients were deceased before this study was performed; tissue specimens were previously banked for

research after obtaining consent from patients or from family members.

3.2 LOH Assay

Three microsatellite markers linked to the BRCA1 locus were used for LOH analysis: D17S855, D17S1322, and D17S579. The first two markers are intragenic (see Figure 1 for the locations of the microsatellite markers on chromosome 17).

Figure 1 - Location of BRCA1 microsatellite markers on chromosome 17

Figure 1 Legend: D17S1322 and D17S855 are intragenic (in introns 19 and 20, respectively), while D17S579 is distal to BRCA1. The distance in base pairs between markers is indicated.

Primer pair sequences were published in previous studies576-578, and primers were purchased from

Invitrogen Canada Inc. (Burlington, Ontario, Canada). Primer sequences are listed in Appendix Table S1.

A microsatellite marker on 16p (D16S2616) was selected as an internal control. The expected rate of allelic loss on this chromosomal arm in sporadic and familial pancreatic cancer is 20-25%.181,182

For each primer pair, a 5'-labeled (6-FAM) forward primer and an unlabeled reverse primer were used.

Platinum Taq DNA Polymerase from Invitrogen was used for polymerase chain reaction amplification.

For each reaction, 20-25ng of genomic DNA were amplified in 25 µL reaction volume containing 10X

PCR buffer (Invitrogen Canada Inc.), 2mM MgCl2, 0.5µL of 10mM dNTP, 1-1.5µL of 10mM primers,

and 0.2µL of Invitrogen Platinum Taq DNA Polymerase. Initial denaturation was performed at 95°C x 2

minutes; followed by 35 cycles of (a) 94°C x 30 seconds, (b) primer-specific annealing temperature x 30

seconds, and (c) 72°C x 30 seconds; and final extension at 72°C x 5 minutes.

Automated DNA fragment analysis was performed using the ABI 3100 Prism sequencer (Applied

Biosystems), and GeneMapper Software version 3.7 was used to measure the allelic peak intensities. A

case was informative for a particular marker if two distinct alleles were amplified in the non-

tumor/lymphocyte DNA. Allelic peak ratio was calculated in informative cases as (T1/T2)/(N1/N2),

where T1, N1 = peak intensities for larger alleles; T2, N2 = peak intensities for smaller alleles; T = tumor

DNA; N = non-tumor or lymphocyte DNA (Figure 2).

Figure 2 - Sample electropherogram of microsatellite marker fragment analysis

Figure 2 Legend: T=tumor DNA; N=non-tumor/lymphocyte DNA; T1,N1=peak intensities of larger alleles;

T2,N2=peak intensities of smaller alleles; Allelic peak ratio = (T1/T2)/(N1/N2); LOH = allelic ratio < 0.70 or > 1.43

An allelic ratio of < 0.70 or > 1.43 was considered evidence of LOH in tumor DNA. Results were

confirmed with at least 2 separate PCRs.
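The allelic peak ratio and LOH call described above can be expressed as a short Python sketch. The ratio formula and the 0.70/1.43 thresholds come from this section; the data structures, the assumption that stutter peaks have already been removed, and the rule that replicate PCRs must agree exactly (`consensus_call`) are illustrative assumptions rather than the GeneMapper workflow actually used.

```python
import math

LOH_LOW, LOH_HIGH = 0.70, 1.43  # thresholds from the Methods (1.43 is roughly 1/0.70)


def allelic_peak_ratio(t1, t2, n1, n2):
    """(T1/T2)/(N1/N2): index 1 = larger allele, index 2 = smaller allele."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("normal DNA must show both alleles (informative case)")
    if t2 == 0:
        # Complete loss of the smaller allele in tumor gives an unbounded ratio.
        return math.inf if t1 > 0 else math.nan
    return (t1 / t2) / (n1 / n2)


def call_marker(tumor, normal):
    """Classify one marker for one case.

    tumor, normal: dicts mapping allele size (bp) -> peak height, assumed to
    contain only true allele peaks (stutter already removed).
    Returns '+' (LOH), '-' (no LOH), 'U' (uninformative) or '*' (no tumor signal).
    """
    if len(normal) == 1:
        return "U"  # homozygous in germline DNA
    if len(normal) != 2:
        raise ValueError("expected one or two allele peaks in normal DNA")
    smaller, larger = sorted(normal)
    t1, t2 = tumor.get(larger, 0.0), tumor.get(smaller, 0.0)
    if t1 == 0 and t2 == 0:
        return "*"
    ratio = allelic_peak_ratio(t1, t2, normal[larger], normal[smaller])
    return "+" if ratio < LOH_LOW or ratio > LOH_HIGH else "-"


def consensus_call(replicate_calls):
    """Hypothetical replicate rule: accept a call only if all PCRs agree."""
    distinct = set(replicate_calls)
    return distinct.pop() if len(distinct) == 1 else "discordant"


normal = {150: 3000.0, 146: 3000.0}
print(call_marker({150: 1000.0, 146: 4000.0}, normal))  # + (ratio 0.25)
print(call_marker({150: 2900.0, 146: 3100.0}, normal))  # - (ratio about 0.94)
print(call_marker({150: 800.0}, {150: 5000.0}))          # U
print(consensus_call(["+", "+"]))                        # +
```

Normalizing the tumor ratio by the matched normal ratio corrects for preferential amplification of the shorter allele, which is why the same threshold can be applied to markers of different sizes.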

3.3 Tumor DNA Sequencing in BRCA1 Mutation Carriers

For carriers of germline BRCA1 mutations who demonstrated LOH in their pancreatic tumors, the DNA

of the pancreatic cancer tissue was sequenced to determine if the wild-type or mutated allele was retained.

Since paraffin-extracted DNA was being amplified, unique primers were designed for each BRCA1

mutation to obtain amplification products < 110 bp. Appendix Table S2 lists primer sequences. Non-

tumor/lymphocyte DNA was sequenced for comparison for each case. Unlabeled primers were purchased

from Invitrogen. The ABI Prism 3130 XL Genetic Analyzer (Applied Biosystems) was used to perform

automated sequencing. The forward primer was used for sequencing, and results were confirmed by

sequencing two independently amplified PCR products for each sample.

4. Results

4.1 Patient Characteristics

Table 4 compares the characteristics of BRCA1 mutation carriers and sporadic pancreatic cancer patients.

Table 4 - Characteristics of BRCA1 mutation carriers and sporadic pancreatic cancer patients

Patient characteristic                   BRCA1 mutation carriers (N=7)   Sporadic pancreatic cancer (N=9)
Gender (F:M)                             0:7                             4:5
Age at diagnosis, years (mean +/- SD)    65.4 +/- 12.2                   63.6 +/- 10.9
Ethnicity, n (%)
  Ashkenazi Jewish                       5 (71%)                         0
  Caucasian                              2 (29%)                         8 (89%)
  Other                                  0                               1 (11%)
Source of specimen, n (%)
  Whipple resection                      2 (29%)                         6 (67%)
  Biopsy                                 4 (57%)                         3 (33%)
  Autopsy                                1 (14%)                         0
BRCA1 mutation, n
  5382insC                               3                               N/A
  185delAG                               3                               N/A
  2318delG                               1                               N/A

Families with BRCA1 mutations demonstrated a history of breast +/- ovarian cancer, and four families

also had ≥ 2 pancreatic cancer cases (one of these cases has been previously reported)121. Most BRCA1

mutation carriers were of Ashkenazi Jewish descent, whereas we excluded patients with Jewish ancestry

from the sporadic cancer group due to the elevated prevalence of BRCA1 mutations in this population.

The two founder Ashkenazi Jewish BRCA1 mutations, 5382insC and 185delAG, were present in the

majority of mutation carriers (6/7 families). Table 5 summarizes the pedigree information for the seven

mutation carriers.

Table 5 - Pedigree summary for BRCA1 mutation carriers

Carrier ID  Ethnicity  Mutation  Age at PC dx (years)  Relatives with PC        Relatives with BC and/or OC  Tumors at other sites
BRC-1       AJ         5382insC  79                    2 (brother, 1st cousin)  6                            CRC
BRC-2       Caucasian  5382insC  57                    1 (1st cousin)           5                            -
BRC-3*      AJ         5382insC  52                    1 (son)                  1 (sister; dx age 42)        -
BRC-4       AJ         185delAG  77                    0                        1 (daughter; dx age 39)      Prostate
BRC-5       AJ         185delAG  76                    0                        3                            Prostate
BRC-6       Caucasian  2318delG  51                    0                        6                            -
BRC-7       AJ         185delAG  66                    2 (sister, 1st cousin)   3                            -

AJ = Ashkenazi Jewish; PC = pancreatic cancer; BC = breast cancer; OC = ovarian cancer; CRC = colorectal cancer; dx = diagnosis

*This patient did not have molecular testing to confirm the mutation; his brother and son both have a confirmed 5382insC mutation.

The mean age at diagnosis was similar for the two groups: 65.4 years in mutation-carriers vs. 63.6 years

in sporadic patients. Three BRCA1 mutation carriers had a history of other malignancies: two had prostate cancer and one had colorectal cancer. No sporadic cancer patient had a history of multiple primary tumors.
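As a quick arithmetic check, the carriers' mean and sample standard deviation of age at diagnosis can be recomputed from the individual ages listed in Table 5. Individual ages for the sporadic group are not reported here, so only the carrier summary is checked; the use of the sample (n - 1) standard deviation is an assumption that happens to reproduce the reported value.

```python
import statistics

# Ages at pancreatic cancer diagnosis for BRC-1 to BRC-7, transcribed from Table 5.
carrier_ages = [79, 57, 52, 77, 76, 51, 66]

mean_age = statistics.mean(carrier_ages)
sd_age = statistics.stdev(carrier_ages)  # sample SD (n - 1 denominator)
summary = f"{mean_age:.1f} +/- {sd_age:.1f}"
print(summary)  # 65.4 +/- 12.2
```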

4.2 LOH Analysis

All cases (BRCA1 mutation carriers and sporadic cancers) were informative for at least one BRCA1

marker. D17S855 was informative in 11/16 (69%) cases; D17S1322 and D17S579 were each informative

in 13/16 (81%) cases. The internal control marker D16S2616 was informative in 10/16 (63%) of all

cases. Two BRCA1 mutation carriers did not have enough tumor DNA to test for LOH with D16S2616;

tumor DNA from one sporadic cancer patient could not be amplified when testing for LOH with

D17S855.

Table 6 shows the LOH results for each case with each marker.

Table 6 - LOH results for BRCA1 mutation carriers and sporadic pancreatic cancer cases

           BRCA1 mutation carriers                    Sporadic pancreatic cancer cases
Marker     BRC-1 BRC-2 BRC-3 BRC-4 BRC-5 BRC-6 BRC-7  SPR-1 SPR-2 SPR-3 SPR-4 SPR-5 SPR-6 SPR-7 SPR-8 SPR-9
D17S855    +     +     +     U     +     +     U      +     U     *     -     -     U     -     -     -
D17S1322   -     +     U     -     U     +     -      +     -     -     -     -     -     U     -     -
D17S579    -     U     U     -     +     +     -      +     U     -     -     -     -     -     -     -
D16S2616   U     +     *     -     *     U     -      -     -     +     -     -     -     -     U     U

(+) = LOH (allelic peak ratio < 0.70 or > 1.43); (-) = no LOH (0.70 ≤ allelic peak ratio ≤ 1.43); (U) = uninformative sample (homozygous at the tested microsatellite marker in germline DNA); (*) = DNA unavailable for amplification or did not amplify

Ten cases in total were successfully tested with D16S2616, and only 2/10 (20%) demonstrated LOH.

Five of seven (71%) BRCA1 mutation carriers demonstrated LOH with at least one marker, whereas only

one of nine (11%) sporadic cancer cases demonstrated LOH with any BRCA1 marker (p = 0.035, 2-tailed

Fisher’s Exact test). In four of the five BRCA1-mutated cases with LOH, the allelic peak ratio was < 0.5

or > 2.0. (See Figure 3 for representative genotyping results).
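The summary counts and the Fisher's exact test can be reproduced from the calls in Table 6. The sketch below transcribes the table, counts cases with LOH at one or more BRCA1 markers, tallies the D16S2616 control, and computes a two-sided Fisher's exact p-value from the hypergeometric distribution using only the standard library. The two-sided p-value is taken as the sum over all tables with the same margins that are no more probable than the observed one, a common convention; the helper names are hypothetical.

```python
from math import comb

# Calls transcribed from Table 6: '+' LOH, '-' no LOH, 'U' uninformative,
# '*' DNA unavailable or did not amplify.
CASES = [f"BRC-{i}" for i in range(1, 8)] + [f"SPR-{i}" for i in range(1, 10)]
TABLE6 = {
    "D17S855":  "+ + + U + + U + U * - - U - - -",
    "D17S1322": "- + U - U + - + - - - - - U - -",
    "D17S579":  "- U U - + + - + U - - - - - - -",
    "D16S2616": "U + * - * U - - - + - - - - U U",
}
BRCA1_MARKERS = ("D17S855", "D17S1322", "D17S579")


def calls_for(case):
    idx = CASES.index(case)
    return {marker: row.split()[idx] for marker, row in TABLE6.items()}


def has_brca1_loh(case):
    calls = calls_for(case)
    return any(calls[m] == "+" for m in BRCA1_MARKERS)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    total = comb(n, row1)

    def prob(k):  # hypergeometric probability of k in the top-left cell
        return comb(col1, k) * comb(n - col1, row1 - k) / total

    p_obs = prob(a)
    k_min, k_max = max(0, row1 + col1 - n), min(row1, col1)
    return sum(prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_obs * (1 + 1e-7))


carriers = [c for c in CASES if c.startswith("BRC")]
sporadic = [c for c in CASES if c.startswith("SPR")]
carrier_loh = sum(has_brca1_loh(c) for c in carriers)
sporadic_loh = sum(has_brca1_loh(c) for c in sporadic)
p_value = fisher_exact_two_sided(carrier_loh, len(carriers) - carrier_loh,
                                 sporadic_loh, len(sporadic) - sporadic_loh)

control = [calls_for(c)["D16S2616"] for c in CASES]
control_tested = sum(call in ("+", "-") for call in control)
control_loh = control.count("+")

print(f"BRCA1 LOH: carriers {carrier_loh}/{len(carriers)}, sporadic {sporadic_loh}/{len(sporadic)}")
print(f"Two-sided Fisher's exact p = {p_value:.3f}")           # 0.035
print(f"D16S2616 control: {control_loh}/{control_tested} with LOH")  # 2/10
```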

Figure 3 - Three representative matched-pair electropherograms for microsatellite LOH

Figure 3 Legend: T=tumor DNA; N=non-tumor DNA. (a) and (b) represent LOH; (c) represents no LOH

Pancreatic tumors from BRCA1 mutation carriers were moderately or poorly differentiated ductal adenocarcinomas, and tumors with LOH showed no distinguishing pathologic characteristics compared with tumors without LOH.

4.3 Sequencing to Identify Retained Allele in LOH Tumors

Four of five BRCA1 mutation carriers demonstrating LOH had sufficient tumor DNA for sequencing. Three cases (BRC-1, BRC-2, and BRC-3) had the 5382insC mutation, and one (BRC-6) had the 2318delG mutation. Three of the four sequenced cases (BRC-2, BRC-3, and BRC-6) demonstrated loss or reduction of the wild-type allele, while the result for BRC-1 was inconclusive (Figure 4 shows a sample sequencing result).

Figure 4 - Representative sequencing result for an individual with 5382insC germline BRCA1 mutation

Figure 4 Legend: T=tumor DNA; N=non-tumor/lymphocyte DNA. The top panel demonstrates

sequencing of two alleles in non-tumor DNA (mutant and wild-type allele); the bottom panel demonstrates only the mutant allele sequence in tumor DNA of the same individual.

Of note, patient BRC-3, who did not have molecular confirmation of the germline mutation, was successfully sequenced for the 5382insC mutation carried by his brother and son, confirming that he is a carrier.

5. Discussion

This analysis sheds light, at the molecular level, on the putative role of BRCA1 in pancreatic cancer

tumorigenesis. The importance of LOH as a “second-hit” in tumorigenesis is well-established in many

cancers. Since BRCA1 inactivation occurs via LOH in the majority of breast and ovarian tumors in

BRCA1-mutation carriers, we hypothesized that LOH also plays a primary role in inactivation of BRCA1

in mutation-positive pancreatic cancer. Indeed, we found that the majority of our mutation-positive

pancreatic cancer subjects (5/7) did demonstrate LOH in tumor DNA. In comparison, we found that only

1/9 sporadic cancer patients demonstrated LOH at the BRCA1 locus in tumor DNA. It is possible that the

remaining two subjects had inactivation of their wild-type allele by epigenetic methylation of the

promoter; promoter hypermethylation of the wild-type allele in a minority of BRCA1 mutation-positive

breast tumors has been previously reported512. Due to the limitations of quantity and quality of our

paraffin-embedded specimens, we were not able to correlate LOH with decreased BRCA1 expression. However, our sequencing results did confirm loss of the wild-type allele in three of the four sequenced cases with LOH, suggesting that only the truncated protein product of the mutated allele would be expressed in those cases.

The link between BRCA2 mutations and pancreatic cancer is well established, and most authors recommend including this gene in mutational screening of individuals at high risk of pancreatic cancer and their relatives.

However, the contribution of germline BRCA1 mutations to increased risk of pancreatic cancer is less

clear. Both BRCA1 and BRCA2 have important roles in the repair of double-stranded DNA breaks.513 A

number of anecdotal reports have described pancreatic cancer in association with BRCA1 mutations.514,515

Our group previously identified 38 individuals from a group of 102 pancreatic cancer patients who were

considered to have intermediate/high-risk families, of whom one Ashkenazi Jewish patient screened

positive for a deleterious BRCA1 mutation.121 A study by Tonin et al.516 screened 220 Ashkenazi Jewish

breast cancer families for BRCA1 and BRCA2 mutations, and reported pancreatic cancer in 11/91 families

with a BRCA1 mutation compared to 5 cases in 120 families without BRCA1 mutations. More recently,

Skudra et al.122 screened 90 consecutive Latvian patients presenting with pancreatic cancer and 640

controls for several germline BRCA1 mutations, including two Latvian founder mutations (5382insC,

4154delA) and two less common mutations (300T>G, 185delAG) in the BRCA1 gene. Four of 90 (4.4%)

pancreatic cancer patients were found to carry a BRCA1 mutation compared to 1/640 (0.15%) controls. It

was noted, however, that the rate of mutation in controls likely underestimates the true prevalence of the

founder mutations in the general Latvian population since control subjects were relatively older, hence

selecting against highly penetrant mutations.

Two large studies used family-based designs to study cancer risk at sites other than breast or ovary in

families with multiple breast/ovarian cancers or with young age of onset of breast cancer. There was some

overlap in the families used between the two studies, but different analytical methods were used.508,509,517

Both studies found a statistically significant association for pancreatic cancer, albeit lower than the

association with BRCA2: Brose et al.509 reported a three-fold increase in pancreatic cancer risk among

BRCA1 carriers (3.6%, compared to 1.3% estimated general population risk); Thompson et al.508 reported

a relative risk of 2.26 (95% CI 1.26-4.06) for developing pancreatic cancer in BRCA1 mutation carriers,

with a greater association in individuals diagnosed under age 65 (RR 3.10, 95% CI 1.43-6.70). One

limitation of these studies was the family-based design, which may overestimate cancer risks due to

possible confounding effects of other genetic and/or environmental factors shared by members of a

family. To circumvent this problem, Risch et al.498 performed a population-based study of 1171

unselected women from Ontario, Canada who presented with new-onset ovarian carcinoma. Subjects

were screened for BRCA1 and BRCA2 mutations, and information about other cancers in their first-degree

relatives was used to estimate cancer risk at other sites in mutation carriers, and compared to estimated

cancer incidence rates in Ontario. Seventy-five BRCA1 mutation carriers were identified, and a relative

risk of 3.1 was calculated for pancreatic cancer; however, this was not statistically significant (95% CI

0.45-21).

More recently (and subsequent to completion of our study), Ferrone et al.502 published an analysis of

unselected Ashkenazi Jewish patients who underwent pancreatic cancer resection and found no significant

increase in BRCA1 frequency relative to the general Ashkenazi population (1.3% vs. 1.1%); however, the

BRCA1 mutation rate was based on previous reports and not directly assessed in a control cohort in this

study, and the authors acknowledged that the small size (145 subjects) may have resulted in insufficient

power to detect a statistically significant difference. Axilbund et al.123 did not find carriers of BRCA1

mutations in 66 FPC patients (defined as having at least two additional relatives with pancreatic cancer),

but most of the subjects did not report Ashkenazi Jewish ancestry. In the non-Jewish North American

population, the estimated frequency of BRCA1 mutations is 1/500-1/800518,519; this suggests that Axilbund

et al.’s study was underpowered to identify an association of BRCA1 with FPC unless the effect size was

at least 15-fold, a value exceeding the estimated risk of BRCA2. Kim et al.520 reported a statistically significantly lower age of onset of pancreatic cancer in BRCA1 mutation carriers than in non-carriers.
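The power argument about Axilbund et al. can be illustrated with a simple binomial model. The thesis does not state how the 15-fold figure was derived, so this is an assumed back-of-the-envelope approach: each of the 66 FPC cases is treated as carrying a BRCA1 mutation independently, with probability equal to the population carrier frequency (1/500 to 1/800) multiplied by a fold-enrichment factor (a rare-variant approximation).

```python
def expected_carriers(n_cases, carrier_freq, fold):
    """Expected number of BRCA1 mutation carriers among n_cases."""
    return n_cases * carrier_freq * fold


def prob_no_carriers(n_cases, carrier_freq, fold):
    """Binomial probability of observing zero carriers among n_cases."""
    p = carrier_freq * fold
    if not 0.0 <= p <= 1.0:
        raise ValueError("carrier_freq * fold must be a probability")
    return (1.0 - p) ** n_cases


N_CASES = 66  # FPC patients screened by Axilbund et al.
for carrier_freq in (1 / 500, 1 / 800):  # general-population BRCA1 frequency range
    for fold in (1, 5, 10, 15):
        print(f"freq=1/{round(1 / carrier_freq)} fold={fold:>2}: "
              f"expected={expected_carriers(N_CASES, carrier_freq, fold):.2f}, "
              f"P(0 carriers)={prob_no_carriers(N_CASES, carrier_freq, fold):.2f}")
```

Under this model, even a 15-fold enrichment yields only about one to two expected carriers among 66 cases, so finding none is not strong evidence against an association.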

For our study, we identified seven unrelated individuals with pathologically-confirmed pancreatic

adenocarcinoma whose families have BRCA1 mutations. In all but one of these cases, a molecular

confirmation of the mutation was previously available. The patient without molecular confirmation had a

brother and son who carried the identical 5382insC mutation; we later confirmed the presence of the same

mutation in this patient when we sequenced his tumor DNA to identify the remaining allele. The age at

diagnosis of pancreatic cancer did not differ significantly between the mutation carriers and sporadic

cases; this is similar to findings of other studies.515,521 Though further studies are needed to definitively

determine whether BRCA1 is associated with increased pancreatic cancer risk, current data suggest that the penetrance of BRCA1 mutations for pancreatic cancer is lower than that of BRCA2.498 Moreover, previous studies have suggested that some pancreatic cancer patients with BRCA2 mutations may not have a family

history of breast or ovarian cancers.501,522 It is not clear if the same may be true for pancreatic cancer

patients with BRCA1 mutations; most studies to date have characterized families selected for breast or

ovarian cancer.

Possible sources of experimental artifact include contamination of microdissected tumor cells with adjacent stromal cells and potential bias from the PCR-based microsatellite assay. Measures to reduce the impact of such bias included using microdissected tumor samples with a minimum of 70% cellularity (as identified by an experienced pathologist) and confirming PCR-based results with at least two separate PCR experiments. Since FFPE specimens often yield DNA of variable quality as a result of nucleic acid

cross-linking by the fixation process, we minimized potential bias from degraded DNA by selecting

primers for microsatellite markers that amplify small fragments (125-150bp). Due to the limitation of

available DNA, and the amplicon size restriction in selecting microsatellite markers, we were limited to

just three BRCA1 markers for our experiments. However, every sample produced informative results for

at least one marker, and most generated results for two or more markers. We also attempted to include an

internal control, an unrelated microsatellite marker at chromosome 16 with a previously reported LOH

frequency of 20-25%. Due to technical reasons and inadequate DNA for further testing, only three of the

seven familial samples successfully amplified this marker, with 1/3 demonstrating LOH. In comparison,

seven of nine sporadic cases amplified this internal control marker, with 1/7 showing LOH. Overall, 2/10

(20%) of samples showed LOH at this locus, consistent with previous reports. Although the inadequate

number of informative samples among the familial cases reduced the value of this control in our

comparison, our results are supported by confirmatory Sanger sequencing, which demonstrated a

decreased signal from the functional allele in tumors from samples that showed LOH.

Our small sample size (seven germline BRCA1 mutation carriers with pancreatic cancer) reflects the

challenges inherent in studying a malignancy as lethal as pancreatic cancer, in which only 15% of cases

are resectable. To our knowledge, this is the first molecular genetic study investigating BRCA1 LOH in

pancreatic cancer of germline BRCA1 mutation carriers. Two previous studies have investigated BRCA1

in sporadic pancreatic tumors. Beger et al.510 used quantitative reverse-transcription PCR (qRT-PCR) and

immunohistochemical staining to analyze BRCA1 and BRCA2 gene expression in 13 normal

pancreas samples, 30 chronic pancreatitis samples, and 53 sporadic pancreatic adenocarcinomas. They

found decreased BRCA1, but not BRCA2, mRNA and protein expression in 50% of pancreatic cancer

samples, and also found decreased BRCA1 mRNA expression in chronic pancreatitis samples, whereas

normal expression was observed in normal pancreatic tissue. Correlation of these findings with clinical

information demonstrated worse 1-year survival in patients whose tumors had reduced BRCA1

expression, compared to patients with normal BRCA1 expression. Another study by Peng et al.523 found

that BRCA1 was frequently methylated in sporadic pancreatic adenocarcinoma as well as in ductal cells

showing inflammatory background without histologic change. The authors suggested that promoter

methylation of the BRCA1 gene may be the mechanism explaining the reduced gene expression reported

by Beger et al.510 in pancreatic cancer and in chronic pancreatitis. However, they noted heterogeneity of

methylation in different sections of the same tumor, and they did not directly measure gene expression

level, so it is not clear how promoter methylation impacted expression. Moreover, they found

methylation of BRCA1 even in normal ductal cells. Our study adds to the evidence for BRCA1 in

pancreatic tumorigenesis by specifically demonstrating an inactivating mechanism in the pancreatic tumor

DNA of BRCA1 mutation carriers, likely akin to the role of BRCA1 in breast and ovarian cancer

tumorigenesis.

Determining the association between BRCA1 and pancreatic cancer has diagnostic and therapeutic

implications. The implication of BRCA2 in pancreatic cancer has allowed incorporation of this gene in

mutational screening panels and identification of kindreds at risk; the same can be done for BRCA1. As

for treatment, current chemotherapeutic protocols for pancreatic cancer are based on 5-FU and

gemcitabine.524 Interestingly, in-vitro and in-vivo studies have found BRCA1-deficient tumors to be

particularly sensitive to certain chemotherapeutic agents that take advantage of the impaired DNA repair

mechanism that characterizes these tumors, such as cross-linking agents (e.g. mitomycin C), type II

topoisomerase inhibitors (e.g. etoposide), and PARP1 (poly(ADP-ribose) polymerase family, member 1)

inhibitors.525-527 Recently, case reports and small series have described responses to such therapies in

patients with BRCA1 or BRCA2 mutations.174,178,528-530

In conclusion, we demonstrate that LOH occurs at the BRCA1 locus in pancreatic cancers of BRCA1-

mutation carriers, suggesting that this gene is inactivated in these tumors and may play a role in

pancreatic tumorigenesis. Further research into the role of BRCA1 in pancreatic cancer is needed to

assess the expression of this gene in pre-invasive and invasive pancreatic lesions. Subjects with germline

BRCA1 mutations should be considered for inclusion in pancreatic cancer screening programs, and they

may benefit from chemotherapies that target the DNA repair pathway.

Chapter 3 - Germline Genomic Copy Number Variation in Familial Pancreatic Cancer

The contents of this chapter have been published in Human Genetics 2012 Jun 5 (Epub ahead of print).

PMID: 22665139 [http://www.springerlink.com/content/6665070t28854647/]. The final publication is

available at www.springerlink.com. (I am first author).

1. Abstract

Adenocarcinoma of the pancreas is a significant cause of cancer mortality, and up to 10% of cases appear

to be familial. Heritable genomic copy number variants (CNVs) can modulate gene expression and

predispose to disease. We hypothesized that genes overlapped by rare germline genomic losses or gains

identified exclusively in pancreatic cancer patients from high-risk families are candidate FPC genes. A

total of 120 FPC cases and 1194 controls were genotyped on the Affymetrix 500K array, and 36 cases and

2357 controls were genotyped on the Affymetrix 6.0 array. Detection of CNVs was performed by

multiple computational algorithms and partially validated by quantitative PCR. We found no significant

difference in the germline CNV profiles of cases and controls. A total of 93 non-redundant FPC-specific

CNVs (53 losses and 40 gains) were identified in 50 cases, each CNV present in a single individual.

FPC-specific CNVs overlapped the coding region of 88 RefSeq genes. Several of these genes have been

reported to be differentially expressed and/or affected by copy number alterations in pancreatic

adenocarcinoma. Further investigation in high-risk subjects may elucidate the role of one or more of these

genes in genetic predisposition to pancreatic cancer.

2. Introduction

As illustrated in Chapter 1 of this thesis, a small proportion of familial pancreatic cancer cases can be

attributed to known cancer syndromes and their associated genes: Hereditary Breast and Ovarian Cancer (HBOC;

BRCA2, BRCA1, PALB2), Peutz-Jeghers Syndrome (PJS; STK11), Familial Atypical Multiple Mole

Melanoma (FAMMM; p16/CDKN2A), and Hereditary Pancreatitis (HP; PRSS1). However, most cases of

Familial Pancreatic Cancer (FPC) have an unknown genetic etiology.136 Segregation analysis of families

with multiple affected members suggests that FPC is caused by heritable alterations in at least one rare

“major gene”, likely in an autosomal dominant manner.161 Moreover, multiple case-control and cohort

studies have demonstrated that members of FPC families, particularly those with an affected first-degree

relative, have a significantly elevated lifetime risk of developing the disease (up to 32- to 56-fold).156,158,160

However, to date traditional methods of linkage analysis for identifying predisposition genes have met

with challenges in studying FPC, due in part to probable genetic heterogeneity as well as difficulty in

collecting DNA specimens from multiple affected members in a family due to the rapid mortality of the

disease.

Recently, it has become clear that submicroscopic copy number variants (CNVs) are prevalent throughout

all human genomes, accounting for at least 1.2% of nucleotide variation between any two individuals.238 CNVs

have been linked to rare genomic disorders531 as well as common neurodevelopmental196, psychiatric532,

autoimmune533 and metabolic534 diseases. Some studies have suggested an association between common

CNVs and sporadic cancers (e.g. pancreatic cancer (6q13)344, neuroblastoma (1q21.1)340, prostate cancer

(2p24.3; 20p13; GSTT1)338,341,342, nasopharyngeal carcinoma (6p21.3)343, and endometrial cancer

(GSTT1)535). The recent paper by Huang et al.344 is the first to describe an association of a germline CNV

with pancreatic cancer risk: a common 10,379bp deletion at 6q13 was found at a higher frequency in

sporadic pancreatic cancer patients than in controls, with an odds ratio of 1.31 for 1-copy carriers

compared to 2-copy carriers. Interestingly, functional analysis of this non-genic deletion suggested that it

may be involved in long-range regulation of CDKN2B, an established tumor-suppressor gene.

In addition, it is well known that rare germline CNVs contribute to the genetic basis of familial cancer.

Indeed, large germline genomic rearrangements cause 15% of Familial Adenomatous Polyposis (APC

gene)311, 2% of breast and ovarian cancer (BRCA1 gene)536, and 5% of Lynch Syndrome (MSH2 & MLH1

genes)321 cases. In 1-3% of Lynch Syndrome patients, the causative mutation is a large heritable deletion

at the 3’ end of the TACSTD1 gene, which causes transcriptional read-through and epigenetic silencing of

the adjacent MSH2 gene.336 Furthermore, a report by Shlien et al.348 identified an elevated frequency of

germline CNVs in individuals with Li Fraumeni syndrome (TP53 mutation), and suggested that the

increased predisposition to cancer in this syndrome may be proportional to the frequency of germline

CNVs, many of which overlap known cancer genes.

Since germline CNVs implicated in familial cancers to date are rare with relatively high penetrance, we

hypothesized that familial and young-onset pancreatic cancer patients have a distinctive germline

genomic copy number variation (CNV) profile compared to non-cancer controls and that tumor

suppressor genes or oncogenes predisposing to pancreatic cancer may be overlapped by one or more

CNVs that are detected exclusively in patients. Here we present an analysis of germline CNVs detected

in 120 high-risk pancreatic cancer patients and compare them to CNVs in a large cohort of unaffected

controls.

3. Materials & Methods

This study was approved by the Research Ethics Boards at Mount Sinai Hospital and University Health

Network in Toronto, Canada; Office for Human Research Studies at Dana Farber/Harvard Cancer Centre

in Boston, Massachusetts; Institutional Review Board at Mayo Clinic in Rochester, Minnesota;

Institutional Review Board at M.D. Anderson Cancer Centre in Houston, Texas; Office of Human

Subjects Research at Johns Hopkins University in Baltimore, Maryland; and Human Investigation

Committee at Karmanos Cancer Institute, Wayne State University in Detroit, Michigan.

DNA extraction from blood or EBV-transformed cell lines was performed by technicians at each

participating site and provided to W. Al-Sukhni. Genotyping of samples and ancestry verification using

STRUCTURE were performed by W. Al-Sukhni. Computational analysis of Affy 500K data on dChip,

CNAG, and Partek was performed by W. Al-Sukhni, with assistance from S. Joe in script-writing for

organization and filtration of data (as directed by W. Al-Sukhni). To standardize the analysis of Affy6.0

chips in the same manner used for the POPGEN and OHI controls, computational analysis of Affy6.0

data on Birdsuite and iPattern was performed by A. Lionel at TCAG. Filtration and annotation of all

CNV data were performed by W. Al-Sukhni. Validation of CNVs by qPCR was performed by W. Al-Sukhni

with technical assistance from N. Zwingerman, A. Gropper, and S. Moore. Breakpoint mapping of CNVs

by qPCR and Sanger sequencing was performed entirely by W. Al-Sukhni. Comparison of case and control

CNVs and statistical analysis were performed by W. Al-Sukhni.

3.1 DNA extraction

DNA was extracted at each centre from either whole blood (white blood cells/lymphocytes) or EBV-transformed

cell lines. White blood cells were isolated from whole blood by ammonium chloride-Tris lysis of red

blood cells. DNA was extracted using MaXtract Low Density tubes, which are based on an adaptation of the

standard organic solvent (phenol-chloroform) method of DNA extraction. The extracted DNA was

precipitated with 95% ethanol and dissolved in low TE buffer.

3.2 FPC cases recruitment

Genomic DNA was extracted from peripheral blood or EBV-transformed cell lines of 133 pancreatic

cancer patients from 131 high-risk families recruited by PACGENE (Pancreatic Cancer Genetic

Epidemiology Consortium; PI, G Petersen, Mayo)165, a six-centre consortium that recruits kindreds

containing two or more blood relatives affected with pancreatic cancer for genetic studies. Inclusion

criteria in the current study included: subjects with two or more affected relatives (“3+ FPC”; N=79);

subjects with only one affected relative diagnosed at age 49 years or younger (“2 FPC”; N=22); and

subjects without affected relatives who were diagnosed at age 49 years or younger (“single young”;

N=32). (Some of the families were reassigned based on updated information after analysis – see Results

section). We included young cases with no family history of pancreatic cancer because they may have de

novo mutations in the gene(s) of interest, although we acknowledge that the definition of FPC involves

more than one affected member in the family. Subjects were excluded if they carried known mutations or

were in families with syndromes that predispose to pancreatic cancer (BRCA2, BRCA1, p16/FAMMM,

STK11/PJS, PRSS1/HP, Lynch Syndrome). The majority of DNA samples were extracted from blood

(N=97), and the remaining samples were from EBV-transformed lymphoblastoid cell lines (see Appendix

Table S3, Excel sheet on attached CD, for details).

3.3 Controls recruitment

Control samples of matched ancestry (> 95% of cases and controls reported Caucasian ancestry) were

obtained from two sources: 45 samples were healthy controls recruited by the Familial Gastrointestinal

Cancer Registry (FGICR)537 at Mount Sinai Hospital, Toronto, and 1,153 samples were recruited by the

Ontario Familial Colon Cancer Registry (OFCCR)538. Almost all control DNA samples were extracted

from blood (only 12 OFCCR controls were from lymphoblastoid cell lines; see Appendix Table S4, Excel sheet on

attached CD, for details).

In addition, we had access to CNV data for 1,234 controls recruited through the Ottawa Heart Institute

(OHI)539 and 1,123 controls of German descent recruited by the POPGEN project540. Most of the OHI

and POPGEN DNA samples were extracted from blood, and the platform for CNV detection was the

Affymetrix 6.0 array.

3.4 SNP genotyping

For primary CNV discovery, 128 cases and all 1,198 FGICR + OFCCR controls were genotyped at

approximately 500,000 genome-wide SNPs on the Affymetrix GeneChip Human Mapping 500K Array

(NspI and StyI chips) according to the Affymetrix standard protocol. Genotyping of the cases and the 45 FGICR

controls was performed at The Centre for Applied Genomics (TCAG) in Toronto, while the 1,153

OFCCR controls were previously genotyped at Genome Quebec Innovation Centre as part of the

ARCTIC case-control colorectal cancer GWAS study. Briefly, whole genomic DNA was digested with

restriction enzyme (NspI or StyI) and ligated to universal adaptors, and adaptor-ligated fragments were

PCR-amplified under conditions favoring fragments in the 200-1,100 bp size range. Subsequently, PCR amplicons were

fragmented, labeled, and hybridized to NspI or StyI chips. Chips were scanned using GeneChip Scanner

3000 7G, and Affymetrix GeneChip Command Console (AGCC) files were produced for further

processing. Intensity files (CEL) and genotype files (CHP) were converted from AGCC files using

GeneChip Operating Software (GCOS) and GeneChip Genotyping Analysis (GTYPE) software,

respectively. Genotype calls were made by Affymetrix Genotyping Console (GTC 2.1), which

implements the BRLMM genotype calling algorithm (Bayesian Robust Linear Model with Mahalanobis

distance classifier), using default settings (Score Threshold = 0.5, Block Size = 0, Prior Size = 10,000,

DM Threshold = 0.7).

GTC 2.1 performs a quality control (QC) analysis of the SNP genotype call rate, based on the Dynamic

Model genotype calling algorithm, to estimate the overall quality of chip hybridization. For 500K

arrays, Affymetrix considers a QC call rate < 93% to suggest poor hybridization. However, in the experience

of collaborators at TCAG, a QC call rate in the range of 88-93% can still produce usable data for CNV

analysis. Therefore, when we were unable to obtain rehybridized chips for some samples, we retained arrays

with a QC call rate > 88% in the CNV analysis but inspected the raw calls made from those arrays to

identify apparent false calls.
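The call-rate rule above amounts to a three-way triage of each 500K array. A minimal Python sketch follows; the 93% and 88% thresholds come from the text, whereas the function name, the return labels, and the assumption that call rates are expressed in percent are illustrative only.

```python
# Illustrative sketch of the 500K array QC call-rate triage described above.
# Thresholds come from the text; the function name and labels are hypothetical.

AFFY_PASS_CALL_RATE = 93.0   # Affymetrix: call rate < 93% suggests poor hybridization
MIN_USABLE_CALL_RATE = 88.0  # TCAG experience: 88-93% may still be usable for CNV calling


def triage_array(qc_call_rate: float) -> str:
    """Classify a 500K array by its GTC QC call rate (in percent)."""
    if qc_call_rate >= AFFY_PASS_CALL_RATE:
        return "pass"
    if qc_call_rate > MIN_USABLE_CALL_RATE:
        # Retained only when no rehybridized chip was available;
        # raw CNV calls from such arrays are then reviewed by hand.
        return "retain_with_manual_review"
    return "exclude"


if __name__ == "__main__":
    for rate in (95.1, 90.4, 87.9):
        print(rate, triage_array(rate))
```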

A subset of the original FPC cohort (33 samples) plus five new cases (Appendix Table S3) were

genotyped on the Affymetrix 6.0 array according to standard protocol to validate CNVs detected on the

Affymetrix 500K array as well as detect new CNVs. Arrays meeting Affymetrix quality control

guidelines of Contrast QC > 0.4 were used for further analysis. The Affymetrix Power Tools platform

was used to extract normalized intensities for each array and inter-array intensity correlation was

calculated; arrays with average correlation of > 0.9 were considered suitable for joint analysis.
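The Affymetrix 6.0 screening can be sketched as two successive filters: Contrast QC > 0.4, then a mean pairwise intensity correlation > 0.9 with the other passing arrays. This is a hedged illustration, not the Affymetrix Power Tools workflow: it assumes normalized intensities are already available as equal-length lists per array and that Pearson correlation was the correlation measure; `select_arrays` and `pearson` are hypothetical helpers.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length intensity vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant vector: correlation undefined, treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def select_arrays(
    intensities: Dict[str, List[float]],
    contrast_qc: Dict[str, float],
    min_contrast_qc: float = 0.4,
    min_mean_corr: float = 0.9,
) -> List[str]:
    """Return IDs of arrays passing Contrast QC and the mean-correlation filter."""
    # Step 1: Affymetrix Contrast QC guideline.
    passing = [a for a in intensities if contrast_qc[a] > min_contrast_qc]
    kept = []
    for a in passing:
        others = [b for b in passing if b != a]
        if not others:
            continue
        # Step 2: average correlation of this array with every other passing array.
        mean_corr = sum(pearson(intensities[a], intensities[b]) for b in others) / len(others)
        if mean_corr > min_mean_corr:
            kept.append(a)
    return kept
```

With a reference set of more than 100 arrays, one discordant array barely moves the mean correlation of the others; in a very small set, a single outlier can pull every array below 0.9, so the threshold is only meaningful at realistic batch sizes.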

3.5 Ancestry verification

Subject ancestry was verified using STRUCTURE software

(http://pritch.bsd.uchicago.edu/structure.html), which infers population structure using genotype data of

unlinked markers541. We used 1,089 unlinked genome-wide autosomal SNPs that map to the Affymetrix

500K array (NspI and StyI chips), with differing minor allele frequencies across three major HapMap

populations (Caucasian (CEU), African (YRI), and Asian (CHB/JPT)). The observed alleles (major and

minor) at each SNP in HapMap populations were obtained using UCSC genome browser “Tables”

function. To determine the population cluster (assuming three ancestral populations), 270 unrelated

HapMap samples were used (90 CEU, 90 YRI, 90 CHB/JPT) as reference of known ancestry. Ancestries

were assigned using a coefficient-of-ancestry threshold of > 0.9.
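Ancestry assignment from STRUCTURE output reduces to thresholding each subject's membership coefficients (q). The sketch below assumes the three inferred clusters have already been labeled using the HapMap reference samples and that the q-values have been parsed into a dictionary; the labels and the function name are illustrative. Subjects whose largest coefficient does not exceed 0.9 are labeled admixed, matching the categories reported in the Results.

```python
from typing import Dict


def assign_ancestry(q: Dict[str, float], threshold: float = 0.9) -> str:
    """Assign the cluster with the largest STRUCTURE membership coefficient
    if it exceeds the threshold; otherwise label the subject as admixed."""
    population, best_q = max(q.items(), key=lambda item: item[1])
    return population if best_q > threshold else "admixed"


if __name__ == "__main__":
    subjects = {
        "S1": {"CEU": 0.97, "YRI": 0.01, "CHB/JPT": 0.02},
        "S2": {"CEU": 0.62, "YRI": 0.03, "CHB/JPT": 0.35},
    }
    for sid, q in subjects.items():
        print(sid, assign_ancestry(q))
```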

3.6 CNV discovery

Figure 5 is a summary flow chart of the primary CNV discovery on the Affy500K arrays.

Figure 5 – Analysis of 500K arrays in FPC cases and controls

[Flow chart (500K array analysis pipeline): 128 FPC cases and 45 FGICR controls (Affymetrix 500K SNP arrays, TCAG) and 1,153 OFCCR controls (Affymetrix 500K SNP arrays, Genome Quebec) were each analyzed with dChip, CNAG, and Partek Genomics Suite (HMM); overlapping CNVs were merged per sample and classified as low-confidence (single algorithm/chip) or high-confidence (≥2 algorithms or chips); high-confidence case CNVs were compared with high-confidence control CNVs to define FPC-specific CNVs. Exclusions: 8 cases (noise, no longer FPC), leaving 120 cases; 4 controls (personal pancreatic cancer or family history suggesting FPC), leaving 1,194 controls.]

Figure 5 Legend: Cases and controls were analyzed in a parallel fashion on three independent computational algorithms. A high-confidence CNV set (based on support by at least two algorithms or chips) was obtained for each of cases and controls and compared.

Copy number at each SNP position was estimated using three validated Hidden Markov Model (HMM)-

based CNV-calling algorithms (dChip 2006542, CNAG 2.0543, and Partek Genomics Suite v6.3©). NspI

and StyI chips were analyzed separately for each individual. After conducting several trials of different

analysis approaches, we identified the following as the method that best addresses the noise level in our

data: for dChip and Partek, samples were analyzed in batches corresponding to the grouping of samples

during chip hybridization (to minimize “batch effect” differences in hybridization that may lead to false

differences in intensity between samples): FPC cases and FGICR controls were analyzed in two batches

(batch 1 contained 47 cases and 22 controls; batch 2 contained 81 cases and 23 controls); OFCCR

controls were analyzed on dChip and Partek in 10 batches of approximately 100 samples each. For

CNAG, for which CNV detection improves as more samples are analyzed together, the full group of FPC cases

and FGICR controls (173 samples) was analyzed concurrently, while the OFCCR (ARCTIC) controls were analyzed in 6

random batches of approximately 200 samples each. Default analysis settings were used for each of the

computational programs: invariant-set probe normalization and hidden Markov model copy number inference

method for dChip; “non-paired reference/test sample” category and “automated analysis” option for

CNAG; 2-probe minimum used for calling CNV on Partek Suite (HMM method). The Partek CNV

coordinates were based on hg18 genome build and were converted to hg17 to merge with dChip and/or

CNAG.

A loss was defined by two or more consecutive SNPs with estimated copy number of < 2; a gain was

defined by two or more consecutive SNPs with estimated copy number of > 2. CNVs smaller than

1,100 bp were excluded to avoid false calls caused by PCR artifacts (since the amplified fragments

were 200-1,100 bp in size). Losses larger than 2 Mb and gains larger than 7 Mb were

also excluded (the cut-off was based on the largest CNVs seen in cases, with intention of maximizing

sensitivity in detecting case CNVs while removing excessively large CNVs in controls that are likely

false calls and/or represent somatic events). CNVs that crossed the centromere were removed because

they were incompatible with chromosomal stability and expected to be false calls. For any given chip and

algorithm, if the number of CNVs (losses + gains) called in a sample exceeded 40 (after above filters),

that sample was eliminated from the analysis for that given algorithm and chip (i.e. considered too noisy).
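The loss/gain definition and the filters above can be expressed as a short pipeline. This sketch assumes per-SNP integer copy-number estimates (such as an HMM might output) sorted by chromosome and position, plus hypothetical centromere coordinates supplied by the caller; it does not reproduce the internal segmentation of dChip, CNAG, or Partek. “Crossing the centromere” is interpreted here as a call that starts before the centromere and ends after it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Thresholds taken from the text.
MIN_SIZE_BP = 1_100
MAX_LOSS_BP = 2_000_000
MAX_GAIN_BP = 7_000_000
MAX_CALLS_PER_SAMPLE = 40

Snp = Tuple[str, int, int]  # (chromosome, position, estimated copy number)


@dataclass
class CNV:
    chrom: str
    start: int
    end: int
    kind: str  # "loss" or "gain"
    n_snps: int

    @property
    def size(self) -> int:
        return self.end - self.start + 1


def _state(copy_number: int) -> Optional[str]:
    if copy_number < 2:
        return "loss"
    if copy_number > 2:
        return "gain"
    return None  # diploid


def call_segments(snps: List[Snp]) -> List[CNV]:
    """Call runs of >= 2 consecutive SNPs sharing the same non-diploid state."""
    calls: List[CNV] = []
    run: List[Snp] = []
    for snp in snps + [("", -1, 2)]:  # diploid sentinel flushes the last run
        chrom, _, cn = snp
        state = _state(cn)
        if run and state is not None and state == _state(run[-1][2]) and chrom == run[-1][0]:
            run.append(snp)
            continue
        if len(run) >= 2:
            calls.append(CNV(run[0][0], run[0][1], run[-1][1], _state(run[0][2]), len(run)))
        run = [snp] if state is not None else []
    return calls


def passes_filters(cnv: CNV, centromeres: Dict[str, Tuple[int, int]]) -> bool:
    if cnv.size < MIN_SIZE_BP:
        return False
    if cnv.kind == "loss" and cnv.size > MAX_LOSS_BP:
        return False
    if cnv.kind == "gain" and cnv.size > MAX_GAIN_BP:
        return False
    cen = centromeres.get(cnv.chrom)
    if cen is not None and cnv.start < cen[0] and cnv.end > cen[1]:
        return False  # call spans the centromere
    return True


def filter_sample(calls: List[CNV], centromeres: Dict[str, Tuple[int, int]]) -> Optional[List[CNV]]:
    """Apply per-CNV filters; return None if the sample is too noisy (> 40 calls)."""
    kept = [c for c in calls if passes_filters(c, centromeres)]
    return None if len(kept) > MAX_CALLS_PER_SAMPLE else kept
```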

For each sample on a given chip, CNVs identified by two or more algorithms with overlapping

breakpoints (same direction on all algorithms) were merged if the length of the overlapping region was

at least 20% of the length of any of the overlapping CNVs (Figure 6).

Figure 6 – Criteria for merging CNVs

For each sample, CNVs identified on both chips of the 500K array with overlapping breakpoints (same

direction on both chips) were merged if the length of the overlapping region was at least 20% of the

length of either of the overlapping CNVs (Figure 6). “High-confidence calls” were defined as CNVs

called by at least two different algorithms and/or on both chips. Note that a CNV called by a different

algorithm on each chip was not considered “high-confidence”. For the purpose of identifying “CNV

loci”, overlapping CNVs from multiple samples were merged (using the above-described 20%

threshold).
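The merging and high-confidence rules can be made concrete as two checks. In this sketch, the 20% criterion is read as “at least 20% of the length of at least one of the two CNVs”, which is one interpretation of “any”/“either” in the text; how merged boundaries were set is not specified, so no merge step is shown. `support` is a hypothetical set of (algorithm, chip) pairs that called one CNV, and the rule returns False when a different algorithm called the CNV on each chip, as the text specifies.

```python
from typing import Dict, Set, Tuple

Interval = Tuple[int, int]  # (start, end) on the same chromosome, inclusive


def overlap_ok(a: Interval, b: Interval, min_frac: float = 0.2) -> bool:
    """True if the overlap covers >= min_frac of the length of at least one CNV."""
    overlap = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if overlap <= 0:
        return False
    return any(overlap >= min_frac * (end - start + 1) for start, end in (a, b))


def is_high_confidence(support: Set[Tuple[str, str]]) -> bool:
    """`support` holds the (algorithm, chip) pairs that called one merged CNV."""
    algs_by_chip: Dict[str, Set[str]] = {}
    chips_by_alg: Dict[str, Set[str]] = {}
    for alg, chip in support:
        algs_by_chip.setdefault(chip, set()).add(alg)
        chips_by_alg.setdefault(alg, set()).add(chip)
    two_algorithms_same_chip = any(len(a) >= 2 for a in algs_by_chip.values())
    same_algorithm_both_chips = any(len(c) >= 2 for c in chips_by_alg.values())
    # A different algorithm on each chip satisfies neither condition.
    return two_algorithms_same_chip or same_algorithm_both_chips
```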

CNV calling on Affy6.0 arrays was performed using the Birdsuite tools (Canary + Birdseye algorithms)544

and iPattern545 algorithms, using a reference set that included the 38 FPC cases in addition to 100 other

closely-correlated Affy6 arrays previously analyzed at TCAG (based on correlation coefficient > 0.9).

(Samples were also analyzed on GTC 4.1, but these data were used only to support calls made by Birdsuite

or iPattern.) For each of these algorithms, we required CNVs to span 5 or more consecutive array probes

and be at least 20 kb in length. Detection by either Birdsuite or iPattern was sufficient for the purpose of

validating 500K array CNVs. Only “high-confidence” calls (i.e. called by at least two of Birdsuite,

iPattern, and/or GTC 4.1, with boundaries of overlapping regions determined in the same

manner as for 500K data) were included as novel FPC-specific CNVs. Samples whose number of calls

exceeded the mean number of calls for their analysis batch by more than three standard deviations were

excluded from the study. The combined results of Birdsuite (Canary and Birdseye) were filtered to

remove CNVs that crossed the centromere, CNVs on the X chromosome, and calls with an inconsistent

copy number (a “loss” with a copy number of > 1 or a “gain” with a copy number of < 3). The iPattern results were

filtered to remove CNVs on the X chromosome and CNVs tagged as “complex”.
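The Affymetrix 6.0 filters above can be sketched as follows. The `Affy6Call` record is a hypothetical stand-in for parsed Birdsuite or iPattern output, not either tool's actual format, and the batch-outlier rule reads “greater than three times the standard deviation from the mean” as a call count above the batch mean plus three sample standard deviations.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Affy6Call:
    chrom: str
    start: int
    end: int
    n_probes: int
    tag: str  # "loss", "gain", or "complex"
    copy_number: int
    crosses_centromere: bool = False


def _size_ok(c: Affy6Call) -> bool:
    # >= 5 consecutive probes and >= 20 kb, required for both algorithms.
    return c.n_probes >= 5 and (c.end - c.start + 1) >= 20_000


def keep_birdsuite_call(c: Affy6Call) -> bool:
    if not _size_ok(c) or c.crosses_centromere or c.chrom == "X":
        return False
    if c.tag == "loss" and c.copy_number > 1:
        return False  # a loss should have copy number 0 or 1
    if c.tag == "gain" and c.copy_number < 3:
        return False  # a gain should have copy number >= 3
    return True


def keep_ipattern_call(c: Affy6Call) -> bool:
    return _size_ok(c) and c.chrom != "X" and c.tag != "complex"


def noisy_samples(calls_per_sample: Dict[str, int]) -> List[str]:
    """Samples whose call count exceeds the batch mean by more than 3 SD."""
    counts = list(calls_per_sample.values())
    mean = statistics.mean(counts)
    sd = statistics.stdev(counts)
    return [s for s, n in calls_per_sample.items() if n > mean + 3 * sd]
```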

3.7 PCR validation of CNVs

Quantitative PCR validation of a subset of CNVs was performed using Invitrogen Platinum SYBR Green

qPCR Supermix – UDG, with primers designed within the CNV of interest and MSH2 exon 2 used as the

reference gene (see Appendix Table S5 for primer sequences). Standard PCR conditions were used: 50 °C for

2 min and 95 °C for 2 min, followed by 40 cycles of 95 °C for 15 s and 60 °C for 32 s. Reactions were performed in

4-8 replicates per sample. A standard curve was run on each plate using control DNA (from a single

sample for all experiments) to ensure that primer efficiency was between 90% and 110% (slope between -3.6 and -3.1) and that the

correlation coefficient (R²) of the standard curve samples was > 0.99. The dissociation curve was checked for a

single peak (indicating a single product). Data were analyzed on the ABI 7500 real-time PCR system, setting

the baseline and threshold manually to reflect the exponential phase of amplification. Finally, data from

each plate were analyzed using the ddCt method546. For each sample with at least 4 replicates, one replicate

could be excluded from the calculation if it fell outside the range of mean ± 2 SD of Ct values (with the range

calculated after removal of the uppermost or lowermost value). A “validation” curve of dCt vs. log input DNA

amount was generated for each primer set to confirm that the absolute slope was < 0.1, signifying that the

efficiencies of the test gene and reference gene primer sets were approximately equal. The calculations for

ddCt were made as follows:

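$$\Delta C_t = \overline{C_t}(\text{test gene}) - \overline{C_t}(\text{control gene, MSH2})$$

$$SD_{\Delta C_t} = \sqrt{\left(SD_{C_t}(\text{gene of interest})\right)^2 + \left(SD_{C_t}(\text{MSH2})\right)^2}$$

$$\Delta\Delta C_t = \Delta C_t(\text{test sample}) - \Delta C_t(\text{control sample})$$

$$\text{Fold difference in copy number} = 2^{-\Delta\Delta C_t}$$

$$SD_{\text{fold difference}} = \ln(2) \cdot SD_{\Delta C_t} \cdot 2^{-\Delta\Delta C_t}$$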
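The ddCt arithmetic, together with the plate-level checks described earlier, can be sketched in a few functions. Assumptions: efficiency is derived from the standard-curve slope as E = 10^(-1/slope) - 1 (the usual relation, under which slopes of -3.6 and -3.1 give roughly 90% and 110%); the single candidate outlier is taken to be the replicate farthest from the median; and the SD of dCt passed to `fold_difference` is that of the test sample. All function names are illustrative.

```python
import math
import statistics
from typing import List, Tuple


def primer_efficiency(slope: float) -> float:
    """Efficiency from a standard-curve slope (Ct vs. log10 input): 10^(-1/slope) - 1."""
    return 10 ** (-1.0 / slope) - 1.0


def drop_outlier(cts: List[float]) -> List[float]:
    """Drop at most one replicate lying outside mean +/- 2 SD of the other replicates."""
    if len(cts) < 4:
        return list(cts)
    median = statistics.median(cts)
    # Candidate: the replicate farthest from the median (uppermost or lowermost value).
    idx = max(range(len(cts)), key=lambda i: abs(cts[i] - median))
    rest = cts[:idx] + cts[idx + 1:]
    mean, sd = statistics.mean(rest), statistics.stdev(rest)
    return rest if abs(cts[idx] - mean) > 2 * sd else list(cts)


def delta_ct(test_cts: List[float], ref_cts: List[float]) -> Tuple[float, float]:
    """dCt = mean Ct(test gene) - mean Ct(MSH2); SDs combined in quadrature."""
    d = statistics.mean(test_cts) - statistics.mean(ref_cts)
    sd = math.sqrt(statistics.stdev(test_cts) ** 2 + statistics.stdev(ref_cts) ** 2)
    return d, sd


def fold_difference(dct_test: float, dct_control: float, sd_dct: float) -> Tuple[float, float]:
    """Copy-number fold difference 2^-ddCt and its propagated SD."""
    ddct = dct_test - dct_control
    fold = 2.0 ** (-ddct)
    return fold, math.log(2) * sd_dct * fold
```

For a heterozygous deletion, the test gene amplifies about one cycle later relative to MSH2 than in the two-copy control, so ddCt is about 1 and the fold difference is about 0.5.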

3.8 Prioritization of CNVs

Figure 7 illustrates the priority order for investigating CNVs detected in cases.

Figure 7 – CNV prioritization plan

Figure 7 Legend: CNVs segregating with disease in a family or arising de novo in a single case have the highest priority,

followed by recurrent CNVs in unrelated affected individuals that are not found in unaffected controls. Disease-specific CNVs found in a single affected individual have lower priority, and CNVs found in both affected and unaffected individuals are least likely to yield candidate genes.

We defined “FPC-specific CNVs” as losses or gains detected in FPC cases on the 500K or Affymetrix 6.0

array that did not overlap (by 20% or more) with losses or gains in FGICR, OFCCR, OHI, or

POPGEN controls, nor with CNVs reported from non-BAC-based platforms in the Database of

Genomic Variants (DGV)547 (http://projects.tcag.ca/variation; updated Nov 2010). Although we did not

control for ancestry in this analysis, we did note which FPC-specific CNVs were detected in non-

Caucasian samples.
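The FPC-specific definition is essentially an interval-overlap filter. In this sketch, loss/gain direction is ignored when comparing with controls (the text excludes overlap “with losses or gains”), and the same 20% threshold is assumed for DGV entries; the tuple layout and function names are hypothetical. The linear scan is for clarity only; with thousands of control CNVs an interval index would be the more practical choice.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), inclusive


def overlaps_20pct(a: Interval, b: Interval, min_frac: float = 0.2) -> bool:
    """Overlap covers >= min_frac of the length of at least one of the two CNVs."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1]) + 1
    if overlap <= 0:
        return False
    return any(overlap >= min_frac * (iv[2] - iv[1] + 1) for iv in (a, b))


def fpc_specific(
    case_cnvs: Iterable[Interval],
    control_cnvs: Iterable[Interval],
    dgv_cnvs: Iterable[Interval],
) -> List[Interval]:
    """Case CNVs with no >= 20% overlap with any control or DGV CNV (loss or gain)."""
    reference = list(control_cnvs) + list(dgv_cnvs)
    return [c for c in case_cnvs if not any(overlaps_20pct(c, r) for r in reference)]
```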

3.9 Annotation of CNVs

Affymetrix 500K and Affymetrix 6.0 array coordinates were aligned to the NCBI hg17 and NCBI hg18

human genome builds, respectively. Genes overlapped by CNVs were identified through the University

of California, Santa Cruz (UCSC) genome browser (http://genome.ucsc.edu/), using the respective human

genome build. Information about CNV-overlapped genes was obtained from Entrez Gene

(http://www.ncbi.nlm.nih.gov/gene) and Pubmed (http://www.ncbi.nlm.nih.gov/pubmed/). The Memorial

Sloan Kettering Cancer Centre (MSKCC) CancerGenes database

(http://cbio.mskcc.org/CancerGenes/Select.action)548 was used to identify genes with reported pathways

or functions linked to cancer development. The Wellcome Trust Sanger Institute Catalogue of Somatic Mutations

in Cancer (COSMIC version 55) database (http://www.sanger.ac.uk/genetics/CGP/cosmic)549 (queried through

BioMart for all genes with mutation types “complex-compound substitution”, “complex-frameshift”,

“deletion-frameshift”, “insertion-frameshift”, “substitution-missense”, “substitution-nonsense”, and “unknown”; the

“gene” field was left empty to retrieve all COSMIC genes in these categories and was otherwise populated with the

gene lists of interest) and the Pancreatic Expression Database – version 2.0

(http://www.pancreasexpression.org)253 identified genes that had previously reported point mutations or

copy number alterations in tumors or cancer cell lines, or which were reported to be differentially

expressed in pancreatic cancer according to published gene expression studies.

3.10 Comparing Affy500K CNV profile between cases and controls

Only “high-confidence” CNVs from non-EBV samples were included in the CNV profile comparison to

minimize potential cell line artifacts and false calls.278 As well, only controls with data available for both

NspI and StyI chips were included in this comparison to minimize bias of undercalling CNVs in single-

chip samples. To minimize CNV calling errors for “complex” CNVs (i.e. losses and gains in different

samples overlapping the same region), we performed the “rare CNV” analysis only on regions containing

exclusively losses or exclusively gains. CNV loci present in fewer than 1% of the total number of samples

(cases + controls) were considered “rare”, excluding EBV samples and the complex CNVs. For losses,

32 cases and 235 controls (total 267 samples) were included in the “rare loss” analysis, so a rare loss was

defined as present in fewer than 3 individuals. For gains, 56 cases and 551 controls (total 607 samples)

were included in the “rare gain” analysis, so a rare gain was defined as present in fewer than 7 samples.
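The rare-CNV definition reduces to integer arithmetic, which also reproduces the thresholds quoted above. In the sketch, loci are given as a hypothetical mapping from a locus ID to the set of CNV types observed there, so “complex” loci (both losses and gains) can be dropped before rarity is assessed; function names are illustrative.

```python
from typing import Dict, List, Set


def non_complex_loci(locus_kinds: Dict[str, Set[str]]) -> List[str]:
    """Keep loci reported only as losses or only as gains (drop complex loci)."""
    return [locus for locus, kinds in locus_kinds.items() if len(kinds) == 1]


def is_rare(carriers: int, total_samples: int) -> bool:
    """Rare = present in fewer than 1% of samples (integer math avoids float rounding)."""
    return carriers * 100 < total_samples


def smallest_non_rare_count(total_samples: int) -> int:
    """Smallest carrier count that is no longer rare, i.e. ceil(1% of samples)."""
    return -(-total_samples // 100)
```

For the loss analysis, `smallest_non_rare_count(267)` gives 3, and for the gain analysis `smallest_non_rare_count(607)` gives 7, matching “fewer than 3” and “fewer than 7” above.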

3.11 Statistical analysis

Comparison of medians was performed using the Mann-Whitney U test and comparison of means was

performed using the two-tailed Student’s t-test with Levene’s test for equal variance. Testing for

significant difference in proportions was performed with the two-tailed Fisher’s exact test. A p-value <

0.05 was considered significant. Statistical testing was performed using the SPSS© software package

(version 17).

For comparing differences in proportions of cases and controls at each CNV locus, we considered only

regions containing exclusively losses or exclusively gains (in cases and/or controls) in non-EBV samples, and we

excluded samples with only a single chip from the analysis. After calculating two-tailed Fisher’s exact test

p-values for each loss and gain locus, we performed a Bonferroni correction to account for multiple-

testing. The number of multiple tests was defined as the total number of loss or gain loci in the above

comparison (losses and gains were assessed separately).
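The per-locus test and correction can be sketched with the standard library alone. This is an illustrative re-implementation, not the SPSS procedure used in the study: the two-sided p-value sums the hypergeometric probabilities of all tables with the observed margins that are no more probable than the observed table, a common two-sided convention. The table layout (cases/controls by carriers/non-carriers) is an assumption for illustration.

```python
from math import comb
from typing import List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: cases / controls; columns: carriers / non-carriers of a CNV locus."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def bonferroni(p_values: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For m loss loci, a locus remains significant after correction only if its raw p-value is below 0.05/m, which is equivalent to its Bonferroni-adjusted value being below 0.05.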

3.12 Breakpoint Mapping and Sequencing

To precisely identify the CNV breakpoints, qPCR was performed at several positions near the estimated

breakpoints (based on the SNP microarray results), narrowing down the estimated location of the

breakpoint to a region approximately 1,000 bp in length. (See Appendix Table S6 for primer sequences;

standard PCR conditions were used as described previously). Primers were designed to PCR-amplify the

region estimated to contain the breakpoint (see Appendix Table S6) and Sanger sequencing was used to

identify the exact base pairs delineating the breakpoint. Products were cleaned up using Qiagen MinElute

PCR purification kit. Sanger sequencing was performed by the AGTC service lab.
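The qPCR walk described above places each breakpoint between the nearest probe that shows the copy-number change and the nearest probe that does not. The sketch below assumes each probe has already been called as changed or unchanged. The function name and probe positions are hypothetical.

```python
from typing import Optional

Window = tuple[int, int]


def breakpoint_windows(
    probes: list[tuple[int, bool]],
) -> tuple[Optional[Window], Optional[Window]]:
    """Bracket the left and right breakpoints of a single contiguous CNV.

    probes: (genomic position, shows_copy_change) for each qPCR assay.
    Returns (left_window, right_window). Each window runs from the last
    probe on one side of a breakpoint to the first probe on the other.
    A window is None if no unchanged probe has been placed on that side.
    """
    ordered = sorted(probes)
    changed = [pos for pos, is_cnv in ordered if is_cnv]
    if not changed:
        return None, None
    first_cnv, last_cnv = changed[0], changed[-1]
    normal_left = [pos for pos, is_cnv in ordered if not is_cnv and pos < first_cnv]
    normal_right = [pos for pos, is_cnv in ordered if not is_cnv and pos > last_cnv]
    left = (normal_left[-1], first_cnv) if normal_left else None
    right = (last_cnv, normal_right[0]) if normal_right else None
    return left, right


# Hypothetical probe results (positions are made up for illustration).
probes = [(1_000, False), (5_000, True), (3_500, False),
          (90_000, True), (91_000, False), (4_200, True)]
left, right = breakpoint_windows(probes)
print(left, right)  # (3500, 4200) (90000, 91000)
```

Each new round of probes would go inside the returned windows. Narrowing continues until each window is short enough (here, roughly 1,000 bp) to amplify across and sequence.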

4. Results

4.1 Affymetrix 500K results

Of the original 128 FPC cases genotyped on the Affymetrix 500K array, eight were subsequently excluded. Two subjects had excessively noisy data (CNV count > 40 per analysis run). One subject was discovered to have had chronic lymphocytic leukemia at the time of blood sample donation, making it difficult to distinguish germline from somatic CNVs in the sample. Five subjects no longer met inclusion criteria in light of new information that became available after the start of the study. This left 120 cases in the final analysis, each with both NspI and StyI chips represented. Some subjects were reassigned to different inclusion categories as updated information became available. As a result, 68 “3+ FPC” subjects, 28 “2 FPC” subjects, and 24 “single young” subjects contributed to the final set of case CNVs detected on the Affymetrix 500K array. Two controls were discovered to have a history of sporadic pancreatic cancer (no affected relatives), and two other controls each reported having two relatives with pancreatic cancer, suggesting potential FPC kindreds. After excluding those four samples, 1,194 controls remained in the final analysis. For 236 of those controls, only one chip was included in the analysis (137 NspI only; 99 StyI only) because of inadequate hybridization of the second chip.

STRUCTURE software was used to estimate the population ancestry of the 120 FPC cases and 958 controls that had both NspI and StyI chips available for analysis: 89.2% of cases and 94.8% of controls were Caucasian; 1.7% of cases and 2.1% of controls were Asian; and 9.2% of cases and 3.1% of controls were of admixed background.

Figures 8 and 9 summarize the number of gains and losses called by each algorithm on each chip in cases

and controls.

Figure 8 – Gains and losses identified in FPC cases by each algorithm/chip

Figure 8 Legend: Number of losses and gains identified by each algorithm and resultant number of losses and gains

after merging overlapping CNVs.


Figure 9 - Gains and losses identified in controls by each algorithm/chip

Figure 9 Legend: Number of losses and gains identified by each algorithm and resultant number of losses and gains

after merging overlapping CNVs.

The total number of autosomal CNVs identified in cases and controls was 873 and 10,794, respectively. Of these, 382 CNVs (123 losses + 259 gains) in cases and 3,115 CNVs (805 losses + 2,310 gains) in controls were considered high-confidence calls (corresponding to 66 loss loci + 105 gain loci in cases and 313 loss loci + 467 gain loci in controls). (See Appendix Tables S7 to S10, available as Excel files on the attached CD, for high- and low-confidence CNVs in cases and controls.) The proportion of losses and gains considered “high-confidence” was significantly larger in cases than in controls (losses: 48% cases vs. 33% controls, p<0.001; gains: 42% cases vs. 28% controls, p<0.001). In addition, the percentage of cases with at least one high-confidence loss was significantly greater than that of controls (68% vs. 47%, p<0.001). However, there was no significant difference between cases and controls in the percentage of samples with high-confidence gains (85% vs. 80%, p=0.227). Significance testing results were the same whether or not the 236 controls with only one chip in the analysis were included, and whether the denominator was all samples or only samples with at least one CNV call. We note, however, that no significant difference was observed between cases and controls when the analysis was restricted to FGICR controls that were genotyped at the same centre (TCAG) (Tables 7 and 8).


Table 7 - Proportion of high-confidence losses in cases and controls

Columns: (1) % of losses that are high-confidence (HC); (2) % of losses that are HC, excluding controls with only 1 chip; (3) % of samples with HC losses; (4) % of samples with HC losses, excluding controls with only 1 chip; (5) % of samples with HC losses among 2-chip samples with at least one loss call

Group | (1) | (2) | (3) | (4) | (5)
Cases | 48 | 48 | 68 | 68 | 76
All controls | 33 | 35 | 47 | 53 | 63
Fisher's exact, cases vs. all controls | p < 0.001 | p < 0.001 | p < 0.001 | p = 0.002 | p = 0.009
FGICR controls | 41 | 43 | 51 | 55 | 64
Fisher's exact, cases vs. FGICR controls | p = 0.303 | p = 0.512 | p = 0.070 | p = 0.190 | p = 0.190

Table 8 - Proportion of high-confidence gains in cases and controls

Columns: (1) % of gains that are high-confidence (HC); (2) % of gains that are HC, excluding controls with only 1 chip; (3) % of samples with HC gains; (4) % of samples with HC gains, excluding controls with only 1 chip; (5) % of samples with HC gains among 2-chip samples with at least one gain call

Group | (1) | (2) | (3) | (4) | (5)
Cases | 42 | 42 | 85 | 85 | 87
All controls | 28 | 29 | 80 | 86 | 88
Fisher's exact, cases vs. all controls | p < 0.001 | p < 0.001 | p = 0.227 | p = 0.782 | p = 0.882
FGICR controls | 49 | 50 | 80 | 81 | 85
Fisher's exact, cases vs. FGICR controls | p = 0.109 | p = 0.086 | p = 0.227 | p = 0.626 | p = 0.789

4.2 Affymetrix 6.0 results

In 36 cases genotyped on the Affymetrix 6.0 array (two of the original 38 samples were excluded because of excess noise; see Methods), a total of 3,364 autosomal CNVs (2,665 losses and 699 gains) were identified using Birdsuite, and 3,266 autosomal CNVs (1,975 losses and 1,291 gains) were identified using iPattern. Table 9 summarizes key parameters of the CNVs identified by each algorithm.

Table 9 - CNVs called by each of Birdsuite and iPattern in 36 samples on Affymetrix 6.0 array

Parameter | Birdsuite | iPattern
# losses | 2,665 | 1,975
# gains | 699 | 1,291
Median size of losses (bp) | 7,793 | 10,388
Median size of gains (bp) | 60,599 | 19,857
# genic losses (% of all losses) | 969 (36%) | 693 (35%)
# genic gains (% of all gains) | 512 (73%) | 690 (53%)
# losses called as HC losses on 500K array (same sample) | 33 | 35
# losses called as LC losses on 500K array (same sample) | 20 | 20
# gains called as HC gains on 500K array (same sample) | 70 | 70
# gains called as LC gains on 500K array (same sample) | 33 | 38
Mean # losses per sample / mean # gains per sample | 74/19 | 55/36

HC = high-confidence; LC = low-confidence on 500K array


The high-confidence set of Affy6 CNVs (incorporating GTC-supported CNVs) comprised 2,187 CNVs (1,656 losses + 531 gains). (See Appendix Tables S11 and S12, available as Excel files on the attached CD, for high-confidence CNVs on the Affy6 array in FPC cases and controls.) The median size of high-confidence losses and gains was 12.7kb (range 1kb-1.4Mb) and 48.9kb (range 1kb-1.6Mb), respectively, and the average number of losses and gains per genome was 46 and 15, respectively.

4.3 CNV validation

Quantitative PCR was used to attempt validation of 18 losses (13 high-confidence and 5 low-confidence) and 10 gains (all high-confidence) in FPC cases. All 23 high-confidence CNVs and 4 of the 5 low-confidence CNVs were validated (see Appendix Figures S1 to S32 for qPCR results). Of the 33 FPC cases that were hybridized to both the Affy 500K and Affy6.0 arrays, 31 yielded usable results on both arrays. For those 31 cases, 113 high-confidence CNVs and 142 low-confidence CNVs were called on the 500K array. Of these, 107 (95%) high-confidence CNVs and 63 (44%) low-confidence CNVs were validated on the Affy6 array. Combining qPCR validation and Affy6 genotyping gave a validation rate of 95% (121/127) for high-confidence CNVs but only 45% (66/146) for low-confidence CNVs. Therefore, the remainder of this analysis was limited to high-confidence CNVs in cases and controls. Approximately one third (121/382) of all high-confidence case CNVs identified on the 500K array have been confirmed by the Affymetrix 6.0 array, qPCR, or both. These correspond to half (88/171) of all high-confidence CNV loci in cases.
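qPCR validation compares the copy number measured at a CNV locus with a two-copy reference. The comparative (delta-delta Ct) method sketched below is one common way to make that comparison. It is an assumption that this matches the qPCR protocol used in this thesis. The calibrator design, the roughly 100% amplification efficiency, the 1.5/2.5-copy cut-offs and the Ct values in the example are also assumptions.

```python
def relative_copy_number(ct_target: float, ct_reference: float,
                         ct_target_cal: float, ct_reference_cal: float,
                         calibrator_copies: int = 2) -> float:
    """Comparative Ct (delta-delta Ct) copy-number estimate.

    ct_target / ct_reference: test sample at the CNV locus and at a
    two-copy reference locus; *_cal: the same assays in a calibrator
    sample assumed to carry calibrator_copies copies. Assumes ~100%
    amplification efficiency, i.e. product doubles every cycle.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    ddct = delta_ct_sample - delta_ct_calibrator
    return calibrator_copies * 2 ** (-ddct)


def call_cnv(copies: float, loss_below: float = 1.5, gain_above: float = 2.5) -> str:
    """Turn a copy-number estimate into a call (illustrative cut-offs)."""
    if copies < loss_below:
        return "loss"
    if copies > gain_above:
        return "gain"
    return "normal"


# Hypothetical Ct values: relative to the reference assay, the target
# amplifies about one cycle later than in the calibrator (about one copy).
estimate = relative_copy_number(25.1, 24.0, 24.2, 24.1)
print(round(estimate, 2), call_cnv(estimate))
```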

4.4 Comparing CNV profile of cases and controls

We compared several characteristics of the CNVs identified on the 500K array between FPC cases and FGICR/OFCCR controls. Table 10 summarizes key CNV attributes in each group (based on high-confidence CNVs, excluding EBV-derived samples and controls with only one chip in the analysis).

Table 10 - High-confidence CNV profile of cases vs. controls (excluding EBV-derived samples and excluding controls with data from only one chip)

Parameter | FPC cases | Controls | p-value
# Lymphocyte samples | 91 | 950 |
# High-confidence losses / # high-confidence gains | 91 / 190 | 731 / 2,059 |
Median CNV size (range) | 219.5kb (1.2kb-6.4Mb) | 219.5kb (1.2kb-6.8Mb) | 0.439
Median CNV SNP count (range) | 42 (2-417) | 40 (2-1318) | 0.578
# Genic CNVs / all CNVs: losses | 52/91 (57%) | 400/731 (55%) | 0.738
# Genic CNVs / all CNVs: gains | 153/190 (81%) | 1,646/2,059 (80%) | 0.850
# Samples with genic CNVs / samples with any CNVs: losses | 43/59 (73%) | 327/500 (65%) | 0.309
# Samples with genic CNVs / samples with any CNVs: gains | 70/75 (93%) | 765/816 (94%) | 0.805
# CNV genes identified as “Cancer Genes” in MSKCC CancerGenes database / all CNV genes recognized by the MSKCC database: losses | 8/36 (22%) | 35/264 (13%) | 0.201
# CNV genes identified as “Cancer Genes” in MSKCC CancerGenes database / all CNV genes recognized by the MSKCC database: gains | 53/335 (16%) | 507/2940 (17%) | 0.541
# CNV loci included in rare analysis / all CNV loci: losses | 36/52 (69%) | 203/290 (70%) | 1.000
# CNV loci included in rare analysis / all CNV loci: gains | 65/83 (78%) | 349/428 (82%) | 0.541
# CNVs that are part of rare loci / all CNVs: losses | 23/91 (25%) | 199/731 (27%) | 0.802
# CNVs that are part of rare loci / all CNVs: gains | 47/190 (25%) | 461/2,059 (22%) | 0.469
# Samples with CNVs included in rare analysis / samples with any CNV: losses | 32/59 (54%) | 235/500 (47%) | 0.335
# Samples with CNVs included in rare analysis / samples with any CNV: gains | 56/75 (75%) | 551/816 (68%) | 0.244
# Samples with rare CNVs / samples with any CNV: losses | 21/59 (36%) | 169/500 (34%) | 0.773
# Samples with rare CNVs / samples with any CNV: gains | 37/75 (49%) | 348/816 (43%) | 0.275
# Genic rare CNVs / all rare CNVs: losses | 10/23 (43%) | 69/199 (35%) | 0.491
# Genic rare CNVs / all rare CNVs: gains | 33/47 (70%) | 330/461 (72%) | 0.866
# Samples with genic rare CNVs / samples with rare CNVs: losses | 10/21 (48%) | 63/169 (37%) | 0.476
# Samples with genic rare CNVs / samples with rare CNVs: gains | 27/37 (73%) | 267/348 (77%) | 0.684
Mean CNVs per genome*: losses | 1.5 | 1.5 | 0.443
Mean CNVs per genome*: gains | 2.5 | 2.5 | 0.956
Mean rare CNVs per genome*: losses | 0.4 | 0.4 | 0.919
Mean rare CNVs per genome*: gains | 0.6 | 0.6 | 0.498

*Means and t-tests for losses and gains were calculated only on samples with at least one high-confidence loss or gain, respectively (to avoid bias from samples that received no high-confidence CNV call because of noise).

Overall, no significant difference was observed in the CNV profile of cases and controls. This held for parameters such as CNV size, proportion of genic CNVs, proportion of rare CNVs, and average number of CNVs per individual genome. In both groups, gains were larger than losses (median size in cases: 228.7kb vs. 176.6kb, p=0.016; in controls: 224.4kb vs. 168.0kb, p<0.001). Gains were also more likely to overlap genes (cases: 153/190 gains vs. 52/91 losses are genic, p<0.001; controls: 1,641/2,059 gains vs. 400/731 losses are genic, p<0.001).
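In this thesis, median CNV sizes were compared with the Mann-Whitney U test in SPSS. The following minimal pure-Python sketch shows how such a comparison could be reproduced outside SPSS. It uses the large-sample normal approximation with a tie correction. It is not SPSS's implementation, and its p-values can differ slightly from exact or continuity-corrected versions. The sizes in the example are hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Mann-Whitney U test using the large-sample normal approximation.

    Returns (U statistic for x, two-sided p-value). Tied values get average
    ranks and the variance is tie-corrected; no continuity correction is
    applied. Assumes the pooled values are not all identical.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_sum = 0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j + 2) / 2  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_sum += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_sum / (n * (n - 1)))
    z = (u_x - mean_u) / sqrt(var_u)
    p = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return u_x, p


# Hypothetical CNV sizes in kb (not study data).
gains_kb = [230.0, 410.5, 180.2, 95.0, 620.3]
losses_kb = [150.7, 88.1, 176.6, 60.4, 120.9]
u, p = mann_whitney_u(gains_kb, losses_kb)
print(f"U = {u}, p = {p:.3f}")
```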


4.5 CNVs of interest

Figure 7 summarizes the CNV prioritization plan that we applied to our data. The highest priority is assigned to CNVs that segregate with disease status in blood relatives or, alternatively, to de novo CNVs in singleton young affected subjects.

Since no trios were available for analysis, we could not determine which CNVs were de novo. Only two pairs of siblings were genotyped; all remaining subjects were unrelated. In one pair of siblings whose parents are not consanguineous, the two siblings shared only a single gain. This CNV was also identified in many other cases and controls. In the second pair of siblings, whose parents are first cousins, one loss and three gains were shared by the two siblings, but all of these CNVs were also present in controls. Hence, no FPC-specific CNVs were found to segregate in either of the two sibling pairs.

Next in priority are CNVs that overlap in two or more unrelated cases and are absent in controls. We also considered CNVs present in both cases and controls if they met the following conditions: (1) the CNV is present in two or more cases; (2) the CNV overlaps gene(s) in cases; and (3) the genic portion of the region is not overlapped by control CNVs or DGV CNVs. (To ensure that we were not missing anything significant, we also assessed the data for loci overlapping two or more cases and no controls even if reported in the DGV, but none fit these criteria.) A total of 64 FPC CNVs (27 losses and 37 gains) detected on the 500K array were not identified in FGICR or OFCCR controls. After further excluding regions that overlapped POPGEN or OHI controls or were reported in the DGV, 37 FPC-specific CNVs (16 losses and 21 gains) remained on the 500K array. On the Affymetrix 6.0 array, 119 FPC CNVs (71 losses and 48 gains) were not identified in POPGEN or OHI controls. After further excluding regions that overlapped FGICR and OFCCR controls or were in the DGV, 73 FPC-specific CNVs (45 losses and 28 gains) remained. Combining results from the two arrays (including regions identified on both platforms) yielded a total of 93 non-redundant FPC-specific CNVs (53 losses and 40 gains). Each CNV was present in a single individual only, and together they came from 50 FPC cases, including 7 EBV-derived samples. Thirteen of the losses and 8 of the gains were in non-Caucasian individuals.
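The filtering steps above reduce to an interval-overlap test between each case CNV and the CNVs in the control cohorts and the DGV. The sketch below makes two assumptions: any shared base counts as an overlap, and only CNVs of the same type exclude each other (loss vs. loss, gain vs. gain). Both rules, the `CNV` type and the toy coordinates are assumptions, and the genic-portion condition described above is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    chrom: str
    start: int  # inclusive
    end: int    # inclusive
    kind: str   # "loss" or "gain"


def overlaps(a: CNV, b: CNV) -> bool:
    """True if two CNVs of the same type share at least one base."""
    return (a.chrom == b.chrom and a.kind == b.kind
            and a.start <= b.end and b.start <= a.end)


def case_specific(cases: list[CNV], exclusion_sets: list[list[CNV]]) -> list[CNV]:
    """Keep case CNVs not overlapped by any CNV in any exclusion set
    (e.g. one list per control cohort, plus one for DGV entries)."""
    excluders = [e for group in exclusion_sets for e in group]
    return [c for c in cases if not any(overlaps(c, e) for e in excluders)]


# Hypothetical toy data, not study CNVs.
cases = [CNV("chr3", 100, 200, "gain"), CNV("chr3", 500, 600, "loss")]
controls = [CNV("chr3", 150, 300, "gain"), CNV("chr3", 550, 560, "gain")]
dgv: list[CNV] = []
result = case_specific(cases, [controls, dgv])
print(result)
```

A pairwise scan is adequate for a few hundred CNVs. For genome-wide sets, sorting by chromosome and start (or using an interval tree) would avoid comparing every pair.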

One duplication (G_97) appeared to affect the same gene (TGFBR3) in two unrelated cases, albeit with different breakpoints in each case (Figure 10). This gene encodes a receptor for TGF-beta, a signaling molecule with an important role in pancreatic cancer initiation and progression. Decreased expression of TGFBR3 has been observed in various cancers, suggesting that it acts as a tumor suppressor. Given the potential significance of this gene for pancreatic cancer, we investigated this duplication further.


Figure 10 – Duplications overlapping TGFBR3 gene

Figure 10 Legend: TGFBR3 transcripts circled; red bars represent breakpoints of CNVs identified on SNP arrays

An overlapping duplication was also present in one POPGEN control, but it overlapped only the beginning of one of the multiple isoforms of this gene. (A large low-confidence duplication was also called in one of our ARCTIC controls, but qPCR indicated that this was a false call; see Appendix Figure S33.) The duplication in case ID-27 was validated by qPCR using two different primer sets. We validated the duplication in case ID-203 using the same primer sets, and additionally tested those family members of this subject for whom DNA was available (Figure 11; Appendix Figures S33-S38).


Figure 11 – Pedigree of case ID-203, indicating results of qPCR testing for duplication G_97

Figure 11 Legend: GB = gallbladder; PC = Pancreas cancer; dup = duplication identified; no dup = no

duplication identified; blood = source of DNA is lymphocytes; tissue = source of DNA is FFPE resected specimen

At this point, we observed that the mother of the proband did not carry the duplication, which weakened

the argument for this CNV being causative for pancreatic cancer (since the pancreatic cancer was

considered matrilineal in this family, with a maternal grandmother reported to have died of the disease).

However, we considered the possibility of the disease being inherited from the paternal side, particularly

since the paternal grandmother was reported to have died of “gallbladder cancer”, which could have been

a misdiagnosis of pancreatic cancer. We did not have access to DNA from the father or paternal

grandmother, but as noted in the pedigree, a sister of the proband had also died of pancreatic cancer.

We wished to test for segregation of the duplication with the disease, but only formalin-fixed paraffin-

embedded (FFPE) tissue was available for DNA extraction from this sister. Due to the fragmented nature

of FFPE-derived DNA (caused by cross-linking and degradation of nucleic acid by formalin

preservation), qPCR performed on FFPE-DNA can be biased and difficult to verify. Therefore, we

decided to fine-map the breakpoints of the duplication to allow Sanger sequencing of the tandem

duplication point. Our fine-mapping method involved designing qPCR probes at several positions falling

within as well as outside the array-defined boundaries of the duplication (Figure 12; Appendix Figures

S39 to S45 for qPCR results).


Figure 12 – Fine-mapping the breakpoint of duplication overlapping TGFBR3 using qPCR walk-along method

Figure 12 Legend: Panel [A] depicts the array-based estimate of the duplication breakpoints; panels [B] and [C] indicate the locations of the qPCR probes at either end of the duplication (shown as small vertical black bars). In panels [B] and [C], the red arrows indicate the area between the confirmed duplicated and non-duplicated positions at either end of the CNV.

Next, we selected two of the primers used for qPCR analysis (O_Out_5 and T_Out_3) to attempt PCR amplification of the region containing the duplication breakpoint. Although we did not yet know the exact size of the duplication, we were able to amplify a fragment of approximately 1.5-2kb (see Figure 13). A control sample without the duplication, as expected, yielded no product with these primers.

Figure 13 – PCR gel demonstrating amplification of ~1.5-2kb fragment containing G_97 duplication breakpoint in case ID-203

Figure 13 Legend: Each well represents a separate PCR reaction (three for duplication-carrying sample and

three for non-duplication control)


We submitted the fragment for Sanger sequencing from both ends. The fragment was too large to read completely from either primer, but the reads from the two primers were long enough to overlap at the breakpoint of the duplication, allowing us to pinpoint its exact location (see Figure 14).
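The junction PCR works because the two primers face outward from the duplicated segment. In the reference sequence they point away from each other and yield no product. A head-to-tail (tandem) duplication places a copy of one primer site upstream of a copy of the other, across the junction, so the primers can now amplify. The toy simulation below illustrates this logic and the effect of a short insertion such as "TAT" at the junction. The sequences and function names are hypothetical. In the actual experiment, O_Out_5 and T_Out_3 sat far enough from the junction to give a ~1.5-2kb product.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def tandem_duplicate(ref: str, start: int, end: int, insertion: str = "") -> str:
    """Return ref with ref[start:end] duplicated head-to-tail.

    `insertion` (e.g. "TAT") is placed at the junction between the copies.
    """
    return ref[:end] + insertion + ref[start:end] + ref[end:]


def pcr_product(template: str, fwd: str, rev: str) -> Optional[str]:
    """Amplicon from the first fwd site to the next downstream rev site.

    fwd matches the top strand as written; rev is written 5'->3' on the
    bottom strand, so it binds where revcomp(rev) occurs on the top strand.
    Returns None if no rev site lies downstream of the fwd site.
    """
    i = template.find(fwd)
    if i < 0:
        return None
    site = revcomp(rev)
    j = template.find(site, i + len(fwd))
    if j < 0:
        return None
    return template[i:j + len(site)]


# Toy reference (hypothetical sequences): flank + segment + flank.
start_of_segment = "CATGCTAGCA"
end_of_segment = "GTCAGGATCC"
segment = start_of_segment + "T" * 10 + end_of_segment
ref = "A" * 10 + segment + "G" * 10

fwd = end_of_segment               # near the segment end, pointing downstream
rev = revcomp(start_of_segment)    # near the segment start, pointing upstream
dup = tandem_duplicate(ref, 10, 10 + len(segment), insertion="TAT")

print(pcr_product(ref, fwd, rev))  # None: primers face away from each other
print(pcr_product(dup, fwd, rev))  # GTCAGGATCCTATCATGCTAGCA
```

Sequencing such a product from both primers reads into the junction from either side. This is why the reads in Figure 14 overlap at the breakpoint and reveal the inserted bases.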

Figure 14 – G_97 duplication breakpoint mapping by Sanger sequencing

Figure 14 Legend: Sequence [A] is located at the end of G_97 that does not transect TGFBR3; the purple-highlighted portion is seen in Sanger sequence reads from the forward primer (O_Out_5) located at that end of the duplication. Sequence [C] is located at the end of G_97 that transects TGFBR3; the yellow-highlighted portion is seen in Sanger sequence reads from the reverse primer (T_Out_3) located at that end of the duplication. The non-highlighted portion of each of those reads represents the sequence normally expected at that location if no duplication were present. The red-highlighted sequence is the region of the tandem duplication breakpoint observed in each of the Sanger sequence reads from the above-described primers; note the insertion of “TAT” at the point of duplication.

Based on this information, we designed a primer set to amplify a smaller fragment (~100 bp) encompassing the breakpoint. This allowed amplification of FFPE-derived DNA (obtained from a non-tumor region of the specimen block) from the affected sister of the proband. We also PCR-amplified several other amplicons of similar size to control for DNA degradation, and we used case ID-203 as a positive control for the duplication. As Figure 15 illustrates, the FFPE DNA appeared to amplify the four other test amplicons well, but no amplification of the duplication breakpoint region was observed in the affected sister. This indicates that she did not inherit the duplication.


Figure 15 - PCR gel illustrating amplification of test regions and duplication breakpoint in case Id-203 and affected sister

Figure 15 Legend: Wells within the blue boxes belong to sister of ID_203 (source of FFPE DNA); wells

outside blue boxes belong to case ID_203 (blood-derived DNA); every fifth column is water control

4.6 FPC-specific CNVs

The TGFBR3 duplication did not segregate with pancreatic cancer in the family we studied, and no FPC-specific CNV occurred in more than one case. We therefore proceeded to annotate the FPC-specific CNVs and to prioritize them based on gene content and their association with cancer (Figure 16 illustrates the distribution of FPC-specific CNVs across the genome).


Figure 16 - FPC-specific losses and gains on autosomal chromosomes

Figure 16 Legend: Red box = loss; Green box = gain

Twenty-three FPC-specific losses and 23 FPC-specific gains overlapped introns, exons, and/or untranslated regions of 104 RefSeq genes (Table 11).

Table 11 – FPC-specific CNVs

CNV type | CNV Id | Sample Id | Coordinates (hg18) | Size (kb) | RefSeq Genes | Overlaps Pancreatic Expression Database CNVs?
Gain | Affy6.0_G_11 | 127 | chr1:49856085-50089082 | 233.0 | AGBL4 | no
Gain | Affy500K_G_280 & Affy6_G_298 | 62 | chr18:6838462-7291170 | 452.7 | ARHGAP28, LAMA1, LRRC30, LOC400643 | High-level amplification
Gain | Affy500K_G_380 | 82 | chr3:143693491-143928895 | 235.4 | ATR, PLS1, TRPC1 | no
Gain | Affy6.0_G_324 | 20 (Admixed) | chr19:60436319-60696243 | 259.9 | BRSK1, UBE2S, SHISA7, TMEM190, COX6B2, FAM71E2, HSPBP1, TMEM150B, ISOC2, IL11, RPL28, TMEM238, ZNF628, SUV420H2, NAT14, PPP6R1, SSC5D | no
Gain | Affy500K_G_136 | 37 (EBV) | chr16:78810438-79258408 | 448.0 | DYNLRB2, CDYL2, MIR548H4 | High-level amplification
Gain | Affy500K_G_615 & Affy6_G_77 | 125 | chr7:133223330-133393933 | 170.6 | EXOC4 | no
Gain | Affy6.0_G_235 | 99 | chr15:32814039-32848252 | 34.2 | GJD2 | no
Gain | Affy500K_G_365 | 79 | chr4:93344017-93591992 | 248.0 | GRID2 | no
Gain | Affy6.0_G_226 | 44 | chr15:70381008-70436843 | 55.8 | HEXA, CELF6 | no
Gain | Affy500K_G_603/604 & Affy6_G_93 | 123 (Admixed) | chr8:39935640-39943638 | 8.0 | IDO2 | no
Gain | Affy6.0_G_39 | 123 (Admixed) | chr3:161448573-161518365 | 69.8 | IFT80 | no
Gain | Affy6.0_G_143 | 17 | chr10:71778181-71797516 | 19.3 | LRRC20 | no
Gain | Affy6.0_G_170 | 20 (Admixed) | chr11:65027491-65201466 | 174.0 | LTBP3, PCNXL3, MAP3K11, MIR4489, MALAT1, RELA, SIPA1, SSSCA1, FAM89B, KCNK7, MIR4690, EHBP1L1, LOC254100, SCYL1 | no
Gain | Affy500K_G_176 & Affy6_G_301 | 44 | chr18:2254263-2555103 | 300.8 | METTL4 | no
Gain | Affy6.0_G_33 | 69 | chr2:216465517-216485115 | 19.6 | none | no
Gain | Affy500K_G_88 | 24 | chr4:26691114-26985948 | 294.8 | none (mRNA present) | no
Gain | Affy500K_G_369 | 80 | chr4:29195980-29209908 | 13.9 | none | no
Gain | Affy500K_G_602 & Affy6_G_50 | 123 (Admixed) | chr4:72734028-72817447 | 83.4 | none | no
Gain | Affy500K_G_511 | 107 (EBV) | chr4:105853937-106127766 | 273.8 | none | no
Gain | Affy500K_G_407 | 86 | chr6:48829836-49492706 | 662.9 | none | no
Gain | Affy6.0_G_70 | 44 | chr6:132466247-132479169 | 12.9 | none | no
Gain | Affy6.0_G_95 | 99 | chr8:83294045-83332227 | 38.2 | none | no
Gain | Affy500K_G_49 | 12 (Admixed) | chr9:81978854-82021829 | 43.0 | none | no
Gain | Affy6.0_G_138 | 54 | chr10:4497158-4555255 | 58.1 | none | no
Gain | Affy6.0_G_152 | 54 | chr11:41420026-41456633 | 36.6 | none (mRNA present) | no
Gain | Affy500K_G_622 & Affy6_G_158 | 126 | chr11:81521790-81598468 | 76.7 | none (mRNA present) | no
Gain | Affy500K_G_502 | 106 (EBV) | chr12:57378034-57482408 | 104.4 | none (mRNA present) | no
Gain | Affy6.0_G_194 | 69 | chr13:86091484-86118457 | 27.0 | none | no
Gain | Affy6.0_G_326 | 202 | chr20:46926869-46943223 | 16.4 | none | no
Gain | Affy500K_G_225 | 58 | chr21:28431800-28667362 | 235.6 | none (mRNA present) | no
Gain | Affy500K_G_226 | 58 | chr21:35973166-36013145 | 40.0 | none (mRNA present) | no
Gain | Affy500K_G_105 & Affy6_G_283 & Affy6_G_284 | 28 | chr17:2919396-3184579 | 265.2 | OR1D2, OR1G1, OR1A2, OR1A1, OR1D4, OR3A2, OR3A1, OR3A4P | no
Gain | Affy500K_G_95 | 26 | chr10:19849680-20589237 | 739.6 | PLXDC2 | High-level amplification
Gain | Affy6.0_G_90 | 202 | chr8:49008716-49049657 | 40.9 | PRKDC, MCM4 | no
Gain | Affy6.0_G_3 | 123 (Admixed) | chr1:157133096-157188413 | 55.3 | PYHIN1 | no
Gain | Affy500K_G_69 & Affy6_G_87 | 18 | chr8:108696004-109010881 | 314.9 | RSPO2 | High-level amplification
Gain | Affy500K_G_303 | 65 | chr2:230753632-230823051 | 69.4 | SP110, SP140 | no
Gain | Affy6.0_G_179 | 11 (Asian) | chr12:81711207-81762121 | 50.9 | TMTC2 | no
Gain | Affy6.0_G_212 | 67 | chr14:73405361-73432688 | 27.3 | ZNF410, PTGR2 | no
Gain | Affy6.0_G_315 | 62 | chr19:60824299-60923809 | 99.5 | ZNF784, NLRP9, EPN1, CCDC106, ZNF580, U2AF2, ZNF581 | no
Loss | Affy500K_D_125 & Affy6_D_1246 | 68 | chr12:39394850-39501843 | 107.0 | CNTN1 | High-level amplification
Loss | Affy6.0_D_870 | 123 (Admixed) | chr5:11220277-11229088 | 8.8 | CTNND2 | no
Loss | Affy6.0_D_1507 | 11 (Asian) | chr18:3670476-3715553 | 45.1 | DLGAP1 | no
Loss | Affy6.0_D_1127 | 123 (Admixed) | chr10:128752241-128780181 | 27.9 | DOCK1 | no
Loss | Affy6.0_D_637 | 204 | chr2:55010996-55019655 | 8.7 | EML6 | no
Loss | Affy500K_D_24 & Affy6_D_1342 | 11 (Asian) | chr13:93544008-93670507 | 126.5 | GPC6 | High-level amplification
Loss | Affy6.0_D_739 | 69 | chr4:70867305-70952889 | 85.6 | HTN1, HTN3, STATH | no
Loss | Affy500K_D_152 | 85 | chr3:125676839-125815545 | 138.7 | KALRN | no
Loss | Affy6.0_D_477 | 97 | chr1:62528216-62538049 | 9.8 | KANK4 | no
Loss | Affy6.0_D_1548 | 61 | chr19:61684427-61697318 | 12.9 | LOC100128252 | no
Loss | Affy6.0_D_844 | 40 | chr4:178997998-179018809 | 20.8 | LOC285501 | no
Loss | Affy6.0_D_911 | 123 (Admixed) | chr6:119578774-119604698 | 25.9 | MAN1A1 | no
Loss | Affy500K_D_220 | 112 (EBV) | chr8:6371546-6430547 | 59.0 | MCPH1, ANGPT2 | no
Loss | Affy500K_D_142 | 77 (Admixed) | chr8:17998784-18145035 | 146.3 | NAT1 | no
Loss | Affy6.0_D_535 | 62 | chr2:41356049-41390177 | 34.1 | none | no
Loss | Affy500K_D_114 & Affy6_D_74 | 62 | chr2:41474986-41608172 | 133.2 | none | no
Loss | Affy6.0_D_677 | 20 (Admixed) | chr3:22405124-22481450 | 76.3 | none | no
Loss | Affy6.0_D_671 | 30 | chr3:192351519-192375879 | 24.4 | none | no
Loss | Affy6.0_D_769 | 28 | chr4:123803190-123806840 | 3.7 | none (mRNA present) | no
Loss | Affy6.0_D_930 | 35 | chr6:142219243-142324891 | 105.6 | none (mRNA present) | no
Loss | Affy6.0_D_992 | 64 (Admixed) | chr7:23094182-23110722 | 16.5 | none (mRNA present) | no
Loss | Affy6.0_D_1029 | 64 (Admixed) | chr8:2578046-2587479 | 9.4 | none | no
Loss | Affy6.0_D_1650 | 20 (Admixed) | chr8:58080498-58091757 | 11.3 | none | no
Loss | Affy6.0_D_1069 | 125 | chr8:88575501-88585299 | 9.8 | none | no
Loss | Affy500K_D_93 | 48 | chr8:89782116-89849946 | 67.8 | none (mRNA present) | no
Loss | Affy6.0_D_1644 | 91 | chr8:131657747-131683625 | 25.9 | none (mRNA present) | no
Loss | Affy6.0_D_1024 | 16 | chr8:138328381-138425832 | 97.5 | none | no
Loss | Affy500K_D_134 | 74 | chr9:2235919-2351848 | 115.9 | none | no
Loss | Affy500K_D_43 & Affy6_D_1112 | 17 | chr9:75525136-75638229 | 113.1 | none | no
Loss | Affy6.0_D_1109 | 64 (Admixed) | chr9:75637796-75657448 | 19.7 | none | no
Loss | Affy6.0_D_1103 | 27 | chr9:95533791-95585819 | 52.0 | none | no
Loss | Affy6.0_D_1108 | 4 | chr9:102517861-102553347 | 35.5 | none (mRNA present) | no
Loss | Affy500K_D_40 & Affy6_D_1198 | 16 | chr11:39882017-40010124 | 128.1 | none | no
Loss | Affy500K_D_6 | 2 (EBV) | chr11:89730130-89888327 | 158.2 | none (mRNA present) | no
Loss | Affy6.0_D_1205 | 204 | chr11:104741261-104793318 | 52.1 | none (mRNA present) | no
Loss | Affy500K_D_83 & Affy6_D_1253 | 40 | chr12:130382166-130686668 | 304.5 | none (mRNA present) | no
Loss | Affy6.0_D_1336 | 101 | chr13:39389124-39515818 | 126.7 | none | no
Loss | Affy6.0_D_1377 | 54 | chr14:42513084-42541303 | 28.2 | none | no
Loss | Affy500K_D_121 & Affy6_D_1383 | 64 (Admixed) | chr14:85216336-85436133 | 219.8 | none | no
Loss | Affy6.0_D_1679 | 68 | chr15:57862260-57891107 | 28.8 | none | no
Loss | Affy6.0_D_1428 | 35 | chr15:60314660-60333770 | 19.1 | none (mRNA present) | no
Loss | Affy6.0_D_1467 | 67 | chr16:54046835-54056160 | 9.3 | none (mRNA present) | no
Loss | Affy6.0_D_1601 | 11 (Asian) | chr20:50766640-50780316 | 13.7 | none | no
Loss | Affy500K_D_225 | 114 (EBV) | chr21:23160325-23267106 | 106.8 | none | no
Loss | Affy6.0_D_542 | 61 | chr2:148426768-148464448 | 37.7 | ORC4 | no
Loss | Affy6.0_D_925 | 101 | chr6:162342089-162365931 | 23.8 | PARK2 | no
Loss | Affy500K_D_234 | 117 (EBV) | chr5:95640616-96152064 | 511.4 | PCSK1, ERAP1, CAST | High-level amplification
Loss | Affy6.0_D_1065 | 61 | chr8:85558196-85579549 | 21.4 | RALYL | no
Loss | Affy6.0_D_1527 | 203 | chr18:38603464-38605275 | 1.8 | RIT2 | no
Loss | Affy6.0_D_1484 | 35 | chr17:75852813-75870192 | 17.4 | RNF213 | no
Loss | Affy6.0_D_741 | 28 | chr4:53829489-53875712 | 46.2 | SCFD2 | no
Loss | Affy6.0_D_549 | 99 | chr2:78025162-78059816 | 34.7 | SNAR-H | no
Loss | Affy500K_D_98 & Affy6_D_743 | 54 | chr4:147802903-148190197 | 387.3 | TTC29 | no

Fourteen genes (including one small nuclear RNA) had at least part of their coding regions affected by

FPC-specific losses, and 74 genes (including 3 microRNAs) had at least part of their coding regions

affected by FPC-specific gains (Table 12).
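The "Extent of gene affected" column in Table 12 separates genes fully contained in a CNV from those only partly overlapped. The sketch below shows one possible rule for this. It assumes each gene (or its coding span) is a single closed interval on the same chromosome as the CNV. The rule, the function name and the gene intervals in the example are assumptions, not the thesis's documented annotation pipeline.

```python
from typing import Optional


def extent_affected(cnv: tuple[int, int], gene: tuple[int, int]) -> Optional[str]:
    """Classify how much of a gene interval a CNV covers (closed intervals).

    Returns "full" if the gene lies entirely inside the CNV, "partial" if
    they overlap without full containment, and None if they do not overlap.
    """
    cnv_start, cnv_end = cnv
    gene_start, gene_end = gene
    if cnv_end < gene_start or gene_end < cnv_start:
        return None
    if cnv_start <= gene_start and gene_end <= cnv_end:
        return "full"
    return "partial"


# CNV coordinates from Table 11 (Affy6.0_G_170, chr11); the two gene
# intervals below are hypothetical, chosen only to show both outcomes.
cnv = (65027491, 65201466)
for name, gene in [("gene_inside", (65030000, 65040000)),
                   ("gene_at_edge", (65190000, 65210000))]:
    print(name, extent_affected(cnv, gene))
```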

Table 12 – Genes whose coding regions are affected by FPC-specific CNVs

CNV type | Gene | Entrez Id | Official full name | Position (hg18) | Array | Sample | Extent of gene affected
Gain | OR1A1 | 8383 | olfactory receptor, family 1, subfamily A, member 1 | chr17:2932535-3161719 | 500K | 28 | full
Gain | OR1A2 | 26189 | olfactory receptor, family 1, subfamily A, member 2 | chr17:2932535-3161719 | 500K | 28 | full
Gain | OR1D2 | 4991 | olfactory receptor, family 1, subfamily D, member 2 | chr17:2919396-3019805 | 500K & Affy6 | 28 | full
Gain | OR1G1 | 8390 | olfactory receptor, family 1, subfamily G, member 1 | chr17:2919396-3019805 | 500K & Affy6 | 28 | full
Gain | OR1D4 | 653166 | olfactory receptor, family 1, subfamily D, member 4 (gene/pseudogene) | chr17:2932535-3184579 | 500K & Affy6 | 28 | full
Gain | OR3A1 | 4994 | olfactory receptor, family 3, subfamily A, member 1 | chr17:2932535-3184579 | 500K & Affy6 | 28 | full
Gain | OR3A2 | 4995 | olfactory receptor, family 3, subfamily A, member 2 | chr17:2932535-3184579 | 500K & Affy6 | 28 | full
Gain | OR3A4 | 390756 | olfactory receptor, family 3, subfamily A, member 4 | chr17:2932535-3184579 | 500K & Affy6 | 28 | full
Gain | CDYL2 | 124359 | chromodomain protein, Y-like 2 | chr16:78810438-79258408 | 500K | 37 | partial
Gain | DYNLRB2 | 83657 | dynein, light chain, roadblock-type 2 | chr16:78810438-79258408 | 500K | 37 | full
Gain | MIR548H4 | 100313884 | microRNA 548h-4 | chr16:78810438-79258408 | 500K | 37 | partial
Gain | METTL4 | 64863 | methyltransferase like 4 | chr18:2254263-2555103 | 500K & Affy6 | 44 | partial
Gain | ARHGAP28 | 79822 | Rho GTPase activating protein 28 | chr18:6838462-7291170 | 500K & Affy6 | 62 | partial
Gain | LAMA1 | 284217 | laminin, alpha 1 | chr18:6838462-7291170 | 500K & Affy6 | 62 | full
Gain | LOC400643 | 400643 | hypothetical LOC400643 | chr18:6838462-7291170 | 500K & Affy6 | 62 | full
Gain | LRRC30 | 339291 | leucine rich repeat containing 30 | chr18:6838462-7291170 | 500K & Affy6 | 62 | full
Gain | SP110 | 3431 | SP110 nuclear body protein | chr2:230753632-230823051 | 500K | 65 | partial
Gain | SP140 | 11262 | SP140 nuclear body protein | chr2:230753632-230823051 | 500K | 65 | partial
Gain | GRID2 | 2895 | glutamate receptor, ionotropic, delta 2 | chr4:93344017-93591992 | 500K | 79 | partial
Gain | ATR | 545 | ataxia telangiectasia and Rad3 related | chr3:143693491-143928895 | 500K | 82 | partial
Gain | PLS1 | 5357 | plastin 1 | chr3:143693491-143928895 | 500K | 82 | full
Gain | TRPC1 | 7220 | transient receptor potential cation channel, subfamily C, member 1 | chr3:143693491-143928895 | 500K | 82 | partial
Gain | IDO2 | 169355 | indoleamine 2,3-dioxygenase 2 | chr8:39935640-39943638 | 500K & Affy6 | 123 | partial
Gain | EXOC4 | 60412 | exocyst complex component 4 | chr7:133223330-133393933 | 500K & Affy6 | 125 | partial
Gain | RSPO2 | 340419 | R-spondin 2 homolog (Xenopus laevis) | chr8:108696004-108994913 | 500K & Affy6 | 18 | partial
Gain | PLXDC2 | 84898 | plexin domain containing 2 | chr10:19849680-20589237 | 500K | 26 | partial
Gain | AGBL4 | 84871 | ATP/GTP binding protein-like 4 | chr1:49856085-50089082 | Affy6 | 127 | partial
Gain | EHBP1L1 | 254102 | EH domain binding protein 1-like 1 | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | FAM89B | 23625 | family with sequence similarity 89, member B | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | KCNK7 | 10089 | potassium channel, subfamily K, member 7 | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | LOC254100 | 254100 | hypothetical LOC254100 | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | LTBP3 | 4054 | latent transforming growth factor beta binding protein 3 | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | MALAT1 | 378938 | metastasis associated lung adenocarcinoma transcript 1 (non-protein coding) | chr11:65027491-65201466 | Affy6 | 20 | partial
Gain | MAP3K11 | 4296 | mitogen-activated protein kinase kinase kinase 11 | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | MIR4489 | 100616284 | microRNA 4489 | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | MIR4690 | 100616292 | microRNA 4690 | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | PCNXL3 | 399909 | pecanex-like 3 (Drosophila) | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | RELA | 164014 | v-rel reticuloendotheliosis viral oncogene homolog A (avian) | chr11:65027491-65201466 | Affy6 | 20 | partial
Gain | SCYL1 | 57410 | SCY1-like 1 (S. cerevisiae) | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | SIPA1 | 602180 | signal-induced proliferation-associated 1 | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | SSSCA1 | 10534 | Sjogren syndrome/scleroderma autoantigen 1 | chr11:65027491-65201466 | Affy6 | 20 | full
Gain | PTGR2 | 145482 | prostaglandin reductase 2 | chr14:73405361-73432688 | Affy6 | 67 | partial
Gain | ZNF410 | 57862 | zinc finger protein 410 | chr14:73405361-73432688 | Affy6 | 67 | partial
Gain | CELF6 | 60677 | CUGBP, Elav-like family member 6 | chr15:70381008-70436843 | Affy6 | 44 | partial

Gain HEXA 3073 hexosaminidase A (alpha polypeptide)

chr15:70381008-70436843 Affy6 44 partial

Gain GJD2 57369 gap junction protein, delta 2, 36kDa

chr15:32814039-32848252 Affy6 99 full


Gain | CCDC106 | 29903 | coiled-coil domain containing 106 | chr19:60824299-60923809 | Affy6 | 62 | full
Gain | EPN1 | 29924 | epsin 1 | chr19:60824299-60923809 | Affy6 | 62 | full
Gain | NLRP9 | 338321 | NLR family, pyrin domain containing 9 | chr19:60824299-60923809 | Affy6 | 62 | partial
Gain | U2AF2 | 11338 | U2 small nuclear RNA auxiliary factor 2 | chr19:60824299-60923809 | Affy6 | 62 | full
Gain | ZNF580 | 51157 | zinc finger protein 580 | chr19:60824299-60923809 | Affy6 | 62 | full
Gain | ZNF581 | 51545 | zinc finger protein 581 | chr19:60824299-60923809 | Affy6 | 62 | full
Gain | ZNF784 | 147808 | zinc finger protein 784 | chr19:60824299-60923809 | Affy6 | 62 | partial
Gain | BRSK1 | 84446 | BR serine/threonine kinase 1 | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | COX6B2 | 125965 | cytochrome c oxidase subunit VIb polypeptide 2 (testis) | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | FAM71E2 | 284418 | family with sequence similarity 71, member E2 | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | HSPBP1 | 612939 | HSPA (heat shock 70kDa) binding protein, cytoplasmic cochaperone 1 | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | IL11 | 3589 | interleukin 11 | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | ISOC2 | 79763 | isochorismatase domain containing 2 | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | NAT14 | 57106 | N-acetyltransferase 14 (GCN5-related, putative) | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | PPP6R1 | 22870 | protein phosphatase 6, regulatory subunit 1 | chr19:60436319-60696243 | Affy6 | 20 | partial
Gain | RPL28 | 6158 | ribosomal protein L28 | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | SHISA7 | 729956 | shisa homolog 7 (Xenopus laevis) | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | SSC5D | 284297 | scavenger receptor cysteine rich domain containing (5 domains) | chr19:60436319-60696243 | Affy6 | 20 | partial
Gain | SUV420H2 | 84787 | suppressor of variegation 4-20 homolog 2 (Drosophila) | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | TMEM150B | 284417 | transmembrane protein 150B | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | TMEM190 | 147744 | transmembrane protein 190 | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | TMEM238 | 388564 | transmembrane protein 238 | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | UBE2S | 27338 | ubiquitin-conjugating enzyme E2S | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | ZNF628 | 89887 | zinc finger protein 628 | chr19:60436319-60696243 | Affy6 | 20 | full
Gain | IFT80 | 57560 | intraflagellar transport 80 homolog (Chlamydomonas) | chr3:161448573-161518365 | Affy6 | 123 | partial
Gain | PYHIN1 | 149628 | pyrin and HIN domain family, member 1 | chr1:157133096-157188413 | Affy6 | 123 | partial
Gain | MCM4 | 4173 | minichromosome maintenance complex component 4 | chr8:49008716-49049657 | Affy6 | 202 | partial
Gain | PRKDC | 5591 | protein kinase, DNA-activated, catalytic polypeptide | chr8:49008716-49049657 | Affy6 | 202 | partial


Loss | NAT1 | 9 | N-acetyltransferase 1 (arylamine N-acetyltransferase) | chr8:17998784-18145035 | 500K | 77 | full
Loss | KALRN | 8997 | kalirin, RhoGEF kinase | chr3:125676839-125815545 | 500K | 85 | partial
Loss | ANGPT2 | 285 | angiopoietin 2 | chr8:6371546-6430547 | 500K | 112 | partial
Loss | CAST | 831 | calpastatin | chr5:95640616-96152064 | 500K | 117 | full
Loss | ERAP1 | 51752 | endoplasmic reticulum aminopeptidase 1 | chr5:95640616-96152064 | 500K | 117 | full
Loss | PCSK1 | 5122 | proprotein convertase subtilisin/kexin type 1 | chr5:95640616-96152064 | 500K | 117 | partial
Loss | TTC29 | 83894 | tetratricopeptide repeat domain 29 | chr4:147802903-148190197 | 500K & Affy6 | 54 | full
Loss | RNF213 | 57674 | ring finger protein 213 | chr17:75852813-75870192 | Affy6 | 35 | partial
Loss | ORC4 | 5000 | origin recognition complex, subunit 4 | chr2:148426768-148464448 | Affy6 | 61 | partial
Loss | SNAR-H | 100170221 | small ILF3/NF90-associated RNA H | chr2:78025162-78059816 | Affy6 | 99 | full
Loss | HTN1 | 3346 | histatin 1 | chr4:70867305-70952889 | Affy6 | 69 | partial
Loss | HTN3 | 3347 | histatin 3 | chr4:70867305-70952889 | Affy6 | 69 | full
Loss | STATH | 6779 | statherin | chr4:70867305-70952889 | Affy6 | 69 | full
Loss | SCFD2 | 152579 | sec1 family domain containing 2 | chr4:53829489-53875712 | Affy6 | 28 | partial

Fifty-five percent of the genes in Table 12 (48/88) have reported non-silent mutations (missense or nonsense variants, insertions/deletions, or gene fusions) in various cancers according to the COSMIC v.55 database, whereas only 37% of genes overlapped by all FPC CNVs (500K and Affymetrix 6.0 combined; p=0.002) and only 42% of genes overlapped by all control CNVs (500K and Affymetrix 6.0 combined; p=0.022) had such mutations. None of the genes overlapped by FPC-specific losses were reported to have downregulated expression in pancreatic cancer in the Pancreatic Expression Database, whereas six genes overlapped by gains had reports of upregulation in pancreatic adenocarcinoma and three genes were reported to be upregulated in intraductal papillary mucinous neoplasm, a pre-invasive lesion. Furthermore, four FPC-specific gains overlapped regions reported to have high-level amplification in pancreatic adenocarcinoma in the Pancreatic Expression Database. These four gains overlap eight genes, of which four (LOC400643, DYNLRB2, LRRC30, and LAMA1) are entirely encompassed by their respective gains. LOC400643 is a non-coding RNA and has no known association with cancer. There are no reports of differential expression in pancreatic cancer or of somatic mutations in DYNLRB2, which codes for a light chain component of the cytoplasmic dynein 1 complex, but this gene is reported to be involved in TGF-beta/SMAD3 signaling550 and to be downregulated in hepatocellular carcinoma551. LRRC30, which codes for leucine-rich repeat-containing protein 30, has no reports of differential expression in pancreatic cancer or other association with tumorigenesis, but does have two reported mutations in the COSMIC database (one nonsense mutation in


ovarian serous carcinoma and one missense mutation in hepatocellular carcinoma). LAMA1 codes for the alpha 1 chain of laminin, an extracellular matrix component that binds to cells via high-affinity receptors and mediates attachment, migration, and organization of cells into tissues during embryogenesis.552 The COSMIC v.55 database reports 18 protein-altering or truncating somatic mutations in this gene in tumors of the pancreas, ovary, central nervous system, large intestine, breast, upper aerodigestive tract, and skin. By comparison, the average number of mutations per gene across the 10,849 COSMIC v.55 genes with at least one non-silent, non-intronic mutation is 3.7, well below the 18 reported for LAMA1. A similar average number of reported somatic mutations is observed in genes affected by CNVs in our study (compiled from the 500K and Affymetrix 6.0 data): 3.6 mutations per gene for FPC-specific genes (p=0.983), 3.4 for all FPC genes (p=0.821), and 3.7 for all control genes (p=0.955). There is also evidence for differential expression of LAMA1 in tumors outside the pancreas: one study reported hypermethylation and underexpression of LAMA1 in colorectal cancer553, while another reported overexpression of this gene in glioblastoma554.
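The text does not state which test produced the p-values for the per-gene mutation averages. As one possible approach, the sketch below compares mean mutation counts between two gene sets with a two-sided permutation test. The function name, the number of permutations, and the toy counts are illustrative assumptions, not COSMIC data or data from this study.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_test_mean_diff(
    group_a: Sequence[float],
    group_b: Sequence[float],
    n_perm: int = 10_000,
    seed: int = 0,
) -> float:
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper: group_a / group_b would hold per-gene counts of
    reported somatic mutations (e.g. FPC-specific genes vs. control genes).
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign genes to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # tolerance guards against float round-off
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy per-gene mutation counts, invented for illustration only.
    fpc_specific = [0, 1, 2, 3, 5, 18, 4, 2, 1, 0]
    controls = [1, 3, 2, 4, 6, 2, 5, 3, 0, 4]
    print(permutation_test_mean_diff(fpc_specific, controls))
```

Because a single heavily mutated gene (such as LAMA1 with 18 mutations) can dominate a mean, a rank-based test would be a reasonable alternative design choice.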

Lastly, for non-complex CNV loci (i.e. loci containing only losses or only gains), we used Fisher’s exact test to determine whether any locus was present in a significantly different proportion of cases relative to controls. After correction for multiple testing, no loss or gain locus showed a significant difference.
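The per-locus comparison can be sketched as a two-sided Fisher's exact test on a 2×2 table of carriers and non-carriers in cases and controls, followed by a multiple-testing correction. The text does not specify which correction was applied, so both Bonferroni and Benjamini-Hochberg adjustments are shown as assumptions; the helper names and the locus counts are hypothetical. The test is computed directly from the hypergeometric distribution with the standard library so that the sketch has no dependencies; it should agree with the two-sided p-value reported by `scipy.stats.fisher_exact`.

```python
from math import comb
from typing import Dict, List, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: cases, controls. Columns: carriers, non-carriers of a CNV locus.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability that the top-left cell equals k,
        # holding all row and column totals fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    total = sum(p for p in (prob(k) for k in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def bonferroni(pvalues: List[float]) -> List[float]:
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical loci: (case carriers, case non-carriers, control carriers, control non-carriers)
    loci: Dict[str, Tuple[int, int, int, int]] = {
        "locus_1": (6, 127, 10, 990),
        "locus_2": (2, 131, 15, 985),
        "locus_3": (1, 132, 1, 999),
    }
    pvals = [fisher_exact_two_sided(*counts) for counts in loci.values()]
    for name, p, q, pb in zip(loci, pvals, benjamini_hochberg(pvals), bonferroni(pvals)):
        print(f"{name}: raw p={p:.3g}, BH={q:.3g}, Bonferroni={pb:.3g}")
```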

5. Discussion

Identifying predisposition genes associated with FPC has been challenging due to the rapid lethality of the disease, the low rate of tumor resection (resulting in a paucity of tissue specimens for analysis), and probable genetic heterogeneity. An estimated 20% of hereditary cases are linked to cancer syndromes caused by alterations in known genes. However, most families that demonstrate clustering of pancreatic cancer do not meet criteria for known cancer syndromes.161 We performed an analysis of germline CNVs in pancreatic cancer patients suspected to have a heritable genetic cause for their disease. These primarily included members of families with three or more affected cases, but also included families with only one or two affected cases if at least one of the cases was under age 50 at diagnosis. Three different computational algorithms were applied to the data from each array to identify high-confidence CNVs, an approach that is commonly used in CNV studies. One advantage of using different algorithms is improved sensitivity for detecting CNVs, since multiple studies have shown substantial non-overlap between the calls made by different algorithms. For our purpose, the use of multiple CNV-calling algorithms identified variants with a very high likelihood of validation (the “high-confidence” set), as verified by qPCR and/or second-array hybridization. This allowed us to focus our downstream analysis on these high-confidence CNVs, whose expected validation rate was > 95%, rather than on low-confidence CNVs (those called by only a single algorithm on a single chip), of which only half appeared to be true


genomic alterations. These results are in keeping with a recently published comparative assessment of CNV-calling algorithms and platforms.555 While we acknowledge that our CNV list is not exhaustive, this is a logistical limitation of the field, as it is feasible neither to genotype hundreds of samples on multiple platforms nor to perform qPCR validation on hundreds of CNVs. Our approach did, however, ensure that we were working with a highly valid set of data.
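The high-confidence/low-confidence distinction can be illustrated with a small sketch. Here a call counts as high-confidence when the same sample carries an overlapping call of the same type from a different algorithm and/or a different chip, and as low-confidence when it is supported by only one algorithm on one chip. The 50% reciprocal-overlap threshold, the data layout, and the placeholder algorithm names ("alg1", "alg2") are assumptions for illustration; the three algorithms and their exact concordance rules are not given in this section.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CnvCall:
    sample: str
    chrom: str
    start: int      # 1-based, inclusive
    end: int        # inclusive
    kind: str       # "loss" or "gain"
    algorithm: str  # placeholder name, e.g. "alg1"
    chip: str       # e.g. "500K" or "Affy6"


def reciprocal_overlap(a: CnvCall, b: CnvCall) -> float:
    """Smaller of the two fractions of each call covered by the other."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start + 1), overlap / (b.end - b.start + 1))


def split_by_confidence(
    calls: List[CnvCall], min_reciprocal: float = 0.5
) -> Tuple[List[CnvCall], List[CnvCall]]:
    """Return (high_confidence, low_confidence) lists of calls."""
    high, low = [], []
    for call in calls:
        supported = any(
            other.sample == call.sample
            and other.kind == call.kind
            # support must come from another algorithm and/or another chip
            and (other.algorithm, other.chip) != (call.algorithm, call.chip)
            and reciprocal_overlap(call, other) >= min_reciprocal
            for other in calls
        )
        (high if supported else low).append(call)
    return high, low


if __name__ == "__main__":
    calls = [
        CnvCall("S1", "chr1", 1000, 2000, "loss", "alg1", "Affy6"),
        CnvCall("S1", "chr1", 1100, 2050, "loss", "alg2", "Affy6"),
        CnvCall("S1", "chr2", 500, 900, "gain", "alg1", "Affy6"),
    ]
    high, low = split_by_confidence(calls)
    print(len(high), "high-confidence;", len(low), "low-confidence")
```

The pairwise scan is quadratic in the number of calls; for genome-wide data, sorting calls by chromosome and position (or using an interval tree) would be the practical choice.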

Interestingly, we noted a discrepancy in the proportion of high-confidence CNVs between samples genotyped at TCAG in Toronto (all the cases and a small subset of controls) and those genotyped in Quebec (most controls). We attributed this difference to an apparently higher level of noise in the control arrays genotyped in Quebec. Pinto et al.555 discussed the effect of inter-laboratory variability on CNV validation rate, finding it to be less important than the reproducibility of the chosen platform or calling algorithm. However, they noted that Affymetrix arrays (the platform used in our study) are an exception, being highly dependent on the reference data set used for the analysis. Since we used all samples within each group as the reference (i.e. the samples genotyped at each centre constituted a group), a noisier set of data from the Quebec samples would be expected to result in a greater proportion of noisy and/or unreliable calls. We expect that some of the low-confidence control CNVs are in fact real, so we advocate that, for any CNV of interest selected for further investigation, controls be checked for overlapping CNV calls and that those calls be validated before further analysis. (We performed such validation for CNV G_97, which overlapped TGFBR3; it appeared to overlap a low-confidence duplication in an ARCTIC control, but qPCR demonstrated this putative gain to be a false call.)

To date, this is the largest study of germline CNVs in unrelated cancer patients from high-risk families. A previous study of 57 pancreatic cancer patients from 56 high-risk kindreds (each containing at least a pair of affected first-degree relatives) used an oligonucleotide-based CGH platform to identify FPC-specific germline CNVs, filtering out losses or gains that were also identified in 607 controls (372 were analyzed in the same study, and 235 were previously reported in two other studies).345 Twenty-five FPC-specific losses overlapping 81 genes and 31 FPC-specific gains overlapping 425 genes were identified. In our study, we investigated 133 members of 131 high-risk kindreds, of whom 17 subjects were part of the previous CGH study, and we identified 93 FPC-specific CNVs using a combination of Affymetrix 500K and Affymetrix 6.0 arrays. The median size of FPC-specific CNVs in the CGH study was larger than that of our FPC-specific CNVs (losses: 151kb vs. 35.5kb; gains: 379kb vs. 73kb). This may be due, in part, to the lower resolution of the CGH platform (mean inter-marker distance = 30kb) compared to the Affymetrix 500K array (median inter-marker distance = 2.5kb) and Affymetrix 6.0 array (median inter-marker distance = 0.7kb) used in our study. It may also reflect enrichment for somatic CNVs caused by EBV transformation, since all FPC DNA samples in the CGH study were extracted from EBV-transformed cells, whereas only 29 samples in our population were derived from EBV-transformed lymphoblasts. The control populations used to filter CNVs were larger in our study, and the number of control CNVs from non-BAC studies currently catalogued in the DGV is greater than was available when the previous FPC CNV study was published. As a result, some of the CNVs identified as “FPC-specific” in the previous study overlapped CNVs in our controls and/or in the DGV. This may explain the slightly higher ratio of FPC-specific CNVs to samples observed in the CGH study (approximately 1 CNV per sample) compared to our study (0.8 CNV per sample).
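The filtering step that defines "FPC-specific" CNVs in both studies (discarding case losses or gains that are also seen in controls or in the DGV) can be sketched as an interval-overlap filter. Treating any overlap of at least 1 bp with a control CNV of the same type as disqualifying is an assumption made here for simplicity; the precise overlap rule is not stated in this section, and the example records are invented.

```python
from typing import Iterable, List, NamedTuple


class Cnv(NamedTuple):
    sample: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive
    kind: str   # "loss" or "gain"


def overlaps(a: Cnv, b: Cnv) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def case_specific(case_cnvs: Iterable[Cnv], control_cnvs: Iterable[Cnv]) -> List[Cnv]:
    """Keep case CNVs that overlap no control CNV of the same type (loss vs. gain)."""
    controls = list(control_cnvs)
    return [
        c for c in case_cnvs
        if not any(c.kind == k.kind and overlaps(c, k) for k in controls)
    ]


def cnvs_per_sample(specific: List[Cnv], n_samples: int) -> float:
    """Ratio of case-specific CNVs to the number of case samples analysed."""
    return len(specific) / n_samples


if __name__ == "__main__":
    cases = [
        Cnv("case1", "chr2", 100, 200, "gain"),
        Cnv("case2", "chr2", 100, 200, "loss"),
        Cnv("case3", "chr3", 5000, 9000, "gain"),
    ]
    # Control CNVs could come from study controls and from DGV entries alike.
    controls = [Cnv("DGV", "chr2", 150, 300, "gain")]
    specific = case_specific(cases, controls)
    print([c.sample for c in specific], cnvs_per_sample(specific, n_samples=3))
```

A stricter or looser overlap rule (for example, requiring 50% reciprocal overlap before discarding a case CNV) would change which CNVs count as case-specific, and therefore the CNV-to-sample ratio.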

It is difficult to estimate concordance in CNV calling between the two studies, as we do not know how many of the 56 FPC-specific CNVs reported in the CGH study were identified in samples that were also used in our study. Only 1/25 loss loci and 3/31 gain loci reported in the CGH study were also observed in our analysis in samples common to both studies, and all of these overlapped CNVs in our controls and/or in the DGV. Notably, multiple reports have demonstrated generally low concordance of CNV calls across platforms and algorithms when analyzing the same DNA source.259,555 In addition to CNVs identified in cases common to both studies, one FPC-specific loss locus was identified in two different subjects (one in each study). The region overlapped the gene DOCK1 (dedicator of cytokinesis 1), but in our study the loss encompassed only an intronic portion of the gene. DOCK1 may have a role in cellular proliferation and migration556,557, and it has been reported to be overexpressed in high-grade dysplastic lesions (PanIN3), suggesting that it may be important in advancing tumorigenesis.558

A number of other genome-wide germline CNV analyses have been reported for various cancers, but only a few have studied familial cancers. In addition to the aforementioned familial pancreatic cancer study, microarray-based germline CNV studies have been reported for Li-Fraumeni syndrome348, young-onset and/or familial colorectal cancer in families without mutations in known predisposition genes347, and BRCA1-associated ovarian cancer.346 Shlien et al.348 described an increased frequency of germline CNVs in 33 Li-Fraumeni family members carrying TP53 mutations (of whom 23 were affected by cancer), compared to 20 Li-Fraumeni family members with wild-type TP53 and 70 healthy controls. Since many of the CNVs overlapped or were near important cancer genes, the authors proposed a model whereby baseline genomic instability in these patients progresses over time, leading to more frequent and larger copy number alterations affecting genes that contribute to tumorigenesis. In our study, patients and controls had a similar number of alterations per genome, with similar CNV size, ratio of losses to gains, likelihood of CNVs overlapping genes, and proportion of genic CNVs associated with cancer. The lack of a significant difference in the germline CNV profile between cases and controls suggests that causative genes for pancreatic cancer do not significantly impact genomic stability in non-tumor cells.

Our results are similar to those of Yoshihara et al.,346 who compared 68 Japanese subjects with germline


BRCA1 mutations (of whom 51 had ovarian cancer), 34 sporadic ovarian cancer patients, and 47 healthy

controls. They reported no significant difference in the per-genome total number of CNVs between

BRCA1-mutation carriers and controls, although the number of deletions was higher in the BRCA1

subjects. Otherwise, they found no evidence for differential clustering of the global CNV data between

groups, and no correlation of age at diagnosis with CNV frequency.

Our proposal for CNV prioritization emphasized regions that segregate with disease in the same family and/or are overlapped by CNVs in multiple unrelated cases (and absent in controls). We had access to CNV data for only two sets of relatives (two sibling pairs), neither of which showed evidence of FPC-specific CNVs co-inherited within the family. Among overlapping CNVs in cases, one region that caught our interest contained two overlapping duplications in two unrelated cases, both of which intersected the TGFBR3 gene. While none of the ARCTIC controls had a validated CNV in this region, a single POPGEN control from the Affy6 dataset contained a duplication that overlapped the cases’ duplications. However, the control’s duplication did not intersect the gene to the same extent as the cases’ duplications; in fact, it appeared to transect only the 5’ end of one of multiple isoforms of the gene (whereas the cases’ duplications intersected all isoforms). The significance of the TGF-beta pathway in cancer initiation and progression in general, and in pancreatic cancer in particular, made this duplication especially interesting to us. We successfully validated this CNV in both affected cases, and we further demonstrated that it was heritable in one of the two families for which we had access to DNA from multiple relatives. Furthermore, using a combined approach of qPCR walk-along and Sanger sequencing of a PCR-amplified fragment, we identified the exact breakpoint of the duplication, demonstrating in the process that it is a tandem duplication. The breakpoint contained three base pairs (“TAT”) that do not appear to be derived from the sequence at either end of the duplication, which is a common finding at the breakpoints of duplications caused by non-homologous end-joining (NHEJ).559 However, once we were able to design a sufficiently small fragment spanning the breakpoint to test its presence in FFPE-derived tissue from an affected sister of the proband, we found that this duplication does not cosegregate with pancreatic cancer in that family. This effectively refuted the implication of this duplication as a cause of familial pancreatic cancer. (We also note that the breakpoints of both case duplications fell within intronic regions of TGFBR3, further decreasing the likelihood that the duplication disrupts the gene.) While this direction in our investigation ultimately proved fruitless, it confirmed the challenge of interpreting the impact of CNVs, an aspect of CNV research that has lagged behind both the ability to detect CNVs and the statistical methods for performing genome-wide association studies that use CNVs as disease markers. As our effort illustrates, fine-mapping CNV breakpoints is painstaking but necessary for understanding the precise region transected by a duplication or deletion. Even that alone is not sufficient to prove that a CNV causes a particular phenotype; for that, further functional


work would be required, such as demonstrating that expression correlates with copy number and that altered expression affects cellular function.
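Once a junction-spanning fragment has been sequenced, the sequence at a tandem-duplication breakpoint can be classified by locating the end of the first copy and the start of the second copy within the read. The sketch below assumes the reference sequence of the duplicated segment is known and uses short exact-match anchors. It reports any non-templated insertion (such as the "TAT" observed at the TGFBR3 duplication breakpoint) or any microhomology shared by the two ends. The function, the anchor length, and the toy sequences are hypothetical; real breakpoint analysis would align Sanger reads to the reference with tolerance for mismatches.

```python
from typing import NamedTuple, Optional


class Junction(NamedTuple):
    insertion: str      # bases present at the junction but at neither end of the segment
    microhomology: str  # bases that could belong to either the end or the start of the segment


def classify_tandem_junction(segment: str, read: str, anchor: int = 6) -> Optional[Junction]:
    """Classify the novel junction created by a tandem duplication.

    segment: reference sequence of the duplicated interval (5'->3').
    read: sequence spanning the junction, i.e. ...end of copy 1 | ? | start of copy 2...
    Returns None if either anchor cannot be found by exact match.
    """
    left = segment[-anchor:]   # last bases of the first copy
    right = segment[:anchor]   # first bases of the second copy
    i = read.find(left)
    if i < 0:
        return None
    left_end = i + anchor
    j = read.find(right, i + 1)
    if j < 0:
        return None
    if j >= left_end:
        # Bases between the two copies: a non-templated insertion (empty means a blunt join).
        return Junction(insertion=read[left_end:j], microhomology="")
    # The anchors overlap: the shared bases are microhomology.
    return Junction(insertion="", microhomology=read[j:left_end])


if __name__ == "__main__":
    segment = "GATTACAGGCCTTAACCGGT"  # toy duplicated interval
    read = segment[8:] + "TAT" + segment[:12]  # junction read with a 3-bp insertion
    print(classify_tandem_junction(segment, read))
```

Exact-match anchors are brittle in repetitive sequence, which is one reason the breakpoint had to be narrowed experimentally (qPCR walk-along) before sequencing.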

Next in priority for our analysis were single-case FPC-specific CNVs that overlapped genic regions. We identified 88 genes whose coding regions were partially or completely encompassed by FPC-specific CNVs, and although some are unlikely to be candidate FPC genes (e.g. olfactory receptor genes), many are functionally relevant to carcinogenesis, and several are differentially expressed in, and/or overlap regions reported to be deleted or amplified in, pancreatic adenocarcinoma. Moreover, the proportion of genes reported in COSMIC v.55 to have protein-altering mutations in tumors or malignant cell lines was significantly higher among FPC-specific genes than among genes overlapped by all case CNVs or by all control CNVs. This further suggests that FPC-specific CNVs are enriched for cancer-associated genes. In the report by Yoshihara et al.346, the primary genetic etiology of the hereditary cancer was already known (BRCA1), and the authors presented genes overlapped by BRCA1-specific CNVs as potential modifiers of cancer development. By contrast, the study by Venkatachalam et al.347 identified seven genic CNVs specific to patients with familial colorectal cancer who have no known genetic mutation, each CNV found in a single individual only. In that study, as in ours, each gene was considered a potential causative gene for the familial cancer. None of the genes overlapped by cancer patient CNVs reported by Shlien et al.348, Yoshihara et al.346, or Venkatachalam et al.347 were part of our FPC-specific gene list. It should be noted that, in addition to the RefSeq genes highlighted in this chapter, 6 FPC-specific gains and 11 FPC-specific losses that did not overlap RefSeq genes did overlap expressed human mRNA. While these regions are of lower interest than bona fide genes, some published CNV studies have reported associations of non-genic regions with disease, providing evidence for hitherto unidentified genes and/or regulatory elements.340,344

The final stage of our CNV prioritization involved calculating the difference in the proportion of cases vs. controls carrying each simple loss or simple gain locus. This approach would theoretically identify CNVs that are detected in both cases and controls but at a higher frequency in cases. No locus achieved a statistically significant p-value after multiple-testing correction. This was not unexpected, since the number of cases in our analysis was too small to yield a significant genome-wide association result unless a CNV was associated with a very large effect size. Furthermore, the biases inherent in the design of our study (e.g. the Affymetrix 500K array is suboptimal for detecting recurrent CNVs relative to rare CNVs, and noise levels and high-confidence CNV calling differed between cases and controls) meant that such an analysis would be inappropriate for our dataset. A properly designed genome-wide CNV association study requires a well-validated platform for genotyping CNVs, such as the Affymetrix 6.0 array, and a sample size sufficient to achieve adequate statistical power. Alternatively, we note that some of the loci in our study had a significant p-value


with a higher case frequency before correction for multiple testing, and those regions could be selected for further testing in an independent case-control study that directly genotypes the CNVs of interest (for example, using a PCR-based approach). Such an approach was used by Huang et al.344 to identify the 6q13 deletion associated with pancreatic cancer.
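The point that the cohort was too small to detect a genome-wide significant CNV association, unless the effect size was very large, can be made concrete with an approximate power calculation for comparing a CNV's carrier frequency in cases and controls. The sketch below uses a normal approximation to a two-sided, two-sample test of proportions. The carrier frequencies, the number of controls, and the Bonferroni-style threshold (0.05 divided by an assumed 1,000 loci) are hypothetical; only the 133 cases match the number of subjects investigated in this study.

```python
from statistics import NormalDist


def two_proportion_power(
    p_case: float, p_ctrl: float, n_case: int, n_ctrl: int, alpha: float
) -> float:
    """Approximate power of a two-sided two-sample z-test for proportions.

    Uses unpooled variances and ignores the (negligible) chance of
    rejecting in the wrong direction.
    """
    std_normal = NormalDist()
    se = (p_case * (1 - p_case) / n_case + p_ctrl * (1 - p_ctrl) / n_ctrl) ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(abs(p_case - p_ctrl) / se - z_crit)


if __name__ == "__main__":
    alpha = 0.05 / 1000  # assumed: Bonferroni correction over 1,000 loci
    # 133 cases as in this study; 1,000 controls and both frequencies are hypothetical.
    print(two_proportion_power(0.05, 0.01, 133, 1000, alpha))
    # The same frequencies with 5,000 cases and 5,000 controls.
    print(two_proportion_power(0.05, 0.01, 5000, 5000, alpha))
```

Under these assumptions, a CNV carried by 5% of cases and 1% of controls would be detected well under 10% of the time with 133 cases, whereas several thousand subjects per group give near-complete power.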

In conclusion, we have presented a list of candidate FPC predisposition genes overlapped by FPC-specific germline CNVs identified in the largest cohort of high-risk pancreatic cancer patients published to date. One limitation of our analysis is the coverage and resolution of the platform used for primary CNV discovery (i.e. the Affymetrix 500K array). Since the completion of our study, novel methods of CNV detection have become available, including very high resolution tiling microarrays and next-generation sequencing. We expect future studies using these methods to independently test our findings and detect additional FPC candidate genes. Some of the samples containing FPC-specific CNVs in our study differed in ancestry from the majority of controls, raising the possibility that these CNVs are specific to the respective ancestry group rather than to pancreatic cancer risk. Those CNVs should be investigated further in a larger ethnicity-matched control cohort. Despite these limitations, our list of FPC-specific genes contains several interesting candidates, and further screening for mutations in other high-risk pancreatic cancer subjects, along with investigation of the functional role of these genes, could support a role for one or more of them in predisposition to FPC.


Chapter 4 - Exome Sequencing in a Familial Pancreatic Cancer Kindred

1. Abstract

In recent years, the significant drop in the cost of next-generation sequencing and target-region enrichment has enabled researchers to use whole-exome sequencing to identify predisposition genes for a variety of Mendelian disorders, a few of which have been familial cancer syndromes. In this study, we aimed to apply this novel method to investigate the genetics of a family containing four relatives affected by pancreatic cancer. Blood-derived DNA was available from three affected relatives (two siblings and their maternal uncle), and we also included an unaffected maternal aunt as a control. Target enrichment was performed using a NimbleGen in-solution array, and sequencing was performed on an Illumina GAII parallel sequencer. We present two alternative hypotheses: (1) in this family, rare variants that are inherited by the three affected individuals and not by the unaffected aunt are in candidate susceptibility genes for familial pancreatic cancer; and (2) in this family, rare variants that are inherited by the three affected individuals, whether or not they are present in the unaffected aunt, are in candidate susceptibility genes for familial pancreatic cancer. We present four potential variant filtration models to develop a list of candidate genes for further investigation, but we focus our downstream analysis on one model. The validation rate for heterozygous single nucleotide variants and indels was high (> 80%) but significantly lower for homozygous variants. In Model #1 of our analysis, we identify 9 candidate genes with heterozygous single nucleotide variants present in the three affected family members and absent in the unaffected aunt, of which we further investigate the two top-ranked genes using Sanger sequencing in a cohort of unrelated high-risk pancreatic cancer patients. We do not identify further subjects with unreported variants in those genes. Further investigation of other genes in this model and in the other three filtration models will be possible in future exome sequencing studies of other pancreatic cancer patients.

2. Introduction

In the previous chapter, we performed a genome-wide analysis of germline CNVs in pancreatic cancer patients from high-risk families to identify candidate susceptibility genes. As discussed there, this was based on the hypothesis that a proportion of syndromic cancer cases occur due to large rearrangements affecting the causative gene. Nevertheless, most variants that cause hereditary cancer are point mutations, most commonly occurring in coding regions or splice sites, thus altering the encoded protein or causing premature termination. Until recently, such variants could only be identified by a candidate-gene approach and laborious Sanger sequencing. However, the development of target-capture arrays for building DNA libraries enriched for coding regions (“the exome”), in combination with


the decreasing cost of massively parallel next-generation sequencing, has enabled interrogation of the coding regions of entire genomes for susceptibility variants. Indeed, over the past couple of years, a large number of exome-based studies have been published identifying causative genes for previously unexplained Mendelian diseases (see the Literature Search for more details).

While the number of studies specifically addressing cancer syndromes has been small, a similar strategy can be applied to identifying susceptibility genes in individuals or families who appear to inherit the disease in a Mendelian fashion (dominant or recessive). We therefore chose to investigate, using exome sequencing, a family consisting of two affected siblings, their affected mother, and an affected maternal uncle. The CNV profile of the two siblings was already characterized in Chapter 3 (CNV-case ID-89, here identified as ID-001, and CNV-case ID-30, here identified as ID-006), and all deletions and gains segregating in the two siblings were also found in controls. (Indeed, only one deletion was FPC-specific; it was found exclusively in sibling ID-006, not in sibling ID-001, and it occurred in a non-genic region.) For the study described in this chapter, blood-derived DNA was available for the two siblings and their affected maternal uncle (but not their mother). We chose to also include DNA from an unaffected maternal aunt to act as a control for filtering out candidate variants, under the hypothesis that all three affected relatives would be carriers of a high-penetrance variant and that the 80-year-old unaffected aunt is unlikely to be a carrier. However, we acknowledge that, since we do not know the penetrance of the gene in question, the aunt may also be an unaffected carrier. For that reason, we also present an alternative hypothesis that considers the aunt a possible carrier of the variant of interest, and thus identifies variants shared among the affected members whether or not they are present in the aunt.

In the methods below, filtration models #1 and #3 fall under the first hypothesis: variants inherited by the

three affected relatives and absent in the unaffected aunt lie in candidate susceptibility genes for FPC;

filtration models #2 and #4 are based on the second hypothesis: variants inherited by the three affected

relatives lie in candidate susceptibility genes for FPC, regardless of whether the unaffected family

member also carries them. In this chapter, we focus our downstream investigation and candidate gene

screening only on results from model #1, which pertains to the first hypothesis.

3. Materials & Methods

3.1 Description of Family C

We identified a consanguineous family of Maltese ancestry with a strong history of pancreatic cancer: the

proband was a male (ID-001) who presented with metastatic pancreatic cancer at age 42 years; soon after,

one of his sisters (ID-006) also presented with metastatic pancreatic cancer at age 34 years. Neither

patient had a resectable tumor, and both died within one to two years of diagnosis. The two

siblings were part of a sibship of seven (in addition, their mother had two miscarriages); a brother was

affected with low-grade B-cell follicular lymphoma at age 45 and remains alive and disease-free

at age 49. Their mother had previously undergone a pancreaticoduodenectomy for pancreatic cancer at

age 58 but died soon after of disease recurrence. Several years later, a maternal uncle (ID-011)

developed metastatic pancreatic cancer at age 80 while enrolled in an MRI-based screening program and

died of his disease. Figure 17 illustrates the pedigree of the family.

Figure 17 – Pedigree of FPC kindred investigated by exome sequencing

Figure 17 Legend: Large red box indicates affected mother without available DNA for sequencing; blue circles

indicate family members on whom exome sequencing was performed. Filled box = affected male; filled circle = affected female; unfilled box = unaffected male; unfilled circle = unaffected female. (“affected” refers to pancreatic cancer)

Blood samples were taken from all seven siblings (including the two pancreatic cancer patients before

they died), as well as from the affected maternal uncle and an unaffected maternal aunt (ID-010). No

blood sample was available for the mother. DNA was extracted from blood samples according to the previously

described protocol (see Chapter II of this thesis).

3.2 Target capture, next-generation sequencing, and raw-data analysis

[Note: DNA library preparation and sequencing, alignment of reads, and variant calling were performed by

members of Dr. John McPherson's lab at the Ontario Institute for Cancer Research (Quang Trinh). Data were

provided to W. Al-Sukhni for validation, downstream variant filtration, and subsequent Sanger

sequencing in other patients. Most PCR amplification for variant validation and screening in other

patients described in this chapter was performed by W. Al-Sukhni, with assistance from H. Kim and T.

McPherson.]

DNA samples from the siblings, uncle, and aunt were enriched for exomic regions using the NimbleGen

SeqCap EZ Human Exome Library v2.0, according to the manufacturer's protocol. This in-solution capture library contains 2.1

million empirically optimized oligonucleotide probes targeting approximately 300,000 exons based on

annotation from the Consensus Coding Sequence (CCDS) project (Sep 2009)374, the RefSeq database (Jan 2010)560,

and the miRBase database (v.14, Sep 2009)561, with a total target size of approximately 35Mb. The resulting

DNA libraries were sequenced on the Illumina GAII next-generation sequencer using Illumina's standard paired-end

2x101 sequencing procedure, generating 101-bp reads for alignment against the

reference genome. For ID-001 and ID-006, the data in this analysis were generated from six sequencing lanes

each; for ID-010, two lanes were used; and for ID-011, three lanes were used.

Raw data were processed through an empirically validated workflow. First, basic quality control (QC) metrics

such as the number of reads, average base quality per cycle, and percentage of bases at each

Phred quality value were examined for each lane of raw data. Next, raw reads were aligned to the

reference human genome (GRCh37) using Novoalign562, and only uniquely aligned reads were retained

for downstream analysis. After several QC parameters were documented (e.g. % of reads aligned, % of reads

aligned in the correct orientation, % of reads aligned only as singletons), duplicate fragments with

exactly the same start and end points were presumed to be PCR artifacts and were removed (“collapsing”)

using Picard command-line tools (http://picard.sourceforge.net). Further QC parameters assessed at

this point included the percentage of reads aligned before and after collapsing, the proportion of the target

region covered at least once by sequencing, the percentage of bases covered at incrementally higher depths

of coverage, and the average depth of coverage across the captured target region.
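
The collapsing rule above (fragments sharing exactly the same start and end points are treated as PCR artifacts) can be illustrated with a short Python sketch. The `Fragment` record is hypothetical, and the sketch is deliberately simpler than Picard: Picard's MarkDuplicates tool, for instance, also takes read orientation into account and chooses which copy to keep by base quality, whereas this example simply keeps the first fragment it sees.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """An aligned read pair reduced to the fields the collapsing rule uses (hypothetical record)."""
    chrom: str
    start: int  # leftmost aligned base of the fragment
    end: int    # rightmost aligned base of the fragment
    name: str


def collapse_duplicates(fragments):
    """Keep the first fragment seen at each (chrom, start, end); return (kept, presumed_pcr_duplicates)."""
    seen = set()
    kept, duplicates = [], []
    for frag in fragments:
        key = (frag.chrom, frag.start, frag.end)
        if key in seen:
            duplicates.append(frag)
        else:
            seen.add(key)
            kept.append(frag)
    return kept, duplicates


fragments = [
    Fragment("chr1", 1000, 1350, "r1"),
    Fragment("chr1", 1000, 1350, "r2"),  # same start and end as r1: presumed PCR duplicate
    Fragment("chr1", 1000, 1360, "r3"),  # different end point: kept
    Fragment("chr2", 1000, 1350, "r4"),  # different chromosome: kept
]
kept, duplicates = collapse_duplicates(fragments)
pct_duplicates = 100 * len(duplicates) / len(fragments)
print(f"{len(kept)} kept, {len(duplicates)} collapsed ({pct_duplicates:.1f}%)")
```

The percentage printed here corresponds to the "% reads marked as PCR" column reported for each subject in the Results.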

At this point, the data were processed with GATK563 for base-quality recalibration, local

realignment, and SNV/indel calling. Variants passing a minimum quality score threshold of 30 were

considered reliable. A minimum read depth of 8x was required to call a variant, and the

maximum allowable number of single nucleotide variants (SNVs) in a 10-base window was two.

GATK also estimated whether each variant was heterozygous or homozygous.
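
The three thresholds above can be sketched as a post-calling filter. This is a minimal illustration, not GATK's implementation: it assumes the quality threshold applies to each call's QUAL value and that "a 10-base window" means any ten consecutive bases, so three SNVs whose outermost positions are fewer than ten bases apart are all rejected. The `Call` record is hypothetical.

```python
from dataclasses import dataclass

MIN_QUAL = 30           # minimum variant quality score (assumed to be QUAL)
MIN_DEPTH = 8           # minimum read depth (8x)
WINDOW = 10             # window length in bases
MAX_SNVS_IN_WINDOW = 2  # more SNVs than this inside one window: all of them are flagged


@dataclass
class Call:
    chrom: str
    pos: int       # 1-based position
    qual: float    # per-variant quality score
    depth: int     # read depth at the site
    is_snv: bool = True


def clustered_snvs(calls):
    """Return (chrom, pos) of SNVs lying in any WINDOW-base stretch holding > MAX_SNVS_IN_WINDOW SNVs."""
    by_chrom = {}
    for c in calls:
        if c.is_snv:
            by_chrom.setdefault(c.chrom, []).append(c.pos)
    flagged = set()
    k = MAX_SNVS_IN_WINDOW + 1  # smallest cluster that breaks the limit
    for chrom, positions in by_chrom.items():
        positions.sort()
        for i in range(len(positions) - k + 1):
            # k consecutive SNVs fit in one window if the first and last are < WINDOW bases apart
            if positions[i + k - 1] - positions[i] < WINDOW:
                flagged.update((chrom, p) for p in positions[i:i + k])
    return flagged


def hard_filter(calls):
    """Keep calls meeting the quality and depth minimums that are not part of an SNV cluster."""
    clustered = clustered_snvs(calls)
    return [
        c for c in calls
        if c.qual >= MIN_QUAL
        and c.depth >= MIN_DEPTH
        and not (c.is_snv and (c.chrom, c.pos) in clustered)
    ]


calls = [
    Call("chr1", 100, 50, 20), Call("chr1", 103, 50, 20), Call("chr1", 108, 50, 20),  # 3 SNVs within 9 bp
    Call("chr1", 500, 25, 30),   # fails the quality threshold
    Call("chr1", 900, 60, 5),    # fails the depth threshold
    Call("chr1", 1200, 45, 12),  # passes
    Call("chr2", 100, 40, 9), Call("chr2", 105, 40, 9),  # only two SNVs in the window: both pass
]
passed = hard_filter(calls)
print([(c.chrom, c.pos) for c in passed])
```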

3.3 Validation of variants

Validation of exome sequencing data was performed by two approaches. First, we took advantage of the

fact that the two siblings were previously genotyped on the Affymetrix 500K array for the CNV project (see

Chapter III of this thesis). We identified SNPs genotyped on both platforms for each sample and

checked the concordance rate in genotype calls between the two platforms. The microarray genotype calls

were determined using the Affymetrix Genotyping Console (GTC 2.1), which uses the BRLMM564

algorithm for assigning genotypes. This algorithm has >99% accuracy in detecting homozygous and

heterozygous variant alleles. (Note that we could not directly identify SNPs that were homozygous for the wildtype

(i.e. reference) allele in the exome data, since only variants were called and provided to us. As a result, we

can only comment on the concordance of heterozygous and homozygous variant calls in the exome data

relative to the microarrays.)
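
A minimal sketch of the cross-platform comparison, assuming both platforms' genotypes have already been expressed relative to the same reference allele and strand (Affymetrix calls would normally need to be mapped to the forward strand first) and are keyed by a shared SNP identifier; the identifiers below are placeholders. Because the exome output contains only non-reference calls, concordance can only be stratified by the exome's heterozygous and homozygous-variant calls, as noted above.

```python
from collections import Counter


def concordance_by_exome_class(exome_calls, array_calls):
    """
    exome_calls: {snp_id: "het" | "hom_alt"}  (the exome output holds non-reference calls only)
    array_calls: {snp_id: "hom_ref" | "het" | "hom_alt"}
    Returns {exome_genotype: (n_concordant, n_compared)} over SNPs typed on both platforms.
    """
    compared, concordant = Counter(), Counter()
    for snp, exome_gt in exome_calls.items():
        array_gt = array_calls.get(snp)
        if array_gt is None:
            continue  # SNP not genotyped on the array
        compared[exome_gt] += 1
        if array_gt == exome_gt:
            concordant[exome_gt] += 1
    return {gt: (concordant[gt], compared[gt]) for gt in compared}


exome = {"snp1": "het", "snp2": "het", "snp3": "hom_alt", "snp4": "hom_alt", "snp5": "het", "snp6": "het"}
array = {"snp1": "het", "snp2": "hom_ref", "snp3": "hom_ref", "snp4": "hom_alt", "snp6": "het"}
result = concordance_by_exome_class(exome, array)
print(result)
```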

Second, for variants that did not appear in the dbSNP database at the time of the initial sequencing results

(i.e. putatively “novel” variants), we performed Sanger sequencing to validate the variants.

Each variant was sequenced in both the sense and antisense directions for confirmation. We

calculated the specificity and sensitivity of heterozygous variant calling in the exome data as follows:

Specificity = TN/(TN+FP), where TN = true negative (no variant is called in either the exome data or

Sanger sequencing) and FP = false positive (variant called by exome data but not validated by Sanger

sequencing)

Sensitivity = TP/(TP+FN), where TP = true positive (variant called by exome data and validated by

Sanger sequencing) and FN = false negative (variant not called by exome data but identified by Sanger

sequencing)

For the above definitions, we excluded homozygous calls and calls where the exome data indicates that

the allele is different from the reference genome but misidentifies the allele (e.g. exome analysis calls

G>A variant, but Sanger proves the variant to be G>T).
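
The two definitions, together with the exclusions just listed, can be expressed as a short tally over paired calls. In this sketch each call is encoded as `None` (no variant) or a hypothetical `(zygosity, allele)` tuple; we read the exclusion of homozygous calls as applying when either method reports a homozygous genotype, which is our assumption.

```python
from collections import Counter


def specificity(tn: int, fp: int) -> float:
    """TN / (TN + FP)."""
    return tn / (tn + fp)


def sensitivity(tp: int, fn: int) -> float:
    """TP / (TP + FN)."""
    return tp / (tp + fn)


def tally(pairs):
    """
    pairs: (exome_call, sanger_call) for each tested site and subject; each call is None
    (no variant) or a (zygosity, alt_allele) tuple such as ("het", "A").
    Homozygous calls and wrong-allele calls are skipped, mirroring the exclusions above.
    """
    counts = Counter()
    for exome, sanger in pairs:
        if any(call is not None and call[0] == "hom" for call in (exome, sanger)):
            continue
        if exome is not None and sanger is not None and exome[1] != sanger[1]:
            continue  # variant detected, but the allele was misidentified
        if exome is not None and sanger is not None:
            counts["TP"] += 1
        elif exome is not None:
            counts["FP"] += 1
        elif sanger is not None:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Counts reported in the Results for the siblings and aunt
print(f"specificity {specificity(24, 2):.0%}, sensitivity {sensitivity(52, 1):.0%}")
```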

3.4 Filtering strategy

All SNVs and indels within the exome-capture target regions were identified, and SIFT565 was used to

annotate each SNV or indel as synonymous, non-synonymous, frameshift, or non-frameshift.

Synonymous variants (i.e. no alteration in amino acid) were identified and removed. In addition, variants

reported in dbSNP131 were removed. Only coding-region and/or splice-site variants (up to +/- 3bp

from exons) were included in the final list per subject. We then screened the excluded variants and re-included

those with very low minor allele frequencies (< 0.2%) or those that are somatic variants in cancer

(since dbSNP does contain some somatic variants).
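
The per-subject filter described above can be sketched as a single predicate. This is an illustration under stated assumptions: the `Variant` record, its field names, the effect labels, and the placeholder gene names are hypothetical stand-ins for the SIFT and dbSNP131 annotations, and the re-inclusion rule is read as "keep a dbSNP131 variant if its minor allele frequency is below 0.2% or it is a somatic cancer variant".

```python
from dataclasses import dataclass
from typing import Optional

SPLICE_WINDOW = 3  # bases either side of an exon counted as splice-site
RARE_MAF = 0.002   # 0.2 % minor allele frequency
PROTEIN_ALTERING = {"nonsynonymous", "nonsense", "frameshift", "non-frameshift"}


@dataclass
class Variant:
    gene: str
    effect: str                        # annotation class, e.g. "synonymous", "nonsynonymous", "intronic"
    bases_from_exon: int               # 0 inside an exon, otherwise distance to the nearest exon
    in_dbsnp131: bool
    dbsnp_maf: Optional[float] = None  # minor allele frequency reported in dbSNP, if any
    somatic_in_cancer: bool = False    # dbSNP entry is a somatic cancer variant


def keep(v: Variant) -> bool:
    if v.effect == "synonymous":
        return False
    coding = v.bases_from_exon == 0 and v.effect in PROTEIN_ALTERING
    splice = 0 < v.bases_from_exon <= SPLICE_WINDOW
    if not (coding or splice):
        return False  # UTR or deeper intronic variant
    if v.in_dbsnp131:
        # re-include very rare or somatic entries that dbSNP happens to contain
        return (v.dbsnp_maf is not None and v.dbsnp_maf < RARE_MAF) or v.somatic_in_cancer
    return True


variants = [
    Variant("GENE_A", "nonsynonymous", 0, False),
    Variant("GENE_B", "synonymous", 0, False),
    Variant("GENE_C", "intronic", 2, False),            # within +/- 3 bp: splice site
    Variant("GENE_D", "intronic", 10, False),           # too far from the exon
    Variant("GENE_E", "nonsynonymous", 0, True, 0.15),  # common dbSNP131 variant
    Variant("GENE_F", "nonsynonymous", 0, True, 0.001), # rare dbSNP131 variant: re-included
    Variant("GENE_G", "nonsense", 0, True, None, True), # somatic entry: re-included
]
kept = [v.gene for v in variants if keep(v)]
print(kept)
```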

To identify candidate susceptibility genes for the pancreatic cancer in this family, we adopted four

filtering approaches:

Model #1 - Assuming the two siblings and the uncle are all carriers of the responsible variant, and that the

unaffected aunt is not a carrier, we identified variants shared by the siblings and uncle and absent in the

aunt.

Model #2 - To account for incomplete penetrance of the susceptibility gene, we assumed the aunt may or

may not be a carrier and identified variants shared by the siblings and uncle, whether or not also present in

the aunt.

Model #3 - To account for the lower coverage in the uncle, we assumed the two siblings are carriers and

the aunt is not a carrier, and identified variants shared by the siblings and absent in the aunt, whether

or not called in the uncle.

Model #4 - To account for lower coverage in the uncle and incomplete penetrance of the gene, we

assumed the two siblings are carriers and identified variants shared by the siblings, whether or not

present in the aunt and/or uncle.
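
Because each model is defined purely by the presence or absence of a variant in each relative, the four models reduce to set operations on the per-subject lists of rare variants. A minimal sketch with made-up variant keys (`v1` to `v6` are placeholders, not variants from this family):

```python
def filtration_models(sib_001, sib_006, uncle_011, aunt_010):
    """Each argument is a set of rare-variant keys for one subject."""
    siblings = sib_001 & sib_006              # shared by both affected siblings
    return {
        1: (siblings & uncle_011) - aunt_010, # siblings + uncle, absent in aunt
        2: siblings & uncle_011,              # siblings + uncle, aunt either way
        3: siblings - aunt_010,               # siblings, absent in aunt, uncle either way
        4: siblings,                          # siblings, aunt and uncle either way
    }


models = filtration_models(
    sib_001={"v1", "v2", "v3", "v4", "v5"},
    sib_006={"v1", "v2", "v3", "v4", "v6"},
    uncle_011={"v1", "v2"},
    aunt_010={"v2", "v3"},
)
for number, variants in models.items():
    print(number, sorted(variants))
```

One consequence of these definitions is that model #1 is contained in models #2 and #3, and every model is contained in model #4, which matches the combined labels (e.g. "Model#3/4") used in Table 17.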

For each model, the final list of variants was manually curated by screening in dbSNP135

(http://www.ncbi.nlm.nih.gov/snp), which includes results from the first phase of the 1000 Genomes246

project (low-coverage genome-wide sequencing of 180 samples, sufficient to call most variants with ≥ 1%

minor allele frequency, and deep sequencing of exons captured for 1000 genes in 900 individuals,

sufficient to call rare and low-frequency variants in the coding region of these exons). We also screened

the variants in the Exome Sequencing Project566, a collaborative project that is sequencing the exomes of thousands of

individuals from large, well-phenotyped cohorts. To date, data for approximately 5,400 samples are

available online. For the purpose of this analysis, since cancer syndromes are typically caused by high-penetrance,

rare variants, we removed variants that appear with a frequency >0.2% in the 1000 Genomes

Project or the Exome Sequencing Project.572 For indels, we individually inspected the region of the genome near the

putative variant to verify that it is indeed novel based on the latest information in dbSNP135, since in

some repetitive regions, the exact position of the indel can be called differently by different algorithms.
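
The population-frequency rule can be written as a one-line check. This sketch assumes frequencies are given as fractions, that "a frequency >0.2% in the 1000 Genomes or Exome Sequencing Project" means exceeding the cutoff in either resource, and that a variant absent from a resource (`None`) does not count against it; a variant exactly at 0.2% is kept.

```python
from typing import Optional

MAX_POPULATION_FREQ = 0.002  # 0.2 %


def passes_population_filter(freq_1000g: Optional[float], freq_esp: Optional[float]) -> bool:
    """Keep unless the variant's frequency exceeds 0.2% in 1000 Genomes or in the ESP (None = not observed)."""
    return all(f is None or f <= MAX_POPULATION_FREQ for f in (freq_1000g, freq_esp))


print(passes_population_filter(None, None))   # absent from both resources
print(passes_population_filter(0.001, None))  # rare in 1000 Genomes
print(passes_population_filter(None, 0.015))  # 1.5% in the ESP
```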

For the remaining variants under each model, we identified the predicted effect of variants using SIFT565

and Polyphen-2.464 We also determined if the genes containing the variants have been reported to be

differentially expressed in pancreatic adenocarcinoma or pre-invasive lesions (in the Pancreatic Expression

Database)253, as well as whether they have reported somatic mutations in cancer (as catalogued in

COSMIC database549). We also compared our list of genes generated from this analysis with the list of

genes affected by coding-region CNVs, reported in Chapter III of this thesis.

3.5 Screening candidate genes

We performed PCR amplification and Sanger sequencing to validate top candidates, and also used

Sanger sequencing to screen candidate genes in a cohort of 70 familial and young-onset pancreatic cancer

cases. (Primer sequences were previously published by Jones et al.37).

4. Results

Table 13 summarizes the number of raw reads generated per sample and the percentage of reads that were

aligned after collapsing PCR artifacts.

Table 13 – Summary of raw sequence data from Illumina GAII for each subject

Subject | N raw reads | N reads marked as PCR duplicates | % reads marked as PCR duplicates | N reads aligned after collapsing | % reads aligned after collapsing | N reads aligned, + strand | % reads aligned, + strand | N reads aligned, - strand | % reads aligned, - strand
ID-001 (sibling) | 255729700 | 93291690 | 36.48 | 100526497 | 39.31 | 50238199 | 49.98 | 50288298 | 50.02
ID-006 (sibling) | 286351706 | 74640172 | 26.07 | 145229860 | 50.72 | 72609270 | 50 | 72620590 | 50
ID-010 (aunt) | 125185468 | 16889185 | 13.49 | 98153318 | 78.41 | 49068884 | 49.99 | 49084434 | 50.01
ID-011 (uncle) | 122363100 | 71869013 | 58.73 | 22965430 | 18.77 | 11455807 | 49.88 | 11509623 | 50.12

Although the two siblings generated approximately twice as many raw reads as the aunt and uncle, only

40-50% of the siblings’ reads were ultimately aligned after excluding PCR artifacts, while nearly 80% of

the reads for the aunt were aligned. This resulted in roughly comparable numbers of reads for

those three samples contributing to the final alignment of each genome. Fewer than 20% of the raw reads

generated for the uncle were aligned after excluding PCR artifacts, resulting in substantially lower

coverage of the uncle’s genome: while each of the four samples had the majority of the target region

bases (~35Mb) covered by at least one read (1x), the exome-wide average read depth for the uncle was

only about one-tenth of the average coverage of the other three samples (~20x vs. 186x) (Figures 18 and 19).
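
As an arithmetic check on Table 13, the percentage columns can be recomputed from the read counts; the sketch below simply divides each count by the number of raw reads. The dictionary layout is an illustrative choice, not part of the original pipeline.

```python
# Counts copied from Table 13: (raw reads, reads marked as PCR duplicates, reads aligned after collapsing)
TABLE_13 = {
    "ID-001": (255729700, 93291690, 100526497),
    "ID-006": (286351706, 74640172, 145229860),
    "ID-010": (125185468, 16889185, 98153318),
    "ID-011": (122363100, 71869013, 22965430),
}


def percentages(raw, pcr, aligned):
    """Return (% of raw reads marked as PCR duplicates, % of raw reads aligned after collapsing)."""
    return round(100 * pcr / raw, 2), round(100 * aligned / raw, 2)


for subject, counts in TABLE_13.items():
    pct_pcr, pct_aligned = percentages(*counts)
    print(f"{subject}: {pct_pcr}% PCR duplicates, {pct_aligned}% aligned after collapsing")
```

The uncle's low aligned fraction is driven mainly by the high duplicate rate in his library rather than by a shortage of raw reads.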

Figure 18 – Average coverage of bases in target region of exome per subject

Figure 18 Legend: ID-011 (uncle) had lower average read depth for target exome than the other 3 subjects.

Figure 19 – Read-depth per base in target region of exome in each subject

Of note, an accepted minimum threshold for accurate identification of a heterozygous variant (in previous

papers and by the lab performing sequencing) is 8x coverage: at this threshold, the algorithm can reliably

call a heterozygous variant in approximately 94-95% of the target region in the siblings and aunt but in

only 82% of the target region in the uncle.
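
The breadth-of-coverage figures quoted here (the fraction of target bases reaching 8x) and the mean depth in Figure 18 summarize the same per-base depth vector in two different ways. A minimal sketch, assuming per-base depths over the target region are already available as a list of integers (for example, exported from a per-base depth tool such as `samtools depth`); the ten-base example is invented for illustration.

```python
def breadth_of_coverage(depths, thresholds=(1, 8, 20)):
    """Fraction of target bases whose read depth is at least each threshold."""
    n = len(depths)
    return {t: sum(d >= t for d in depths) / n for t in thresholds}


def mean_depth(depths):
    """Average read depth across all target bases."""
    return sum(depths) / len(depths)


# Hypothetical per-base depths over a ten-base target
depths = [0, 3, 8, 8, 12, 25, 40, 7, 9, 30]
print(breadth_of_coverage(depths))  # {1: 0.9, 8: 0.7, 20: 0.3}
print(mean_depth(depths))           # 14.2
```

Because a few very deep bases can inflate the mean, breadth at a fixed threshold such as 8x is the more direct measure of how much of the exome can be genotyped reliably.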

4.1 Validation

For siblings ID-001 and ID-006, a total of 1,985 and 1,995 SNPs, respectively, were identified as having

a heterozygous or homozygous non-reference allele in the exome data and were also genotyped on the

Affymetrix 500K array. Of those, 473 variants in ID-001 and 439 variants in ID-006 were discordant

between the exome data and the microarray genotypes; 318 of those were discordant in both siblings, the

majority of which were identified as wildtype on the microarray and homozygous variant in the exome

data. For ID-001, 1,086/1,103 (98.5%) SNPs identified as heterozygous in the exome data were

concordant with the microarray results, while only 426/882 (48.3%) SNPs identified as homozygous

variant in the exome data were concordant with the microarray results (p<0.0001). The results for ID-006

were nearly identical: 1,122/1,141 (98.3%) of heterozygous SNPs and 434/854 (50.8%) of homozygous

variant alleles called in the exome data were concordant with microarray genotypes (p<0.0001).

We also performed Sanger sequencing on 38 SNVs that were unreported in dbSNP131, including eight

putatively novel homozygous variants (Table 14).

Table 14 – Sanger validation data for selected SNVs in each exome subject

Each subject cell gives the NGS call / Sanger call: concordance.

Gene | Variant (hg19) | Sib ID-001 (affected) | Sib ID-006 (affected) | Uncle ID-011 (affected) | Aunt ID-010 (unaffected)
ABCC12 | chr16:48177891 G/A | het / het: conc | het / het: conc | wt / het: disc | wt / wt: conc
ADAMTS20 | chr12:43886974 T/G | het / het: conc | het / het: conc | wt / het: disc | wt / wt: conc
APLF | chr2:68717350 A/T | het / het: conc | het / het: conc | het / het: conc | wt / wt: conc
ASTN2 | chr9:119802196 A/G | het / het: conc | het / het: conc | het / het: conc | wt / wt: conc
AZI1 | chr17:79169727 T/C | het / het: conc | het / not done: n/a | wt / not done: n/a | wt / wt: conc
C14orf102 | chr14:90752754 G/A | het / het: conc | het / het: conc | wt / het: disc | wt / noisy: n/a
C1orf65 | chr1:223568054 G/A | het / het: conc | het / not done: n/a | het / not done: n/a | wt / wt: conc
CCDC141 | chr2:179702237 G/A | het / het: conc | het / het: conc | het / het: conc | wt / wt: conc
CEP110 | chr9:123886284 A/T | het / het: conc | het / het: conc | het / het: conc | wt / wt: conc
CREBBP | chr16:3820773 G/A | het / het: conc | het / not done: n/a | wt / not done: n/a | wt / wt: conc
MUC7 | chr4:71346606 C/T | het / het: conc | het / het: conc | het / het: conc | wt / wt: conc
PCYOX1 | chr2:70503881 T/A | het / het: conc | het / het: conc | het / het: conc | wt / wt: conc
RASSF6 | chr4:74442178 A/C | het / het: conc | het / het: conc | het / het: conc | wt / wt: conc
SEZ6L2 | chr16:29896915 C/T | het / het: conc | het / het: conc | wt / het: disc | wt / wt: conc
SFRS2IP | chr12:46322436 C/T | het / het: conc | het / het: conc | wt / het: disc | wt / wt: conc
TAF5L | chr1:229745873 C/T | het / het: conc | het / het: conc | het / het: conc | wt / wt: conc
CYP2C9 | chr10:96741007 C/A | het / het: conc | het / het: conc | wt / het: disc | wt / wt: conc
AGL | chr1:100318245 T/G | het / wt: disc | het / not done: n/a | wt / not done: n/a | wt / wt: conc
ARAP1 | chr11:72406442 T/C | het / het: conc | het / noisy: n/a | wt / wt: conc | wt / not done: n/a
RPA1 | chr17:1800470 G/C | het / het: conc | het / het: conc | wt / wt: conc | wt / not done: n/a
AKAP7 | chr6:131571655 A/G | het / het: conc | het / het: conc | wt / wt: conc | wt / not done: n/a
NEIL3 | chr4:178283483 G/A | het / het: conc | het / het: conc | wt / wt: conc | wt / not done: n/a
C9 | chr5:39331804 A/G | het / het: conc | het / het: conc | wt / wt: conc | wt / not done: n/a
RAPGEF3 | chr12:48131310 G/A | het / het: conc | het / het: conc | wt / wt: conc | wt / wt: conc
SERPINB3 | chr18:61323259 A/T | het / het: conc | het / het: conc | wt / wt: conc | wt / wt: conc
C2orf24 | chr2:220037608 A/C | het / het: conc | het / het: conc | wt / wt: conc | wt / wt: conc
KDM4C | chr9:7103702 A/C | het / het: conc | het / wt: disc | wt / wt: conc | wt / not done: n/a
EXPH5 | chr11:108389007 G/A | het / het: conc | het / het: conc | wt / not done: n/a | wt / wt: conc
MSH6 | chr2:48027541 G/A | het / het: conc | het / het: conc | het / het: conc | het / het: conc
PCSK9 | chr1:55524237 G/T | homo / homo (diff variant A): disc | homo / homo (diff variant A): disc | wt / het (diff variant G/A): disc | het / het (diff variant G/A): disc
ANKRD11 | chr16:89350038 G/T | homo / homo (diff variant A): disc | homo / homo (diff variant A): disc | wt / homo (diff variant A): disc | het / het (diff variant G/A): disc
KIAA0020 (1) | chr9:2811505 G/T | homo / homo (diff variant A): disc | homo / homo (diff variant A): disc | wt / homo (diff variant A): disc | het / het (diff variant G/A): disc
KIAA0020 (2) | chr9:2828765 C/T | homo / homo (diff variant G): disc | homo / homo (diff variant G): disc | homo / het (diff variant C/G): disc | het / het (diff variant C/G): disc
USP6 | chr17:5037281 T/C | homo / homo: conc | homo / homo: conc | wt / wt: conc | wt / wt: conc
CHRNE | chr17:4802829 G/T | homo / homo (diff variant A): disc | homo / homo (diff variant A): disc | wt / wt: conc | wt / wt: conc
TXNDC17 | chr17:6544421 G/A | homo / homo: conc | homo / homo: conc | wt / wt: conc | wt / het: disc
MYH2 | chr17:10432311 C/T | homo / homo: conc | homo / homo: conc | wt / wt: conc | wt / wt: conc

NGS = next-generation sequencing; het = heterozygous variant; homo = homozygous variant; wt = wildtype, i.e. homozygous reference allele; conc = concordant results between next-generation and Sanger sequencing; disc = discordant results between next-generation and Sanger sequencing; diff variant = Sanger sequencing identified a different non-reference allele from the one called by NGS; not done = Sanger sequencing not performed; noisy = uninterpretable Sanger trace; n/a = concordance not assessable

For heterozygous exome variants, 53/57 (93%) of calls in the siblings and aunt and 9/9 (100%) of calls in

the uncle were concordant with Sanger sequencing (p=1.000); for homozygous exome variants, 6/16

(37.5%) of calls in the siblings and aunt and 0/1 (0%) of calls in the uncle were concordant with Sanger

sequencing (p=1.000); for wildtype alleles in the exome data, 24/25 (96%) of calls in the siblings and aunt

and 13/22 (59%) of calls in the uncle were concordant with Sanger sequencing (p=0.003). Of the eight

homozygous variants called in the two siblings, only three were validated as called in the exome data; for the

remaining five, Sanger sequencing revealed a different homozygous allele. Notably, the

three accurately called homozygous variants were all novel, whereas the five inaccurately called

variants were at positions of reported SNPs (i.e. the allele identified by Sanger sequencing is the same as that reported in

dbSNP). Based on the Sanger sequencing results, the specificity for heterozygous variant calling in our

exome data was 24/(24+2)=92% in the siblings and aunt and 13/(13+0)=100% in the uncle

(p=0.544); the sensitivity was 52/(52+1)=98% in the siblings and aunt and 9/(9+6)=60% in the uncle

(p<0.001).
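
The text does not name the test behind the reported p-values. Assuming it was a two-sided Fisher's exact test on the 2x2 table of concordant versus discordant calls (our assumption), the pure-Python sketch below gives the values reported in this section, for example p = 1.000 for 53/57 versus 9/9 and p ≈ 0.003 for 24/25 versus 13/22. It sums the hypergeometric weights of every table with the observed margins that is no more likely than the observed one; integer weights keep that comparison exact. Where SciPy is available, `scipy.stats.fisher_exact` provides the same test.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric weight of the table whose top-left cell equals k (all margins fixed)
    weights = {k: comb(row1, k) * comb(row2, col1 - k) for k in range(lo, hi + 1)}
    observed = weights[a]
    as_extreme = sum(w for w in weights.values() if w <= observed)
    return as_extreme / comb(row1 + row2, col1)


# Rows: siblings + aunt, uncle; columns: concordant, discordant with Sanger
print(round(fisher_exact_two_sided(53, 4, 9, 0), 3))   # heterozygous calls: 1.0
print(round(fisher_exact_two_sided(24, 1, 13, 9), 3))  # wildtype calls: 0.003
```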

In addition, we performed Sanger sequencing on 15 indels called in the exome data (Table 15).

Table 15 – Sanger validation data for selected indels in each exome subject

Each subject cell gives the NGS call / Sanger call: concordance.

Gene | Variant | Position | Sib ID-001 (affected) | Sib ID-006 (affected) | Uncle ID-011 (affected) | Aunt ID-010 (unaffected)
TUB | ins GAGGATGAG | chr11:8118257 | y / y: conc | n / y: disc | n / n: conc | n / n: conc
C22orf40 | del T | chr22:46643104 | y / y: conc | y / y: conc | n / n: conc | n / n: conc
WDR92 | ins C | chr2:68384601 | y / n: disc | n / not tested: n/a | n / not tested: n/a | n / not tested: n/a
KCNMB3 | del T | chr3:178960766 | y / y: conc | n / n: conc | n / n: conc | n / n: conc
C4orf35 | del A | chr4:71201064 | y / y: conc | n / n: conc | n / n: conc | n / n: conc
FAM53C | del CCTCAGGCCTGAGCCTGCA | chr5:137680588 | y / y: conc | n / n: conc | n / n: conc | n / n: conc
STAG3 | ins G | chr7:99797230 | y / n: disc | n / not tested: n/a | n / not tested: n/a | n / not tested: n/a
ARHGAP36 | ins C | chrX:130217764 | y / n: disc | n / not tested: n/a | n / not tested: n/a | n / not tested: n/a
NBPF3 | del GTCTCCCAG | chr1:21801435 | n / n: conc | y / y: conc | n / n: conc | n / n: conc
ZNF683 | del CCACCGAGCGCTGGGGTGCCCCAG | chr1:26691286 | n / n: conc | y / y: conc | n / y: disc | n / n: conc
CLSPN | del TTC | chr1:36203659 | n / n: conc | y / y: conc | n / n: conc | n / n: conc
FLVCR2 | del CCCAGCGTCTCGGTCCAT | chr14:76045387 | n / n: conc | y / y: conc | n / n: conc | n / n: conc
NUCB1 | del AGCAGC | chr19:49425108 | n / n: conc | y / y: conc | n / y: disc | n / n: conc
MNDA | del AGAA | chr1:158817614 | n / n: conc | y / y: conc | n / n: conc | n / n: conc
PCDHGA2 | del C | chr5:140719334 | n / n: conc | y / y: conc | y / y: conc | y / y: conc

NGS = next-generation sequencing; y = indel identified; n = indel not identified; conc = concordant results between NGS and Sanger; disc = discordant results between NGS and Sanger; not tested = Sanger sequencing not performed; n/a = concordance not assessable

Thirty-nine sequencing reactions were conducted in the siblings and aunt: 14/17 (82%) of indels called in

the exome data were validated, and only a single indel in one individual was missed by exome sequencing.

There were too few tests in the uncle to identify a significant difference (only one indel was called in the

uncle, and it validated; of the 11 indels that were not called in the uncle, 9 were also not observed

on Sanger sequencing). The specificity of indel calling was 21/(21+3) = 88% in the siblings and aunt and

9/(9+0)=100% in the uncle (p=0.545); the sensitivity was 14/(14+1)=93% in the siblings and aunt and

1/(1+2)=33% in the uncle (p=0.056).

4.2 Filtration results

Table 16 summarizes the number of variants identified in each subject.

Table 16 – Number of variants identified in each exome subject

Variant category | Sibling 001 (affected) | Sibling 006 (affected) | Uncle 011 (affected) | Aunt 010 (unaffected)
All SNVs in-target (autosomes + X chr) | 20,665 | 20,822 | 10,815 | 21,930
In-target SNVs excluding synonymous SNVs | 13,551 | 13,413 | 7,267 | 14,328
In-target nsSNVs that are nonsense, missense, or splice-site, excluding any at a dbSNP131 position (het/homozygous; % homozygous) [% of all nsSNVs] | 298 (282/16; 5.4%) [2.2%] | 306 (289/17; 5.6%) [2.3%] | 146 (144/2; 1.4%) [2.0%] | 325 (319/6; 1.9%) [2.3%]
All indels (intronic + exonic) | 713 | 726 | 456 | 741
Exonic and splice-site indels not in dbSNP131 | 68 | 69 | 46 | 59

Rare variants shared within the family under each filtration model [truncating mutations (splice-site/nonsense/fs indels)]:
Model #4 - Rare variants in common to siblings (+/- uncle, +/- aunt): 98 SNVs + 5 indels* [9 truncating (3/1/5)]
Model #3 - Rare variants in common to siblings +/- uncle (- aunt): 68 SNVs + 1 indel* [4 truncating (3/0/1)]
Model #2 - Rare variants in common to siblings + uncle (+/- aunt): 14 SNVs + 2 indels* [2 truncating (0/0/2)]
Model #1 - Rare variants in common to siblings + uncle (- aunt): 9 SNVs + 0 indels* [0 truncating]

*Number of combined variants in each model given after excluding olfactory receptor genes and pseudogenes; fs = frameshift; nsSNV = non-synonymous single nucleotide variant

For each of the siblings and the aunt, approximately 20,700-21,900 SNVs in the autosomes and X

chromosome were identified within the target region of the exome, at ≥ 8x depth of coverage and passing

the quality thresholds of the alignment and variant-calling algorithms. For the uncle, only about half as many

variants were called under the same threshold parameters. For

each of the four samples, approximately one-third of called variants were synonymous and were filtered

out. Further removal of variants reported in dbSNP131 and of variants in untranslated regions or in introns

more than 3 bp from an exon (i.e. not splice-site variants) reduced the number of variants per sample to

approximately 300 in the siblings and aunt and approximately 150 in the uncle, or

approximately 2% of all nonsynonymous variants (nsSNVs) in each subject. We noted that each sibling

had a higher proportion of unreported homozygous variants than the uncle and aunt (Sib 001 and

Sib 006 = 5.5% vs. Aunt = 1.9%, p=0.03 and p=0.02 respectively; Sib 001 and Sib 006 = 5.5% vs. Uncle

= 1.4%, p=0.07 and p=0.04 respectively). While some of these may be false calls, this

higher degree of homozygosity in the siblings is expected since their parents are first cousins. Figures 20

to 22 illustrate the distribution of SNVs across the 22 autosomes and X chromosome in each subject; for the

total SNV group, the pattern of distribution is nearly identical in the siblings and aunt and fairly similar in the uncle,

and there was no appreciable difference in the pattern after excluding synonymous SNVs.

However, SNVs not reported in dbSNP131 showed a different chromosomal distribution; while this

new pattern remained consistent across the siblings and aunt, the uncle displayed a visibly different

pattern of variant distribution.

Figure 20 – Genome-wide distribution of all SNVs identified in each exome subject

Figure 21 – Genome-wide distribution of SNVs excluding synonymous variants in each exome subject

Figure 22 – Genome-wide distribution of SNVs not reported in dbSNP131 in each exome subject

Approximately 710-740 indels were identified in each of the siblings and the aunt, and about 450 in the uncle, but

most of those were intronic. Combining unreported protein-altering SNVs and indels, each of the siblings

and the aunt had approximately 370 potentially significant variants, and the uncle had approximately 200.

4.3 Candidate genes

Table 17 lists the genes identified by each filtering model described in the Methods section.

Table 17 – Genes containing variants identified by filtration model #1, 2, 3, and/or 4

Filtering Model Variant VariantType GeneName SIFT Polyphen-2

Model#1/2/3/4 chr4#71346606#C#T# nonsynonymous_SNV MUC7 DAMAGING unknown

Model#1/2/3/4 chr1#223568054#G#A# nonsynonymous_SNV C1orf65 TOLERATED benign

Model#1/2/3/4 chr2#68717350#A#T# nonsynonymous_SNV APLF DAMAGING probably damaging

Model#1/2/3/4 chr4#74442178#A#C# nonsynonymous_SNV RASSF6 DAMAGING probably damaging

Model#1/2/3/4 chr9#119802196#A#G# nonsynonymous_SNV ASTN2 DAMAGING possibly damaging

Model#1/2/3/4 chr2#179702237#G#A# nonsynonymous_SNV CCDC141 TOLERATED benign

Model#1/2/3/4 chr9#123886284#A#T# nonsynonymous_SNV CEP110 DAMAGING probably damaging

Model#1/2/3/4 chr2#70503881#T#A# nonsynonymous_SNV PCYOX1 TOLERATED benign

Model#1/2/3/4 chr1#229745873#C#T# nonsynonymous_SNV TAF5L TOLERATED possibly damaging

Model#2/4 chr2#48027541#G#A# nonsynonymous_SNV MSH6 TOLERATED benign

Model#2/4 chr9#711331#G#A# nonsynonymous_SNV KANK1 DAMAGING probably damaging

Model#2/4 chr13#25839959#C#T# nonsynonymous_SNV MTMR6 TOLERATED benign

Model#2/4 chr21#37586387#A#C# nonsynonymous_SNV DOPEY2 TOLERATED possibly damaging

Model#2/4 chr10#17726708#G#A# nonsynonymous_SNV STAM DAMAGING probably damaging

Model#2/4 chr10#7774320#(+G) frameshift_indel ITIH2 FRAMESHIFT n/a

Model#2/4 chr3#182737989#(-T) frameshift_indel MCCC1 FRAMESHIFT n/a

Model#3/4 chr16#48177891#G#A# nonsynonymous_SNV ABCC12 DAMAGING probably damaging

Model#3/4 chr12#43886974#T#G# nonsynonymous_SNV ADAMTS20 DAMAGING benign

Model#3/4 chr4#178283483#G#A# nonsynonymous_SNV NEIL3 DAMAGING probably damaging

Model#3/4 chr16#29896915#C#T# nonsynonymous_SNV SEZ6L2 DAMAGING probably damaging

Model#3/4 chr16#3820773#G#A# nonsynonymous_SNV CREBBP TOLERATED unknown

Model#3/4 chr17#79169727#T#C# nonsynonymous_SNV AZI1 TOLERATED benign

Model#3/4 chr11#72406442#T#C# nonsynonymous_SNV ARAP1 DAMAGING possibly damaging

Model#3/4 chr5#39331804#A#G# nonsynonymous_SNV C9 DAMAGING probably damaging

Model#3/4 chr10#96741007#C#A# nonsynonymous_SNV CYP2C9 TOLERATED possibly damaging

Model#3/4 chr11#108389007#G#A# nonsynonymous_SNV EXPH5 DAMAGING probably damaging

Model#3/4 chr17#10432311#C#T# nonsynonymous_SNV MYH2 DAMAGING probably damaging

Model#3/4 chr8#24339740#T#C# nonsynonymous_SNV ADAM7 TOLERATED possibly damaging

Model#3/4 chr22#24939978#C#A# nonsynonymous_SNV C22orf13 TOLERATED benign

Model#3/4 chr16#85813453#G#A# nonsynonymous_SNV COX4NB TOLERATED benign

Model#3/4 chr2#233546356#G#A# nonsynonymous_SNV EFHD1 DAMAGING probably damaging

Model#3/4 chr9#36148648#G#A# splice-site GLIPR2 Not_scored not given

Model#3/4 chr12#44161949#G#A# nonsynonymous_SNV IRAK4 DAMAGING probably damaging

Model#3/4 chr6#39602710#G#A# nonsynonymous_SNV KIF6 DAMAGING probably damaging

Model#3/4 chr18#39542537#A#G# nonsynonymous_SNV PIK3C3 TOLERATED possibly damaging

Model#3/4 chr11#65618312#A#C# nonsynonymous_SNV SNX32 TOLERATED benign

Model#3/4 chr9#7103702#A#C# nonsynonymous_SNV KDM4C DAMAGING probably damaging

Model#3/4 chr14#51347190#G#A# nonsynonymous_SNV ABHD12B TOLERATED possibly damaging

Model#3/4 chr9#399245#C#A# nonsynonymous_SNV DOCK8 TOLERATED benign

Model#3/4 chr8#11142438#A#G# nonsynonymous_SNV MTMR9 TOLERATED benign

Model#3/4 chr11#61015898#A#C# nonsynonymous_SNV PGA5 TOLERATED benign

Model#3/4 chr12#112194215#G#A# nonsynonymous_SNV ACAD10 DAMAGING probably damaging

Model#3/4 chr12#124104074#G#A# nonsynonymous_SNV DDX55 TOLERATED benign

Model#3/4 chr5#79809468#G#A# nonsynonymous_SNV FAM151B TOLERATED benign

Model#3/4 chr1#230895257#G#C# splice-site CAPN9 Not_scored not given

Model#3/4 chr2#84670490#C#T# nonsynonymous_SNV SUCLG1 DAMAGING probably damaging

Model#3/4 chr7#6474403#G#A# nonsynonymous_SNV DAGLB TOLERATED benign

Model#3/4 chr14#68060533#G#A# nonsynonymous_SNV PIGH DAMAGING probably damaging

Model#3/4 chr9#99522502#C#T# nonsynonymous_SNV ZNF510 TOLERATED benign

Model#3/4 chr2#211521333#A#G# nonsynonymous_SNV CPS1 DAMAGING benign

Model#3/4 chr8#22211863#G#A# nonsynonymous_SNV PIWIL2 TOLERATED benign

Model#3/4 chr17#14139702#G#A# nonsynonymous_SNV CDRT15 TOLERATED benign

Model#3/4 chr12#21014025#A#G# nonsynonymous_SNV SLCO1B3 TOLERATED possibly damaging

Model#3/4 chr10#123845149#C#T# nonsynonymous_SNV TACC2 DAMAGING probably damaging

Model#3/4 chr12#27571118#A#G# nonsynonymous_SNV ARNTL2 TOLERATED benign

Model#3/4 chr12#27059313#T#C# nonsynonymous_SNV ASUN TOLERATED probably damaging

Model#3/4 chr7#2472653#G#A# nonsynonymous_SNV CHST12 TOLERATED benign

Model#3/4 chr17#76562706#G#A# nonsynonymous_SNV DNAH17 TOLERATED probably damaging

Model#3/4 chr6#6146007#T#C# splice-site F13A1 Not_scored not given

Model#3/4 chr9#72006662#G#A# nonsynonymous_SNV FAM189A2 DAMAGING probably damaging

Model#3/4 chr13#42404723#C#T# nonsynonymous_SNV KIAA0564 TOLERATED probably damaging

Model#3/4 chr9#86482712#C#G# nonsynonymous_SNV KIF27 TOLERATED benign

Model#3/4 chr12#96412989#G#C# nonsynonymous_SNV LTA4H TOLERATED possibly damaging

Model#3/4 chr9#100423254#C#G# nonsynonymous_SNV NCBP1 TOLERATED benign

Model#3/4 chr6#126236528#G#C# nonsynonymous_SNV NCOA7 DAMAGING possibly damaging

Model#3/4 chr11#77781071#A#G# nonsynonymous_SNV NDUFC2-KCTD14 DAMAGING probably damaging

Model#3/4 chr14#73717738#G#T# nonsynonymous_SNV PAPLN DAMAGING probably damaging

Model#3/4 chr11#70184526#A#T# nonsynonymous_SNV PPFIA1 TOLERATED benign

Model#3/4 chr12#114352786#C#T# nonsynonymous_SNV RBM19 TOLERATED benign

Model#3/4 chr14#81743769#T#A# nonsynonymous_SNV STON2 DAMAGING possibly damaging

Model#3/4 chr12#10959195#C#T# nonsynonymous_SNV TAS2R8 DAMAGING possibly damaging

Model#3/4 chr6#54173656#G#A# nonsynonymous_SNV TINAG TOLERATED benign

Model#3/4 chr12#29904627#A#G# nonsynonymous_SNV TMTC1 TOLERATED benign

Model#3/4 chr14#74824559#G#A# nonsynonymous_SNV VRTN DAMAGING possibly damaging

Model#3/4 chr16#3142568#C#G# nonsynonymous_SNV ZSCAN10 DAMAGING possibly damaging

Model#3/4 chr22#46643104#(-T) frameshift_indel C22orf40 FRAMESHIFT n/a

Model#4 chr2#21256234#T#C# nonsynonymous_SNV APOB TOLERATED benign

Model#4 chr5#141033932#A#G# nonsynonymous_SNV ARAP3 DAMAGING possibly damaging

Model#4 chr2#127953008#G#A# nonsynonymous_SNV CYP27C1 DAMAGING probably damaging

Model#4 chr6#10704808#A#G# nonsynonymous_SNV PAK1IP1 TOLERATED benign

Model#4 chrX#153609141#C#T# nonsynonymous_SNV EMD DAMAGING benign

Model#4 chr16#81045669#G#A# nonsynonymous_SNV CENPN TOLERATED probably damaging

Model#4 chr14#37737909#C#G# nonsynonymous_SNV MIPOL1 TOLERATED probably damaging

Model#4 chr5#149753777#C#T# nonsynonymous_SNV TCOF1 TOLERATED probably damaging

Model#4 chr6#151626965#T#A# nonsynonymous_SNV AKAP12 TOLERATED benign

Model#4 chr6#83754169#G#T# nonsynonymous_SNV UBE2CBP DAMAGING probably damaging

Model#4 chr12#9307415#G#A# nonsynonymous_SNV PZP DAMAGING probably damaging

Model#4 chr11#108013182#A#G# nonsynonymous_SNV ACAT1 DAMAGING probably damaging

Model#4 chr8#62578060#C#T# nonsynonymous_SNV ASPH TOLERATED probably damaging

Model#4 chr5#75950796#C#T# nonsynonymous_SNV IQGAP2 DAMAGING probably damaging

Model#4 chr9#34256831#G#A# nonsynonymous_SNV KIF24 TOLERATED benign

Model#4 chr19#10664763#C#T# nonsynonymous_SNV KRI1 DAMAGING probably damaging

Model#4 chr12#52841335#G#T# nonsynonymous_SNV KRT6B DAMAGING benign

Model#4 chr6#90438697#T#C# nonsynonymous_SNV MDN1 DAMAGING possibly damaging

Model#4 chr11#102826185#G#A# nonsynonymous_SNV MMP13 TOLERATED benign

Model#4 chr7#47870890#C#T# stopgain_SNV PKD1L1 N/A not given

Model#4 chr1#204226965#A#G# nonsynonymous_SNV PLEKHA6 DAMAGING possibly damaging

Model#4 chr16#74678573#C#T# nonsynonymous_SNV RFWD3 DAMAGING probably damaging

Model#4 chr17#46000447#T#A# nonsynonymous_SNV SP2 TOLERATED benign

Model#4 chr11#62346444#T#C# nonsynonymous_SNV TUT1 DAMAGING not given

Model#4 chr5#145895520#C#T# nonsynonymous_SNV GPR151 DAMAGING probably damaging

Model#4 chr11#58891961#(-T) frameshift_indel FAM111B FRAMESHIFT n/a

Model#4 chr16#69748923#(-CACT) frameshift_indel NQO1 FRAMESHIFT n/a

Four of the missense variants and two of the indels in the final list of candidates are in olfactory receptor (OR) genes. We automatically downgraded these on our list for three reasons: they are functionally unlikely to be cancer susceptibility genes, they are commonly affected by variants, and they have many homologous pseudogenes that may inadvertently be captured and sequenced. A fifth missense variant lies in a pseudogene, RPL21P44, and was also excluded.

Model#1 comprises variants shared by all three affected relatives and absent in the unaffected aunt. It generated the shortest list, with only 9 SNVs and no indels. Model#2 comprises variants shared by the siblings and the uncle, without using the aunt in the filtration; its final list has 16 genes (14 SNVs + 2 indels). Model#3 contains variants shared by the siblings and absent in the aunt, regardless of whether they were called in the uncle; its final list has 69 genes (68 SNVs + 1 indel). Model#4 yielded the longest list: 98 SNVs and 5 indels shared by the two siblings irrespective of their status in the uncle and aunt, including 9 protein-truncating variants. No gene contained more than one novel/rare variant in any model.
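The four models differ only in which relatives' calls are intersected with, or subtracted from, the siblings' shared variants. The sketch below is minimal and rests on one assumption: that each subject's filtered novel/rare calls are available as a set of variant keys. Under that assumption, the models reduce to set operations. The function name and the toy keys are hypothetical and are not taken from the actual analysis pipeline.

```python
from typing import Dict, Set


def filtration_models(sib1: Set[str], sib2: Set[str],
                      uncle: Set[str], aunt: Set[str]) -> Dict[str, Set[str]]:
    """Return the candidate variant set for each filtration model.

    sib1, sib2 and uncle hold the rare/novel calls of the three affected
    relatives; aunt holds the calls of the unaffected aunt.
    """
    siblings = sib1 & sib2              # shared by both affected siblings
    affected = siblings & uncle         # also called in the affected uncle
    return {
        "Model#1": affected - aunt,     # all three affecteds, absent in aunt
        "Model#2": affected,            # aunt's status ignored
        "Model#3": siblings - aunt,     # uncle's status ignored
        "Model#4": siblings,            # uncle's and aunt's status ignored
    }


# Toy example with hypothetical variant keys.
shared = {"chr1#100#A#G#", "chr2#200#C#T#", "chr3#300#G#A#", "chr4#400#T#C#"}
models = filtration_models(
    sib1=set(shared),
    sib2=set(shared),
    uncle={"chr1#100#A#G#", "chr2#200#C#T#"},
    aunt={"chr2#200#C#T#", "chr4#400#T#C#"},
)
for name, variants in models.items():
    print(name, sorted(variants))
```

By construction, Model#1 is contained in both Model#2 and Model#3, and every model is contained in Model#4. This is why dropping a relative from the filter can only lengthen the list.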

We also reviewed the variants in untranslated regions that had been filtered out. They added no genes to the Model#1 and Model#2 lists, 6 additional variants to Model#3, and 11 additional variants to Model#4. These variants are listed separately in Table 18.

Table 18 – Additional candidate variants in untranslated regions shared by exome subjects

Variant              Model      Gene      Position
chr12#48131310#G#A#  Model#3/4  RAPGEF3   3' UTR
chr7#138732309#T#C#  Model#3/4  ZC3HAV1   3' UTR
chr14#75201644#A#T#  Model#3/4  FCF1      3' UTR
chr14#94563251#G#C#  Model#3/4  IFI27L1   5' UTR
chr6#131571655#A#G#  Model#3/4  AKAP7     5' UTR
chr19#5561228#G#A#   Model#3/4  PLAC2     predicted non-coding RNA
chr16#69997537#G#C#  Model#4    CLEC18A   3' UTR
chr9#40772067#C#T#   Model#4    ZNF658    3' UTR
chr7#74173181#C#T#   Model#4    GTF2I     3' UTR
chr7#15240888#G#A#   Model#4    TMEM195   3' UTR
chrX#148627384#A#G#  Model#4    CXorf40A  3' UTR
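The variant identifiers in Table 18 and in the preceding candidate lists follow a '#'-delimited pattern. The parsing sketch below is minimal and assumes two layouts:

- chromosome#position#reference#alternate# for SNVs
- chromosome#position#(-deleted bases) for deletions

These layouts are inferred from the entries shown, not from a documented specification. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed key layouts, inferred from the tables above:
#   SNV:      chrom#position#ref#alt#      e.g. chr16#29896915#C#T#
#   deletion: chrom#position#(-bases)      e.g. chr16#69748923#(-CACT)
SNV_KEY = re.compile(r"^(chr[0-9]+|chrX|chrY)#(\d+)#([ACGT])#([ACGT])#$")
DEL_KEY = re.compile(r"^(chr[0-9]+|chrX|chrY)#(\d+)#\(-([ACGT]+)\)$")


@dataclass(frozen=True)
class VariantKey:
    chrom: str
    pos: int
    ref: Optional[str] = None      # reference base (SNVs only)
    alt: Optional[str] = None      # alternate base (SNVs only)
    deleted: Optional[str] = None  # deleted bases (deletions only)


def parse_variant_key(key: str) -> VariantKey:
    """Split a '#'-delimited variant key into its fields."""
    match = SNV_KEY.match(key)
    if match:
        chrom, pos, ref, alt = match.groups()
        return VariantKey(chrom, int(pos), ref=ref, alt=alt)
    match = DEL_KEY.match(key)
    if match:
        chrom, pos, deleted = match.groups()
        return VariantKey(chrom, int(pos), deleted=deleted)
    raise ValueError(f"unrecognized variant key: {key!r}")


print(parse_variant_key("chr12#48131310#G#A#"))
print(parse_variant_key("chr16#69748923#(-CACT)"))
```

Keys that match neither pattern raise an error rather than being guessed at. Any other indel notation would need its own pattern once its layout is known.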

None of the genes with SNVs or indels in our exome data contained coding-region CNVs in the CNV study. None had been reported to be associated with pancreatic cancer in published case-control studies (see Literature Search). Owing to time and resource constraints, the remainder of this chapter focuses on the results of Model#1, the most stringent model, which yielded the shortest list of candidate susceptibility genes.

Using Sanger sequencing, we validated the missense variants in the 9 genes in the three affected relatives and verified their absence in the aunt. Four genes had variants predicted to be damaging by both SIFT and Polyphen-2. Moreover, three of those genes have functions suggesting potential importance in tumor development:

- APLF (aprataxin and PNKP like factor) has been shown to play a role in DNA single- and double-strand break repair by interacting with members of the PARP (poly(ADP-ribose) polymerase) family567, and it undergoes ATM-dependent hyperphosphorylation following ionizing radiation568.
- RASSF6 (Ras association (RalGDS/AF-6) domain family member 6) is a Ras effector and candidate tumor suppressor that is downregulated in some tumors569.
- CEP110 (centriolin) encodes a protein required for centrosome function as a microtubule organizing centre and is associated with centrosomal maturation570.

A fourth functionally relevant gene, MUC7 (mucin 7, secreted), is overexpressed in pancreatic adenocarcinoma571. However, we ranked it below the genes above for two reasons. First, SIFT and Polyphen-2 did not provide a strong prediction of a damaging effect for this variant, likely because the affected site is poorly conserved. Second, most hereditary cancer syndromes are caused by inactivating mutations in tumor suppressor genes that reduce expression of the encoded protein, whereas mucin 7 appears to be a marker and potential oncogene in pancreatic cancer rather than a tumor suppressor. The remaining genes (ASTN2, TAF5L, CCDC141, C1orf65, and PCYOX1) were ranked lower because no evidence links them to cancer, and most of their variants were predicted to be benign.
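The first ranking criterion above, agreement between SIFT and Polyphen-2, can be expressed as a simple score. The sketch below is illustrative only and works under these conditions:

- It assumes that Polyphen-2 "possibly damaging" and "probably damaging" both count as damaging.
- Its example rows are copied from the candidate tables above.
- It does not capture the functional-evidence criterion that also shaped the ranking.

The function names are hypothetical.

```python
from typing import List, Tuple

# Assumption: both Polyphen-2 "damaging" classes count as a damaging call.
POLYPHEN_DAMAGING = {"possibly damaging", "probably damaging"}

Row = Tuple[str, str, str]  # (gene, SIFT call, Polyphen-2 call)


def concordance(sift: str, polyphen: str) -> int:
    """Count how many of the two tools (0, 1 or 2) call the variant damaging."""
    sift_damaging = sift.strip().rstrip(".").upper() == "DAMAGING"
    polyphen_damaging = polyphen.strip().lower() in POLYPHEN_DAMAGING
    return int(sift_damaging) + int(polyphen_damaging)


def rank_by_concordance(rows: List[Row]) -> List[Row]:
    """Order rows by decreasing tool agreement, then alphabetically by gene."""
    return sorted(rows, key=lambda row: (-concordance(row[1], row[2]), row[0]))


rows = [
    ("SEZ6L2", "DAMAGING", "probably damaging"),
    ("CREBBP", "TOLERATED", "unknown"),
    ("ARAP1", "DAMAGING", "possibly damaging"),
    ("CYP2C9", "TOLERATED", "possibly damaging"),
    ("CPS1", "DAMAGING", "benign"),
    ("GLIPR2", "Not_scored", "not given"),
]
for gene, sift, polyphen in rank_by_concordance(rows):
    print(gene, concordance(sift, polyphen))
```

Unscored variants such as splice-site changes receive a score of 0 here. In practice they would need separate consideration rather than automatic demotion.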

We PCR-amplified and Sanger sequenced every exon of APLF (10 exons) and RASSF6 (11 exons) in a cohort of approximately 70 pancreatic cancer cases. No novel variant was identified in either gene in this screening cohort. CEP110 was not screened in the same manner because of its very large size (42 coding exons). Instead, it will be investigated in other subjects using data from the planned whole-exome sequencing of 75 additional familial pancreatic cancer patients.

5. Discussion

We have presented a list of candidate susceptibility genes for FPC, obtained by exome sequencing of a family with a strong history of pancreatic cancer in two of seven siblings, their mother, and a maternal uncle. Initially, we planned to filter for variants shared by the three affected members (2 siblings + maternal uncle) while excluding variants present in the aunt, who was unaffected by age 80. This model assumes autosomal dominant inheritance of a relatively high-penetrance gene. However, since we do not actually know the penetrance of the gene in question, we also accounted for the possibility that the unaffected aunt may be a carrier. Thus, Model#2 comprised genes with variants shared by the three affected relatives irrespective of the aunt's status. This approximately doubled the number of candidate genes (16 vs. 9), but the list size remained manageable.

Interestingly, the Model#1 list, while containing three functionally interesting genes, did not include any truncating mutations, whereas Model#2 yielded two frameshift indels. This is notable because most familial cancer syndromes are caused by protein-truncating mutations in tumor suppressor genes that segregate with disease in affected family members. Nonetheless, although several additional genes in the Model#2 group are of potential interest, time and resource constraints led us to focus this thesis on the top candidates in Model#1.

One of the most interesting genes in our list is APLF. It encodes a protein that has been shown to participate in DNA repair and is also thought to be a histone chaperone567,568,573. The DNA repair genes BRCA2, PALB2, and ATM have all been linked to FPC in recent years, suggesting the importance of this pathway in pancreatic tumorigenesis. However, a Sanger-based screen of all 10 exons of APLF in ~70 unrelated pancreatic cancer subjects yielded no novel variants.

Similarly, RASSF6 is appealing as a pancreatic cancer susceptibility gene because of its regulatory effect on Ras. Ras activation has been demonstrated in the majority of pancreatic adenocarcinomas and occurs early in tumorigenesis. Sanger sequencing of the 11 exons of RASSF6 also failed to show novel variants in the screening cohort.

We note that the variants affecting each of these genes in Family C are rare (~0.2%), and both SIFT and Polyphen-2 predicted both variants to be damaging. That such rare, predicted-damaging variants found no support in the screening cohort emphasizes the challenge of using exome sequencing in a single family, particularly one with closely related relatives, to identify the genetic cause of a familial cancer syndrome. The many whole genomes and exomes published to date have shown that any individual's exome contains many potentially deleterious variants (see Literature Search for details). The successful studies that used exome sequencing to identify high-penetrance cancer genes in autosomal dominant syndromes did so in one of two ways: by accessing paired tumor sequence to identify second hits, or by sequencing multiple unaffected individuals. Whole-exome sequencing does not yield good results from formalin-fixed paraffin-embedded (FFPE) tumors, and the only resected specimen available in Family C, which belonged to the mother, was indeed FFPE.

An alternative way of guiding exome data filtering in autosomal dominant syndromes is to use linkage analysis data, as several studies of other Mendelian diseases have demonstrated. As described in the Literature Search, our group is part of a multi-centre consortium that has collected eligible families for linkage analysis. Unfortunately, the consortium has not yet generated usable results that could guide our exome sequencing analysis. We also did not find any of our variants among the genes reported to be associated with pancreatic cancer in case-control studies.

Our study had some technical limitations. Perhaps the most significant was the lower depth of coverage in the uncle’s exome compared to the other sequenced samples, so that only half as many variants were called in the uncle as in each of the siblings and the aunt. Importantly, the distribution of novel SNVs across chromosomes differed between the uncle and the other three subjects. This suggests that the uncle’s reduced coverage is not evenly distributed across the genome: some chromosomes appear particularly under-represented compared to the siblings and aunt (e.g. chromosomes 7 and 12). Sanger sequencing indicated that the specificity of variant calling in the uncle was equivalent to that of the other subjects, but his sensitivity was significantly lower. For this reason, we also considered models that did not take the uncle’s data into account (Model#3 and Model#4). These analyses produced a much longer list of candidate genes (75-110, depending on whether the aunt’s exome was used to filter out variants), too many to screen individually in other pancreatic cancer patients by Sanger sequencing. We present these genes here as additional candidates. We anticipate that data from additional exomes will facilitate variant filtration and allow more cost-effective screening of interesting genes.
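The uncle's coverage problem appears as a drop in sensitivity while the precision of his calls stays high. The sketch below is minimal and assumes a set of positions tested by Sanger sequencing, together with the subset confirmed as true variants. All positions and names are hypothetical. "Specificity" in the paragraph above is computed here as the fraction of exome calls that validate, that is, the positive predictive value.

```python
from typing import Set, Tuple


def call_accuracy(called: Set[int], tested: Set[int],
                  confirmed: Set[int]) -> Tuple[float, float]:
    """Return (sensitivity, positive predictive value) for one subject.

    tested:    positions checked by Sanger sequencing.
    confirmed: tested positions at which Sanger confirmed a variant.
    called:    positions at which the exome pipeline called a variant.
    """
    called_tested = called & tested
    true_pos = len(called_tested & confirmed)
    false_pos = len(called_tested - confirmed)
    false_neg = len(confirmed - called)
    sensitivity = true_pos / (true_pos + false_neg) if confirmed else float("nan")
    ppv = true_pos / (true_pos + false_pos) if called_tested else float("nan")
    return sensitivity, ppv


tested = set(range(1, 11))        # ten hypothetical Sanger-tested positions
confirmed = set(range(1, 9))      # eight of them confirmed as true variants
sibling_calls = set(range(1, 9))  # well-covered exome: every true variant called
uncle_calls = {1, 2, 3, 4}        # poorly covered exome: half the variants missed

print("sibling", call_accuracy(sibling_calls, tested, confirmed))
print("uncle", call_accuracy(uncle_calls, tested, confirmed))
```

In this toy setup, both subjects have a positive predictive value of 1.0, but the uncle's sensitivity is halved. Filtering on his calls would therefore mainly discard true shared variants rather than admit false ones.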

Another limitation in our data is the low specificity of homozygous variant calls. The cause of these erroneous calls is unclear, and it underscores the importance of individually validating any homozygous variant by Sanger sequencing. However, all the homozygous variants we found to be inaccurately called were at positions reported as SNPs in dbSNP, and the only two novel homozygous variants tested by Sanger sequencing were true calls. Our final filtration models retain only novel or rare variants. Homozygous calls in those models may therefore validate at a higher rate than the comparison with SNP chips suggested. Had our analysis been based on an autosomal recessive model of inheritance, this issue would have mattered more, because we would have focused on homozygous variants in the siblings. In any case, only three variants across all our models were called as homozygous, and we successfully validated one of them by Sanger sequencing. All three appeared only in the Model#3 and Model#4 lists.

In conclusion, we present a list of candidate susceptibility genes for familial pancreatic cancer based on exome sequencing of three affected members and one unaffected member of a single family. Our screening of two top candidates in a cohort of unrelated cases failed to identify novel variants supporting a role for these genes in pancreatic cancer causation. However, other potential candidates remain to be investigated, and large-scale exome sequencing of other families will facilitate further screening of those candidates.

Chapter 5 - General Discussion, Conclusions, and Future Directions

General Discussion

The overall aim of my research has been to better understand genetic susceptibility to pancreatic cancer, a highly lethal malignancy with a dismal outcome for the majority of affected patients. More specifically, I am interested in relatively highly penetrant genetic variants that explain some or most familial pancreatic cancer (FPC). FPC is the autosomal dominant syndrome proposed to explain the clustering of pancreatic cancer in families, often with a younger age of onset than in sporadic cases. Identifying such susceptibility genes would have several benefits:

- facilitating the development of early-detection and intervention strategies by enriching trials with subjects who carry known predisposition genes;
- calculating the attributable risk of a particular variant through case-control and/or cohort studies, which allows more accurate estimation of individual risk in members of FPC families and more informed genetic counseling for them;
- identifying individuals who may benefit from therapies targeting the specific pathways implicated in their tumorigenesis;
- enabling the development of targeted biological therapies.

To date, only a small proportion of hereditary pancreatic cancer cases can be attributed to mutations in specific genes, and almost all of these occur in the context of rare cancer syndromes such as Peutz-Jeghers Syndrome or Familial Atypical Multiple Mole Melanoma. The most frequently identified mutated gene in hereditary pancreatic cancer cases is BRCA2, which accounts for up to 19%103 of pancreatic cancer families and confers an estimated lifetime risk of up to 5%.502 BRCA2 families often show other associated cancers as well, particularly breast or ovarian cancer. However, a subset of BRCA2-associated pancreatic cancer patients have no family history of other cancers, and this gene has even been implicated in apparently sporadic cases.112,113

Given this well-established link between BRCA2 and pancreatic cancer, investigators have sought to determine whether a similar association exists with BRCA1. As discussed in detail in Chapter 1, multiple studies have suggested that BRCA1 mutations increase the risk of pancreatic cancer, albeit to a lesser extent than BRCA2 mutations. However, most previous studies have been criticized for bias arising from their family-based design, and population-based studies have produced conflicting results. Notwithstanding these limitations, I felt that the role of BRCA1 in pancreatic cancer required further consideration, for two reasons. First, clarifying it would allow more complete genetic counseling for affected families and possibly the inclusion of carriers in screening studies. Second, anecdotal reports have recently accumulated indicating that BRCA1 and BRCA2 mutation carriers respond well to certain chemotherapies (e.g. platinum-based chemotherapy, PARP-1 inhibitors) that target the impaired DNA repair resulting from BRCA1/2 inactivation in these tumors.

At the time of my study, our research group had collected seven FFPE tumor specimens from pancreatic cancer patients with confirmed germline BRCA1 mutations. For the first section of my thesis, I therefore conducted a loss-of-heterozygosity (LOH) analysis on these samples and compared them with nine sporadic cases that had no known BRCA1 mutations or family history of breast/ovarian cancer. I hypothesized that tumors with germline heterozygous inactivating mutations in BRCA1 demonstrate loss of the remaining functional allele. My analysis indeed showed that LOH at the BRCA1 locus was a common event in tumors of mutation carriers, with evidence of loss of the functional allele. LOH occurred in 5/7 BRCA1-mutation carriers, whereas only 1/9 sporadic cases demonstrated LOH.
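Microsatellite-based LOH is commonly scored by comparing allele peak heights in tumor DNA and matched normal DNA. The sketch below shows that common allelic-imbalance calculation, not necessarily the exact procedure used in this thesis. The cut-offs of 0.5 and 2.0 are assumptions, since published studies use a range of thresholds, and the function names are hypothetical.

```python
from typing import Optional

# Assumed LOH cut-offs for the allelic imbalance ratio.
LOWER_CUTOFF = 0.5
UPPER_CUTOFF = 2.0


def imbalance_ratio(tumor_a1: float, tumor_a2: float,
                    normal_a1: float, normal_a2: float) -> float:
    """(T1/T2) / (N1/N2), from peak heights of the two alleles of a marker."""
    return (tumor_a1 / tumor_a2) / (normal_a1 / normal_a2)


def lost_allele(tumor_a1: float, tumor_a2: float,
                normal_a1: float, normal_a2: float) -> Optional[str]:
    """Return 'allele1' or 'allele2' if that allele shows LOH, else None."""
    if normal_a1 <= 0 or normal_a2 <= 0:
        raise ValueError("marker is uninformative: normal DNA shows one allele")
    if tumor_a1 <= 0 and tumor_a2 <= 0:
        raise ValueError("no tumor signal at this marker")
    if tumor_a1 <= 0:
        return "allele1"  # allele 1 completely absent in tumor
    if tumor_a2 <= 0:
        return "allele2"  # allele 2 completely absent in tumor
    ratio = imbalance_ratio(tumor_a1, tumor_a2, normal_a1, normal_a2)
    if ratio <= LOWER_CUTOFF:
        return "allele1"  # allele 1 signal reduced relative to allele 2
    if ratio >= UPPER_CUTOFF:
        return "allele2"  # allele 2 signal reduced relative to allele 1
    return None


print(lost_allele(300, 1000, 1000, 1000))   # ratio 0.3
print(lost_allele(900, 1000, 1000, 1000))   # ratio 0.9
print(lost_allele(1000, 400, 1000, 1000))   # ratio 2.5
```

To tell whether the lost allele is the functional one, one additionally needs to know which marker allele lies on the mutant haplotype, or to check the mutation directly in tumor DNA.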

The main limitations of my study, a small sample size and the variable quality of DNA extracted from FFPE tissue, are challenges that characterize the field of pancreatic cancer research. Because pancreatic cancer is so rapidly lethal, only a small percentage of patients undergo resection before death. Moreover, most specimens available for research exist as paraffin blocks of formalin-fixed tissue, and formalin fixation causes cross-linking of nucleic acids that often degrades DNA and RNA. For these reasons, molecular analyses of pancreatic tumors are fraught with difficulties and potential biases. In my analysis, I attempted to circumvent the potential bias from DNA degradation by selecting microsatellite markers that generate small amplicons, well below the lower limit of expected DNA fragment size in FFPE tissue (180 bp).

To my knowledge, this is the first LOH analysis of familial pancreatic cancer cases with deleterious BRCA1 mutations. Only two molecular studies before mine had investigated BRCA1 in pancreatic tumors, and both assessed sporadic tumors only. Beger et al.510 found decreased mRNA and protein expression of BRCA1 in half of 50 pancreatic cancers, with worse 1-year survival in the group with decreased expression. Peng et al.523 reported frequent BRCA1 methylation in sporadic pancreatic cancers. No additional studies have been reported since.

Interestingly, sporadic breast and ovarian cancers do not usually have somatic BRCA1 mutations, yet they have been reported to have frequent LOH events at the BRCA1 locus. This has prompted speculation that BRCA1 haploinsufficiency in these tumors drives further genetic alterations.574 My findings suggest that sporadic pancreatic cancers do not frequently show loss at the BRCA1 locus. This would be consistent with Peng et al.’s523 finding that methylation is frequent, since methylation would serve as an alternative to LOH for gene inactivation. However, my sample size is small, owing to the scarcity of resected tumor samples from pancreatic cancer patients, particularly those with BRCA1 germline mutations, and this limits the generalizability of my results. Larger investigations of BRCA1 molecular alterations in pancreatic tumors are needed before firmer conclusions can be drawn about its mechanism of action in the pancreas. Nonetheless, although my findings do not definitively implicate BRCA1 as a familial pancreatic cancer gene, they certainly suggest such a role. They also indicate that larger epidemiologic studies are needed to establish the risk of pancreatic cancer associated with BRCA1 mutations.

My first study contributed toward understanding the role of a specific candidate gene (BRCA1) in pancreatic tumorigenesis, but the expected attributable risk of this particular gene to familial pancreatic cancer is fairly low. Several approaches can be taken to identify genetic predisposition in the majority of FPC cases not linked to a known gene.

Candidate genes can be identified based either on their function or on their connection to the pathway of another established susceptibility gene, which was the rationale for pursuing BRCA1. High-risk pancreatic cancer patients could be screened for mutations in additional genes associated with BRCA1 or BRCA2, or in other genes from pathways implicated in pancreatic tumorigenesis by somatic studies37. However, Sanger sequencing of all coding regions of each candidate gene is costly and laborious. Furthermore, the functional and pathway properties of many genes are still incompletely understood, which biases the investigation toward the relatively small proportion of genes that have been well annotated so far.

A candidate gene list for screening in high-risk subjects can also be derived from genome-wide association studies conducted on large numbers of sporadic cases. As would be expected, these variants are invariably associated with low odds ratios in sporadic cases, but some may be of greater significance in smaller populations enriched for familial cases. However, most variants identified by genome-wide association studies lie outside coding sequences, so further fine-mapping is needed to delineate the genes actually affected.

Under ideal conditions, genetic linkage analysis in family-based studies would be a powerful approach for identifying high-risk variants segregating with an autosomal dominant disease. Indeed, much effort has gone into collecting families with multiple cases of pancreatic cancer in closely related members for the purpose of linkage analysis. One of the largest such projects has been undertaken by the PACGENE consortium (described in the Literature Search), which has been investigating FPC genetics for about 10 years. PACGENE has not yet released any linkage results. Only one FPC linkage analysis has been published by any group to date, in a single high-risk family that does not resemble most FPC cases.187 That study found evidence of linkage to a region on chromosome 4q and proposed Palladin as the gene of interest. However, multiple subsequent analyses of Palladin in high-risk populations refuted it as a likely FPC gene.

Genetic linkage analysis is a statistical method. It needs a sufficient number of genotyped affected and unaffected members in a family to have the power to detect regions segregating with disease status. It is significantly weakened by genetic heterogeneity (i.e. multiple loci causing the same phenotype) or by phenocopies among the affected subjects. Moreover, linkage analysis alone cannot pinpoint the causative gene, as illustrated by the 4q-linked region above and the failure to identify the responsible gene within it. For all these reasons, I elected to approach the FPC question from two novel directions:

- mapping the copy-number variable portion of the genome in a cohort of probands from high-risk families;
- mapping the whole exome (single nucleotide variants and small indels) of members of a single high-risk family.
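The dependence of linkage analysis on the number of informative family members can be made concrete with the simplest case: a two-point LOD score for phase-known, fully informative meioses. For R recombinants among N meioses, the textbook formula is

LOD(θ) = log10[θ^R (1 − θ)^(N − R) / 0.5^N]

This formula is shown only for illustration. Real FPC linkage studies need models with incomplete penetrance and usually multipoint methods. The function names below are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    baseline = meioses * math.log10(2)  # -log10(0.5 ** meioses): the no-linkage model
    if theta == 0.0:
        # A single recombinant rules out theta = 0 entirely.
        return baseline if recombinants == 0 else float("-inf")
    return (recombinants * math.log10(theta)
            + non_recombinants * math.log10(1 - theta)
            + baseline)


def meioses_for_lod(threshold: float = 3.0) -> int:
    """Smallest number of non-recombinant meioses giving LOD >= threshold."""
    n = 1
    while lod_score(0, n, 0.0) < threshold:
        n += 1
    return n


print(meioses_for_lod())                # conventional significance threshold of 3
print(round(lod_score(1, 10, 0.1), 2))  # one recombinant among ten meioses
```

With no recombinants, each informative meiosis adds only log10 2 ≈ 0.30 to the score, so about ten are needed to reach a LOD of 3. Heterogeneity or phenocopies introduce apparent recombinants and push that number higher.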

The seminal 2004 papers demonstrating that structural variation of the human genome is detectable in all individuals, regardless of phenotype or disease status, generated a paradigm shift in genomics.197,198 After multiple reports established that CNVs are a significant source of genomic variability, attention turned to their association with disease. To date, most such studies have addressed diseases other than cancer, particularly neuropsychiatric disorders. Yet copy number alteration is a well-known characteristic of tumor genomes, often inactivating cancer-suppressing genes or amplifying cancer-driving genes. Furthermore, germline genomic rearrangements are a well-recognized mechanism of heredity in familial cancer syndromes, usually affecting a small but non-negligible portion of cases.

When I embarked on this study, only two published reports of germline CNVs in familial cancer syndromes were available. The first was a survey of CNVs in 57 FPC subjects using an oligonucleotide-based CGH array.345 This study presented several candidate regions but was limited in array resolution and coverage, in sample size, and in the size of the control dataset available for data filtration. The second report studied Li-Fraumeni syndrome patients who carry TP53 mutations348. The authors found that patients with germline TP53 mutations have a significantly more unstable genome, with a higher frequency of germline copy number variation than control genomes. They proposed that this increased CNV frequency predisposes to somatic expansion of deletions or duplications affecting cancer-suppressing or cancer-driving genes, respectively. Since pancreatic cancer shows a high degree of somatic genome instability, I hypothesized that the germline CNV profile of FPC patients may differ from that of controls. I further hypothesized that germline deletions or duplications found in cases but not in healthy controls would point to candidate susceptibility genes for FPC.
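The second hypothesis implies a filtering step: keep case CNVs that are not matched by a CNV of the same type in controls. The sketch below is minimal and decides whether a control CNV "matches" a case CNV with a 50% reciprocal-overlap rule. That rule is a common convention in CNV studies but an assumption here. The coordinates and names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CNV:
    chrom: str
    start: int  # first base, inclusive
    end: int    # last base, inclusive
    kind: str   # "del" (deletion) or "dup" (duplication)

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Overlap as a fraction of the longer CNV (the smaller of both fractions)."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    if overlap <= 0:
        return 0.0
    return min(overlap / a.length, overlap / b.length)


def case_specific(cases: Iterable[CNV], controls: Iterable[CNV],
                  min_overlap: float = 0.5) -> List[CNV]:
    """Keep case CNVs with no same-type control CNV at >= min_overlap."""
    controls = list(controls)
    return [
        case for case in cases
        if not any(ctrl.kind == case.kind
                   and reciprocal_overlap(case, ctrl) >= min_overlap
                   for ctrl in controls)
    ]


cases = [CNV("chr4", 1000, 2000, "del"),
         CNV("chr9", 5000, 9000, "dup"),
         CNV("chr4", 1000, 2000, "dup")]
controls = [CNV("chr4", 1100, 2100, "del"),
            CNV("chr9", 5000, 9000, "del"),
            CNV("chr4", 1900, 5000, "dup")]
for cnv in case_specific(cases, controls):
    print(cnv)
```

Requiring the same CNV type means a case duplication is not discarded merely because a control carries a deletion at the same locus. The reciprocal-overlap rule keeps a small control CNV from "explaining away" a much larger case event.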

For the third chapter of my thesis, I focused on a single family from my CNV study. This family included two siblings (in a sibship of seven) who died of pancreatic cancer at young ages (30s and 40s), and whose mother and maternal uncle had also died of the disease. By the time of this study, sequencing most of the coding region of the genome (i.e. the exome) had become considerably cheaper than in the past. Many published studies had used whole-exome analysis to pinpoint the causative variant in rare Mendelian disorders. Only one report had applied whole-exome sequencing to familial cancer, showing PALB2 to be a susceptibility gene for FPC. Notably, unlike all the other reports, this paper did not use exome capture and next-generation sequencing. It was instead based on a large-scale Sanger sequencing effort covering pancreatic tumors and paired blood-derived DNA to identify germline variants. I hypothesized that whole-exome sequencing would reveal susceptibility genes in this high-risk family by identifying rare variants shared by affected members.

My CNV results refuted the first part of my hypothesis: there was no discernible difference in genome stability or other CNV characteristics between FPC cases and healthy controls. Since I conducted my study, only a couple of other such studies have been published in familial/hereditary cancer populations.346,347 Neither offered much beyond a list of candidate susceptibility genes, as we have done, and neither described a significant difference in the frequency of germline CNVs between cases and controls. It is difficult to draw firm conclusions from only a few studies. So far, however, there is little to suggest that the phenomenon Shlien et al.348 observed in Li-Fraumeni patients is replicated in other familial cancer cases. TP53 is known as the “guardian of the genome”.575 Given our observations, we would conclude that most FPC cases are not caused by mutations in genes with a similar impact on genomic stability. Furthermore, CNVs in general do not appear to play as significant a role in susceptibility to most familial cancers as they do in other diseases, such as neuropsychiatric and developmental disorders.

Both the CNV study and the whole-exome analysis relied on relatively novel technology and depended heavily on recently developed bioinformatic tools; as such, both had limitations related to the technology and/or the resources available for analyzing the data. In the CNV study, the Affymetrix GeneChip Human Mapping 500K SNP array used for CNV detection, which consists of two chips that together genotype approximately 500,000 SNPs genome-wide, was originally designed for accurate SNP genotyping to enable adequately powered SNP-based genome-wide association studies. SNPs selected for the array therefore underwent rigorous validation of genotype accuracy, call rate, and linkage disequilibrium in different populations, but the probe design was not optimized for accurate copy-number estimation. The median physical distance between SNPs on the array is 2.5 kb, but SNP density is not uniform across the genome, resulting in excellent coverage of some regions and incomplete or entirely absent coverage of others. Nonetheless, at the time of my study design, this array was one of the highest-resolution, best-coverage platforms available for CNV detection. When subsequent generations of CNV detection platforms were developed, it became evident that most common CNVs are poorly captured by the Affy 500K array because of this biased SNP distribution. However, this was not a significant concern for my analysis, since I was specifically interested in rare or low-frequency deletions and duplications. Because the Affy 500K array had only begun to be used for CNV analysis shortly before my study was designed, the algorithms newly developed to analyze its data had variable sensitivity and specificity. It was therefore necessary to use multiple algorithms, and to demonstrate an approach that generates a well-validated set of CNVs. I used qPCR to validate a subset of CNVs, but because validating CNVs one at a time in this manner is time- and resource-intensive, I also performed a secondary CNV analysis on a subset of samples using a newer array (the Affy 6.0). In this way, I generated a set of high-confidence CNVs with a validation rate of 95% or higher; owing to logistical constraints, I did not pursue any of the remaining low-confidence CNVs. Since my validation experiments suggested that approximately half of the 491 low-confidence case CNVs are likely to be real, my approach probably missed some additional FPC-specific CNVs containing candidate genes. Future investigations of FPC cases using newer, higher-resolution platforms would serve both to validate my results and to fill the gaps in coverage resulting from the limitations of the 500K array and of my analysis strategies.
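As a rough illustration of the multi-algorithm strategy described above, the Python sketch below keeps a CNV call from a primary algorithm only when another algorithm reports a CNV of the same type that overlaps it reciprocally by at least 50%. The reciprocal-overlap criterion, the 50% threshold, the single-supporting-algorithm requirement, and the `CNV`, `reciprocal_overlap`, and `high_confidence` names are hypothetical choices made for this example; they are not the exact rules used to define high-confidence CNVs in this thesis.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CNV:
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive
    kind: str   # "del" or "dup"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Smaller of the two overlap fractions, i.e. overlap / length of the longer CNV."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    if overlap <= 0:
        return 0.0
    longer = max(a.end - a.start + 1, b.end - b.start + 1)
    return overlap / longer


def high_confidence(primary: List[CNV], others: List[List[CNV]],
                    min_support: int = 1, min_ro: float = 0.5) -> List[CNV]:
    """Keep primary calls reported by at least `min_support` other algorithms
    (same CNV type, reciprocal overlap >= min_ro). Hypothetical rule."""
    kept = []
    for cnv in primary:
        support = sum(
            any(o.kind == cnv.kind and reciprocal_overlap(cnv, o) >= min_ro
                for o in calls)
            for calls in others
        )
        if support >= min_support:
            kept.append(cnv)
    return kept


# Toy calls from three hypothetical algorithms on one sample.
primary = [CNV("3", 1000, 2000, "dup"), CNV("5", 100, 200, "del")]
algo_b = [CNV("3", 1100, 2100, "dup")]
algo_c = [CNV("5", 150, 400, "del")]
print(high_confidence(primary, [algo_b, algo_c]))
```

On the toy input only the chromosome 3 duplication is retained: the chromosome 5 deletion shares 51 bp with the other call, about 20% of the longer call. Reciprocal overlap is used here because it stops a small call from being "confirmed" merely by falling inside a much larger one.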

Similarly, the technologies and algorithms used for studying the high-risk Family C were evolving rapidly even as I was conducting my study. First, none of the target-capture arrays available to me at the time covered 100% of the coding regions in the genome; rather, they aimed to capture most of the well-annotated coding regions. Even then, technical problems sometimes result in incomplete capture of this target region. One of my samples, from the uncle in Family C, could not be sequenced to the same depth of coverage as the remaining samples because of technical problems, resulting in a significantly lower number of variants called in this individual. Because my primary filtering strategy relied heavily on removing variants not shared by all affected cases, and because the uncle’s second-degree relationship to the siblings means that he is expected to share fewer variants with them than they share with each other, the incomplete variant list generated for the uncle meant that I would almost certainly miss potential candidate genes if I required variants to be called in his exome. To address this shortcoming, I presented alternative filtering models that did not necessarily exclude variants shared by the siblings but not called in the uncle. As expected, these models generated considerably longer variant lists and required other methods of prioritizing the results for further investigation. Furthermore, because my project was conducted as a collaboration with the laboratory that performed the whole-exome sequencing, I was not directly involved in running the analysis pipeline implemented by their group. I was able to validate the resulting variant calls and determined that the dataset generated by this pipeline was accurate for both heterozygous single-nucleotide variant and indel calls; the validation rate for homozygous variant calls, however, was significantly lower. I could not directly assess sensitivity on a large scale, so it is possible that additional true variants were missed. Therefore, as with the CNV analysis, I prioritized high specificity of variant calling at the expense of slightly lower sensitivity so that I could work with a reliable dataset for downstream analysis.
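The trade-off between filtering power and the uncle's incomplete calls can be quantified with simple Mendelian probabilities. The sketch below assumes the pedigree described in the text (two affected siblings plus their uncle, a full sibling of one of their parents), assumes each rare variant entered the family through a single founder, and treats the uncle's per-variant call sensitivity as a hypothetical free parameter; it is an illustration, not a reanalysis of Family C data.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def p_background_shared(require_uncle: bool) -> Fraction:
    """Probability that a rare, disease-unrelated heterozygous variant carried by
    sibling 1 is also carried by the other relatives the filter requires.

    Sibling 2 inherits the same parental copy with probability 1/2. The uncle can
    carry it only if it was transmitted by the parent who is his sibling (1/2) and
    he inherited the same grandparental copy (1/2); these meioses are independent
    of the one that transmitted the variant to sibling 2.
    """
    p = HALF                  # sibling 2 shares the variant
    if require_uncle:
        p *= HALF * HALF      # the uncle also shares it
    return p


def filter_outcomes(uncle_sensitivity: float) -> dict:
    """Expected fraction of sibling 1's variants surviving each filter, assuming
    perfect calling in the siblings and the given call sensitivity in the uncle."""
    return {
        "background_siblings_only": float(p_background_shared(False)),
        "background_with_uncle": float(p_background_shared(True)) * uncle_sensitivity,
        # A causal variant is carried by all three affecteds, so it always passes
        # the sibling-only filter but is lost whenever the uncle's call is missed.
        "causal_siblings_only": 1.0,
        "causal_with_uncle": uncle_sensitivity,
    }


print(filter_outcomes(uncle_sensitivity=0.6))
```

With perfect calling, requiring the uncle cuts the expected background from one half to one eighth of sibling 1's rare variants, but the causal variant then survives only with probability equal to the uncle's call sensitivity (60% for the made-up value above), which is the motivation for models that do not require a call in the uncle.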

Another common theme of both types of analysis is the large number of variants generated for each sample, even after applying quality controls to maximize the validity of the data. This is a direct result of the higher resolution and genome-wide coverage of these approaches compared with older techniques that assess only one or a few genomic regions or genes at a time. While such high coverage is one of the primary attractions of these technologies, it also creates significant challenges in interpreting and prioritizing the data. One component of data prioritization in my studies was the focus on “rare” variants: since I am most interested in identifying variants with a relatively large effect size that could explain familial inheritance of pancreatic cancer, such variants are expected to be very infrequent in the general population. Identifying rare variants posed distinct challenges for the CNV and exome analyses. To interpret the significance of CNVs, particularly in the context of my hypothesis, I needed a control set for comparison with the cases. Approximately 45 spousal controls were selected for genotyping alongside the cases; genotyping additional controls was not feasible at the time because of financial constraints. Instead, I took advantage of a large control cohort that had previously been genotyped on the same Affy 500K array for a genome-wide association study of colorectal cancer (ARCTIC). These approximately 1,100 controls were genotyped at a different facility from the cases, but I analyzed them in parallel with the cases, applying the same algorithm parameters and filtering rules. It became evident during analysis that there was a greater level of noise in the ARCTIC controls, manifesting as a greater proportion of control CNVs that were “low-confidence”. This highlights the importance of study design in CNV analysis, which is more sensitive to batch effects than SNP genotyping studies. These data also suggest that some real CNVs in controls may have been missed in our analysis; if those regions overlap rare CNVs in cases, the case CNVs would be incorrectly identified as candidate FPC-specific CNVs under our hypothesis. To address this concern, I flagged FPC-specific CNVs that overlapped a low-confidence CNV in controls and validated each such region before investigating it further. I also used the Database of Genomic Variants (DGV), but the quality of the data in this resource is directly linked to the limitations of the platform and algorithms used in each source publication. Although I could not determine the accuracy of each data source, I chose to exclude CNVs detected by studies that used BAC clone arrays, because those arrays were later shown to greatly overestimate CNV size.
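The logic for labelling a case CNV as FPC-specific, including the handling of noisy control calls and of DGV entries from BAC clone arrays, can be sketched as below. This is a simplified, hypothetical rendering: it counts any base-pair overlap, ignores whether the CNVs are deletions or duplications, and represents the DGV platform with an invented `method` label; the overlap rules actually applied in the thesis may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int        # inclusive
    end: int          # inclusive
    method: str = ""  # hypothetical platform label, used here only for DGV entries


def overlaps(a: Region, b: Region) -> bool:
    """True if the two regions share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def classify_case_cnvs(cases: Sequence[Region],
                       controls_high: Sequence[Region],
                       controls_low: Sequence[Region],
                       dgv: Sequence[Region],
                       excluded_methods: Tuple[str, ...] = ("BAC array",),
                       ) -> List[Tuple[Region, str]]:
    # Drop DGV entries from platforms known to overestimate CNV size.
    usable_dgv = [d for d in dgv if d.method not in excluded_methods]
    labelled = []
    for cnv in cases:
        in_controls = any(overlaps(cnv, c) for c in controls_high)
        in_dgv = any(overlaps(cnv, d) for d in usable_dgv)
        if in_controls or in_dgv:
            label = "not FPC-specific"
        elif any(overlaps(cnv, c) for c in controls_low):
            # A noisy control call might be real: validate before following up.
            label = "FPC-specific, validate region first"
        else:
            label = "FPC-specific"
        labelled.append((cnv, label))
    return labelled


cases = [Region("3", 100, 200), Region("3", 500, 600), Region("7", 10, 20)]
result = classify_case_cnvs(
    cases,
    controls_high=[Region("7", 15, 30)],
    controls_low=[Region("3", 550, 560)],
    dgv=[Region("3", 150, 160, method="BAC array")],
)
for cnv, label in result:
    print(cnv.chrom, cnv.start, cnv.end, label)
```

In the toy data the first case CNV remains FPC-specific because its only overlapping DGV entry comes from a BAC array, the second is flagged for validation because it overlaps a low-confidence control call, and the third is excluded by a high-confidence control CNV.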

For filtering the exome variants, I turned to dbSNP, a continuously updated database that houses a large set of single-base as well as indel variants. Older versions of dbSNP were largely populated by data from the HapMap project, which mostly identified common variants present at a population frequency > 1%. However, as more human genomes were sequenced in their entirety, including by the 1000 Genomes Project and the Exome Sequencing Project, the dataset became more difficult to interpret, since most of the newly added variants were not adequately validated and/or lacked calculated population frequencies, and many had a minor allele frequency < 1%. For my exome analysis, I therefore used a relatively strict definition of “rare” (minor allele frequency < 0.2%), since variants with higher frequencies have been described as “low-frequency” variants, and some of these have been shown to have an intermediate effect size on disease predisposition rather than the high-penetrance effect in which I am interested.494,495 It should be noted that indel reporting in dbSNP is significantly less accurate than that of single-nucleotide variants, particularly for indels from next-generation sequencing platforms; accurately determining the population frequency of indels is therefore even more challenging. Moreover, dbSNP has become contaminated with somatic variants found in tumors and with other potentially pathogenic germline variants found in cancer patients. Therefore, I carefully screened my final dataset to ensure that I had not filtered out a variant linked to cancer whose population frequency was low.
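The frequency filter described above, together with the screen for cancer-linked variants, might be expressed as follows. The 0.2% cut-off comes from the text; everything else is an assumption made for illustration: the `Variant` record, the choice to keep variants with no reported frequency, the `cancer_linked` set standing in for the manual screen, and the hypothetical 1% ceiling used to decide when a cancer-linked variant counts as "low frequency".

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

RARE_MAF = 0.002      # "rare": minor allele frequency < 0.2% (from the text)
LOW_FREQ_MAF = 0.01   # hypothetical ceiling for rescuing cancer-linked variants


@dataclass(frozen=True)
class Variant:
    key: str              # hypothetical identifier, e.g. "chrom:pos:ref:alt"
    maf: Optional[float]  # population MAF from dbSNP; None if not reported


def keep_variant(v: Variant, cancer_linked: Set[str]) -> bool:
    if v.maf is None:
        # Many newer dbSNP entries lack frequencies; a missing frequency is
        # not treated as evidence that the variant is common (assumption).
        return True
    if v.maf < RARE_MAF:
        return True
    # Rescue low-frequency variants already linked to cancer, since dbSNP
    # contains somatic and pathogenic germline cancer variants.
    return v.key in cancer_linked and v.maf < LOW_FREQ_MAF


def frequency_filter(variants: Iterable[Variant], cancer_linked: Set[str]) -> List[Variant]:
    return [v for v in variants if keep_variant(v, cancer_linked)]


variants = [
    Variant("v1", None),
    Variant("v2", 0.001),
    Variant("v3", 0.005),
    Variant("v4", 0.005),
    Variant("v5", 0.20),
]
print([v.key for v in frequency_filter(variants, cancer_linked={"v4", "v5"})])
```

Here `v3` and `v4` have the same frequency, but only `v4` is retained because it appears in the (hypothetical) cancer-linked set; `v5` is cancer-linked yet too common to be rescued.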

Beyond filtering by variant frequency, I attempted to take advantage of the shared phenotype of affected individuals. For CNVs, I searched for CNVs present in multiple cases but not in controls, but ultimately found none apart from the TGFBR3 duplication discussed below. For the exome data, I filtered for variants shared by the three affected relatives, using an unaffected family member as a negative control (i.e. filtering out variants identified in this relative). My rationale was that Family C had a very strong history of pancreatic cancer occurring at young ages in most of its affected members, so the unaffected 80-year-old aunt seemed considerably less likely to carry the putative high-penetrance variant responsible for the disease in this family. I based our primary filtering approach on this premise, and it successfully reduced the number of eligible candidate genes to a workable size. However, because the actual penetrance of the variant in question is unknown, excluding all variants found in the aunt risked losing the true causal FPC gene. I therefore offered alternative filtering models that took this ambiguity into account; these generated significantly longer lists of candidate genes.
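The shared-variant filtering described above reduces to set operations on each individual's variant calls. The sketch below uses invented variant identifiers and individual labels (`sib1`, `sib2`, `uncle`, `aunt`); the three configurations are illustrative stand-ins for the primary model and for relaxed alternatives, not the thesis's exact model definitions.

```python
from typing import Dict, Iterable, Set


def shared_variants(calls: Dict[str, Set[str]], required: Iterable[str],
                    excluded: Iterable[str] = ()) -> Set[str]:
    """Variants called in every `required` individual and in none of `excluded`."""
    required = list(required)
    if not required:
        raise ValueError("at least one required individual is needed")
    # set.intersection returns a new set, so the input sets are not modified.
    result = set.intersection(*(calls[p] for p in required))
    for person in excluded:
        result -= calls[person]
    return result


# Toy calls; the uncle's lower coverage means "v3" was not called in him.
calls = {
    "sib1": {"v1", "v2", "v3", "v4"},
    "sib2": {"v1", "v2", "v3", "v5"},
    "uncle": {"v1", "v2"},
    "aunt": {"v2"},
}

models = {
    "all affecteds, exclude aunt": (["sib1", "sib2", "uncle"], ["aunt"]),
    "siblings only, exclude aunt": (["sib1", "sib2"], ["aunt"]),
    "all affecteds, keep aunt's variants": (["sib1", "sib2", "uncle"], []),
}
for name, (required, excluded) in models.items():
    print(name, sorted(shared_variants(calls, required, excluded)))
```

Relaxing either constraint lengthens the list: dropping the uncle's requirement recovers `v3`, and not excluding the aunt's variants recovers `v2`.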

In addition to using other cases (or family members) to filter variants, I turned to functional annotation. For my CNV data, I focused on variants involving coding regions and used available databases of somatic cancer variants (COSMIC) and pancreatic expression data (Pancreas Expression Database) to annotate the genes involved. While many genes did have potential connections to pancreatic cancer or to carcinogenesis in general, none was an immediately obvious candidate. This again emphasizes the limitations of the functional annotation available for most genes, and the challenge of using this approach to identify susceptibility genes. Similarly, I attempted to prioritize variants from my exome analysis based on their predicted likelihood of damaging protein function (using two well-known algorithms), as well as by consulting the aforementioned databases for gene annotation. However, without adequate functional assays it is difficult to be certain of the accuracy of the prediction for any one variant, particularly if the prediction is “benign” or “tolerated”.
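Ranking by predicted damage can be sketched as counting how many predictors call a variant damaging. The labels "tolerated" and "benign" in the text match the output categories of SIFT and PolyPhen-2, so this example assumes those were the two algorithms; the `Candidate` record, the gene names, and the simple vote count are hypothetical. Variants with no prediction (for example, indels, which these tools do not score) are ranked last rather than discarded, reflecting the caution above that a benign or missing prediction does not rule out an effect.

```python
from dataclasses import dataclass
from typing import List, Optional

# Labels counted as "damaging" for each assumed predictor.
DAMAGING = {
    "sift": {"deleterious"},
    "polyphen2": {"probably damaging", "possibly damaging"},
}


@dataclass
class Candidate:
    gene: str
    sift: Optional[str]       # None when no prediction is available
    polyphen2: Optional[str]


def damaging_votes(c: Candidate) -> int:
    """Number of predictors that call the variant damaging (0, 1 or 2)."""
    votes = 0
    if c.sift in DAMAGING["sift"]:
        votes += 1
    if c.polyphen2 in DAMAGING["polyphen2"]:
        votes += 1
    return votes


def prioritize(cands: List[Candidate]) -> List[Candidate]:
    """Sort by damaging votes, highest first; Python's sort is stable, so ties
    keep their input order even with reverse=True."""
    return sorted(cands, key=damaging_votes, reverse=True)


cands = [
    Candidate("GENE_A", "tolerated", "benign"),
    Candidate("GENE_B", "deleterious", "probably damaging"),
    Candidate("GENE_C", None, None),
    Candidate("GENE_D", "deleterious", "benign"),
]
print([c.gene for c in prioritize(cands)])
```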

In both the CNV and exome analyses, I selected the top-prioritized candidate genes for further investigation. In the CNV study, overlapping duplications in two unrelated cases were found to intersect TGFBR3, a receptor gene in the TGF-beta pathway, which is important in the initiation and progression of pancreatic cancer. This region overlapped only one duplication in controls, and that duplication had different breakpoints from the case CNVs. Importantly, the control CNV did not appear to extend into the gene except for a small part of one isoform that was longer than most other isoforms. I conducted a series of experiments to validate the duplications in the cases, demonstrate inheritance of the CNV among members of one subject’s family, delineate the exact location of the CNV breakpoints, and sequence the amplicon spanning the tandem duplication breakpoints. However, an affected sister of the proband, for whom only FFPE tissue was available, did not harbor the duplication, indicating that it does not segregate with disease in that family. In my exome analysis, I performed Sanger sequencing of all exons of the two top-ranked genes identified by filtering Model #1 (rare variants shared by the three affected relatives and absent in the unaffected relative). Each gene carried an exome variant predicted to be damaging, and both have been reported to have potential tumor-suppressor roles. However, I did not find any other rare variants in these genes among the ~70 unrelated cases that I screened.
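The segregation argument applied to the TGFBR3 duplication can be written as a simple check over genotyped affected relatives. In this hypothetical sketch, individuals whose genotype could not be determined are skipped, unaffected carriers are not penalized (penetrance may be incomplete), and an optional allowance for phenocopies (affected relatives whose cancer is unrelated to the variant) is included as an assumption; with no allowance, any affected non-carrier means the variant does not segregate with disease.

```python
from typing import Dict, Iterable, Optional


def segregates(genotypes: Dict[str, Optional[bool]], affected: Iterable[str],
               max_phenocopies: int = 0) -> bool:
    """True if at most `max_phenocopies` genotyped affected relatives lack the variant.

    genotypes maps individual -> True (carrier), False (non-carrier) or None (untested).
    Returns True vacuously when no affected relative could be genotyped.
    """
    tested = [p for p in affected if genotypes.get(p) is not None]
    non_carriers = [p for p in tested if not genotypes[p]]
    return len(non_carriers) <= max_phenocopies


# Hypothetical family resembling the situation described in the text.
genotypes = {"proband": True, "affected_sister": False, "unaffected_parent": True}
affected = ["proband", "affected_sister"]
print(segregates(genotypes, affected))                     # strict rule
print(segregates(genotypes, affected, max_phenocopies=1))  # allows one phenocopy
```

The strict rule returns `False` for this toy family, while allowing one phenocopy returns `True`, which is why the phenocopy question matters when deciding whether to abandon a candidate.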

These results raise several important issues. First, they highlight the significant challenge of using a limited number of samples in genome-wide analyses such as CNV surveys or exome sequencing. In the case of CNVs, only a small percentage of all FPC cases attributable to a particular gene would be expected to carry a genomic rearrangement, rather than a single-base mutation or indel, in that gene; a small sample size therefore reduces the likelihood of identifying multiple cases with the same affected gene. Genetic heterogeneity makes this even more challenging. The fact that linkage analysis of the best available families to date has failed to generate strong locus-specific linkage scores strongly suggests that the families included in those analyses have different causal genes. Alternatively, FPC families may have been identified inaccurately, leading to the inclusion of subjects who do not carry a high-penetrance variant.
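The sample-size argument above can be made explicit with a binomial model. The sketch assumes cases are independent and that each carries a CNV in a given gene with probability equal to the fraction of FPC attributable to that gene multiplied by the fraction of that gene's mutations that are CNVs; both inputs are hypothetical parameters, not estimates from this thesis.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


def prob_recurrent_cnv(n_cases: int, gene_fraction: float, cnv_fraction: float) -> float:
    """Chance that at least two of n_cases carry a CNV in the same gene."""
    return prob_at_least(2, n_cases, gene_fraction * cnv_fraction)


# Made-up inputs: the gene explains 5% of FPC, and 10% of its mutations are CNVs.
for n in (50, 100, 500):
    print(n, round(prob_recurrent_cnv(n, gene_fraction=0.05, cnv_fraction=0.10), 3))
```

With 100 cases, the chance of seeing the same gene hit by a CNV twice is only about 9% under these made-up inputs. In the same spirit, `prob_at_least(1, 70, f)` gives the chance of finding at least one further carrier when ~70 unrelated cases are screened for a gene that explains a fraction `f` of families.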

For the exome analysis, it is evident that every individual genome contains a large number of low-frequency or rare variants, many of which appear to be potentially damaging. Therefore, in a family-based design, it is most helpful to sequence multiple affected subjects who are more distantly related (i.e. not just first-degree pairs) to maximize the filtering power of requiring shared variants. Even then, using whole-exome data from a single family to identify a dominant-acting variant is difficult. Most successful exome analyses of dominant Mendelian diseases have used more than one family or, in the case of cancer, have used data from paired tumor genomes to identify second hits in candidate genes. Genetic heterogeneity may also pose a problem in this setting, since the accepted method of conclusively demonstrating the involvement of a gene in predisposition to familial cancer is to identify rare deleterious variants in the same gene in other, unrelated cases. However, if many different genes cause the disease, there may be “family-specific” genes (or, more likely, genes acting in only a small percentage of families), which makes it difficult to decide whether to discard genes that show no variants in other samples. Finally, there always remains the possibility that a non-coding variant (whether a CNV or an SNV/indel) is in fact the causal variant. The reason for prioritizing the coding regions of the genome in these types of analyses is practical rather than dogmatic: while a number of studies have shown that apparent “gene deserts” and non-coding regions of genes such as introns can affect gene expression (over short or long ranges), there is little to no annotation of those regions to allow prioritization and interpretation of potential variant effects. Given that genic regions alone generate sufficiently long lists of candidate genes, many studies, including mine, elect to ignore non-genic regions. However, should extensive investigations of the exome fail to yield answers, it will become necessary to cast a wider net and characterize non-coding variants.

Conclusions

I have tested and confirmed my first hypothesis (that LOH occurs frequently at the BRCA1 locus in pancreatic tumors from germline BRCA1-mutation carriers), thus contributing novel information to the understanding of the role of BRCA1 in susceptibility to pancreatic cancer. For my second hypothesis, I found no evidence of a distinct CNV profile in high-risk pancreatic cancer cases relative to controls, but demonstrated that FPC-specific losses and gains overlap some genes with the potential to be involved in pancreatic tumorigenesis. My data constitute the most comprehensive set of annotated germline CNVs in high-risk familial pancreatic cancer patients to date. Finally, for the third part of my thesis, I applied a hierarchical filtering approach to generate a list of candidate susceptibility genes for FPC. Like the genes generated by my CNV analysis, the exome candidates include many with a potential role in tumorigenesis. The combined list of genes generated by my thesis represents an important resource for future studies of candidate FPC susceptibility genes.

Future Directions

As discussed above, a number of follow-up investigations flow naturally from the results of my studies, including: validation of detected variants using more up-to-date, higher-resolution platforms and larger sample sizes; sequencing of the entire coding region of candidate genes identified by the CNV and/or exome analyses in additional cases; and additional exome sequencing of other families to increase the power to detect additional variants in the same gene(s).

In addition, several new directions may be taken in the future for the investigation of heritable susceptibility to familial pancreatic cancer. One limitation of my studies was the focus on protein-coding genes as the causative agents of heritable pancreatic cancer. In part, this was necessary because of the relative lack of annotation of non-coding regions of the genome and the challenge of studying such regions. Another constraint is the single-view approach of each study: only one platform was used at a time, and combining the results of different platforms applied to different samples is challenging. A more valuable approach would be to integrate data from multiple profiling techniques (e.g. genomic, epigenomic, transcriptomic, immunohistochemical) for specimens from the same individuals, allowing a more comprehensive assessment of potential heritable factors in disease susceptibility. Of course, there are practical limitations to such an approach, foremost among them the challenge of obtaining pancreatic tumors from familial cancer patients given the high mortality of the disease. However, the aforementioned ICGC has been addressing this issue by prospectively collecting tumor specimens and developing xenografts and cell lines to allow further investigation of recruited subjects.

An important question that arises from the results of my studies is whether a significant proportion of familial pancreatic cancer cases can be explained by relatively highly penetrant variants in a single gene. The fact that I did not find evidence of any gene being affected by deleterious variants in more than one family suggests that many private genes may contribute to familial pancreatic cancer in different families. This would make the identification of such susceptibility genes considerably more difficult. Functional analyses of candidate genes would then become much more important in delineating the causative agents, while pathway analysis may help identify genes affected in different individuals that lead to similar outcomes (i.e. pancreatic cancer development).
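As a minimal sketch of the pathway-level idea, the code below counts how many families carry a candidate gene in each pathway, so that different private genes converging on one pathway become visible. The gene-to-pathway mapping and all gene, pathway, and family names are invented placeholders; a real analysis would draw pathways from a curated resource and would need a statistical test accounting for pathway size and background variant rates, which this counting step omits.

```python
from collections import defaultdict
from typing import Dict, Set


def families_per_pathway(family_genes: Dict[str, Set[str]],
                         pathways: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each pathway to the families with at least one candidate gene in it."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    for family, genes in family_genes.items():
        for pathway, members in pathways.items():
            if genes & members:
                hits[pathway].add(family)
    return dict(hits)


# Each family has a different "private" candidate gene (placeholder names).
family_genes = {"F1": {"GENE_A"}, "F2": {"GENE_B"}, "F3": {"GENE_C"}}
pathways = {"PATHWAY_1": {"GENE_A", "GENE_B"}, "PATHWAY_2": {"GENE_C", "GENE_D"}}

ranked = sorted(families_per_pathway(family_genes, pathways).items(),
                key=lambda item: len(item[1]), reverse=True)
for pathway, families in ranked:
    print(pathway, sorted(families))
```

No single gene recurs across the three toy families, yet two of them converge on `PATHWAY_1`, which is the kind of signal a gene-by-gene search would miss.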

Another possibility that must be considered is the role of intermediate-effect variants and gene-gene interactions within the same individual. Recently, our group has found evidence of rare deleterious variants in cancer-predisposing genes that are not carried by all pancreatic cancer patients in the same family. While the non-carriers may be phenocopies, this observation also raises important questions about the extent of genotyping that should be performed in a given family before attributing familial cancer to a specific gene, and about the importance of more extensive population data for understanding the effect size of rare variants. Such data are forthcoming from large-scale exome and genome sequencing projects (such as the 1000 Genomes Project and the Exome Sequencing Project), but this question also requires the assessment of much larger FPC cohorts.

References

1. Hruban RH, Fukushima N. Pancreatic adenocarcinoma: update on the surgical pathology of

carcinomas of ductal origin and PanINs. Mod Pathol. 2007 Feb;20 Suppl 1:S61-70.

2. Howlader N, Noone AM, Krapcho M, et al. (eds). SEER Cancer Statistics Review, 1975-2008,

National Cancer Institute. Bethesda, MD, http://seer.cancer.gov/csr/1975_2008/, based on November

2010 SEER data submission, posted to the SEER web site, 2011.

3. Canadian Cancer Society’s Steering Committee on Cancer Statistics. Canadian Cancer Statistics

2011. Toronto, ON: Canadian Cancer Society; 2011.

4. Tada M, Nakai Y, Sasaki T, et al. Recent progress and limitations of chemotherapy for pancreatic

and biliary tract cancers. World J Clin Oncol. 2011 Mar 10;2(3):158-63.

5. Cleary SP, Gryfe R, Guindi M, et al. Prognostic factors in resected pancreatic adenocarcinoma:

analysis of actual 5-year survivors. J Am Coll Surg. 2004 May;198(5):722-31.

6. Sipos B, Frank S, Gress T, et al. Pancreatic intraepithelial neoplasia revisited and updated.

Pancreatology. 2009;9(1-2):45-54.

7. Hruban RH, Goggins M, Parsons J, et al. Progression model for pancreatic cancer. Clin Cancer Res.

2000 Aug;6(8):2969-72.

8. Yamaguchi K, Yokohata K, Noshiro H, et al. Mucinous cystic neoplasm of the pancreas or intraductal

papillary-mucinous tumour of the pancreas. Eur J Surg. 2000;166(2):141-8.

9. Tanaka M, Chari S, Adsay NV, et al. International consensus guidelines for management of

intraductal papillary mucinous neoplasms and mucinous cystic neoplasms of the pancreas.

Pancreatology. 2006;6(1-2):17-32.

10. Canto MI, Goggins M, Yeo CJ, et al. Screening for pancreatic neoplasia in high-risk individuals: an

EUS-based approach. Clin Gastroenterol Hepatol. 2004 Jul;2(7):606-21.

11. Canto MI, Goggins M, Hruban RH, et al. Screening for early pancreatic neoplasia in high-risk

individuals: a prospective controlled study. Clin Gastroenterol Hepatol. 2006 Jun;4(6):766-81.

12. Abe K, Suda K, Arakawa A, et al. Different patterns of p16INK4A and p53 protein expressions in

intraductal papillary-mucinous neoplasms and pancreatic intraepithelial neoplasia. Pancreas. 2007

Jan;34(1):85-91.

13. Tanno S, Nakano Y, Nishikawa T, et al. Natural history of branch duct intraductal papillary-mucinous

neoplasms of the pancreas without mural nodules: long-term follow-up results. Gut. 2008

Mar;57(3):339-43.

14. Al-Sukhni W, Borgida A, Rothenmund H, et al. Screening for pancreatic cancer in a high-risk cohort:

an eight-year experience. J Gastrointest Surg. 2012 Apr;16(4):771-83.

15. Jeurnink SM, Vleggaar FP, Siersema PD. Overview of the clinical problem: facts and current issues

of mucinous cystic neoplasms of the pancreas. Dig Liver Dis. 2008 Nov;40(11):837-46.

16. Maitra A, Hruban RH. Pancreatic cancer. Annu Rev Pathol. 2008;3:157-88.

17. Maitra A, Fukushima N, Takaori K, et al. Precursors to invasive pancreatic cancer. Adv Anat Pathol.

2005 Mar;12(2):81-91.

18. Calhoun ES, Jones JB, Ashfaq R, et al. BRAF and FBXW7 (CDC4, FBW7, AGO, SEL10) mutations

in distinct subsets of pancreatic cancer: potential therapeutic targets. Am J Pathol. 2003

Oct;163(4):1255-60.

19. Cheng JQ, Ruggeri B, Klein WM, et al. Amplification of AKT2 in human pancreatic cells and

inhibition of AKT2 expression and tumorigenicity by antisense RNA. Proc Natl Acad Sci U S A.

1996 Apr 16;93(8):3636-41.

20. Morris JP 4th, Wang SC, Hebrok M. KRAS, Hedgehog, Wnt and the twisted developmental biology

of pancreatic ductal adenocarcinoma. Nat Rev Cancer. 2010 Oct;10(10):683-95.

21. Thayer SP, di Magliano MP, Heiser PW, et al. Hedgehog is an early and late mediator of pancreatic

cancer tumorigenesis. Nature. 2003 Oct 23;425(6960):851-6.

22. Satoh K, Kanno A, Hamada S, et al. Expression of Sonic hedgehog signaling pathway correlates with

the tumorigenesis of intraductal papillary mucinous neoplasm of the pancreas. Oncol Rep. 2008

May;19(5):1185-90.

23. Morton JP, Mongeau ME, Klimstra DS, et al. Sonic hedgehog acts at multiple stages during

pancreatic tumorigenesis. Proc Natl Acad Sci U S A. 2007 Mar 20;104(12):5103-8.

24. Dai J, Ai K, Du Y, et al. Sonic hedgehog expression correlates with distant metastasis in pancreatic

adenocarcinoma. Pancreas. 2011 Mar;40(2):233-6.

25. Feldmann G, Karikari C, dal Molin M, et al. Inactivation of Brca2 cooperates with Trp53(R172H) to

induce invasive pancreatic ductal adenocarcinomas in mice: a mouse model of familial pancreatic

cancer. Cancer Biol Ther. 2011 Jun 1;11(11):959-68.

26. Maitra A, Hruban RH. Pancreatic cancer. Annu Rev Pathol. 2008;3:157-88.

27. Redston MS, Caldas C, Seymour AB, et al. p53 mutations in pancreatic carcinoma and evidence of

common involvement of homocopolymer tracts in DNA microdeletions. Cancer Res. 1994;54:3025-33.

28. Iacobuzio-Donahue CA, Klimstra DS, et al. Dpc-4 protein is expressed in virtually all human

intraductal papillary mucinous neoplasms of the pancreas: comparison with conventional ductal

carcinomas. Am J Pathol. 2000;157(3):755-61.

29. Blackford A, Serrano OK, Wolfgang CL, et al. SMAD4 gene mutations are associated with poor

prognosis in pancreatic cancer. Clin Cancer Res. 2009 Jul 15;15(14):4674-9.

30. van Heek NT, Meeker AK, Kern SE, et al. Telomere shortening is nearly universal in pancreatic

intraepithelial neoplasia. Am J Pathol. 2002;161:1541-7.

31. Siveke JT, Schmid RM. Chromosomal instability in mouse metastatic pancreatic cancer--it's Kras

and Tp53 after all. Cancer Cell. 2005 May;7(5):405-7.

32. Hiyama E, Kodama T, Shinbara K, et al. Telomerase activity is detected in pancreatic cancer but not

in benign tumors. Cancer Res. 1997 Jan 15;57(2):326-31.

33. Sato N, Goggins M. The role of epigenetic alterations in pancreatic cancer. J Hepatobiliary Pancreat

Surg. 2006;13:286-95.

34. Sato N, Maitra A, Fukushima N, et al. Frequent hypomethylation of multiple genes overexpressed in

pancreatic ductal adenocarcinoma. Cancer Res. 2003;63:4158-66.

35. Szafranska AE, Davison TS, John J, et al. MicroRNA expression alterations are linked to

tumorigenesis and non-neoplastic processes in pancreatic ductal adenocarcinoma. Oncogene. 2007;26:4442-52.

36. Erkan M, Reiser-Erkan C, Michalski CW, et al. Tumor microenvironment and progression of

pancreatic cancer. Exp Oncol. 2010 Sep;32(3):128-31.

37. Jones S, Zhang X, Parsons DW, et al. Core signaling pathways in human pancreatic cancers revealed

by global genomic analyses. Science. 2008 Sep 26;321(5897):1801-6.

38. Campbell PJ, Yachida S, Mudie LJ, et al. The patterns and dynamics of genomic instability in

metastatic pancreatic cancer. Nature. 2010 Oct 28;467(7319):1109-13.

39. Fuchs CS, Colditz GA, Stampfer MJ, et al. A prospective study of cigarette smoking and the risk of

pancreatic cancer. Arch Intern Med. 1996 Oct 28;156(19):2255-60.

40. Genkinger JM, Spiegelman D, Anderson KE, et al. Alcohol intake and pancreatic cancer risk: a

pooled analysis of fourteen cohort studies. Cancer Epidemiol Biomarkers Prev. 2009 Mar;18(3):765-76.

41. Santibañez M, Vioque J, Alguacil J, et al. Occupational exposures and risk of pancreatic cancer. Eur J

Epidemiol. 2010 Oct;25(10):721-30.

42. Huxley R, Ansary-Moghaddam A, Berrington de González A, et al. Type-II diabetes and pancreatic

cancer: a meta-analysis of 36 studies. Br J Cancer. 2005;92:2076-83.

43. Risch HA, Yu H, Lu L, Kidd MS. ABO blood group, Helicobacter pylori seropositivity, and risk of

pancreatic cancer: a case-control study. J Natl Cancer Inst. 2010 Apr 7;102(7):502-5.

44. Talamini G, Falconi M, Bassi C, et al. Incidence of cancer in the course of chronic pancreatitis. Am J

Gastroenterol. 1999 May;94(5):1253-60.

45. Eppel A, Cotterchio M, Gallinger S. Allergies are associated with reduced pancreas cancer risk: A

population-based case-control study in Ontario, Canada. Int J Cancer. 2007 Nov 15;121(10):2241-5.

46. Bao Y, Ng K, Wolpin BM, et al. Predicted vitamin D status and pancreatic cancer risk in two

prospective cohort studies. Br J Cancer. 2010 Apr 27;102(9):1422-7.

47. Stolzenberg-Solomon RZ, Jacobs EJ, Arslan AA, et al. Circulating 25-hydroxyvitamin D and risk of

pancreatic cancer: Cohort Consortium Vitamin D Pooling Project of Rarer Cancers. Am J Epidemiol.

2010 Jul 1;172(1):81-93.

48. Jansen RJ, Robinson DP, Stolzenberg-Solomon RZ, et al. Fruit and vegetable consumption is

inversely associated with having pancreatic cancer. Cancer Causes Control. 2011 Dec;22(12):1613-25.

49. Jiao L, Mitrou PN, Reedy J, et al. A combined healthy lifestyle score and risk of pancreatic cancer in

a large cohort study. Arch Intern Med. 2009 Apr 27;169(8):764-70.

50. Prizment AE, Gross M, Rasmussen-Torvik L, et al. Genes related to diabetes may be associated with

pancreatic cancer in a population-based case-control study in Minnesota. Pancreas. 2012

Jan;41(1):50-3.

51. Dong X, Li Y, Tang H, et al. Insulin-like growth factor axis gene polymorphisms modify risk of

pancreatic cancer. Cancer Epidemiol. 2012 Apr;36(2):206-11.

52. Li D, Tanaka M, Brunicardi FC, et al. Association between somatostatin receptor 5 gene

polymorphisms and pancreatic cancer risk and survival. Cancer. 2011 Jul 1;117(13):2863-72.

53. Dong X, Li Y, Chang P, et al. DNA mismatch repair network gene polymorphism as a susceptibility

factor for pancreatic cancer. Mol Carcinog. 2011 Jun 16. doi: 10.1002/mc.20817.

54. Pierce BL, Ahsan H. Genome-wide "pleiotropy scan" identifies HNF1A region as a novel pancreatic

cancer susceptibility locus. Cancer Res. 2011 Jul 1;71(13):4352-8.

55. Theodoropoulos GE, Panoussopoulos GS, Michalopoulos NV, et al. Analysis of the stromal cell-

derived factor 1-3'A gene polymorphism in pancreatic cancer. Mol Med Report. 2010 Jul-

Aug;3(4):693-8.

56. Pierce BL, Austin MA, Ahsan H. Association study of type 2 diabetes genetic susceptibility variants

and risk of pancreatic cancer: an analysis of PanScan-I data. Cancer Causes Control. 2011

Jun;22(6):877-83.

57. Mazaki T, Masuda H, Takayama T. Polymorphisms and pancreatic cancer risk: a meta-analysis. Eur

J Cancer Prev. 2011 May;20(3):169-83.

58. Dong X, Li Y, Chang P, et al. Glucose metabolism gene variants modulate the risk of pancreatic

cancer. Cancer Prev Res (Phila). 2011 May;4(5):758-66.

59. Diergaarde B, Brand R, Lamb J, et al. Pooling-based genome-wide association study implicates

gamma-glutamyltransferase 1 (GGT1) gene in pancreatic carcinogenesis. Pancreatology. 2010;10(2-

3):194-200.

60. Theodoropoulos GE, Michalopoulos NV, Panoussopoulos SG, et al. Effects of caspase-9 and survivin gene polymorphisms in pancreatic cancer risk and tumor characteristics. Pancreas. 2010 Oct;39(7):976-80.

61. Fong PY, Fesinmeyer MD, White E, et al. Association of diabetes susceptibility gene calpain-10 with pancreatic cancer among smokers. J Gastrointest Cancer. 2010 Sep;41(3):203-8.

62. Chen J, Amos CI, Merriman KW, et al. Genetic variants of p21 and p27 and pancreatic cancer risk in non-Hispanic Whites: a case-control study. Pancreas. 2010 Jan;39(1):1-4.

63. Vrana D, Novotny J, Holcatova I, et al. CYP1B1 gene polymorphism modifies pancreatic cancer risk but not survival. Neoplasma. 2010;57(1):15-9.

64. McWilliams RR, Petersen GM, Rabe KG, et al. Cystic fibrosis transmembrane conductance regulator (CFTR) gene mutations and risk for pancreatic adenocarcinoma. Cancer. 2010 Jan 1;116(1):203-9.

65. Vrana D, Pikhart H, Mohelnikova-Duchonova B, et al. The association between glutathione S-transferase gene polymorphisms and pancreatic cancer in a central European Slavonic population. Mutat Res. 2009 Nov-Dec;680(1-2):78-81.

66. Duell EJ, Holly EA, Kelsey KT, et al. Genetic variation in CYP17A1 and pancreatic cancer in a population-based case-control study in the San Francisco Bay Area, California. Int J Cancer. 2010 Feb 1;126(3):790-5.

67. Fesinmeyer MD, Stanford JL, Brentnall TA, et al. Association between the peroxisome proliferator-activated receptor gamma Pro12Ala variant and haplotype and pancreatic cancer in a high-risk cohort of smokers: a pilot study. Pancreas. 2009 Aug;38(6):631-7.

68. Zhao D, Xu D, Zhang X, et al. Interaction of cyclooxygenase-2 variants and smoking in pancreatic cancer: a possible role of nucleophosmin. Gastroenterology. 2009 May;136(5):1659-68.

69. McWilliams RR, Bamlet WR, de Andrade M, et al. Nucleotide excision repair pathway polymorphisms and pancreatic cancer risk: evidence for role of MMS19L. Cancer Epidemiol Biomarkers Prev. 2009 Apr;18(4):1295-302.

70. Hamacher R, Diersch S, Scheibel M, et al. Interleukin 1 beta gene promoter SNPs are associated with risk of pancreatic cancer. Cytokine. 2009 May;46(2):182-6.

71. Li D, Suzuki H, Liu B, et al. DNA repair gene polymorphisms and risk of pancreatic cancer. Clin Cancer Res. 2009 Jan 15;15(2):740-6.

72. Suzuki H, Li Y, Dong X, et al. Effect of insulin-like growth factor gene polymorphisms alone or in interaction with diabetes on the risk of pancreatic cancer. Cancer Epidemiol Biomarkers Prev. 2008 Dec;17(12):3467-73.

73. Suzuki T, Matsuo K, Sawaki A, et al. Alcohol drinking and one-carbon metabolism-related gene polymorphisms on pancreatic cancer risk. Cancer Epidemiol Biomarkers Prev. 2008 Oct;17(10):2742-7.

74. Ohnami S, Sato Y, Yoshimura K, et al. His595Tyr polymorphism in the methionine synthase reductase (MTRR) gene is associated with pancreatic cancer risk. Gastroenterology. 2008 Aug;135(2):477-88.

75. Yang M, Sun T, Wang L, et al. Functional variants in cell death pathway genes and risk of pancreatic cancer. Clin Cancer Res. 2008 May 15;14(10):3230-6.

76. Ayaz L, Ercan B, Dirlik M, et al. The association between N-acetyltransferase 2 gene polymorphisms and pancreatic cancer. Cell Biochem Funct. 2008 Apr;26(3):329-33.

77. Jiao L, Hassan MM, Bondy ML, et al. XRCC2 and XRCC3 gene polymorphism and risk of pancreatic cancer. Am J Gastroenterol. 2008 Feb;103(2):360-7.

78. Jiao L, Hassan MM, Bondy ML, et al. The XPD Asp312Asn and Lys751Gln polymorphisms, corresponding haplotype, and pancreatic cancer risk. Cancer Lett. 2007 Jan 8;245(1-2):61-8.

79. Wang L, Miao X, Tan W, et al. Genetic polymorphisms in methylenetetrahydrofolate reductase and thymidylate synthase and risk of pancreatic cancer. Clin Gastroenterol Hepatol. 2005 Aug;3(8):743-51.

80. Li D, Jiao L, Li Y, et al. Polymorphisms of cytochrome P4501A2 and N-acetyltransferase genes, smoking, and risk of pancreatic cancer. Carcinogenesis. 2006 Jan;27(1):103-11.

81. Bartsch DK, Fendrich V, Slater EP, et al. RNASEL germline variants are associated with pancreatic cancer. Int J Cancer. 2005 Dec 10;117(5):718-22.

82. Ockenga J, Vogel A, Teich N, et al. UDP glucuronosyltransferase (UGT1A7) gene polymorphisms increase the risk of chronic pancreatitis and pancreatic cancer. Gastroenterology. 2003 Jun;124(7):1802-8.

83. Duell EJ, Holly EA, Bracci PM, et al. A population-based study of the Arg399Gln polymorphism in X-ray repair cross-complementing group 1 (XRCC1) and risk of pancreatic adenocarcinoma. Cancer Res. 2002 Aug 15;62(16):4630-6.

84. Amundadottir L, Kraft P, Stolzenberg-Solomon RZ, et al. Genome-wide association study identifies variants in the ABO locus associated with susceptibility to pancreatic cancer. Nat Genet. 2009 Sep;41(9):986-90.

85. Petersen GM, Amundadottir L, Fuchs CS, et al. A genome-wide association study identifies pancreatic cancer susceptibility loci on chromosomes 13q22.1, 1q32.1 and 5p15.33. Nat Genet. 2010 Mar;42(3):224-8.

86. Low SK, Kuchiba A, Zembutsu H, et al. Genome-wide association study of pancreatic cancer in Japanese population. PLoS One. 2010 Jul 29;5(7):e11824.

87. Wu C, Miao X, Huang L, et al. Genome-wide association study identifies five loci associated with susceptibility to pancreatic cancer in Chinese populations. Nat Genet. 2011 Dec 11;44(1):62-6.

88. Wolpin BM, Kraft P, Gross M, et al. Pancreatic cancer risk and ABO blood group alleles: results from the pancreatic cancer cohort consortium. Cancer Res. 2010 Feb 1;70(3):1015-23.

89. Risch HA, Yu H, Lu L, et al. ABO blood group, Helicobacter pylori seropositivity, and risk of pancreatic cancer: a case-control study. J Natl Cancer Inst. 2010 Apr 7;102(7):502-5.

90. Greer JB, Yazer MH, Raval JS, et al. Significant association between ABO blood group and pancreatic cancer. World J Gastroenterol. 2010 Nov 28;16(44):5588-91.

91. Iodice S, Maisonneuve P, Botteri E, et al. ABO blood group and cancer. Eur J Cancer. 2010 Dec;46(18):3345-50.

92. Wolpin BM, Kraft P, Xu M, et al. Variant ABO blood group alleles, secretor status, and risk of pancreatic cancer: results from the pancreatic cancer cohort consortium. Cancer Epidemiol Biomarkers Prev. 2010 Dec;19(12):3140-9.

93. Ben Q, Wang K, Yuan Y, et al. Pancreatic cancer incidence and outcome in relation to ABO blood groups among Han Chinese patients: a case-control study. Int J Cancer. 2011 Mar 1;128(5):1179-86.

94. Nakao M, Matsuo K, Hosono S, et al. ABO blood group alleles and the risk of pancreatic cancer in a Japanese population. Cancer Sci. 2011 May;102(5):1076-80.

95. Wang DS, Chen DL, Ren C, et al. ABO blood group, hepatitis B viral infection and risk of pancreatic cancer. Int J Cancer. 2011 Aug 19. doi: 10.1002/ijc.26376. [Epub ahead of print]

96. Aird I, Lee DR, Roberts JA. ABO blood groups and cancer of oesophagus, cancer of pancreas, and pituitary adenoma. Br Med J. 1960 Apr 16;1(5180):1163-6.

97. Lennon AM, Klein AP, Goggins M. ABO blood group and other genetic variants associated with pancreatic cancer. Genome Med. 2010 Jun 22;2(6):39.

98. Giardiello FM, Welsh SB, Hamilton SR, et al. Increased risk of cancer in the Peutz-Jeghers syndrome. N Engl J Med. 1987 Jun 11;316(24):1511-4.

99. Giardiello FM, Brensinger JD, Tersmette AC, et al. Very high risk of cancer in familial Peutz-Jeghers syndrome. Gastroenterology. 2000 Dec;119(6):1447-53.

100. Lowenfels AB, Maisonneuve P, Cavallini G, et al. Pancreatitis and the risk of pancreatic cancer. International Pancreatitis Study Group. N Engl J Med. 1993 May 20;328(20):1433-7.

101. Lowenfels AB, Maisonneuve P, DiMagno EP, et al. Hereditary pancreatitis and the risk of pancreatic cancer. International Hereditary Pancreatitis Study Group. J Natl Cancer Inst. 1997 Mar 19;89(6):442-6.

102. de Snoo FA, Riedijk SR, van Mil AM, et al. Genetic testing in familial melanoma: uptake and implications. Psychooncology. 2008 Aug;17(8):790-6.

103. Hahn SA, Greenhalf B, Ellis I, et al. BRCA2 germline mutations in familial pancreatic carcinoma. J Natl Cancer Inst. 2003 Feb 5;95(3):214-21.

104. Murphy KM, Brune KA, Griffin C, et al. Evaluation of candidate genes MAP2K4, MADH4, ACVR1B, and BRCA2 in familial pancreatic cancer: deleterious BRCA2 mutations in 17%. Cancer Res. 2002 Jul 1;62(13):3789-93.

105. Martin ST, Matsubayashi H, Rogers CD, et al. Increased prevalence of the BRCA2 polymorphic stop codon K3326X among individuals with familial pancreatic cancer. Oncogene. 2005 May 19;24(22):3652-6.

106. Stadler ZK, Salo-Mullen E, Patil SM, et al. Prevalence of BRCA1 and BRCA2 mutations in Ashkenazi Jewish families with breast and pancreatic cancer. Cancer. 2012 Jan 15;118(2):493-9.

107. Ghiorzo P, Pensotti V, Fornarini G, et al. Contribution of germline mutations in the BRCA and PALB2 genes to pancreatic cancer in Italy. Fam Cancer. 2012 Mar;11(1):41-47.

108. Schneider R, Slater EP, Sina M, et al. German national case collection for familial pancreatic cancer (FaPaCa): ten years experience. Fam Cancer. 2011 Jun;10(2):323-30.

109. Slater EP, Langer P, Fendrich V, et al. Prevalence of BRCA2 and CDKN2a mutations in German familial pancreatic cancer families. Fam Cancer. 2010 Sep;9(3):335-43.

110. Cho JH, Bang S, Park SW, et al. BRCA2 mutations as a universal risk factor for pancreatic cancer has a limited role in Korean ethnic group. Pancreas. 2008 May;36(4):337-40.

111. Real FX, Malats N, Lesca G, et al. Family history of cancer and germline BRCA2 mutations in sporadic exocrine pancreatic cancer. Gut. 2002 May;50(5):653-7.

112. Greer JB, Whitcomb DC. Role of BRCA1 and BRCA2 mutations in pancreatic cancer. Gut. 2007 May;56(5):601-5.

113. Goggins M, Schutte M, Lu J, et al. Germline BRCA2 gene mutations in patients with apparently sporadic pancreatic carcinomas. Cancer Res. 1996 Dec 1;56(23):5360-4.

114. Wooster R, Neuhausen SL, Mangion J, et al. Localization of a breast cancer susceptibility gene, BRCA2, to chromosome 13q12-13. Science. 1994 Sep 30;265(5181):2088-90.

115. Schutte M, da Costa LT, Hahn SA, et al. Identification by representational difference analysis of a homozygous deletion in pancreatic carcinoma that lies within the BRCA2 region. Proc Natl Acad Sci U S A. 1995 Jun 20;92(13):5950-4.

116. Schutte M, Rozenblum E, Moskaluk CA, et al. An integrated high-resolution physical map of the DPC/BRCA2 region at chromosome 13q12. Cancer Res. 1995 Oct 15;55(20):4570-4.

117. Jones S, Hruban RH, Kamiyama M, et al. Exomic sequencing identifies PALB2 as a pancreatic cancer susceptibility gene. Science. 2009 Apr 10;324(5924):217.

118. Tischkowitz MD, Sabbaghian N, Hamel N, et al. Analysis of the gene coding for the BRCA2-interacting protein PALB2 in familial and sporadic pancreatic cancer. Gastroenterology. 2009 Sep;137(3):1183-6.

119. Slater EP, Langer P, Niemczyk E, et al. PALB2 mutations in European familial pancreatic cancer families. Clin Genet. 2010 Nov;78(5):490-4.

120. Adank MA, van Mil SE, Gille JJ, et al. PALB2 analysis in BRCA2-like families. Breast Cancer Res Treat. 2011 Jun;127(2):357-62.

121. Lal G, Liu G, Schmocker B, et al. Inherited predisposition to pancreatic adenocarcinoma: role of family history and germ-line p16, BRCA1, and BRCA2 mutations. Cancer Res. 2000 Jan 15;60(2):409-16.

122. Skudra S, Staka A, Pukitis A, et al. Association of genetic variants with pancreatic cancer. Cancer Genet Cytogenet. 2007;179:76-8.

123. Axilbund JE, Argani P, Kamiyama M, et al. Absence of germline BRCA1 mutations in familial pancreatic cancer patients. Cancer Biol Ther. 2009 Jan;8(2):131-5.

124. Roberts NJ, Jiao Y, Yu J, et al. ATM mutations in patients with hereditary pancreatic cancer. Cancer Discov. 2012 Jan;2:41-46.

125. van der Heijden MS, Yeo CJ, Hruban RH, et al. Fanconi anemia gene mutations in young-onset pancreatic cancer. Cancer Res. 2003 May 15;63(10):2585-8.

126. Rogers CD, van der Heijden MS, Brune K, et al. The genetics of FANCC and FANCG in familial pancreatic cancer. Cancer Biol Ther. 2004 Feb;3(2):167-9.

127. Rogers CD, Couch FJ, Brune K, et al. Genetics of the FANCA gene in familial pancreatic cancer. J Med Genet. 2004 Dec;41(12):e126.

128. Couch FJ, Johnson MR, Rabe K, et al. Germ line Fanconi anemia complementation group C mutations and pancreatic cancer. Cancer Res. 2005 Jan 15;65(2):383-6.

129. Gargiulo S, Torrini M, Ollila S, et al. Germline MLH1 and MSH2 mutations in Italian pancreatic cancer patients with suspected Lynch syndrome. Fam Cancer. 2009;8(4):547-53.

130. Kastrinos F, Mukherjee B, Tayob N, et al. Risk of pancreatic cancer in families with Lynch syndrome. JAMA. 2009 Oct 28;302(16):1790-5.

131. Kempers MJ, Kuiper RP, Ockeloen CW, et al. Risk of colorectal and endometrial cancers in EPCAM deletion-positive Lynch syndrome: a cohort study. Lancet Oncol. 2011 Jan;12(1):49-55.

132. Lindor NM, Petersen GM, Spurdle AB, et al. Pancreatic cancer and a novel MSH2 germline alteration. Pancreas. 2011 Oct;40(7):1138-40.

133. Ruijs MW, Verhoef S, Rookus MA, et al. TP53 germline mutation testing in 180 families suspected of Li-Fraumeni syndrome: mutation detection rate and relative frequency of cancers in different familial phenotypes. J Med Genet. 2010 Jun;47(6):421-8.

134. Groen EJ, Roos A, Muntinghe FL, et al. Extra-intestinal manifestations of familial adenomatous polyposis. Ann Surg Oncol. 2008 Sep;15(9):2439-50.

135. Sheldon CD, Hodson ME, Carpenter LM, et al. A cohort study of cystic fibrosis and malignancy. Br J Cancer. 1993 Nov;68(5):1025-8.

136. Hruban RH, Canto MI, Goggins M, et al. Update on familial pancreatic cancer. Adv Surg. 2010;44:293-311.

137. MacDermott RP, Kramer P. Adenocarcinoma of the pancreas in four siblings. Gastroenterology. 1973 Jul;65(1):137-9.

138. Friedman JM, Fialkow PJ. Carcinoma of the pancreas in four brothers. Birth Defects Orig Artic Ser. 1976;12(1):145-50.

139. Danes BS, Lynch HT. A familial aggregation of pancreatic cancer. An in vitro study. JAMA. 1982 May 28;247(20):2798-802.

140. Dat NM, Sontag SJ. Pancreatic carcinoma in brothers. Ann Intern Med. 1982 Aug;97(2):282.

141. Grajower MM. Familial pancreatic cancer. Ann Intern Med. 1983 Jan;98(1):111.

142. Ehrenthal D, Haeger L, Griffin T, et al. Familial pancreatic adenocarcinoma in three generations. A case report and a review of the literature. Cancer. 1987 May 1;59(9):1661-4.

143. Lynch HT, Fitzsimmons ML, Smyrk TC, et al. Familial pancreatic cancer: clinicopathologic study of 18 nuclear families. Am J Gastroenterol. 1990 Jan;85(1):54-60.

144. Ghadirian P, Boyle P, Simard A, et al. Reported family aggregation of pancreatic cancer within a population-based case-control study in the Francophone community in Montreal, Canada. Int J Pancreatol. 1991 Nov-Dec;10(3-4):183-96.

145. Fernandez E, La Vecchia C, D'Avanzo B, et al. Family history and the risk of liver, gallbladder, and pancreatic cancer. Cancer Epidemiol Biomarkers Prev. 1994 Apr-May;3(3):209-12.

146. Silverman DT, Schiffman M, Everhart J, et al. Diabetes mellitus, other medical conditions and familial history of cancer as risk factors for pancreatic cancer. Br J Cancer. 1999 Aug;80(11):1830-7.

147. Schenk M, Schwartz AG, O'Neal E, et al. Familial risk of pancreatic cancer. J Natl Cancer Inst. 2001 Apr 18;93(8):640-4.

148. Ghadirian P, Liu G, Gallinger S, et al. Risk of pancreatic cancer among individuals with a family history of cancer of the pancreas. Int J Cancer. 2002 Feb 20;97(6):807-10.

149. Inoue M, Tajima K, Takezaki T, et al. Epidemiology of pancreatic cancer in Japan: a nested case-control study from the Hospital-based Epidemiologic Research Program at Aichi Cancer Center (HERPACC). Int J Epidemiol. 2003 Apr;32(2):257-62.

150. Rulyak SJ, Lowenfels AB, Maisonneuve P, et al. Risk factors for the development of pancreatic cancer in familial pancreatic cancer kindreds. Gastroenterology. 2003 May;124(5):1292-9.

151. Cote ML, Schenk M, Schwartz AG, et al. Risk of other cancers in individuals with a family history of pancreas cancer. J Gastrointest Cancer. 2007;38(2-4):119-26.

152. Hassan MM, Bondy ML, Wolff RA, et al. Risk factors for pancreatic cancer: case-control study. Am J Gastroenterol. 2007 Dec;102(12):2696-707.

153. Jacobs EJ, Chanock SJ, Fuchs CS, et al. Family history of cancer and risk of pancreatic cancer: a pooled analysis from the Pancreatic Cancer Cohort Consortium (PanScan). Int J Cancer. 2010 Sep 1;127(6):1421-8.

154. Matsubayashi H, Maeda A, Kanemoto H, et al. Risk factors of familial pancreatic cancer in Japan: current smoking and recent onset of diabetes. Pancreas. 2011 Aug;40(6):974-8.

155. Coughlin SS, Calle EE, Patel AV, et al. Predictors of pancreatic cancer mortality among a large cohort of United States adults. Cancer Causes Control. 2000 Dec;11(10):915-23.

156. Tersmette AC, Petersen GM, Offerhaus GJ, et al. Increased risk of incident pancreatic cancer among first-degree relatives of patients with familial pancreatic cancer. Clin Cancer Res. 2001 Mar;7(3):738-44.

157. Hemminki K, Li X. Familial and second primary pancreatic cancers: a nationwide epidemiologic study from Sweden. Int J Cancer. 2003 Feb 10;103(4):525-30.

158. Klein AP, Brune KA, Petersen GM, et al. Prospective risk of pancreatic cancer in familial pancreatic cancer kindreds. Cancer Res. 2004 Apr 1;64(7):2634-8.

159. Jacobs EJ, Rodriguez C, Newton CC, et al. Family history of various cancers and pancreatic cancer mortality in a large cohort. Cancer Causes Control. 2009 Oct;20(8):1261-9.

160. Brune KA, Lau B, Palmisano E, et al. Importance of age of onset in pancreatic cancer kindreds. J Natl Cancer Inst. 2010 Jan 20;102(2):119-26.

161. Klein AP, Beaty TH, Bailey-Wilson JE, et al. Evidence for a major gene influencing risk of pancreatic cancer. Genet Epidemiol. 2002 Aug;23(2):133-49.

162. Lynch HT, Fusaro L, Lynch JF. Familial pancreatic cancer: a family study. Pancreas. 1992;7(5):511-5.

163. Bartsch DK, Kress R, Sina-Frey M, et al. Prevalence of familial pancreatic cancer in Germany. Int J Cancer. 2004 Jul 20;110(6):902-6.

164. James TA, Sheldon DG, Rajput A, et al. Risk factors associated with earlier age of onset in familial pancreatic carcinoma. Cancer. 2004 Dec 15;101(12):2722-6.

165. Petersen GM, de Andrade M, Goggins M, et al. Pancreatic cancer genetic epidemiology consortium. Cancer Epidemiol Biomarkers Prev. 2006 Apr;15(4):704-10.

166. McFaul CD, Greenhalf W, Earl J, et al. Anticipation in familial pancreatic cancer. Gut. 2006 Feb;55(2):252-8.

167. Rieder H, Sina-Frey M, Ziegler A, et al. German national case collection of familial pancreatic cancer - clinical-genetic analysis of the first 21 families. Onkologie. 2002 Jun;25(3):262-6.

168. Rulyak SJ, Lowenfels AB, Maisonneuve P, et al. Risk factors for the development of pancreatic cancer in familial pancreatic cancer kindreds. Gastroenterology. 2003 May;124(5):1292-9.

169. Schneider R, Slater EP, Sina M, et al. German national case collection for familial pancreatic cancer (FaPaCa): ten years experience. Fam Cancer. 2011 Jun;10(2):323-30.

170. Olson SH, Chou JF, Ludwig E, et al. Allergies, obesity, other risk factors and survival from pancreatic cancer. Int J Cancer. 2010 Nov 15;127(10):2412-9.

171. Barton JG, Schnelldorfer T, Lohse CM, et al. Patterns of pancreatic resection differ between patients with familial and sporadic pancreatic cancer. J Gastrointest Surg. 2011 May;15(5):836-42.

172. Ji J, Forsti A, Sundquist J, et al. Survival in familial pancreatic cancer. Pancreatology. 2008;8(3):252-6.

173. Yeo TP, Hruban RH, Brody J, et al. Assessment of "gene-environment" interaction in cases of familial and sporadic pancreatic cancer. J Gastrointest Surg. 2009 Aug;13(8):1487-94.

174. Fogelman DR, Wolff RA, Kopetz S, et al. Evidence for the efficacy of Iniparib, a PARP-1 inhibitor, in BRCA2-associated pancreatic cancer. Anticancer Res. 2011 Apr;31(4):1417-20.

175. Villarroel MC, Rajeshkumar NV, Garrido-Laguna I, et al. Personalizing cancer treatment in the age of global genomic analyses: PALB2 gene mutations and the response to DNA damaging agents in pancreatic cancer. Mol Cancer Ther. 2011 Jan;10(1):3-8.

176. James E, Waldron-Lynch MG, Saif MW. Prolonged survival in a patient with BRCA2 associated metastatic pancreatic cancer after exposure to camptothecin: a case report and review of literature. Anticancer Drugs. 2009 Aug;20(7):634-8.

177. Sonnenblick A, Kadouri L, Appelbaum L, et al. Complete remission, in BRCA2 mutation carrier with metastatic pancreatic adenocarcinoma, treated with cisplatin based therapy. Cancer Biol Ther. 2011 Aug 1;12(3):165-8.

178. Lowery MA, Kelsen DP, Stadler ZK, et al. An emerging entity: pancreatic adenocarcinoma associated with a known BRCA mutation: clinical descriptors, treatment implications, and future directions. Oncologist. 2011;16(10):1397-402.

179. Shi C, Klein AP, Goggins M, et al. Increased Prevalence of Precursor Lesions in Familial Pancreatic Cancer Patients. Clin Cancer Res. 2009 Dec 15;15(24):7737-7743.

180. Brune K, Abe T, Canto M, et al. Multifocal neoplastic precursor lesions associated with lobular atrophy of the pancreas in patients having a strong family history of pancreatic cancer. Am J Surg Pathol. 2006 Sep;30(9):1067-76.

181. Abe T, Fukushima N, Brune K, et al. Genome-wide allelotypes of familial pancreatic adenocarcinomas and familial and sporadic intraductal papillary mucinous neoplasms. Clin Cancer Res. 2007 Oct 15;13(20):6019-25.

182. Iacobuzio-Donahue CA, van der Heijden MS, Baumgartner MR, et al. Large-scale allelotype of pancreaticobiliary carcinoma provides quantitative estimates of genome-wide allelic loss. Cancer Res. 2004 Feb 1;64(3):871-5.

183. Calhoun ES, Hucl T, Gallmeier E, et al. Identifying allelic loss and homozygous deletions in pancreatic cancer without matched normals using high-density single-nucleotide polymorphism arrays. Cancer Res. 2006 Aug 15;66(16):7920-8.

184. Brune K, Hong SM, Li A, et al. Genetic and epigenetic alterations of familial pancreatic cancers. Cancer Epidemiol Biomarkers Prev. 2008 Dec;17(12):3536-42.

185. Bodmer WF, Bailey CJ, Bodmer J, et al. Localization of the gene for familial adenomatous polyposis on chromosome 5. Nature. 1987 Aug 13-19;328(6131):614-6.

186. Hall JM, Lee MK, Newman B, et al. Linkage of early-onset familial breast cancer to chromosome 17q21. Science. 1990 Dec 21;250(4988):1684-9.

187. Eberle MA, Pfützer R, Pogue-Geile KL, et al. A new susceptibility locus for autosomal dominant pancreatic cancer maps to chromosome 4q32-34. Am J Hum Genet. 2002 Apr;70(4):1044-8.

188. Earl J, Yan L, Vitone LJ, et al. Evaluation of the 4q32-34 locus in European familial pancreatic cancer. Cancer Epidemiol Biomarkers Prev. 2006 Oct;15(10):1948-55.

189. Klein AP, de Andrade M, Hruban RH, et al. Linkage analysis of chromosome 4 in families with familial pancreatic cancer. Cancer Biol Ther. 2007 Mar;6(3):320-3.

190. Pogue-Geile KL, Chen R, Bronner MP, et al. Palladin mutation causes familial pancreatic cancer and suggests a new cancer mechanism. PLoS Med. 2006 Dec;3(12):e516.

191. Salaria SN, Illei P, Sharma R, et al. Palladin is overexpressed in the non-neoplastic stroma of infiltrating ductal adenocarcinomas of the pancreas, but is only rarely overexpressed in neoplastic cells. Cancer Biol Ther. 2007 Mar;6(3):324-8.

192. Zogopoulos G, Rothenmund H, Eppel A, et al. The P239S palladin variant does not account for a significant fraction of hereditary or early onset pancreas cancer. Hum Genet. 2007 Jun;121(5):635-7.

193. Slater E, Amrillaeva V, Fendrich V, et al. Palladin mutation causes familial pancreatic cancer: absence in European families. PLoS Med. 2007 Apr;4(4):e164.

194. Klein AP, Borges M, Griffith M, et al. Absence of deleterious palladin mutations in patients with familial pancreatic cancer. Cancer Epidemiol Biomarkers Prev. 2009 Apr;18(4):1328-30.

195. Alkan C, Coe BP, Eichler EE. Genome structural variation discovery and genotyping. Nat Rev Genet. 2011 May;12(5):363-76.

196. Morrow EM. Genomic copy number variation in disorders of cognitive development. J Am Acad Child Adolesc Psychiatry. 2010 Nov;49(11):1091-104.

197. Sebat J, Lakshmi B, Troge J, et al. Large-scale copy number polymorphism in the human genome. Science. 2004;305:525-528.

198. Iafrate AJ, Feuk L, Rivera MN, et al. Detection of large-scale variation in the human genome. Nat Genet. 2004;36:949-51.

199. Sharp AJ, Locke DP, McGrath SD, et al. Segmental duplications and copy-number variation in the human genome. Am J Hum Genet. 2005;77:78-88.

200. Tuzun E, Sharp AJ, Bailey JA, et al. Fine-scale structural variation of the human genome. Nat Genet. 2005;37:727-32.

201. Conrad DF, Andrews TD, Carter NP, et al. A high-resolution survey of deletion polymorphism in the human genome. Nat Genet. 2006;38:75-81.

202. McCarroll SA, Hadnott TN, Perry GH, et al. Common deletion polymorphisms in the human genome. Nat Genet. 2006;38:86-92.

203. Hinds DA, Kloek AP, Jen M, et al. Common deletions and SNPs are in linkage disequilibrium in the human genome. Nat Genet. 2006;38:82-5.

204. Locke DP, Sharp AJ, McCarroll SA, et al. Linkage disequilibrium and heritability of copy-number polymorphisms within duplicated regions of the human genome. Am J Hum Genet. 2006;79:275-90.

205. Mills RE, Luttig CT, Larkins CE, et al. An initial map of insertion and deletion (INDEL) variation in the human genome. Genome Res. 2006;16:1182-90.

206. Redon R, Ishikawa S, Fitch KR, et al. Global variation in copy number in the human genome. Nature. 2006;444:444-54.

207. Simon-Sanchez J, Scholz S, Fung HC, et al. Genome-wide SNP assay reveals structural genomic variation, extended homozygosity and cell-line induced alterations in normal individuals. Hum Mol Genet. 2007;16:1-14.

208. Wong KK, deLeeuw RJ, Dosanjh NS, et al. A comprehensive analysis of common copy-number variations in the human genome. Am J Hum Genet. 2007;80:91-104.

209. Levy S, Sutton G, Ng PC, et al. The diploid genome sequence of an individual human. PLoS Biol. 2007;5:e254.

210. Korbel JO, Urban AE, Affourtit JP, et al. Paired-end mapping reveals extensive structural variation in the human genome. Science. 2007;318:420-6.

211. Pinto D, Marshall C, Feuk L, et al. Copy-number variation in control population cohorts. Hum Mol Genet. 2007;16 Spec No. 2:R168-73.

212. Wang K, Li M, Hadley D, et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 2007;17:1665-74.

213. Zogopoulos G, Ha KC, Naqib F, et al. Germ-line DNA copy number variation frequencies in a large North American population. Hum Genet. 2007;122:345-53.

214. deSmith AJ, Tsalenko A, Sampas N, et al. Array CGH analysis of copy number variation identifies 1284 new genes variant in healthy white males: implications for association studies of complex diseases. Hum Mol Genet. 2007;16:2783-94.

215. Jakobsson M, Scholz SW, Scheet P, et al. Genotype, haplotype and copy-number variation in worldwide human populations. Nature. 2008;451:998-1003.

216. Perry GH, Ben-Dor A, Tsalenko A, et al. The fine-scale and complex architecture of human copy-number variation. Am J Hum Genet. 2008;82:685-95.

217. Takahashi N, Tsuyama N, Sasaki K, et al. Segmental copy-number variation observed in Japanese by array-CGH. Ann Hum Genet. 2008;72:193-204.

218. Wheeler DA, Srinivasan M, Egholm M, et al. The complete genome of an individual by massively parallel DNA sequencing. Nature. 2008;452:872-6.

219. McCarroll SA, Kuruvilla FG, Korn JM, et al. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat Genet. 2008 Oct;40(10):1166-74.

220. Cooper GM, Zerr T, Kidd JM, et al. Systematic assessment of copy number variant detection via genome-wide SNP genotyping. Nat Genet. 2008 Oct;40(10):1199-203.

221. Kidd JM, Cooper GM, Donahue WF, et al. Mapping and sequencing of structural variation from eight human genomes. Nature. 2008;453:56-64.

222. Bentley DR, Balasubramanian S, Swerdlow HP, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008 Nov 6;456(7218):53-9.

223. Wang J, Wang W, Li R, et al. The diploid genome sequence of an Asian individual. Nature. 2008 Nov 6;456(7218):60-5.

224. Gusev A, Lowe JK, Stoffel M, et al. Whole population, genome-wide mapping of hidden relatedness. Genome Res. 2009 Feb;19(2):318-26.

225. Itsara A, Cooper GM, Baker C, et al. Population analysis of large copy number variants and hotspots of human genetic disease. Am J Hum Genet. 2009 Feb;84(2):148-61.

226. Shaikh TH, Gai X, Perin JC, et al. High-resolution mapping and analysis of copy number variations in the human genome: a data resource for clinical and research applications. Genome Res. 2009 Sep;19(9):1682-90.

227. Kim JI, Ju YS, Park H, et al. A highly annotated whole-genome sequence of a Korean individual. Nature. 2009 Aug 20;460(7258):1011-5.

228. Ahn SM, Kim TH, Lee S, et al. The first Korean genome sequence and analysis: full genome sequencing for a socio-ethnic group. Genome Res. 2009 Sep;19(9):1622-9.

229. Matsuzaki H, Wang PH, Hu J, et al. High resolution discovery and confirmation of copy number variants in 90 Yoruba Nigerians. Genome Biol. 2009;10(11):R125.

230. McKernan KJ, Peckham HE, Costa GL, et al. Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding. Genome Res. 2009 Sep;19(9):1527-41.

231. McElroy JP, Nelson MR, Caillier SJ, et al. Copy number variation in African Americans. BMC Genet. 2009 Mar 24;10:15.

232. Conrad DF, Pinto D, Redon R, et al. Origins and functional impact of copy number variation in the human genome. Nature. 2010 Apr 1;464(7289):704-12.

233. Alkan C, Kidd JM, Marques-Bonet T, et al. Personalized copy number and segmental duplication maps using next-generation sequencing. Nat Genet. 2009 Oct;41(10):1061-7.

234. Lin CH, Lin YC, Wu JY, et al. A genome-wide survey of copy number variations in Han Chinese residing in Taiwan. Genomics. 2009 Oct;94(4):241-6.

235. Li J, Yang T, Wang L, et al. Whole genome distribution and ethnic differentiation of copy number variation in Caucasian and Asian populations. PLoS One. 2009 Nov 23;4(11):e7958.

236. International HapMap 3 Consortium, Altshuler DM, Gibbs RA, et al. Integrating common and rare genetic variation in diverse human populations. Nature. 2010 Sep 2;467(7311):52-8.

237. Ju YS, Hong D, Kim S, et al. Reference-unbiased copy number variant analysis using CGH microarrays. Nucleic Acids Res. 2010 Nov;38(20):e190.

238. Pang AW, MacDonald JR, Pinto D, et al. Towards a comprehensive structural variation map of an individual human genome. Genome Biol. 2010;11(5):R52.

239. Park H, Kim JI, Ju YS, et al. Discovery of common Asian copy number variants using integrated high-resolution array CGH and massively parallel DNA sequencing. Nat Genet. 2010 May;42(5):400-5.

240. Teague B, Waterman MS, Goldstein S, et al. High-resolution human genome structure by single-molecule analysis. Proc Natl Acad Sci U S A. 2010 Jun 15;107(24):10848-53.

241. Kidd JM, Sampas N, Antonacci F, et al. Characterization of missing human genome sequences and copy-number polymorphic insertions. Nat Methods. 2010 May;7(5):365-71.

242. Kidd JM, Graves T, Newman TL, et al. A human genome structural variation sequencing resource reveals insights into mutational mechanisms. Cell. 2010 Nov 24;143(5):837-47.

243. Schuster SC, Miller W, Ratan A, et al. Complete Khoisan and Bantu genomes from southern Africa. Nature. 2010 Feb 18;463(7283):943-7.

244. Yim SH, Kim TM, Hu HJ, et al. Copy number variations in East-Asian population and their evolutionary and functional implications. Hum Mol Genet. 2010 Mar 15;19(6):1001-8.

245. Gayán J, Galan JJ, González-Pérez A, et al. Genetic structure of the Spanish population. BMC Genomics. 2010 May 25;11:326.

246. 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010 Oct 28;467(7319):1061-73.

247. Mills RE, Walter K, Stewart C, et al. Mapping copy number variation by population-scale genome sequencing. Nature. 2011 Feb 3;470(7332):59-65.

248. Chen W, Hayward C, Wright AF, et al. Copy number variation across European populations. PLoS One. 2011;6(8):e23087.

249. Moon S, Kim YJ, Hong CB, et al. Data-driven approach to detect common copy-number variations and frequency profiles in a population-based Korean cohort. Eur J Hum Genet. 2011 Nov;19(11):1167-72.

250. Firth HV, Richards SM, et al. DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources. Am J Hum Genet. 2009;84(4):524-33.

251. Feenstra I, Fang J, Koolen DA, et al. European Cytogeneticists Association Register of Unbalanced Chromosome Aberrations (ECARUCA); an online database for rare chromosome abnormalities. Eur J Med Genet. 2006 Jul-Aug;49(4):279-91.

252. Futreal PA, Coin L, Marshall M, et al. A census of human cancer genes. Nat Rev Cancer. 2004 Mar;4(3):177-83.

253. Cutts RJ, Gadaleta E, Hahn SA, et al. The Pancreatic Expression database: 2011 update. Nucleic Acids Res. 2011 Jan;39(Database issue):D1023-8.

254. Malcolm S. Microdeletion and microduplication syndromes. Prenat Diagn. 1996 Dec;16(13):1213-9.

255. Medvedev P, Stanciu M, Brudno M. Computational methods for discovering structural variation with next-generation sequencing. Nat Methods. 2009;6:S13-20.

256. Riley MC, Kirkup BC Jr, Johnson JD, et al. Rapid whole genome optical mapping of Plasmodium falciparum. Malar J. 2011 Aug 26;10:252.

257. Kim Y, Kim KS, Kounovsky KL, et al. Nanochannel confinement: DNA stretch approaching full contour length. Lab Chip. 2011 May 21;11(10):1721-9.

258. Xu MY, Aragon AD, Mascarenas MR, et al. Dual primer emulsion PCR for next-generation DNA sequencing. Biotechniques. 2010 May;48(5):409-12.

259. Winchester L, Yau C, Ragoussis J. Comparing CNV detection methods for SNP arrays. Brief Funct Genomic Proteomic. 2009 Sep;8(5):353-66.

260. Gautam P, Jha P, Kumar D, et al. Spectrum of large copy number variations in 26 diverse Indian populations: potential involvement in phenotypic diversity. Hum Genet. 2012 Jan;131(1):131-43.

261. Scherer SW, Lee C, Birney E, et al. Challenges and standards in integrating surveys of structural variation. Nat Genet. 2007 Jul;39(7 Suppl):S7-15.

262. Stankiewicz P, Pursley AN, Cheung SW. Challenges in clinical interpretation of microduplications detected by array CGH analysis. Am J Med Genet A. 2010 May;152A(5):1089-100.

263. Hastings PJ, Lupski JR, Rosenberg SM, et al. Mechanisms of change in gene copy number. Nat Rev Genet. 2009 Aug;10(8):551-64.

264. Lee C, Scherer SW. The clinical context of copy number variation in the human genome. Expert Rev Mol Med. 2010 Mar 9;12:e8.

265. Schrider DR, Hahn MW. Gene copy-number polymorphism in nature. Proc Biol Sci. 2010 Nov 7;277(1698):3213-21.

266. Nguyen DQ, Webber C, Hehir-Kwa J, et al. Reduced purifying selection prevails over positive selection in human copy number variant evolution. Genome Res. 2008 Nov;18(11):1711-23.

267. Perry GH, Dominy NJ, Claw KG, et al. Diet and the evolution of human amylase gene copy number variation. Nat Genet. 2007 Oct;39(10):1256-60.

268. Yim SH, Kim TM, Hu HJ, et al. Copy number variations in East-Asian population and their evolutionary and functional implications. Hum Mol Genet. 2010 Mar 15;19(6):1001-8.

269. Perry GH, Yang F, Marques-Bonet T, et al. Copy number variation and evolution in humans and chimpanzees. Genome Res. 2008 Nov;18(11):1698-710.

270. Stranger BE, Forrest MS, Dunning M, et al. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science. 2007 Feb 9;315(5813):848-53.

271. Schlattl A, Anders S, Waszak SM, et al. Relating CNVs to transcriptome data at fine resolution: assessment of the effect of variant size, type, and overlap with functional regions. Genome Res. 2011 Dec;21(12):2004-13.

272. Henrichsen CN, Vinckenbosch N, Zöllner S, et al. Segmental copy number variation shapes tissue transcriptomes. Nat Genet. 2009 Apr;41(4):424-9.

273. Guryev V, Saar K, Adamovic T, et al. Distribution and functional impact of DNA copy number variation in the rat. Nat Genet. 2008 May;40(5):538-45.

274. Zhou J, Lemos B, Dopman EB, et al. Copy-number variation: the balance between gene dosage and expression in Drosophila melanogaster. Genome Biol Evol. 2011;3:1014-24.

275. Nuytemans K, Meeus B, Crosiers D, et al. Relative contribution of simple mutations vs. copy number variations in five Parkinson disease genes in the Belgian population. Hum Mutat. 2009 Jul;30(7):1054-61.

276. Walters RG, Jacquemont S, Valsesia A, et al. A new highly penetrant form of obesity due to deletions on chromosome 16p11.2. Nature. 2010 Feb 4;463(7281):671-5.

277. Prescott NJ, Dominy KM, Kubo M, et al. Independent and population-specific association of risk variants at the IRGM locus with Crohn's disease. Hum Mol Genet. 2010 May 1;19(9):1828-39.

278. Wellcome Trust Case Control Consortium, Craddock N, Hurles ME, et al. Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls. Nature. 2010 Apr 1;464(7289):713-20.

279. de Cid R, Riveira-Munoz E, Zeeuwen PL, et al. Deletion of the late cornified envelope LCE3B and LCE3C genes as a susceptibility factor for psoriasis. Nat Genet. 2009 Feb;41(2):211-5.

280. Morris DL, Roberts AL, Witherden AS, et al. Evidence for both copy number and allelic (NA1/NA2) risk at the FCGR3B locus in systemic lupus erythematosus. Eur J Hum Genet. 2010 Sep;18(9):1027-31.

281. Gonzalez E, Kulkarni H, Bolivar H, et al. The influence of CCL3L1 gene-containing segmental duplications on HIV-1/AIDS susceptibility. Science. 2005 Mar 4;307(5714):1434-40.

282. O'Donovan MC, Kirov G, Owen MJ. Phenotypic variations on the theme of CNVs. Nat Genet. 2008 Dec;40(12):1392-3.

283. Itsara A, Wu H, Smith JD, et al. De novo rates and selection of large copy number variation. Genome Res. 2010 Nov;20(11):1469-81.

284. Piotrowski A, Bruder CE, Andersson R, et al. Somatic mosaicism for copy number variation in differentiated human tissues. Hum Mutat. 2008 Sep;29(9):1118-24.

285. Rodríguez-Santiago B, Malats N, Rothman N, et al. Mosaic uniparental disomies and aneuploidies as large structural variants of the human genome. Am J Hum Genet. 2010 Jul 9;87(1):129-38.

286. Bruder CE, Piotrowski A, Gijsbers AA, et al. Phenotypically concordant and discordant monozygotic twins display different DNA copy-number-variation profiles. Am J Hum Genet. 2008 Mar;82(3):763-71.

287. Sasaki H, Emi M, Iijima H, et al. Copy number loss of (src homology 2 domain containing)-transforming protein 2 (SHC2) gene: discordant loss in monozygotic twins and frequent loss in patients with multiple system atrophy. Mol Brain. 2011 Jun 10;4:24.

288. Pamphlett R, Morahan JM. Copy number imbalances in blood and hair in monozygotic twins discordant for amyotrophic lateral sclerosis. J Clin Neurosci. 2011 Sep;18(9):1231-4.

289. Thompson SL, Bakhoum SF, Compton DA. Mechanisms of chromosomal instability. Curr Biol. 2010 Mar 23;20(6):R285-95.

290. Thompson SL, Compton DA. Chromosomes and cancer cells. Chromosome Res. 2011 Apr;19(3):433-44.

291. Meza-Zepeda LA, Kresse SH, Barragan-Polania AH, et al. Array comparative genomic hybridization reveals distinct DNA copy number differences between gastrointestinal stromal tumors and leiomyosarcomas. Cancer Res. 2006 Sep 15;66(18):8984-93.

292. Vollebergh MA, Lips EH, Nederlof PM, et al. An aCGH classifier derived from BRCA1-mutated breast cancer and benefit of high-dose platinum-based chemotherapy in HER2-negative breast cancer patients. Ann Oncol. 2011 Jul;22(7):1561-70.

293. Johansson B, Bardi G, Heim S, et al. Nonrandom chromosomal rearrangements in pancreatic carcinomas. Cancer. 1992 Apr 1;69(7):1674-81.

294. Brat DJ, Hahn SA, Griffin CA, et al. The structural basis of molecular genetic deletions. An integration of classical cytogenetic and molecular analyses in pancreatic adenocarcinoma. Am J Pathol. 1997 Feb;150(2):383-91.

295. Heidenblad M, Schoenmakers EF, Jonson T, et al. Genome-wide array-based comparative genomic hybridization reveals multiple amplification targets and novel homozygous deletions in pancreatic carcinoma cell lines. Cancer Res. 2004 May 1;64(9):3052-9.

296. Aguirre AJ, Brennan C, Bailey G, et al. High-resolution characterization of the pancreatic adenocarcinoma genome. Proc Natl Acad Sci U S A. 2004 Jun 15;101(24):9067-72.

297. Holzmann K, Kohlhammer H, Schwaenen C, et al. Genomic DNA-chip hybridization reveals a higher incidence of genomic amplifications in pancreatic cancer than conventional comparative genomic hybridization and leads to the identification of novel candidate genes. Cancer Res. 2004 Jul 1;64(13):4428-33.

298. Mahlamäki EH, Kauraniemi P, Monni O, et al. High-resolution genomic and expression profiling reveals 105 putative amplification target genes in pancreatic cancer. Neoplasia. 2004 Sep-Oct;6(5):432-9.

299. Bashyam MD, Bair R, Kim YH, et al. Array-based comparative genomic hybridization identifies localized DNA amplifications and homozygous deletions in pancreatic cancer. Neoplasia. 2005 Jun;7(6):556-62.

300. Nowak NJ, Gaile D, Conroy JM, et al. Genome-wide aberrations in pancreatic adenocarcinoma. Cancer Genet Cytogenet. 2005 Aug;161(1):36-50.

301. Loukopoulos P, Shibata T, Katoh H, et al. Genome-wide array-based comparative genomic hybridization analysis of pancreatic adenocarcinoma: identification of genetic indicators that predict patient outcome. Cancer Sci. 2007 Mar;98(3):392-400.

302. Harada T, Baril P, Gangeswaran R, et al. Identification of genetic alterations in pancreatic cancer by the combined use of tissue microdissection and array-based comparative genomic hybridisation. Br J Cancer. 2007 Jan 29;96(2):373-82.

303. Suzuki A, Shibata T, Shimada Y, et al. Identification of SMURF1 as a possible target for 7q21.3-22.1 amplification detected in a pancreatic cancer cell line by in-house array-based comparative genomic hybridization. Cancer Sci. 2008 May;99(5):986-94.

304. Kwei KA, Bashyam MD, Kao J, et al. Genomic profiling identifies GATA6 as a candidate oncogene amplified in pancreatobiliary cancer. PLoS Genet. 2008 May 23;4(5):e1000081.

305. Harada T, Chelala C, Crnogorac-Jurcevic T, et al. Genome-wide analysis of pancreatic cancer using microarray-based techniques. Pancreatology. 2009;9(1-2):13-24.

306. Birnbaum DJ, Adélaïde J, Mamessier E, et al. Genome profiling of pancreatic adenocarcinoma. Genes Chromosomes Cancer. 2011 Jun;50(6):456-65.

307. Calhoun ES, Hucl T, Gallmeier E, et al. Identifying allelic loss and homozygous deletions in pancreatic cancer without matched normals using high-density single-nucleotide polymorphism arrays. Cancer Res. 2006 Aug 15;66(16):7920-8.

308. Harada T, Chelala C, Bhakta V, et al. Genome-wide DNA copy number analysis in pancreatic cancer using high-density single nucleotide polymorphism arrays. Oncogene. 2008 Mar 20;27(13):1951-60.

309. Lin LJ, Asaoka Y, Tada M, et al. Integrated analysis of copy number alterations and loss of heterozygosity in human pancreatic cancer using a high-resolution, single nucleotide polymorphism array. Oncology. 2008;75(1-2):102-12.

310. Fu B, Luo M, Lakkur S, et al. Frequent genomic copy number gain and overexpression of GATA-6 in pancreatic carcinoma. Cancer Biol Ther. 2008 Oct;7(10):1593-601.

311. Michils G, Tejpar S, Thoelen R, et al. Large deletions of the APC gene in 15% of mutation-negative patients with classical polyposis (FAP): a Belgian study. Hum Mutat. 2005 Feb;25(2):125-34.

312. Richards FM, Crossey PA, Phipps ME, et al. Detailed mapping of germline deletions of the von Hippel-Lindau disease tumour suppressor gene. Hum Mol Genet. 1994 Apr;3(4):595-8.

313. Oliveira C, Senz J, Kaurah P, et al. Germline CDH1 deletions in hereditary diffuse gastric cancer families. Hum Mol Genet. 2009 May 1;18(9):1545-55.

314. Palanca Suela S, Esteban Cardeñosa E, Barragán González E, et al. Identification of a novel BRCA1 large genomic rearrangement in a Spanish breast/ovarian cancer family. Breast Cancer Res Treat. 2008 Nov;112(1):63-7.

315. Vasickova P, Machackova E, Lukesova M, et al. High occurrence of BRCA1 intragenic rearrangements in hereditary breast and ovarian cancer syndrome in the Czech Republic. BMC Med Genet. 2007 Jun 11;8:32.

316. Buffone A, Capalbo C, Ricevuto E, et al. Prevalence of BRCA1 and BRCA2 genomic rearrangements in a cohort of consecutive Italian breast and/or ovarian cancer families. Breast Cancer Res Treat. 2007 Dec;106(2):289-96.

317. Smith LD, Tesoriero AA, Ramus SJ, et al. BRCA1 promoter deletions in young women with breast cancer and a strong family history: a population-based study. Eur J Cancer. 2007 Mar;43(5):823-7.

318. Casilli F, Tournier I, Sinilnikova OM, et al. The contribution of germline rearrangements to the spectrum of BRCA2 mutations. J Med Genet. 2006 Sep;43(9):e49.

319. Walsh T, Casadei S, Coats KH, et al. Spectrum of mutations in BRCA1, BRCA2, CHEK2, and TP53 in families at high risk of breast cancer. JAMA. 2006 Mar 22;295(12):1379-88.

320. Gad S, Caux-Moncoutier V, Pagès-Berhouet S, et al. Significant contribution of large BRCA1 gene rearrangements in 120 French breast and ovarian cancer families. Oncogene. 2002 Oct 3;21(44):6841-7.

321. Taylor CF, Charlton RS, Burn J, et al. Genomic deletions in MSH2 or MLH1 are a frequent cause of hereditary non-polyposis colorectal cancer: identification of novel and recurrent deletions by MLPA. Hum Mutat. 2003 Dec;22(6):428-33.

322. Gylling A, Ridanpää M, Vierimaa O, et al. Large genomic rearrangements and germline epimutations in Lynch syndrome. Int J Cancer. 2009 May 15;124(10):2333-40.

323. Hearle NC, Rudd MF, Lim W, et al. Exonic STK11 deletions are not a rare cause of Peutz-Jeghers syndrome. J Med Genet. 2006 Apr;43(4):e15.

324. van Hattem WA, Brosens LA, de Leng WW, et al. Large genomic deletions of SMAD4, BMPR1A and PTEN in juvenile polyposis. Gut. 2008 May;57(5):623-7.

325. Blanco A, de la Hoya M, Balmaña J, et al. Detection of a large rearrangement in PALB2 in Spanish breast cancer families with male breast cancer. Breast Cancer Res Treat. 2012 Feb;132(1):307-15.

326. Sabatier R, Adélaïde J, Finetti P, et al. BARD1 homozygous deletion, a possible alternative to BRCA1 mutation in basal breast cancer. Genes Chromosomes Cancer. 2010 Dec;49(12):1143-51.

327. Ahvenainen T, Lehtonen HJ, Lehtonen R, et al. Mutation screening of fumarate hydratase by multiplex ligation-dependent probe amplification: detection of exonic deletion in a patient with leiomyomatosis and renal cell cancer. Cancer Genet Cytogenet. 2008 Jun;183(2):83-8.

328. Chibon F, Primois C, Bressieux JM, et al. Contribution of PTEN large rearrangements in Cowden disease: a multiplex amplifiable probe hybridisation (MAPH) screening approach. J Med Genet. 2008 Oct;45(10):657-65.

329. Knappskog S, Geisler J, Arnesen T, et al. A novel type of deletion in the CDKN2A gene identified in a melanoma-prone family. Genes Chromosomes Cancer. 2006 Dec;45(12):1155-63.

330. Wu R, López-Correa C, Rutkowski JL, et al. Germline mutations in NF1 patients with malignancies. Genes Chromosomes Cancer. 1999 Dec;26(4):376-80.

331. Broeks A, de Klein A, Floore AN, et al. ATM germline mutations in classical ataxia-telangiectasia patients in the Dutch population. Hum Mutat. 1998;12(5):330-7.

332. Plummer SJ, Santibáñez-Koref M, Kurosaki T, et al. A germline 2.35 kb deletion of p53 genomic DNA creating a specific loss of the oligomerization domain inherited in a Li-Fraumeni syndrome family. Oncogene. 1994 Nov;9(11):3273-80.

333. Otterson GA, Chen W, Coxon AB, et al. Incomplete penetrance of familial retinoblastoma linked to germ-line mutations that result in partial loss of RB function. Proc Natl Acad Sci U S A. 1997 Oct 28;94(22):12036-40.

334. Fukuuchi A, Nagamura Y, Yaguchi H, et al. A whole MEN1 gene deletion flanked by Alu repeats in a family with multiple endocrine neoplasia type 1. Jpn J Clin Oncol. 2006 Nov;36(11):739-44.

335. Rumilla K, Schowalter KV, Lindor NM, et al. Frequency of deletions of EPCAM (TACSTD1) in MSH2-associated Lynch syndrome cases. J Mol Diagn. 2011 Jan;13(1):93-9.

336. Kuiper RP, Vissers LE, Venkatachalam R, et al. Recurrence and variability of germline EPCAM deletions in Lynch syndrome. Hum Mutat. 2011 Apr;32(4):407-14.

337. Calva-Cerqueira D, Dahdaleh FS, Woodfield G, et al. Discovery of the BMPR1A promoter and germline mutations that cause juvenile polyposis. Hum Mol Genet. 2010 Dec 1;19(23):4654-62.

338. Nørskov MS, Frikke-Schmidt R, Bojesen SE, et al. Copy number variation in glutathione-S-transferase T1 and M1 predicts incidence and 5-year survival from prostate and bladder cancer, and incidence of corpus uteri cancer in the general population. Pharmacogenomics J. 2011 Aug;11(4):292-9.

339. Frank B, Bermejo JL, Hemminki K, et al. Copy number variant in the candidate tumor suppressor gene MTUS1 and familial breast cancer risk. Carcinogenesis. 2007 Jul;28(7):1442-5.

340. Diskin SJ, Hou C, Glessner JT, et al. Copy number variation at 1q21.1 associated with neuroblastoma. Nature. 2009 Jun 18;459(7249):987-91.

341. Liu W, Sun J, Li G, et al. Association of a germ-line copy number variation at 2p24.3 and risk for aggressive prostate cancer. Cancer Res. 2009 Mar 15;69(6):2176-9.

342. Jin G, Sun J, Liu W, et al. Genome-wide copy-number variation analysis identifies common genetic variants at 20p13 associated with aggressiveness of prostate cancer. Carcinogenesis. 2011 Jul;32(7):1057-62.

343. Tse KP, Su WH, Yang ML, et al. A gender-specific association of CNV at 6p21.3 with NPC susceptibility. Hum Mol Genet. 2011 Jul 15;20(14):2889-96.

344. Huang L, Yu D, Wu C, et al. Copy number variation at 6q13 functions as a long-range regulator and is associated with pancreatic cancer risk. Carcinogenesis. 2012 Jan;33(1):94-100.

345. Lucito R, Suresh S, Walter K, et al. Copy-number variants in patients with a strong family history of pancreatic cancer. Cancer Biol Ther. 2007 Oct;6(10):1592-9.

346. Yoshihara K, Tajima A, Adachi S, et al. Germline copy number variations in BRCA1-associated ovarian cancer patients. Genes Chromosomes Cancer. 2011 Mar;50(3):167-77.

347. Venkatachalam R, Verwiel ET, Kamping EJ, et al. Identification of candidate predisposing copy number variants in familial and early-onset colorectal cancer patients. Int J Cancer. 2011 Oct 1;129(7):1635-42.

348. Shlien A, Tabori U, Marshall CR, et al. Excessive genomic DNA copy number variation in the Li-Fraumeni cancer predisposition syndrome. Proc Natl Acad Sci U S A. 2008 Aug 12;105(32):11264-9.

349. Talos F, Moll UM. Role of the p53 family in stabilizing the genome and preventing polyploidization. Adv Exp Med Biol. 2010;676:73-91.

350. McPherson JD, Marra M, Hillier L, et al. A physical map of the human genome. Nature. 2001 Feb 15;409(6822):934-41.

351. Venter JC, Adams MD, Myers EW, et al. The sequence of the human genome. Science. 2001 Feb 16;291(5507):1304-51.

352. Sachidanandam R, Weissman D, Schmidt SC, et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature. 2001 Feb 15;409(6822):928-33.

353. Margulies M, Egholm M, Altman WE, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005 Sep 15;437(7057):376-80.

354. Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Jan;11(1):31-46.

355. Wadman M. James Watson's genome sequenced at high speed. Nature. 2008 Apr 17;452(7189):788.

356. Kitzman JO, Mackenzie AP, Adey A, et al. Haplotype-resolved genome sequencing of a Gujarati Indian individual. Nat Biotechnol. 2011 Jan;29(1):59-63.

357. Cirulli ET, Singh A, Shianna KV, et al. Screening the human exome: a comparison of whole genome and whole transcriptome sequencing. Genome Biol. 2010;11(5):R57.

358. Tong P, Prendergast JG, Lohan AJ, et al. Sequencing and analysis of an Irish human genome. Genome Biol. 2010;11(9):R91.

359. Fujimoto A, Nakagawa H, Hosono N, et al. Whole-genome sequencing and comprehensive variant analysis of a Japanese individual using massively parallel sequencing. Nat Genet. 2010 Nov;42(11):931-6.

360. Mardis ER. A decade's perspective on DNA sequencing technology. Nature. 2011 Feb 10;470(7333):198-203.

361. Kahn SD. On the future of genomic data. Science. 2011 Feb 11;331(6018):728-9.

362. McPherson JD. Next-generation gap. Nat Methods. 2009 Nov;6(11 Suppl):S2-5.

363. Schadt EE, Turner S, Kasarskis A. A window into third-generation sequencing. Hum Mol Genet. 2010 Oct 15;19(R2):R227-40.

364. Ashley EA, Butte AJ, Wheeler MT, et al. Clinical assessment incorporating a personal genome. Lancet. 2010 May 1;375(9725):1525-35.

365. Roach JC, Glusman G, Smit AF, et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science. 2010 Apr 30;328(5978):636-9.

366. Lupski JR, Reid JG, Gonzaga-Jauregui C, et al. Whole-genome sequencing in a patient with Charcot-Marie-Tooth neuropathy. N Engl J Med. 2010 Apr 1;362(13):1181-91.

367. Sobreira NL, Cirulli ET, Avramopoulos D, et al. Whole-genome sequencing of a single proband together with linkage analysis identifies a Mendelian disease gene. PLoS Genet. 2010 Jun 17;6(6):e1000991.

368. Bainbridge MN, Wiszniewski W, Murdock DR, et al. Whole-genome sequencing for optimized patient management. Sci Transl Med. 2011 Jun 15;3(87):87re3.

369. Dewey FE, Chen R, Cordero SP, et al. Phased whole-genome genetic risk in a family quartet using a major allele reference sequence. PLoS Genet. 2011 Sep;7(9):e1002280.

370. Baranzini SE, Mudge J, van Velkinburgh JC, et al. Genome, epigenome and RNA sequences of monozygotic twins discordant for multiple sclerosis. Nature. 2010 Apr 29;464(7293):1351-6.

371. Rios J, Stein E, Shendure J, et al. Identification by whole-genome resequencing of gene defect responsible for severe hypercholesterolemia. Hum Mol Genet. 2010 Nov 15;19(22):4313-8.

372. Hodges E, Xuan Z, Balija V, et al. Genome-wide in situ exon capture for selective resequencing. Nat Genet. 2007 Dec;39(12):1522-7.

373. Garber K. Fixing the front end. Nat Biotechnol. 2008 Oct;26(10):1101-4.

374. Pruitt KD, Harrow J, Harte RA, et al. The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res. 2009 Jul;19(7):1316-23.

375. Asan, Xu Y, Jiang H, et al. Comprehensive comparison of three commercial human whole-exome capture platforms. Genome Biol. 2011 Sep 28;12(9):R95.

376. Ng PC, Levy S, Huang J, et al. Genetic variation in an individual human exome. PLoS Genet. 2008 Aug 15;4(8):e1000160.

377. Ng SB, Turner EH, Robertson PD, et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature. 2009 Sep 10;461(7261):272-6.

378. Vissers LE, de Ligt J, Gilissen C, et al. A de novo paradigm for mental retardation. Nat Genet. 2010 Dec;42(12):1109-12.

379. Walsh T, Shahin H, Elkan-Miller T, et al. Whole exome sequencing and homozygosity mapping identify mutation in the cell polarity protein GPSM2 as the cause of nonsyndromic hearing loss DFNB82. Am J Hum Genet. 2010 Jul 9;87(1):90-4.

380. Lalonde E, Albrecht S, Ha KC, et al. Unexpected allelic heterogeneity and spectrum of mutations in Fowler syndrome revealed by next-generation exome sequencing. Hum Mutat. 2010 Aug;31(8):918-23.

381. Pierce SB, Walsh T, Chisholm KM, et al. Mutations in the DBP-deficiency protein HSD17B4 cause ovarian dysgenesis, hearing loss, and ataxia of Perrault Syndrome. Am J Hum Genet. 2010 Aug 13;87(2):282-8.

382. Ng SB, Bigham AW, Buckingham KJ, et al. Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nat Genet. 2010 Sep;42(9):790-3.

383. Bilgüvar K, Oztürk AK, Louvi A, et al. Whole-exome sequencing identifies recessive WDR62 mutations in severe brain malformations. Nature. 2010 Sep 9;467(7312):207-10.

384. Gilissen C, Arts HH, Hoischen A, et al. Exome sequencing identifies WDR35 variants involved in Sensenbrenner syndrome. Am J Hum Genet. 2010 Sep 10;87(3):418-23.

385. Krawitz PM, Schweiger MR, Rödelsperger C, et al. Identity-by-descent filtering of exome sequence data identifies PIGV mutations in hyperphosphatasia mental retardation syndrome. Nat Genet. 2010 Oct;42(10):827-9.

386. Anastasio N, Ben-Omran T, Teebi A, et al. Mutations in SCARF2 are responsible for Van Den Ende-Gupta syndrome. Am J Hum Genet. 2010 Oct 8;87(4):553-9.

387. Johnson JO, Gibbs JR, Van Maldergem L, Houlden H, Singleton AB. Exome sequencing in Brown-Vialetto-van Laere syndrome. Am J Hum Genet. 2010 Oct 8;87(4):567-9; author reply 569-70.

388. Sirmaci A, Walsh T, Akay H, et al. MASP1 mutations in patients with facial, umbilical, coccygeal, and auditory findings of Carnevale, Malpuech, OSA, and Michels syndromes. Am J Hum Genet. 2010 Nov 12;87(5):679-86.

389. Haack TB, Danhauser K, Haberberger B, et al. Exome sequencing identifies ACAD9 mutations as a cause of complex I deficiency. Nat Genet. 2010 Dec;42(12):1131-4.

390. Wang JL, Yang X, Xia K, et al. TGM6 identified as a novel causative gene of spinocerebellar ataxias using exome sequencing. Brain. 2010 Dec;133(Pt 12):3510-8.

391. Musunuru K, Pirruccello JP, Do R, et al. Exome sequencing, ANGPTL3 mutations, and familial combined hypolipidemia. N Engl J Med. 2010 Dec 2;363(23):2220-7.

392. Johnson JO, Mandrioli J, Benatar M, et al. Exome sequencing reveals VCP mutations as a cause of familial ALS. Neuron. 2010 Dec 9;68(5):857-64.

393. Bolze A, Byun M, McDonald D, et al. Whole-exome-sequencing-based discovery of human FADD deficiency. Am J Hum Genet. 2010 Dec 10;87(6):873-81.

394. Liu W, Morito D, Takashima S, et al. Identification of RNF213 as a susceptibility gene for moyamoya disease and its possible role in vascular development. PLoS One. 2011;6(7):e22542.

395. Züchner S, Dallman J, Wen R, et al. Whole-exome sequencing links a variant in DHDDS to retinitis pigmentosa. Am J Hum Genet. 2011 Feb 11;88(2):201-6.

396. Glazov EA, Zankl A, Donskoi M, et al. Whole-exome re-sequencing in a family quartet identifies POP1 mutations as the cause of a novel skeletal dysplasia. PLoS Genet. 2011 Mar;7(3):e1002027.

397. Worthey EA, Mayer AN, Syverson GD, et al. Making a definitive diagnosis: successful clinical application of whole exome sequencing in a child with intractable inflammatory bowel disease. Genet Med. 2011 Mar;13(3):255-62.

398. Simpson MA, Irving MD, Asilmaz E, et al. Mutations in NOTCH2 cause Hajdu-Cheney syndrome, a disorder of severe and progressive bone loss. Nat Genet. 2011 Mar 6;43(4):303-5.

399. Becker J, Semler O, Gilissen C, et al. Exome sequencing identifies truncating mutations in human SERPINF1 in autosomal-recessive osteogenesis imperfecta. Am J Hum Genet. 2011 Mar 11;88(3):362-71.

400. Ostergaard P, Simpson MA, Brice G, et al. Rapid identification of mutations in GJC2 in primary lymphoedema using whole exome sequencing combined with linkage analysis with delineation of the phenotype. J Med Genet. 2011 Apr;48(4):251-5.

401. Çalışkan M, Chong JX, Uricchio L, et al. Exome sequencing reveals a novel mutation for autosomal recessive non-syndromic mental retardation in the TECR gene on chromosome 19p13. Hum Mol Genet. 2011 Apr 1;20(7):1285-9.

402. Erlich Y, Edvardson S, Hodges E, et al. Exome sequencing and disease-network analysis of a single family implicate a mutation in KIF1A in hereditary spastic paraparesis. Genome Res. 2011 May;21(5):658-64.

403. Sundaram SK, Huq AM, Sun Z, et al. Exome sequencing of a pedigree with Tourette syndrome or chronic tic disorder. Ann Neurol. 2011 May;69(5):901-4.

404. Puente XS, Quesada V, Osorio FG, et al. Exome sequencing and functional analysis identifies BANF1 mutation as the cause of a hereditary progeroid syndrome. Am J Hum Genet. 2011 May 13;88(5):650-6.

405. Vissers LE, Lausch E, Unger S, et al. Chondrodysplasia and abnormal joint development associated with mutations in IMPAD1, encoding the Golgi-resident nucleotide phosphatase, gPAPP. Am J Hum Genet. 2011 May 13;88(5):608-15.

406. O'Sullivan J, Bitu CC, Daly SB, et al. Whole-Exome sequencing identifies FAM20A mutations as a cause of amelogenesis imperfecta and gingival hyperplasia syndrome. Am J Hum Genet. 2011 May 13;88(5):616-20.

407. Götz A, Tyynismaa H, Euro L, et al. Exome sequencing identifies mitochondrial alanyl-tRNA synthetase mutations in infantile mitochondrial cardiomyopathy. Am J Hum Genet. 2011 May 13;88(5):635-42.

408. Shi Y, Li Y, Zhang D, et al. Exome sequencing identifies ZNF644 mutations in high myopia. PLoS Genet. 2011 Jun;7(6):e1002084.

409. Klein CJ, Botuyan MV, Wu Y, et al. Mutations in DNMT1 cause hereditary sensory neuropathy with dementia and hearing loss. Nat Genet. 2011 Jun;43(6):595-600.

410. Barak T, Kwan KY, Louvi A, et al. Recessive LAMC3 mutations cause malformations of occipital cortical development. Nat Genet. 2011 Jun;43(6):590-4.

411. O'Roak BJ, Deriziotis P, Lee C, et al. Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat Genet. 2011 Jun;43(6):585-9.

412. Alvarado DM, Buchan JG, Gurnett CA, et al. Exome sequencing identifies an MYH3 mutation in a family with distal arthrogryposis type 1. J Bone Joint Surg Am. 2011 Jun 1;93(11):1045-50.

413. de Greef JC, Wang J, Balog J, et al. Mutations in ZBTB24 are associated with immunodeficiency, centromeric instability, and facial anomalies syndrome type 2. Am J Hum Genet. 2011 Jun 10;88(6):796-804.

414. Yamaguchi T, Hosomichi K, Narita A, et al. Exome resequencing combined with linkage analysis identifies novel PTH1R variants in primary failure of tooth eruption in Japanese. J Bone Miner Res. 2011 Jul;26(7):1655-61.

415. Zhou C, Zang D, Jin Y, et al. Mutation in ribosomal protein L21 underlies hereditary hypotrichosis simplex. Hum Mutat. 2011 Jul;32(7):710-4.

416. Le Goff C, Mahaut C, Wang LW, et al. Mutations in the TGFβ binding-protein-like domain 5 of FBN1 are responsible for acromicric and geleophysic dysplasias. Am J Hum Genet. 2011 Jul 15;89(1):7-14.

417. Hanson D, Murray PG, O'Sullivan J, et al. Exome sequencing identifies CCDC8 mutations in 3-M syndrome, suggesting that CCDC8 contributes in a pathway with CUL7 and OBSL1 to control human growth. Am J Hum Genet. 2011 Jul 15;89(1):148-53.

418. Vilariño-Güell C, Wider C, Ross OA, et al. VPS35 mutations in Parkinson disease. Am J Hum Genet. 2011 Jul 15;89(1):162-7.

419. Zimprich A, Benet-Pagès A, Struhal W, et al. A mutation in VPS35, encoding a subunit of the retromer complex, causes late-onset Parkinson disease. Am J Hum Genet. 2011 Jul 15;89(1):168-75.

420. Sergouniotis PI, Davidson AE, Mackay DS, et al. Recessive mutations in KCNJ13, encoding an inwardly rectifying potassium channel subunit, cause leber congenital amaurosis. Am J Hum Genet. 2011 Jul 15;89(1):183-90.

421. Albers CA, Cvejic A, Favier R, et al. Exome sequencing identifies NBEAL2 as the causative gene for gray platelet syndrome. Nat Genet. 2011 Jul 17;43(8):735-7.

422. Sanna-Cherchi S, Burgess KE, Nees SN, et al. Exome sequencing identified MYO1E and NEIL1 as candidate genes for human autosomal recessive steroid-resistant nephrotic syndrome. Kidney Int. 2011 Aug;80(4):389-96.

423. Liu L, Okada S, Kong XF, et al. Gain-of-function human STAT1 mutations impair IL-17 immunity and underlie chronic mucocutaneous candidiasis. J Exp Med. 2011 Aug 1;208(8):1635-48.

424. Yariz KO, Walsh T, Uzak A, et al. Inherited mutation of the luteinizing hormone/choriogonadotropin receptor (LHCGR) in empty follicle syndrome. Fertil Steril. 2011 Aug;96(2):e125-30.

425. Xu B, Roos JL, Dexheimer P, et al. Exome sequencing supports a de novo mutational paradigm for schizophrenia. Nat Genet. 2011 Aug 7;43(9):864-8.

426. Sirmaci A, Spiliopoulos M, Brancati F, et al. Mutations in ANKRD11 cause KBG syndrome, characterized by intellectual disability, skeletal malformations, and macrodontia. Am J Hum Genet. 2011 Aug 12;89(2):289-94.

427. Shaheen R, Faqeih E, Sunker A, et al. Recessive mutations in DOCK6, encoding the guanidine nucleotide exchange factor DOCK6, lead to abnormal actin cytoskeleton organization and Adams-Oliver syndrome. Am J Hum Genet. 2011 Aug 12;89(2):328-33.

428. Nosková L, Stránecký V, Hartmannová H, et al. Mutations in DNAJC5, encoding cysteine-string protein alpha, cause autosomal-dominant adult-onset neuronal ceroid lipofuscinosis. Am J Hum Genet. 2011 Aug 12;89(2):241-52.

429. Weedon MN, Hastings R, Caswell R, et al. Exome sequencing identifies a DYNC1H1 mutation in a large pedigree with dominant axonal Charcot-Marie-Tooth disease. Am J Hum Genet. 2011 Aug 12;89(2):308-12.

430. Ozgül RK, Siemiatkowska AM, Yücel D, et al. Exome sequencing and cis-regulatory mapping identify mutations in MAK, a gene encoding a regulator of ciliary length, as a cause of retinitis pigmentosa. Am J Hum Genet. 2011 Aug 12;89(2):253-64.

431. Doi H, Yoshida K, Yasuda T, et al. Exome sequencing reveals a homozygous SYT14 mutation in adult-onset, autosomal-recessive spinocerebellar ataxia with psychomotor retardation. Am J Hum Genet. 2011 Aug 12;89(2):320-7.

432. Sloan JL, Johnston JJ, Manoli I, et al. Exome sequencing identifies ACSF3 as a cause of combined malonic and methylmalonic aciduria. Nat Genet. 2011 Aug 14;43(9):883-6.

433. Aldahmesh MA, Khan AO, Mohamed JY, et al. Identification of ADAMTS18 as a gene mutated in Knobloch syndrome. J Med Genet. 2011 Sep;48(9):597-601.

434. Murdock DR, Clark GD, Bainbridge MN, et al. Whole-exome sequencing identifies compound heterozygous mutations in WDR62 in siblings with recurrent polymicrogyria. Am J Med Genet A. 2011 Sep;155A(9):2071-7.

435. Regalado ES, Guo DC, Villamizar C, et al. Exome sequencing identifies SMAD3 mutations as a cause of familial thoracic aortic aneurysm and dissection with intracranial and other arterial aneurysms. Circ Res. 2011 Sep 2;109(6):680-6.

436. Dickinson RE, Griffin H, Bigley V, et al. Exome sequencing identifies GATA-2 mutation as the cause of dendritic cell, monocyte, B and NK lymphoid deficiency. Blood. 2011 Sep 8;118(10):2656-8.

437. Hor H, Bartesaghi L, Kutalik Z, et al. A missense mutation in myelin oligodendrocyte glycoprotein as a cause of familial narcolepsy with cataplexy. Am J Hum Genet. 2011 Sep 9;89(3):474-9.

438. Marti-Masso JF, Ruiz-Martínez J, Makarov V, et al. Exome sequencing identifies GCDH (glutaryl-CoA dehydrogenase) mutations as a cause of a progressive form of early-onset generalized dystonia. Hum Genet. 2012 Mar;131(3):435-42.

439. Tariq M, Belmont JW, Lalani S, et al. SHROOM3 is a novel candidate for heterotaxy identified by whole exome sequencing. Genome Biol. 2011 Sep 21;12(9):R91.

440. Takata A, Kato M, Nakamura M, et al. Exome sequencing identifies a novel missense variant in RRM2B associated with autosomal recessive progressive external ophthalmoplegia. Genome Biol. 2011 Sep 28;12(9):R92.

441. Theis JL, Sharpe KM, Matsumoto ME, et al. Homozygosity mapping and exome sequencing reveal GATAD1 mutation in autosomal recessive dilated cardiomyopathy. Circ Cardiovasc Genet. 2011 Dec;4(6):585-94.

442. Pierson TM, Adams D, Bonn F, et al. Whole-exome sequencing identifies homozygous AFG3L2 mutations in a spastic ataxia-neuropathy syndrome linked to mitochondrial m-AAA proteases. PLoS Genet. 2011 Oct;7(10):e1002325.

443. Al Badr W, Al Bader S, Otto E, et al. Exome capture and massively parallel sequencing identifies a novel HPSE2 mutation in a Saudi Arabian child with Ochoa (urofacial) syndrome. J Pediatr Urol. 2011 Oct;7(5):569-73.

444. Cullinane AR, Vilboux T, O'Brien K, et al. Homozygosity mapping and whole-exome sequencing to detect SLC45A2 and G6PC3 mutations in a single patient with oculocutaneous albinism and neutropenia. J Invest Dermatol. 2011 Oct;131(10):2017-25.

445. Ovunc B, Otto EA, Vega-Warner V, et al. Exome sequencing reveals cubilin mutation as a single-gene cause of proteinuria. J Am Soc Nephrol. 2011 Oct;22(10):1815-20.

446. Bowne SJ, Humphries MM, Sullivan LS, et al. A dominant mutation in RPE65 identified by whole-exome sequencing causes retinitis pigmentosa with choroidal involvement. Eur J Hum Genet. 2011 Oct;19(10):1074-81.

447. Kitamura A, Maekawa Y, Uehara H, et al. A mutation in the immunoproteasome subunit PSMB8 causes autoinflammation and lipodystrophy in humans. J Clin Invest. 2011 Oct;121(10):4150-60.

448. Tyynismaa H, Sun R, Ahola-Erkkilä S, et al. Thymidine kinase 2 mutations in autosomal recessive progressive external ophthalmoplegia with multiple mitochondrial DNA deletions. Hum Mol Genet. 2012 Jan 1;21(1):66-75.

449. Bjursell MK, Blom HJ, Cayuela JA, et al. Adenosine kinase deficiency disrupts the methionine cycle and causes hypermethioninemia, encephalopathy, and abnormal liver function. Am J Hum Genet. 2011 Oct 7;89(4):507-15.

450. Zangen D, Kaufman Y, Zeligson S, et al. XX ovarian dysgenesis is caused by a PSMC3IP/HOP2 mutation that abolishes coactivation of estrogen-driven transcription. Am J Hum Genet. 2011 Oct 7;89(4):572-9.

451. Galmiche L, Serre V, Beinat M, et al. Exome sequencing identifies MRPL3 mutation in mitochondrial cardiomyopathy. Hum Mutat. 2011 Nov;32(11):1225-31.

452. Bredrup C, Saunier S, Oud MM, et al. Ciliopathies with skeletal anomalies and renal insufficiency due to mutations in the IFT-A gene WDR19. Am J Hum Genet. 2011 Nov 11;89(5):634-43.

453. Saitsu H, Osaka H, Sasaki M, et al. Mutations in POLR3A and POLR3B encoding RNA Polymerase III subunits cause an autosomal-recessive hypomyelinating leukoencephalopathy. Am J Hum Genet. 2011 Nov 11;89(5):644-51.

454. Clayton-Smith J, O'Sullivan J, Daly S, et al. Whole-exome-sequencing identifies mutations in histone acetyltransferase gene KAT6B in individuals with the Say-Barber-Biesecker variant of Ohdo syndrome. Am J Hum Genet. 2011 Nov 11;89(5):675-81.

455. Aldahmesh MA, Mohamed JY, Alkuraya HS, et al. Recessive mutations in ELOVL4 cause ichthyosis, intellectual disability, and spastic quadriplegia. Am J Hum Genet. 2011 Dec 9;89(6):745-50.

456. Chen WJ, Lin Y, Xiong ZQ, et al. Exome sequencing identifies truncating mutations in PRRT2 that cause paroxysmal kinesigenic dyskinesia. Nat Genet. 2011 Nov 20;43(12):1252-5.

457. Logan CV, Lucke B, Pottinger C, et al. Mutations in MEGF10, a regulator of satellite cell myogenesis, cause early onset myopathy, areflexia, respiratory distress and dysphagia (EMARDD). Nat Genet. 2011 Nov 20;43(12):1189-92.

458. Dauber A, Nguyen TT, Sochett E, et al. Genetic defect in CYP24A1, the vitamin D 24-hydroxylase gene, in a patient with severe infantile hypercalcemia. J Clin Endocrinol Metab. 2012 Feb;97(2):E268-74.

459. Shamseldin HE, Faden MA, Alashram W, et al. Identification of a novel DLX5 mutation in a family with autosomal recessive split hand and foot malformation. J Med Genet. 2012 Jan;49(1):16-20.

460. Sergouniotis PI, Davidson AE, Mackay DS, et al. Biallelic mutations in PLA2G5, encoding group V phospholipase A2, cause benign fleck retina. Am J Hum Genet. 2011 Dec 9;89(6):782-91.

461. Berger I, Ben-Neriah Z, Dor-Wolman T, et al. Early prenatal ventriculomegaly due to an AIFM1 mutation identified by linkage analysis and whole exome sequencing. Mol Genet Metab. 2011 Dec;104(4):517-20.

462. Bhat V, Girimaji SC, Mohan G, et al. Mutations in WDR62, encoding a centrosomal and nuclear protein, in Indian primary microcephaly families with cortical malformations. Clin Genet. 2011 Dec;80(6):532-40.

463. Wang X, Wang H, Cao M, et al. Whole-exome sequencing identifies ALMS1, IQCB1, CNGA3, and MYO7A mutations in patients with Leber congenital amaurosis. Hum Mutat. 2011 Dec;32(12):1450-9.

464. Adzhubei IA, Schmidt S, Peshkin L, et al. A method and server for predicting damaging missense mutations. Nat Methods. 2010 Apr;7(4):248-9.

465. Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc. 2009;4(7):1073-81.

466. Pollard KS, Hubisz MJ, Rosenbloom KR, et al. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 2010 Jan;20(1):110-21.

467. Cooper GM, Stone EA, Asimenos G, et al. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 2005 Jul;15(7):901-13.

468. Melton PE, Pankratz N. Joint analyses of disease and correlated quantitative phenotypes using next-generation sequencing data. Genet Epidemiol. 2011;35 Suppl 1:S67-73.

469. Stitziel NO, Kiezun A, Sunyaev S. Computational and statistical approaches to analyzing variants identified by exome sequencing. Genome Biol. 2011 Sep 14;12(9):227.

470. Ionita-Laza I, Makarov V, Yoon S, et al. Finding disease variants in Mendelian disorders by using sequence data: methods and applications. Am J Hum Genet. 2011 Dec 9;89(6):701-12.

471. Sjöblom T, Jones S, Wood LD, et al. The consensus coding sequences of human breast and colorectal cancers. Science. 2006 Oct 13;314(5797):268-74.

472. Parsons DW, Jones S, Zhang X, et al. An integrated genomic analysis of human glioblastoma multiforme. Science. 2008 Sep 26;321(5897):1807-12.

473. Ley TJ, Mardis ER, Ding L, et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature. 2008 Nov 6;456(7218):66-72.

474. Mardis ER, Ding L, Dooling DJ, et al. Recurring mutations found by sequencing an acute myeloid leukemia genome. N Engl J Med. 2009 Sep 10;361(11):1058-66.

475. Ley TJ, Ding L, Walter MJ, et al. DNMT3A mutations in acute myeloid leukemia. N Engl J Med. 2010 Dec 16;363(25):2424-33.

476. Shah SP, Morin RD, Khattra J, et al. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature. 2009 Oct 8;461(7265):809-13.

477. Ding L, Ellis MJ, Li S, et al. Genome remodelling in a basal-like breast cancer metastasis and xenograft. Nature. 2010 Apr 15;464(7291):999-1005.

478. Pleasance ED, Stephens PJ, O'Meara S, et al. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature. 2010 Jan 14;463(7278):184-90.

479. Lee W, Jiang Z, Liu J, et al. The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature. 2010 May 27;465(7297):473-7.

480. Harbour JW, Onken MD, Roberson ED, et al. Frequent mutation of BAP1 in metastasizing uveal melanomas. Science. 2010 Dec 3;330(6009):1410-3.

481. Timmermann B, Kerick M, Roehr C, et al. Somatic mutation profiles of MSI and MSS colorectal cancer identified by whole exome next generation sequencing and bioinformatics analysis. PLoS One. 2010 Dec 22;5(12):e15661.

482. Chapman MA, Lawrence MS, Keats JJ, et al. Initial genome sequencing and analysis of multiple myeloma. Nature. 2011 Mar 24;471(7339):467-72.

483. Totoki Y, Tatsuno K, Yamamoto S, et al. High-resolution characterization of a hepatocellular carcinoma genome. Nat Genet. 2011 May;43(5):464-9.

484. Tiacci E, Trifonov V, Schiavoni G, et al. BRAF mutations in hairy-cell leukemia. N Engl J Med. 2011 Jun 16;364(24):2305-15.

485. Pasqualucci L, Trifonov V, Fabbri G, et al. Analysis of the coding genome of diffuse large B-cell lymphoma. Nat Genet. 2011 Jul 31;43(9):830-7.

486. Jiao Y, Shi C, Edil BH, et al. DAXX/ATRX, MEN1, and mTOR pathway genes are frequently altered in pancreatic neuroendocrine tumors. Science. 2011 Mar 4;331(6021):1199-203.

487. Wang K, Kan J, Yuen ST, et al. Exome sequencing identifies frequent mutation of ARID1A in molecular subtypes of gastric cancer. Nat Genet. 2011 Oct 30;43(12):1219-23.

488. International Cancer Genome Consortium, Hudson TJ, Anderson W, et al. International network of cancer genome projects. Nature. 2010 Apr 15;464(7291):993-8.

489. Byun M, Abhyankar A, Lelarge V, et al. Whole-exome sequencing-based discovery of STIM1 deficiency in a child with fatal classic Kaposi sarcoma. J Exp Med. 2010 Oct 25;207(11):2307-12.

490. Snape K, Hanks S, Ruark E, et al. Mutations in CEP57 cause mosaic variegated aneuploidy syndrome. Nat Genet. 2011 Jun;43(6):527-9.

491. Comino-Méndez I, Gracia-Aznárez FJ, Schiavi F, et al. Exome sequencing identifies MAX mutations as a cause of hereditary pheochromocytoma. Nat Genet. 2011 Jun 19;43(7):663-7.

492. Saarinen S, Aavikko M, Aittomäki K, et al. Exome sequencing reveals germline NPAT mutation as a candidate risk factor for Hodgkin lymphoma. Blood. 2011 Jul 21;118(3):493-8.

493. Bodmer W, Tomlinson I. Rare genetic variants and the risk of cancer. Curr Opin Genet Dev. 2010 Jun;20(3):262-7.

494. Kote-Jarai Z, Jugurnauth S, Mulholland S, et al. A recurrent truncating germline mutation in the BRIP1/FANCJ gene and susceptibility to prostate cancer. Br J Cancer. 2009 Jan 27;100(2):426-30.

495. Zhang S, Phelan CM, Zhang P, et al. Frequency of the CHEK2 1100delC mutation among women with breast cancer: an international study. Cancer Res. 2008 Apr 1;68(7):2154-7.

496. Yokoyama S, Woods SL, Boyle GM, et al. A novel recurrent mutation in MITF predisposes to familial and sporadic melanoma. Nature. 2011 Nov 13;480(7375):99-103.

497. Park DJ, Odefrey FA, Hammet F, et al. FAN1 variants identified in multiple-case early-onset breast cancer families via exome sequencing: no evidence for association with risk for breast cancer. Breast Cancer Res Treat. 2011 Dec;130(3):1043-9.

498. Risch HA, McLaughlin JR, Cole DEC, et al. Population BRCA1 and BRCA2 mutation frequencies and cancer penetrances: a kin-cohort study in Ontario, Canada. J Natl Cancer Inst. 2006;98:1694-706.

499. The Breast Cancer Linkage Consortium. Cancer risks in BRCA2 mutation carriers. J Natl Cancer Inst. 1999;91:1310-1316.

500. van Asperen CJ, Brohet RM, Meijers-Heijboer EJ, et al. Cancer risks in BRCA2 families: estimates for sites other than breast and ovary. J Med Genet. 2005;42:711-719.

501. Couch FJ, Johnson MR, Rabe KG, et al. The prevalence of BRCA2 mutations in familial pancreatic cancer. Cancer Epidemiol Biomarkers Prev. 2007 Feb;16(2):342-6.

502. Ferrone CR, Levine DA, Tang LH, et al. BRCA germline mutations in Jewish patients with pancreatic adenocarcinoma. J Clin Oncol. 2009 Jan 20;27(3):433-8.

503. Abbott DW, Freeman ML, Holt JT. Double-strand break repair deficiency and radiation sensitivity in BRCA2 mutant cancer cells. J Natl Cancer Inst. 1998 Jul 1;90(13):978-85.

504. Goggins M, Hruban RH, Kern SE. BRCA2 is inactivated late in the development of pancreatic intraepithelial neoplasia: evidence and implications. Am J Pathol. 2000 May;156(5):1767-71.

505. Skoulidis F, Cassidy LD, Pisupati V, et al. Germline Brca2 heterozygosity promotes Kras(G12D)-driven carcinogenesis in a murine model of familial pancreatic cancer. Cancer Cell. 2010 Nov 16;18(5):499-509.

506. Rowley M, Ohashi A, Mondal G, et al. Inactivation of Brca2 promotes Trp53-associated but inhibits KrasG12D-dependent pancreatic cancer development in mice. Gastroenterology. 2011 Apr;140(4):1303-1313.e1-3.

507. Feldmann G, Karikari C, dal Molin M, et al. Inactivation of Brca2 cooperates with Trp53(R172H) to induce invasive pancreatic ductal adenocarcinomas in mice: a mouse model of familial pancreatic cancer. Cancer Biol Ther. 2011 Jun 1;11(11):959-68.

508. Thompson D, Easton DF, the Breast Cancer Linkage Consortium. Cancer Incidence in BRCA1 mutation carriers. J Natl Cancer Inst. 2002;94:1358-65.

509. Brose MS, Rebbeck TR, Calzone KA, et al. Cancer risk estimates for BRCA1 mutation carriers identified in a risk evaluation program. J Natl Cancer Inst. 2002;94:1365-72.

510. Beger C, Ramadani M, Meyer S, et al. Down-regulation of BRCA1 in chronic pancreatitis and sporadic pancreatic adenocarcinoma. Clin Cancer Res. 2004;10:3780-3787.

511. Honrado E, Benitez J, Palacios J. The molecular pathology of hereditary breast cancer: genetic testing and therapeutic implications. Mod Pathol. 2005;18:1305-20.

512. Esteller M, Fraga MF, Guo M, et al. DNA methylation patterns in hereditary human cancers mimic sporadic tumorigenesis. Hum Mol Genet. 2001;10:3001-3007.

513. Gudmundsdottir K, Ashworth A. The roles of BRCA1 and BRCA2 and associated proteins in the maintenance of genomic stability. Oncogene. 2006;25:5864-5874.

514. Struewing JP, Hartge P, Wacholder S, et al. The risk of cancer associated with specific mutations of BRCA1 and BRCA2 among Ashkenazi Jews. N Engl J Med. 1997;336:1401-8.

515. Lynch HT, Deters CA, Snyder CL, et al. BRCA1 and pancreatic cancer: pedigree findings and their causal relationships. Cancer Genet Cytogenet. 2005;158:119-125.

516. Tonin P, Weber B, Offit K, et al. Frequency of recurrent BRCA1 and BRCA2 mutations in Ashkenazi Jewish breast cancer families. Nat Med. 1996;2:1179-83.

517. Gruber SB, Petersen GM. Cancer risk in BRCA1 carriers: time for the next generation of studies. J Natl Cancer Inst. 2002;94:144-5.

518. Struewing JP, Abeliovich D, Peretz T, et al. The carrier frequency of the BRCA1 185delAG mutation is approximately 1 percent in Ashkenazi Jewish individuals. Nat Genet. 1995 Oct;11(2):198-200.

519. Ford D, Easton DF, Peto J. Estimates of the gene frequency of BRCA1 and its contribution to breast and ovarian cancer incidence. Am J Hum Genet. 1995 Dec;57(6):1457-62.

520. Kim DH, Crawford B, Ziegler J, et al. Prevalence and characteristics of pancreatic cancer in families with BRCA1 and BRCA2 mutations. Fam Cancer. 2009;8(2):153-8.

521. Hall M, Olopade O. Pancreatic cancer and BRCA mutation in familial breast cancer families. J Clin Oncol. 2005;23(16S):9550.

522. Ozcelik H, Schmoker B, Di Nicola N, et al. Germline BRCA2 6174delT mutations in Ashkenazi Jewish pancreatic cancer patients. Nat Genet. 1997;16:17-8.

523. Peng DF, Kanai Y, Sawada M, et al. DNA methylation of multiple tumor-related genes in association with overexpression of DNA methyltransferase 1 (DNMT1) during multistage carcinogenesis of the pancreas. Carcinogenesis. 2006;27:1160-8.

524. Saif MW. Controversies in adjuvant treatment of pancreatic adenocarcinoma. JOP. 2007;8:545-552.

525. McCabe N, Turner NC, Lord CJ, et al. Deficiency in the repair of DNA damage by homologous recombination and sensitivity to poly(ADP-ribose) polymerase inhibition. Cancer Res. 2006 Aug 15;66(16):8109-15.

526. Yun J, Zhong Q, Kwak JY, et al. Hypersensitivity of Brca1-deficient MEF to the DNA interstrand crosslinking agent mitomycin C is associated with defect in homologous recombination repair and aberrant S-phase arrest. Oncogene. 2006;24:4009-16.

527. Treszezamsky AD, Kachnic LA, Feng Z, et al. BRCA1- and BRCA2-deficient cells are sensitive to etoposide-induced DNA double-strand breaks via topoisomerase II. Cancer Res. 2007;67:7078-81.

528. James E, Waldron-Lynch MG, Saif MW. Prolonged survival in a patient with BRCA2 associated metastatic pancreatic cancer after exposure to camptothecin: a case report and review of literature. Anticancer Drugs. 2009 Aug;20(7):634-8.

529. Sonnenblick A, Kadouri L, Appelbaum L, et al. Complete remission, in BRCA2 mutation carrier with metastatic pancreatic adenocarcinoma, treated with cisplatin based therapy. Cancer Biol Ther. 2011 Aug 1;12(3):165-8.

530. Lowery M, Shah MA, Smyth E, et al. A 67-year-old woman with BRCA 1 mutation associated with pancreatic adenocarcinoma. J Gastrointest Cancer. 2011 Sep;42(3):160-4.

531. Gu W, Lupski JR. CNV and nervous system diseases--what's new? Cytogenet Genome Res. 2008;123(1-4):54-64.

532. Alaerts M, Del-Favero J. Searching genetic risk factors for schizophrenia and bipolar disorder: learn from the past and back to the future. Hum Mutat. 2009;30:1139-52.

533. Schaschl H, Aitman TJ, Vyse TJ. Copy number variation in the human genome and its implication in autoimmunity. Clin Exp Immunol. 2009;156:12-6.

534. Lanktree M, Hegele RA. Copy number variation in metabolic phenotypes. Cytogenet Genome Res. 2008;123:169-75.

535. Karageorgi S, Prescott J, Wong JY, et al. GSTM1 and GSTT1 copy number variation in population-based studies of endometrial cancer risk. Cancer Epidemiol Biomarkers Prev. 2011 Jul;20(7):1447-52.

536. Engert S, Wappenschmidt B, Betz B, et al. MLPA screening in the BRCA1 gene from 1,506 German hereditary breast cancer cases: novel deletions, frequent involvement of exon 17, and occurrence in single early-onset cases. Hum Mutat. 2008;29:948-58.

537. Madlensky L, Berk TC, Bapat BV, et al. A preventive registry for hereditary nonpolyposis colorectal cancer. Can J Oncol. 1995;5:355-60.

538. Cotterchio M, Manno M, Klar N, et al. Colorectal screening is associated with reduced colorectal cancer risk: a case-control study within the population-based Ontario Familial Colorectal Cancer Registry. Cancer Causes Control. 2005;16:865-75.

539. Stewart AF, Dandona S, Chen L, et al. Kinesin family member 6 variant Trp719Arg does not associate with angiographically defined coronary artery disease in the Ottawa Heart Genomics Study. J Am Coll Cardiol. 2009;53:1471-2.

540. Krawczak M, Nikolaus S, von Eberstein H, et al. PopGen: population-based recruitment of patients and controls for the analysis of complex genotype-phenotype relationships. Community Genet. 2006;9:55-61.

541. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155:945-59.

542. Li C, Hung Wong W. Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol. 2001;2(8):RESEARCH0032.

543. Nannya Y, Sanada M, Nakazaki K, et al. A robust algorithm for copy number detection using high-density oligonucleotide single nucleotide polymorphism genotyping arrays. Cancer Res. 2005;65:6071-9.

544. Korn JM, Kuruvilla FG, McCarroll SA, et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nat Genet. 2008 Oct;40(10):1253-60.

545. Pinto D, Pagnamenta AT, Klei L, et al. Functional impact of global rare copy number variation in autism spectrum disorders. Nature. 2010 Jul 15;466(7304):368-72.

546. Schmittgen TD, Livak KJ. Analyzing real-time PCR data by the comparative C(T) method. Nat Protoc. 2008;3:1101-8.

547. Zhang J, Feuk L, Duggan GE, et al. Development of bioinformatics resources for display and analysis of copy number and other structural variants in the human genome. Cytogenet Genome Res. 2006;115:205-14.

548. Higgins ME, Claremont M, Major JE, et al. CancerGenes: a gene selection resource for cancer genome projects. Nucleic Acids Res. 2007;35(Database issue):D721-6.

549. Shepherd R, Forbes SA, Beare D, et al. Data mining using the Catalogue of Somatic Mutations in Cancer BioMart. Database (Oxford). 2011:bar018. Print 2011.

550. Jin Q, Gao G, Mulder KM. Requirement of a dynein light chain in TGFbeta/Smad3 signaling. J Cell Physiol. 2009 Dec;221(3):707-15.

551. Jiang J, Yu L, Huang X, et al. Identification of two novel human dynein light chain genes, DNLC2A and DNLC2B, and their expression changes in hepatocellular carcinoma tissues from 68 Chinese patients. Gene. 2001;281:103-13.

552. Malinda KM, Kleinman HK. The laminins. Int J Biochem Cell Biol. 1996 Sep;28(9):957-9.

553. Kim YH, Lee HC, Kim SY, et al. Epigenomic analysis of aberrantly methylated genes in colorectal cancer identifies genes commonly affected by epigenetic alterations. Ann Surg Oncol. 2011;18:2338-47.

554. Scrideli CA, Carlotti CG Jr, Okamoto OK, et al. Gene expression profile analysis of primary glioblastomas and non-neoplastic brain tissue: identification of potential target genes by oligonucleotide microarray and real-time quantitative PCR. J Neurooncol. 2008;88:281-91.

555. Pinto D, Darvishi K, Shi X, et al. Comprehensive assessment of array-based platforms and calling algorithms for detection of copy number variants. Nat Biotechnol. 2011;29:512-20.

556. Wang H, Linghu H, Wang J, et al. The role of Crk/Dock180/Rac1 pathway in the malignant behavior of human ovarian cancer cell SKOV3. Tumour Biol. 2010;31:59-67.

557. Sanders MA, Ampasala D, Basson MD. DOCK5 and DOCK1 regulate Caco-2 intestinal epithelial cell spreading and migration on collagen IV. J Biol Chem. 2009;284:27-35.

558. Buchholz M, Braun M, Heidenblut A, et al. Transcriptome analysis of microdissected pancreatic intraepithelial neoplastic lesions. Oncogene. 2005;24:6626-36.

559. Gu W, Zhang F, Lupski JR. Mechanisms for human genomic rearrangements. Pathogenetics. 2008 Nov 3;1(1):4.

560. Pruitt KD, Tatusova T, Brown GR, et al. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2012 Jan;40(Database issue):D130-5.

561. Griffiths-Jones S. miRBase: microRNA sequences and annotation. Curr Protoc Bioinformatics. 2010 Mar;Chapter 12:Unit 12.9.1-10.

562. Hercus C. 2009. www.novocraft.com [last accessed November 2009].

563. McKenna A, Hanna M, Banks E, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010 Sep;20(9):1297-303.

564. Affymetrix. BRLMM: An improved genotype calling method for the GeneChip® Mapping 500K Array Set. http://affymetrix.com/support/technical/whitepapers/brlmm_whitepaper.pdf

565. Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc. 2009;4(7):1073-81.

566. Exome Variant Server, NHLBI Exome Sequencing Project (ESP), Seattle, WA (URL: http://evs.gs.washington.edu/EVS/) [last accessed Dec 2011].

567. Ahel I, Ahel D, Matsusaka T, et al. Poly(ADP-ribose)-binding zinc finger motifs in DNA repair/checkpoint proteins. Nature. 2008 Jan 3;451(7174):81-5.

568. Macrae CJ, McCulloch RD, Ylanko J, et al. APLF (C2orf13) facilitates nonhomologous end-joining and undergoes ATM-dependent hyperphosphorylation following ionizing radiation. DNA Repair (Amst). 2008 Feb 1;7(2):292-302.

569. Allen NP, Donninger H, Vos MD, et al. RASSF6 is a novel member of the RASSF family of tumor suppressors. Oncogene. 2007 Sep 13;26(42):6203-11.

570. Ou YY, Mack GJ, Zhang M, et al. CEP110 and ninein are located in a specific domain of the centrosome associated with centrosome maturation. J Cell Sci. 2002 May 1;115(Pt 9):1825-35.

571. Carrara S, Cangi MG, Arcidiacono PG, et al. Mucin expression pattern in pancreatic diseases: findings from EUS-guided fine-needle aspiration biopsies. Am J Gastroenterol. 2011 Jul;106(7):1359-63.

572. Fletcher O, Houlston RS. Architecture of inherited susceptibility to common cancer. Nat Rev Cancer. 2010 May;10(5):353-61.

573. Mehrotra PV, Ahel D, Ryan DP, et al. DNA repair factor APLF is a histone chaperone. Mol Cell. 2011 Jan 7;41(1):46-55.

574. Okada S, Tokunaga E, Kitao H, et al. Loss of Heterozygosity at BRCA1 Locus Is Significantly Associated with Aggressiveness and Poor Prognosis in Breast Cancer. Ann Surg Oncol. 2011 Dec 17. [Epub ahead of print]

575. Lane DP. Cancer. p53, guardian of the genome. Nature. 1992 Jul 2;358(6381):15-6.

576. Chen XR, Zhang WZ, Lin XQ, et al. Genetic instability of BRCA1 gene at locus D17S855 is related to clinicopathological behaviors of gastric cancer from Chinese population. World J Gastroenterol. 2006 Jul 14;12(26):4246-9.

577. Pestonjamasp PH, Mittra I. Analysis of BRCA1 involvement in breast cancer in Indian women. J Biosci. 2000 Mar;25(1):19-23.

578. Garcia-Patiño E, Gomendio B, Lleonart M, et al. Loss of heterozygosity in the region including the BRCA1 gene on 17q in colon cancer. Cancer Genet Cytogenet. 1998 Jul 15;104(2):119-23.

Appendices

1. Appendix Tables

Table S1: Primers for BRCA1 microsatellite markers

| Microsatellite marker | Repeat type | Expected average amplicon size (bp) | Primer sequences | Annealing temp (°C) |
|---|---|---|---|---|
| D17S855 | dinucleotide | 151 | F: GGA TGG CCT TTT AGA AAG TGG; R: ACA CAG ACT TGT CCT ACT GCC | 60 |
| D17S1322 | trinucleotide | 130 | F: CTA GCC TGG GCA ACA AAC GA; R: GCA GGA AGC AGG AAT GGA AC | 57 |
| D17S579 | dinucleotide | 123 | F: AGT CCT GTA GAC AAA ACC TG; R: CAG TTT CAT ACC AAG TTC CT | 57 |
| D16S2616 | trinucleotide | 125 | F: TGT GAT TCA GTA GGT CTT GGG; R: GTG ACT AAA CCT GAC ATT GTG C | 62 |

Table S2: BRCA1 mutations sequencing primers

| Mutation | Expected amplicon size (bp) | Primer sequences | Annealing temp (°C) |
|---|---|---|---|
| 5382insC | 109 | F: CAG AGG AGA TGT GGT CAA TG; R: GGG GTG AGA TTT TTG TCA AC | 55 |
| 185delAG | 91 | F: CGT TGA AGA AGT ACA AAA TGT C; R: CCC AAA TTA ATA CAC TCT TGT G | 59 |
| 2318delG | 103 | F: CTA AGT GTT CAA ATA CCA GTG; R: GCA TTA TTA GAC ACT TTA ACT G | 55 |

Table S3: FPC cases in CNV study

(Table available as an Excel spreadsheet on the attached CD)

Table S4: Controls (OFCCR and FGICR) in CNV study

(Table available as an Excel spreadsheet on the attached CD)

Table S5: Primers for qPCR validation of CNVs

| CNV ID | F primer | R primer |
|---|---|---|
| D_180 | GGAGGACATGGAATTGATGG | CTGCAAGCAAAGATCACCAA |
| D_19 | GTAGCAGAGTGGGCCAAAAA | GGGAAAAATTCACCCCTGAT |
| D_128 | GCAGAATGAAATTTGGCACA | AAGCCACCACTGAGGTTCAC |
| D_152 | CCAGAGAGGATGGTGAGAGG | GCTTTGGGACTGACTGCTTC |
| D_234 (primer A) | AAGGAGGCTGAGTGGCTACA | CCTTGAAGACCTGGCTTCTG |
| D_234 (primer B) | AGGGAAGAACACCTCCACCT | ATCCCTCTTCCTTGCTCCAT |
| D_143 (primer A) | TGCTCCATGGTGCTGATTTA | CACACATCACTGCCCTTCAC |
| D_143 (primer B) | TCTGTTCCTATTCGGCCATC | TTCTCCCAAACTCCACAAGC |
| D_220 | GCTCCAAGATCCGTTCTGAG | TCATTTGACGCATGACCCTA |
| D_30 & D_36 (same region in two samples) | TACAGGCAACCCCAGGTATC | CACCCAGCCATGTTTTCTTT |
| D_40 | AAAGAGGCCAACAGGAAACC | TCTGAGAAAGCGTAGACATTTCC |
| D_105 (primer A) | TTTCTAGCTGGGCTCTCCAA | CCAGCAATGGTAGGGTGAGT |
| D_105 (primer B) | CTGGCTTTTGTGGATGGTTT | TGCATGCTTGAATCTCCTTG |
| D_83 | ACAGCCAAGGGTGAAACATC | CTGTGAACCTGGGTGAACCT |
| D_48 | CACTGGATTGGAGACCAGAA | TTGGAAGAACTCGGCTTGAT |
| D_125 | ACGGATTCCTCAACACTTGC | CTGTCCTGGCTACTGCATCA |
| D_134 | GCATCCTTGCACTACCCATT | GGGGGAAAGTGCTGTGTAAA |
| D_142 (primer A) | CTACCTACTGGGCACCCAAA | TTGATGTTGAAATGGGCTGA |
| D_142 (primer B) | TGGTGATACCCACTGCTGAA | CCAGCTTGCTTTCTTTGTCC |
| D_56 | GCAGATTTCAGGTGTGCTGA | AAAGACACCCTGGCAGAGAA |
| G_225 | TGCCTTGGCTCCACTTCTAT | GTCCAGCTCCACAAGAGAGG |
| G_226 | TGTGCCAGTGGACTCTGAAC | TTTGTTGACCACTCCCTTCC |
| G_365 (primer A) | TCCCAACCATATCACCCAGT | AAAACCAACCAAGGCATCAG |
| G_365 (primer B) | TGCCTGCTGCTTAAAAAGGT | ATATCAACGACTGCCCTTGG |
| G_369 | GGGGCAGCTGTAAATACCAA | CCCCAGGTCATAGACCAGAA |
| G_380 | GGCAGGTAGACATGACAGCA | CCATCTCAGCTCCAGTCACA |
| G_407 | TGCCCCCAAAATGAATGTAT | CAAAAGTGTTGGCTGCTGAA |
| G_603/604 | TAGGCCTTGGATGGAAATTG | GTGATGAGGGGGTGAAGAGA |
| G_69 | TGGGAACCCCTGCTATAGTG | TGCTCGCTTTGAATTTGATG |
| G_88 | AGGTCAGCGCTCCTCAATAA | TGCCCCTGTGCATACAAATA |
| G_97 (primer A) | CAGCTCTCCAGGTCATCCAT | GAGTTCACCAGGTGGGAAAA |
| G_97 (primer B) | AGAACCGAGTGGAAAGAGCA | TGAGGCCCAAAGATGGTAAC |

Table S6: Primers for qPCR breakpoint mapping of TGFBR3-transecting duplication

| Primer pair | F primer | R primer |
|---|---|---|
| T_Out_1 | CCAAGGCCTCTGGACTAGGT | AGACTTGGAGCCCTAGGACAA |
| T_Out_2 | TCACTTGGCTTCATGAAAAGG | AAATAGCCCCAGATGTGTGC |
| T_Out_3 | AGCCAAGAGCTGTGTTTGTGT | AAATGCAATCAAGGCAGCTT |
| T_Out_4 | GGCCTCTAGCCCGAAATAAC | GACTGCAAAATGGGTGTGG |
| O_In_2 | CTTGTGGTTTTGCCTGGAAT | ACCACTGTGCAGCTCCTGA |
| O_Out_1 | CCAGTTTGGAATGCAATGAA | ACTCTCAGTTGTGGCTTGGAG |
| O_Out_5 | ACAAATTGCTGTTTCTTTCTACAGC | TTACCTGCGAGCTACTGAATATAGG |
| Sequencing primers | CTGGTAGACAGTTGGGGTTTC | ACATCTCTGGTGCCCTTTG |

Table S7: High- and low-confidence losses on Affy500K array in FPC cases

(Table available as an Excel spreadsheet on the attached CD)

Table S8: High- and low-confidence gains on Affy500K array in FPC cases

(Table available as an Excel spreadsheet on the attached CD)

Table S9: High- and low-confidence losses on Affy500K array in controls

(Table available as an Excel spreadsheet on the attached CD)

Table S10: High- and low-confidence gains on Affy500K array in controls

(Table available as an Excel spreadsheet on the attached CD)

Table S11: High-confidence CNVs on Affy 6.0 array in FPC cases

(Table available as an Excel spreadsheet on the attached CD)

Table S12: High-confidence CNVs on Affy 6.0 array in controls

(Table available as an Excel spreadsheet on the attached CD)

2. Appendix Figures

One outlier was excluded from each set of sample results if its value fell outside the range mean ± 2 SD (for this purpose, the mean, the SD and the resulting range were calculated after removing the value in question).

Fold difference was calculated relative to the average dCt of the control samples (i.e., ddCt for each sample is dCt(sample) - average control dCt). Error bars = 2 × SD of the fold difference.

In all figures, the sample with the "Id_" identifier is the FPC case carrying the CNV; samples with "RD-" identifiers are controls.
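The outlier rule and the ddCt normalization described in these notes can be sketched in Python. This is a minimal sketch under stated assumptions, not the analysis code used for this thesis. The notes do not say how dCt was defined, which SD estimator was used, or how fold difference was derived from ddCt. The sketch therefore assumes replicate-level dCt values as input, the sample (n - 1) SD, and the common 2^(-ddCt) relative-quantification convention. The helper names `exclude_one_outlier` and `fold_differences`, the sample IDs and the Ct values are all hypothetical.

```python
import statistics


def exclude_one_outlier(values, k=2.0):
    """Drop at most one value lying outside mean +/- k*SD of the *other* values.

    Assumptions (not stated in the thesis): SD is the sample SD (n - 1), and if
    several values qualify, the one farthest from the mean of the rest is dropped.
    """
    values = list(values)
    if len(values) < 3:
        return values  # leave-one-out SD needs at least two remaining values
    worst_idx, worst_dist = None, 0.0
    for i, v in enumerate(values):
        rest = values[:i] + values[i + 1:]  # recompute mean/SD without this value
        dist = abs(v - statistics.mean(rest))
        if dist > k * statistics.stdev(rest) and dist > worst_dist:
            worst_idx, worst_dist = i, dist
    if worst_idx is None:
        return values
    return values[:worst_idx] + values[worst_idx + 1:]


def fold_differences(replicate_dcts, control_ids):
    """Return {sample: (mean fold difference, error bar = 2*SD)}.

    replicate_dcts maps a sample ID to its replicate dCt values. The fold
    difference is assumed to be 2**(-ddCt), with ddCt = dCt - average control dCt.
    """
    kept = {s: exclude_one_outlier(r) for s, r in replicate_dcts.items()}
    mean_dct = {s: statistics.mean(r) for s, r in kept.items()}
    control_avg = statistics.mean(mean_dct[c] for c in control_ids)
    result = {}
    for sample, reps in kept.items():
        folds = [2.0 ** -(d - control_avg) for d in reps]
        err = 2 * statistics.stdev(folds) if len(folds) > 1 else 0.0
        result[sample] = (statistics.mean(folds), err)
    return result


# Hypothetical replicate dCt values; "RD-2" contains one aberrant replicate.
data = {
    "Id_case": [6.0, 6.0, 6.0],
    "RD-1": [5.0, 5.0, 5.0],
    "RD-2": [5.0, 5.0, 5.0, 9.0],
}
print(fold_differences(data, control_ids=["RD-1", "RD-2"]))
```

Under the 2^(-ddCt) assumption, and with similar amplification efficiencies, a heterozygous deletion would be expected to give a fold difference near 0.5 and a single-copy gain a value near 1.5 relative to controls. The hypothetical "Id_case" above behaves like the deletion case. The leave-one-out rule is aggressive with only three replicates, so the minimum of three values in the sketch is a floor, not a recommendation.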

Figure S1 – qPCR of region D_180

Figure S2 – qPCR of region D_19

Figure S3 – qPCR of region D_128

Figure S4 – qPCR of region D_152

Figure S5 – qPCR of region D_234 (primer A)

Figure S6 – qPCR of region D_234 (primer B)

Figure S7 – qPCR of region D_143 (primer A)

Figure S8 – qPCR of region D_143 (primer B)

Figure S9 – qPCR of region D_220

Figure S10 – qPCR of region D_30 & D_36

Figure S11 – qPCR of region D_40

Figure S12 – qPCR of region D_105 (primer A)

Figure S13 – qPCR of region D_105 (primer B)

Figure S14 – qPCR of region D_83

Figure S15 – qPCR of region D_48

Figure S16 – qPCR of region D_125

Figure S17 – qPCR of region D_134

Figure S18 – qPCR of region D_142 (primer A)

Figure S19 – qPCR of region D_142 (primer B)

Figure S20 – qPCR of region D_56

Figure S21 – qPCR of region G_225

Figure S22 – qPCR of region G_226

Figure S23 – qPCR of region G_365 (primer A)

Figure S24 – qPCR of region G_365 (primer B)

Figure S25 – qPCR of region G_369

Figure S26 – qPCR of region G_380

Figure S27 – qPCR of region G_407

Figure S28 – qPCR of region G_603/604

Figure S29 – qPCR of region G_69

Figure S30 – qPCR of region G_88

Figure S31 – qPCR of region G_97 (primer A) – ID_27

Figure S32 – qPCR of region G_97 (primer B) – ID_27

Figure S33 – qPCR of region G_97 (primer A) in ID_203 and family members

Figure S34 – qPCR of region G_97 (primer A) in ID_203's family members

Figure S35 – qPCR of region G_97 (primer A) in ID_203 and family members

Figure S36 – qPCR of region G_97 (primer A) in ID_203's family members

Figure S37 – qPCR of region G_97 (primer A) in ID_203's family members

Figure S38 – qPCR of region G_97 (primer B) in ID_203 and family members

Figure S39 – “T_Out_1” – qPCR fine-mapping G_97 breakpoint in Id_203

Figure S40 – “T_Out_2” – qPCR fine-mapping G_97 breakpoint in Id_203

Figure S41 – “T_Out_3” – qPCR fine-mapping G_97 breakpoint in Id_203

Figure S42 – “T_Out_4” – qPCR fine-mapping G_97 breakpoint in Id_203

Figure S43 – “O_In_2” – qPCR fine-mapping G_97 breakpoint in Id_203

Figure S44 – “O_Out_1” – qPCR fine-mapping G_97 breakpoint in Id_203

Figure S45 – “O_Out_5” – qPCR fine-mapping G_97 breakpoint in Id_203