vanderbilt’s dna databank : biovu
Post on 24-Feb-2016
Vanderbilt’s DNA Databank: BioVU
Personalized Medicine
• Integration of genomic information into clinical decision making
• Personalized disease treatment and also preventative therapies
What is BioVU?
• The move towards personalized medicine requires very large sample sets for discovery and validation
• BioVU: biobank intended to support a broad view of biology and enable personalized medicine
• Contains de-identified DNA extracted from leftover blood after clinically-indicated testing of Vanderbilt patients who have not opted out
• Linked to Synthetic Derivative: de-identified EMR
• Current sample number: 135,765
   o 120,705 adult samples
   o 15,099 pediatric samples
Patient Communication Modules
[Diagram: an eligible patient (“John Doe”) is mapped by a one-way hash to a research ID (A7CCF99DE65732…). The scrubbed record and the extracted DNA both carry that ID.]
The “synthetic derivative” (SD): can be updated
The Synthetic Derivative
• A Derivative of the EMR - information content reduced by ‘scrubbing’ identifiers
• Systematically shifted event dates
• Contains ~1.9 million records
   o ~1 million with detailed longitudinal data
   o averaging 100,000 bytes in size
   o an average of 27 codes per record
• Records updated over time and are current through 4/30/11
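The two de-identification steps named above (a one-way hash and systematically shifted event dates) can be sketched as follows. This is a minimal illustration, not the BioVU implementation: the keyed SHA-256 hash, the `SECRET_KEY` value, the function names, and the rule that derives one constant backward shift per record are all assumptions made for the example.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical secret; a real system would keep this out of source control.
SECRET_KEY = b"example-secret-not-for-production"


def research_id(mrn):
    """One-way keyed hash of a medical record number (assumed scheme)."""
    digest = hmac.new(SECRET_KEY, mrn.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest().upper()


def date_offset_days(rid, max_shift=364):
    """Derive a fixed shift of 1..max_shift days from the research ID."""
    return int(rid[:8], 16) % max_shift + 1


def shift_date(rid, event):
    """Move an event date back by the record's constant offset.

    Because every date in one record moves by the same amount,
    intervals between events are preserved.
    """
    return event - timedelta(days=date_offset_days(rid))


rid = research_id("MRN-0001234")  # made-up identifier
admit = shift_date(rid, date(2010, 3, 1))
discharge = shift_date(rid, date(2010, 3, 8))
print(rid[:16], admit, discharge, (discharge - admit).days)
```

The same input always yields the same research ID, which is what lets a record be updated over time without storing the original identifier.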
Synthetic Derivative Data Types
• Narratives, such as:
   o Clinical Notes
   o Discharge Summaries
   o History and Physicals
   o Problem Lists
   o Surgical Reports
   o Progress Notes
   o Letters
• Diagnostic Codes, Procedural Codes
• Forms (intake, assessment)
• Reports (pathology, ECGs, echocardiograms)
• Clinical Communications
• Lab Values and Vital Signs
• Medication Orders
• TraceMaster (ECGs)
Synthetic Derivative vs. BioVU
[Diagram: scrubbed records alone (Synthetic Derivative) versus scrubbed records paired with DNA (BioVU)]
• Synthetic Derivative: ~1.9 million
• BioVU: ~135,000
Sample accrual
[Chart: adult and pediatric samples accrued, with anticipated adult and pediatric accrual, Jul-07 to Jan-14; y-axis 0 to 225,000 samples]
Current accrual as of 2-13-2012: 135,765 samples (15,099 pediatric)
BioVU Demographics
[Charts: AGE (<1, 1 - 10, 11 - 20, 21 - 30, 31 - 40, 41 - 50, 51 - 60, 61 - 70, 71 - 75, >75); GENDER (Female, Male); RACE (White, African American, Asian, Hispanic, Others)]
BioVU Sample Management
RTS SmaRTStore
Validation in BioVU
• Sample handling algorithms
   o Gender match
   o 1/384 gender mismatches
• Ancestry
   o Characterize sample ancestry, assess usefulness of ‘race’ as defined in EMR
   o Provide a panel of ancestry informative markers that define ancestry
   o No significant difference between the concordance of self-report or observer-report with genetic ancestry
• Demonstration project (American Journal of Human Genetics, 2010)
   o Can known associations between genetic variants and common diseases be identified in the EMR?
The “demonstration project”
• Genotype “high-value” SNPs in the first 8,000 samples accrued
   o including SNPs associated by replicated genome-wide experiments with common diseases & traits
      1. Atrial fibrillation
      2. Crohn’s disease
      3. Multiple Sclerosis
      4. Rheumatoid arthritis
      5. Type II Diabetes
• Develop Natural Language Processing methods to identify cases and controls
• Are genotype-phenotype relations replicated?
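Whether a known association replicates is usually judged from the odds ratio and its confidence interval in the EMR-derived cases and controls. The sketch below computes an odds ratio with a Wald 95% interval from a 2x2 table. The function name and the counts are made up for illustration and are not BioVU results; a real analysis would typically use a regression model with covariates.

```python
import math


def odds_ratio_ci(case_carriers, case_noncarriers,
                  ctrl_carriers, ctrl_noncarriers, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table."""
    a, b, c, d = case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: carriers vs. non-carriers among cases and controls.
or_, lo, hi = odds_ratio_ci(60, 140, 300, 1300)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

An interval that excludes 1.0 and points in the same direction as the published association is the usual informal reading of "replicated".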
First results
[Forest plot: odds ratio (axis 0.5 to 5.0) for each marker, grouped by disease: atrial fibrillation, Crohn's disease, multiple sclerosis, rheumatoid arthritis, type 2 diabetes]
Marker, gene / region:
rs2200733, Chr. 4q25
rs10033464, Chr. 4q25
rs11805303, IL23R
rs17234657, Chr. 5
rs1000113, Chr. 5
rs17221417, NOD2
rs2542151, PTPN22
rs3135388, DRB1*1501
rs2104286, IL2RA
rs6897932, IL7RA
rs6457617, Chr. 6
rs6679677, RSBN1
rs2476601, PTPN22
rs4506565, TCF7L2
rs12255372, TCF7L2
rs12243326, TCF7L2
rs10811661, CDKN2B
rs8050136, FTO
rs5219, KCNJ11
rs5215, KCNJ11
rs4402960, IGF2BP2
Types of projects
• Discovery or validation of genotype-phenotype relations for disease susceptibility or drug responses
• Discovery of new disease/susceptibility genes by resequencing in patients (obesity, Cushing's, susceptibility to infection, insomnia, pre-term birth)
• Access samples without disease X, or “normals” of specified ancestry, or old normals
• Phenome-wide association study (PheWAS): in development
Data Use Agreement
Genotyping Data Accrual
[Chart: total GWAS subjects by quarter, Q2 2010 to Q4 2011; y-axis 0 to 16,000; N=14,747]
[Chart: total genotyped subjects by quarter, adult and pediatric samples, Q4 2008 to Q4 2011; y-axis 0 to 60,000; N=56,859]
Common Diagnoses in BioVU
Examples of ICD-9 codes for rare diseases

Example Rare Disease            Number in SD   Number in BioVU
Microcephalus                   1,070          85
Pica                            115            22
Septicemic Plague               21             0
Pick’s Disease                  45             8
Acromegaly and Gigantism        571            123
Ehlers-Danlos Syndrome          285            34
Narcolepsy without Cataplexy    438            76
Spina Bifida                    1,968          238
Stiff-Man Syndrome              82             17
Tourette Syndrome               667            34
Bell’s Palsy                    2,534          402
Bulimia Nervosa                 919            88
Cushing’s                       1,443          298
Peyronie’s Disease              694            157
Wilson’s Disease                140            49
Meningioma                      1,444          355
Wegener’s                       363            141
Not included in SD searches:
• Bone marrow transplant
• SCID
Flagged compromised samples:
• Transfusion within 2 weeks of blood draw
• Leukemia
• Myeloma
• Lymphoma
• Pre-leukemic states
General algorithm for determining EMR phenotype
• Iteratively refine case definition through partial manual review until case definition yields PPV ≥ 95%
• For small case sizes (~100), hand curate cases but use automated case definitions for others
• For samples with inadequate counts of “Definite Cases”, manually review possible cases to determine true positives
• For controls, exclude all potentially overlapping syndromes and possible matches, iteratively refine such that NPV ≥ 98%
[Diagram: records sorted into four groups]
• Definite Cases (algorithm-defined)
• Possible Cases (require manual review)
• Controls (algorithm-defined)
• Excluded (algorithm-defined)
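The stopping rule in the algorithm above (PPV ≥ 95% for cases, NPV ≥ 98% for controls) can be checked from a manually reviewed subset. A minimal sketch, assuming each review outcome is stored as a boolean meaning "the reviewer agreed with the algorithm"; the function names and review counts are hypothetical.

```python
def ppv(reviewed_cases):
    """Share of algorithm-defined cases confirmed as true cases on review."""
    return sum(reviewed_cases) / len(reviewed_cases)


def npv(reviewed_controls):
    """Share of algorithm-defined controls confirmed disease-free on review."""
    return sum(reviewed_controls) / len(reviewed_controls)


def definition_accepted(reviewed_cases, reviewed_controls,
                        min_ppv=0.95, min_npv=0.98):
    """True once both thresholds from the slide are met."""
    return ppv(reviewed_cases) >= min_ppv and npv(reviewed_controls) >= min_npv


# Hypothetical review round: 48 of 50 cases and 99 of 100 controls confirmed.
cases = [True] * 48 + [False] * 2
controls = [True] * 99 + [False] * 1
print(ppv(cases), npv(controls), definition_accepted(cases, controls))
```

If either threshold is missed, the case or control definition is refined and a fresh subset is reviewed, which is the "iteratively refine" step.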
The problem with ICD9 codes
• ICD9 codes give both false negatives and false positives
• False negatives:
   o Outpatient billing limited to 4 diagnoses/visit
   o Outpatient billing done by physicians (e.g., takes too long to find the unknown ICD9)
   o Inpatient billing done by professional coders, who:
      – omit codes that don’t pay well
      – can only code problems explicitly mentioned in documentation
• False positives:
   o Diagnoses evolve over time: physicians may initially bill for suspected diagnoses that are later determined to be incorrect
   o Billing the wrong code (perhaps because it is easier for a busy clinician to find)
   o Physicians may bill for a different condition if it pays for a given treatment
   o Example: Anti-TNF biologics (e.g., infliximab) were originally not covered for psoriatic arthritis, so rheumatologists would code the patient as having rheumatoid arthritis
EMR Phenotyping
[Diagram: Medications + Labs + ICD-9s (≥3 codes), subject to Exclusions and Time Constraints, define the PHENOTYPE]
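One way to read the diagram is as a rule that combines code counts, medications, and labs, then sorts each record into definite case, possible case, control, or excluded. The sketch below is an illustration only: the `Record` and `PhenotypeRule` classes are hypothetical, the type 2 diabetes codes, drug, and lab threshold are placeholder values rather than a validated definition, and time constraints are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    icd9: list = field(default_factory=list)   # one entry per coded visit
    meds: set = field(default_factory=set)
    labs: dict = field(default_factory=dict)   # lab name -> highest value seen


@dataclass
class PhenotypeRule:
    case_codes: set
    exclusion_codes: set
    case_meds: set
    lab_name: str
    lab_threshold: float
    min_codes: int = 3   # the ">=3 codes" requirement from the slide

    def classify(self, rec):
        # Exclusions win over every other kind of evidence.
        if any(code in self.exclusion_codes for code in rec.icd9):
            return "excluded"
        n_codes = sum(code in self.case_codes for code in rec.icd9)
        has_med = bool(rec.meds & self.case_meds)
        has_lab = rec.labs.get(self.lab_name, float("-inf")) >= self.lab_threshold
        if n_codes >= self.min_codes and has_med and has_lab:
            return "definite case"
        # Partial evidence goes to manual review, never to the control group.
        if n_codes or has_med or has_lab:
            return "possible case"
        return "control"


# Placeholder values for illustration only.
rule = PhenotypeRule(
    case_codes={"250.00", "250.02"},
    exclusion_codes={"250.01", "250.03"},
    case_meds={"metformin"},
    lab_name="hba1c",
    lab_threshold=6.5,
)
definite = Record(icd9=["250.00"] * 3, meds={"metformin"}, labs={"hba1c": 7.2})
print(rule.classify(definite))
```

Keeping partial-evidence records out of the control group mirrors the rule that controls exclude possible matches.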
Lessons from preliminary phenotype development
• Eliminating negated and uncertain terms:
   – “I don’t think this is MS”, “uncertain if multiple sclerosis”
• Delineating section tag of the note
   – “FAMILY MEDICAL HISTORY: Mother had multiple sclerosis.”
• Adding requirements for further signs of “severity of disease”
   – For MS: an MRI with T2 enhancement, myelin basic protein or oligoclonal bands on lumbar puncture, etc.
   – This could potentially miss patients with outside work-ups, however
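The first two lessons (drop negated or uncertain mentions, and ignore mentions under a family-history section tag) can be sketched with regular expressions. This is a deliberately small illustration: the cue list, the section names, and the function name are assumptions, and a production system would rely on a full clinical NLP pipeline rather than a handful of patterns.

```python
import re

# Small, assumed cue list; real negation lexicons are much larger.
NEGATION_OR_UNCERTAINTY = re.compile(
    r"\b(?:no|not|don['’]t|doesn['’]t|denies|without|uncertain|unlikely|rule out)\b",
    re.IGNORECASE,
)
SECTION_HEADER = re.compile(r"^([A-Z][A-Z &/]+):")
IGNORED_SECTIONS = {"FAMILY MEDICAL HISTORY", "FAMILY HISTORY"}


def affirmed_mentions(note, term_pattern):
    """Return sentences that mention the term outside ignored sections,
    with no negation or uncertainty cue earlier in the same sentence."""
    term = re.compile(term_pattern, re.IGNORECASE)
    hits = []
    section = ""
    for line in note.splitlines():
        header = SECTION_HEADER.match(line)
        if header:
            section = header.group(1).strip()
            line = line[header.end():]
        if section in IGNORED_SECTIONS:
            continue
        for sentence in re.split(r"(?<=[.!?])\s+", line):
            found = term.search(sentence)
            if found and not NEGATION_OR_UNCERTAINTY.search(sentence[:found.start()]):
                hits.append(sentence.strip())
    return hits


note = (
    "HISTORY OF PRESENT ILLNESS: I don't think this is MS. "
    "Uncertain if multiple sclerosis.\n"
    "FAMILY MEDICAL HISTORY: Mother had multiple sclerosis.\n"
    "ASSESSMENT: Relapsing multiple sclerosis, stable on therapy."
)
print(affirmed_mentions(note, r"\b(?:multiple sclerosis|MS)\b"))
```

Only the assessment sentence survives; the two hedged mentions and the family-history mention are filtered out.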
Other lessons (more difficult to correct)
• A number of incorrect ICD9 codes for RA and MS assigned to patients
• Evolving disease
   – “Recently diagnosed with Susac’s syndrome - prior diagnosis of MS incorrect.” (Notes also included a thorough discussion of MS, ADEM, and Susac’s syndrome.)
• Difference between two doctors:
   – Presurgical admission H&P includes “rheumatoid arthritis” in the past medical history
   – Rheumatology clinic visit notes say the diagnosis is “dermatomyositis” and never mention RA
• Sometimes incorrect diagnoses are propagated through the record due to cutting-and-pasting / note reuse
ANALYSIS PLAN
1. Sample size estimation
2. Dependent/outcome variable
3. Independent variables (include SNPs, covariates, confounders)
   a. Should have race, gender, age in all plans
4. Statistical method proposed
   a. Type of model if appropriate
   b. How SNPs will be coded
5. Power calculation
6. Population stratification plans
7. QC plans
   a. Call rate, gender checks, HWE: these will be important to do on each dataset pulled, to check for phenotype-specific QC issues
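Two of the QC items in step 7, call rate and Hardy-Weinberg equilibrium (HWE), reduce to short calculations. The sketch below assumes genotypes are stored with `None` for a failed call and uses the standard 1-degree-of-freedom chi-square test for HWE; the function names and genotype counts are made up for illustration, and small samples would call for an exact test instead.

```python
import math


def call_rate(genotypes):
    """Fraction of non-missing genotype calls (None marks a failed call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic and p-value (1 df) for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    if p in (0.0, 1.0):
        raise ValueError("monomorphic marker: HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


calls = ["AA", "AB", None, "BB", "AB", "AA", "AB", "AA", "BB", "AB"]
print(call_rate(calls))
print(hwe_chi_square(360, 480, 160))   # counts in exact HWE proportions
print(hwe_chi_square(500, 0, 500))     # no heterozygotes at all
```

Running these checks on each pulled dataset, rather than once for the whole biobank, is what exposes phenotype-specific problems.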
PHENOTYPE PLAN
8. Trait of interest for study
9. Demographic constraints (e.g. gender, age, and/or ethnicity)
10. Cases and controls require outline of definition including:
   • Inclusion criteria (e.g. ICD9 codes, keyword search, medications, laboratory results)
   • Exclusion criteria (e.g. ICD9s, keywords, meds, labs, minimum data or follow up)
11. Validation plan for phenotype (e.g. manual review of all or some records)
VICTR Funding
[Diagram, shown in successive builds: workflow over a grid of scrubbed records (e.g. B699tre563msd…, F5rt783mbncds…)]
• Investigator query yields cases + controls
• Data use agreement + IRB Approval
• Manual Review
• Sample retrieval
• Genotyping, genotype-phenotype relations
BioVU Genotyping Process
[Cycle diagram; steps listed in order]
1. Investigator selects cases and controls from Synthetic Derivative
2. Investigator signals BioVU program to initiate sample selection
3. BioVU notifies DNA resources core that samples are ready for selection and picking
4. Samples are provided to appropriate lab and are genotyped
5. Investigator and BioVU program receive genotype data
6. Genotyped data analyzed by investigator
BioVU Requests
• 60 Total Requests
• 43 Approvals
[Bar chart: BioVU Requests and BioVU Approvals, each split into DNA Requests and Data Requests; y-axis 0 to 60]
BioVU: New Directions
A well characterized cohort of individuals without specific diseases across all ages to be used as controls
Expansion of BioVU to capture and store plasma to enable candidate proteomic/biomarker research
Expanding BioVU genotyping to include mitochondrial SNP genotyping and copy number variants
Link pediatric DNA samples to maternal samples (mom-baby pairs resource)
Expansion of BioVU sequencing activities to include whole exome sequencing on targeted populations
FAQ “answers”
• SD access: “non-human subjects” IRB review (days)
• Current access costs: $4/sample
• Genotyping data: no charge
• Genotyping:
   o Investigator-funded
      – Consider VICTR as a funding source
   o Genotyping/sequencing performed in VUMC Core Facilities
      – Justification must be provided for outside genotyping, including quality control plans
   o Genotype “redeposit” part of the data use agreement
Questions?
Contact: Erica Bowton PhD
BioVU Program Manager
erica.bowton@vanderbilt.edu
322-1975