ARIC: Why Large Databases as a Topic? Publicly available. Many relevant questions unanswered. Information about health disparities. Good for testing/validating new methods. Relevant for health disparities. Health care not econometrics; cohort not claims.

Upload: amara-sheren

Post on 29-Mar-2015


ARIC

Why Large Databases as a Topic?

• Publicly available.
• Many relevant questions unanswered.
• Information about health disparities.
• Good for testing/validating new methods.
• Relevant for health disparities

Health care not econometrics; Cohort not claims


ARIC

Outline

ARIC Study
• Description
• Science
• Results
• Health disparities

Genomic data

Other large databases


ARIC

The Atherosclerosis Risk in Communities (ARIC) Study is an NHLBI-sponsored study of cardiovascular disease in four communities in the United States.

Includes a Community Surveillance and a Cohort Component.


ARIC

CHD and Atherosclerosis


ARIC

Cohort Component

Probability samples of 4 communities

15,792 men and women aged 45-64 years at baseline examination (1987-1989)

Re-examined every three years

• 1987-1989, 1990-1992, 1993-1995, 1996-1998

Extensive examinations include medical, social and demographic data

Annual follow-ups by telephone to maintain contact and assess health status


ARIC

Community Surveillance Component

CVD endpoint surveillance of all residents of the 4 communities, ages 35-74 years

Ascertainment and classification of coronary and cerebral clinical events, trends over time


ARIC

Characteristics of the Four ARIC Communities

Study Community             Population,   Population,   % Black   % >12 yrs
                            ages 35-74    total                   education
Forsyth County, NC              95,863       243,683        24         63
Jackson, MS                     68,303       202,895        48         71
Minneapolis suburbs, MN         69,338       192,004         1         85
Washington County, MD           45,539       113,068         4         60
Total (four communities)       279,043       751,650


ARIC

Measure Variation in Cardiovascular Risk Factors, Medical Care & Disease by Race, Sex, Place & Time

ARIC communities differ in their reported cardiovascular mortality rates; atherosclerosis prevalence rates may also differ

Ecologic comparison of community rates with factors that may influence these rates

Study Community             All-Cause Mortality     Heart Disease Mortality
                            Men       Women         Men       Women
Forsyth County, NC          16.3       8.7          6.7       2.7
Jackson, MS                 20.8      10.0          6.6       2.9
Minneapolis suburbs, MN      9.4       6.3          4.2       1.3
Washington County, MD       16.1       8.2          7.8       2.8
US Total                    14.4       8.0          5.7       2.6

Age-adjusted mortality rates* for men & women aged 35-74 years in ARIC study communities, 1980

*indirect age adjustments; annual rate per 1,000 population
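The indirect age adjustment named in the footnote can be sketched as follows. This is a minimal illustration, not the ARIC analysis itself: the function name, the two age strata and every number in the example are hypothetical. The sketch applies a standard population's age-specific rates to the community's age structure to get expected deaths, takes the ratio of observed to expected deaths, and scales the standard population's crude rate by that ratio.

```python
def indirect_adjusted_rate(observed_deaths, stratum_populations,
                           standard_rates, standard_crude_rate):
    """Return (adjusted_rate, smr) using indirect standardization.

    standard_rates are deaths per person per year in the standard
    population, one per age stratum. standard_crude_rate is given in the
    unit the adjusted rate should be reported in (here, per 1,000).
    """
    if len(stratum_populations) != len(standard_rates):
        raise ValueError("one standard rate is needed per age stratum")
    # Deaths expected if the community experienced the standard rates.
    expected = sum(n * r for n, r in zip(stratum_populations, standard_rates))
    smr = observed_deaths / expected  # standardized mortality ratio
    return smr * standard_crude_rate, smr


# Hypothetical community: two age strata, 30 observed deaths in one year.
rate, smr = indirect_adjusted_rate(
    observed_deaths=30,
    stratum_populations=[1000, 2000],
    standard_rates=[0.005, 0.010],
    standard_crude_rate=8.0,  # hypothetical standard rate per 1,000
)
print(f"SMR = {smr:.2f}, adjusted rate = {rate:.1f} per 1,000")
```

Indirect adjustment needs only the community's total observed deaths and its age structure, which is why it suits small communities where age-specific death counts would be unstable.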


ARIC

Sampling Framework

Probability sample from the previous census, except for Jackson, MS, which is an all black sample

In Forsyth, the original sampling unit was a household.

In the other three locations, the sampling unit was an individual

• Jackson, MS – driver’s license database

• Minneapolis, MN – eligible for jury duty (driver’s license and voters)

• Washington County, MD – driver’s license database


ARIC

Achilles Heel

Given what I just said, what is ARIC’s Achilles heel?

Confounding between race and geography!

Terrible decision!


ARIC

Elements of Baseline Examination

Sitting blood pressure – 3 measurements w/ random zero sphygmomanometer

Anthropometry – weight, standing & sitting height, triceps & subscapular skinfolds, waist, hip, arm & calf girths, wrist breadth

Venipuncture – fasting blood samples for lipids, hemostasis, hematology & chemistry

Electrocardiogram – digitally recorded 12-lead electrocardiogram & 2-minute rhythm strip


ARIC

Lipid Determinations

8- or 12-hour (overnight) fast information

Central laboratory; CDC certification

Cholesterol measured enzymatically

HDL measured by precipitation

LDL estimated by Friedewald formula

LDL = Total - HDL - (Trigs/5)
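As a worked illustration of the Friedewald estimate, here is a minimal sketch. The function name and the example values are hypothetical. All inputs are assumed to be fasting values in mg/dL, and the 400 mg/dL triglyceride cutoff is the commonly cited limit of the formula's validity, not something stated on the slide.

```python
def friedewald_ldl(total_chol, hdl, triglycerides):
    """Estimate LDL cholesterol (mg/dL) from a fasting lipid panel.

    Implements LDL = Total - HDL - (Trigs / 5), with all values in mg/dL.
    """
    if triglycerides >= 400:
        # Assumption: the estimate is usually treated as unreliable here.
        raise ValueError("Friedewald estimate not used for triglycerides >= 400 mg/dL")
    return total_chol - hdl - triglycerides / 5


# Hypothetical participant: total 200, HDL 50, triglycerides 150 (mg/dL).
print(friedewald_ldl(200, 50, 150))
```

The triglyceride term stands in for VLDL cholesterol, which is why the fasting requirement listed on the slide matters: a recent meal raises triglycerides and biases the estimate.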


ARIC

ATP III Classification

LDL Cholesterol (mg/dL)      Description
<100                         Optimal
100-129                      Near optimal
130-159                      Borderline high
160-189                      High
≥190                         Very high

Total Cholesterol (mg/dL)
<200                         Desirable
200-239                      Borderline high
≥240                         High

HDL Cholesterol (mg/dL)
<40                          Low
≥60                          High
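The cut points in this table translate directly into a lookup. The sketch below is illustrative only: the function names are hypothetical, values are assumed to be in mg/dL, and the boundary values 190, 240 and 60 are assigned to the upper category, following the published ATP III cut points.

```python
def classify_ldl(ldl):
    """ATP III description for an LDL cholesterol value in mg/dL."""
    if ldl < 100:
        return "Optimal"
    if ldl < 130:
        return "Near optimal"
    if ldl < 160:
        return "Borderline high"
    if ldl < 190:
        return "High"
    return "Very high"


def classify_total_cholesterol(total):
    """ATP III description for a total cholesterol value in mg/dL."""
    if total < 200:
        return "Desirable"
    if total < 240:
        return "Borderline high"
    return "High"


def classify_hdl(hdl):
    """ATP III description for HDL in mg/dL; None where the table has no label."""
    if hdl < 40:
        return "Low"
    if hdl >= 60:
        return "High"
    return None  # 40-59 mg/dL carries no label in the table


print(classify_ldl(138.0), classify_total_cholesterol(210), classify_hdl(42.6))
```

Half-open comparisons are used so that fractional values such as 129.5 mg/dL still fall into exactly one category.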


ARIC

Elements of Baseline Examination (cont’d)

Ultrasound, postural change – B-mode scan for wall & lumen measurements in both carotid arteries & 1 popliteal artery; supine brachial & ankle blood pressures, heart rate & blood pressures as participant rises

Interview – medical history, physical activity, TIA & respiratory symptoms, reproductive history, medication use, food frequency

Pulmonary function – digitally recorded forced vital capacity & timed expiratory volumes


ARIC

Elements of Baseline Examination (cont’d)

Physical exam – brief exam including heart, lungs & extremities; neurologic & breast exam

Medical data review – verify selected positive findings, report selected results to participants, refer for diagnosis or treatment

Reporting of results (deferred) – mail results from routine medical tests to participants & their physicians


ARIC

Definition of Hypertension

Any of the following:

• Systolic BP ≥ 140 mmHg

• Diastolic BP ≥ 90 mmHg

• Regular use of medications for high blood pressure or hypertension (participants brought all medications with them to the examination)
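The definition reads as a logical OR of three criteria, which can be sketched as below. This is an illustration under stated assumptions: the function name is hypothetical, the thresholds are treated as inclusive (at or above 140/90 mmHg, the conventional reading of this definition), and the sketch takes one summary reading per participant because the slides do not say how the three sitting measurements are combined.

```python
def is_hypertensive(systolic, diastolic, on_bp_medication):
    """Return True if any of the three criteria is met.

    systolic, diastolic: summary blood pressure readings in mmHg.
    on_bp_medication: True if the participant regularly uses medication
    for high blood pressure or hypertension.
    """
    return systolic >= 140 or diastolic >= 90 or bool(on_bp_medication)


# Hypothetical participants.
print(is_hypertensive(120, 80, False))  # normal reading, no medication
print(is_hypertensive(118, 76, True))   # controlled on medication
```

Including medication use keeps treated participants with normal readings in the hypertensive group, which matters when comparing prevalence across groups with different treatment rates.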


ARIC

Measurements of the Environment

Smoking

Alcohol

Diet

Exercise

Education and income

Psychosocial

Employment

GIS

Some previous exposures

Medications

Biomarkers


ARIC

Data Collection & Quality Control

Immediate entry of data from interviews & exams into computer-assisted data collection system; data monitoring

Trained & certified staff; monitored performance; implement recertification & retraining as needed

Selected measures repeated during exams by same & different technicians

Duplicate blood samples drawn & shipped to labs with separate IDs; duplicate electrocardiograms transmitted blindly to ECG center


ARIC

Study Questions

“It is better to know some of the questions than all of the answers.” James Thurber


ARIC

Study Questions

Diversity of measurements included in ARIC permits many important questions to be addressed

3 primary objectives

• Investigate the etiology and natural history of atherosclerosis

• Investigate the etiology of clinical atherosclerotic diseases (especially incident diseases)

• Measure variation in cardiovascular risk factors, medical care and disease by race, sex, place and time


ARIC

Investigate the Etiology & Natural History of Atherosclerosis

Ultrasound used to identify signs of early arterial disease

• Arterial wall dimensions; arterial distensibility

Expect atherosclerosis to be associated with the following lipid parameters

• Elevated levels of total cholesterol, LDL-C, apoB, Lp(a), TGs

• Reduced levels of HDL-C, apoA-I

• Predominance of small LDL

• DNA variations in specific genes (apolipoprotein E)


ARIC

Investigate the Etiology & Natural History of Atherosclerosis (cont’d)

Evaluate associations of atherosclerosis with factors that are less directly related to lipid and thrombosis theories

• Established risk factors (hypertension, smoking)

• Fasting insulin and glucose levels

• Routine hematologic measures (WBC, RBC and platelet counts, hematocrit)

• Lifestyle factors (diet, physical activity)


ARIC

Investigate the Etiology of Clinical Atherosclerotic Diseases

Study both risk factors and indicators of pre-clinical disease in relation to subsequent incident CHD and stroke

Risk factors measured in ARIC permit testing of new hypotheses

Indications of preclinical disease include not only ultrasound measurements but also

• Ankle-arm index of peripheral vascular disease

• Subtle changes in digitized electrocardiogram


ARIC

Processed CCA and Plaque Images

[Image panels: fibrous cap segmentation; plaque segmentation; CCA segmentation]


ARIC

Measurement/ascertainment of incident disease

Study both risk factors and indicators of pre-clinical disease in relation to subsequent incident CHD and stroke

Ascertainment of incident disease

• Limited to CHD, CVD, Stroke (hospitalized)

• Goal is 100% ascertainment

• Annual telephone contact

• Hospital record abstraction

• Death certificates, death indices

• Adjudication


ARIC

Effects of Study Design

ARIC’s ability to meet its objectives is enhanced by several design features

Consistency is evaluated by studying associations in four geographic locations among men, women, blacks and whites

Generalizability is examined by nesting cohorts into communities covered by broad surveillance

• Permits interpretation of study results in terms of representativeness of cohort participants & their CHD events in their communities & the characteristics of those communities


ARIC

Effects of Study Design (cont’d)

Surveillance rates are monitored and validated by each community cohort in two ways

• Replication of event identification, investigation and diagnosis activity

• Greater effort for accuracy that is afforded each potential cohort event

Cohorts also provide information on risk factors, preclinical disease and medical care which are used to interpret the rates of clinical disease found in surveillance


ARIC

Effects of Study Design (cont’d)

ARIC cohort study is prospective

• Design of choice for identifying precursors of disease

• Important for studying any potential risk factor that may be influenced by disease or by changes in medications, diet or habits resulting from disease

ARIC observes directly the early signs of atherosclerosis, assessing the association of factors with atherosclerosis in particular

• Attempts to unravel some complexity by investigating risk factor associations with both atherosclerosis and its clinical sequelae


ARIC

Effects of Study Design (cont’d)

Statistical power in ARIC permits

• Subgroup analyses: “Is fibrinogen associated with atherosclerosis in subjects who do or do not smoke?”

• Comparisons of the strength of correlated variables: “Which has the stronger association with atherosclerosis: central or peripheral obesity?”

• Comparison of risk factor effects: “Are there CHD risk factors that are not associated with atherosclerosis?”

ARIC benefits from progress in modern biochemistry

• Storage of multiple aliquots of blood allows for continual utilization of new biochemical technology


ARIC

A Sampling of ARIC Cohort Publications

Risk factors / predictors of prevalent and incident:

• Coronary heart disease

• Stroke

• Diabetes

• Obesity

• Hypertension

• Venous thromboembolism

• Renal dysfunction


ARIC

A Sampling of ARIC Cohort Publications

Risk factors / predictors of subclinical vascular diseases:

• Carotid atherosclerosis

• Cerebral infarcts, white matter disease

• Peripheral arterial disease

• Microvascular retinal disease

• Arterial stiffness

• Cardiac autonomic tone


ARIC

ARIC Ancillary Studies

To enhance the value of ARIC and to promote the advancement of science, the study welcomes proposals from individual investigators to carry out ancillary studies

An ancillary study is one based on information from ARIC participants in an investigation that is not described in the ARIC protocol

• Involves data collection or data analyses under additional funding that are not included as part of the routine ARIC data set or data analyses


ARIC

Active ARIC Ancillary Studies

Intimately tied to ARIC, with new data collection and external funding

• Periodontal disease, subclinical atherosclerosis and CVD

• Chronic inflammation of endodontic origin

• Longitudinal investigation of venous thromboembolism

• Life course SES and CVD

• Using historical records to reconstruct SES exposures in decedents

• Physical activity in context of the environment

• Cardiovascular responses to particulate air pollution


ARIC

Active ARIC Ancillary Studies

Lab-based ancillary studies

• Gene-environment interactions and CVD

• Genetic determinants of diabetes

• Novel biomarkers of atherosclerosis

Studies conducted independently

• Jackson Heart Study

• Family Heart Study

• Sleep Heart Health Study

Meta-analyses (data contributed)


ARIC

Cohort Baseline Characteristics

Percentage with risk factor

                   Black Women   Black Men   White Women   White Men
Hypertension           57%          55%          26%          29%
Diabetes               21%          19%           8%          10%
Current Smoker         25%          38%          25%          25%


ARIC

Cohort Baseline Characteristics

Mean risk factor level

                   Black Women   Black Men   White Women   White Men
HDL (mg/dl)            57.8         50.4         57.4         42.6
LDL (mg/dl)           138.0        137.3        135.6        140.0
BMI (kg/m²)            30.8         27.6         26.6         27.4


ARIC

Summary of Incident Events, 1987-2002

                   Black Women   Black Men    White Women   White Men
Stroke             153 (6%)      111 (8%)     135 (2%)      178 (4%)
CHD                184 (8%)      190 (13%)    389 (7%)      857 (18%)

*prevalent cases excluded


ARIC

ARIC Baseline Characteristics: Gender/Racial Differences in Drinking, Smoking and BMI

[Bar chart: percentage (axis 0-80%) of current drinkers, current smokers and participants with BMI≥30 among White Male, White Female, Black Male and Black Female participants]


ARIC

Gender/Racial Differences in HDL Levels, by Drinking Status

[Bar chart: HDL (axis 0-70) for White Male, White Female, Black Male and Black Female participants, by drinking status (never, low-moderate, heavy drinker); * P<0.001]

Low-Mod Drinker = ≤2 drinks/day

Heavy Drinker = >2 drinks/day


ARIC

Gender/Racial Differences in TG Levels, by Drinking Status

[Bar chart: triglycerides (axis 70-150) for White Male, White Female, Black Male and Black Female participants, by drinking status (never, low-moderate, heavy drinker); * P<0.05]

Low-Mod Drinker = ≤2 drinks/day

Heavy Drinker = >2 drinks/day


ARIC

CHD Risk is Influenced by Interaction between Drinking Status & Genotype

[Figure, three bar-chart panels]

• CHD Risk in Whites for Alcohol-Related SNP (no alcohol): genotype TT vs. CT+CC

• CHD Risk in Whites (no genotype): never, low-mod and heavy drinkers; P=0.01

• CHD Risk in Whites by Alcohol Intake and Genotype: TT vs. CT+CC for never, low-mod and heavy drinkers; P<0.001


ARIC

Stroke Risk is Influenced by Interaction between Drinking Status & Genotype

[Figure, three bar-chart panels]

• Stroke Risk in Whites for Alcohol-Related SNP (no alcohol): genotype AA vs. AG+GG

• Stroke Risk in Whites (no genotype): never, low-mod and heavy drinkers

• Stroke Risk in Whites by Alcohol Intake & Genotype: AA vs. AG+GG for never, low-mod and heavy drinkers; P=0.008


ARIC

ARIC: Sustainable Philosophy

Role of epidemiologic research in the investigation of etiologic hypotheses is one of active interchange with other disciplines

Basic discoveries often come first in epidemiology

• Importance of specific lipoprotein fractions was found first in population studies, leading to specific investigations of cholesterol transport

Multidisciplinary team of ARIC investigators hopes to promote such scientific interchange


ARIC

ARIC: Future Goals

Another examination of the entire cohort

• Healthy aging

• Cognitive decline

Imaging

Some day we will all know our DNA sequence?

• First population-based cohort with the complete DNA sequence?

• Analysis


ARIC

Genome-wide Association

300,000 – 1,000,000 markers

[Diagram: cases and controls genotyped at each marker: SNP1, SNP2, SNP3, ..., SNPn]
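A case-control comparison at a single marker can be sketched as a test on a 2x2 table of allele counts. This is an illustrative sketch only: the slide does not say which test statistic was used, the function names are hypothetical, and the allele counts are made up. The Bonferroni threshold simply divides 0.05 by the number of markers tested, which shows why scans of this size need very small P values.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Returns (chi2, p_value).
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def bonferroni_threshold(n_markers, alpha=0.05):
    """Per-marker significance threshold after Bonferroni correction."""
    return alpha / n_markers


# Hypothetical allele counts at one SNP.
chi2, p = allelic_chi_square(case_alt=60, case_ref=40,
                             control_alt=40, control_ref=60)
print(chi2, p, p < bonferroni_threshold(1_000_000))
```

In a real scan the same test is repeated for every marker, so a result that looks convincing at one SNP in isolation can still fall well short of the genome-wide threshold.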


ARIC

Genome-wide Association Scan for CHD

Genome-wide Scan: Ottawa Heart Institute #1

• Cases (n=323): CABG, MI < 60 yrs, no FH, no DM

• Controls (n=312): asymptomatic, > 65 yr

• 2,586 SNPs

Replicate 1: Ottawa Heart Institute #2 (304 cases/326 controls)

• 50 SNPs

Replicate 2: Atherosclerosis Risk in Communities (ARIC) (n=15,782)

• 2 SNPs


ARIC

SNP 107 and CHD risk

[Two bar charts by genotype (AA, AG, GG): relative risk (axis 0-1.5) and absolute risk (axis 0-16)]


ARIC

Predictive Ability of 9p21

                        AUC      ΔAUC     P value
CHD Risk Score only     0.776
Add 9p21                0.780    0.004    Significant
Add CRP                 0.778    0.002    Not significant

Individual risk factors do not cause large changes in the area under the CHD Risk Score ROC curve.

ROC curves plot sensitivity against one minus specificity; the area under the curve (AUC) is used by regulatory agencies to evaluate new diagnostics.
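The AUC values compared here have a direct interpretation: the AUC equals the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case, with ties counted as one half. The sketch below computes it that way. It is a minimal illustration with hypothetical scores and labels, not the method used for the ARIC risk score.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and non-cases.

    scores: predicted risk for each participant.
    labels: 1 for participants with an event, 0 otherwise.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0   # case ranked above non-case
            elif c == n:
                wins += 0.5   # tie counts as half
    return wins / (len(cases) * len(non_cases))


# Hypothetical risk scores for four participants, two with events.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

Because the AUC depends only on how cases rank against non-cases, a marker that shifts a few participants' predicted risk can change it very little even when the marker is truly associated with disease.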


ARIC

ATP III Guidelines

Risk category   Definition
High            CHD and CHD risk equivalents; 10-year risk >20%; LDL-C goal <100 mg/dL
Mid-high        Multiple (2+) risk factors; 10-year risk 10–20%; LDL-C goal <130 mg/dL
Mid             Multiple (2+) risk factors; 10-year risk <10%; LDL-C goal <130 mg/dL
Low             0–1 risk factor; 10-year risk <10%; LDL-C goal <160 mg/dL

ATP III classification using ACRS alone (rows) and using ACRS + 9p21 allele (columns)

ACRS alone   Row total               High                 Mid-high             Mid                  Low
High         1,870 (372), 18.69%     1,760 (360)          109 (12), 3.95%*     0                    0
Mid-high     2,049 (219), 20.48%     217 (27), 10.59%*    1,701 (179)          131 (13), 6.39%*     0
Mid          1,737 (80), 17.36%      0                    179 (17), 10.31%*    1,558 (63)           0
Low          4,349 (107), 43.47%     0                    0                    0                    4,349 (107)
Total        10,004 (778), 100%      1,977 (19.76%)       1,989 (19.88%)       1,689 (16.88%)       4,349 (43.47%)

* Percentage of people re-classified. (Number of events on 10 years of follow-up.)
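The starred percentages are the share of each ACRS-alone category that moves to a different category once the 9p21 allele is added. The sketch below recomputes them from the cell counts in the table. The variable and function names are hypothetical, and because each percentage is derived from the cell counts and their row sums, any row whose printed percentage does not follow from its printed counts will show a different value here.

```python
# Rows: category under ACRS alone. Columns: category under ACRS + 9p21.
CATEGORIES = ["High", "Mid-high", "Mid", "Low"]
COUNTS = {
    "High":     [1760, 109, 0, 0],
    "Mid-high": [217, 1701, 131, 0],
    "Mid":      [0, 179, 1558, 0],
    "Low":      [0, 0, 0, 4349],
}


def percent_moved(row, column):
    """Percentage of the `row` category reclassified into `column`."""
    row_total = sum(COUNTS[row])
    return 100 * COUNTS[row][CATEGORIES.index(column)] / row_total


def percent_reclassified_overall():
    """Percentage of all participants whose category changed."""
    total = sum(sum(cells) for cells in COUNTS.values())
    unchanged = sum(COUNTS[cat][i] for i, cat in enumerate(CATEGORIES))
    return 100 * (total - unchanged) / total


print(round(percent_moved("Mid-high", "High"), 2))
print(round(percent_moved("Mid-high", "Mid"), 2))
print(round(percent_reclassified_overall(), 2))
```

Reclassification tables complement the AUC: they show how many people would cross a treatment threshold even when the overall ranking of risk barely changes.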


ARIC

The Future is Here!


ARIC

The field of human genetics: the amount of data is growing

[Chart: number of variants (axis from 10s to 10x10^6) by year (1980s, 1990s, 2000, 2007, 2010), moving from candidate genes and linkage to GWAS and then to exome and whole-genome sequencing]


ARIC

Genome-wide association and whole-genome sequencing

• Potential to survey all genetic variation in the genome (or at least ~2.5 M variants!)

• Individual researchers can access this data


ARIC

NIH Genome-Wide Association Studies Policy

As a part of funding and generating GWAS data, public repositories have been developed

[Flow diagram: Research Participants (informed consent) → Submitting Investigators (data collection) → data submission of de-identified, coded data → GWAS Data Repository (submission & management of data) → data access request → Recipient Investigators (distribution & secondary use of data)]


ARIC

NIH Genome-Wide Association Studies Policy

dbGaP is one of the central repositories


ARIC

Open Access (summary level)

Search for studies, review protocols and questionnaires

View summary phenotype and genotype data

View pre-computed or published genetic associations (after embargo)

Identify studies of interest, view their consent conditions, and review terms for data access

Locate potential collaborators for follow-up studies

No individual data!

NIH Genome-Wide Association Studies Policy


ARIC

Controlled Access (individual level)

• Request data for specific research use

• Agreement by PI and institution to terms of access in the Data Use Certification

[Diagram labels: dbGaP database (genotype & phenotype data); public access (study protocol, descriptive information); coded genotypes, phenotypes, pre-computes; controlled access (specific research use); Data Access Committee (specific access rights)]

NIH Genome-Wide Association Studies Policy


ARIC http://www.ncbi.nlm.nih.gov/gap

Data Release


ARIC Framingham Heart Study

In 1948, the Framingham Heart Study embarked on an ambitious project in health research. At the time, little was known about the general causes of heart disease and stroke, but the death rates for CVD had been increasing steadily since the beginning of the century and had become an American epidemic. Since 1971, the Framingham Heart Study has been conducted in collaboration with Boston University.

Objective - to identify the common factors or characteristics that contribute to CVD by following its development over a long period of time in a large group of participants who had not yet developed overt symptoms of CVD or suffered a heart attack or stroke.

The study recruited 5,209 men and women between the ages of 30 and 62 from the town of Framingham, Massachusetts.


ARIC Framingham SHARe


ARIC Women’s Health Initiative (WHI)

WHI is a long-term national health study (1993-2005)

Objective: strategies for preventing heart disease, breast and colorectal cancer and osteoporotic fractures in postmenopausal women.

161,000 women ages 50-79

Two major parts: a randomized Clinical Trial and an Observational Study

The Clinical Trial (CT) enrolled 68,132 postmenopausal women aged 50-79 into trials testing three prevention strategies. If eligible, women could choose to enroll in one, two, or all three of the trial components. The components are:

• Hormone Replacement Trials

• Dietary Modification Trial

• Calcium / Vitamin D Trial

The Observational Study (OS) examines the relationship between lifestyle, health and risk factors and specific disease outcomes. This component involves tracking the medical history and health habits of 93,676 women. Recruitment for the observational study was completed in 1998 and participants were followed for 8 to 12 years.


ARIC WHI SHARe


ARIC ARIC CARe


ARIC

Large datasets are not limited to genetic datasets

http://www.ehdp.com/vitalnet/datasets.htm