Care | 150129 | Big data | An option for the future of preventive medical care? | Presentation |...
Background Big Data Final Remarks
The Future of Biostatistics in Medical Research
Jeanine Houwing-Duistermaat
Department of Medical Statistics and Bioinformatics, LUMC
Big Data: An Option for the Future of Preventive Medical Care? 29 January 2015, Baarn
Statistics
New York Times
In his State of the Union address last week, President Obama encouraged the development of precision medicine, which would tailor treatments based on individuals' genetics or physiology.
Need for statistics
Data in (Bio)medical research
Demographic variables
Classical parameters
-omics
Genetic data
Human microbiome
Imaging
Questionnaires
Simone Houtman
Biomedical Research
Insight into biological mechanisms underlying human diseases and health
Individual predictions
Screening
Treatment effects
Statistical Challenges: Big data
Combining various data sources: heterogeneity
Hierarchical data
Longitudinally measured profiles
Combination of the above-mentioned datasets → big data
Causality
Model Validation
Types of Datasets
Aggregated data: registration, summaries from studies/EPD (electronic patient records)
Reference datasets: Genome NL, 1000 Genomes
Patient cohorts
HIS (general practitioner information system): data from general practitioners
Epidemiological studies
Combining data - heterogeneity
Populations
Measurement techniques
Definitions
Patient characteristics
Study Designs
Hierarchy of data
Longitudinal Data
Causality
"Big" Data in Biomedical Research
Genetic Studies (DNA)
Genome Wide Association Studies: 10M SNPs. Typically well measured.
Whole genome sequencing. Data preprocessing important.
Metabolomics, Proteomics, Glycomics
Spectra (10-2000 peaks)
Noisy data
Lots of data, but most of it not informative
Sample sizes of Genome Wide Association Studies
Outcome      Number of subjects
Height       180K
Lipids       100K
CVD          80K
Longevity    8K
Big numbers are obtained by collaborations between research groups
Statistical analysis of Genome Wide Association Studies
Test for association between a single genetic marker and the outcome, per study
Single-study results are combined using meta-analysis tools
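The combination step can be sketched as follows. This is a minimal illustration only, assuming a fixed-effect, inverse-variance weighted meta-analysis; the slides do not say which meta-analysis method was used. The function name `fixed_effect_meta` and the per-study numbers are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Pool per-study effect estimates for one marker.

    Assumes a fixed-effect model with inverse-variance weights
    (an illustrative choice, not taken from the slides).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value under the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Hypothetical effect estimates and standard errors for one SNP in three studies
study_betas = [0.10, 0.14, 0.08]
study_ses = [0.05, 0.07, 0.04]

pooled_beta, pooled_se, z, p = fixed_effect_meta(study_betas, study_ses)
print(f"pooled beta={pooled_beta:.4f} se={pooled_se:.4f} z={z:.2f} p={p:.3g}")
```

The pooled standard error is smaller than that of any single study, which is why collaborations between research groups yield the large effective sample sizes listed earlier.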
Lessons learned from Genome Wide Association Studies
Quality control
Statistical model should acknowledge
Study design
Distribution of outcome variable
Adjust for or model the structure in the data
From Simon Heath et al, EJHG
Post analysis quality control
Most tests represent the null hypothesis
Distribution of observed test statistics should correspond to the theoretical distribution
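One common way to check this is the genomic inflation factor: the median of the observed 1-df chi-square statistics divided by the median of the theoretical chi-square distribution (about 0.455). A value near 1 suggests the observed distribution matches the null. The sketch below is an illustration on simulated statistics; the slides do not name this specific diagnostic, and `genomic_inflation` and the simulated data are assumptions.

```python
import random
import statistics

# Median of the chi-square distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Ratio of the observed median test statistic to the theoretical median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


# Simulated (not real) test statistics: squared standard normals follow
# a chi-square distribution with 1 degree of freedom under the null.
rng = random.Random(2015)
null_stats = [rng.gauss(0.0, 1.0) ** 2 for _ in range(100_000)]

# Hypothetical inflation, e.g. from unmodelled structure in the data
inflated_stats = [1.2 * x for x in null_stats]

lambda_null = genomic_inflation(null_stats)
lambda_inflated = genomic_inflation(inflated_stats)
print(f"lambda (null) = {lambda_null:.3f}")
print(f"lambda (inflated) = {lambda_inflated:.3f}")
```

A factor clearly above 1 points back to the earlier lessons: quality control problems or structure in the data that the statistical model did not account for.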
Omics data
Measurement process not automated and standardized
Degeneration of samples
Detection limit
Non normal data
Technique differences can be huge
Relationship between omics and age, sex
Lucija Klaric
Multidisciplinary
Medical Science: Need for statistics with realistic datasets
Data Science: Need for more biology and chemistry
Final remarks
New methodologies are needed for big data
The amount of data is growing fast, but not the informative part
New biostatistical methods are needed
acknowledge hierarchy in data
longitudinal data
selection of correct statistical models
causality