Graphical models for combining multiple sources of information in
observational studies
Nicky Best, Sylvia Richardson, Chris Jackson, Virgilio Gomez, Sara Geneletti
ESRC National Centre for Research Methods – BIAS node
Outline
• Overview of graphical modelling
• Case study 1: Water disinfection byproducts and adverse birth outcomes
  – Modelling multiple sources of bias in observational studies
• Bayesian computation and software
• Case study 2: Socioeconomic factors and heart disease (Chris Jackson)
  – Combining individual and aggregate level data
  – Application to Census, Health Survey for England, HES
Graphical modelling
[Diagram: graphical modelling links four components: Mathematics, Modelling, Inference and Algorithms]
1. Mathematics
• Key idea: conditional independence
• X and W are conditionally independent given Z if, knowing Z, discovering W tells you nothing more about X:
  P(X | W, Z) = P(X | Z)
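An equivalent way to read this definition, assuming P(W, Z) > 0 so that the conditionals are defined, is that the joint conditional distribution of X and W factorizes given Z. The first equality is the chain rule and the second uses the definition above:

```latex
P(X, W \mid Z) = P(X \mid W, Z)\, P(W \mid Z) = P(X \mid Z)\, P(W \mid Z)
```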
Example: Mendelian inheritance
• Y, Z = genotype of parents
• W, X = genotypes of 2 children
• If we know the genotypes of the parents, then the children’s genotypes are conditionally independent:
  P(X | W, Y, Z) = P(X | Y, Z)
[Graph: Y → W, Y → X, Z → W, Z → X]
Joint distributions and graphical models
Graphical models can be used to:
• represent the structure of a joint probability distribution by encoding conditional independencies
Factorization theorem:
Joint distribution P(V) = ∏_{v ∈ V} P(v | parents[v])
[Graph: Y → W, Y → X, Z → W, Z → X, with factors P(Y), P(Z), P(W|Y, Z), P(X|Y, Z)]
P(W,X,Y,Z) = P(W|Y,Z) P(X|Y,Z) P(Y) P(Z)
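The factorization can be checked numerically on the inheritance example. The sketch below is illustrative only: it assumes a single locus with alleles A and a, and hypothetical parental genotype frequencies (0.25, 0.5, 0.25), none of which come from the slides. It builds the joint distribution from the four factors, then confirms that X and W are conditionally independent given Y and Z but dependent marginally.

```python
from itertools import product

GENOTYPES = ("AA", "Aa", "aa")
# Probability that a parent with a given genotype transmits allele A
TRANSMIT_A = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}
# Hypothetical genotype frequencies for each parent (an assumption)
PARENT_PRIOR = {"AA": 0.25, "Aa": 0.5, "aa": 0.25}


def child_given_parents(child, y, z):
    """P(child genotype | parents' genotypes y, z) under Mendelian inheritance."""
    a, b = TRANSMIT_A[y], TRANSMIT_A[z]
    return {
        "AA": a * b,
        "Aa": a * (1 - b) + (1 - a) * b,
        "aa": (1 - a) * (1 - b),
    }[child]


def joint(w, x, y, z):
    """P(W,X,Y,Z) = P(W|Y,Z) P(X|Y,Z) P(Y) P(Z)."""
    return (child_given_parents(w, y, z) * child_given_parents(x, y, z)
            * PARENT_PRIOR[y] * PARENT_PRIOR[z])


def prob(**fixed):
    """Sum the joint over every configuration consistent with the fixed values."""
    total = 0.0
    for w, x, y, z in product(GENOTYPES, repeat=4):
        config = {"w": w, "x": x, "y": y, "z": z}
        if all(config[name] == value for name, value in fixed.items()):
            total += joint(w, x, y, z)
    return total


total = prob()  # the factors define a proper distribution, so this sums to 1

# Conditional independence: P(X | W, Y, Z) equals P(X | Y, Z)
lhs = prob(x="AA", w="Aa", y="Aa", z="Aa") / prob(w="Aa", y="Aa", z="Aa")
rhs = prob(x="AA", y="Aa", z="Aa") / prob(y="Aa", z="Aa")

# Without conditioning on the parents, the siblings are dependent
p_x = prob(x="AA")
p_x_given_w = prob(x="AA", w="AA") / prob(w="AA")
```

Here `lhs` and `rhs` agree, while `p_x_given_w` exceeds `p_x`: a sibling's genotype is informative only when the parents' genotypes are unknown.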
Where does the graph come from?
• Genetics
  – pedigree (family tree)
• Physical, biological, social systems
  – supposed causal effects (e.g. regression models)
• Conditional independence provides basis for splitting large system into smaller components
[Figure: graph over the nodes W, X, Y, Z, A, B, C and D]
[Figure: the same graph split into smaller overlapping components]
2. Modelling
Building complex models
Key idea
• understand complex system
• through global model
• built from small pieces
  – comprehensible
  – each with only a few variables
  – modular
Example: Case study 1
• Epidemiological study of low birth weight and mothers’ exposure to water disinfection byproducts
• Background
  – Chlorine added to tap water supply for disinfection
  – Reacts with natural organic matter in water to form unwanted byproducts (including trihalomethanes, THMs)
  – Some evidence of adverse health effects (cancer, birth defects) associated with exposure to high levels of THM
  – SAHSU are carrying out study in Great Britain using routine data, to investigate risk of low birth weight associated with exposure to different THM levels
Data sources
• National postcoded births register
• Routinely monitored THM concentrations in tap water samples for each water supply zone within 14 different water company regions
• Census data – area level socioeconomic factors
• Millennium Cohort Study (MCS) – individual level outcomes and confounder data on sample of mothers
• Literature relating to factors affecting personal exposure (uptake factors, water consumption, etc.)
Model for combining data sources
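[Figure: graph of the full model, with nodes b[c], b[T], yik, yim, cik, cim, THMik[mother], THMim[mother], THMzt[true], THMztj[raw], an area-level confounder parameter (index i) and a measurement error variance]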
Regression sub-model (MCS)
[Figure: full model graph]
Regression model for MCS data relating risk of low birth weight (yim) to mother’s THM exposure and other confounders (cim)
Logistic regression:
yim ~ Bernoulli(pim)
logit pim = b[c] cim + b[T] THMim[mother]
i indexes small area; m indexes mother
cim = potential confounders, e.g. deprivation, smoking, ethnicity
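To make the logistic regression concrete, the sketch below evaluates the low birth weight probability for two hypothetical MCS mothers. All numbers are invented for illustration. It also assumes that the confounder vector starts with a constant 1, so that the first element of b[c] acts as an intercept; the slides do not say how the baseline risk is parameterised.

```python
import math
import random


def inv_logit(eta):
    """Map a log odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def low_birth_weight_prob(b_c, c_im, b_t, thm_im):
    """p_im from logit p_im = b[c] c_im + b[T] THM_im[mother]."""
    eta = sum(b * c for b, c in zip(b_c, c_im)) + b_t * thm_im
    return inv_logit(eta)


# Hypothetical coefficients: intercept and smoking effect, then THM effect
b_c = [-3.0, 0.5]
b_t = 0.005

# Mother 1: non-smoker with low exposure; mother 2: smoker with high exposure
p_low = low_birth_weight_prob(b_c, [1, 0], b_t, 20.0)
p_high = low_birth_weight_prob(b_c, [1, 1], b_t, 60.0)

# y_im ~ Bernoulli(p_im): simulate one outcome for the second mother
rng = random.Random(1)
y_im = 1 if rng.random() < p_high else 0

print(f"p_low={p_low:.3f}, p_high={p_high:.3f}, simulated y_im={y_im}")
```

The same function would apply to the national data by swapping in cik and THMik[mother], since both sub-models share the coefficients b[c] and b[T].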
Regression sub-model (national data)
[Figure: full model graph]
Regression model for national data relating risk of low birth weight (yik) to mother’s THM exposure and other confounders (cik)
Logistic regression:
yik ~ Bernoulli(pik)
logit pik = b[c] cik + b[T] THMik[mother]
i indexes small area; k indexes mother
Missing confounders sub-model
[Figure: full model graph]
Missing data model to estimate confounders (cik) for mothers in national data, using information on within area distribution of confounders in MCS
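cim ~ Bernoulli(πi) (MCS mothers)
cik ~ Bernoulli(πi) (predictions for mothers in national data)
πi = probability that a mother in area i has the confounder (the Greek letter is missing from the source; π is assumed here)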
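One way to picture this sub-model is as a two-step draw: learn the area-level probability from the MCS mothers, then predict the unobserved confounder for national mothers in the same area. The sketch below is an assumption-laden illustration, not the authors' implementation: it names the area-level probability `pi_i`, gives it a Beta(1, 1) prior (the slides do not state a prior), and uses invented data.

```python
import random


def impute_confounders(mcs_values, n_national, rng, prior_a=1.0, prior_b=1.0):
    """Draw pi_i from its Beta posterior, then c_ik for n_national mothers.

    mcs_values: observed binary confounder values c_im for MCS mothers in area i.
    The Beta(prior_a, prior_b) prior is an assumption made for this sketch.
    """
    present = sum(mcs_values)
    absent = len(mcs_values) - present
    pi_i = rng.betavariate(prior_a + present, prior_b + absent)
    c_ik = [1 if rng.random() < pi_i else 0 for _ in range(n_national)]
    return pi_i, c_ik


rng = random.Random(42)
mcs_smoking = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # hypothetical c_im for one area

# Repeating the draw propagates uncertainty about pi_i into the predictions
draws = [impute_confounders(mcs_smoking, 50, rng) for _ in range(2000)]
mean_pi = sum(pi for pi, _ in draws) / len(draws)
print(f"posterior mean of pi_i is roughly {mean_pi:.2f}")
```

Because each predicted set of cik values comes from a different draw of `pi_i`, areas with few MCS mothers yield more variable predictions, which is the uncertainty a full probability model carries through to the regression.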
THM measurement error sub-model
[Figure: full model graph]
Model to estimate true tap water THM concentration from raw data
THMztj[raw] ~ Normal(THMzt[true], σ²)
z = water zone; t = season; j = sample
(Actual model used was a more complex mixture of Normal distributions)
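For the simple Normal version of this sub-model, the estimate of the true concentration has a closed form if two simplifying assumptions are made: the measurement variance is known, and the prior on THMzt[true] is flat. Neither assumption comes from the slides, and the sample values below are invented. Under those assumptions the posterior is Normal with mean equal to the sample mean and variance equal to the measurement variance divided by the number of samples.

```python
import math


def posterior_true_thm(raw_samples, sigma2):
    """Posterior mean and variance of THMzt[true] for one zone and season.

    Assumes a flat prior and a known measurement variance sigma2.
    """
    n = len(raw_samples)
    mean = sum(raw_samples) / n
    var = sigma2 / n
    return mean, var


raw = [48.0, 55.0, 51.0, 46.0]  # hypothetical raw THM samples for one zone
mean, var = posterior_true_thm(raw, sigma2=16.0)

# Approximate 95% interval for the true concentration
half_width = 1.96 * math.sqrt(var)
interval = (mean - half_width, mean + half_width)
print(f"estimate {mean:.1f}, 95% interval {interval[0]:.2f} to {interval[1]:.2f}")
```

Zones with fewer samples get wider intervals, so the uncertainty in the exposure estimate differs from zone to zone.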
THM personal exposure sub-model
[Figure: full model graph]
Model to predict personal exposure using estimated tap water THM level and literature on distribution of factors affecting individual uptake of THM
THM[mother] = ∑k THMzt[true] × quantity_k × uptake factor_k
where k here indexes different water use activities (not mothers), e.g. drinking, showering, bathing
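The exposure formula is a weighted sum over water use activities, which a few lines of code can illustrate. The activity quantities and uptake factors below are invented, unitless placeholders; in the study they would be drawn from distributions based on the literature rather than fixed.

```python
def personal_exposure(thm_true, activities):
    """Sum over activities of THM level x quantity x uptake factor."""
    return sum(
        thm_true * activity["quantity"] * activity["uptake"]
        for activity in activities.values()
    )


# Hypothetical values for one mother; not taken from the study
activities = {
    "drinking": {"quantity": 1.5, "uptake": 1.0},
    "showering": {"quantity": 10.0, "uptake": 0.02},
    "bathing": {"quantity": 5.0, "uptake": 0.04},
}

exposure = personal_exposure(50.0, activities)
print(f"personal exposure: {exposure:.1f}")
```

With these placeholder numbers, drinking contributes 75 of the total and showering and bathing 10 each.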
3. Inference
Bayesian
… or non Bayesian
Bayesian Full Probability Modelling
• Graphical approach to building complex models lends itself naturally to Bayesian inferential process
• Graph defines joint probability distribution on all the ‘nodes’ in the model
  Recall: Joint distribution P(V) = ∏_{v ∈ V} P(v | parents[v])
• Condition on parts of graph that are observed (data)
• Calculate posterior probabilities of remaining nodes using Bayes theorem
• Automatically propagates all sources of uncertainty
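Conditioning on observed nodes and computing posteriors for the rest can be shown on the small inheritance graph from earlier in the talk. The sketch below assumes one locus with alleles A and a and hypothetical parental genotype frequencies (assumptions, not from the slides). It observes one child's genotype W and applies Bayes theorem to obtain the posterior for parent Y, summing out the other parent Z.

```python
GENOTYPES = ("AA", "Aa", "aa")
# Probability that a parent with a given genotype transmits allele A
TRANSMIT_A = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}
# Hypothetical prior genotype frequencies for each parent (an assumption)
PRIOR = {"AA": 0.25, "Aa": 0.5, "aa": 0.25}


def child_given_parents(child, y, z):
    """P(child genotype | parents' genotypes y, z) under Mendelian inheritance."""
    a, b = TRANSMIT_A[y], TRANSMIT_A[z]
    return {
        "AA": a * b,
        "Aa": a * (1 - b) + (1 - a) * b,
        "aa": (1 - a) * (1 - b),
    }[child]


def posterior_parent_y(observed_w):
    """P(Y | W = observed_w): prior times likelihood, with Z summed out."""
    unnormalised = {
        y: sum(
            PRIOR[y] * PRIOR[z] * child_given_parents(observed_w, y, z)
            for z in GENOTYPES
        )
        for y in GENOTYPES
    }
    evidence = sum(unnormalised.values())
    return {y: p / evidence for y, p in unnormalised.items()}


post = posterior_parent_y("aa")
print(post)
```

Observing a child with genotype aa rules out an AA parent, so the posterior mass moves entirely onto Aa and aa.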
[Figure: full model graph with each node marked as Data (observed) or Unknowns (to be estimated)]
4. Algorithms
• MCMC algorithms are able to exploit graphical structure for efficient inference
• Bayesian graphical models implemented in WinBUGS
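The slides use WinBUGS; the sketch below is not WinBUGS code but a hand-written Gibbs sampler in Python, included only to illustrate how MCMC can use graph structure. Each unknown is updated from its full conditional, which depends only on its neighbours in the graph. The model is a simplified version of the measurement error sub-model with an unknown mean `mu` and variance `sigma2`. The flat prior on `mu`, the Inverse-Gamma(2, 2) prior on `sigma2` and the data are all assumptions for this example.

```python
import random


def gibbs_normal(samples, n_iter, rng, a0=2.0, b0=2.0):
    """Gibbs sampler for x_j ~ Normal(mu, sigma2).

    Priors (assumed for this sketch): flat on mu, Inverse-Gamma(a0, b0) on sigma2.
    """
    n = len(samples)
    xbar = sum(samples) / n
    mu, sigma2 = xbar, 1.0
    draws = []
    for _ in range(n_iter):
        # Full conditional of mu given sigma2 and the data
        mu = rng.gauss(xbar, (sigma2 / n) ** 0.5)
        # Full conditional of sigma2 given mu and the data (Inverse-Gamma)
        rate = b0 + 0.5 * sum((x - mu) ** 2 for x in samples)
        sigma2 = 1.0 / rng.gammavariate(a0 + 0.5 * n, 1.0 / rate)
        draws.append((mu, sigma2))
    return draws


rng = random.Random(0)
raw = [48.0, 55.0, 51.0, 46.0, 50.0, 53.0]  # hypothetical raw THM samples
draws = gibbs_normal(raw, 5000, rng)

kept = draws[1000:]  # discard the first 1000 iterations as burn-in
post_mean_mu = sum(m for m, _ in kept) / len(kept)
print(f"posterior mean of mu is roughly {post_mean_mu:.1f}")
```

In a larger graph the same pattern repeats node by node, which is why adding a sub-model changes only the full conditionals of the nodes it touches.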