model-data interface
Model-Data Interface: Parameter estimation and statistical inference
Parameter estimation
• We've seen that the basic reproductive ratio, R0, is a very important quantity
• How do we calculate it?
• In general, we might not know (many) model parameters. How do we achieve parameter estimation from epidemiological data?
• Review some simple methods
1a. Final outbreak size
• From lecture 1, we recall that at end of epidemic:
§ S(∞) = 1 – R(∞) = S(0) e^(–R(∞) R0)
• So, if we know population size (N), initial susceptibles (to get S(0)), and total number infected (to get R(∞)), we can calculate R0
Note: Ma & Earn (2006) showed this formula is valid even when numerous assumptions underlying simple SIR are relaxed
Taking S(0) ≈ 1: $R_0 = -\dfrac{\log(1 - R(\infty))}{R(\infty)}$
1. Final outbreak size
• Worked example: influenza epidemic in a British boarding school in 1978
§ N = 764, X(0) = 763, Z(∞) ~ 512
§ R0 ~ 1.65
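The worked example can be checked numerically. The sketch below assumes S(0) ≈ 1 and treats Z(∞)/N as the final proportion infected, R(∞); the function name is illustrative and not from the lecture. It gives a value of about 1.65, in line with the slide.

```python
import math

def r0_from_final_size(total_infected, population_size):
    """Final-size estimate R0 = -log(1 - R(inf)) / R(inf), taking S(0) ~ 1."""
    attack_rate = total_infected / population_size  # R(inf) as a proportion
    if not 0.0 < attack_rate < 1.0:
        raise ValueError("total infected must be strictly between 0 and N")
    return -math.log(1.0 - attack_rate) / attack_rate

# Boarding-school values from the slide: N = 764, Z(inf) ~ 512
r0 = r0_from_final_size(total_infected=512, population_size=764)
print(f"R0 = {r0:.3f}")
```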
1b. Final outbreak size
• Becker showed that, with more information, we can also estimate R0 from [equation missing from transcript], which gives R0 ~ 1.66
• Again, we need to know population size (N), initial susceptibles (X0), total number infected (C)
• Usefully, standard error for this formula has also been derived
Small aside: mean age at infection
• An epidemiologically interesting quantity is mean age at infection. How do we calculate it in simple models?
• From first principles, it's mean time spent in susceptible class
• At equilibrium, this is given by 1/(βI*), which leads to
$A = \dfrac{1}{\mu (R_0 - 1)}$
§ This can be written as R0 – 1 ≈ L/A (L = life expectancy)
§ Historically, this equation's been an important link between epidemiological estimates of A and deriving estimates of R0
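The step from 1/(βI*) to the expression for A can be filled in as follows. This is a sketch that assumes the standard SIR model with births and deaths at equal per-capita rate µ, written in proportions; those model equations are not shown on this slide and are an assumption here.

```latex
\begin{align*}
\frac{dS}{dt} &= \mu - \beta S I - \mu S, \qquad
\frac{dI}{dt} = \beta S I - (\gamma + \mu) I, \qquad
R_0 = \frac{\beta}{\gamma + \mu} \\
S^* &= \frac{1}{R_0}, \qquad
I^* = \frac{\mu}{\beta}\,(R_0 - 1) \\
A &\approx \frac{1}{\beta I^*} = \frac{1}{\mu (R_0 - 1)}
\quad\Longrightarrow\quad
R_0 - 1 \approx \frac{1/\mu}{A} = \frac{L}{A}
\end{align*}
```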
2. Independent data
• For S(E)IR model, we can calculate average length of time it takes for an individual to acquire infection (assuming born susceptible)
• Expression for mean age at infection is A ≈ 1/(µ(R0 – 1))
• For large R0, R0 is roughly mean life expectancy (L) divided by mean age at infection (A)
[Figure: Measles age-stratified seroprevalence, with maternally-derived antibodies and infection-derived immunity labelled]
• Mean age at infection (A) is ~4.5 years
• Assume L ~ 75, so R0 ~ 16.6
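The measles arithmetic is short enough to check directly. The sketch below returns both the ratio L/A used on the slide and the slightly larger value from R0 – 1 ≈ L/A; the function name is illustrative. L/A evaluates to about 16.7, which the slide reports as ~16.6.

```python
def r0_from_mean_age(life_expectancy, mean_age_at_infection):
    """Return (L/A, 1 + L/A): the slide's ratio and the R0 - 1 = L/A form."""
    ratio = life_expectancy / mean_age_at_infection
    return ratio, 1.0 + ratio

# Measles values from the slide: L ~ 75 years, A ~ 4.5 years
approx_r0, r0_with_offset = r0_from_mean_age(
    life_expectancy=75.0, mean_age_at_infection=4.5
)
print(f"L/A = {approx_r0:.1f}, 1 + L/A = {r0_with_offset:.1f}")
```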
Historical significance
Anderson & May (1982; Science)
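3. Epidemic take-off
A slightly more common approach is to study the epidemic take-off
• Recall from linear stability analysis that $I(t) \approx I(0)\, e^{(R_0 - 1)\gamma t}$
• Take logarithms: $\log I(t) \approx \log I(0) + (R_0 - 1)\gamma t$
• So, regression slope of log I(t) against t, Λ = (R0 – 1)γ, will give R0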
3. Epidemic take-off
• Back to schoolboys
[Figure: boarding-school outbreak data; looks like classic exponential take-off]
[Figure: epidemic take-off, regression slope Λ = 1.0859]
• So, R0 = 1.0859 × 2.5 + 1 = 3.7 (2.5 days is our value for 'flu incubation period)
Vynnycky et al. (2007)
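The regression step can be sketched in a few lines. The boarding-school counts are not reproduced in this transcript, so the example below uses synthetic, noise-free take-off data generated with the slide's slope (an assumption made purely for illustration), then recovers the slope by least squares on the log counts and converts it with R0 = Λ × (1/γ) + 1. Function names are illustrative.

```python
import math

def fit_growth_rate(times, counts):
    """Ordinary least-squares slope of log(counts) against times."""
    logs = [math.log(c) for c in counts]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx

def r0_from_growth_rate(growth_rate, period_days):
    """R0 = Lambda * (1/gamma) + 1, where period_days stands for 1/gamma."""
    return growth_rate * period_days + 1.0

# Synthetic take-off data (NOT the boarding-school counts), slope 1.0859 per day
true_rate = 1.0859
days = [1, 2, 3, 4, 5]
infecteds = [3.0 * math.exp(true_rate * (d - 1)) for d in days]

slope = fit_growth_rate(days, infecteds)
r0 = r0_from_growth_rate(slope, period_days=2.5)  # 2.5 days as on the slide
print(f"slope = {slope:.4f}, R0 = {r0:.2f}")
```

With real data, only the early, roughly log-linear part of the outbreak should go into the regression.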
Variants on this theme
• Recall $I(t) \approx I(0)\, e^{(R_0 - 1)\gamma t}$
• Let Td be 'doubling time' of outbreak
• Then, ★ R0 = log(2)/(Td γ) + 1
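As a quick check that the doubling-time form agrees with the growth-rate form, the sketch below converts a growth rate to a doubling time and back to R0. The numerical values (growth rate 1.0859 per day, 1/γ = 2.5 days) are reused from the take-off example for illustration only; the function name is an assumption.

```python
import math

def r0_from_doubling_time(doubling_time, gamma):
    """R0 = log(2) / (Td * gamma) + 1."""
    return math.log(2.0) / (doubling_time * gamma) + 1.0

growth_rate = 1.0859                          # per day, illustrative
gamma = 1.0 / 2.5                             # per day, illustrative
doubling_time = math.log(2.0) / growth_rate   # Td implied by that growth rate
r0 = r0_from_doubling_time(doubling_time, gamma)
print(f"Td = {doubling_time:.2f} days, R0 = {r0:.2f}")
```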
4. Likelihood & inference
• We focus on random process that (putatively) generated data
• A model is explicit, mathematical description of this random process
• "The likelihood" is probability that data were produced given model and its parameters:
L(model | data) = Pr(data | model)
• Likelihood quantifies (in some sense optimally) model goodness of fit
4. Likelihood & estimation
• Assume we have data, D, and model output, M (both are vectors containing state variables). Model predictions generated using set of parameters, θ
• Transmission dynamics subject to
– "process noise": heterogeneity among individuals, random differences in timing of discrete events (environmental and demographic stochasticity)
– "observation noise": random errors made in measurement process itself
4. Likelihood & estimation
• If we ignore process noise, then model is deterministic and all variability attributed to measurement error
• Observation errors assumed to be sequentially independent
• Maximizing likelihood in this context is called 'trajectory matching'
4. Likelihood & estimation
• Data, D
• Model output, M
• Parameters, θ
• If we assume measurement errors are normally distributed, with mean µ and variance σ², then the likelihood is a product of normal densities, one per data point
• Often easier to deal with log-likelihoods, which turn this product into a sum
4. Likelihood & estimation
• Under such conditions, Maximum Likelihood Estimate, MLE, is simply parameter set with smallest deviation from data
• Equivalent to using least square errors to decide on goodness of fit
– Least Squares Statistic = SSE = Σ (Di – Mi)²
• Then, minimise SSE to arrive at MLE
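The equivalence between maximum likelihood and least squares can be written out explicitly. The following is a sketch that assumes independent, zero-mean normal errors with a common variance σ²; the index i over the n data points is notation introduced here.

```latex
\begin{align*}
\mathcal{L}(\theta) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left(-\frac{(D_i - M_i(\theta))^2}{2\sigma^2}\right) \\
\log \mathcal{L}(\theta) &= -\frac{n}{2}\log(2\pi\sigma^2)
  - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (D_i - M_i(\theta))^2
  = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{\mathrm{SSE}(\theta)}{2\sigma^2}
\end{align*}
```

For fixed σ², the first term does not depend on θ, so maximising the log-likelihood is the same as minimising SSE.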
Trajectory matching
[Figure: infecteds against time (days), data with model trajectory; β=4.58, γ=0.7719, SSE=384519 (panel title: β=0.006; γ=0.7719; SSE=384519.15)]
Trajectory matching
[Figure: infecteds against time (days), data with model trajectory; β=3.05, γ=0.47, SSE=195130 (panel title: β=0.004; γ=0.4719; SSE=195130.1341)]
Model estimation: Influenza outbreak
[Figure: cases by day of outbreak (days 1 to 14)]
• Systematically vary β and γ, calculate SSE
• Parameter combination with lowest SSE is 'best fit'
[Figure: SSE and log(SSE) surfaces over transmission rate (β) and recovery rate (γ), with SSE profiles along each parameter]
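The grid search described above can be sketched as follows. The outbreak counts are not reproduced in this transcript, so the example fits synthetic "data" generated from the model itself with known parameters; that choice, the grid ranges, the RK4 integrator and all function names are assumptions made for illustration. The model uses the βXY/N transmission term that appears elsewhere in the lecture.

```python
def simulate_sir(beta, gamma, n_pop, y0, days, steps_per_day=20):
    """Integrate dX/dt = -beta*X*Y/N, dY/dt = beta*X*Y/N - gamma*Y with RK4.

    Returns the number of infecteds Y at the end of each day.
    """
    def deriv(x, y):
        new_inf = beta * x * y / n_pop
        return -new_inf, new_inf - gamma * y

    x, y = float(n_pop - y0), float(y0)
    h = 1.0 / steps_per_day
    infecteds = []
    for _ in range(days):
        for _ in range(steps_per_day):
            k1x, k1y = deriv(x, y)
            k2x, k2y = deriv(x + 0.5 * h * k1x, y + 0.5 * h * k1y)
            k3x, k3y = deriv(x + 0.5 * h * k2x, y + 0.5 * h * k2y)
            k4x, k4y = deriv(x + h * k3x, y + h * k3y)
            x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
            y += h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
        infecteds.append(y)
    return infecteds

def sse(data, model):
    """Sum of squared errors between data and model output."""
    return sum((d - m) ** 2 for d, m in zip(data, model))

# Synthetic data from known parameters (NOT the real outbreak counts)
N_POP, Y0, DAYS = 764, 1, 14
data = simulate_sir(1.96, 0.47, N_POP, Y0, DAYS)

# Parameter grid; the true values lie on it by construction
betas = [1.0 + 0.08 * i for i in range(26)]    # 1.00 to 3.00 per day
gammas = [0.20 + 0.03 * j for j in range(21)]  # 0.20 to 0.80 per day

best_sse, best_beta, best_gamma = min(
    ((sse(data, simulate_sir(b, g, N_POP, Y0, DAYS)), b, g)
     for b in betas for g in gammas),
    key=lambda item: item[0],
)
print(f"best fit: beta = {best_beta:.2f}, gamma = {best_gamma:.2f}")
```

The cost grows as the product of the grid sizes, so each extra parameter multiplies the number of model runs.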
Model estimation: Influenza outbreak
Best fit parameter values:
1. β = 1.96 (per day)
2. 1/γ = 2.1 days
3. R0 ~ 4.15
[Figure: best-fit trajectory, β=1.96, γ=0.47]
• Generally, may have more parameters to fit, so grid search not efficient
• Nonlinear optimization algorithms (e.g. Nelder-Mead) would be used
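4. Likelihood & estimation
• How do we relate SSE to logLik?
$\log L = -\dfrac{n}{2}\log(2\pi\sigma^2) - \dfrac{1}{2\sigma^2}\sum_i (D_i - M_i)^2$
§ Σ (Di – Mi)² = SSE
§ σ² = SSE/n
§ n = length of data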
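The conversion from SSE to log-likelihood can be checked against numbers quoted elsewhere in the lecture. The sketch below assumes the Gaussian log-likelihood with σ² replaced by SSE/n, and uses SSE = 4951.6403 (from the likelihood estimation figure) with n = 14 data points; the function name is illustrative. The result is about -60.94, close to the -60.95 reported for the fitted SIR model.

```python
import math

def gaussian_loglik_from_sse(sse, n):
    """Gaussian log-likelihood with the variance set to sigma^2 = SSE / n."""
    sigma2 = sse / n
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - sse / (2.0 * sigma2)

loglik = gaussian_loglik_from_sse(sse=4951.6403, n=14)
print(f"logLik = {loglik:.2f}")
```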
Model estimation: Influenza outbreak
[Figure: log(SSE) surface over transmission rate (β) and recovery rate (γ); panels labelled SSE and LogLik]
Model estimation: Influenza outbreak
Maximum Likelihood Estimates:
1. β = 1.96 (per day)
2. 1/γ = 2.1 days
3. R0 ~ 4.15
• Recall 2 log-likelihood units indicate significant difference
• Can use likelihood profiles to put confidence intervals on estimates
§ β = 1.96 (1.90, 2.04)
§ γ = 0.47 (0.43, 0.50)
Model comparison
• How to compare models with different number of estimated parameters?
• Commonly use Akaike's Information Criterion
• AIC = 2p – 2 logLik, where p is number of estimated parameters for model
• Rule-of-thumb: if AIC difference < 2, models indistinguishable

          SIR                  Model 2
β         1.96 (1.90, 2.04)
γ         0.47 (0.43, 0.50)
logLik    -60.95
AIC       125.9
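The AIC entry in the table follows directly from the definition with p = 2 (β and γ). The sketch below reproduces it and encodes the rule of thumb; the second AIC value used in the comparison is hypothetical, since the transcript gives no numbers for Model 2, and the function names are illustrative.

```python
def aic(loglik, n_params):
    """Akaike's Information Criterion: AIC = 2p - 2 logLik."""
    return 2 * n_params - 2 * loglik

def indistinguishable(aic_a, aic_b, threshold=2.0):
    """Rule of thumb: models are indistinguishable if |AIC difference| < 2."""
    return abs(aic_a - aic_b) < threshold

sir_aic = aic(loglik=-60.95, n_params=2)
print(f"SIR AIC = {sir_aic:.1f}")

hypothetical_other_aic = 127.0  # made-up value, for illustration only
print(indistinguishable(sir_aic, hypothetical_other_aic))
```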
Likelihood estimation
[Figure: infecteds against time (days), data with maximum-likelihood trajectory; β=1.96, γ=0.47, Loglik=-60.95 (panel title: β=0.004; γ=0.4719; SSE=4951.6403)]
Likelihood surface
[Figure: SSE surface over log10(recovery rate (γ)) and log10(transmission rate (β))]
When likelihood surface is somewhat complex, success of estimation using local optimization algorithms (e.g. Nelder-Mead, a derivative-free simplex method) will depend on providing a good initial guess
Caveat
• In boarding school example, data represent number of boys sick ~ Y(t)
• Typically, data are 'incidence' (newly detected or reported infections)
• Don't correspond to any model variables
• May need to 'construct' new information:
– dC/dt = γY (diagnosis at end of infectiousness)
– dC/dt = βXY/N
• Set C(t+Δt) = 0, i.e. reset C to zero after each sampling interval Δt of data
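The construction of an incidence variable can be sketched as follows: C accumulates new infections via dC/dt = βXY/N and is reset to zero at the start of each sampling interval, so each recorded value is the incidence for that interval. The parameter values are reused from the fitted SIR model purely for illustration; the RK4 integrator and function name are assumptions.

```python
def simulate_incidence(beta, gamma, n_pop, y0, n_intervals, dt=1.0, steps=50):
    """Return (incidence per sampling interval, final number susceptible)."""
    def deriv(x, y):
        new_inf = beta * x * y / n_pop
        return -new_inf, new_inf - gamma * y, new_inf  # dX/dt, dY/dt, dC/dt

    x, y = float(n_pop - y0), float(y0)
    h = dt / steps
    incidence = []
    for _ in range(n_intervals):
        cases = 0.0  # reset the accumulator C for this sampling interval
        for _ in range(steps):
            k1 = deriv(x, y)
            k2 = deriv(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
            k3 = deriv(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
            k4 = deriv(x + h * k3[0], y + h * k3[1])
            x += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
            y += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
            cases += h * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2]) / 6.0
        incidence.append(cases)
    return incidence, x

# Illustrative values: beta = 1.96, gamma = 0.47, N = 764, daily sampling
incidence, x_final = simulate_incidence(1.96, 0.47, 764, 1, n_intervals=14)
print([round(c, 1) for c in incidence])
```

Summed over all intervals, the incidence equals the total drop in susceptibles, which is a convenient sanity check on the bookkeeping.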
Lecture Summary…
• R0 can be estimated from epidemiological data in a variety of ways
– Final epidemic size
– Mean age at infection
– Outbreak exponential growth rate
– Curve fitting
• In principle, variety of unknown parameters may be estimated from data
Further,...
1. Include uncertainty in initial conditions
• We took I(0) = 1. Instead could estimate I(0) together with β and γ (now have 1 fewer data points)
2. Explicit observation model
• Implicitly assumed measurement errors normally distributed with fixed variance, but can relax this assumption
3. What is appropriate model?
• SEIR model? (latent period before becoming infectious)
• SEICR model? ("confinement to bed")
• Time varying parameters? (e.g. action taken to control spread)
Further,...
4. Assumed model deterministic: how do we fit a stochastic model?
• Use a 'particle filter' to calculate likelihood
5. Can we simultaneously estimate numerous parameters?
• More complex models have more parameters… estimate all from 14 data points? ⇒ identifiability
6. More complex models are more flexible, so tend to fit better
• How do we determine if increased fit justifies increased complexity? ⇒ information criteria