nguyen thi duy nhat, stacy todd, erwin de bruin, tran …10.1038...nguyen thi duy nhat, stacy todd,...

Appendix for Nhat, Todd, de Bruin et al (2017)

1 / 30

SupplementaryOnlineAppendixStructureofgeneral‐populationantibodytiterdistributionstoinfluenzaAvirusNguyenThiDuyNhat,StacyTodd,ErwindeBruin,TranThiNhuThao,NguyenHaThaoVy,TranMinhQuan,

DaoNguyenVinh,JankovanBeek,PhamHongAnh,HaMinhLam,NguyenThanhHung,NguyenThiLeThanh,

HuynhLeAnhHuy,VoThiHongHa,StephenBaker,GuyEThwaites,NguyenThiNamLien,TranThiKimHong,

JeremyFarrar,CameronPSimmons,NguyenVanVinhChau,MarionKoopmans,MaciejFBoni

TableofContents

1. Mappingluminescencevaluestotitersusingluminescencevaluesatfourdilutions1.1 Four‐parameterlog‐logisticcurve1.2 StandardizationagainsttheinternationalInfluenzaAH1N1Pdm2009Positive

Control2. Mappingluminescencevaluestotitersusingluminescencevaluesattwodilutions3. Titerreproducibility4. Representativenessofsamplecollectionversuscensusdata5. Choosingbinwidthfordiscretizingthenormaldistribution6. Comparisonsofgamma,Weibull,andnormalfits7. Confidenceintervalsformixturecomponentestimates8. Confidenceintervalsforpandemicattackrate9. Additionalfiguresandtables


2 / 30

1. MappingluminescencevaluestotitersusingluminescencevaluesatfourdilutionsSamples were tested for presence of a range of seasonal and pandemic influenza

antibodiesusingtheprotein‐microarray(PA)methodatserialfour‐folddilutionsfrom20to1280 except for the international standard (IS) positive control against H1N1 Pandemic2009 that was tested from 40 to 2560. Commercially available HA1 proteins of the 16influenzaviruses(TableS1)werespottedinduplicatesoneachofthe64‐padnitrocellulosecoated slide as described by Koopmans et al (ClinicalMicrobiology and Infection, 2012,18(8):797‐807). Allof theslidesused inthisstudywereprintedat theErasmusMedicalCenter(Rotterdam,NL)andtheDutchNationalInstituteforPublicHealthandEnvironment(RIVM;Bilthoven,NL)byErwindeBruin. Oneachslide,15humanseraandonepositivecontrol were tested in BSL‐2 conditions at the Oxford University Clinical Research Unit(OUCRU)inHoChiMinhCity,inthesamemannerdescribedbyKoopmansetal(2012).TheprocessedslideswerethendriedandsentbacktoRIVMforscanning.TheantigensignalswerequantifiedbyaScanarrayGxPlusmicroarrayscanner(PerkinElmer)withanadaptivecirclesizeof80−200μm.DigitalcopiesofthescanimageswereanalyzedusingScanArrayExpress (version 4.0) Software (PerkinElmer) to determine the median fluorescenceintensity. ThePAtiterwasdefinedasthebestfittiteraccordingtoafour‐parameterlog‐logistic functiondescribed in thenext section. Fitted titerswere then standardized to apositivecontrolinordertocorrectforvariationacrossslides;valuesfallingoutsidethe[20,1280]rangewerescoredas10and1810.AlldataanalysiswasdoneinMatlabR2015a.TheinitialscriptwaswrittenbyJankovanBeek.


3 / 30

TableS1:Listofallinfluenzastrainstestedonthemicroarrays

HumanAntigensAbbreviation AvianAntigens

Abbreviation

H1N1

A/SouthCarolina/1/18 H1‐18

H5N1

A/Vietnam/1194/2005 H5‐05

A/USSR/92/1977 H1‐77 A/Cambodia/R0405050/2007 H5‐07

A/NewCaledonia/20/1999 H1‐99 A/Hubei/1/2010 H5‐10

A/Brisbane/59/2007 H1‐07 H7N7 A/Chicken/Netherlands/2003 H7‐03

A/California/6/2009 H1‐09 H9N2 A/GuineaFowl/HK/1999 H9‐99

H3N2

A/Aichi/2/68 H3‐68

A/Wyoming/3/2003 H3‐03

A/Wisconsin/67/2005 H3‐05

A/Brisbane/10/2007 H3‐07

A/Victoria/210/2009 H3‐09

A/Victoria/361/2011 H3‐11

1.1 Fourparameterlog‐logisticcurve

Foreachhuman‐antigencombination,theeightluminescencescoresobtainedfromserial four‐folddilutionswere fitted,usingnon‐linear least‐squares regression, toa four‐parameter log‐logistic curve (Figure S1); errors were defined as absolute luminescencedifferencesbetweenthemodelandthedatapoint. Thelog‐logisticluminescencecurveissigmoidalinshape. Themaximumreadoutgivenbythescannerwas65355;thus,65356wasusedasthemaximumluminescenceinthelog‐logisticfit.Theminimumreadoutwasdefinedas3000,belowwhichitwasdifficulttodifferentiatebetweentheantigensignalsandthebackgroundluminescence.Mathematically,thetitervaluewasdeterminedasthex‐valueoftheinflectionpointofthefittedlog‐logisticcurve,whichisequivalenttothepointwherethefittedcurveachievesaluminescenceof34268. SampleswereassignedtitervaluesofNaNifthewidthofthe95%confidenceintervalresultingfromthefitwasgreaterthanonelog2‐unit. The variables titer and bwere fit with 8 luminescence data points using theequationbelow:

Lum1 exp ∙ log

dilutiontiter

inwhich: Lumi:luminescencescoreatdilutionic=3000:minimumluminescenced=65535:maximumluminescetiter:dilutionoftheinflectionpoint


4 / 30

Figure S1: Luminescence vs. protein micro‐array (PA) titers being fitted to four‐

parameterlog‐logisticcurveinwhichc=3000,d=65536,titer=dilutionattheinflectionpoint,andb=slopeattheinflectionpoint1.2 Standardizationagainsttheinternationalpositivecontrol

Asampleof theWHO2nd InternationalStandard(ISH2) forantibody to influenzaH1N1pdmvirus(H1‐09)(NationalInstituteforBiologicalStandardsandControls(NIBSC),NIBSC code: 10/202) was included on every slide to correct for variation among slides,technicians,slidebatches(differentprintingruns),andslideseries(differentcommercialantigen purchases). In order to take into account the titer change across series, thestandardizationprocedurewasdividedintoatwo‐stepprocedure.First,allsampleswithinthe same slide serieswere corrected to thegeometricmean titers for ISH2 samples of areferenceantigenwithintherespectiveseries.Second,dependingonthedifferencebetweenthegeometricmeanofthereferenceantigenforseriesscomparedtoitsgeometricmeaninseries1,thetitersweredeflated/inflatedoverthegeometric‐meantitersofseries1.Inouranalysis,thepandemic2009H1N1(H1‐09)antigenwaschosenasthereferenceantigenforcorrectingvarianceacrossslidesduetoitsstableperformanceovertime(TableS2).


5 / 30

TableS2:SummaryofISH2performanceagainstallantigens

Antigen

Series1 Series2 Series3

Mean Std Mean StdInflationoverSeries1

P‐value Mean StdInflationoverSeries1

P‐value

H1‐18 144.15 59.04 250.23 91.47 1.74 0 149.35 48.56 1.04 0.76675

H1‐77 762.14 388.88 1091 514.66 1.43 0 905.99 312.2 1.19 0.00031

H1‐99 287 156.21 307.74 121.12 1.07 0.01314 190.88 39.08 0.67 0.00491

H1‐07 247 110.69 325.42 119.93 1.32 0 483.18 179.65 1.96 0

H1‐09 158.91 44.76 154.15 89.66 0.97 0.48132 201.72 54.78 1.27 0

H3‐68 348.95 145.17 542.08 103.93 1.55 0 458.56 87.03 1.31 0.00064

H3‐03 392.81 119.55 550.42 275.15 1.40 0 448.03 94.08 1.14 0

H3‐05 679.28 336.05 631.39 277.52 0.93 0.00109 777.45 209.43 1.14 0.01081

H3‐07 126.14 37.55 189.85 75.41 1.51 0 158.85 93.98 1.26 0

H3‐09 108.86 40.07 159.54 56.99 1.47 0 171 65.97 1.57 0

H3‐11 146.08 61.06 214.03 82.15 1.47 0 174.75 64.58 1.20 0

H5‐04 40.01 0.23 40.27 12.36 1.01 0.21794 NaN NaN NaN NaN

H5‐07 40.03 0.52 41.14 19.01 1.03 0.02606 NaN NaN NaN NaN

H5‐10 40.01 0.18 40.34 5.34 1.01 0.03549 40.33 24.04 1.01 0.21348

H7‐03 40.23 14.71 40.96 48.39 1.02 0.36462 40.22 2.85 1.00 0.83947

H9‐99 47.28 27.6 89.58 35.03 1.89 0 69.7 25.59 1.47 0

Mathematically, the standardization procedure looks as follows. The within‐seriescorrectionfactorornormalizationfactorforsamplesonslideiandserieskiscalculatedas:

, ,

, , ,

inwhich , , istheH1‐09titerofthepositivecontrolonslidei(assumingitisinseries

k);and , , isthegeometricmeanofISH2titersfortheH1‐09antigeninseriesk.

Hence,thenormalizationfactoristhesameforallantigens andallsamplessonthesameslide; and it serves as the rejection criterion to discard slideswhoseH1‐09 titers of thepositivecontroldifferbymorethan1.2dilutions(log2‐units)awayfrom , , .

To correct for the inter‐series variation, themean titer of antigen in seriesk isinflated/deflatedoveritscorrespondingmeanfromseries1:

′ , ,

, ,

Theindividualtiterofantigen forallsamplessonslideiinserieskisthencorrectedas:


6 / 30

∙ ′ ∙ _

2. Mappingluminescencevaluestotitersusingluminescencevaluesattwodilutions

Afteranalyzingabout19,000humanseraatfour‐folddilutions(~300,000sample‐antigen combinations), we began a process of mapping luminescence series to titers todetermineiffewerluminescencepointscouldbeusedfortiterinference.Thiswasdonetoreducethenumberofdilutionsdonepersampleinordertoreducecost.

FigureS2showstherelationshipbetweenluminescencescoresatsingledilutionsandthetiterobtainedfromafour‐folddilution.Clearly,asingledilutionisnotsufficienttobeusedasaproxyforafour‐foldtiter,asthemajorityofmappingsfromsingledilutionstotiterswouldresultinmappedtitersthatareoneortwologunitsdistantfromthetruetiter.Thesingle dilution at 20 is especially uninformative, as 80% of the luminescence scores atdilution1:20aresaturatedat65535(FigureS2,top‐leftpanel).Thus,weconsideredthreedilutionpairsasthebestcandidatesformappingluminescencepairstotruetiters.Thethreepairsconsideredwere1:80and1:320,1:80and1:1280,and1:320and1:1280.Theobjectivewastodefineamappingfunctionf(L1,L2)=T,whereTisthetruetiterobtainedfromfour‐folddilutionsandL1andL2areluminescencescoresattwodifferentdilutions.Fourdifferentfunctionalformsoffwereinvestigated:

1.asimplegridapproachwheretheL1×L2spacewasdividedintogridsquaresofsizen×nluminescencepointsoneachside.Valuesofn=100,200,500,and1000weretried.TheexactL1×L2spaceusedwas[3000,65536]×[3000,65536].Themediantiterwasusedasthemappedtiterpointineachgridsquare.Thisfunctiongavepoormappingpoweratthetwoends,titersbelow200andabove1000.Thisqualityofthemappingdidnotimprovewithsmallergridsize.

2. a piecewise linear function f (L1,L2 ) = 65355

65355 with the breaks

occurringattheluminescencepointswherethetitersachievetheirboundaryvalues;i.e.titerof1810wasassignedforL2>34268.3.alinearsplinesurfaceplotoff(L1,L2)usingalinearinterpolationmethodwiththeMATLABfamilyoffunctionsfitobject.

4.acubicsplinesurfaceplotoff(L1,L2)usingcubicsplineinterpolationmethodwiththeMATLABfamilyoffunctionsfitobject.

Functionalform4waschosenasithadthebestaccuracyacrossalltiterrangesaboveatiterof150,althoughfunctionalform3hadsimilaraccuracy.Noneofthemethodswereable to accurately map to titers below 150 when the smallest dilution used was 1:320.


7 / 30

However, the overall performance of the mapping was considerably poorer using the1:80/1:320and1:80/1:1280pairswhichiswhy1:320and1:1280werechosenasthetwomostinformativedilutionstouseforinferringtiters(seebelow).

FigureS2:Four‐foldtitervs.luminescencescoresateachofthefourdilutionsfrom1:20to1:1280

FigureS3showsthemapped2‐foldtitersusingfunctionalform4againstthetrue4‐foldtiters.Threedifferentdilutionpairsforthe2‐foldtitersareshowninthethreepanels.The1:80and1:320dilutionpairlostdiscriminatorypowerfortitervaluesabove700.The1:80and1:1280dilutionpair,ontheotherhand, failedtomaptiters in the intermediaterange[400,1000].Eventhoughthe1:320and1:1280dilutionpairdidnotmaptitersbelow200accurately,thisshouldnotaffecttheseroprevalenceestimatesbecausethemeanofthehighest titercomponent inthemaintextarehigherthan200forallage‐groups(Table1,main text). In otherwords, losing discriminatory power among the seronegative groupsshould not affect seroprevalence estimates. After the initial 19,000 serawere processedusing4‐foldtiters,subsequentserawereprocessedatdilutions1:320and1:1280only(withtheexceptionofHoChiMinhCitysamples,whichwerealwaysprocessed4‐fold).Toensureaccurate standardization across slides, the positive control samples (ISH2) were alwaysprocessedatfourdilutions;andthestandardizationprocedurepresentedinsection1.2wasunchanged.


8 / 30

FigureS3:2‐foldtiterobtainedfromcubicsplinesurfacemapvs.4‐foldtiterfordifferentpairsofdilutions1:80and1:320,1:80and1:1280,and1:320and1:1280(topdown)


9 / 30

3. TiterReproducibilityThirty‐twosamplescollectedfromKhanhHoaweretested11timesbydifferentlab

techniciansondifferentdaystoassesstiterreproducibility.Acrossallsixteenantigensonthearray,98%ofthesample‐antigencombinationshadastandarddeviationoflog‐titerlessthanonewhenperformingreplicateexperiments(FigureS4,toppanel),makingPAtitersmorereproduciblecomparedtostandardHIassays. Wealso lookedatthereproducibilityof thepositivecontrol(ISH2);seeFigureS4,bottompanel. ThemediandifferencebetweenanindividualISH2titerandthegeometricmeanofall(n=1457)ISH2titerswasbelow0.5.

FigureS4:Top:Boxplotsofstandarddeviationsoflog2‐titersforelevenhumanantigens(x‐axis)forelevenreplicatesof32differentsamplesfromKhanhHoa.Inotherwords,eachboxplot summarizes 32 standard deviations. Bottom: Differences between individual ISH2titersandthegeometricmeanoftheISH2for1457replicateson11humanantigens(logtiterscale).Inbothplots,theredlineisthemediandifference;andtopandbottomedgesare75thand25thpercentileforeachbox,respectively.Thewhiskersextendtoq3+1.5*(q3–q1)andtoq1–1.5(q3–q1),whereq1andq3arethe25thand75thpercentiles.Outliersareplottedinreddots.


10 / 30

4. Representativenessofsamplecollectionversuscensusdata

FigureS5:Collectedsamplepopulation(blackdots)versusVietnamnationalhousingcensusin2009(graybars)fordifferentsites.They‐axisshowsdifferentagegroupsandx‐axisisthepercentageofthepopulationorsampleinthatagegroup.Leftcolumnsaredataformalesandrightcolumnsforfemales.

The 0‐4 yrs‐old age group was clearly oversampled for Ho Chi Minh City, and

undersampledforDakLak,whichmighthavebeencausedbydifferenthealthcaresystemsinthesetworegionscausingpediatricadmissionstobeover/under‐representedincertainhospitals.HoChiMinhCityisadenselypopulatedcitywitheasyaccesstomanylevelsofhealthcare,whereasDakLakProvincialHospital is inasmallcity inaruralregionof theVietnamesehighlands,withalowersocioeconomiclevelandlessaccesstohealthcare.Forsomeoftheagegroups,thesamplewasveryclosetothatofthedata(olderthan40,bothgenders,Hue;olderthan50,bothgenders,HoChiMinhCityandKhanhHoa;olderthan60,bothgenders,Dak‐Lak).Forthe5‐20agegroup,allofthesitesareundersampledforbothgenders,whichistobeexpectedastheseindividualsaretheleastlikelytoreporttohospitals.Overall,theserumsamplesethadanagestructuresimilartothetruedemography;thus,thesampleweightsincludedinthelikelihoodfunctionintroducedonlyminoradjustments.

Thesampleweightssi inthelikelihoodequationswerecalculatedfromthesedata.Forexample,thefractionofmalesinthe0‐5year‐oldagerangeinthecensusdataforHCMCis0.0364whilethisfractionis0.1241forthesampleset.Thus,everysamplethatwasmale


11 / 30

andbelongedtothe0‐5agecategoryinthesampleshouldbedown‐weightedbyafactorofsi=0.0364/0.1241=0.2933.


12 / 30

5. ChoosingabinwidthfordiscretizingthenormaldistributionWhen computing the likelihood function, the normal distribution needed to be

discretizedsothata‘zero‐inflated’normaldistributioncouldbeconstructed(log‐titer=1wastheinflatedprobabilitymass).Aprobabilitymassandcontinuousprobabilityfunctioncannotbecombinedintoasinglelikelihoodfunction.

Thenormalmixturewasdefinedonthelog‐titerrange[1,7],andsevendifferentbinwidthsof1.000,0.500,0.100,0.050,0.010.0.005,and0.001were tested in this range toinvestigatethebehaviorofthesevendifferentprobabilitymassfunctionsinthelikelihoodoptimizationalgorithm(TableS3).Forasinglenormaldistributionfit,theoptimumwasthesameforallbinwidths.Inthecaseoftwoandthreecomponentsinamixturedistribution,therewerefewchangesinthefittingresults,unlessthebinwidthwaslessthanorequalto0.100.Italsowasimportanttoavoidusingtoosmallofabinwidthtoavoidtheproblemofhaving“aspike”inthefitteddistributionthathadnoepidemiologicalmeaning.Abinwidthof0.100wasusedintheanalysis.

TableS3:LLHoptimizationresultsfordifferentbinwidths

Interval

Width w0 w1 μ1 σ1 w2 μ2 σ2 w3 μ3 σ3 SumNegLLH

1.000 0.228 0.772 3.945 1.551 38357.07

0.500 0.228 0.772 3.946 1.551 48240.31

0.100 0.228 0.772 3.950 1.551 71504.04

0.050 0.228 0.772 3.950 1.551 81535.65

0.010 0.228 0.772 3.949 1.551 104838.69

0.005 0.228 0.772 3.949 1.551 114876.59

0.001 0.228 0.772 3.949 1.551 138181.93

1.000 0.228 0.150 2.063 0.234 0.622 4.387 1.386 37870.75

0.500 0.228 0.211 2.443 0.686 0.561 4.509 1.398 47830.31

0.100 0.228 0.218 2.462 0.698 0.555 4.532 1.393 71076.04

0.050 0.228 0.218 2.462 0.697 0.555 4.532 1.392 81105.93

0.010 0.228 0.218 2.462 0.697 0.555 4.531 1.393 104410.41

0.005 0.228 0.218 2.462 0.697 0.555 4.532 1.393 114448.07

0.001 0.228 0.218 2.462 0.697 0.555 4.532 1.393 137753.27

1.000 0.228 0.205 2.017 0.059 0.377 4.026 0.794 0.190 5.765 1.207 37634.10

0.500 0.228 0.379 2.735 0.812 0.151 4.640 0.426 0.242 5.422 1.352 47639.60

0.100 0.228 0.396 2.771 0.830 0.159 4.684 0.434 0.217 5.571 1.298 70856.20

0.050 0.228 0.398 2.774 0.830 0.161 4.685 0.433 0.214 5.594 1.288 80881.55

0.010 0.228 0.398 2.774 0.830 0.161 4.684 0.434 0.214 5.591 1.290 104186.77

0.005 0.228 0.398 2.774 0.830 0.161 4.684 0.435 0.213 5.597 1.287 114224.69

0.001 0.228 0.398 2.774 0.830 0.161 4.684 0.435 0.213 5.596 1.288 137529.83


13 / 30

6. Comparisonsofgamma,Weibull,andnormalfits Normaldistributionswerechosenfortheformofthemixturecomponents,asthisisthe traditional distributional form chosen for mixture distribution fits of continuousquantities when no other information is known or assumed about the data points.ComparisonsweredonewithmixturesofWeibullandgammadistributions,althoughnotethatbothWeibullandgammadistributionsallowforsymmetryandrightskew,butnotforleft skew. In choosing a distributional form, when all other considerations are equal, anormaldistribution formshouldbechosenoverWeibullorgamma forms toprevent fitswhere rights skews are “discovered” by the fitting process but left skews remainundiscovered. In the figuresbelow, the four‐component fits forH1N1have the lowestBICvaluewhennormaldistributionsareusedascomponents.Forthree‐componentfitsforH3N2,thegamma‐fitshaveasomewhatlowerBICvaluethanthenormalfits(ΔBIC≈19),butitisalsoclear that the component structure in this fit does not distinguish titer groups. NormalcomponentshadbetterBICthanWeibullcomponentsforbothH1N1andH3N2. Basedon(1)ourinitialconsiderationsofskewnessinthefits,(2)theBICvalueswhencomparingthefitsforbothsubtypes,and(3)ourvisualinspectionofthefits,weconcludedthat we did not have any evidential support for using gamma distributions or Weibulldistributionsinthemixtureanalysis.

Figure S6: H1N1 titers, all sites all ages. Mixture fits were performed with three differentdistributionalforms,showninthethreecolumns.Thefirstrowoffiguresshowsfitswithasinglemixturecomponent,thesecondrowof figuresshowsfitswithtwomixturecomponents,etc. Thenumberinthetop‐rightcornerofeachpanelistheBICvalueforthatmodelfit.


14 / 30

Figure S7: H3N2 titers, all sites all ages. Mixture fits were performed with three differentdistributionalforms,showninthethreecolumns.Thefirstrowoffiguresshowsfitswithasinglemixturecomponent,thesecondrowof figuresshowsfitswithtwomixturecomponents,etc. Thenumberinthetop‐rightcornerofeachpanelistheBICvalueforthatmodelfit.


15 / 30

7. Confidenceintervalsformixturecomponentestimates Here,wepresentconfidence intervals forall themixturecomponentsandweightsdisplayedinFigure3ofthemaintext(H1N1).

TableS4:Estimateswith95%confidenceintervalsinparentheses,formean,variance,andweightparameters,for1‐componentto4‐componentmixturemodelsofH1N1titers.


16 / 30

Andbelow,weshowthesameconfidenceintervalsforH3N2.

TableS5:Estimateswith95%confidenceintervalsinparentheses,formean,variance,andweightparameters,for1‐componentto4‐componentmixturemodelsofH3N2titers.


17 / 30

8. Confidenceintervalsforpandemicattackrate InthefourthparagraphoftheResultssectionofthemainpaper,forsubtypeH1N1,wepresentevidenceshowingthattheinterpretationofw4asthefractionofthepopulationthatwasrecentlyinfected,andtheinterpretationofw3asthefractionofthepopulationthatwasnon‐recently infected,areconsistentwithpost‐pandemicobservations inVietnamin2010.TableS6:Estimatesandconfidenceintervals forweightparameters inpost‐pandemic(JanuarytoJune2010)analysis.

AgeGroup NumberofSamples w4(95%CI) w3(95%CI)

0.5–9 161 0.138(0.117–0.169) 0.160(0.146–0.174)

10–19 355 0.226(0.200–0.268) 0.216(0.182–0.234)

20–44 871 0.085(0.070–0.108) 0.230(0.227–0.393)

≥45 439 0.157(0.135–0.205) 0.196(0.169–0.221)

The confidence intervals for these ‘recent attack rates’ and ‘non‐recent attack rates’ arepresentedabove,andthew4columnisconsistentwithwhathasbeenmeasuredforthe2009pandemicinVietnamandotherpartsofAsia. Notethatsomeoftheconfidenceintervalsarequitenon‐symmetric.Allconfidenceintervals were checked manually to ensure that the appropriate weight parameter wasassociated with the appropriate mean (category) during the optimization and profilingprocess.


18 / 30

9. AdditionalFiguresandTables


19 / 30

FigureS8:Top:Quantileplotshowingthe25th,50th,75th,80th,85th,90th,95th,and99thpercentilesoflog‐titerfor2009H1N1(log‐titer onx‐axis) for the aggregateddata and the individual collection sites (y‐axis).Bottom: quantile‐quantile plots forpairwisecomparisonamongthefourcollectionsites(allagegroupsincluded;notre‐weightedbyprovincepopulation).Whenthetwodatasetscontaindifferentnumbersofsamples,theplottingquantileswerecalculatedbasedonthesmallerdataset.Bluelinesaretheextrapolatedstraightlineconnectingthefirstandthirdquartilesofthesmallerdataset.


20 / 30

FigureS9:Top:Quantileplotshowingthe25th,50th,75th,80th,85th,90th,95th,and99thpercentilesoflog‐titerofH3N2(geometricmeantiterof2009and2011strains).Log‐titerisshownonthex‐axisandcollectionsitesareshownonthey‐axis.Bottom:quantile‐quantileplotsforpairwisecomparisonamongthefourcollectionsites(allagegroupsincluded;notre‐weightedbyprovincepopulation).Whenthetwodatasetscontaindifferentnumbersofsamples,theplottingquantileswerecalculatedbasedonthesmallerdataset.Bluelinesaretheextrapolatedstraightlineconnectingthefirstandthirdquartilesofthesmallerdataset.


21 / 30

FigureS10:HistogramsofantibodytiterstoH3N2(GMTof2009and2011strains),showingfitresultsformixturemodelswithdifferentnumberofnormalcomponents(toptobottom,thelabeltotheleftofthey‐axisisthenumberofmixturecomponents)andgroupedbycollectionsites(lefttoright).HistogramsareweightedtoadjustforageandgenderaccordingtotheVietnamnationalhousingcensusin2009foreachofthefourcollectionsites.Thebluelinesineachpanelarethenormalizedprobabilitydensityfunctionsofthecomponentdistributionswhoseintensitywasrankedinorderofincreasingμ.Theblacklinesshowthefullmixturedistributiondensity,andtheblackdotsaretheestimatedcumulativedistributionofthemixturemodelsat7.0(titerof1280).ThenumbersintheupperrightcornerofeachpanelaretheBICscoresofthemodelfits.


22 / 30

FigureS11:VisualizationofmodelselectionprocessforH3N2titerdistributionsfromFigureS8. They‐axesshowthefittedvaluesofwi (mixtureweights), μi(means), andσi (standarddeviations).Components’ shadeswere ranked from lightest todarkestinincreasingorderofμ.Inthetoppanel,the“0thcomponent”representsthepointmassw0placedat20fortitersbelowthelowerdetectionlimitof20.Notethatforfiveorsixcomponents,inmanycases,theweightsorstandarddeviationparametersareclosetozero;forsomecases,twooftheinferredmeanparametersareveryclosetoeachother.


23 / 30

FigureS12:TiterhistogramsforH3N2andfitresultsformixturemodelswithdifferentnumberofcomponents(labelontheleftis thenumberofmixturecomponents)andgroupedbydifferentagegroups.Histogramsareweightedtoadjust forageandgenderaccordingtotheVietnamnationalhousingcensusin2009.ThenumbersintheupperrightcornerofeachpanelarethefittedBICscoresoftherespectivemodel.Foreachpanel,thebluelinesarethenormalizedprobabilitydensityofthecomponentdistributionswhose intensity fromlightest todarkestwasrankedinorderof increasingμ.Black linesarethetotalmixturedistributiondensity;andtheblackdotsareestimatedprobabilityweightofthemixturemodelfortiters≥7.0.


24 / 30

FigureS13:Antibodytiterhistogramsforn=20,152individuals,plottedforH1N1strains(toppanels)andH3N2strains(bottompanels).Thefractionsofindividualswithtitersbelowthedetectionlimitof20andabove1280thatwereoutoftheplottingrangesaregivennexttotherespectivebar.HistogramswereweightedtoadjustforageandgenderaccordingtotheVietnamnationalhousingcensusin2009forthefourcollectionsites.


25 / 30

FigureS14:SimilartoFigure5inthemaintext,butagegroupingsaredifferent.Individualsinthe0.5‐49agegroupwerebornin1960or laterandwouldhavebeenexposed to two lineagesofH1N1: the1977 lineageand the2009pandemic lineage.Individualsinthe60+agegroupwerebornin1953orearlierandarelikelytohavebeenexposedtothreeH1N1lineages:the1918‐1957lineage,the1977lineage,andthe2009pandemiclineage.TheBICcomparisons(seealsoTableS6)showsthat3componentsisthebestfit for60+andthat4componentsisthebestfitfor0.5‐49. However,thedifferencebetweenthe3‐componentfitsand4‐componentfitshere–as inallothergroupingsthathavebeenlookedat– isthatthe4‐componentfitincludesanextrasmallpeakintheseronegativetiterrange,coveringtitersbetween20and30.Theredoesnotappeartobeareasonwhythedifferentlineage‐exposurehistorywouldproducethisextrapeakforseronegativeindividuals.


26 / 30

FigureS15:Anadditionalvisualizationofagestructureinthetiterdistribution.Here,thefour‐componentmodelisshownforH1N1 and the three‐componentmodel is shown for H3N2. The individual histogram bins (0.1width on a log2‐scale) aredisplayedasastackedbargraphwithdifferentagegroupscorrespondingtodifferentcolors.NotethatforH1N1,theleft‐handbaratalog‐titerofoneisshownonaseparatescale.Therearenoclearagepatternsinthesetiterdistributionssuggestingthatcertainmixturecomponentscorrespondtocertainagegroups.


27 / 30

FigureS16:AsFigure2inthemaintext,butherefourrandomsub‐samplesweredrawn(withreplacement)accordingtotheage‐groupproportionsandagebandsinthegeneralpopulation,asshowninFigureS5. Fourorfivecomponentsisthebestexplanatorymodel,accordingtoBIC. However,thecomponentmeans,variances,andweightsarenotthesameacrosssub‐sampleswhenfivecomponentsareused.


28 / 30

TableS7:ChangeinBICscoresasthenumberofnormaldistributionsinthemixtureincreasefromonetosixforH3N2,fortheaggregateddataaswellastheindividualcollectionsites.Thevaluesofthenegativelog‐likelihoodwereweightedtoadjustforageandgenderaccordingtotheVietnamnationalhousingcensusin2009.Thefirstrowshowstheexactvaluesofthenegativelog‐likelihood.BoldnumbersrepresentthemixtureswiththebestBIC.NB:EventhoughtheBICisdesignedtotakesamplesizeintoaccountinselectingthebestfittedmodels,thelargenumberofdatapointsintheaggregatedata(~20,000)stillcausedtheBICtoselectsixcomponentsasthebestmodelforH1N1(Figure2)andfourcomponentsasthebestmodelforH3N2(FigureS8).Thus,BICisimperfectandcannotbeusedasthesolecriterionformodelselection.

ALL SITES HUE KHANH HOA HCMC DAK LAK

( N = 19785) (N = 4519) (N = 5384) (N = 5784) (N = 4134)

SumNegLLH (C =1) 78498.68 17790.08 21515.90 22757.09 16264.15

Δ in

num componen

ts

1 to 2 ‐533.59 ‐110.90 ‐77.42 ‐119.56 ‐153.85

2 to 3 ‐222.45 ‐103.79 ‐39.32 ‐33.62 ‐54.56

3 to 4 ‐29.26 5.47 1.52 ‐0.60 7.74

4 to 5 15.05 15.82 18.17 12.95 ‐2.89

5 to 6 14.88 16.15 5.95 21.13 23.64


29 / 30

TableS8:ChangeinBICscoresasthenumberofnormaldistributionsinthemixtureincreasefromonetosixforH3N2,fordifferentagegroupsasrecommendedbyCONCISE.Thevaluesofthenegativelog‐likelihoodwereweightedtoadjustforageandgenderaccordingtotheVietnamnationalhousingcensusin2009.Thefirstrowshowstheexactvaluesofthenegativelog‐likelihood.BoldnumbersrepresentthemixtureswiththebestBIC.

ALL AGES 0‐9 Yrs‐Old 10‐19 Yrs‐Old 20‐44 Yrs‐Old 45+ Yrs‐Old

( N = 19785) ( N = 3031) ( N = 2922) ( N = 8353) ( N = 5479)

SumNegLLH (C=1) 78498.68 11518.25 15750.62 33377.53 16966.21

Δ in

num componen

ts

1 to 2 ‐533.59 ‐26.27 ‐81.21 ‐222.69 ‐145.63

2 to 3 ‐222.45 6.62 ‐22.60 ‐83.06 ‐82.27

3 to 4 ‐29.26 16.77 ‐10.09 8.94 ‐5.55

4 to 5 15.05 ‐6.69 10.78 21.10 22.92

5 to 6 14.88 ‐1.14 23.64 22.79 25.97


30 / 30

TableS9:ChangeinBICscoresasthenumberofnormaldistributionsinthemixtureincreasefromonetosixforH1N1,fordifferentagegroupsasrecommendedbyCONCISE.Thevaluesofthenegativelog‐likelihoodwereweightedtoadjustforageandgenderaccordingtotheVietnamnationalhousingcensusin2009.Thefirstrowshowstheexactvaluesofthenegativelog‐likelihood.BoldnumbersrepresentthemixtureswiththebestBIC.

ALL AGES 0‐9 Yrs‐Old 10‐19 Yrs‐Old 20‐44 Yrs‐Old 45+ Yrs‐Old 0‐49 Yrs‐Old 60+ Yrs‐Old

( N = 19335) ( N = 2988 ) ( N = 2876 ) ( N = 8116 ) ( N = 5355 ) ( N = 15168 ) ( N = 2152 )

SumNegLLH (C=1) 71504.04 9951.65 14564.33 30125.89 16100.72 49844.40 5757.17

Δ in

num componen

ts

1 to 2 ‐1670.41 ‐86.24 ‐77.86 ‐345.89 ‐238.01 ‐547.01 ‐101.08

2 to 3 ‐837.75 ‐46.85 ‐58.34 ‐206.63 ‐90.59 ‐276.79 ‐37.61

3 to 4 ‐653.85 ‐24.88 ‐50.71 ‐134.61 ‐58.24 ‐208.20 ‐3.98

4 to 5 ‐57.35 ‐7.33 4.76 ‐7.33 13.75 ‐19.98 18.09

5 to 6 ‐72.45 11.20 ‐0.88 30.17 17.70 ‐8.14 20.63

nguyen thi duy nhat, stacy todd, erwin de bruin, tran …10.1038...nguyen thi duy nhat, stacy todd,...

Documents