DOCTORAL DISSERTATION

Optimisation of statistical procedures to assess the diagnostic accuracy of cervical cancer screening tests

Doctoral dissertation submitted to obtain the degree of Doctor of Sciences: Statistics, to be defended by

Promoter: Prof. Dr Marc Aerts | UHasselt Co-promoter: Dr Marc Arbyn | Wetenschappelijk Instituut Volksgezondheid

2017 | Faculty of Sciences

D/2017/2451/57

Victoria Nyawira Nyaga


To my daughter Noni. Apparently, thanks to you the architecture of my brain has changed for good. The volume of my gray matter has also substantially reduced, thereby maturing further my neural network sub-serving social cognition [1].


Acknowledgments

The work presented in this thesis is the result of collaborative efforts between Hasselt University (UHasselt) and the Scientific Institute of Public Health (ISP). Through this collaboration, I have interacted with statistician and non-statistician professionals, consequently gaining experience from academia as well as industry.

First and foremost, I am very honored to have had the supervision of Prof. Marc Aerts (UHasselt). Despite being busy, he still managed to set aside time for stimulating discussions, most of them over Skype, whenever I needed his valuable advice and guidance on the theoretical part of this thesis. I will be forever grateful for his suggestions, contributions and immense knowledge, on which the successful completion of my Ph.D. rests.

I am also very grateful to Dr. Marc Arbyn (ISP), my co-supervisor and at the same time my immediate supervisor. Using data which he has collected over the past years on cervical cancer precursors screening, diagnosis and treatment, our research was translated from theory to practice. Being an epidemiologist and a clinical doctor, he was an important part of the bridge between the mathematical models developed in this thesis and public health. His expertise was invaluable in suggesting possible application areas of our work. His persistent support and sincerity offered me an environment to experiment within safe borders. Because he engaged me in other ‘side’ projects, I have learnt how to multi-task and collaborate with non-statisticians.

Thanks to my colleagues at ISP, who were very friendly and kind. You were fantastic colleagues and, despite coming from different parts of the world, you have become family to me. I have immensely enjoyed the company of Renata and Martin and wish them well in their plans to move to the Czech Republic in the near future. Frank and Lan, thanks for letting me have a taste of China. I wish you too success and all the best in your plans to move back to China.

I would like to thank the support staff at ISP for their logistical support whenever I had to travel to UHasselt for activities related to the doctoral school or to nationally/internationally organized scientific meetings and trainings. I also acknowledge the staff at UHasselt for their involvement in my doctoral study in general and their help in completing the necessary formalities for my defense.

Special gratitude to my partner, Dries Nollet. Coincidentally, I got to know you just when I was beginning my doctoral study, and since then I have learnt so much about Europe and travelled to places I never ever imagined or knew of. My view and knowledge of the world have since been exponentially amplified. You have been there during the happy and difficult moments and have made the journey of my Ph.D. memorable. Your family is very open, kind and welcoming. They have really helped me integrate into the Belgian culture and made ‘Belgium’ feel warm even during winter.

I am also very grateful to my family in Kenya. I can never find enough words to say how grateful I am to my mother and father. They always pushed me not only to dream more but also to do more in order to realize those dreams. Though very far, I always feel their trust, warmth and love. I am who I am because of their upbringing, persistent support and guidance. Thank you to my grandfather and grandmother (whom I am named after) for their endless concern and steadfast encouragement.

Finally, I take this opportunity to thank everyone else not mentioned in this acknowledgment who has contributed theoretically or otherwise to the success of my doctoral study.


List of Publications

Manuscripts

This thesis corresponds to a collection of the following original publications.

1. Chapter 2: Nyaga VN, Arbyn M and Aerts M. Metaprop: a Stata command to perform meta-analysis of binomial data. Archives of Public Health, 72(1):39, 2014.

2. Chapter 3: Nyaga VN, Arbyn M and Aerts M. CopulaDTA: Copula based bivariate beta-binomial models for diagnostic test accuracy studies in a Bayesian framework. Journal of Statistical Software, 2015. Conditionally accepted for publication.

3. Chapter 4: Nyaga VN, Aerts M and Arbyn M. ANOVA model for network meta-analysis of diagnostic test accuracy data. Statistical Methods in Medical Research, 2016. First published Sep 20, 2016; DOI: 10.1177/0962280216669182

4. Chapter 5: Nyaga VN, Arbyn M and Aerts M. Beta-binomial analysis of variance model for network meta-analysis of diagnostic test accuracy data. Statistical Methods in Medical Research, 2016. First published Jan 1, 2016; DOI: 10.1177/0962280216682532

Contributed manuscripts

Koliopoulos G, Nyaga VN, Santesso N, et al. Cytology versus HPV testing for cervical cancer screening in the general population. Cochrane Database of Systematic Reviews, 2017. Prepublished Aug 10, 2017; DOI: 10.1002/14651858.CD008587.pub2.

Software development

As part of the doctoral project, the following software has been developed with the aim of providing tools for meta-analysis to the scientific community.

1. Nyaga VN, Arbyn M and Aerts M. METAPROP: Stata module to perform fixed and random effects meta-analysis of proportions. 2017. https://ideas.repec.org/c/boc/bocode/s457781.html

2. Nyaga VN, Arbyn M and Aerts M. METAPROP_ONE: Stata module to perform fixed and random effects meta-analysis of proportions. 2017. https://ideas.repec.org/c/boc/bocode/s457861.html

3. Nyaga VN. CopulaDTA: Copula based bivariate beta-binomial model for diagnostic test accuracy studies. R package version 0.0.5, 2017. https://cran.r-project.org/package=CopulaDTA

4. Nyaga VN. MADAREG: Meta-analysis and meta-regression of diagnostic accuracy studies in SAS. 2017. https://github.com/VNyaga/Madareg

5. Nyaga VN. NMADAS: Network meta-analysis of diagnostic test accuracy studies. R package version 0.0.1, 2017. https://github.com/VNyaga/NMADAS


List of Abbreviations

AB       Arm-Based
ABC      Approximate Bayesian Computation
AIC      Akaike Information Criterion
ANOVA    ANalysis Of VAriance
ASC-US   Atypical Squamous Cells of Undetermined Significance
BRMA     Bivariate Random-effects Meta-Analysis
CB       Contrast-Based
CC       Conventional Cytology
CCD      Central Composite Design
CIN      Cervical Intraepithelial Neoplasia
CIN2+    Cervical Intraepithelial Neoplasia lesion of grade 2 or worse
CIN3+    Cervical Intraepithelial Neoplasia lesion of grade 3 or worse
CRAN     Comprehensive R Archive Network
DIC      Deviance Information Criterion
DLL      Dynamic Link Library
DNA      DeoxyriboNucleic Acid
DOR      Diagnostic Odds Ratio
ESS      Effective Sample Size
FGM      Farlie-Gumbel-Morgenstern
FPR      False Positive Rate
GLMM     Generalized Linear Mixed Model
HC2      Hybrid Capture 2
HMC      Hamiltonian Monte Carlo
HPV      Human Papilloma Virus
HSROC    Hierarchical Summary ROC
INLA     Integrated Nested Laplace Approximation
IPD      Individual Patient Data
ISP      Scientific Institute of Public Health
ITT      Intention To Treat
JAGS     Just Another Gibbs Sampler
LBC      Liquid Based Cytology
LCM      Latent Class Model
LKJ      Lewandowski Kurowicka and Joe [2]
LR-      Negative Likelihood Ratio
LR+      Positive Likelihood Ratio
LSIL     Low-grade Squamous Intraepithelial Lesion
LSM      Least Squares Method
MA       Meta-Analysis
MAR      Missing At Random
MCAR     Missing Completely At Random
MCMC     Markov Chain Monte Carlo
ML       Maximum Likelihood
MOM      Method Of Moments
NMA      Network Meta-Analysis
NPV      Negative Predictive Value
NUTS     No-U-Turn Sampler
OR       Odds Ratio
OSPADAC  Optimisation of Statistical Procedures to Assess the Diagnostic Accuracy of Cervical cancer screening tests
PPV      Positive Predictive Value
REML     REstricted Maximum Likelihood
RNA      RiboNucleic Acid
TPR      True Positive Rate
WAIC     Watanabe-Akaike Information Criterion


Contents

List of Publications
List of Abbreviations

Part I General Introduction

1. Introduction
    1.1 Meta-analysis of Diagnostic Accuracy Studies
        1.1.1 Fixed-effect model
        1.1.2 Random-effects model
        1.1.3 Bivariate beta distribution
        1.1.4 Estimation procedures
        1.1.5 Meta-regression
    1.2 Network Meta-analysis of Diagnostic Accuracy Studies
        1.2.1 Contrast-based models
        1.2.2 Arm-based models
    1.3 Inference Framework
        1.3.1 Frequentist inference
        1.3.2 Bayesian inference
    1.4 Burden of Cervical Cancer
    1.5 The OSPADAC Project

Part II Optimisation of Statistical Procedures to Assess the Diagnostic Accuracy of Cervical Cancer Screening Tests

2. Meta-analysis of Proportions
    2.1 Abstract
    2.2 Background
    2.3 Methods
    2.4 Materials
    2.5 Software Development
    2.6 Results
    2.7 Discussion
    2.8 Conclusion

3. Copula Based Bivariate Beta-Binomial Models for Diagnostic Test Accuracy Studies in a Bayesian Framework
    3.1 Abstract
    3.2 Introduction
    3.3 Statistical Methods for Meta-analysis
        3.3.1 Definition of copula function
        3.3.2 The hierarchical model
        3.3.3 Bivariate Gaussian copula
        3.3.4 Frank copula
        3.3.5 Farlie-Gumbel-Morgenstern copula
        3.3.6 Clayton copula
    3.4 Software Development and Model Diagnostics
        3.4.1 The CopulaDTA package
        3.4.2 Model diagnostics
        3.4.3 Model comparison and selection
    3.5 Datasets
        3.5.1 Telomerase data
        3.5.2 ASCUS data
    3.6 The Intercept Only Model
        3.6.1 Model comparison
    3.7 Meta-regression
    3.8 Discussion
    3.9 Conclusion

4. ANOVA Model for Network Meta-analysis of Diagnostic Test Accuracy Data
    4.1 Abstract
    4.2 Introduction
    4.3 Motivating Dataset
    4.4 Methodology
        4.4.1 Contrast-based model
        4.4.2 Arm-based model
        4.4.3 Ranking of the tests
        4.4.4 Missing data and exchangeability
        4.4.5 Prior distributions
        4.4.6 Implementation
    4.5 Results
        4.5.1 All available data
        4.5.2 AB versus CB model
    4.6 Discussion

5. Beta-binomial Analysis of Variance Model for Network Meta-analysis of Diagnostic Test Accuracy Data
    5.1 Abstract
    5.2 Introduction
    5.3 Data set
    5.4 Methodology
        5.4.1 Copula function
        5.4.2 Overdispersion
        5.4.3 Missing data and exchangeability assumption
        5.4.4 Estimation
    5.5 Results
        5.5.1 Correlation and overdispersion
        5.5.2 Bivariate beta-binomial versus bivariate logistic-binomial distribution versus univariate beta-binomial
    5.6 Discussion
    5.7 Conclusion

Part III Discussion

6. Discussion
    6.1 Further Development and Research
    6.2 Summary
    6.3 Samenvatting

Bibliography


Part I

General Introduction


1. Introduction

A diagnostic test discriminates between patients with and without a disease or health condition. While most diagnostic tests will not distinguish the healthy from the sick with perfect certainty, an accurate clinical diagnosis is critical in choosing proper management and treatment of a patient. Diagnostic accuracy studies compare a set of diagnostic tests against a standard reference and provide vital information in evaluating the accuracy and performance of new and alternative diagnostic tests. Ideally, the standard reference test should be perfect. Nevertheless, imperfect standards are often used in practice whenever the patient's disease status cannot be known with absolute certainty, the best diagnostic technique is very expensive or technically impossible, or the perfect diagnostic test would expose a patient to unacceptable risk [3].

Data from diagnostic studies are usually summarized as a 2 x 2 cross-tabulation of index versus reference test results (see Tab. 1.1). The results are classified as true positive, false positive, true negative and false negative. The frequencies in the four data cells are then used to compute several diagnostic accuracy measures.

The most basic and widely used measure of test performance is a bivariate outcome consisting of sensitivity (true positive / (true positive + false negative)) and specificity (true negative / (true negative + false positive)) at a defined test cut-off.

In probabilistic terms, sensitivity is the probability of a positive test result given presence of disease and relates to the ability of a test to recognize the diseased. On the other hand, specificity is the probability of a negative test result given absence of disease and relates to the ability of the test to identify the healthy individuals [4]. Sensitivity is also known as the true positive rate (TPR) and 1 - specificity as the false positive rate (FPR).

Other diagnostic accuracy measures are the positive and negative likelihood ratios (LR+ and LR-), positive and negative predictive values (PPV and NPV), diagnostic odds ratio (DOR), and the overall diagnostic accuracy.
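As a minimal sketch of how these measures follow from the four cell frequencies of the 2 x 2 table, the function below computes them from hypothetical counts. The function name and the counts are illustrative assumptions, not data from this thesis, and the sketch assumes that no cell or margin is zero.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Common diagnostic accuracy measures from the cells of a 2 x 2 table."""
    sensitivity = tp / (tp + fn)  # P(test positive | diseased), the TPR
    specificity = tn / (tn + fp)  # P(test negative | healthy)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fpr": 1 - specificity,                     # false positive rate
        "lr_pos": sensitivity / (1 - specificity),  # LR+
        "lr_neg": (1 - sensitivity) / specificity,  # LR-
        "ppv": tp / (tp + fp),                      # positive predictive value
        "npv": tn / (tn + fn),                      # negative predictive value
        "dor": (tp * tn) / (fp * fn),               # diagnostic odds ratio
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts, chosen only for illustration.
measures = accuracy_measures(tp=90, fp=40, fn=10, tn=360)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Note that the DOR equals LR+ divided by LR-, which is one way in which other measures can be derived from sensitivity and specificity.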

In this thesis, we focus on sensitivity and specificity. This choice was motivated by the following reasons. First, they are reported in the vast majority of diagnostic studies. Secondly, other measures including DOR and LR can be derived from the two outcomes. Thirdly, sensitivity and specificity are at the heart of diagnostic theory and teaching and are therefore the measures most familiar to clinicians [5].

1.1 Meta-analysis of Diagnostic Accuracy Studies

Meta-analysis is a statistical procedure that pools, synthesizes and summarizes results/evidence from separate but methodologically and epidemiologically similar studies addressing the same research question. The analysis yields a more precise ‘weighted’ average of the individual studies. It has more statistical power than the individual studies, which might be small, or large but with a small number of cases due to low prevalence of disease. Meta-analyses offer a quick and cost-effective method to gather information for clinical decision making [6]. We make a distinction between two types of models used for meta-analysis: the fixed-effects and the random-effects model.

Index test    Disease status
              +                 -
+             true positive     false positive
-             false negative    true negative

Tab. 1.1: Cross-tabulation of index test results by the reference test results.

1.1.1 Fixed-effect model

The fixed-effects model is fitted whenever the variation among studies can only be attributed to chance. In essence, the observed effect in each study is the sum of a fixed effect common to all studies and a sampling error. This model assumes that the true effect is homogeneous across all studies; it is mostly estimated as a weighted average with weights inversely proportional to the variability within the individual studies. This strong assumption of homogeneity may overstate the significance of the global effect [7] and is often unrealistic [8] in diagnostic research because test accuracy studies are typically clinically diverse. Furthermore, the model has been shown to underestimate uncertainty [9, 10]. Moreover, the inverse variance method relies on large-sample asymptotic sampling variances and therefore performs poorly when there are studies with very low or high event rates or small sample sizes [11, 12].
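The inverse variance weighting described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this thesis: proportions are pooled on the untransformed scale with the large-sample (Wald) variance p(1 - p)/n, and the function name and study counts are hypothetical.

```python
import math


def fixed_effect_pool(events, totals):
    """Inverse-variance fixed-effect pooled proportion with a 95% Wald interval."""
    estimates, weights = [], []
    for r, n in zip(events, totals):
        p = r / n
        var = p * (1 - p) / n  # large-sample sampling variance of p
        if var == 0:
            # With p equal to 0 or 1 the estimated variance is zero and the
            # inverse-variance weight is undefined.
            raise ValueError("zero variance: inverse-variance weight undefined")
        estimates.append(p)
        weights.append(1 / var)
    total_weight = sum(weights)
    pooled = sum(w * p for w, p in zip(weights, estimates)) / total_weight
    se = math.sqrt(1 / total_weight)
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)


# Hypothetical true positives and numbers of diseased subjects in three studies.
pooled, ci = fixed_effect_pool(events=[45, 80, 27], totals=[50, 100, 30])
print(f"pooled sensitivity: {pooled:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The explicit error for a study with an observed proportion of 0 or 1 makes visible one reason why the method behaves poorly with extreme event rates.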

1.1.2 Random-effects model

As the test threshold varies, an increased sensitivity is often associated with decreased specificity. While it is almost always assumed that this association is negative, it is not always the case in real datasets [13]. Differences due to chance, study design, selection of study patients, reading of the index and reference test, the aforementioned test threshold variation and other sources imply that there will be heterogeneity within and between studies and correlation between sensitivity and specificity. To appropriately handle correlation and heterogeneity, a bivariate random-effects model is used.

This model assumes that each study estimates its own but unknown study-effect, which is a perturbation of the overall population effect [14]. Probabilistically, the true effects of the individual studies are randomly distributed around a mean which describes the pooled/meta-analytic estimate. The effects are often thought of as exchangeable and coming from a common distribution whose spread translates to the between-study heterogeneity beyond what is due to chance. In fact, the random-effects model reduces to the fixed-effects model when the extra variation is zero. More variability in the true study-specific parameters implies more uncertainty in the meta-analytic mean estimates. In general, the confidence intervals for the meta-analytic mean estimates from the random-effects model are wider than those from the fixed-effects model.

Since sensitivity and specificity are correlated, using a multivariate/bivariate random-effects model has the following advantages over a univariate random-effects model. The model enables sharing of information resulting in more precise estimates, allows for calculation of joint confidence regions and functions of the pooled values, and facilitates better predictions about the true effects of a new study [15, 16]. The gain in precision is even more pronounced when some data are missing at random [17].

Appropriate specification of the random-effects distribution is a challenging but essential phase in determining the quality of inference on regression parameters. Indeed, mis-specifying the distribution of the random-effects may not only lead to inefficiency in the prediction of random-effects and the estimation of other parameters, but may also yield biased estimates and even obscure some of the effects [18, 19, 20, 21]. While there is generally little information from the data to guide the choice of the random-effects distribution, the severity of the misspecification depends on the degree of departure from the true distribution [19].

It is almost a convention to use the normal distribution to describe the distribution of the monotonically transformed random-effects. This normality assumption is perhaps adopted out of familiarity and software limitations. Nonetheless, the assumption relies on asymptotic theory and is therefore less valid when data are sparse, when studies are small [22], when events are rare [23] or in the presence of outliers [24]. An appropriate and reasonable specification of the transforming function that maps sensitivity and specificity to the real line is a key step in the analysis, especially when data are sparse. It has been shown that the choice of the transformation usually influences the conclusion and interpretation of the results. Despite the popularity of the logit transformation, the interpretation of the parameters of the logistic regression on the odds or logit scale is seldom helpful, since many researchers are not familiar or comfortable with odds or logits.

When sensitivity or specificity is mostly on the parameter boundary (either mostly 0 or 1), the estimates of the regression coefficients from the logistic regression are usually relatively uncertain [25]. Furthermore, the estimation of the between-study variance is problematic in small meta-analyses [26], and the variance is often estimated as zero [10]. The above shortcomings of the normal distribution motivated our research on other distributions for the random-effects.
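The behaviour of the logit transformation near the boundary can be illustrated with a short sketch. It uses the standard delta-method approximation 1/r + 1/(n - r) for the variance of the logit of an observed proportion r/n. The function names and counts are hypothetical, and raising an error on the boundary is a choice made for this illustration (in practice a continuity correction is sometimes applied instead).

```python
import math


def logit(p):
    """Map a proportion in (0, 1) to the real line."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Map a value on the real line back to (0, 1)."""
    return 1 / (1 + math.exp(-x))


def logit_with_variance(r, n):
    """Logit of r/n and its delta-method variance 1/r + 1/(n - r)."""
    if r == 0 or r == n:
        raise ValueError("logit is undefined for proportions of 0 or 1")
    return logit(r / n), 1 / r + 1 / (n - r)


# Same hypothetical study size, proportions moving towards the boundary.
for r in (25, 45, 49):
    value, variance = logit_with_variance(r, 50)
    print(f"r = {r}: logit = {value:.3f}, variance = {variance:.3f}")
```

The variance on the logit scale grows quickly as the observed proportion approaches 1, which is consistent with the uncertainty noted above for sensitivities or specificities close to the boundary.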

Sensitivity and specificity, and proportions in general, are bounded, often asymmetric and skewed variables. A natural choice for such random-effects in the 0-1 interval is the beta distribution. The beta distribution is flexible enough to accommodate asymmetries. Proportions generally display more variation around 0.5, and the variation decreases towards either of the boundaries. The mean and variance parameters of the beta distribution are dependent, implying that any factor that affects the mean also alters the variance. In contrast, the variance parameter of the normal distribution is functionally independent of the mean parameter and does not have an upper bound. Using the beta distribution eliminates the burden of choosing the transformation function on the study-specific sensitivity and specificity. Consequently, the regression parameters are interpretable on the natural scale of the expected proportion when the beta distribution is parameterized using the location and dispersion parameters. Paolino (2001) [27] showed that the beta distribution may provide more precise and accurate results than the normal distribution when modelling dependent variables that are proportions. The resulting beta-binomial distribution has previously been shown to be more powerful than logistic-normal regression, especially when the event rate is low, the sample sizes are small or in the presence of strong over-dispersion [28].
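The dependence between the mean and the variance of the beta distribution can be made concrete with one common mean-precision parameterization, assumed here for illustration: with mean mu and precision phi, the shape parameters are alpha = mu * phi and beta = (1 - mu) * phi, and the variance is mu * (1 - mu) / (1 + phi). The symbol names and the numerical values are assumptions of this sketch and need not match the parameterization used later in the thesis.

```python
import random
import statistics


def beta_shapes(mu, phi):
    """Shape parameters (alpha, beta) for mean mu and precision phi."""
    return mu * phi, (1 - mu) * phi


def beta_variance(mu, phi):
    """Variance of a beta distribution with mean mu and precision phi."""
    return mu * (1 - mu) / (1 + phi)


phi = 10.0  # hypothetical precision, held fixed
variances = {mu: beta_variance(mu, phi) for mu in (0.5, 0.8, 0.95)}
print(variances)  # the variance shrinks as the mean moves away from 0.5

# Monte Carlo check of the mean and variance for mu = 0.8.
random.seed(1)
alpha, beta = beta_shapes(0.8, phi)
draws = [random.betavariate(alpha, beta) for _ in range(20000)]
print(statistics.mean(draws), statistics.variance(draws))
```

With the precision held fixed, the variance is largest at a mean of 0.5 and decreases towards the boundaries, which matches the behaviour of proportions described above.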

1.1.3 Bivariate beta distribution

Unlike the normal distribution, which has a natural bivariate extension, finding a native bivariate beta distribution analogous to the univariate beta distribution is a non-trivial task. The complexity is possibly due to the (nonlinear) dependence between sensitivity and specificity. A non-exhaustive list of methods for constructing bivariate and multivariate distributions in general includes conditionally specified distributions, construction of multivariate distributions based on order statistics, methods based on mixtures, shared-parameter models, and copula-based methods [29]. Of all these methods, the use of copulas is the most appealing due to its high flexibility, simplicity and straightforwardness.

A copula function is a multivariate cumulative distribution function with uniform marginals. Introduced in 1959 by Sklar [30], the concept allows construction of any multivariate distribution from its marginal distributions and a copula density. Sklar's theorem [30] assures the existence of at least one copula function $C$ on $[0, 1]^d$ for a $d$-dimensional random vector $X = (x_1, \ldots, x_d)$, $x_i \in [-\infty, +\infty]$ for $i = 1, \ldots, d$, with marginal distributions $F_i$, such that

\[
F(x_1, \ldots, x_d) = C\big(F_1(x_1), \ldots, F_d(x_d)\big). \tag{1.1}
\]

When all the marginal distributions are continuous, the copula function is unique. The density function of $X$ is obtained as follows:

\[
f(x_1, \ldots, x_d) = \frac{\partial^d F(x_1, \ldots, x_d)}{\partial x_1 \cdots \partial x_d} = c\big(F_1(x_1), \ldots, F_d(x_d)\big) \prod_{i=1}^{d} f_i(x_i), \tag{1.2}
\]

where $c(\cdot)$ is a copula density and $f_i$ is the density of the marginal distribution $F_i$. With different copula densities allowing different dependence patterns, copula theory provides a rich class of multivariate distributions. A detailed description of the different types of copulas and their characteristics has been provided by Nelsen (2006) [31]. The dependence between the random variables is fully encoded by the copula. In fact, the conventional dependence measures such as the Kendall and the Spearman coefficients are functions of the copula. The isolation of the marginal distributions from the dependence structure in Equation 1.2 facilitates easy estimation of complex models by splitting the procedure into two steps. The marginal distributions are estimated in the first step, followed by the correlation structure in the second step.
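As a hedged illustration of this construction for d = 2, the sketch below builds a joint density as the copula density evaluated at the marginal distribution functions, multiplied by the product of the marginal densities. It assumes the Farlie-Gumbel-Morgenstern (FGM) copula, whose density is c(u, v) = 1 + theta (1 - 2u)(1 - 2v) for theta in [-1, 1], and two beta marginals, Beta(a, 1) and Beta(1, b), chosen only because their distribution functions have closed forms. All names and parameter values are hypothetical.

```python
def fgm_copula_density(u, v, theta):
    """Farlie-Gumbel-Morgenstern copula density, theta in [-1, 1]."""
    return 1 + theta * (1 - 2 * u) * (1 - 2 * v)


def joint_density(x1, x2, a, b, theta):
    """Bivariate density: copula density times the product of the marginals."""
    # Beta(a, 1) marginal: distribution function x^a, density a * x^(a - 1).
    F1, f1 = x1 ** a, a * x1 ** (a - 1)
    # Beta(1, b) marginal: distribution function 1 - (1 - x)^b.
    F2, f2 = 1 - (1 - x2) ** b, b * (1 - x2) ** (b - 1)
    return fgm_copula_density(F1, F2, theta) * f1 * f2


def integrate_unit_square(g, m=200):
    """Midpoint rule on an m x m grid over the unit square."""
    h = 1 / m
    points = [(i + 0.5) * h for i in range(m)]
    return sum(g(x, y) for x in points for y in points) * h * h


# A valid joint density must integrate to one over the unit square.
total = integrate_unit_square(lambda x, y: joint_density(x, y, 3, 2, -0.5))
print(total)
```

Setting theta to zero gives the independence case, in which the joint density is simply the product of the two marginal densities.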


1.1.4 Estimation procedures

Parametric methods for fitting the random-effects model within the frequentist approach include ML, REML, profile likelihood and MOM. ML and REML often fail to converge, or converge but give unreliable parameter estimates with some missing standard errors. (RE)ML often estimates the between-study correlation as -1 or +1, in which case the estimated covariance matrix is singular. This is because the estimation procedure often truncates the between-study covariance matrix on the boundary of its parameter space [17]. The estimated singular covariance matrix can lead to incorrect standard errors, implying that the confidence intervals may be too wide or too narrow. Convergence and singular covariance matrix challenges are magnified as the number of studies decreases, as data become sparse (sensitivity or specificity on the boundary of 0/1) or as the within-study variation increases [32]. The violation of the normality assumption may also contribute to non-convergence of (RE)ML procedures.

The non-iterative and semi-parametric MOM is computationally faster than the likelihood-based methods and performs well in estimating the location parameters [15, 33, 34]. However, likelihood-based methods enable better estimation of the between-study covariance matrix and its standard errors [15].

Bayesian analysis yields estimates close to the likelihood-based frequentist estimates when ‘vague’ priors are placed on all parameters. Though flexible, Bayesian methods are more computationally intensive and require more statistical expertise.

1.1.5 Meta-regression

Besides being accounted for, heterogeneity can also be explored and attributed to certain covariates with meta-regression. That said, random-effects meta-regression often lacks statistical power, especially when published summary statistics do not vary much across studies [23, 35]. In practice, only a limited number of covariates can be accommodated in the regression model; otherwise a large number of studies would be needed to obtain precise estimates. Furthermore, studies rarely report uniformly on study-level covariates and, when reported, study-level covariate adjustment might be prone to ecological bias, implying that subject-level characteristics will not be properly accounted for [6, 35]. Other limitations and pitfalls in meta-regression of aggregated data are detailed by Thompson and Higgins [36].

1.2 Network Meta-analysis of Diagnostic Accuracy Studies

New, cheaper, less labour-intensive or more rapid diagnostic tests are often introduced. This generates the need for accurate, efficient, effective and flexible methods to analyze and make inference on multiple diagnostic tests simultaneously. First developed in settings with a univariate outcome, network meta-analysis (NMA) extends the conventional meta-analysis comparing two interventions by synthesizing direct evidence (when a particular test comparison is available) and indirect evidence (when direct evidence on a test comparison is missing) in one model. This strengthens the evidence base and enables quantitative comparison of diagnostic tests that are not directly compared in studies [37]. NMA enriches the analysis by using all available data and borrowing information across the network of tests to provide more precise diagnostic accuracy estimates on all tests.

The validity of NMA rests upon two key assumptions. The first, which assures the validity of indirect comparisons, is transitivity. It requires that studies making different direct comparisons be sufficiently similar in all respects other than the interventions being compared [38]. This is usually taken care of through well-defined inclusion and exclusion criteria.

The second assumption is consistency, which is violated when the direct and indirect evidence in a model differ. More explicitly, consistency holds if

$$d_{AB} = d_{CB} - d_{CA} \tag{1.3}$$

where $d_{kk'}$ is the contrast (log OR of sensitivity or specificity) between test $k$ and test $k'$. The left side of equation 1.3 represents the direct evidence on the contrast between A and B, while the right side represents the corresponding indirect evidence from studies comparing tests/treatments B and A separately with test/treatment C as a common comparator. The consistency assumption is often questionable [39], motivating research on how to detect, visualize, test for and deal with inconsistency [40, 41, 42, 43, 44]. Inconsistency is possibly due to heterogeneity arising from non-comparability of studies, differences in baseline/comparator tests, differences in patient characteristics, or different biases in different studies [45]. According to Dias et al. (2013) [41], inconsistency is caused by an imbalance in the distribution of effect modifiers in the direct and indirect evidence.
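As a minimal numerical sketch of equation 1.3 (not part of the thesis; the function names and the input values are hypothetical), the indirect estimate of the contrast between A and B can be formed from the two contrasts against the common comparator C and then compared with the direct estimate. The sketch assumes that the contrasts come from independent sets of studies, so that their variances add.

```python
import math
from statistics import NormalDist


def indirect_contrast(d_cb, var_cb, d_ca, var_ca):
    """Indirect estimate d_AB = d_CB - d_CA (equation 1.3) and its variance.

    Assumes the two contrasts are estimated from independent studies,
    so that their variances add.
    """
    return d_cb - d_ca, var_cb + var_ca


def inconsistency_z_test(d_direct, var_direct, d_indirect, var_indirect):
    """Difference between direct and indirect evidence with a two-sided z-test."""
    diff = d_direct - d_indirect
    se = math.sqrt(var_direct + var_indirect)
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p_value


# Hypothetical log odds ratios and variances (illustrative values only).
d_ind, v_ind = indirect_contrast(d_cb=0.9, var_cb=0.04, d_ca=0.4, var_ca=0.05)
diff, z, p = inconsistency_z_test(0.5, 0.03, d_ind, v_ind)
print(d_ind, v_ind, diff, z, p)
```

In this illustration the direct and indirect estimates agree, so the difference is essentially zero; a large absolute z-value would instead point to inconsistency between the two sources of evidence.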

NMA of diagnostic accuracy studies presents an additional statistical challenge owing to the nature of the outcomes, which are bivariate and often negatively correlated. Disregarding the correlations within a study, between outcomes and between studies may lead to loss of precision and an increase in selective reporting bias [46].

Suppose there are $K$ diagnostic tests evaluated in $I$ studies. Further, let $i = 1, \ldots, I$, $j = 1, 2$ and $k = 1, \ldots, K$ index study, outcome and test respectively. For study $i$, let $(Y_{i1k}, Y_{i2k})$ denote the numbers of true positives and true negatives, $(N_{i1k}, N_{i2k})$ the total numbers of diseased and healthy individuals, and $(\pi_{i1k}, \pi_{i2k})$ the latent sensitivity and specificity of test $k$. There are two alternative parameterizations of models for NMA: the contrast-based (CB) and arm-based (AB) models.

1.2.1 Contrast-based models

In a contrast-based model, contrasts are the basic units being modelled. Taking test $K$ as a common comparator test, Menten and Lesaffre (2015) [47] have formulated the following model

$$
\begin{aligned}
\operatorname{logit}(\pi_{ij1}) = \theta_{ij1} &= \mu_{ij} + (K-1)\frac{\delta_{ij1}}{K} - \frac{\delta_{ij2}}{K} - \frac{\delta_{ij3}}{K} - \ldots - \frac{\delta_{ij(K-1)}}{K} \\
\theta_{ij2} &= \mu_{ij} - \frac{\delta_{ij1}}{K} + (K-1)\frac{\delta_{ij2}}{K} - \frac{\delta_{ij3}}{K} - \ldots - \frac{\delta_{ij(K-1)}}{K} \\
&\;\;\vdots \\
\theta_{ijK} &= \mu_{ij} - \frac{\delta_{ij1}}{K} - \frac{\delta_{ij2}}{K} - \frac{\delta_{ij3}}{K} - \ldots - \frac{\delta_{ij(K-1)}}{K} \\
\delta_{ij} &= (\delta_{ij1}, \ldots, \delta_{ij(K-1)}) \\
\delta_{i} &= (\delta_{i1}, \delta_{i2})
\end{aligned}
\tag{1.4}
$$

where $\mu_{ij}$ is the ‘K-average’ event rate in the $i$th study and $\delta_{ijk}$ the contrast on the logit scale between test $k$ and the common comparator test $K$ in outcome $j$. Due to estimation difficulties and identifiability issues, the components $\delta_{i1}$ and $\delta_{i2}$ are usually assumed uncorrelated and modelled as follows

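$$
\begin{bmatrix} \delta_{i1} \\ \delta_{i2} \end{bmatrix} \sim N\left( \begin{bmatrix} \nu_{1} \\ \nu_{2} \end{bmatrix}, \begin{bmatrix} \Sigma_{1} & 0 \\ 0 & \Sigma_{2} \end{bmatrix} \right) \tag{1.5}
$$

$$\nu_{j} = (\nu_{j1}, \ldots, \nu_{j(K-1)})$$

where $\Sigma_j$ is an unstructured or diagonal covariance matrix of the contrasts in outcome $j$.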
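Equation 1.4 can equivalently be written as $\theta_{ijk} = \mu_{ij} + \delta_{ijk} - \frac{1}{K}\sum_{k'=1}^{K-1}\delta_{ijk'}$ with $\delta_{ijK} = 0$, so that $\mu_{ij}$ is the average of the $K$ logits and $\delta_{ijk} = \theta_{ijk} - \theta_{ijK}$. The following minimal sketch, in Python with a hypothetical function name and hypothetical input values, checks this numerically for a single study and outcome.

```python
def contrast_based_logits(mu, deltas):
    """Linear predictors theta_1..theta_K of equation 1.4 for one study and outcome.

    mu     : average logit across the K tests
    deltas : contrasts delta_1..delta_{K-1} of each test versus comparator test K
    """
    k = len(deltas) + 1
    mean_delta = sum(deltas) / k              # the comparator contributes delta_K = 0
    return [mu + d - mean_delta for d in list(deltas) + [0.0]]


# Hypothetical values: K = 3 tests, two contrasts against the third test.
theta = contrast_based_logits(mu=1.0, deltas=[0.6, -0.3])
average_logit = sum(theta) / len(theta)       # recovers mu
contrasts = [t - theta[-1] for t in theta]    # recovers the deltas (last one is 0)
print(theta, average_logit, contrasts)
```

The check makes explicit why $\mu_{ij}$ is interpreted as an average over the tests and why each $\delta_{ijk}$ is a contrast with the comparator test.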

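By partitioning the random-effects in the classic CB model [48] into two independent sources to incorporate both test-wise and outcome-wise correlations, Hong et al. [49] reformulated the model as follows

$$
\begin{aligned}
\operatorname{logit}(\pi_{ijk}) = \theta_{ijk} &= \mu_{ijk} + \alpha_{1jk} + \omega_{ik} + \delta_{ij} \\
\omega_{i} = (\omega_{i1}, \ldots, \omega_{ik})^{T} &\sim N_{K-1}[0, \Sigma] \\
\delta_{i} = (\delta_{i1}, \delta_{i2}) &\sim N_{2}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \Omega = \begin{bmatrix} \sigma_{1}^{2} & \rho\sigma_{1}\sigma_{2} \\ \rho\sigma_{1}\sigma_{2} & \sigma_{2}^{2} \end{bmatrix} \right)
\end{aligned}
\tag{1.6}
$$

where $\mu_{ijk}$ is the baseline treatment effect in study $i$, $\alpha_{1jk}$ is a fixed-effect for the contrast between a common first test and test $k$, $\omega_{ik}$ is a random-effect for test $k$, and $\delta_{ij}$ a random-effect for outcome $j$. $\Sigma$ and $\Omega$ are unstructured matrices capturing the correlation between test contrasts and between logit sensitivity and logit specificity respectively. In both parameterizations, the $\mu_{ijk}$ are not modelled but treated as nuisance parameters.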

1.2.2 Arm-based models

AB models are based on the idea that each study hypothetically compares all interventions, some of which are missing by design and thus considered missing at random. The model uses a classical linear predictor with main effects for intervention and study [50, 51]. Hong et al. [49] extended this model to bi(multi)-variate outcomes as follows

$$
\begin{aligned}
\operatorname{logit}(\pi_{ijk}) = \theta_{ijk} &= \mu_{jk} + \omega_{ij} + \delta_{ik} \\
\omega_{i} = (\omega_{i1}, \omega_{i2})^{T} &\sim N_{2}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \Omega = \begin{bmatrix} \sigma_{1}^{2} & \rho\sigma_{1}\sigma_{2} \\ \rho\sigma_{1}\sigma_{2} & \sigma_{2}^{2} \end{bmatrix} \right) \\
\delta_{i} = (\delta_{i1}, \ldots, \delta_{iK})^{T} &\sim N_{K}(0, \Sigma)
\end{aligned}
\tag{1.7}
$$

where, for study $i$, $\mu_{jk}$ is the fixed mean effect of test $k$ on outcome $j$, $\omega_{ij}$ is the random-effect for outcome $j$ and $\delta_{ik}$ is a random-effect for test $k$. $\Sigma$ is a $K \times K$ unstructured covariance matrix. Compared to its CB ‘equivalent’ in equation 1.6, the AB model in equation 1.7 is less constrained but has more parameters. The arm-based models are more elegant and easier to fit and interpret, but are often criticized on the grounds that they break randomization by assuming exchangeability of the test effects across the studies [52], depart radically from the entire tradition of epidemiological statistics and are likely to yield biased treatment effects [53]. With a univariate outcome, the AB model has been shown to produce results similar to those of an analysis using baseline contrasts [54].

1.3 Inference Framework

There are two main paradigms in statistical inference: frequentist and Bayesian. Meta-analysis and network meta-analysis can be performed in both frameworks. Most statistical practice is based on frequentist methods, hence their designation as the classical approach. Bayesian methods are becoming increasingly popular, especially for meta-analysis and network meta-analysis, owing to their flexibility in accommodating complex models. The differences in reasoning between the two paradigms have an impact on the estimation of model parameters, inference on them and interpretation of the results.

1.3.1 Frequentist inference

Frequentist inference is about the probability of a statistic under repeated sampling. In this framework, data is random and the model parameters unknown but fixed. The inference depends on the likelihood for both observed and missing data. Typical measures in this framework are p-values and confidence intervals.

Frequentist parameter estimation techniques include LSM, MOM, ML, and approximate likelihood [55]. Frequentist methods are popular because they require a lower level of statistical knowledge and programming skills, take less time to compute and are available as standard procedures in popular statistical software. Examples of frequentist meta-analysis procedures available in open-source and commercial statistical software include mvmeta [56] and mada [57] in R [58], metadas [59] in SAS (SAS Institute, Cary NC), and mvmeta [40] in Stata (StataCorp, College Station TX).

The frequentist methods developed in this project were implemented in Stata and SAS. Compared to R and SAS, Stata is more popular among epidemiologists, clinicians and other stakeholders with lower statistical and programming skills.

1.3.2 Bayesian inference

In a nutshell, Bayesian inference combines the data (likelihood) with prior knowledge (prior probability) to update information about unknown parameters (posterior probability). The inclusion of prior knowledge, collected through expert opinion or from previous studies and summarized in an informative prior distribution, may improve on the precision of the classical estimators [7]. It is, however, also a basis for criticism of Bayesian methods because it introduces ‘subjectivity’ into the analysis. To reflect prior ignorance and make the analysis more ‘objective’, non-informative priors are used. Since classical methods never take prior knowledge into account, the use of non-informative priors leads to Bayesian estimates close to the maximum likelihood estimates.

Frequentist methods sometimes fail to converge when fitting models to sparse data. In such cases, Bayesian methods remain feasible by incorporating weakly informative priors [60]. Weakly informative priors introduce numerical stability by discarding unrealistic parameter values while still being vague enough [61] to maintain ‘objectivity’ in the analysis.

The rigorous mathematical foundation and the ability to incorporate all available sources of information in a model-based framework make Bayesian methods appealing. These methods are flexible enough to fit more complicated models with fewer restrictions, permit inductive reasoning and explicit posterior inference, and allow full assessment of the uncertainty in the estimated random effects and functions of the model parameters [55].

In a hierarchical model such as those introduced in Sections 1.2.1 and 1.2.2, each study informs the posterior distribution of its own first-level parameters (say, sensitivity). It also influences the estimates from other studies through the imposed between-study correlation structure, and the estimate of the second latent variable (in this case specificity) through the modelled correlation between sensitivity and specificity. This has the potential to improve inference on both the heterogeneity parameters and test effects.

Technically, Bayesian analysis is about evaluating a ratio of two integrals, the posterior distribution. Except for simple integrals that can be solved analytically (e.g. when conjugate priors are used), more complicated statistical models have intractable expressions for the posterior distribution, which need to be integrated numerically. However, numerical integration methods such as grid approximation become infeasible as the number of parameters grows. Alternatively, the integrals can be replaced by averages over a Monte Carlo sample from the posterior distribution, often called Monte Carlo integration.
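The three routes mentioned above (analytic solution, grid approximation and Monte Carlo integration) can be compared on a one-parameter problem where the answer is known. The sketch below is illustrative only and is not taken from the thesis: it assumes a binomial proportion with a conjugate Beta(a, b) prior, and the function names and the data (45 events out of 50) are hypothetical.

```python
import random


def grid_posterior_mean(successes, n, a=1.0, b=1.0, grid_size=2001):
    """Posterior mean of a binomial proportion by grid approximation.

    Prior: Beta(a, b). Both integrals in the ratio are replaced by sums
    over the midpoints of an equally spaced grid on (0, 1).
    """
    grid = [(g + 0.5) / grid_size for g in range(grid_size)]
    weights = [p ** (successes + a - 1) * (1 - p) ** (n - successes + b - 1)
               for p in grid]
    return sum(p * w for p, w in zip(grid, weights)) / sum(weights)


def monte_carlo_posterior_mean(successes, n, a=1.0, b=1.0, draws=20000, seed=1):
    """Same posterior mean as an average over draws from the Beta posterior."""
    rng = random.Random(seed)
    sample = [rng.betavariate(successes + a, n - successes + b)
              for _ in range(draws)]
    return sum(sample) / draws


exact = (45 + 1) / (50 + 2)   # analytic posterior mean under the Beta(1, 1) prior
print(exact, grid_posterior_mean(45, 50), monte_carlo_posterior_mean(45, 50))
```

With a single parameter the grid is cheap, but the number of grid points grows exponentially with the number of parameters, which is why sampling-based methods are preferred for the hierarchical models considered here.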

The most common sampling algorithms, the so-called Markov chain Monte Carlo (MCMC) methods, are the Gibbs sampler and the Metropolis-Hastings algorithm. These algorithms have been implemented in a number of open-source software packages including WinBUGS [62], OpenBUGS [63], and JAGS [64]. Recent versions of popular commercial software for frequentist statistical analysis also have facilities to perform Bayesian analysis. Examples are PROC MCMC in SAS 9.2 and the bayesmh command in Stata 14. However, the capabilities and flexibility of these classical packages are still limited compared to the dedicated Bayesian software. Other numerical approximation algorithms in Bayesian inference include the likelihood-free approximate Bayesian computation (ABC), importance sampling, iterative quadrature, variational Bayes, central composite design integration (CCD) and integrated nested Laplace approximation (INLA) [65].
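To make the idea of an MCMC sampler concrete, the following is a minimal random-walk Metropolis sketch in Python. It is not the implementation used in any of the packages named above; the model (a single binomial proportion on the logit scale with a vague normal prior), the function names, the tuning values and the data are all assumptions made for illustration.

```python
import math
import random


def log_posterior(theta, successes, n, prior_sd=10.0):
    """Log posterior (up to a constant) of theta = logit(p).

    Binomial likelihood with a vague Normal(0, prior_sd^2) prior on theta.
    """
    log_lik = successes * theta - n * math.log1p(math.exp(theta))
    log_prior = -0.5 * (theta / prior_sd) ** 2
    return log_lik + log_prior


def random_walk_metropolis(successes, n, iterations=20000, burn_in=2000,
                           step_sd=0.5, seed=1):
    """Random-walk Metropolis sampler; returns post burn-in draws of p."""
    rng = random.Random(seed)
    theta = 0.0
    current = log_posterior(theta, successes, n)
    draws = []
    for it in range(iterations):
        proposal = theta + rng.gauss(0.0, step_sd)        # symmetric proposal
        candidate = log_posterior(proposal, successes, n)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if it >= burn_in:
            draws.append(1.0 / (1.0 + math.exp(-theta)))  # back to the p scale
    return draws


# Hypothetical data: 45 events out of 50.
draws = random_walk_metropolis(successes=45, n=50)
posterior_mean = sum(draws) / len(draws)
print(posterior_mean)
```

Because the proposal is symmetric, the acceptance probability involves only the ratio of posterior densities, so the normalising integral never has to be evaluated.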

In this project, Bayesian methods have been implemented in the Stan software [66], accessed through rstan [67], the R interface to Stan. We chose Stan because, apart from being open-source, it provides an elegant, flexible and expressive probabilistic programming language and a scalable framework for fitting complex statistical models.

Stan uses the no-U-turn sampler [68], an adaptive variant of Hamiltonian Monte Carlo. The sampler itself is a generalization of the familiar Metropolis algorithm, performing multiple steps per iteration to move more efficiently through the posterior distribution. The implemented MCMC technique is more efficient and robust than Gibbs sampling and Metropolis-Hastings [69].

The R software was chosen because it is an open-source, all-round statistical software package. It is excellent for graphics, has a sensible language for managing and manipulating data, is easily extended through add-on packages and has a large community of users.

While MCMC techniques have expanded the scope of statistical problems that can be solved, fitting Bayesian models requires a higher level of statistical and programming expertise than fitting frequentist models. Furthermore, the less exciting model diagnostics performed to assess stationarity and convergence to the target posterior distribution, together with sensitivity analysis, add to the computational overhead.

Sensitivity analyses examining the effects of different prior distributions are highly recommended, since the results may prove sensitive to the specification of the priors. However, choosing different vague priors is a non-trivial exercise (especially for the variance hyperparameters and with small data sets) and the extra computing effort implies that sensitivity analysis is not always feasible.

1.4 Burden of Cervical Cancer

In this thesis, the diagnosis of cervical cancer precursors is used to demonstrate the use of the developed statistical models. Cervical cancer is one of the most common cancers in women, affecting nearly half a million women worldwide each year, about half of whom die, mostly in low- and middle-income countries [70]. It is the leading cause of cancer death in women in sub-Saharan Africa [71]. In developed countries, the incidence of cervical cancer is low owing to screening programmes and preventive treatment.

A Pap smear is a screening test used to detect precursor lesions of cervical cancer. Technically, it is a microscopic examination of cells from the transformation zone of the cervix to reveal cytological abnormalities, which may be classified as low- or high-grade intraepithelial lesions [72]. Women with high-grade cytological lesions are immediately referred for further investigation and treatment. However, the management of women with atypical and low-grade cytological abnormalities remains controversial. The natural history of minor cytological lesions is difficult to predict, and the lesions often regress spontaneously without treatment [73]. Hence, referring women with low-grade cytological lesions for further diagnostic work-up would have the following consequences: discomfort and anxiety, increased financial burden on the health-care system, over-diagnosis and over-treatment [72]. Triage tests are therefore crucial in identifying women who are at increased risk of developing cervical pre-cancer and need further diagnostic work-up [74, 75, 76].

Because the human papillomavirus (HPV) is detected in 99% of cervical tumours [71], testing for high-risk HPV has become an integral part of new screening strategies [77] as well as a triage method to identify women who need further exploration among those with equivocal Pap smear results [73]. Other triage methods use protein markers indicative of a transforming HPV infection [78, 79].

The statistical evaluation of the performance of the different diagnostic tests is crucial and contributes to the evidence on new screening and triage methods for cervical cancer prevention.

1.5 The OSPADAC Project

The arsenal of methods for diagnostic meta-analysis continues to grow. However, these methods are yet to become routine procedures. This is possibly due to the complexity of the procedures and the limited capabilities of standard statistical software packages.

With that in mind, the OSPADAC project, of which this thesis is part, was initiated to bring established and new statistical procedures for meta-analysis and network meta-analysis of diagnostic accuracy studies to the end users: clinicians, researchers, students, etc. The developed statistical procedures provide tools to assess evidence on new screening and triage algorithms to detect cervical cancer precursors.

The statistical methods considered herein perform meta-analysis of diagnostic accuracy data when the considered outcome is univariate, bivariate or multivariate and when the reference standard test is considered perfect.

Part II

Optimisation of Statistical Procedures to Assess the Diagnostic Accuracy of Cervical Cancer Screening Tests

2. Meta-analysis of Proportions

This chapter has been published as: Nyaga VN, Arbyn M and Aerts M. Metaprop: a Stata command to perform meta-analysis of binomial data. Arch Public Health, 2014, 72(1):39.

ARCHIVES OF PUBLIC HEALTH
Nyaga et al. Archives of Public Health 2014, 72:39
http://www.archpublichealth.com/content/72/1/39

METHODOLOGY Open Access

Metaprop: a Stata command to perform meta-analysis of binomial data
Victoria N Nyaga1, Marc Arbyn1* and Marc Aerts2

Abstract

Background: Meta-analyses have become an essential tool in synthesizing evidence on clinical and epidemiological questions derived from a multitude of similar studies assessing the particular issue. Appropriate and accessible statistical software is needed to produce the summary statistic of interest.

Methods: Metaprop is a statistical program implemented to perform meta-analyses of proportions in Stata. It builds further on the existing Stata procedure metan which is typically used to pool effects (risk ratios, odds ratios, differences of risks or means) but which is also used to pool proportions. Metaprop implements procedures which are specific to binomial data and allows computation of exact binomial and score test-based confidence intervals. It provides appropriate methods for dealing with proportions close to or at the margins where the normal approximation procedures often break down, by use of the binomial distribution to model the within-study variability or by allowing Freeman-Tukey double arcsine transformation to stabilize the variances. Metaprop was applied on two published meta-analyses: 1) prevalence of HPV-infection in women with a Pap smear showing ASC-US; 2) cure rate after treatment for cervical precancer using cold coagulation.

Results: The first meta-analysis showed a pooled HPV-prevalence of 43% (95% CI: 38%-48%). In the second meta-analysis, the pooled percentage of cured women was 94% (95% CI: 86%-97%).

Conclusion: By using metaprop, no studies with 0% or 100% proportions were excluded from the meta-analysis. Furthermore, study specific and pooled confidence intervals always were within admissible values, contrary to the original publication, where metan was used.

Keywords: Meta-analysis, Stata, Binomial, Logistic-normal, Confidence intervals, Freeman-Tukey double arcsine transformation

Background
Meta-analyses combine information from multiple studies in order to derive an average estimate. Different meta-analysis procedures exist depending on the statistic to be reported. Examples of statistics of interest include association measures such as risk difference, risk ratio, odds ratio, difference in means, or simply one-dimensional binomial or continuous measures such as proportions or means.

There are three important aspects in meta-analysis: a) the analysis framework, b) the model and c) the choice of the method to estimate the heterogeneity parameter.

*Correspondence: [email protected]
1 Unit of Cancer Epidemiology, Scientific Institute of Public Health, Juliette Wytsmanstraat 14, 1050 Brussels, Belgium
Full list of author information is available at the end of the article

These aspects interact with each other. A meta-analyst has a choice between the fixed- and random-effects model.

In the fixed-effects model, it is assumed that the parameter of interest is identical across studies and the difference between the observed proportion and the mean is only due to sampling error. In the random-effects model, the observed difference between the proportions and the mean cannot be entirely attributed to sampling error and other factors such as differences in study population, study designs, etc. could also contribute. Each study estimates a different parameter, and the pooled estimate describes the mean of the distribution of the estimated parameters. The variance parameter describes the heterogeneity among the studies and in the case where the variance is zero, this model simply reduces to the fixed-effects model.

© 2014 Nyaga et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

There are three frameworks in modeling of binomial data. The most popular framework uses approximation to the normal distribution by use of transformations and is known as the approximate likelihood approach [1,2]. Some of the common transformations include the logit and the arcsine [3]. Reasons for the popularity of this approach include the lower level of required statistical expertise, faster computations and availability of software to carry out the analysis.

The second approach recognises the true nature of the data and is known as the exact likelihood approach. In this framework, the special relationship between the mean and the variance that characterises binomial data is captured by the binomial distribution [4]. The beta-binomial distribution [5] can be used to fit a random-effects model such that the beta distribution describes the distribution of the varying binomial parameters. While it is possible to perform computations to estimate the parameters of the binomial model, most common statistical software lacks functions to fit the beta-binomial model and therefore this approach is the least popular. The WinBUGS software, a software package for Bayesian statistics, has the capability to perform such analyses. Other software, e.g. R and SAS (PROC NLMIXED), can also be used, but extensive programming is required.

The third approach is a compromise between approximate and exact likelihood. In the first stage, the data are modeled using the binomial distribution. In the second stage, the normal distribution is used after the logit transformation to model the heterogeneity among the studies. This is an emerging approach and is often recommended by statisticians [4]. Most statistical software, including Stata (melogit), R and SAS (PROC NLMIXED), has the capability to perform such analyses.

There are three popular methods to estimate the parameters: the non-iterative method popularised by DerSimonian and Laird [6], maximum likelihood (ML) and restricted maximum likelihood (REML). For the random-effects model, the REML method is preferred because ML leads to underestimation of the variance parameter. For generalized linear mixed models [2,7,8], under which models for binomial data fall, the REML method is not used due to the intensive computation of high-dimensional integrals over the random effects, and as a result most software estimates the heterogeneity parameter using ML methods. The procedure proposed by DerSimonian and Laird is efficient for the mean but not for the heterogeneity parameter [9].

Various procedures to perform meta-analysis have been implemented in the Stata command metan [10]. In metan, the confidence intervals are calculated using the normal distribution based on the asymptotic variance. For proportions, such intervals may contain inadmissible values, especially when the statistic is near the boundary. Furthermore, computation of confidence intervals is not possible when the statistic is on the boundary, as the estimated standard error is set to zero, and as a consequence the metan command automatically excludes studies with proportion equal to 0 or 1 from the calculation of the pooled estimate.

Tests of significance on the pooled proportion typically rely on normal probabilities. Proportions ($p = r/n$) are binomial, and the normal distribution is a good approximation of the binomial distribution if $n$ is large enough and $p$ is not close to the margins [11]. When $n$ is small and/or $p$ is near the margins, the test statistic may not be approximately normally distributed due to its skewness and discreteness. To make the normal distribution assumptions more applicable to significance testing, several transformations have been suggested. Freeman and Tukey [12] presented a double arcsine transformation to stabilize the variance.

We have developed metaprop, a new program in Stata to perform meta-analyses of binomial data to supplement the metan command, which is typically used to pool associations. metaprop builds further on the metan procedure. It allows computation of 95% confidence intervals using the score statistic and the exact binomial method and incorporates the Freeman-Tukey double arcsine transformation of proportions. The program also allows the within-study variability to be modelled using the binomial distribution. This article presents a general overview of the program to serve as a starting point for users interested in performing meta-analysis of proportions in Stata software.
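The boundary problems of the normal-approximation interval described above can be illustrated with a short sketch. The Python function below is purely illustrative (it is not part of metan or metaprop, and the function name and the data are hypothetical); it computes the usual Wald interval for a single proportion.

```python
import math
from statistics import NormalDist


def wald_interval(r, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion r / n."""
    p = r / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


print(wald_interval(1, 20))   # lower limit falls below 0
print(wald_interval(0, 20))   # standard error is 0, so the interval collapses to a point
```

A study with a zero standard error cannot receive a finite inverse-variance weight, which is why such studies drop out of the pooled estimate when this approximation is used.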

Methods
A detailed description of the various statistical procedures for meta-analysis that can be performed with metan can be found elsewhere [10]. In this article, we present procedures specific to pooling of binomial data, including methods of computation of the confidence intervals, the continuity correction and the Freeman-Tukey transformation. Table 1 summarises the characteristics of the procedures presented.

Confidence intervals for the individual studies
Two types of confidence intervals for the study specific proportions have been implemented. Throughout the text, for study $i$, $r_i$ denotes the number of observations with a certain characteristic, $n_i$ is the total number of observations, $p_i = r_i / n_i$ is the observed proportion, $k$ is the total number of studies in the meta-analysis, and $1 - \alpha$ refers to the selected level of confidence.

Exact confidence intervals
The exact or Clopper-Pearson [13] confidence limits for a binomial proportion are constructed by inverting the equal-tailed test based on the binomial distribution.

Table 1 Summary of the procedures available in metaprop

| Option in metaprop | Description | Strength | Remarks |
|---|---|---|---|
| cimethod(score) | Computes the study specific confidence intervals using the score method. | Study specific intervals always yield admissible values (within the limits of 0 and 1). The coverage probability of the study specific confidence intervals is close to the nominal level. | The Wald confidence intervals for the pooled estimate could be inadmissible if study specific estimates are on or close to the margin. |
| cimethod(exact) | Computes the study specific confidence intervals using the exact method. | Study specific intervals always yield admissible values. | More conservative method and therefore study specific confidence intervals tend to be too wide. The Wald confidence intervals for the pooled estimate could be inadmissible if study specific estimates are on or close to the margin. |
| ftt | Performs the Freeman-Tukey double arcsine transformation, computes the weighted pooled estimate and performs the back-transformation on the pooled estimate. | The confidence intervals for the pooled estimate are always admissible. Test of significance based on Normal approximation more applicable than without the transformation. | The procedure could break down in case of extremely sparse data. |
| logit | Uses the Binomial distribution to model the within-study variability. | The confidence intervals for the study-specific estimate and pooled estimate are always admissible. | Requires metaprop_one, available for Stata 13 or later versions. It is an iterative procedure and therefore it requires more computational time than non-iterative procedures. |

The interval for the ith study is $[L_i, U_i]$ with $L_i$ and $U_i$ as the solutions in $p_i$ to the equations

\[
P(X_i \geq r_i) = \frac{\alpha}{2} \quad \text{and} \quad P(X_i \leq r_i) = \frac{\alpha}{2},
\]

for $X_i = 0, 1, \ldots, r_i, \ldots, n_i$.

The lower endpoint is the $\alpha/2$ quantile of a beta distribution, $\mathrm{Beta}(x_i, n_i - x_i + 1)$, and the upper endpoint is the $1 - \alpha/2$ quantile of a beta distribution, $\mathrm{Beta}(x_i + 1, n_i - x_i)$ [14], where $x_i = r_i$ is the observed number of events. Since the binomial distribution is discrete, the coverage probability of the exact intervals is not exactly $(1-\alpha)$ but at least $(1-\alpha)$, and consequently exact confidence intervals are considered conservative [15].
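The two tail-probability equations can be solved numerically. The following is a minimal sketch, not the metaprop implementation: it uses plain bisection on the binomial tail probabilities with the Python standard library only, and the function names (`binom_cdf`, `exact_ci`) are hypothetical.

```python
from math import comb


def binom_cdf(r, n, p):
    """P(X <= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(r + 1))


def _bisect(f, lo=0.0, hi=1.0, iters=100):
    """Root of a function that is increasing on [lo, hi], found by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(r, n, alpha=0.05):
    """Clopper-Pearson limits: solve P(X >= r) = alpha/2 and P(X <= r) = alpha/2."""
    # Lower limit: P(X >= r | p) increases with p; it is 0 by convention when r = 0.
    lower = 0.0 if r == 0 else _bisect(
        lambda p: (1 - binom_cdf(r - 1, n, p)) - alpha / 2)
    # Upper limit: P(X <= r | p) decreases with p; it is 1 by convention when r = n.
    upper = 1.0 if r == n else _bisect(
        lambda p: alpha / 2 - binom_cdf(r, n, p))
    return lower, upper


if __name__ == "__main__":
    print(exact_ci(42, 42))  # all events observed: upper limit is exactly 1
    print(exact_ci(0, 10))   # no events observed: lower limit is exactly 0
```

When $r_i = n_i$ the lower limit has the closed form $(\alpha/2)^{1/n_i}$, and when $r_i = 0$ the upper limit is $1 - (\alpha/2)^{1/n_i}$, which gives a quick check on the bisection.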

Score confidence intervals
The score confidence interval [16] has its coverage close to the nominal confidence level even with small sample sizes. It has been shown to perform better than the Wald and the exact confidence intervals [1,15]. The confidence limits for the ith study are computed as

\[
\frac{\hat{p}_i + \dfrac{z^2}{2n_i} \mp z\sqrt{\dfrac{\hat{p}_i(1 - \hat{p}_i) + \dfrac{z^2}{4n_i}}{n_i}}}{1 + \dfrac{z^2}{n_i}},
\]

where z is the $\alpha/2$th percentile of the standard normal distribution.
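As an illustration of the score limits, here is a short sketch using only the Python standard library. The function name `score_ci` is hypothetical, and the code uses the positive $1 - \alpha/2$ normal quantile, which yields the same pair of limits as the $\mp$ form with the $\alpha/2$ percentile.

```python
from math import sqrt
from statistics import NormalDist


def score_ci(r, n, alpha=0.05):
    """Wilson score confidence limits for r events out of n."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    p = r / n
    centre = p + z * z / (2 * n)
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (centre - half_width) / denom, (centre + half_width) / denom


if __name__ == "__main__":
    print(score_ci(22, 23))  # a proportion near 1: limits stay inside [0, 1]
    print(score_ci(42, 42))  # a proportion at 1: the upper limit is 1
```

Unlike the Wald interval, the limits are not symmetric around the observed proportion, which is what keeps them within 0 and 1.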

Confidence intervals for the pooled estimate after transformation
Freeman-Tukey double arcsine transformation
The variance-stabilizing transformation of the proportions proposed by Freeman and Tukey [12], which normalizes the outcomes before pooling, is defined as

\[
\sin^{-1}\sqrt{\frac{r_i}{n_i + 1}} + \sin^{-1}\sqrt{\frac{r_i + 1}{n_i + 1}}.
\]

The asymptotic variance of the transformed variable is $\frac{1}{n_i + 0.5}$. This transformation is intended to achieve approximate normality. The pooled estimate is then computed using the DerSimonian and Laird [6] method based on the transformed values and their variances. The confidence intervals for the pooled estimate are then computed using the Wald method.
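The pooling step can be sketched as follows. This is a generic moment-based DerSimonian and Laird estimator written for illustration, not code taken from metaprop; the function name and return order are assumptions. It takes any list of estimates `y` and within-study variances `v`, for example the transformed values and the variances 1/(n_i + 0.5).

```python
from math import sqrt


def dersimonian_laird(y, v):
    """Random-effects pooling of estimates y with within-study variances v.

    Returns (pooled estimate, its standard error, tau^2, Cochran's Q).
    """
    w = [1 / vi for vi in v]                      # fixed-effects weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi * wi for wi in w) / sw
    # Moment estimator of the between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (vi + tau2) for vi in v]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = sqrt(1 / sum(w_star))
    return pooled, se, tau2, q


if __name__ == "__main__":
    pooled, se, tau2, q = dersimonian_laird([0.0, 4.0], [1.0, 1.0])
    # Wald limits for the pooled estimate on the scale of y
    print(pooled - 1.96 * se, pooled + 1.96 * se)
```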

Inverse of Freeman-Tukey double arcsine transformation
To convert the transformed values into the 'original units' of proportions, Miller [3] proposed the following formula:

\[
p = \frac{1}{2}\left[1 - \operatorname{sign}(\cos t)\sqrt{1 - \left(\sin t + \frac{\sin t - \dfrac{1}{\sin t}}{n}\right)^{2}}\,\right],
\]

where t is the transformed value and n is the sample size. In the meta-analysis setting, t is the pooled estimate or the confidence intervals based on transformed values. In practice, the use of this formula usually involves translating the means of t's derived from binomials with different n's, as is the case in meta-analysis where most studies included have different sample sizes. In this case, Miller [3] suggested that the harmonic mean of the $n_i$'s be used in the conversion formula. For a set of numbers, the harmonic mean is the inverse of the arithmetic mean of the reciprocals of the numbers in the set.
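The transformation and Miller's back-transformation can be sketched in a few lines. This is an illustrative sketch with hypothetical function names, using only the Python standard library; the sample sizes in the usage part are picked for illustration.

```python
from math import asin, copysign, cos, sin, sqrt
from statistics import harmonic_mean


def ft_transform(r, n):
    """Freeman-Tukey double arcsine transform of r events out of n."""
    return asin(sqrt(r / (n + 1))) + asin(sqrt((r + 1) / (n + 1)))


def ft_inverse(t, n):
    """Miller's back-transformation of t to a proportion, for sample size n."""
    s = sin(t)
    inner = 1 - (s + (s - 1 / s) / n) ** 2
    # Guard against tiny negative values caused by floating-point rounding.
    return 0.5 * (1 - copysign(1.0, cos(t)) * sqrt(max(0.0, inner)))


if __name__ == "__main__":
    t = ft_transform(16, 20)
    print(ft_inverse(t, 20))          # close to 16/20 = 0.8, but not exactly equal
    # For a pooled value, Miller's suggestion is to plug in the harmonic mean.
    n_bar = harmonic_mean([20, 43, 459])
    print(n_bar, ft_inverse(t, n_bar))
```

The back-transformation is approximate for a single study, so the round trip recovers the observed proportion only to within a small error.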

The logistic-normal random-effects model
The observed events $r_i$ are assumed to have a binomial distribution with parameters $p_i$ and sample size $n_i$, i.e.

\[
r_i \sim \mathrm{binomial}(p_i, n_i).
\]

The normal distribution is then used to model the random effects:

\[
\mathrm{logit}(p_i) \sim \mathrm{normal}(\mu, \tau).
\]

Here, $\mu$ is the mean of a population of possible means, and $\tau$ is the between-study variance, both on the logit scale. The maximum likelihood (ML) procedure is used here to estimate $\tau$. The above model reduces to the fixed-effects model by assuming that $\tau = 0$. In this case, the model is written as

\[
r_i \sim \mathrm{binomial}(p, n_i).
\]
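To make the estimation target explicit, the marginal likelihood implied by the two model statements can be written out. This is a standard consequence of the model rather than a formula quoted from the source; it keeps the source's notation, in which $\tau$ denotes the between-study variance, and introduces $\theta$ as a hypothetical symbol for the study-specific logit.

```latex
\[
L(\mu, \tau) = \prod_{i=1}^{k} \int_{-\infty}^{\infty}
  \binom{n_i}{r_i}\,
  \operatorname{expit}(\theta)^{r_i}\,
  \{1 - \operatorname{expit}(\theta)\}^{n_i - r_i}\,
  \frac{1}{\sqrt{2\pi\tau}}
  \exp\!\left\{-\frac{(\theta - \mu)^2}{2\tau}\right\} d\theta,
\qquad
\operatorname{expit}(\theta) = \frac{e^{\theta}}{1 + e^{\theta}}.
\]
```

The integral has no closed form, which is why the fit is iterative. A pooled proportion on the original scale is obtained by back-transforming the estimated mean, $\operatorname{expit}(\hat{\mu})$.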

Materials
The datasets used for the illustration were part of meta-analyses conducted by Arbyn et al. [17] and Dolman et al. [18]. The datasets are available as clickable examples in the help file for metaprop.

Dataset one
Arbyn et al. [17] assessed the HPV test positivity rate in women with equivocal or low-grade cervical cytological abnormalities. HPV testing has been proposed as a method to triage women with minor cytological abnormalities identified through screening for cervical cancer using the Pap smear [19,20]. The prevalence of HPV infection reflects the burden of referral and diagnostic work-up when the test is used to triage women with these cytological conditions [17]. Two groups of minor cytological abnormalities can be distinguished: a) atypical squamous cells of undetermined significance (ASC-US) or borderline dyskaryosis and b) low-grade squamous intraepithelial lesion (LSIL) or mild dyskaryosis. The meta-analysis concluded that the large majority of women with LSIL were infected with HPV, suggesting limited utility of HPV triaging. However, for women with ASC-US, more than half tested negative and could be released from further follow-up. Figure 1 reproduces the meta-analysis including 32 studies providing data on HPV infection in case of equivocal cervical cytology (ASC-US). The pooled prevalence of HPV infection, assessed with the Hybrid Capture 2 assay, was 43% (95% CI: 39%-46%) (see Figure 1 and Table 2).

The dataset contains author and year, which identify each study, and tgroup, which corresponds to the triage group (ASCUS, LSIL, borderline dyskaryosis). num and denom indicate the number of women with a positive HPV test (HC2 assay) and the total number of tested women, such that frac $\left(\frac{num}{denom}\right)$ is the proportion with a positive HC2 test. se indicates the standard error computed as $\sqrt{\frac{frac(1 - frac)}{denom}}$. lo and up are the lower and upper confidence limits computed using the 'exact' method.

Dataset two
Dolman et al. [18] published a systematic review on the efficacy of cold coagulation to treat cervical intraepithelial neoplasia (CIN). Thirteen reports were included in the meta-analysis, which showed a high degree of heterogeneity among studies. Several studies had cure rates at or close to 100%. As seen in Figure 2, the Wald confidence intervals yield values beyond 1 for some of the individual studies and for the pooled proportion for studies conducted in Europe.

In the dataset, nb_cured and nb_treated indicate the number of women cured of CIN and the total number of women treated for CIN, such that frac $\left(\frac{nb\_cured}{nb\_treated}\right)$ is the proportion of women cured of CIN, and se is the standard error. region indicates the continent in which the study was conducted. For studies with frac = 1, se = 0, and the authors replaced it with se = $\frac{up - low}{2 \times 1.96}$, where up and low were the exact binomial confidence limits, to ensure that such studies were not excluded from the analysis.
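The problem with Wald intervals near the boundary, and the ad hoc standard error described above, can be reproduced with a few lines. This is a minimal sketch with hypothetical function names; the counts 20/22 and 42/42 are two of the per-study counts shown in Figure 2, and the exact lower limit for a study with all events observed is taken from its closed form $(\alpha/2)^{1/n}$.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96


def wald_ci(r, n):
    """Wald limits and standard error for r events out of n."""
    p = r / n
    se = sqrt(p * (1 - p) / n)
    return p - Z * se, p + Z * se, se


def adhoc_se(low, up):
    """Standard error backed out of a 95% interval, as described in the text."""
    return (up - low) / (2 * 1.96)


if __name__ == "__main__":
    print(wald_ci(20, 22))        # upper limit exceeds 1
    print(wald_ci(42, 42))        # standard error is 0, interval collapses
    low = 0.025 ** (1 / 42)       # exact lower limit when all 42 are cured
    print(adhoc_se(low, 1.0))     # non-zero replacement standard error
```

With the replacement standard error the study keeps a finite weight, but the resulting Wald interval around 1 still extends beyond 1.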

Software development
The metaprop command is an adaptation of the metan programme developed by Harris et al. [10], intended to perform fixed and random-effects meta-analysis in Stata on continuous variables or associations between continuous or binomial variables. The metaprop program and its help file are available for downloading at http://ideas.repec.org/c/boc/bocode/s457781.html. The command requires Stata 10 or later versions and can be directly installed within Stata by typing ssc install metaprop when one is connected to the internet. An update to metaprop to include the logistic-normal random-effects model is also available for download. The updated command metaprop_one requires Stata 13 and can be directly installed within Stata by typing ssc install metaprop_one when one is connected to the internet.

Results
Example one
Figure 1 Meta-analysis of the proportion of women with ASCUS or a borderline Pap smear that have a positive Hybrid Capture II test. Output generated by the Stata procedure metaprop. (Forest plot of study-specific and pooled estimates, ES (95% CI), by terminology group; x-axis: Proportion, 0 to 1. The plotted values are listed in Table 2.)

We reproduce Figure one in Arbyn et al. [17]. metaprop pools proportions and presents weighted sub-group and overall pooled estimates with inverse-variance weights obtained from a random-effects model.

. metaprop num denom, random by(tgroup) cimethod(exact) /*
*/ label(namevar=author, yearvar=year) /*
*/ xlab(.25,0.5,.75,1) xline(0, lcolor(black)) /*
*/ subti(Atypical cervical cytology, size(4)) /*
*/ xtitle(Proportion, size(2)) nowt /*
*/ olineopt(lcolor(red) lpattern(shortdash)) /*
*/ plotregion(icolor(ltbluishgray)) /*
*/ diamopt(lcolor(red)) /*
*/ pointopt(msymbol(x) msize(0)) boxopt(msymbol(S) mcolor(black))

Table 2 and Figure 1 both present the study-specific proportions with 95% exact confidence intervals for each study, the sub-group and overall pooled estimates with 95% Wald confidence intervals, and the I² statistic, which describes the percentage of total variation due to inter-study heterogeneity. The table presents additional information on the pooled proportions and includes tests of heterogeneity within the sub-groups and overall. Significant intra-group heterogeneity was observed (p < 0.001 with I² exceeding 90% for all three terminology groups). However, no inter-group heterogeneity was noted (p = 0.925), supporting the pooling of all studies into one pooled measure: 43% (95% CI: 39-46%).

Though the weights have been computed using the random-effects model, the heterogeneity statistics have been computed by re-calculating the overall pooled estimate by treating the sub-group pooled estimates as though they were fixed-effects estimates. Since all study-specific proportions are close to 0.5, metan (see Figure one in Arbyn et al. [17]) and metaprop (see Figure 1) produce similar results.

Table 2 Meta-analysis of the presence of high-risk HPV DNA in women with equivocal cervical cytology, by terminology group (ASCUS, Borderline Dyskaryosis or ASC-US)

| Study | ES | 95% CI lower | 95% CI upper |
|---|---|---|---|
| ASCUS | | | |
| Manos (1999) | 0.395 | 0.364 | 0.426 |
| Bergeron (2000) | 0.432 | 0.339 | 0.53 |
| Lytwyn (2000) | 0.404 | 0.276 | 0.542 |
| Shlay (2000) | 0.313 | 0.248 | 0.383 |
| Morin (2001) | 0.292 | 0.245 | 0.342 |
| Solomon (2001) | 0.568 | 0.547 | 0.588 |
| Kulasingam (2002) | 0.511 | 0.45 | 0.572 |
| Pretorius (2002) | 0.322 | 0.293 | 0.353 |
| Lonky (2003) | 0.46 | 0.401 | 0.521 |
| Wensveen (2003) | 0.453 | 0.371 | 0.537 |
| Rowe (2004) | 0.44 | 0.38 | 0.501 |
| Andersson (2005) | 0.442 | 0.305 | 0.587 |
| Palma (2005) | 0.699 | 0.62 | 0.769 |
| Giovannelli (2005) | 0.228 | 0.147 | 0.328 |
| Kendall (2005) | 0.341 | 0.33 | 0.352 |
| Nieh (2005) | 0.742 | 0.62 | 0.842 |
| Bergeron (2006) | 0.444 | 0.422 | 0.467 |
| Kiatpongsan (2006) | 0.389 | 0.288 | 0.497 |
| Monsonego (2006) | 0.479 | 0.359 | 0.601 |
| Ronco (2007) | 0.314 | 0.281 | 0.349 |
| Sub-total: random pooled ES | 0.431 | 0.382 | 0.480 |
| BORDERLINE DYSKARYOS | | | |
| Rebello (2001) | 0.413 | 0.301 | 0.533 |
| Zielinski (2001) | 0.347 | 0.284 | 0.415 |
| Cuzick (2003) | 0.26 | 0.185 | 0.347 |
| Guyot (2003) | 0.522 | 0.306 | 0.732 |
| Moss (2006) | 0.456 | 0.44 | 0.473 |
| Cuschieri (2007) | 0.605 | 0.532 | 0.675 |
| Sub-total: random pooled ES | 0.428 | 0.341 | 0.516 |
| ASC-US | | | |
| Bruner (2004) | 0.269 | 0.182 | 0.371 |
| Kelly (2006) | 0.725 | 0.583 | 0.841 |
| Ko (2006) | 0.401 | 0.381 | 0.421 |
| Selvaggi (2006) | 0.396 | 0.359 | 0.434 |
| Wright (2006) | 0.341 | 0.315 | 0.368 |
| You (2007) | 0.463 | 0.434 | 0.492 |
| Sub-total: random pooled ES | 0.416 | 0.360 | 0.472 |
| Overall: random pooled ES | 0.428 | 0.395 | 0.461 |

Test(s) of heterogeneity:

| Group | Heterogeneity statistic | Degrees of freedom | p-value | I²** |
|---|---|---|---|---|
| ASCUS | 614.42 | 19 | 0.000 | 96.9% |
| BORDERLINE DYSKARYOS | 53.58 | 5 | 0.000 | 90.7% |
| ASC-US | 73.92 | 5 | 0.000 | 93.2% |
| Overall | 785.77 | 31 | 0.000 | 96.1% |
| Random: test for heterogeneity between sub-groups | 0.16 | 2 | 0.925 | |

**I²: the variation in ES attributable to heterogeneity

Significance of test(s) of ES = 0:

| Group | z | p-value |
|---|---|---|
| ASCUS | 17.22 | 0.000 |
| BORDERLINE DYSKARYOS | 9.58 | 0.000 |
| ASC-US | 14.57 | 0.000 |
| Overall | 25.31 | 0.000 |

Output generated by the Stata procedure metaprop.
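The I² values reported in Table 2 can be checked from the heterogeneity statistic and its degrees of freedom. The sketch below assumes the usual definition I² = 100% × (Q − df)/Q, truncated at zero; the source does not state the formula, and the function name is hypothetical.

```python
def i_squared(q, df):
    """I-squared in percent from a heterogeneity statistic q with df degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


if __name__ == "__main__":
    table2 = {
        "ASCUS": (614.42, 19),
        "BORDERLINE DYSKARYOS": (53.58, 5),
        "ASC-US": (73.92, 5),
        "Overall": (785.77, 31),
    }
    for group, (q, df) in table2.items():
        print(f"{group}: I^2 = {i_squared(q, df):.1f}%")
```

Under that definition the four values agree with the I² column of Table 2 to one decimal place.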

Example two
We extracted the data that generated Figure two in Dolman et al. [18] (see Figure 2). Since the proportion of cured women is close to or at 1 in some studies, we enabled the Freeman-Tukey double arcsine transformation. Otherwise, studies with an estimated proportion of 1 would be excluded from the analysis, leading to a biased pooled estimate. Alternatively, using cc(#) ensures that such studies are not excluded. However, the pooled estimate is then not guaranteed to be within the [0,1] interval, which is automatic when the Freeman-Tukey double arcsine (ftt) option is enabled. We used the score confidence intervals for the individual studies.

. metaprop nb_cured nb_treated, random by(region) ftt cimethod(score) /*
*/ label(namevar = study) graphregion(color(white)) plotregion(color(white)) /*
*/ xlab(0.5,0.6,.7,0.8, 0.9, 1) /*
*/ xtick(0.5,0.6,.7,0.8, 0.9, 1) force /*
*/ xtitle(Proportion, size(2)) nowt stats /*
*/ olineopt(lcolor(black) lpattern(shortdash)) /*
*/ diamopt(lcolor(black)) /*
*/ boxopt(msymbol(S)) rcols(col) /*
*/ astext(70) texts(80) nohet notable

Figure 3 (displaying the forest plot generated by metaprop) presents the study-specific proportions with 95% score confidence intervals, the regional and overall pooled estimates with 95% Wald confidence intervals, the I² statistic, and the test of significance of the overall pooled estimates. In contrast with Figure 2 (displaying the graphical output generated with metan), all the confidence intervals have admissible values.

Example three
We extracted the data that generated Figure two in Dolman et al. [18] (see Figure 2). We fit the logistic-normal random-effects model to the data. With this model, studies with cure rates close to or at 1 pose no problem, since the exact method is used. The confidence intervals for the individual studies are also computed with the exact method. We used the updated command metaprop_one, which requires Stata 13, to fit the generalized linear mixed model (GLMM).

Figure 2 Proportion-cured estimates associated with cold coagulation treatment for CIN1 disease, by world region as analysed by metan. (Forest plot of study-specific and pooled estimates, Proportion (95% CI), with the number cured over the number treated with follow-up, grouped by region: Asia, North America, Europe; x-axis: Proportion, 0.5 to 1.2. Note on the plot: weights are from random effects analysis.)

. metaprop_one nb_cured nb_treated, random logit groupid(study) ///
label(namevar=author, yearvar=year) sortby(year author) ///
xlab(.1,.2,.3,.4,.5,.6,.7,.8,.9,1) xline(0, lcolor(black)) ///
ti(Positivity of p16 immunostaining, size(4) color(blue)) ///
subti("Cytology = HSIL", size(4) color(blue)) ///
xtitle(Proportion, size(3)) nowt nostats ///
olineopt(lcolor(red) lpattern(shortdash)) ///
diamopt(lcolor(red)) pointopt(msymbol(s) msize(2)) ///
astext(70) texts(100)

Table 3 presents the study-specific proportions with 95% exact confidence intervals and the overall pooled estimate with 95% Wald confidence intervals computed on the logit scale and back-transformed, the chi-squared statistic of the likelihood ratio (LR) test comparing the random- and fixed-effects models, the estimated between-study variance, and the test of whether the estimated proportion is equal to zero. The p-value for the LR test is 0.022, indicating the presence of significant heterogeneity. The Q-statistic from the previous command is analogous to the LR statistic. In contrast with Figure 2 (displaying the graphical output generated with metan), all the confidence intervals have admissible values. The estimated pooled mean and the corresponding 95% interval are similar to those obtained earlier (see Figure 3), computed as a weighted average after the arcsine transformation. However, the estimated between-study variance is larger (0.4907) than the DerSimonian and Laird variance estimate obtained from the previous command (0.0409), as expected [9].
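The LR p-value reported in Table 3 can be reproduced from the chi-squared statistic under one assumption that the source does not state: that the tail probability of a chi-squared distribution with one degree of freedom is halved, as is common when the null hypothesis places a variance on the boundary of its parameter space. The sketch below is a check under that assumption, with a hypothetical function name.

```python
from math import erfc, sqrt


def chi2_1_upper_tail(x):
    """P(X > x) for X ~ chi-squared with 1 degree of freedom."""
    return erfc(sqrt(x / 2))


if __name__ == "__main__":
    lr = 4.04                                  # LR statistic reported in Table 3
    p_naive = chi2_1_upper_tail(lr)            # ordinary chi-squared(1) p-value
    p_boundary = 0.5 * p_naive                 # halved: assumed boundary correction
    print(round(p_naive, 3), round(p_boundary, 3))
```

The halved value rounds to the reported 0.022, whereas the unhalved value is roughly twice that, which suggests (but does not prove) that the boundary-corrected test is what is reported.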

Figure 3 Proportion-cured estimates associated with cold coagulation treatment for CIN1 disease, by world region as analysed by metaprop. (Forest plot of study-specific and pooled estimates, ES (95% CI), with the number cured over the number treated with follow-up, grouped by region: Asia, Europe, North America; x-axis: Proportion, 0.5 to 1.)

Table 3 Meta-analysis of the proportion of women cured of CIN1 disease with cold coagulation

| Study | ES | 95% CI lower | 95% CI upper |
|---|---|---|---|
| Javaheri (1981) | 0.957 | 0.7901 | 0.9923 |
| Hussein & Galloway (1985) | 0.909 | 0.6226 | 0.9838 |
| de Cristofaro (1990) | 1.000 | 0.9162 | 1.0000 |
| Rogstad (1992) | 0.800 | 0.5840 | 0.9193 |
| Loobuyck & Duncan (1993) | 0.969 | 0.9495 | 0.9817 |
| Singh (1998) | 0.884 | 0.7552 | 0.9493 |
| Joshi (2013) | 0.909 | 0.7219 | 0.9747 |
| Random pooled ES | 0.942 | 0.8855 | 0.9715 |

LR test: RE vs FE model chi2 = 4.04 (d.f. = 1), p = 0.022.
Estimate of between-study variance Tau2 = 0.4907.
Test of ES = 0: z = 45.56, p = 0.000.
Output generated by the Stata procedure metaprop_one.

Discussion
We have presented procedures to perform meta-analysis of proportions in Stata. We adapted and made additions to the metan command to provide procedures which are specific for binomial data, where the user specifies n and N denoting the number of individuals with the characteristic of interest and the total number of individuals. With metaprop, it is possible to perform a test of heterogeneity between groups when sub-group analysis is desired and the random-effects model has been used to compute the pooled estimate. In metan, a test for intergroup comparison is only produced when the fixed-effects model is used in a subgroup meta-analysis.

When the estimated proportion is at 0 or 1, the estimate for the standard error is zero and therefore the Wald confidence intervals cannot be computed. Studies with zero standard error are often excluded since the weight assigned to such studies is infinite. Excluding such studies could lead to biased results, and users often compute the standard error in an ad hoc way. The continuity correction enabled by the cc(#) option avoids exclusion of studies with 0% or 100% prevalence. While this ensures that the studies are retained, the confidence intervals for the pooled estimate may yield inadmissible values.

Furthermore, use of Wald confidence intervals for the individual studies when the estimated proportion is close to 0 or 1 often yields inadmissible values. This is because the Wald confidence intervals are always symmetric around an estimate. In contrast to the Wald intervals, the exact or score confidence intervals can be asymmetric, especially near the extreme values. By computing the exact or score confidence intervals for the individual studies, we are guaranteed admissible values. While the exact confidence intervals are regarded as the 'gold' standard, we recommend the use of score confidence intervals because their coverage is close to the nominal level, whereas the coverage of the exact method is always at least the nominal level. By using the Freeman-Tukey double arcsine transformation, all the studies are retained; furthermore, we are guaranteed to have admissible confidence intervals for each individual study as well as for the pooled proportion. While the distribution of the Freeman-Tukey double arcsine statistic is more normal for sparse data, the procedure breaks down with extremely sparse data and should thus be used with caution [21]. Whenever possible, the use of exact methods is recommended for binomial data. As the sample size increases and when the proportions are not extreme, methods relying on transformed data and exact methods give results similar to those of approximate methods.

Conclusion
metaprop enables epidemiologists to pool proportions in Stata, avoiding problems encountered with metan. metaprop allows inclusion of studies with proportions equal to zero or 100 percent, and avoids confidence intervals exceeding the 0 to 1 range. The logistic-normal random-effects model draws users a step closer towards the use of exact methods recommended for binomial data.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
VN wrote the metaprop program in Stata, analysed the data and drafted the manuscript. MA* conceptualized and initiated the project and edited the manuscript. MA edited the manuscript. All authors reviewed and approved the final manuscript.

Acknowledgements
Financial support was received from: (1) the 7th Framework Programme of DG Research of the European Commission through the COHEAHR Network (grant No. 603019, coordinated by the Vrije Universiteit Amsterdam, the Netherlands) and the HPV-AHEAD project (FP7-HEALTH-2011-282562, coordinated by IARC, Lyon, France); (3) The Scientific Institute of Public Health (Brussels, through the OPSADAC project).

Author details
1 Unit of Cancer Epidemiology, Scientific Institute of Public Health, Juliette Wytsmanstraat 14, 1050 Brussels, Belgium. 2 Center for Statistics, Hasselt University, Agoralaan Building D, 3590 Diepenbeek, Belgium.

Received: 5 May 2014 Accepted: 11 July 2014 Published: 10 November 2014

References
1. Agresti A, Coull BA: Approximate is better than 'exact' for interval estimation of binomial proportions. Am Stat 1998, 52(2):119–126.
2. Breslow NE, Clayton DG: Approximate inference in generalized linear mixed models. J Am Stat Assoc 1993, 88:9–25.
3. Miller JJ: The inverse of the Freeman-Tukey double arcsine transformation. Am Stat 1978, 32(4):138.
4. Hamza TH, van Houwelingen HC, Stijnen T: The binomial distribution of meta-analysis was preferred to model within-study variability. J Clin Epidemiol 2008, 61:41–51.
5. Molenberghs G, Verbeke G, Iddib S, Demétrio CGB: A combined beta and normal random-effects model for repeated, over-dispersed binary and binomial data. J Multivar Anal 2012, 111:94–109.
6. DerSimonian R, Laird N: Meta-analysis in clinical trials. Control Clin Trials 1986, 7:177–188.
7. Engel E, Keen A: A simple approach for the analysis of generalized linear mixed models. Stat Neerl 1994, 48:1–22.
8. Molenberghs G, Verbeke G, Demétrio CGB, Vieira AMC: A family of generalized linear models for repeated measures with normal and conjugate random effects. Stat Sci 2010, 3:325–347.
9. Jackson D, Bowden J, Baker R: How does the DerSimonian and Laird procedure for random effects meta-analysis compare with its more efficient but harder to compute counterparts? J Stat Plan Inference 2010, 140:961–970.
10. Harris R, Bradburn M, Deeks J, Harbord R, Altman D, Sterne J: metan: fixed- and random-effects meta-analysis. Stata J 2008, 8(1):3–28.
11. Box GEP, Hunter JS, Hunter WG: Statistics for experimenters. Hoboken (NJ), USA: J Wiley & Sons Inc, Wiley Series in Probability and Statistics; 1978.
12. Freeman MF, Tukey JW: Transformations related to the angular and the square root. Ann Math Stats 1950, 21(4):607–611.
13. Clopper CJ, Pearson ES: The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 1934, 26(4):404–413.
14. Brown LD, Cai TT, DasGupta A: Interval estimation for a binomial proportion. Stat Sci 2001, 16:404–413.
15. Newcombe RG: Two-sided confidence intervals for the single proportion: comparison of seven methods. Stat Med 1998, 17:857–872.
16. Wilson EB: Probable inference, the law of succession, and statistical inference. J Am Stat Assoc 1927, 22(158):209–212.
17. Arbyn M, Martin-Hirsch P, Buntinx F, Ranst MV, Paraskevaidis E, Dillner J: Triage of women with equivocal or low-grade cervical cytology results: a meta-analysis of the HPV test positivity rate. J Cell Mol Med 2009, 13(4):648–659.
18. Dolman L, Sauvaget C, Muwonge R, Sankaranarayanan R: Meta-analysis of the efficacy of cold coagulation as a treatment method for cervical intra-epithelial neoplasia: a systematic review. BJOG 2014, 121:929–942.
19. Arbyn M, Ronco G, Anttila A, Meijer CJLM, Poljak M, Ogilvie G, Koliopoulos G, Naucler P, Sankaranarayanan R, Petok J: Evidence regarding human papillomavirus testing in secondary prevention of cervical cancer. Vaccine 2012, 30(Suppl 5):F88–F99.
20. Arbyn M, Roelens J, Simoens C, Buntinx F, Paraskevaidis E, Martin-Hirsch PP, Prendiville WJ: Human papillomavirus testing versus repeat cytology for triage of minor cytological cervical lesions. Cochrane Database Syst Rev 2013, 3(CD008054):1–201.
21. Westfall PH, Young SS: Resampling-based multiple testing: examples and methods for P-value adjustment. Hoboken (NJ), USA: John Wiley & Sons; 1993.

doi:10.1186/2049-3258-72-39
Cite this article as: Nyaga et al.: Metaprop: a Stata command to perform meta-analysis of binomial data. Archives of Public Health 2014 72:39.


3. Copula Based Bivariate Beta-Binomial Models for Diagnostic Test Accuracy Studies in a Bayesian Framework

This chapter has been submitted for publication as:
Nyaga VN, Arbyn M and Aerts M. CopulaDTA: Copula Based Bivariate Beta-Binomial Models for Diagnostic Test Accuracy Studies in a Bayesian Framework. J Stat Softw, 2015, Conditionally accepted for publication.


CopulaDTA: An R Package for Copula Based Bivariate Beta-Binomial Models for Diagnostic Test Accuracy Studies in a Bayesian Framework

Victoria N Nyaga (Sci. Inst. of Public Health, Hasselt University)
Marc Arbyn (Sci. Inst. of Public Health)
Marc Aerts (Hasselt University)

Abstract

The current statistical procedures implemented in statistical software packages forpooling of diagnostic test accuracy data include hSROC regression (Rutter and Gatso-nis 2001) and the bivariate random-effects meta-analysis model (BRMA) (Reitsma et al.(2005), Arends et al. (2008), Chu and Cole (2006), Riley et al. (2007b)). However, thesemodels do not report the overall mean but rather the mean for a central study withrandom-effect equal to zero and have difficulties estimating the correlation between sen-sitivity and specificity when the number of studies in the meta-analysis is small and/orwhen the between-study variance is relatively large (Riley et al. 2007a).

This tutorial on advanced statistical methods for meta-analysis of diagnostic accuracystudies discusses and demonstrates Bayesian modeling using CopulaDTA (Nyaga 2016)package in R (R Core Team 2016) to fit different models to obtain the meta-analytic pa-rameter estimates. The focus is on the joint modelling of sensitivity and specificity usingcopula based bivariate beta distribution. Essentially, we extend the work of Nikoloulopou-los (2015a) by: i) presenting the Bayesian approach which offers flexibility and ability toperform complex statistical modelling even with small data sets and ii) including covariateinformation, and iii) providing an easy to use code. The statistical methods are illustratedby re-analysing data of two published meta-analyses.

Modelling sensitivity and specificity using the bivariate beta distribution provides marginal as well as study-specific parameter estimates, as opposed to using the bivariate normal distribution (e.g., in BRMA), which only yields study-specific parameter estimates. Moreover, copula based models offer greater flexibility in modelling different correlation structures, in contrast to the normal distribution, which allows for only one correlation structure.

Keywords: diagnostic test accuracy, meta-analysis, Bayesian, random-effects, copula, R.


1. Introduction

In a systematic review of diagnostic test accuracy, the statistical analysis section aims at estimating the average (across studies) sensitivity and specificity of a test and the variability thereof, among other measures. There tends to be a negative correlation between sensitivity and specificity, which calls for correlated data models. The analysis is statistically challenging because the user i) deals with two summary statistics, ii) has to account for correlation between sensitivity and specificity, iii) has to account for heterogeneity in sensitivity and specificity across the studies and iv) should be allowed to incorporate covariates.

Currently, the HSROC regression (Rutter and Gatsonis 2001) or the bivariate random-effects meta-analysis model (BRMA) (Reitsma et al. (2005), Arends et al. (2008), Chu and Cole (2006)) is recommended for pooling of diagnostic test accuracy data. These models fit a bivariate normal distribution, which allows for only one correlation structure, to the logit transformed sensitivity and specificity. The resulting distribution has no closed form and therefore the mean sensitivity and specificity are only estimated after numerical integration or other approximation methods.

When the number of studies in the meta-analysis is small and/or when the data are sparse (very low counts or even zero cells), maximum likelihood estimation of hierarchical models such as the BRMA and HSROC model encounters computational difficulties (non-convergence) or may give no or unreliable estimates for the between-study correlation (Takwoingi et al. 2015). When the correlation is close to the boundary of its parameter space, the between-study variance estimates from the BRMA are upwardly biased as they compensate for the range restriction on the correlation parameter (Riley et al. 2007a). According to Riley et al. (2007b) this occurs because the maximum likelihood estimator truncates the between-study covariance matrix on the boundary of its parameter space, and this often occurs when the within-study variation is relatively large or the number of studies is small.

For sensitivity and specificity, and sample proportions in general, the mean and variance both depend on the underlying probability. Therefore, any factor affecting the probability will change the mean and the variance. This implies that models where the predictors affect the mean but assume a constant variance will generally not be adequate. Both the BRMA and HSROC assume that the transformed sensitivity and specificity are approximately normal with constant variance.

Joint modelling of study-specific sensitivity and specificity using known or copula based bivariate beta distributions overcomes the above mentioned difficulties. Since both sensitivity and specificity take values in the interval (0, 1), it is a more natural choice to use a beta distribution to describe their distribution across studies, without the need for any transformation. The beta distribution is conjugate to the binomial distribution and therefore it is easy to integrate out the random-effects analytically, giving rise to the beta-binomial marginal distributions. Moreover, no further integration is needed to obtain the meta-analytically pooled sensitivity and specificity. Previously, Cong et al. (2007) fitted separate beta-binomial models to the number of true positives and the number of false positives. While the model ignores correlation between sensitivity and specificity, Cong et al. (2007) reported that the model estimates are comparable to those from the SROC model (Moses et al. 1993), the predecessor of the HSROC model.

According to Riley (2009), ignoring the correlation would have negligible influence on the meta-analysis results when the within-study variability is large relative to the between-study variability. It is generally known that full parametric specification of a (hierarchical) model, including the specification of an (existing) correlation structure, increases the efficiency of the estimation of the parameters, resulting in smaller standard errors. The use of copula based mixed models within the frequentist framework for meta-analysis of diagnostic test accuracy was recently introduced by Nikoloulopoulos (2015a), who evaluated the joint density numerically.

This tutorial presents and demonstrates hierarchical mixed models for meta-analysis of diagnostic accuracy studies. In the first level of the hierarchy, given sensitivity and specificity for each study, two binomial distributions are used to describe the variation in the number of true positives and true negatives among the diseased and healthy individuals, respectively. In the second level, we model the unobserved sensitivities and specificities using a bivariate distribution. While hierarchical models are used, the focus of meta-analysis is on the pooled average across studies and rarely on a given study estimate.

The methods are demonstrated using datasets from two previously published meta-analyses: a) on diagnostic accuracy of telomerase in urine as a tumour marker for the diagnosis of primary bladder cancer from Glas et al. (2003), previously used by Riley et al. (2007b) and Nikoloulopoulos (2015a) because it is a problematic dataset that has no covariates and has convergence issues caused by the correlation parameter being estimated to be -1, and b) on the comparison of the sensitivity and specificity of human papillomavirus testing (using the HC2 assay) versus repeat cytology to triage women with minor cytological cervical lesions to detect underlying cervical precancer from Arbyn et al. (2013). The second dataset is used to demonstrate meta-regression with one covariate, which can be naturally extended to include several covariates.

The layout of this tutorial is as follows: Section 2 introduces the concept of copula theory and different bivariate distributions for sensitivity and specificity. The software implementation and model selection in a Bayesian framework are discussed in Section 3. The two aforementioned datasets are introduced in Section 4. Application of the software, code examples and the results of the models fitted to the data are presented in Sections 5 and 6. The complete code is available alongside this article at the Journal of Statistical Software website. A brief discussion is found in Section 7 and a conclusion in Section 8.

2. Statistical methods for meta-analysis

2.1. Definition of copula function

A bivariate copula function describes the dependence structure between two random variables. Two random variables $X_1$ and $X_2$ are joined by a copula function $C$ if their joint cumulative distribution function can be written as

$$F(x_1, x_2) = C(F_1(x_1), F_2(x_2)), \quad -\infty \le x_1, x_2 \le +\infty, \tag{1}$$

where $F_1$ and $F_2$ denote the univariate cumulative distribution functions of $X_1$ and $X_2$ respectively.

According to the theorem of Sklar (1959), there exists for every bivariate (multivariate in extension) distribution a copula representation $C$ which is unique for continuous random variables. If the joint cumulative distribution function and the two marginals are known, then the copula function can be written as

$$C(u, v) = F(F_1^{-1}(u), F_2^{-1}(v)), \quad 0 \le u, v \le 1. \tag{2}$$

A 2-dimensional copula is in fact simply a 2-dimensional cumulative distribution function restricted to the unit square with standard uniform marginals. A comprehensive overview of copulas and their mathematical properties can be found in Nelsen (2006). To obtain the joint probability density, the joint cumulative distribution in Equation 1 is differentiated to yield

$$f(x_1, x_2) = f_1(x_1)\, f_2(x_2)\, c(F_1(x_1), F_2(x_2)), \tag{3}$$

where $f_1$ and $f_2$ denote the marginal density functions and $c$ the copula density function corresponding to the copula cumulative distribution function $C$. Therefore, from Equation 3, a bivariate probability density can be expressed using the marginal and the copula density, given that the copula function is absolutely continuous and twice differentiable.

When the functional forms of the marginal and the joint densities are known, the copula density can be derived as follows

$$c(F_1(x_1), F_2(x_2)) = \frac{f(x_1, x_2)}{f_1(x_1)\, f_2(x_2)}. \tag{4}$$

While our interest does not lie in finding the copula function, Equations 3 and 4 serve to show how one can move from the copula function to the bivariate density or vice-versa, given that the marginal densities are known. The decompositions allow for the construction of other and possibly better models for the variables than would be possible if we limited ourselves to only existing standard bivariate distributions.
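The decomposition in Equation 3 can be sketched in a few lines of base R. The example below assembles a bivariate density from two beta marginals and the Farlie-Gumbel-Morgenstern copula density (defined later in this section) and checks numerically that the result integrates to one. The function names and all parameter values are illustrative assumptions and are not part of the CopulaDTA package.

```r
# Copula density of the Farlie-Gumbel-Morgenstern family (theta in [-1, 1]).
fgm_density <- function(u, v, theta) {
  1 + theta * (2 * u - 1) * (2 * v - 1)
}

# Equation 3: joint density = marginal densities x copula density evaluated
# at the marginal distribution functions. Beta marginals are assumed here.
joint_density <- function(x1, x2, a1, b1, a2, b2, theta) {
  dbeta(x1, a1, b1) * dbeta(x2, a2, b2) *
    fgm_density(pbeta(x1, a1, b1), pbeta(x2, a2, b2), theta)
}

# Numerical check: integrate over x1 for each x2, then over x2.
inner <- function(x2) {
  sapply(x2, function(b) {
    integrate(function(a) joint_density(a, b, 8, 2, 9, 3, theta = -0.8),
              lower = 0, upper = 1)$value
  })
}
total_mass <- integrate(inner, lower = 0, upper = 1)$value
print(total_mass)
```

Swapping `fgm_density` for another copula density changes the dependence structure while leaving the two marginals untouched, which is the practical appeal of the construction.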

We finish this section by mentioning an important implication when Sklar’s theorem is extended to a meta-regression setting with covariates. According to Patton (2006), it is important that the conditioning variable remains the same for both marginal distributions and the copula, as otherwise the joint distribution might not be properly defined. This implies that covariate information should be introduced in both the marginals and the association parameters of the model.

2.2. The hierarchical model

Since there are two sources of heterogeneity in the data, the within- and between-study variability, the parameters involved in a meta-analysis of diagnostic accuracy studies vary at two levels. For each study $i$, $i = 1, \ldots, n$, let $Y_i = (Y_{i1}, Y_{i2})$ denote the true positives and true negatives, $N_i = (N_{i1}, N_{i2})$ the diseased and healthy individuals respectively, and $\pi_i = (\pi_{i1}, \pi_{i2})$ represent the ‘unobserved’ sensitivity and specificity respectively.

Given study-specific sensitivity and specificity, two separate binomial distributions describe the distribution of true positives and true negatives among the diseased and the healthy individuals as follows

$$Y_{ij} \mid \pi_{ij}, x_i \sim \text{bin}(\pi_{ij}, N_{ij}), \quad i = 1, \ldots, n, \quad j = 1, 2, \tag{5}$$

where $x_i$ generically denotes one or more covariates, possibly affecting $\pi_{ij}$. Equation 5 forms the higher level of the hierarchy and models the within-study variability. The second level of the hierarchy aims to model the between-study variability of sensitivity and specificity while accounting for the inherent negative correlation thereof, with a bivariate distribution as follows

$$\begin{pmatrix} g(\pi_{i1}) \\ g(\pi_{i2}) \end{pmatrix} \sim f(g(\pi_{i1}), g(\pi_{i2})) = f(g(\pi_{i1}))\, f(g(\pi_{i2}))\, c(F_1(g(\pi_{i1})), F_2(g(\pi_{i2}))), \tag{6}$$

where $g(.)$ denotes a transformation that is used to map the (0, 1) range to the whole real line. While it is critical to ensure that the studies included in the meta-analysis satisfy the specified entry criterion, there are study-specific characteristics like different test thresholds and other unobserved differences that give rise to the second source of variability, the between-study variability. It is indeed the difference in the test thresholds between the studies that gives rise to the correlation between sensitivity and specificity. Including study level covariates allows us to model part of the between-study variability. The covariate information can and should (Patton 2006) be used to model the mean as well as the correlation between sensitivity and specificity.
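To make the two levels concrete, the following base R sketch simulates data from such a hierarchy. It assumes a Gaussian copula with beta marginals (identity link) at the second level and binomial counts at the first level. The correlation, the beta shape parameters, the study sizes and all object names are illustrative assumptions.

```r
set.seed(1)
n_studies <- 10
rho <- -0.7                                   # assumed copula correlation
n_dis <- rep(50, n_studies)                   # diseased per study (assumed)
n_nondis <- rep(80, n_studies)                # healthy per study (assumed)

# Second level: correlated uniforms obtained from a Gaussian copula.
z1 <- rnorm(n_studies)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n_studies)
u <- pnorm(z1)
v <- pnorm(z2)

# Beta marginals for the unobserved sensitivity and specificity.
pi_se <- qbeta(u, shape1 = 15, shape2 = 5)
pi_sp <- qbeta(v, shape1 = 18, shape2 = 3)

# First level (Equation 5): binomial counts given the study-specific values.
sim <- data.frame(
  ID = seq_len(n_studies),
  Dis = n_dis,
  TP = rbinom(n_studies, size = n_dis, prob = pi_se),
  NonDis = n_nondis,
  TN = rbinom(n_studies, size = n_nondis, prob = pi_sp)
)
print(sim)
```

The simulated data frame uses the same column layout as the telomerase dataset introduced in Section 4, so it could serve as a test input for the models discussed later.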

In the next section we give more details on different bivariate distributions $f(g(\pi_{i1}), g(\pi_{i2}))$ constructed using the logit or identity link function $g(.)$, different marginal densities and/or different copula densities $c$. We discuss their implications and demonstrate their application in meta-analysis of diagnostic accuracy studies. An overview of suitable parametric families of copula for mixed models for diagnostic test accuracy studies was recently given by Nikoloulopoulos (2015a). In the following section, a short description of well-known copulas implemented in the package is given. Here, we consider five copula functions which can be plugged in Equation 3 to model negative correlation.

Bivariate Gaussian copula

Given the density and the distribution function of the univariate and bivariate standard normal distribution with correlation parameter $\rho \in (-1, 1)$, the bivariate Gaussian copula function and density are expressed (Meyer 2013) as

$$\begin{aligned} C(u, v, \rho) &= \Phi_2(\Phi^{-1}(u), \Phi^{-1}(v), \rho), \\ c(u, v, \rho) &= \frac{1}{\sqrt{1 - \rho^2}} \exp\left(\frac{2 \rho\, \Phi^{-1}(u)\, \Phi^{-1}(v) - \rho^2 \left(\Phi^{-1}(u)^2 + \Phi^{-1}(v)^2\right)}{2 (1 - \rho^2)}\right). \end{aligned} \tag{7}$$

The logit transformation is often used in binary logistic regression to relate the probability of “success” (coded as 1, failure as 0) of the binary response variable with the linear predictor model that theoretically can take values over the whole real line. In diagnostic test accuracy studies, the ‘unobserved’ sensitivities and specificities can range from 0 to 1 whereas their logits, $\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right)$, can take any real value, which allows the use of the normal distribution as follows

$$\text{logit}(\pi_{ij}) \sim N(\mu_j, \sigma_j) \iff \text{logit}(\pi_{ij}) = \mu_j + \varepsilon_{ij}, \tag{8}$$

where $\mu = (\mu_1, \mu_2)$ is a vector of the mean sensitivity and specificity (on the logit scale) for a study with zero random effects, and $\varepsilon_i = (\varepsilon_{i1}, \varepsilon_{i2})$ is a vector of random effects associated with study $i$. Now $u$ is the normal distribution function of $\text{logit}(\pi_{i1})$ with parameters $\mu_1$ and $\sigma_1$, $v$ is the normal distribution function of $\text{logit}(\pi_{i2})$ with parameters $\mu_2$ and $\sigma_2$, $\Phi_2$ is the distribution function of a bivariate standard normal distribution with correlation parameter $\rho \in (-1, 1)$ and $\Phi^{-1}$ is the quantile function of the standard normal distribution. In terms of $\rho$, Kendall’s tau is expressed as $\left(\frac{2}{\pi}\right) \arcsin(\rho)$.
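As a hedged check of Equation 7, the base R sketch below codes the Gaussian copula density and compares it with the ratio in Equation 4, that is, the standard bivariate normal density divided by the product of the two standard normal marginal densities. It also converts between $\rho$ and Kendall's tau. The function names and the evaluation point are illustrative assumptions.

```r
# Gaussian copula density as written in Equation 7.
gauss_copula_density <- function(u, v, rho) {
  x <- qnorm(u)
  y <- qnorm(v)
  exp((2 * rho * x * y - rho^2 * (x^2 + y^2)) / (2 * (1 - rho^2))) /
    sqrt(1 - rho^2)
}

# Standard bivariate normal density with correlation rho.
dbinorm <- function(x, y, rho) {
  exp(-(x^2 - 2 * rho * x * y + y^2) / (2 * (1 - rho^2))) /
    (2 * pi * sqrt(1 - rho^2))
}

u <- 0.3
v <- 0.8
rho <- -0.6

# Equation 4: copula density = joint density / product of marginal densities.
from_eq7 <- gauss_copula_density(u, v, rho)
from_eq4 <- dbinorm(qnorm(u), qnorm(v), rho) / (dnorm(qnorm(u)) * dnorm(qnorm(v)))

# Kendall's tau implied by rho, and the inverse mapping back to rho.
tau <- (2 / pi) * asin(rho)
rho_back <- sin(pi * tau / 2)
print(c(from_eq7 = from_eq7, from_eq4 = from_eq4, tau = tau))
```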


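With simple algebra the copula density in Equation 7 with normal marginal distributions simplifies to

$$c(u, v, \rho) = \frac{1}{\sqrt{1 - \rho^2}} \exp\left(\frac{1}{2 (1 - \rho^2)} \left(\frac{2 \rho (x - \mu_1)(y - \mu_2)}{\sigma_1 \sigma_2} - \rho^2 \left(\frac{(x - \mu_1)^2}{\sigma_1^2} + \frac{(y - \mu_2)^2}{\sigma_2^2}\right)\right)\right). \tag{9}$$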

The product of the copula density in Equation 9 and the normal marginals of $\text{logit}(\pi_{i1})$ and $\text{logit}(\pi_{i2})$ in Equation 8 forms a bivariate normal distribution which characterises the model by Reitsma et al. (2005), Arends et al. (2008), Chu and Cole (2006), and Riley et al. (2007a), the so-called bivariate random-effects meta-analysis (BRMA) model, recommended as the appropriate method for meta-analysis of diagnostic accuracy studies. Study level covariate information explaining heterogeneity is introduced through the parameters of the marginal and the copula as follows

$$\mu_j = X_j B_j^{\top}. \tag{10}$$

$X_j$ is an $n \times p$ matrix containing the covariate values for the mean sensitivity ($j = 1$) and specificity ($j = 2$). For simplicity, assume that $X_1 = X_2 = X$. $B_j^{\top}$ is a $p \times 1$ vector of regression parameters, and $p$ is the number of parameters. By inverting the logit functions in Equation 8, we obtain

$$\pi_{ij} = \text{logit}^{-1}(\mu_j + \varepsilon_{ij}). \tag{11}$$

Therefore, the meta-analytic sensitivity and specificity, obtained by averaging over the random study effect, are given by, for $j = 1, 2$,

$$E(\pi_j) = E(\text{logit}^{-1}(\mu_j + \varepsilon_{ij})) = \int_{-\infty}^{\infty} \text{logit}^{-1}(\mu_j + \varepsilon_{ij})\, f(\varepsilon_{ij}, \sigma_j)\, d\varepsilon_{ij}, \tag{12}$$

assuming that $\sigma_1^2 > 0$ and $\sigma_2^2 > 0$. The integral in Equation 12 has no analytical expression and therefore needs to be numerically approximated, and the standard errors are not easily available. Using MCMC simulation in the Bayesian framework, the meta-analytic estimates $E(\pi_j)$, as well as a standard error estimate and a credible interval, can be computed with minimum effort by generating predictions from the fitted bivariate normal distribution.

In the frequentist framework, it is more convenient however to use numerical averaging by sampling a large number $M$ of random-effects $\varepsilon_{ij}$ from the fitted distribution and to estimate the meta-analytic sensitivity and specificity by (Molenberghs and Verbeke 2005), for $j = 1, 2$,

$$E(\pi_j) = \frac{1}{M} \sum_{i=1}^{M} \text{logit}^{-1}(\mu_j + \varepsilon_{ij}). \tag{13}$$

However, inference is not straightforward in the frequentist framework since the standard errors are not available. When $\varepsilon_{ij} = 0$, then

$$E(\pi_j \mid \varepsilon_{ij} = 0) = \text{logit}^{-1}(\mu_j). \tag{14}$$

Inference for $E(\pi_j \mid \varepsilon_{ij} = 0)$, as expressed in Equation 14, can be done in both the Bayesian and the frequentist framework. The equation represents the mean sensitivity and specificity for a “central” study with $\varepsilon_{ij} = 0$. Researchers often seem to confuse $E(\pi_j \mid \varepsilon_{ij} = 0)$ with $E(\pi_j)$ but, due to the non-linear logit transformation, they are clearly not the same parameter.
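The difference between the marginal mean in Equations 12 and 13 and the "central study" value in Equation 14 can be illustrated numerically. The base R sketch below uses an assumed logit-scale mean and between-study standard deviation (both chosen only for illustration) and compares Monte Carlo averaging, one-dimensional numerical integration and the back-transformed mean.

```r
set.seed(42)
mu <- 2         # assumed mean on the logit scale
sigma <- 1.5    # assumed between-study standard deviation
M <- 1e5        # number of simulated random effects

# Equation 13: average the back-transformed values over simulated effects.
eps <- rnorm(M, mean = 0, sd = sigma)
marginal_mc <- mean(plogis(mu + eps))

# Equation 12: the same expectation by numerical integration.
marginal_int <- integrate(
  function(e) plogis(mu + e) * dnorm(e, mean = 0, sd = sigma),
  lower = -Inf, upper = Inf
)$value

# Equation 14: value for a "central" study with a zero random effect.
central <- plogis(mu)

print(round(c(marginal_mc = marginal_mc, marginal_int = marginal_int,
              central = central), 3))
```

With a positive logit-scale mean, the marginal mean lies closer to 0.5 than the central-study value, and the gap widens as the between-study standard deviation grows.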


With the identity link function, no transformation on study-specific sensitivity and specificity is performed. A natural choice for $u$ and $v$ would be beta distribution functions with parameters $(\alpha_1, \beta_1)$ and $(\alpha_2, \beta_2)$ respectively. Since $\pi_{ij} \sim \text{beta}(\alpha_j, \beta_j)$, the meta-analytic sensitivity and specificity are analytically solved as follows

$$E(\pi_j) = \frac{\alpha_j}{\alpha_j + \beta_j}. \tag{15}$$

After reparameterising the beta distributions using the mean ($\mu_j = \frac{\alpha_j}{\alpha_j + \beta_j}$) and certainty ($\psi_j = \alpha_j + \beta_j$) or dispersion ($\varphi_j = \frac{1}{1 + \alpha_j + \beta_j}$) parameters, different link functions introduce covariate information to the mean, certainty/dispersion and association ($\rho$) parameters. A typical model parameterisation is

$$\begin{aligned} \mu_j &= \text{logit}^{-1}(X B_j^{\top}), \\ \psi_j &= g(W C_j^{\top}), \\ \alpha_j &= \mu_j \circ \psi_j, \\ \beta_j &= (1 - \mu_j) \circ \psi_j, \\ \rho &= \tanh(Z D_j^{\top}) = \frac{\exp(2 \times Z D_j^{\top}) - 1}{\exp(2 \times Z D_j^{\top}) + 1}. \end{aligned} \tag{16}$$

$X$, $W$ and $Z$ are $n \times p$ matrices containing the covariate values for the mean, dispersion and correlation; we assume that they contain the same information and denote them all by $X$ for simplicity. $p$ is the number of parameters, and $B_j^{\top}$, $C_j^{\top}$ and $D_j^{\top}$ are $p \times 1$ vectors of regression parameters relating covariates to the mean, variance and correlation respectively. $g(.)$ is the log link mapping $X C_j^{\top}$ to the positive real number line and $\circ$ is the Hadamard product.
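The reparameterisation can be traced step by step in base R. The sketch below uses a hypothetical design matrix with an intercept and one binary covariate; the coefficient values and object names are assumptions chosen only to illustrate how the links in Equation 16 lead back to the pooled mean in Equation 15.

```r
# Hypothetical design matrix: intercept plus one binary covariate.
X <- cbind(1, c(0, 0, 1, 1))
B <- c(1.0, 0.8)     # assumed coefficients for the mean (logit scale)
C <- c(2.5, -0.5)    # assumed coefficients for the certainty (log scale)

mu <- plogis(drop(X %*% B))    # mean through the inverse logit link
psi <- exp(drop(X %*% C))      # certainty through the log link

# Element-wise (Hadamard) products give the beta shape parameters.
alpha <- mu * psi
beta <- (1 - mu) * psi

# Equation 15 recovers the mean; the dispersion follows from alpha + beta.
pooled <- alpha / (alpha + beta)
phi <- 1 / (1 + alpha + beta)
print(cbind(mu, psi, alpha, beta, pooled, phi))
```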

Frank copula

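This flexible copula in the so-called family of Archimedean copulas was introduced by Frank (1979). The functional form of the copula and the density which is plugged in Equation 3 is given by

$$\begin{aligned} C(F(\pi_{i1}), F(\pi_{i2}), \theta) &= -\frac{1}{\theta} \log\left[1 + \frac{(e^{-\theta F(\pi_{i1})} - 1)(e^{-\theta F(\pi_{i2})} - 1)}{e^{-\theta} - 1}\right], \\ c(F(\pi_{i1}), F(\pi_{i2}), \theta) &= \frac{\theta\, (1 - e^{-\theta})\, e^{-\theta (F(\pi_{i1}) + F(\pi_{i2}))}}{\left[1 - e^{-\theta} - (1 - e^{-\theta F(\pi_{i1})})(1 - e^{-\theta F(\pi_{i2})})\right]^2}. \end{aligned} \tag{17}$$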

Since $\theta \in \mathbb{R}$, both positive and negative correlation can be modelled, making this one of the more comprehensive copulas. When $\theta$ is 0, sensitivity and specificity are independent. For $\theta > 0$, sensitivity and specificity exhibit positive quadrant dependence, and negative quadrant dependence when $\theta < 0$. The Spearman correlation $\rho_s$ and Kendall’s tau $\tau_k$ can be expressed in terms of $\theta$ as

$$\begin{aligned} \rho_s &= 1 - 12\, \frac{D_2(-\theta) - D_1(-\theta)}{\theta}, \\ \tau_k &= 1 + 4\, \frac{D_1(\theta) - 1}{\theta}, \end{aligned} \tag{18}$$


where $D_j(\delta)$ is the Debye function defined as

$$D_j(\delta) = \frac{j}{\delta^j} \int_{0}^{\delta} \frac{t^j}{\exp(t) - 1}\, dt, \quad j = 1, 2. \tag{19}$$

Covariate information is introduced in a similar manner as in Equation 16. The identity link is used for the association parameter $\theta$.
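Kendall's tau for the Frank copula can be evaluated numerically with base R. The sketch below assumes the standard Debye function with lower integration limit 0 and rewrites the integral with the substitution $t = \delta s$, so that a single formula covers both positive and negative $\delta$. The function names are illustrative assumptions.

```r
# Debye function of order j. With t = delta * s the definition becomes
#   D_j(delta) = j * integral_0^1 s^(j - 1) * (delta * s) / (exp(delta * s) - 1) ds,
# which is valid for positive and negative delta.
debye <- function(delta, j = 1) {
  integrand <- function(s) j * s^(j - 1) * (delta * s) / expm1(delta * s)
  integrate(integrand, lower = 0, upper = 1)$value
}

# Kendall's tau for the Frank copula (second line of Equation 18).
frank_tau <- function(theta) {
  1 + 4 * (debye(theta, 1) - 1) / theta
}

tau_pos <- frank_tau(5)
tau_neg <- frank_tau(-5)
print(c(tau_pos = tau_pos, tau_neg = tau_neg))
```

Negative values of $\theta$ give negative values of Kendall's tau of the same magnitude as the corresponding positive $\theta$, which is the property exploited when modelling the negative association between sensitivity and specificity.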

Farlie-Gumbel-Morgenstern copula (FGM)

This popular copula studied by Farlie (1960), Gumbel (1960) and Morgenstern (1956) is defined as

$$\begin{aligned} C(F(\pi_{i1}), F(\pi_{i2}), \theta) &= F(\pi_{i1})\, F(\pi_{i2})\, [1 + \theta\, (1 - F(\pi_{i1}))(1 - F(\pi_{i2}))], \\ c(F(\pi_{i1}), F(\pi_{i2}), \theta) &= 1 + \theta\, (2 F(\pi_{i1}) - 1)(2 F(\pi_{i2}) - 1). \end{aligned} \tag{20}$$

The Spearman correlation and Kendall’s tau are expressed in terms of $\theta$ as $\theta/3$ and $2\theta/9$ respectively. Because $\theta \in (-1, 1)$, it follows that $|\rho_s| \le 1/3$, making this copula only appropriate for data with weak dependence. In a similar manner as in Equation 16, the logit link, log/identity link and Fisher’s z transformation can be used to introduce covariate information in modelling the mean, dispersion and association parameter.

Clayton copula

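The Clayton copula function and density by Clayton (1978) are defined as

$$\begin{aligned} C(F(\pi_{i1}), F(\pi_{i2}), \theta) &= [F(\pi_{i1})^{-\theta} + F(\pi_{i2})^{-\theta} - 1]^{-\frac{1}{\theta}}, \\ c(F(\pi_{i1}), F(\pi_{i2}), \theta) &= (1 + \theta)\, F(\pi_{i1})^{-(1 + \theta)}\, F(\pi_{i2})^{-(1 + \theta)}\, [F(\pi_{i1})^{-\theta} + F(\pi_{i2})^{-\theta} - 1]^{-\frac{2\theta + 1}{\theta}}. \end{aligned} \tag{21}$$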

Since $\theta \in (0, \infty)$, the Clayton copula typically models positive dependence; Kendall’s tau equals $\theta/(\theta + 2)$. However, the copula function can be rotated by 90° or 270° to model negative dependence. The distribution and density functions following such rotations are given by

$$\begin{aligned} C_{90}(F(\pi_{i1}), F(\pi_{i2}), \theta) &= F(\pi_{i2}) - C(1 - F(\pi_{i1}), F(\pi_{i2}), \theta), \\ c_{90}(F(\pi_{i1}), F(\pi_{i2}), \theta) &= (1 + \theta)\, (1 - F(\pi_{i1}))^{-(1 + \theta)}\, F(\pi_{i2})^{-(1 + \theta)}\, [(1 - F(\pi_{i1}))^{-\theta} + F(\pi_{i2})^{-\theta} - 1]^{-\frac{2\theta + 1}{\theta}}, \end{aligned} \tag{22}$$

and

$$\begin{aligned} C_{270}(F(\pi_{i1}), F(\pi_{i2}), \theta) &= F(\pi_{i1}) - C(F(\pi_{i1}), 1 - F(\pi_{i2}), \theta), \\ c_{270}(F(\pi_{i1}), F(\pi_{i2}), \theta) &= (1 + \theta)\, F(\pi_{i1})^{-(1 + \theta)}\, (1 - F(\pi_{i2}))^{-(1 + \theta)}\, [F(\pi_{i1})^{-\theta} + (1 - F(\pi_{i2}))^{-\theta} - 1]^{-\frac{2\theta + 1}{\theta}}. \end{aligned} \tag{23}$$

The logit, log/identity and log/identity links can be used to introduce covariate information in modelling the mean ($\mu_j$), certainty ($\psi_j$)/dispersion ($\varphi_j$) and association ($\theta$) parameters respectively, in the same way as in Equation 16.
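The effect of rotating the Clayton copula can be seen in a small simulation. The base R sketch below draws from the Clayton copula with the conditional-inverse method (a standard sampling technique, used here as an assumption since the text does not describe sampling), reflects the first coordinate to obtain the 90° rotation and compares the empirical Kendall's tau with $\theta/(\theta + 2)$. The value of $\theta$, the sample size and the function names are illustrative.

```r
set.seed(7)
theta <- 2
n <- 2000

# Sample (u, v) from the Clayton copula by inverting the conditional
# distribution of v given u at an independent uniform w.
u <- runif(n)
w <- runif(n)
v <- (u^(-theta) * (w^(-theta / (1 + theta)) - 1) + 1)^(-1 / theta)

# Empirical Kendall's tau before and after the 90-degree rotation.
tau_original <- cor(u, v, method = "kendall")
tau_rotated <- cor(1 - u, v, method = "kendall")

# Clayton density (Equation 21) and its 90-degree rotation (Equation 22).
clayton_density <- function(u, v, theta) {
  (1 + theta) * u^(-(1 + theta)) * v^(-(1 + theta)) *
    (u^(-theta) + v^(-theta) - 1)^(-(2 * theta + 1) / theta)
}
c90_density <- function(u, v, theta) clayton_density(1 - u, v, theta)

print(c(theory = theta / (theta + 2), original = tau_original,
        rotated = tau_rotated))
```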


Of course other copula functions that allow for negative association can be chosen. It is also an option to use known bivariate beta distributions. However, it is not always straightforward and analytically attractive to derive the corresponding copula function for all bivariate distributions. The use of existing bivariate beta distributions in meta-analysis of diagnostic accuracy studies has been limited because these densities model positive association (e.g., Libby and Novick (1982), Olkin and Liu (2003)), or both positive and negative association but over a restricted range (e.g., Sarmanov (1966)).

3. Software development and model diagnostics

3.1. The CopulaDTA package

The CopulaDTA package is an R package for modelling diagnostic test accuracy data using copula based bivariate beta-binomial distributions and providing estimates for the marginal mean sensitivity and specificity. It is an extension of rstan (Stan 2016), the R interface to Stan (Carpenter et al. 2016), for diagnostic test accuracy data. Stan is a probabilistic programming language which implements Hamiltonian Monte Carlo (HMC) and uses the No-U-Turn sampler (NUTS) (Hoffman and Gelman 2014). The package facilitates easy application of complex models and their visualization within the Bayesian framework.

JAGS (Plummer 2003) is an alternative extensible general purpose sampling engine to Stan. Extending JAGS requires knowledge of C++ to assemble a dynamic link library (DLL) module. From experience, configuring and building the module is a daunting and tedious task, especially in the Windows operating system. These shortcomings, coupled with the fact that Stan tends to converge with fewer iterations than JAGS even from bad initial values, made us prefer the Stan MCMC sampling engine.

The CopulaDTA package is available via the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/package=CopulaDTA. With a working internet connection, the CopulaDTA package is installed and loaded in R with the following commands

R> install.packages("CopulaDTA", dependencies = TRUE)

R> library("CopulaDTA")

The CopulaDTA package provides functions to fit bivariate beta-binomial distributions constructed as a product of two beta marginal distributions and the copula densities discussed in Section 2. The package also provides forest plots for a model with categorical covariates or with intercept only. Given the chosen copula function, a beta-binomial distribution is assembled by the cdtamodel function, which returns a cdtamodel object. The main function fit takes the cdtamodel object, fits the model to the given dataset and returns a cdtafit object for which print, summary and plot methods are provided.

3.2. Model diagnostics

To assess model convergence, mixing and stationarity of the chains, it is necessary to check the potential scale reduction factor $\hat{R}$, effective sample size (ESS), MCMC error and trace plots of the parameters. When all the chains reach the target posterior distribution, the estimated posterior variance is expected to be close to the within-chain variance such that the ratio of the two, $\hat{R}$, is close to 1, indicating that the chains are stable, properly mixed and likely to have reached the target distribution. A large $\hat{R}$ indicates poor mixing and that more iterations are needed. Effective sample size indicates how much information one actually has about a certain parameter. When the samples are autocorrelated, less information from the posterior distribution of our parameters is expected than would be if the samples were independent. ESS close to the total post-warm-up iterations is an indication of less autocorrelation and good mixing of the chains. Simulations with higher ESS have lower standard errors and more stable estimates. Since the posterior distribution is simulated, there is a chance that the approximation is off by some amount; this is the Monte Carlo (MCMC) error. MCMC error close to 0 indicates that one is likely to have reached the target distribution.
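The variance ratio behind the potential scale reduction factor can be sketched in base R. The function below implements the classic Gelman-Rubin form for a matrix with one column per chain; it is an illustrative assumption and not necessarily the exact variant reported by Stan, which applies further refinements. The simulated chains are artificial.

```r
# Classic potential scale reduction factor for an iterations x chains matrix.
rhat <- function(draws) {
  n <- nrow(draws)
  W <- mean(apply(draws, 2, var))        # within-chain variance
  B <- n * var(colMeans(draws))          # between-chain variance
  var_plus <- (n - 1) / n * W + B / n    # pooled estimate of posterior variance
  sqrt(var_plus / W)
}

set.seed(3)
# Three chains of 900 draws that all sample the same target distribution.
mixed <- matrix(rnorm(900 * 3), ncol = 3)
# The same chains shifted to different locations, mimicking poor mixing.
stuck <- sweep(mixed, 2, c(0, 1, 2), "+")

print(c(mixed = rhat(mixed), stuck = rhat(stuck)))
```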

3.3. Model comparison and selection

The Watanabe-Akaike Information Criterion (WAIC) (Watanabe 2010), a recent model comparison tool to measure the predictive accuracy of the fitted models in the Bayesian framework, will be used to compare the models. WAIC can be viewed as an improvement of the Deviance Information Criterion (DIC) which, though popular, is known to have some problems (Plummer 2008). WAIC is a fully Bayesian tool, closely approximates Bayesian cross-validation, is invariant to reparameterisation and can be used for simple as well as hierarchical and mixture models.
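As a hedged illustration of how WAIC is commonly computed from posterior draws, the base R sketch below takes a matrix of pointwise log-likelihood values (draws in rows, observations in columns) and returns the log point-wise predictive density, the effective number of parameters and WAIC on the deviance scale. The toy data and the function name are assumptions; the package computes these quantities internally. The last line checks the arithmetic against the rounded values printed for the Gaussian copula model in Section 5 (LPPD of -37.99 and 7.11 effective parameters).

```r
# WAIC from an S x n matrix of pointwise log-likelihoods.
waic <- function(loglik) {
  lppd <- sum(log(colMeans(exp(loglik))))    # log point-wise predictive density
  p_waic <- sum(apply(loglik, 2, var))       # effective number of parameters
  c(lppd = lppd, p_waic = p_waic, waic = -2 * (lppd - p_waic))
}

# Toy example: normal data with "posterior draws" of the mean simulated directly.
set.seed(11)
y <- rnorm(20, mean = 1)
mu_draws <- rnorm(500, mean = mean(y), sd = 1 / sqrt(length(y)))
loglik <- sapply(y, function(yi) dnorm(yi, mean = mu_draws, sd = 1, log = TRUE))
res <- waic(loglik)
print(res)

# Arithmetic check with the rounded values reported in Section 5.
waic_section5 <- -2 * (-37.99 - 7.11)
print(waic_section5)
```

Lower WAIC values indicate better expected predictive accuracy, so competing copula models fitted to the same data can be ranked on this scale.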

4. Datasets

4.1. Telomerase data

Glas et al. (2003) systematically reviewed the sensitivity and specificity of cytology and other markers including telomerase for primary diagnosis of bladder cancer. They fitted a bivariate normal distribution to the logit transformed sensitivity and specificity values across the studies, allowing for heterogeneity between the studies. From the included 10 studies, they reported that telomerase had a sensitivity and specificity of 0.75 [0.71, 0.79] and 0.86 [0.71, 0.94] respectively. They concluded that telomerase was not sensitive enough to be recommended for daily use. This dataset is available within the package and the following commands

R> data("telomerase")

R> telomerase

load the data into the R environment and generate the following output

   ID Dis TP NonDis  TN
1   1  33 25     26  25
2   2  21 17     14  11
3   3 104 88     47  31
4   4  26 16     83  80
5   5  57 40    138 137
6   6  47 38     30  24
7   7  42 23     12  12
8   8  33 27     20  18
9   9  17 14     32  29
10 10  44 37     29   7

ID is the study identifier, Dis is the number of diseased, TP is the number of true positives, NonDis is the number of healthy and TN is the number of true negatives.
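The observed study-level sensitivity and specificity follow directly from these columns. The base R sketch below re-enters the printed values by hand so that it runs without the package (with CopulaDTA loaded, `data("telomerase")` provides the same data frame); the added column names `Sens` and `Spec` are illustrative.

```r
telomerase <- data.frame(
  ID = 1:10,
  Dis = c(33, 21, 104, 26, 57, 47, 42, 33, 17, 44),
  TP = c(25, 17, 88, 16, 40, 38, 23, 27, 14, 37),
  NonDis = c(26, 14, 47, 83, 138, 30, 12, 20, 32, 29),
  TN = c(25, 11, 31, 80, 137, 24, 12, 18, 29, 7)
)

# Observed proportions per study: true positives among the diseased and
# true negatives among the healthy.
telomerase$Sens <- telomerase$TP / telomerase$Dis
telomerase$Spec <- telomerase$TN / telomerase$NonDis
print(round(telomerase[, c("ID", "Sens", "Spec")], 2))
```

Study 7 has an observed specificity of exactly 1 (12 of 12), the kind of boundary value that makes logit-based models awkward and that the beta-binomial formulation handles without a continuity correction.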

4.2. ASCUS triage data

Arbyn et al. (2012) and Arbyn et al. (2013) performed a meta-analysis and Cochrane review on the accuracy of human papillomavirus testing and repeat cytology to triage women with an equivocal Pap smear to diagnose cervical precancer. They fitted the BRMA model in SAS using METADAS on 10 studies where both tests were used. They reported absolute sensitivity of 0.91 [0.86, 0.94] and 0.72 [0.63, 0.79] for HC2 and repeat cytology respectively. The specificity was 0.61 [0.54, 0.68] and 0.68 [0.60, 0.76] for HC2 and repeat cytology respectively. These data are used to demonstrate how the intercept only model is extended in a meta-regression setting. This dataset is also available within the package and the following commands

R> data("ascus")

R> ascus

load the data into the R environment and generate the following output

   Test         StudyID  TP   FP  TN FN
1  RepC  Andersson 2005   6   14  28  4
2  RepC   Bergeron 2000   8   28  71  4
3  RepC Del Mistro 2010  20  191 483  7
4  RepC Kulasingam 2002  20   74 170  6
5  RepC     Lytwyn 2000   4   20  26  2
6  RepC      Manos 1999  48  324 570 15
7  RepC  Monsonego 2008  10   18 168 15
8  RepC      Morin 2001  14  126 214  5
9  RepC  Silverloo 2009  24   43 105 10
10 RepC    Solomon 2001 227 1132 914 40
11  HC2  Andersson 2005   6   17  25  4
12  HC2   Bergeron 2000  10   38  61  2
13  HC2 Del Mistro 2010  27  154 566  2
14  HC2 Kulasingam 2002  23  115 129  3
15  HC2     Lytwyn 2000   4   19  33  1
16  HC2      Manos 1999  58  326 582  7
17  HC2  Monsonego 2008  22  110  72  2
18  HC2      Morin 2001  17   88 253  2
19  HC2  Silverloo 2009  34   65  81  2
20  HC2    Solomon 2001 256 1050 984 11

Test is an explanatory variable showing the type of triage test, StudyID is the study identifier, TP is the number of true positives, FP is the number of false positives, TN is the number of true negatives and FN is the number of false negatives.
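The ASCUS data report the full two-by-two table per study, whereas the telomerase data report totals and correct classifications. The base R sketch below shows how the two layouts relate, using four rows copied from the output above; the object name `ascus_head` and the derived column names are illustrative assumptions.

```r
# Four rows copied from the printed ascus data (two studies, both tests).
ascus_head <- data.frame(
  Test = c("RepC", "RepC", "HC2", "HC2"),
  StudyID = c("Andersson 2005", "Bergeron 2000",
              "Andersson 2005", "Bergeron 2000"),
  TP = c(6, 8, 6, 10),
  FP = c(14, 28, 17, 38),
  TN = c(28, 71, 25, 61),
  FN = c(4, 4, 4, 2)
)

# Totals in the layout of the telomerase data.
ascus_head$Dis <- ascus_head$TP + ascus_head$FN       # diseased
ascus_head$NonDis <- ascus_head$TN + ascus_head$FP    # healthy

# Observed sensitivity and specificity per test and study.
ascus_head$Sens <- ascus_head$TP / ascus_head$Dis
ascus_head$Spec <- ascus_head$TN / ascus_head$NonDis
print(ascus_head)
```

In these rows both tests have the same number of diseased and healthy women within each study, consistent with both tests having been used in every included study.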


5. The intercept only model

The CopulaDTA package has five different correlation structures that result in five different bivariate beta-binomial distributions to fit to the data. The correlation structure is specified by indicating copula = "gauss" or "fgm" or "c90" or "c270" or "frank" in the cdtamodel function. The Gaussian copula bivariate beta-binomial distribution is fitted to the telomerase data with the following code

R> gauss.1 <- cdtamodel("gauss")

R> fitgauss.1 <- fit(gauss.1, data = telomerase, SID = "ID", iter = 28000,

+ warmup = 1000, thin = 30, seed = 3)

By default, chains = 3 and cores = 3, and these need not be specified unless other values are wanted. From the code above, 28000 samples are drawn from each of the 3 chains, the first 1000 samples are discarded and thereafter every 30th draw is kept, such that each chain has 900 post-warm-up draws, making a total of 2700 post-warm-up draws. The seed value, seed = 3, initialises the random number generator to allow reproducibility of the results and cores = 3 allows for parallel-processing of the chains by using 3 cores, one core for each chain. There were no initial values specified and in that case, the program randomly generates values satisfying the parameter constraints. The trace plots in the top-left panel of Figure 1, produced with the code below, show satisfactory mixing of the chains and convergence.

R> traceplot(fitgauss.1)
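The arithmetic behind the number of retained draws can be checked with a few lines of base R. This is a minimal sketch that assumes (iter - warmup) is an exact multiple of thin, as it is here; the helper post_warmup_draws is a hypothetical name and not part of CopulaDTA or rstan.

```r
# Hypothetical helper: retained draws per chain and in total,
# assuming (iter - warmup) is an exact multiple of thin.
post_warmup_draws <- function(iter, warmup, thin, chains) {
  per_chain <- (iter - warmup) / thin
  c(per_chain = per_chain, total = per_chain * chains)
}

# Settings used for the Gaussian copula model above.
copula_draws <- post_warmup_draws(iter = 28000, warmup = 1000,
                                  thin = 30, chains = 3)
print(copula_draws)
```

Heavy thinning, as used here, keeps the stored sample small at the cost of discarding draws; the effective sample size reported in the model summary indicates how much independent information remains.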

Next, obtain the model summary estimates as follows

R> print(fitgauss.1, digits = 2)

Posterior marginal mean sensitivity and specificity

with 95% credible intervals

Parameter             Mean  Lower  Median  Upper  n_eff  Rhat

MUse[1] Sensitivity   0.76   0.68    0.76   0.82    672     1

MUsp[1] Specificity   0.80   0.64    0.81   0.91    235     1

ktau[1] Correlation  -0.84  -0.99   -0.91  -0.36    490     1

Model characteristics

Copula function: gauss, sampling algorithm: NUTS(diag_e)

Formula(1): MUse ~ 1

Formula(2): MUsp ~ 1

Formula(3): Omega ~ 1

3 chain(s)each with iter=28000; warm-up=1000; thin=30.

post-warmup draws per chain=900;total post-warmup draws=2700.

Predictive accuracy of the model

Log point-wise predictive density (LPPD): -37.99

Effective number of parameters: 7.11

Watanabe-Akaike information Criterion (WAIC): 90.20


[Figure 1 (image not reproduced): six trace-plot panels, each showing MUse[1] and MUsp[1] (MU[1] and MU[2] in the last panel) for chains 1, 2 and 3.]

Figure 1: Trace plots of the posterior mean sensitivity and specificity for the telomerase data as estimated by the Gaussian, Clayton 90 (C90) and 270 (C270), Farlie-Gumbel-Morgenstern (FGM) and Frank copula based bivariate beta and bivariate normal (BRMA) distributions.

From the output above, n_eff and Rhat both confirm proper mixing of the chains with little autocorrelation. The meta-analytic sensitivity MUse[1] and specificity MUsp[1] are 0.76 [0.68, 0.82] and 0.80 [0.64, 0.91] respectively. The Kendall’s tau correlation between sensitivity and specificity is estimated to be -0.84 [-0.99, -0.36].

The command below produces a forest plot in Figure 2.

R> plot(fitgauss.1, graph = 3, title.3 = "" )

As observed in Figure 2, the posterior study-specific sensitivity and specificity are less extreme and less variable than the ‘observed’ study-specific sensitivity and specificity. In other words, there is ‘shrinkage’ towards the overall mean sensitivity and specificity as studies borrow strength from each other in the following manner: the posterior study-specific estimates depend on the global estimate and thus also on all the other studies.
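The shrinkage mechanism can be illustrated with a deliberately simplified sketch. It assumes a fixed Beta(a, b) prior for the study-specific sensitivity, so that the posterior mean has a closed form; in the fitted model the prior parameters are themselves estimated and sensitivity and specificity are linked through a copula. The study counts and the prior size a + b = 20 are hypothetical; only the prior mean 0.76 is taken from the output above.

```r
# Hypothetical study counts (not from the telomerase data).
tp <- c(9, 25, 48)    # true positives
n  <- c(10, 40, 50)   # diseased subjects per study

# Assumed fixed Beta(a, b) prior: mean 0.76 (pooled estimate reported above),
# prior size a + b = 20 (hypothetical value chosen for illustration).
prior_mean <- 0.76
prior_size <- 20
a <- prior_mean * prior_size
b <- (1 - prior_mean) * prior_size

observed  <- tp / n
posterior <- (tp + a) / (n + a + b)  # conjugate beta-binomial posterior mean
weight    <- n / (n + prior_size)    # weight given to the study's own data

print(round(cbind(n, observed, posterior, weight), 3))
```

The posterior mean is a weighted average of the observed proportion and the overall mean, so small studies are pulled more strongly towards the overall mean than large ones.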

The mean sensitivity and specificity as estimated by the other four copula based bivariate beta distributions are in Table 1 and graphically shown in Figure 3. Though not presented here, the full code of the other four fitted copula based bivariate beta distributions is in the replication code. Figure 1 shows satisfactory chain mixing with little autocorrelation apart from the ‘Clayton270’ model. The Clayton copula is known to be unstable when the correlation parameter is close to the boundaries (-1 or 0) and this could be the reason why sampling from the posterior distribution was difficult.

[Figure 2 (image not reproduced): forest plot titled "Plot of study-specific sensitivity and specificity by ID: marginal mean and 95% CI", with panels Sensitivity and Specificity (x-axis from 0.00 to 1.00) and rows for studies 1 to 10 and Overall.]

Figure 2: Plot of the study-specific sensitivity and specificity (magenta points) and their corresponding 95% exact confidence intervals (thick grey lines), superimposed with the posterior estimates (blue stars) and their corresponding 95% credible intervals (thin black lines). Posterior estimates are from the Gaussian copula based bivariate beta distribution for the telomerase data.

For comparison purposes, the currently recommended model, the BRMA, which uses normal marginals, is also fitted to the data though it is not part of the CopulaDTA package. The model is first expressed in the Stan modelling language in the code below and is stored within the R environment as a character string named BRMA1.

R> BRMA1 <- "
data {
    int<lower = 0> Ns;
    int<lower = 0> tp[Ns];
    int<lower = 0> dis[Ns];
    int<lower = 0> tn[Ns];
    int<lower = 0> nondis[Ns];
}
parameters {
    real etarho;
    vector[2] mul;
    vector<lower = 0>[2] sigma;
    vector[2] logitp[Ns];
    vector[2] logitphat[Ns];
}
transformed parameters {
    vector[Ns] p[2];
    vector[Ns] phat[2];
    real MU[2];
    vector[2] mu;
    real rho;
    real ktau;
    matrix[2, 2] Sigma;

    rho = tanh(etarho);
    ktau = (2 / pi()) * asin(rho);

    for (a in 1:2) {
        for (b in 1:Ns) {
            p[a][b] = inv_logit(logitp[b][a]);
            phat[a][b] = inv_logit(logitphat[b][a]);
        }
        mu[a] = inv_logit(mul[a]);
    }

    MU[1] = mean(phat[1]);
    MU[2] = mean(phat[2]);

    Sigma[1, 1] = sigma[1]^2;
    Sigma[1, 2] = sigma[1] * sigma[2] * rho;
    Sigma[2, 1] = sigma[1] * sigma[2] * rho;
    Sigma[2, 2] = sigma[2]^2;
}
model {
    etarho ~ normal(0, 10);
    mul ~ normal(0, 10);
    sigma ~ cauchy(0, 2.5);

    for (i in 1:Ns) {
        logitp[i] ~ multi_normal(mul, Sigma);
        logitphat[i] ~ multi_normal(mul, Sigma);
    }

    tp ~ binomial(dis, p[1]);
    tn ~ binomial(nondis, p[2]);
}
generated quantities {
    vector[Ns * 2] loglik;

    for (i in 1:Ns)
        loglik[i] = binomial_lpmf(tp[i] | dis[i], p[1][i]);

    for (i in (Ns + 1):(2 * Ns))
        loglik[i] = binomial_lpmf(tn[i-Ns] | nondis[i-Ns], p[2][i-Ns]);
}
"

Next, prepare the data by creating a list as follows

R> datalist = list(tp = telomerase$TP, dis = telomerase$TP + telomerase$FN,

+ tn = telomerase$TN, nondis = telomerase$TN + telomerase$FP,

+ Ns = 10)

In the data block, the dimensions and names of the variables in the dataset are specified; here Ns indicates the number of studies in the dataset. The parameters block introduces the unknown parameters to be estimated. These are: etarho, a scalar representing the Fisher-transformed form of the association parameter ρ; mul, a 2 × 1 vector representing the mean of sensitivity and specificity on the logit scale for a central study where the random effect is zero; sigma, a 2 × 1 vector representing the between-study standard deviation of sensitivity and specificity on the logit scale; logitp, an Ns × 2 array of study-specific sensitivity in the first column and specificity in the second column on the logit scale; and logitphat, an Ns × 2 array of predicted sensitivity in the first column and predicted specificity in the second column on the logit scale.

The parameters are further transformed in the transformed parameters block. Here, p is a 2 × Ns array of sensitivity in the first row and specificity in the second row after inverse logit transformation of logitp, and phat is a 2 × Ns array of predicted sensitivity in the first row and predicted specificity in the second row after inverse logit transformation of logitphat, to be used in computing the meta-analytic sensitivity and specificity. mu is a 2 × 1 vector representing the mean sensitivity and specificity for a study with a random effect equal to 0, MU is a 2 × 1 vector containing the meta-analytic sensitivity and specificity, Sigma is a 2 × 2 matrix representing the variance-covariance matrix of sensitivity and specificity on the logit scale, and rho and ktau are scalars representing the Pearson’s and Kendall’s tau correlation respectively. The prior distributions for all the parameters and the data likelihood are defined in the model block. Finally, in the generated quantities block, loglik is a (2Ns) × 1 vector of the log likelihood needed to compute the WAIC.
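As a sketch of how a loglik vector of this kind is typically turned into a WAIC, the function below applies the usual definitions to a matrix with one row per posterior draw and one column per observation: the log point-wise predictive density is the sum over observations of the log of the mean likelihood, and the effective number of parameters is the sum over observations of the posterior variance of the log likelihood. The function name waic_from_loglik and the simulated matrix are hypothetical; this is not the code used inside CopulaDTA.

```r
# Hypothetical helper: WAIC from a draws-by-observations log-likelihood matrix.
waic_from_loglik <- function(loglik) {
  # Log point-wise predictive density (a log-sum-exp would be more stable
  # for very negative values; plain exp() is kept for readability).
  lppd <- sum(log(colMeans(exp(loglik))))
  # Effective number of parameters: posterior variance of each column.
  p_waic <- sum(apply(loglik, 2, var))
  c(lppd = lppd, p_waic = p_waic, waic = -2 * (lppd - p_waic))
}

# Simulated stand-in for posterior draws of loglik: 1200 draws, 20 observations.
set.seed(3)
fake_loglik <- matrix(rnorm(1200 * 20, mean = -2, sd = 0.3),
                      nrow = 1200, ncol = 20)
print(round(waic_from_loglik(fake_loglik), 2))
```

With a fitted rstan object, a matrix of this shape would typically be obtained with rstan::extract(fit, pars = "loglik")$loglik, where fit stands for the fitted model object.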

Next, call the function stan from the rstan package to translate the code into C++, compile the code and draw samples from the posterior distribution as follows

R> brma.1 <- stan(model_code = BRMA1, data = datalist, chains = 3,

+ iter = 5000, warmup = 1000, thin = 10, seed = 3, cores = 3)

The parameter estimates are extracted and the chain convergence and autocorrelation examined further with the following code

R> print(brma.1, pars = c('MU', 'mu', 'rho'), digits = 2, prob = c(0.025, 0.975))


The above lines of code print the following output

Inference for Stan model: 61572683b29d52354783115614fab729.

3 chains, each with iter=5000; warmup=1000; thin=10;

post-warmup draws per chain=400, total post-warmup draws=1200.

mean se_mean sd 2.5% 50% 97.5% n_eff Rhat

MU[1] 0.75 0.00 0.05 0.63 0.76 0.84 796 1.00

MU[2] 0.79 0.00 0.11 0.53 0.81 0.95 1045 1.00

mu[1] 0.77 0.00 0.04 0.69 0.77 0.84 891 1.00

mu[2] 0.89 0.00 0.08 0.69 0.91 0.98 789 1.00

rho -0.93 0.01 0.14 -1.00 -0.98 -0.56 372 1.01

Samples were drawn using NUTS(diag_e) at Wed Aug 03 09:48:21 2016.

For each parameter, n_eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor on split chains (at

convergence, Rhat=1).

The meta-analytic sensitivity (MU[1]) and specificity (MU[2]) and their 95% credible intervals are 0.75 [0.63, 0.84] and 0.79 [0.53, 0.95] respectively. This differs from what the authors published (0.75 [0.71, 0.79] and 0.86 [0.71, 0.94]) for two reasons. First, the authors fitted the standard bivariate normal distribution to the logit-transformed sensitivity and specificity values across the studies, allowing for heterogeneity between the studies as expressed in Equation 6, and disregarded the higher level of the hierarchical model expressed in Equation 5. Because of this, the authors had to use a continuity correction of 0.5 since the seventh study had an ‘observed’ specificity equal to 1, a problem not encountered in the hierarchical model. Secondly, the authors do not report the meta-analytic values but rather the mean sensitivity (mu[1]) and specificity (mu[2]) for a particular, hypothetical study with random effect equal to zero, which in our case are 0.77 [0.69, 0.84] and 0.89 [0.69, 0.98] respectively and are comparable to what the authors reported. This discrepancy between MU and mu will increase as the between-study variability increases.
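The gap between mu and MU can be illustrated numerically. The sketch below assumes a single logit-normal random effect with mean mul and standard deviation sigma, and computes the marginal mean on the probability scale by numerical integration; mu is simply the inverse logit of mul. The function name marginal_mean and the values of sigma are hypothetical; the value 0.89 is the mu[2] reported above.

```r
# Hypothetical helper: marginal mean of inv_logit(mul + sigma * Z), Z ~ N(0, 1).
marginal_mean <- function(mul, sigma) {
  integrate(function(z) plogis(mul + sigma * z) * dnorm(z),
            lower = -Inf, upper = Inf, rel.tol = 1e-8)$value
}

mul    <- qlogis(0.89)     # logit of the 'central study' specificity
sigmas <- c(0.5, 1, 2)     # hypothetical between-study SDs on the logit scale
MU     <- sapply(sigmas, function(s) marginal_mean(mul, s))

print(round(data.frame(sigma = sigmas, mu = plogis(mul), MU = MU), 3))
```

Because the inverse logit is non-linear, averaging over the random effect pulls the marginal mean away from mu and towards 0.5, and increasingly so as sigma grows.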

5.1. Model comparison

Table 1 shows that the correlation estimates from the BRMA model and the Gaussian copula bivariate beta are similar to each other and more extreme than the estimates from the Frank, 90- and 270-Clayton copulas. On the other extreme is the estimate from the FGM copula bivariate beta; this is due to the constraint on the association parameter in the FGM copula, where the Kendall’s tau is bounded in absolute value by 2/9.

In Figure 3, the marginal mean sensitivity and specificity from the five bivariate beta distributions are comparable, with subtle differences in the 95 percent credible intervals despite differences in the correlation structure.
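The bound on the FGM correlation follows from the standard relation between the FGM association parameter theta, which lies in [-1, 1], and Kendall's tau, namely tau = 2 * theta / 9. A minimal sketch, with the hypothetical function name fgm_ktau:

```r
# Kendall's tau of the FGM copula as a function of its parameter theta.
fgm_ktau <- function(theta) {
  stopifnot(all(abs(theta) <= 1))   # theta is restricted to [-1, 1]
  2 * theta / 9
}

theta <- c(-1, -0.5, 0, 0.5, 1)
print(round(cbind(theta, ktau = fgm_ktau(theta)), 2))
```

The extreme values, rounded to two decimals, are the limits -0.22 and 0.22 that appear in the FGM rows of the correlation estimates.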

Glas et al. (2003) and Riley et al. (2007a) estimated the Pearson’s correlation parameter ρ in the BRMA model as -1 within the frequentist framework. Using maximum likelihood estimation, Riley et al. (2007b) showed that the between-study correlation from the BRMA is often estimated as +/-1. Here, without estimation difficulties, the estimated Pearson’s correlation was -0.93 [-1.00, -0.56]. This is because Bayesian methods are less influenced by sample size and are therefore able to handle small sample sizes with fewer issues.

Model     Parameter     Mean   Lower  Median  Upper    n_eff  Rhat   WAIC

Gaussian  Sensitivity   0.76    0.68    0.76   0.82   671.72  1.00  90.20
          Specificity   0.80    0.64    0.81   0.91   234.69  1.01
          Correlation  -0.84   -0.99   -0.91  -0.36   490.24  1.01

C90       Sensitivity   0.75    0.69    0.76   0.82    49.46  1.05  93.20
          Specificity   0.81    0.64    0.81   0.92    67.67  1.03
          Correlation  -0.63   -0.97   -0.79   0.00    41.75  1.06

C270      Sensitivity   0.76    0.70    0.76   0.82   100.91  1.02  89.58
          Specificity   0.78    0.65    0.79   0.90    20.24  1.14
          Correlation  -0.51   -0.98   -0.62   0.00     9.80  1.45

FGM       Sensitivity   0.76    0.69    0.76   0.81  2656.52  1.00  94.99
          Specificity   0.81    0.65    0.81   0.91  2678.45  1.00
          Correlation  -0.19   -0.22   -0.22   0.22   931.09  1.00

Frank     Sensitivity   0.76    0.69    0.76   0.82  2510.07  1.00  89.94
          Specificity   0.81    0.66    0.82   0.91  2535.39  1.00
          Correlation  -0.70   -0.85   -0.69   1.00  2700.00   NaN

BRMA      Sensitivity   0.75    0.63    0.76   0.84   796.04  1.00  86.76
          Specificity   0.79    0.53    0.81   0.95  1044.90  1.00
          Correlation  -0.82   -0.98   -0.88  -0.38   238.27  1.02

Table 1: The posterior mean, median, 95% credible interval, effective sample size (n_eff) and potential scale reduction factor (Rhat) for the marginal means and Kendall’s tau parameters as estimated by the Gaussian, Clayton 90 (C90) and 270 (C270), Farlie-Gumbel-Morgenstern (FGM) and Frank copula based bivariate beta and bivariate normal (BRMA) distributions for the telomerase data.

Essentially, all six models are equivalent in the first level of the hierarchy and differ in the prior distributions specified for the ‘study-specific’ sensitivity and specificity. As such, the models should have the same number of parameters, in which case it makes sense to compare the log predictive densities. Upon inspection, the log predictive densities from the six models are practically equivalent (min=-38.48, max=-37.40) but the effective number of parameters differed a bit (min=6.00, max=9.01). Surprisingly, the last column of Table 1 indicates that the BRMA fits the data best based on the WAIC.
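The comparison can be reproduced from the reported numbers alone. The sketch below ranks the six WAIC values listed in Table 1 and checks that the Gaussian copula model's WAIC equals -2 times the difference between its reported log point-wise predictive density (-37.99) and effective number of parameters (7.11). The object names are introduced here for illustration.

```r
# WAIC values as reported in Table 1 for the telomerase data.
waic <- c(Gaussian = 90.20, C90 = 93.20, C270 = 89.58,
          FGM = 94.99, Frank = 89.94, BRMA = 86.76)

# Rank the models and express each WAIC relative to the smallest one.
print(sort(waic))
print(round(waic - min(waic), 2))

# Gaussian copula model: WAIC = -2 * (LPPD - effective number of parameters).
gauss_waic <- -2 * (-37.99 - 7.11)
print(gauss_waic)
```

Since the log predictive densities are nearly identical across models, the ranking is driven largely by the effective number of parameters.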

6. Meta-regression

The ascus dataset has Test as a covariate. The covariate is used as it is of interest to study its effect on the joint distribution of sensitivity and specificity (including the correlation). The following code fits the FGM copula based bivariate beta-binomial distribution to the data

R> fgm.2 <- cdtamodel(copula = "fgm",

+ modelargs = list(formula.se = StudyID ~ Test + 0))

R> fitfgm.2 <- fit(fgm.2, data = ascus, SID = "StudyID", iter = 19000,

+ warmup = 1000, thin = 20, seed = 3)
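The right-hand side Test + 0 of the formula uses standard R formula syntax: dropping the intercept yields one indicator column per level of Test, so that each test gets its own mean rather than one test being expressed as a contrast against the other. The sketch below shows only this generic R behaviour on a hypothetical toy data frame; how CopulaDTA processes the formula internally, including the StudyID on the left-hand side, is not shown here and should be checked in the package documentation.

```r
# Hypothetical toy data: two studies, each evaluating both tests.
toy <- data.frame(
  StudyID = c("Study A", "Study A", "Study B", "Study B"),
  Test    = factor(c("HC2", "Repc", "HC2", "Repc"))
)

# Design matrix without intercept: one indicator column per test.
X <- model.matrix(~ Test + 0, data = toy)
print(X)
```

With this coding, each coefficient corresponds directly to one test, which is convenient when the quantities of interest are test-specific means and their ratios.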


[Figure 3 (image not reproduced): panels Sensitivity and Specificity, x-axis "Mean [95% equal-tailed credible intervals]" from 0.6 to 1.0, y-axis "Model" with rows BRMA, C270, C90, FGM, Frank and Gaussian.]

Figure 3: Plot of the posterior meta-analytic sensitivity (upper) and specificity (lower) and the corresponding 95% credible intervals as estimated by the Gaussian, Clayton 90 (C90) and 270 (C270), Farlie-Gumbel-Morgenstern (FGM) and Frank copula based bivariate beta and bivariate normal (BRMA) distributions for the telomerase data.

Figure 4 shows the trace plots for all six models fitted to the ascus data, where all parameters, including the correlation parameter (except in the BRMA), are modelled as a function of the covariate. There is proper chain mixing and convergence except for the Clayton copula based bivariate beta distributions. From the posterior relative sensitivity and specificity plotted in Figure 5, all the models that converged generally agree that repeat cytology was less sensitive than HC2 without significant loss in specificity.

[Figure 4 (image not reproduced): six trace-plot panels, each showing MUse[1], MUse[2], MUsp[1] and MUsp[2] (MU[1,1], MU[1,2], MU[2,1] and MU[2,2] in the last panel) for chains 1, 2 and 3.]

Figure 4: Trace plots of the posterior mean sensitivities and specificities for the ascus data as estimated by the Gaussian, Clayton 90 (C90) and 270 (C270), Farlie-Gumbel-Morgenstern (FGM) and Frank copula based bivariate beta and bivariate normal (BRMA) distributions.

Model     Test  Parameter     Mean   Lower  Median  Upper    n_eff  Rhat    WAIC

Gaussian  HC2   Correlation  -0.50   -0.98   -0.69   0.85   661.40  1.00  237.30
          Repc  Correlation  -0.90   -0.99   -0.94  -0.58   498.93  1.00

C90       HC2   Correlation  -0.13   -0.80    0.00   0.00     4.96  1.30  227.00
          Repc  Correlation  -0.70   -0.98   -0.88   0.00     3.08  2.25

C270      HC2   Correlation  -0.03   -0.56    0.00   0.00    35.78  1.08  270.10
          Repc  Correlation  -0.55   -0.96   -0.35  -0.16     2.40  2.35

FGM       HC2   Correlation  -0.08   -0.22   -0.22   0.22   913.98  1.00  243.50
          Repc  Correlation  -0.20   -0.22   -0.22   0.19  2675.94  1.00

Frank     HC2   Correlation  -0.51   -0.82   -0.49   1.00  2700.00   NaN  236.90
          Repc  Correlation  -0.75   -0.86   -0.75   1.00  2700.00   NaN

BRMA      Both  Correlation  -0.85   -0.98   -0.90  -0.45   102.98  1.03  233.70

Table 2: The posterior mean, median, 95% credible intervals, effective sample size (n_eff) and potential scale reduction factor (Rhat) of the correlation parameter(s) as estimated by the Gaussian, Clayton 90 (C90) and 270 (C270), Farlie-Gumbel-Morgenstern (FGM) and Frank copula based bivariate beta and bivariate normal (BRMA) distributions for the ascus dataset.

The n_eff values in Table 2 indicate substantial autocorrelation in sampling the correlation parameters except in the ‘Gaussian’, ‘FGM’ and ‘Frank’ models. From the copula based bivariate beta distributions, it is apparent that the correlation between sensitivity and specificity differs between HC2 and repeat cytology.

[Figure 5 (image not reproduced): panels RSE and RSP, x-axis "Mean [95% equal-tailed credible intervals]" from 0.5 to 2.0, y-axis "Model" with rows C270, C90, FGM, Frank, Gaussian and BRMA.]

Figure 5: Pooled relative sensitivity (on top) and relative specificity (bottom) of repeat cytology (posterior mean and 95% credible intervals) compared to HPV testing with HC2 to detect cervical precancer in women with an atypical Pap smear, as estimated by the six models.

The ‘Clayton90’ model has the lowest WAIC even though sampling from the posterior distribution was difficult, as seen in its trace plots in Figure 4 and the n_eff and Rhat in Table 2. The difficulty in sampling from the posterior could be signalling over-parameterisation of the correlation structure. It would thus be interesting to re-fit the models using only one correlation parameter and compare the models. WAIC is known to fail in certain settings, and this example shows that it is crucial to check the adequacy of the fit and the plausibility of the model and not blindly rely on an information criterion to select the best fit to the data.

7. Discussion

Copula-based models offer great flexibility and ease but should be used with caution. While the copulas used in this paper are attractive as they are mathematically tractable, Mikosch (2006) and Genest and Remillard (2006) noted that it might be difficult to estimate copulas from data. Furthermore, the concepts behind copula models are slightly more complex and therefore require statistical expertise to understand and program, as the models are not yet available as standard procedures or programs in statistical software.

In this paper, several advanced statistical models for meta-analysis of diagnostic accuracy studies were briefly discussed. The use of the R package CopulaDTA within the flexible Stan interface was demonstrated and shows how complex models can be implemented in a convenient way.

In most practical situations, the marginal mean structure is of primary interest and the correlation structure is treated as a nuisance, making the choice of copula less critical. Nonetheless, an appropriate correlation structure is critical in the interpretation of the random variation in the data as well as in obtaining valid model-based inference for the mean structure.

When the model for the mean is correct but the assumed distribution is misspecified, the estimates of the model parameters will be consistent but the standard errors will be incorrect (Agresti 2002). Nonetheless, the bivariate beta distribution has the advantage of allowing direct joint modelling of sensitivity and specificity, without the need for any transformation, and consequently providing estimates with the appropriate meta-analytic interpretation, but with the disadvantage of being more computationally intensive for some of the copula functions.

Leeflang et al. (2013) showed that the sensitivity and specificity often vary with disease prevalence. The models presented above can easily be extended and implemented to jointly model prevalence, sensitivity and specificity using tri-variate copulas.

There were some differences between the models in estimating the meta-analytic sensitivity and specificity and the correlation. Therefore, further research is necessary to investigate the effect of certain parameters, such as the number of studies, sample sizes and misspecification of the joint distribution, on the meta-analytic estimates.

8. Conclusion

The proposed Bayesian joint model, using copulas to construct bivariate beta distributions, provides estimates with both the appropriate marginal as well as conditional interpretation, as opposed to the typical BRMA model, which estimates sensitivity and specificity for specific studies with a particular value for the random effects. Furthermore, the models do not have estimation difficulties with small sample sizes or large between-study variance because: i) the between-study variances are not constant but depend on the underlying means and ii) Bayesian methods are less influenced by small sample sizes.

The fitted models generally agree that the mean specificity was slightly lower than what Glas et al. (2003) reported and based on this we conclude that telomerase was not sensitive and specific enough to be recommended for daily use.

In the ASCUS triage data, the conclusion based on the fitted models is in line with what the authors concluded: that HC2 was considerably more sensitive but slightly and non-significantly less specific than repeat cytology to triage women with an equivocal Pap smear to diagnose cervical precancer.

While the BRMA had the lowest WAIC for the telomerase data and the lowest WAIC among the models that sampled without difficulty for the ASCUS data, we still recommend modelling of sensitivity and specificity using bivariate beta distributions as they easily and directly provide meta-analytic estimates.

Competing interests

The authors declare that they have no competing interests.

Author’s contributions

M. Arbyn designed the OPSADAC project (Optimisation of statistical procedures to assess the diagnostic accuracy of cervical cancer screening tests) of which this study is a part. Victoria and M. Aerts conceptualized and initiated the study. Victoria wrote the code, analysed the data and drafted the manuscript. M. Arbyn and M. Aerts edited the manuscript. All authors reviewed and approved the final manuscript.


Acknowledgements

V. Nyaga received financial support from the Scientific Institute of Public Health (Brussels) through the OPSADAC project. M. Arbyn was supported by the COHEAHR project funded by the 7th Framework Programme of the European Commission (grant No 603019) and the Belgian cancer center. M. Aerts was supported by the IAP research network nr P7/06 of the Belgian Government (Belgian Science Policy).

We are grateful to the editor and the anonymous reviewer for the valuable comments that helped improve the paper considerably.

Supplementary materials

The datasets used in the tutorial along with the code for all the models are available for download with this article at the Journal of Statistical Software website http://www.jstatsoft.org/.

References

Agresti A (2002). Categorical Data Analysis. 2nd edition. John Wiley & Sons, New-York, 131.

Arbyn M, et al. (2012). “Evidence Regarding Human Papillomavirus Testing in Secondary Prevention of Cervical Cancer.” Vaccine, 30, F88–F99.

Arbyn M, et al. (2013). “Human Papillomavirus Testing Versus Repeat Cytology for Triage of Minor Cytological Cervical Lesions.” Cochrane Database of Systematic Reviews, 31, CD008054.

Arends LR, et al. (2008). “Bivariate Random Effects Meta-Analysis of ROC Curves.” Medical Decision Making, 28(5), 621–638.

Arnold BC, Ng HKT (2011). “Flexible Bivariate Beta Distributions.” Journal of Multivariate Analysis, 102(8), 1194–1202.

Carpenter B, et al. (2016). “Stan: A Probabilistic Programming Language.” Journal of Statistical Software (In press).

Chu H, Cole S (2006). “Bivariate Meta-Analysis of Sensitivity and Specificity with Sparse Data: A Generalized Linear Mixed Model Approach.” Journal of Clinical Epidemiology, 59(12), 1331–1332.

Clayton DG (1978). “A Model for Association in Bivariate Life Tables and its Application in Epidemiological Studies of Familial Tendency in Chronic Disease Incidence.” Biometrika, 65(1), 141–151.

Cong X, Cox DD, Cantor SB (2007). “Bayesian Meta-Analysis of Papanicolaou Smear Accuracy.” Gynecologic Oncology, 107(1 Suppl 1), S133–S137.

Farlie DGJ (1960). “The Performance of Some Correlation Coefficients for a General Bivariate Distribution.” Biometrika, 47, 307–323.


Frank MJ (1979). “On the Simultaneous Associativity of F(x, y) and x + y - F(x, y).” Aequationes Mathematicae, 194–226.

Genest C, Remillard B (2006). “Comments on T. Mikosh’s Paper, Copulas: Tales and Facts.” Extremes, 9, 27–36.

Glas AS, et al. (2003). “Tumor Markers in the Diagnosis of Primary Bladder Cancer. A Systematic Review.” The Journal of Urology, 169(6), 1975–1982.

Gumbel EJ (1960). “Bivariate Exponential Distributions.” Journal of the American Statistical Association, 55, 698–707.

Hoffman MD, Gelman A (2014). “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” Journal of Machine Learning Research, 15(1), 1593–1623.

Leeflang MM, et al. (2013). “Variation of a Test’s Sensitivity and Specificity with Disease Prevalence.” Canadian Medical Association Journal, 185(11), E537–E544.

Libby DL, Novick RE (1982). “Multivariate Generalized Beta Distributions with Applications to Utility Assessment.” Journal of Educational Statistics, 7(4), 271–294.

Meyer C (2013). “The Bivariate Normal Copula.” Communications in Statistics - Theory and Methods, 42(13), 2402–2422.

Mikosch TV (2006). “Copulas: Tales and Facts. Discussion Paper with a Rejoinder.” Extremes, 9, 3–20, 55–62.

Molenberghs G, Verbeke G (2005). Models for Discrete Longitudinal Data. Springer-Verlag, New York, 259, 267.

Morgenstern D (1956). “Einfache Beispiele Zweidimensionaler Verteilungen.” Mitteilungsblatt für Mathematische Statistik, 8, 234–235.

Moses LE, Shapiro D, Littenberg B (1993). “Combining Independent Studies of a Diagnostic Test into a Summary ROC Curve: Data-Analytic Approaches and some Additional Considerations.” Statistics in Medicine, 12(14), 1293–1316.

Nelsen RB (2006). An Introduction to Copulas. Springer-Verlag, New York.

Nikoloulopoulos AK (2015a). “A Mixed Effect Model for Bivariate Meta-analysis of Diagnostic Test Accuracy Studies Using a Copula Representation of the Random Effects Distribution.” Statistics in Medicine, 34(29), 3842–3865.

Nikoloulopoulos AK (2015b). CopulaREMADA: Copula Random Effects Model for Bivariate and Trivariate Meta-Analysis of Diagnostic Test Accuracy Studies. R package version 0.9, URL https://CRAN.R-project.org/package=CopulaREMADA.

Nyaga VN (2016). CopulaDTA: Copula Based Bivariate Beta-Binomial Model for Diagnostic Test Accuracy Studies. R package version 0.0.4, URL http://CRAN.R-project.org/package=CopulaDTA.

Olkin I, Liu R (2003). “A Bivariate Beta Distribution.” Statistics and Probability Letters, 62, 407–412.


Olkin I, Trikalinos TA (2014). “Constructions for a Bivariate Beta Distribution.” ARXIV, 1–10.

Patton AJ (2006). “Modelling Asymmetric Exchange Rate Dependence.” International Economic Review, 47(2), 527–556.

Plummer M (2003). “JAGS: A program for Analysis of Bayesian Graphical Models using Gibbs Sampling.” Proceedings of the 3rd international Workshop on Distributed Statistical Computing, Technische Universität Wien, 124.

Plummer M (2008). “Penalized Loss Functions for Bayesian Model Comparison.” Biostatistics, 9, 523–539.

R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

Reitsma JB, Glas AS, Rutjes AW, Scholten RJ, Bossuyt PM, Zwinderman AH (2005). “Bivariate Analysis of Sensitivity And Specificity Produces Informative Summary Measures in Diagnostic Reviews.” Journal of Clinical Epidemiology, 58(10), 982–990.

Riley RD (2009). “Multivariate Meta-analysis: The Effect of Ignoring Within-Study Correlation.” Journal of the Royal Statistical Society, 172(4).

Riley RD, Abrams KR, Lambert PC, Sutton AJ, Thompson JR (2007a). “An Evaluation of Bivariate Random-Effects Meta-analysis for the Joint Synthesis of Two Correlated Outcomes.” Statistics in Medicine, 26(1), 78–97.

Riley RD, Abrams KR, Sutton AJ, Lambert PC, Thompson JR (2007b). “Bivariate Random-Effects Meta-Analysis and the Estimation of Between-Study Correlation.” BMC Medical Research Methodology, 7(3).

Rutter CM, Gatsonis CM (2001). “A Hierarchical Regression Approach to Meta-Analysis of Diagnostic Test Accuracy Evaluations.” Statistics in Medicine, 20, 2865–84.

Sarmanov O (1966). “Generalized Normal Correlation and Two-Dimensional Frechet Classes.” Soviet Mathematics - Doklady, 7, 596–599.

Sklar A (1959). “Fonctions de Repartition a n Dimensions et Leurs Marges.” Publications de l’Institut de Statistique de L’Universite de Paris, 8, 229–231.

Stan Development Team (2016). RStan: The R Interface to Stan. Version 2.10.1. URL http://mc-stan.org/.

Takwoingi Y, Guo B, Riley RD, Deeks JJ (2015). “Performance of Methods for Meta-Analysis of Diagnostic Test Accuracy with Few Studies or Sparse Data.” Statistical Methods in Medical Research, 0(0), 1–19.

Vehtari A, Gelman A (2014). “WAIC and Cross-Validation in Stan.” Unpublished.

Watanabe S (2010). “Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory.” Journal of Machine Learning Research, 11, 3571–3594.


Affiliation:

Victoria N Nyaga
Unit of Cancer Epidemiology
Scientific Institute of Public Health
Juliette Wytsmanstraat 14
1050 Brussels, Belgium
E-mail: [email protected]: [email protected]

Marc Arbyn
Unit of Cancer Epidemiology
Scientific Institute of Public Health
Juliette Wytsmanstraat 14
1050 Brussels, Belgium
E-mail: [email protected]

Marc Aerts
Center for Statistics
Hasselt University
Agoralaan building D
3590 Diepenbeek, Belgium
E-mail: [email protected]

Journal of Statistical Software http://www.jstatsoft.org/

published by the Foundation for Open Access Statistics http://www.foastat.org/


4. ANOVA Model for Network Meta-analysis of Diagnostic Test Accuracy Data

This chapter has been published as:
Nyaga VN, Aerts M and Arbyn M. ANOVA model for network meta-analysis of diagnostic test accuracy data. Stat Methods Med Res, 2016. Prepublished Sep 20, 2016; DOI: 10.1177/0962280216669182


ANOVA model for network meta-analysis of diagnostic test accuracy data


Victoria Nyaga1,2, Marc Aerts2 and Marc Arbyn1

Abstract
Procedures combining and summarizing direct and indirect evidence from independent studies assessing the diagnostic accuracy of different tests for the same disease are referred to as Network Meta-Analysis (NMA). NMA provides a unified inference framework and uses the data more efficiently than separate sub-group analyses. Nonetheless, handling the inherent correlation between sensitivity and specificity continues to be a statistical challenge. We developed an Arm-Based (AB) hierarchical model which expresses the logit-transformed sensitivity and specificity as the sum of fixed effects for test, correlated study effects to model the inherent correlation between sensitivity and specificity, and a random error associated with the various tests evaluated in a given study. We present the accuracy of 11 tests used to triage women with minor cervical lesions to detect cervical precancer. Finally, we compare the results with those from a Contrast-Based (CB) model which expresses the linear predictor as a contrast to a comparator test. The proposed AB model is more appealing than the CB model since the former permits more straightforward interpretation of the parameters, makes use of all available data yielding narrower credible intervals, and models more natural variance-covariance matrix structures.

Keywords
meta-analysis, network meta-analysis, diagnostic tests, hierarchical model, arm-based

1 Scientific Institute of Public Health, Unit of Cancer Epidemiology/Belgian Cancer Center, Belgium
2 Hasselt University, I-Biostat, Diepenbeek, Belgium

Corresponding author:
Marc Arbyn
Email: [email protected]


Introduction

Network meta-analyses (NMA) have classically been used to extend conventional pairwise meta-analyses by combining and summarizing direct and indirect evidence on multiple ‘therapeutic’ interventions for a given condition when the set of evaluated interventions/treatments differs among studies. By borrowing strength from the indirect evidence, there is a potential gain in precision of the estimates [1]. Furthermore, the estimates may be less biased and more robust. Such an approach uses the data efficiently and is in line with the principle of intention-to-treat (ITT) [2] in randomized clinical trials, which requires that all valid available data should be used even when a part of the data is missing. In a diagnostic test accuracy study, an index test and possibly one or more comparator tests are administered to each tested subject. A standard or reference test or procedure is also applied to all the patients to classify them as having the target condition or not. The patients' results are then categorized by the index and reference test as true positive, false positive, true negative and false negative.

The diagnostic accuracy of the index test is represented as a bivariate outcome and is typically expressed as sensitivity and specificity at a defined test cutoff. Differences due to chance, design, conduct, patients/participants, interventions, tests and reference test imply there will be heterogeneity, often in opposite directions, for the two typical accuracy outcomes: sensitivity and specificity. While traditional meta-analyses allow for comparison between two tests, there are often multiple tests for the diagnosis of a particular disease outcome. To present the overall picture, inference about all the tests for the same condition and patient characteristics is therefore required. The simultaneous analysis of the variability in the accuracy of multiple tests within and between studies may be approached through a network meta-analysis.

In combining univariate summaries from studies where the set of tests differs among studies, two types of linear mixed models have been proposed. The majority of network meta-analyses express treatment effects in each study as a contrast relative to a baseline treatment in the respective study [1,3]. This is the so-called contrast-based (CB) model. Inspired by the CB models developed for interventional studies, Menten and Lesaffre (2015) [4] introduced a CB model for diagnostic test accuracy data to estimate the average log odds ratio for sensitivity and specificity of the index test relative to a baseline or comparator test.

The second type of model is the classical two-way ANOVA model with random effects for study and fixed effects for tests [5–7], the so-called arm-based (AB) model. The AB model is based on the assumption that the missing arms or tests are missing at random. While the two types of models yield similar results for the contrasts with restricted maximum likelihood (REML) procedures, the CB model is generally not invariant to changes in the baseline test in a subset of studies. It also yields an odds ratio (OR), making it difficult to recover information on the absolute diagnostic accuracy (the marginal means), the relative sensitivity or specificity of a test compared to another, or differences in accuracy between tests, measures that are easily interpretable and often used in clinical epidemiology. It is common knowledge that the OR is only a good approximation of relative sensitivity/specificity when the outcome is rare [8–10], but this is often not the case in diagnostic studies. Moreover, the AB model is simpler when the baseline/comparator treatment varies from one study to another or when the number of tests varies substantially among studies. By accommodating more complex variance-covariance structures, AB models have been shown to be superior to CB models [11].

We apply the two-way ANOVA model in a diagnostic data setting by extending the AB model in two ways: 1. using two independent binomial distributions to describe the distribution of true positives and true negatives among the diseased and the healthy individuals, 2. inducing a correlation between sensitivity and specificity by introducing correlated and shared study effects. The resulting generalized linear mixed model is analogous to randomized trials with complete block designs or repeated measures in analysis of variance models where studies are equivalent to blocks. The main assumption is that results missing for some tests and studies are missing at random. This approach is efficient because the correlation structure allows the model to borrow information from the ‘imputed’ missing data to obtain adjusted sensitivity and specificity estimates for all the tests.

Motivating dataset
To illustrate the use of the proposed model in network meta-analysis of diagnostic test accuracy data, we analyse data on a diversity of cytological or molecular tests to triage women with equivocal or mildly abnormal cervical cells [12–16]. A Pap smear is a screening test used to detect cervical precancer. When abnormalities in the Pap smear are not high grade, a triage test is needed to identify the women who need referral for further diagnostic work-up. There are several triage options, such as repetition of the Pap smear or HPV DNA or RNA assays. HPV is the virus causing cervical cancer [17]. Several other markers can be used for triage as well, such as p16 or the combination p16/Ki67, which are protein markers indicative of a transforming HPV infection [15,18].

The data are derived from a comprehensive series of meta-analyses on the accuracy of triage with HPV assays, cervical cytology or molecular markers applied on cervical specimens in women with minor cervical abnormalities [12–16]. Two patient groups with minor cytological abnormalities were distinguished: women with ASC-US (atypical squamous cells of unspecified significance) and LSIL (low-grade squamous intraepithelial lesions). Two levels of precancer (disease) were considered: intraepithelial neoplasia lesion of grade two or worse (CIN2+) or of grade three or worse (CIN3+). The disease status was ascertained by colposcopy. This was followed by a partial histological verification of a biopsy specimen when colposcopy was positive.

In total, the accuracy of 11 tests for detecting cervical precancer was evaluated. Labelled 1 to 11, the tests were: hrHPV DNA testing with Hybrid Capture II (HC2), Conventional Cytology (CC), Liquid-Based Cytology (LBC), generic PCRs targeting hrHPV DNA (PCR) and commercially available PCR-based hrHPV DNA assays such as Abbott RT PCR hrHPV, Linear Array, and Cobas-4800; assays detecting mRNA transcripts of five (HPV Proofer) or fourteen (APTIMA) HPV types; and protein markers identified by immunocytochemistry such as p16 and p16/Ki67, which are over-expressed as a consequence of HPV infection. The analysis included 125 studies, each evaluating at least one and at most six tests, allowing assessment of the accuracy of the eleven triage tests.

[Figure 1: four network plots, one per triage group and outcome. Panels: ASC-US, CIN2+ (107 studies); ASC-US, CIN3+ (51 studies); LSIL, CIN2+ (82 studies); LSIL, CIN3+ (44 studies). Nodes in each panel: HC2, CC, LBC, Generic PCR, Abbott, Linear Array, Cobas, p16, p16/Ki67, HPV Proofer, APTIMA.]

Figure 1. Network plot of all included tests to triage women with atypical squamous cells of unspecified significance (ASC-US) and low-grade squamous intraepithelial lesions (LSIL) to detect intraepithelial neoplasia lesion of grade two or worse (CIN2+) or of grade three or worse (CIN3+). HC2 and APTIMA were the most commonly assessed tests.

The size of the nodes in figure 1 is proportional to the number of studies evaluating a test, and the thickness of the lines between the nodes is proportional to the number of direct comparisons between tests. The size of the node and the amount of information in a node consequently influence the standard errors of the marginal means and the relative measures. From the network plot, HC2 and APTIMA were the most commonly assessed tests. The network in figure 1 is connected because there exists at least one study evaluating a given test together with at least one of the other remaining 10 tests.

Methodology
Suppose there are $K$ tests and $I$ studies. Studies assessing two tests ($k = 2$) are called ‘two-arm’ studies while those with $k > 2$ are ‘multi-arm’ studies. For a certain study $i$, let $(Y_{i1k}, Y_{i2k})$ denote the true positives and true negatives, $(N_{i1k}, N_{i2k})$ the diseased and healthy individuals and $(\pi_{i1k}, \pi_{i2k})$ the ‘unobserved’ sensitivity and specificity respectively with test $k$ in study $i$. Given study-specific sensitivity and specificity, two independent binomial distributions describe the distribution of true positives and true negatives among the diseased and the healthy individuals as follows

$$Y_{ijk} \mid \pi_{ijk}, x_i \sim \text{bin}(\pi_{ijk}, N_{ijk}), \quad i = 1, \ldots, I, \; j = 1, 2, \; k = 1, \ldots, K, \tag{1}$$

where $x_i$ generically denotes one or more covariates, possibly affecting $\pi_{ijk}$. In the next section, we present the recently introduced contrast-based model [4] followed by our proposed arm-based model to estimate the mean as well as comparative measures of sensitivity and specificity.
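As a minimal sketch of the first level of the hierarchy in Equation 1, the snippet below evaluates the log-likelihood contribution of one test in one study as the sum of two independent binomial terms. The counts and the function names are hypothetical and serve only as an illustration; they are not taken from the motivating dataset or from the authors' code.

```python
import math


def log_binomial_pmf(y, n, p):
    """Log probability of y successes in n trials with success probability p."""
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + y * math.log(p) + (n - y) * math.log(1.0 - p)


def study_log_likelihood(tp, n_diseased, tn, n_healthy, sens, spec):
    """Two independent binomials: true positives among the diseased (j = 1)
    and true negatives among the healthy (j = 2)."""
    return (log_binomial_pmf(tp, n_diseased, sens)
            + log_binomial_pmf(tn, n_healthy, spec))


# Hypothetical counts for one test in one study (illustration only).
tp, n_diseased = 45, 50
tn, n_healthy = 160, 200

obs_sens = tp / n_diseased
obs_spec = tn / n_healthy

# The log-likelihood is largest at the observed proportions.
ll_at_observed = study_log_likelihood(tp, n_diseased, tn, n_healthy, obs_sens, obs_spec)
ll_elsewhere = study_log_likelihood(tp, n_diseased, tn, n_healthy, 0.7, 0.7)
print(obs_sens, obs_spec, ll_at_observed > ll_elsewhere)
```

In the full model the probabilities are not fixed numbers but are linked to the linear predictor through the logit, so this function would be evaluated at the inverse-logit of the linear predictor.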

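Contrast-based model
By taking diagnostic test $T_K$ as the baseline, Menten and Lesaffre (2015) [4] proposed the following model

$$
\begin{aligned}
\text{logit}(\pi_{ijk}) &= \theta_{ijk} \\
\theta_{ij1} &= \mu_{ij} + (K - 1) \times \frac{\delta_{ij1}}{K} - \frac{\delta_{ij2}}{K} - \frac{\delta_{ij3}}{K} - \ldots - \frac{\delta_{ij(K-1)}}{K} \\
\theta_{ij2} &= \mu_{ij} - \frac{\delta_{ij1}}{K} + (K - 1) \times \frac{\delta_{ij2}}{K} - \frac{\delta_{ij3}}{K} - \ldots - \frac{\delta_{ij(K-1)}}{K} \\
&\;\;\vdots \\
\theta_{ijK} &= \mu_{ij} - \frac{\delta_{ij1}}{K} - \frac{\delta_{ij2}}{K} - \frac{\delta_{ij3}}{K} - \ldots - \frac{\delta_{ij(K-1)}}{K} \\
\delta_i &= (\delta_{i11}, \ldots, \delta_{i1(K-1)}, \delta_{i21}, \ldots, \delta_{i2(K-1)}) \sim N(\nu_\delta, \Sigma) \\
\nu_\delta &= (\nu_{\delta_{11}}, \ldots, \nu_{\delta_{1(K-1)}}, \nu_{\delta_{21}}, \ldots, \nu_{\delta_{2(K-1)}})
\end{aligned}
\tag{2}
$$

The vector $\nu_\delta$ represents the average log odds ratios for sensitivity and specificity of the $K - 1$ tests compared to the baseline test $T_K$.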
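The parameterisation in Equation 2 can be checked numerically. The sketch below (the function name and the numbers are hypothetical) builds the $K$ logits of one study from a study mean $\mu_{ij}$ and $K - 1$ contrasts. It illustrates two consequences of the algebra: the difference between the logit of test $k$ and that of the baseline test $K$ equals $\delta_{ijk}$, which is why the contrasts are log odds ratios, and the $K$ logits average to $\mu_{ij}$.

```python
def contrast_based_logits(mu, delta):
    """Return the K logits implied by Equation 2.

    mu    : study-specific mean logit (mu_ij)
    delta : list of K - 1 contrasts (log odds ratios versus baseline test K)
    """
    K = len(delta) + 1
    shift = sum(delta) / K
    # theta_k = mu + (K - 1) * delta_k / K - sum_{m != k} delta_m / K
    #         = mu + delta_k - sum(delta) / K        for k = 1, ..., K - 1
    thetas = [mu + d - shift for d in delta]
    # Baseline test K carries no contrast of its own.
    thetas.append(mu - shift)
    return thetas


# Hypothetical values: four tests, the last one being the baseline.
mu = 1.2
delta = [0.5, -0.3, 0.1]
thetas = contrast_based_logits(mu, delta)

contrasts_vs_baseline = [t - thetas[-1] for t in thetas[:-1]]
mean_logit = sum(thetas) / len(thetas)
print(contrasts_vs_baseline, mean_logit)
```

Changing which test plays the role of baseline changes the meaning of every element of `delta`, which is the dependence on the baseline discussed in the text.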

The matrix $\Sigma$ models the variances and covariances of the contrasts $\delta_i$ and contributes in a complex manner to the variances and covariances of $\theta_i$. In fact, it would be almost futile to express the corresponding matrix $\Sigma_{\theta_i}$.

There are known difficulties in estimating the variance-covariance matrix $\Sigma$ since each sampled matrix should be positive-definite [19]. Moreover, the components of $\Sigma$ are difficult to comprehend and depend on the baseline test. Furthermore, model identification becomes difficult as the number of tests included increases.

To enable model identification, the authors therefore recommend a diagonal or block diagonal variance-covariance matrix $\Sigma$ expressed as follows

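$$
\Sigma_{\text{diagonal}} =
\begin{bmatrix}
\sigma^2_{\delta_{i11}} & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & \sigma^2_{\delta_{i1(K-1)}} & 0 & \cdots & 0 \\
0 & \cdots & 0 & \sigma^2_{\delta_{i21}} & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & 0 & \cdots & \sigma^2_{\delta_{i2(K-1)}}
\end{bmatrix}
\tag{3}
$$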

or

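$$
\Sigma_{\text{block diagonal}} =
\begin{bmatrix}
\sigma^2_{\delta_{i11}} & \cdots & \sigma_{\delta_{i11},\delta_{i1(K-1)}} & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\sigma_{\delta_{i11},\delta_{i1(K-1)}} & \cdots & \sigma^2_{\delta_{i1(K-1)}} & 0 & \cdots & 0 \\
0 & \cdots & 0 & \sigma^2_{\delta_{i21}} & \cdots & \sigma_{\delta_{i21},\delta_{i2(K-1)}} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \sigma_{\delta_{i21},\delta_{i2(K-1)}} & \cdots & \sigma^2_{\delta_{i2(K-1)}}
\end{bmatrix}
\tag{4}
$$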

While this reduces model complexity and difficulty in estimation, such covariance matrices totally ignore the correlation between the logit sensitivity and logit specificity.

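The authors estimate the absolute accuracy of the tests from the estimated $\text{logit}^{-1}(\mu_{jk})$ as follows

$$
\begin{aligned}
E(\mu_j) &= \frac{1}{I} \sum_{i=1}^{I} \mu_{ij} \\
\mu_{j1} &= \text{logit}^{-1}(E(\mu_j)) + (K - 1) \times \frac{\nu_{\delta_{j1}}}{K} - \frac{\nu_{\delta_{j2}}}{K} - \frac{\nu_{\delta_{j3}}}{K} - \ldots - \frac{\nu_{\delta_{j(K-1)}}}{K} \\
\mu_{j2} &= \text{logit}^{-1}(E(\mu_j)) - \frac{\nu_{\delta_{j1}}}{K} + (K - 1) \times \frac{\nu_{\delta_{j2}}}{K} - \frac{\nu_{\delta_{j3}}}{K} - \ldots - \frac{\nu_{\delta_{j(K-1)}}}{K} \\
&\;\;\vdots \\
\mu_{jK} &= \text{logit}^{-1}(E(\mu_j)) - \frac{\nu_{\delta_{j1}}}{K} - \frac{\nu_{\delta_{j2}}}{K} - \frac{\nu_{\delta_{j3}}}{K} - \ldots - \frac{\nu_{\delta_{j(K-1)}}}{K}
\end{aligned}
\tag{5}
$$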

where $\text{logit}^{-1}(E(\mu_j))$ is the average probability of testing positive/negative. Equation 5 estimates the accuracy of tests for a hypothetical study with random effects equal to zero, but not the meta-analytic estimates. The meta-analytic estimates are obtained after the following complex integration

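$$
\begin{aligned}
E(\pi_{ij1}) &= \int \text{logit}^{-1}\left(\mu_{ij} + (K - 1) \times \frac{\delta_{ij1}}{K} - \frac{\delta_{ij2}}{K} - \ldots - \frac{\delta_{ij(K-1)}}{K}\right) f(\delta_{ij}) \, d\delta_{ij1} \ldots d\delta_{ij(K-1)} \\
E(\pi_{ij2}) &= \int \text{logit}^{-1}\left(\mu_{ij} - \frac{\delta_{ij1}}{K} + (K - 1) \times \frac{\delta_{ij2}}{K} - \ldots - \frac{\delta_{ij(K-1)}}{K}\right) f(\delta_{ij}) \, d\delta_{ij1} \ldots d\delta_{ij(K-1)} \\
&\;\;\vdots \\
E(\pi_{ijK}) &= \int \text{logit}^{-1}\left(\mu_{ij} - \frac{\delta_{ij1}}{K} - \frac{\delta_{ij2}}{K} - \ldots - \frac{\delta_{ij(K-1)}}{K}\right) f(\delta_{ij}) \, d\delta_{ij1} \ldots d\delta_{ij(K-1)}
\end{aligned}
\tag{6}
$$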

Using MCMC methods, the meta-analytic estimates can be obtained as follows:

$$E(\pi_{ijk}) = \frac{1}{I} \sum_{i=1}^{I} \text{logit}^{-1}(\theta_{ijk}) \tag{7}$$

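Arm-based model
Consider a design where there is at least one test per study. The study serves as a block in which all diagnostic tests are hypothetically evaluated, some of which are missing. This modelling approach has a potential gain in precision by borrowing strength from studies with single tests as well as multi-arm studies. The proposed single-factor design with repeated measures model is written as follows

$$
\begin{aligned}
\text{logit}(\pi_{ijk}) &= \theta_{ijk} = \mu_{jk} + \eta_{ij} + \delta_{ijk} \\
\begin{pmatrix} \eta_{i1} \\ \eta_{i2} \end{pmatrix} &\sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \Sigma\right) \\
\Sigma &= \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix} \\
\delta_i &= (\delta_{ij1}, \delta_{ij2}, \ldots, \delta_{ijK}) \sim N(0, \text{diag}(\tau_j^2))
\end{aligned}
\tag{8}
$$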

where $\text{logit}^{-1}(\mu_{1k})$ and $\text{logit}^{-1}(\mu_{2k})$ are, respectively, the mean sensitivity and specificity in a hypothetical study with random effects equal to zero. $\eta_{ij}$ is the study effect for diseased ($j = 1$) or healthy individuals ($j = 2$) and represents the deviation of a particular study $i$ from the mean sensitivity ($j = 1$) or specificity ($j = 2$), inducing between-study correlation. The study effects are assumed to be a random sample from a population of such effects. The common between-study variability of sensitivity and specificity and the correlation thereof are captured by the parameters $\sigma_1^2$, $\sigma_2^2$, and $\rho$ respectively.

$\delta_{ijk}$ is the error associated with the sensitivity ($j = 1$) or specificity ($j = 2$) of test $k$ in the $i$th study. Conditional on study $i$, the repeated measurements are independent with variance constant across studies, such that $\tau_j^2 = (\tau_{j1}^2, \ldots, \tau_{jK}^2)$ is a $K$-dimensional vector of variances. The full variance-covariance matrix of $\theta_i$ is expressed as follows

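$$
\begin{bmatrix}
\sigma_1^2 + \tau_{11}^2 & \sigma_1^2 & \cdots & \sigma_1^2 & \rho\sigma_1\sigma_2 & \cdots & \cdots & \rho\sigma_1\sigma_2 \\
\sigma_1^2 & \sigma_1^2 + \tau_{12}^2 & \ddots & \vdots & \vdots & \ddots & & \vdots \\
\vdots & \ddots & \ddots & \sigma_1^2 & \vdots & & \ddots & \vdots \\
\sigma_1^2 & \cdots & \sigma_1^2 & \sigma_1^2 + \tau_{1K}^2 & \rho\sigma_1\sigma_2 & \cdots & \cdots & \rho\sigma_1\sigma_2 \\
\rho\sigma_1\sigma_2 & \cdots & \cdots & \rho\sigma_1\sigma_2 & \sigma_2^2 + \tau_{21}^2 & \sigma_2^2 & \cdots & \sigma_2^2 \\
\vdots & \ddots & & \vdots & \sigma_2^2 & \sigma_2^2 + \tau_{22}^2 & \ddots & \vdots \\
\vdots & & \ddots & \vdots & \vdots & \ddots & \ddots & \sigma_2^2 \\
\rho\sigma_1\sigma_2 & \cdots & \cdots & \rho\sigma_1\sigma_2 & \sigma_2^2 & \cdots & \sigma_2^2 & \sigma_2^2 + \tau_{2K}^2
\end{bmatrix}
$$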

The correlation $\text{corr}(\theta_{ijk}, \theta_{ijk'})$ between the $k$th and $k'$th logit sensitivity ($j = 1$) or specificity ($j = 2$) is

$$\rho_{jkk'} = \frac{\sigma_j^2}{\sqrt{(\sigma_j^2 + \tau_{jk}^2) \times (\sigma_j^2 + \tau_{jk'}^2)}}.$$

In case $\tau_{j1}^2 = \ldots = \tau_{jK}^2 = \tau_j^2$ (homogeneous variances across tests), the shared random element $\eta_{ij}$ within study $i$ induces a non-negative correlation between any two test results $k$ and $k'$ from diseased individuals ($j = 1$) or from healthy individuals ($j = 2$) equal to $\rho_j = \sigma_j^2 / (\sigma_j^2 + \tau_j^2)$ (implying a covariance matrix with compound symmetry). While it might seem logical to expect and allow for similar correlation between any two sensitivities or specificities in a given study, the variances $\tau_{jk}^2$ of different sensitivities or specificities of the same study may be different. In such instances, the unstructured covariance matrix is more appropriate as it allows varying variances between the tests (in which case $\tau_j^2$ is a $K$-dimensional vector of the unequal variances).

$\rho_j$ or $\rho_{jkk'}$ is called the intra-study correlation coefficient, which also measures the proportion of the variability in $\theta_{ijk}$ that is accounted for by the between-study variability. It takes the value 0 when $\sigma_j^2 = 0$ (if study effects convey no information) and values close to 1 when $\sigma_j^2$ is large relative to $\tau_j^2$ and the studies are essentially all identical. When all components of $\tau_j^2$ are equal to zero, the model reduces to fitting a separate bivariate random-effects meta-analysis model [20,21] for each test.

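On the other hand, the correlation between the logit sensitivity ($j = 1$) of the $k$th test and the logit specificity ($j = 2$) of the $k'$th test ($k = k'$ or $k \neq k'$) equals

$$\rho_{12kk'} = \frac{\rho\sigma_1\sigma_2}{\sqrt{(\sigma_1^2 + \tau_{1k}^2) \times (\sigma_2^2 + \tau_{2k'}^2)}} \tag{9}$$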
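To make the covariance structure concrete, the following sketch assembles the $2K \times 2K$ variance-covariance matrix of $\theta_i$ from $\sigma_1$, $\sigma_2$, $\rho$ and the test-specific standard deviations, then reads off the within-outcome and cross-outcome correlations. The function names and numerical values are hypothetical; the block layout (sensitivities first, specificities second) is an assumption made for the illustration.

```python
import math


def full_covariance(sigma1, sigma2, rho, tau1, tau2):
    """Covariance matrix of (theta_i11..theta_i1K, theta_i21..theta_i2K).

    sigma1, sigma2 : between-study standard deviations (sensitivity, specificity)
    rho            : correlation between the two study effects
    tau1, tau2     : lists of K within-study standard deviations per test
    """
    K = len(tau1)
    sigma = (sigma1, sigma2)
    tau = (tau1, tau2)
    cov = [[0.0] * (2 * K) for _ in range(2 * K)]
    for r in range(2 * K):
        for c in range(2 * K):
            jr, kr = divmod(r, K)  # outcome index and test index of the row
            jc, kc = divmod(c, K)  # outcome index and test index of the column
            if jr == jc:
                # Shared study effect, plus the test-specific error on the diagonal.
                cov[r][c] = sigma[jr] ** 2 + (tau[jr][kr] ** 2 if kr == kc else 0.0)
            else:
                # Sensitivity and specificity are linked only through the study effects.
                cov[r][c] = rho * sigma1 * sigma2
    return cov


def correlation(cov, r, c):
    return cov[r][c] / math.sqrt(cov[r][r] * cov[c][c])


# Hypothetical parameter values with homogeneous variances (compound symmetry).
cov = full_covariance(0.8, 0.6, -0.5, [0.3, 0.3, 0.3], [0.2, 0.2, 0.2])

intra_study_sens = correlation(cov, 0, 1)  # two sensitivities in the same study
cross_outcome = correlation(cov, 0, 4)     # sensitivity of test 1, specificity of test 2
print(round(intra_study_sens, 4), round(cross_outcome, 4))
```

With unequal entries in `tau1` or `tau2` the same function returns the unstructured version, in which each pair of tests has its own correlation.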

Homogeneous variances across tests imply that logit sensitivity and specificity are correlated in the same way in all the tests. This assumption may be reasonable if there is a common ‘threshold’ effect in all the cases. For network meta-analysis, different types of tests with different ‘thresholds’ are evaluated and the assumption of a common correlation between sensitivity and specificity in all tests may no longer hold.

In most practical situations, the mean structure is of primary interest and not the covariance structure. Nonetheless, appropriate covariance modelling is critical in the interpretation of the random variation in the data as well as in obtaining valid model-based inference for the mean structure. Compound symmetry assumes homogeneity of variance and covariance, and such a restriction could invalidate inference for the mean structure when the assumed covariance structure is misspecified [22].

When the primary objective of the analysis is to estimate the marginal means of sensitivity and specificity, the choice between compound symmetry and an unstructured covariance structure is not critical because the inference procedures for the marginal means are the same. Moreover, over-parameterisation of the covariance structure might lead to inefficient estimation and potentially poor assessment of the standard errors of the marginal means [23]. The Watanabe-Akaike Information Criterion (WAIC), a measure of the predictive accuracy of a fitted model, can be used to choose the appropriate covariance structure [24].

In essence, the model separates the variation in the studies into two components: the within-study variation $\text{diag}(\tau_j^2)$, referring to the variation in the repeated sampling of the study results if they were replicated, and the between-study variation $\Sigma$, referring to variation in the studies' true underlying effects.

The study-level covariate information is included in the linear predictor in Equation 8 as follows

$$\theta_{ijk} = \sum_{p=1}^{P} X_{pi} \beta_{pjk} + \eta_{ij} + \delta_{ijk} \tag{10}$$

where $\beta_{pjk}$ is the $p$th coefficient, corresponding to the covariate $X_{pi}$, in a hypothetical study with random effects equal to zero.

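The population-averaged or the marginal sensitivity/specificity in the intercept-only model for test $k$ is estimated as

$$
\begin{aligned}
E(\pi_{ijk}) &= E(\text{logit}^{-1}(\mu_{jk} + \eta_{ij} + \delta_{ijk})) \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \text{logit}^{-1}(\mu_{jk} + \eta_{ij} + \delta_{ijk}) \, f(\eta_{ij}) \, f(\delta_{ijk}) \, d\eta_{ij} \, d\delta_{ijk} \\
E(\pi_{ijk}) &= \frac{1}{I} \sum_{i=1}^{I} \text{logit}^{-1}(\mu_{jk} + \eta_{ij} + \delta_{ijk})
\end{aligned}
\tag{11}
$$

The relative sensitivity and specificity and other relative measures of test $k$ (relative to test $k'$, $k \neq k'$) are then estimated from the marginal sensitivity or specificity of tests $k$ and $k'$.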
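The difference between the marginal mean and the mean of a hypothetical study with random effects equal to zero can be illustrated by simple Monte Carlo integration. The sketch below is not the authors' estimator: it fixes the parameters at hypothetical values and draws the random effects from their normal distributions, whereas Equation 11 averages over the studies using posterior draws. Function names, parameter values and the seed are assumptions for the illustration.

```python
import math
import random


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def marginal_mean(mu, sigma, tau, draws=20000, seed=1):
    """Monte Carlo approximation of E[logit^-1(mu + eta + delta)]
    with eta ~ N(0, sigma^2) and delta ~ N(0, tau^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        eta = rng.gauss(0.0, sigma)
        delta = rng.gauss(0.0, tau)
        total += inv_logit(mu + eta + delta)
    return total / draws


# Hypothetical logit-scale means for two tests sharing the same variance components.
mu_a, mu_b = 2.0, 1.0
sigma, tau = 1.0, 0.5

conditional_a = inv_logit(mu_a)               # study with random effects equal to zero
marginal_a = marginal_mean(mu_a, sigma, tau)  # population-averaged value
marginal_b = marginal_mean(mu_b, sigma, tau)

relative_sensitivity = marginal_a / marginal_b
print(round(conditional_a, 3), round(marginal_a, 3), round(relative_sensitivity, 3))
```

Because the inverse logit is non-linear, the marginal mean is pulled towards 0.5 relative to the conditional mean, which is why the two quantities should not be used interchangeably when forming relative measures.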

Ranking of the tests
While ranking of tests using rank probabilities and rankograms is an attractive feature of univariate NMA, it is still a challenge to rank competing diagnostic tests, especially when a test does not outperform the others on both sensitivity and specificity. Consider the diagnostic odds ratio (DOR) [25], which is expressed in terms of sensitivity and specificity as

$$DOR_k = \frac{\text{sensitivity}_k \times \text{specificity}_k}{(1 - \text{sensitivity}_k) \times (1 - \text{specificity}_k)} \tag{12}$$

and ranges from 0 to $\infty$, with $DOR_k > 1$ indicating a discriminating test (the higher the value, the better the discriminatory performance), $DOR_k = 1$ indicating a test that does not discriminate between the healthy and diseased, and $DOR_k < 1$ indicating an improper test. The DOR is a single indicator combining information about sensitivity and specificity and is invariant to disease prevalence. However, the measure cannot distinguish between tests with high sensitivity but low specificity or vice-versa.
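A short sketch of Equation 12 shows the limitation just described. The accuracy values are hypothetical and chosen so that two tests with mirrored sensitivity and specificity obtain the same DOR.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """Diagnostic odds ratio as in Equation 12."""
    return (sensitivity * specificity) / ((1.0 - sensitivity) * (1.0 - specificity))


# Hypothetical tests: one sensitive but unspecific, the other the reverse.
dor_sensitive_test = diagnostic_odds_ratio(0.9, 0.6)
dor_specific_test = diagnostic_odds_ratio(0.6, 0.9)

# A test that does no better than chance.
dor_uninformative = diagnostic_odds_ratio(0.5, 0.5)

print(dor_sensitive_test, dor_specific_test, dor_uninformative)
```

Both hypothetical tests receive the same DOR although they would serve very different purposes in a triage setting.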

Alternatively, the superiority of a diagnostic test could be quantified using a superiority index introduced by Deutsch et al. [26], expressed as

$$S_k = \frac{2a_k + c_k}{2b_k + c_k}, \tag{13}$$

where $a_k$ is the number of tests to which test $k$ is superior (higher sensitivity and specificity), $b_k$ is the number of tests to which test $k$ is inferior (lower sensitivity and specificity), and $c_k$ is the number of tests with equal performance as test $k$ (equal sensitivity and specificity). $S$ ranges from 0 to $\infty$: $S$ tends to $\infty$ as the number of tests to which test $k$ is superior increases, tends to 0 as the number of tests to which test $k$ is inferior increases, and tends to 1 the more the tests are equal. Since the tests not comparable to test $k$ do not enter into the calculation of $S$, the index for different tests may be based on different sets of tests.
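The counting behind Equation 13 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: superiority and inferiority use strict inequalities on both outcomes, as in the description above; the test names and accuracy values are hypothetical; and the handling of a zero denominator (infinity, or NaN when no test is comparable) is a choice made here.

```python
import math


def superiority_index(tests, k):
    """Superiority index of test k (Equation 13).

    tests : dict mapping a test name to a (sensitivity, specificity) pair
    k     : name of the test being ranked
    """
    sens_k, spec_k = tests[k]
    a = b = c = 0
    for name, (sens, spec) in tests.items():
        if name == k:
            continue
        if sens_k > sens and spec_k > spec:
            a += 1  # test k is superior
        elif sens_k < sens and spec_k < spec:
            b += 1  # test k is inferior
        elif sens_k == sens and spec_k == spec:
            c += 1  # equal performance
        # Otherwise the two tests are not comparable and are ignored.
    numerator, denominator = 2 * a + c, 2 * b + c
    if denominator == 0:
        return math.inf if numerator > 0 else math.nan
    return numerator / denominator


# Hypothetical accuracy estimates (illustration only).
tests = {
    "A": (0.90, 0.80),
    "B": (0.80, 0.70),
    "C": (0.95, 0.60),
    "D": (0.85, 0.75),
}
indices = {name: superiority_index(tests, name) for name in tests}
print(indices)
```

Test C is not comparable to any other test in this example, which illustrates the remark that the index for different tests may rest on different sets of comparisons.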

Missing data and exchangeability
In the models above, not all the studies provide estimates of all effects of interest because some of the components of the vector $Y_{ij} = (Y_{ij1}, \ldots, Y_{ijK})$ are missing. The vector $Y_{ij}$ can be partitioned into the observed $Y^o_{ij}$ and the missing $Y^m_{ij}$. For each component of $Y_{ij}$, let $R_{ij}$ denote a vector of missingness indicators with

$$
R_{ijk} =
\begin{cases}
1 & \text{if } Y_{ijk} \text{ is observed}, \\
0 & \text{otherwise.}
\end{cases}
$$

The joint distribution of $(Y, R)$ given the parameters $(\beta, \phi)$ is given by

$$p(y_{ij}, R_{ij} \mid \beta_j, \phi_j) \tag{14}$$

where $\phi_j$ contains the missingness parameters and $\beta_j$ contains $(\pi_{ij}, \Sigma, \rho, \sigma_j, \text{diag}(\tau_j))$. In a selection framework [27,28] the joint distribution in Equation 14 is factorised as

$$p(y_{ij} \mid \beta_j, \phi_j) \, p(R_{ij} \mid y_{ij}, \phi_j) = p(y^o_{ij}, y^m_{ij} \mid \beta_j, \phi_j) \, p(R_{ij} \mid y^o_{ij}, y^m_{ij}, \phi_j) \tag{15}$$

where $p(R_{ij} \mid y^o_{ij}, y^m_{ij}, \phi_j)$ describes the mechanism for data missingness. Assuming that the probability of missingness is conditionally independent of the unobserved data given the observed data (so-called missing at random (MAR)), the second part of Equation 15 simplifies to

$$p(R_{ij} \mid y^o_{ij}, y^m_{ij}, \phi_j) = p(R_{ij} \mid y^o_{ij}, \phi_j). \tag{16}$$

When the parameters $\beta_j$ and $\phi_j$ are distinct and functionally independent, the missing data mechanism is ignorable and expression 16 can be dropped from the joint distribution in Equation 15. Integrating over the unknown missing values in the first part of Equation 15 yields a marginal density with the observed information, which is to be evaluated

$$\int p(y^o_{ij}, y^m_{ij} \mid \beta_j) \, dy^m_{ij} = p(y^o_{ij} \mid \beta_j). \tag{17}$$
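The steps in Equations 15 to 17 can be chained into a single derivation. The lines below are a sketch that takes the outcome model to depend on $\beta_j$ only, as in Equation 17, and assumes MAR with distinct parameters; under these assumptions the observed-data likelihood factorises into a term that involves only $\phi_j$ and a term that involves only $\beta_j$.

$$
\begin{aligned}
p(y^o_{ij}, R_{ij} \mid \beta_j, \phi_j)
&= \int p(y^o_{ij}, y^m_{ij} \mid \beta_j) \, p(R_{ij} \mid y^o_{ij}, y^m_{ij}, \phi_j) \, dy^m_{ij} \\
&= p(R_{ij} \mid y^o_{ij}, \phi_j) \int p(y^o_{ij}, y^m_{ij} \mid \beta_j) \, dy^m_{ij} \\
&= p(R_{ij} \mid y^o_{ij}, \phi_j) \; p(y^o_{ij} \mid \beta_j).
\end{aligned}
$$

The first factor is constant in $\beta_j$, so inference about $\beta_j$ can rest on $p(y^o_{ij} \mid \beta_j)$ alone.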

Since the main objective is to be able to make valid and efficient inference about the parameters of interest and not to estimate or predict the missing data, the ignorability condition validates inference based on the observed data likelihood only. Conditional on $\pi_{ijk}$, the studies are assumed to be exchangeable. The observed information $Y_{ijk}$ on a given test/arm $k$ generically represents a point estimate of $\pi_{ijk}$ and contributes to the estimation of the fixed effects $\mu_{jk}$. At the second level of the hierarchy (Equation 8), exchangeable normal prior distributions with mean zero split the variability into between- and within-study variability. The observed data in each study contribute to the estimation of $\eta_{ij}$ while all the studies together contribute to the estimation of $\delta_{ijk}$, where $\delta_{ijk}$ and $\eta_{ij}$ are considered independent samples from a population controlled by the hyper-parameters $\Sigma$ and $\tau_j^2$, which are estimated from the observed data. The hyper-parameters also have exchangeable vague or non-informative prior distributions. In fact, the exchangeability assumption is applied in both the CB and the AB models but in a different manner. The CB model assumes exchangeability of test contrasts (odds ratios) across the studies while the AB model assumes exchangeability of test effects (means) across the studies.

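Prior distributions
We decompose the covariance matrix $\Sigma$ into a variance and correlation matrix such that

$$\Sigma = \text{diag}(\sigma_1, \sigma_2) \times \Omega \times \text{diag}(\sigma_1, \sigma_2), \tag{18}$$

where

$$\Omega = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}.$$

The model is completed by specifying vague priors on the mean, variance and correlation parameters as follows

$$
\begin{aligned}
\tanh^{-1}(\rho), \; \mu_{jk} &\sim N(0, 25) \\
\tau_j, \; \sigma_j &\sim U(0, 5).
\end{aligned}
\tag{19}
$$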
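The decomposition in Equation 18 together with the prior on $\tanh^{-1}(\rho)$ can be sketched in a few lines. Any real value on the transformed scale maps back to a correlation strictly between $-1$ and $1$, so the resulting $\Sigma$ is positive-definite whenever both standard deviations are positive. The function name and numerical values are hypothetical.

```python
import math


def covariance_from_parts(sigma1, sigma2, z):
    """Build Sigma = diag(sigma) * Omega * diag(sigma), with rho = tanh(z).

    z is the correlation on the unconstrained tanh^-1 scale.
    """
    rho = math.tanh(z)
    omega = [[1.0, rho], [rho, 1.0]]
    scale = [sigma1, sigma2]
    sigma = [[scale[r] * omega[r][c] * scale[c] for c in range(2)] for r in range(2)]
    return sigma, rho


# Hypothetical values: standard deviations within (0, 5), unconstrained z.
sigma_matrix, rho = covariance_from_parts(0.9, 0.5, -0.8)

determinant = (sigma_matrix[0][0] * sigma_matrix[1][1]
               - sigma_matrix[0][1] * sigma_matrix[1][0])
print(round(rho, 4), sigma_matrix, determinant > 0)
```

Working on the unconstrained scale avoids having to reject draws of $\rho$ outside the admissible range during sampling.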

Since it is not clear when certain choices of prior distributions are vague and non-informative, it is necessary to vary the prior distributions and assess their influence on the parameter estimates. The following prior distributions were also used as part of a sensitivity analysis

$$
\begin{aligned}
\rho &\sim U(-1, 1) \\
\tau_j, \; \sigma_j &\sim \text{cauchy}(0, 2.5).
\end{aligned}
\tag{20}
$$

An alternative prior distribution for the correlation matrix $\Omega$ is the LKJ prior distribution with shape parameter $\nu = 1$ or $\nu = 2$ [29],

$$\Omega \sim \text{LKJcorr}(\nu) \propto \det(\Omega)^{\nu - 1} \quad \text{for } \nu \geq 1, \tag{21}$$

where $\nu$ controls the expected correlation, with larger values favouring less correlation and vice-versa. Other possible prior distributions for $\Sigma$ are the inverse-Wishart distribution, which has the advantage of computational convenience but is difficult to interpret, or the more relaxed scaled inverse-Wishart, which is conjugate to the multivariate normal, making Gibbs sampling simpler [30].

Implementation
The models are fitted in the Bayesian framework using Stan [31], a probabilistic programming language which has implemented Hamiltonian Monte Carlo (HMC) and the No-U-Turn sampler (NUTS) [32], within R 3.3.0 [33] using the rstan 2.9.0 package [34]. The Stan code for the model is provided alongside the supplementary material. We run three chains in parallel until there is convergence. Trace plots are used to visually check whether the distributions of the three simulated chains mix properly and are stationary.

For each parameter, convergence is assessed by examining the potential scale reduction factor $\hat{R}$, the effective number of independent simulation draws ($n_{eff}$) and the MCMC error. It is common practice to run simulations until $\hat{R}$ is no greater than 1.1 for all the parameters. Since Markov chain simulations tend to be autocorrelated, $n_{eff}$ is usually smaller than the total number of draws. To reduce autocorrelation and consequently increase $n_{eff}$, it is necessary to do thinning by keeping every $n$th (e.g. every 10th, 20th, 30th, ...) draw and discarding the rest of the samples. Besides, thinning saves memory, especially when the total number of iterations is large.
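The convergence check described above can be illustrated with the classic (non-split) potential scale reduction factor computed from several chains of equal length. This is a sketch for intuition only: the diagnostics reported by rstan use refined variants, and the simulated chains below are independent normal draws with an assumed seed rather than output from the fitted model.

```python
import math
import random
import statistics


def potential_scale_reduction(chains):
    """Classic Gelman-Rubin R-hat for a list of equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average within-chain variance; B: between-chain variance scaled by n.
    within = statistics.fmean(statistics.variance(chain) for chain in chains)
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


def thin(chain, step):
    """Keep every step-th draw, starting with the first."""
    return chain[::step]


rng = random.Random(42)
chains = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]

# Three chains exploring the same distribution.
rhat_mixed = potential_scale_reduction(chains)

# One chain stuck in a different region (shifted by 3 units).
stuck = [chains[0], chains[1], [x + 3.0 for x in chains[2]]]
rhat_stuck = potential_scale_reduction(stuck)

thinned = thin(chains[0], 10)
print(round(rhat_mixed, 3), round(rhat_stuck, 3), len(thinned))
```

Values close to 1 indicate that the between-chain and within-chain variability agree, while the shifted chain pushes the factor well above the 1.1 threshold.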

Results

Figure 2 presents the study-specific sensitivity and specificity of all 11 tests used to detect CIN2+ in ASC-US triage, from all available studies and from studies that evaluated at least two tests, one of them being HC2. Figures 3, 4 and 5 then present the sensitivity and specificity of the eleven tests for CIN3+ in ASC-US triage, CIN2+ in LSIL triage and CIN3+ in LSIL triage, respectively.

The diamonds represent the pooled sensitivity and specificity: the black diamonds are estimated by the AB model from all the available studies, the red diamonds by the same model but from studies with at least two tests, one of them being HC2, and the blue diamonds by the CB model from studies with at least two tests, one of them being HC2.

The diamonds mark the posterior median, which was used as the measure of central tendency because of the asymmetry of sensitivity and specificity. The vertical lines represent the 95% equal-tailed credible intervals. In each instance, the studies included in estimating the diagnostic accuracy are shown as grey points underlying the diamonds. The study-specific grey points show substantial variation in both sensitivity and specificity between the studies, and some studies had outlying values. It is also apparent that the number of tests evaluated differed among studies.

All available data (black diamonds)

The results presented are from the model assuming compound symmetry (in supplementary material: Results1e.xlsx). This is because the model fitted the data better, as reflected by a lower WAIC, better mixing of chains and reduced autocorrelation (reproducible trace plots not shown here), than the model assuming heterogeneous within-study variances (see supplementary material: Results1a.xlsx).

Triage of women with ASC-US to detect CIN2+. According to figure 2, Linear Array was the most sensitive (0.92 [0.86, 0.96]) but among the least specific (0.54 [0.43, 0.65]) tests, while HPV Proofer was the least sensitive (0.68 [0.59, 0.76]) and the most specific (0.79 [0.73, 0.84]) test. Both the diagnostic odds ratio and the superiority index in the supplementary material (Results1e.xlsx) indicate that p16/Ki67 had the best discriminatory power, with a sensitivity of 0.85 [0.77, 0.91] and a specificity of 0.75 [0.66, 0.81]. Compared to HC2, LBC, p16 and HPV Proofer were less sensitive but more specific, while p16/Ki67 was as sensitive but more specific. All other tests had sensitivity and specificity similar to those of HC2 (see table 1).

Table 1. Posterior relative sensitivity and specificity and the corresponding 95% equal-tailed credible intervals of other tests relative to HC2 in detecting cervical intraepithelial neoplasia of grade two or worse in women with atypical squamous cervical cells of unspecified significance estimated by the arm-based model.

                               Relative Sensitivity      Relative Specificity
Test                           Mean    Lower   Upper     Mean    Lower   Upper
HC2                            1       1       1         1       1       1
Conventional Cytology (CC)     0.83    0.73    0.91      1.09    0.94    1.23
Liquid-Based Cytology (LBC)    0.81    0.71    0.90      1.30    1.11    1.45
Generic PCR assays             0.96    0.89    1.02      1.02    0.86    1.17
Abbott RT PCR hrHPV            1.00    0.91    1.06      0.97    0.75    1.18
Linear Array                   1.01    0.96    1.05      0.82    0.68    0.96
Cobas-4800                     1.01    0.95    1.06      1.01    0.80    1.21
P16                            0.89    0.81    0.95      1.33    1.20    1.45
P16/Ki67                       0.93    0.85    1.00      1.39    1.23    1.54
HPV Proofer (mRNA)             0.74    0.65    0.83      1.48    1.36    1.59
APTIMA (mRNA)                  0.97    0.91    1.01      1.14    1.00    1.27

Triage of women with ASC-US to detect CIN3+. It can be seen in figure 3 that Abbott RT PCR hrHPV was the most sensitive (0.98 [0.88, 1.00]) but among the least specific (0.48 [0.35, 0.60]) tests. The diagnostic odds ratio and the superiority index (see supplementary material: Results1e.xlsx) indicate that p16/Ki67 had the best discriminatory power, with sensitivity and specificity of 0.97 [0.84, 1.00] and 0.65 [0.53, 0.76] respectively. Relative to HC2, LBC, non-commercial PCR assays, p16 and HPV Proofer were less sensitive but more specific, while CC, Abbott RT PCR hrHPV and Cobas-4800 were as sensitive and as specific (see table 2).

[Figure 2: forest plot with two panels, Sensitivity and Specificity (axes from 0 to 1), and the number of studies N for HC2, CC, LBC, Generic PCR, Abbott, Linear Array, Cobas, p16, p16/Ki67, HPV Proofer and APTIMA. Legend: posterior median [95% equal-tailed credible interval] from the contrast-based model (studies with at least two tests, one of them being HC2), the arm-based model (studies with at least two tests, one of them being HC2) and the arm-based model (all available studies); study-specific sensitivity and specificity.]

Figure 2. Cobas and Linear Array were the most sensitive tests as estimated from all available and reduced data respectively. HPV Proofer was the most specific test in detecting cervical intraepithelial neoplasia of grade two or worse in women with atypical squamous cervical cells of unspecified significance.

Triage of women with LSIL to detect CIN2+. Figure 4 and the absolute diagnostic estimates presented in the supplementary material (Results1e.xlsx) show that HC2 was the most sensitive (0.94 [0.93, 0.95]) but among the least specific (0.29 [0.27, 0.31]) tests, while HPV Proofer was the least sensitive (0.64 [0.54, 0.73]) and the most specific (0.73 [0.67, 0.78]) test in detecting CIN2+ in LSIL cytology. Both the diagnostic odds ratio and the superiority index presented in the supplementary material (Results1e.xlsx) indicate once more that p16/Ki67 had the best discriminatory power, with an estimated sensitivity and specificity of 0.86 [0.79, 0.91] and 0.63 [0.57, 0.69] respectively.

[Figure 3: forest plot with two panels, Sensitivity and Specificity (axes from 0 to 1), and the number of studies N for HC2, CC, LBC, Generic PCR, Abbott, Linear Array, Cobas, p16, p16/Ki67, HPV Proofer and APTIMA. Legend: posterior median [95% equal-tailed credible interval] from the contrast-based model (studies with at least two tests, one of them being HC2), the arm-based model (studies with at least two tests, one of them being HC2) and the arm-based model (all available studies); study-specific sensitivity and specificity.]

Figure 3. Abbott and Cobas were the most sensitive tests according to the arm- and contrast-based models respectively in detecting cervical intraepithelial neoplasia of grade three or worse in women with atypical squamous cervical cells of unspecified significance. The posterior median of sensitivity estimated by the contrast-based model is in general more extreme.

Triage of women with LSIL to detect CIN3+. The forest plot presented in figure 5 (see also supplementary material: Results1.xlsx) shows that Abbott RT PCR hrHPV and Linear Array were the most sensitive but among the least specific tests in detecting CIN3+ in women with LSIL. The diagnostic odds ratio indicates that Abbott RT PCR hrHPV had the best discriminatory power (sensitivity = 0.99 [0.95, 1.00], specificity = 0.28 [0.20, 0.38]). The superiority index indicates further that p16/Ki67 had discriminatory power as good as that of Abbott RT PCR hrHPV, with sensitivity and specificity of 0.95 [0.87, 0.98] and 0.47 [0.36, 0.57] respectively. According to tables 3 and 4, Abbott RT PCR hrHPV, Linear Array and Cobas-4800 were as sensitive and as specific as HC2, while most of the remaining tests were less sensitive but more specific than HC2 in detecting CIN2+ or CIN3+ in LSIL triage.

Table 2. Posterior relative sensitivity and specificity and the corresponding 95% credible interval of other tests relative to HC2 in detecting cervical intraepithelial neoplasia of grade three or worse in women with atypical squamous cervical cells of unspecified significance estimated by the arm-based model.

                               Relative Sensitivity      Relative Specificity
Test                           Mean    Lower   Upper     Mean    Lower   Upper
HC2                            1       1       1         1       1       1
Conventional Cytology (CC)     0.80    0.31    1.06      1.18    0.67    1.66
Liquid-Based Cytology (LBC)    0.83    0.69    0.94      1.45    1.24    1.63
Generic PCR assays             0.84    0.69    0.96      1.13    0.83    1.41
Abbott RT PCR hrHPV            1.03    0.94    1.08      0.98    0.70    1.22
Linear Array                   1.03    0.99    1.06      0.82    0.64    1.00
Cobas-4800                     1.03    0.97    1.07      0.99    0.75    1.23
P16                            0.88    0.79    0.95      1.34    1.16    1.51
P16/Ki67                       1.02    0.90    1.07      1.32    1.08    1.58
HPV Proofer (mRNA)             0.86    0.76    0.95      1.59    1.43    1.74
APTIMA (mRNA)                  0.99    0.94    1.02      1.16    1.02    1.28

Table 3. Posterior relative sensitivity and specificity and the corresponding 95% equal-tailed credible interval of other tests relative to HC2 in detecting cervical intraepithelial neoplasia of grade two or worse in women with low-grade squamous intraepithelial cervical lesions, as estimated by the arm-based model.

                               Relative Sensitivity      Relative Specificity
Test                           Mean    Lower   Upper     Mean    Lower   Upper
HC2                            1       1       1         1       1       1
Conventional Cytology (CC)     0.86    0.72    0.96      1.50    1.15    1.91
Liquid-Based Cytology (LBC)    0.82    0.70    0.93      1.87    1.54    2.22
Generic PCR assays             0.87    0.77    0.94      1.26    0.95    1.59
Abbott RT PCR hrHPV            0.98    0.91    1.03      1.21    0.92    1.54
Linear Array                   0.98    0.93    1.02      0.97    0.77    1.20
Cobas-4800                     0.97    0.90    1.02      1.14    0.85    1.45
P16                            0.83    0.77    0.89      2.07    1.84    2.30
P16/Ki67                       0.91    0.84    0.97      2.18    1.93    2.42
HPV Proofer (mRNA)             0.68    0.58    0.77      2.50    2.25    2.76
APTIMA (mRNA)                  0.95    0.90    0.99      1.43    1.24    1.63

[Figure 4: forest plot with two panels, Sensitivity and Specificity (axes from 0 to 1), and the number of studies N for HC2, CC, LBC, Generic PCR, Abbott, Linear Array, Cobas, p16, p16/Ki67, HPV Proofer and APTIMA. Legend: posterior median [95% equal-tailed credible interval] from the contrast-based model (studies with at least two tests, one of them being HC2), the arm-based model (studies with at least two tests, one of them being HC2) and the arm-based model (all available studies); study-specific sensitivity and specificity.]

Figure 4. Abbott and Cobas were the most sensitive tests according to the arm- and contrast-based models respectively in cervical intraepithelial neoplasia of grade three or worse in women with atypical squamous cervical cells of unspecified significance. The posterior median of sensitivity estimated by the contrast-based model is in general more extreme.

Variance components. The total variability in sensitivity (on the logit scale) from a compound symmetry working variance-covariance structure ranged from 0.23 [0.02, 0.76] (see supplementary material: Results1e.xlsx) in tests used to detect CIN3+ in ASC-US triage to 0.63 [0.40, 1.01] in tests used to detect CIN2+ in LSIL triage. The percentage of total variability in logit sensitivity attributable to between-study variability ranged from 42.52% [0.31%, 99.47%] in tests used to detect CIN3+ in LSIL triage to 81.38% [29.26%, 99.77%] in tests used to detect CIN2+ in ASC-US triage.

[Figure 5: forest plot with two panels, Sensitivity and Specificity (axes from 0 to 1), and the number of studies N for HC2, CC, LBC, Generic PCR, Abbott, Linear Array, Cobas, p16, p16/Ki67, HPV Proofer and APTIMA. Legend: posterior median [95% equal-tailed credible interval] from the contrast-based model (studies with at least two tests, one of them being HC2), the arm-based model (studies with at least two tests, one of them being HC2) and the arm-based model (all available studies); study-specific sensitivity and specificity.]

Figure 5. Linear Array and HPV Proofer were the most sensitive and the most specific tests, respectively, in detecting cervical intraepithelial neoplasia of grade three or worse in women with low-grade squamous intraepithelial cervical lesions. Compared to the arm-based model, the credible intervals from the contrast-based model are wider.

Similarly for logit specificity, the total variability ranged from 0.37 [0.25, 0.58] in tests used to detect CIN3+ in LSIL triage to 0.53 [0.40, 0.74] in tests used to detect CIN2+ in ASC-US triage. Of the total variability in logit specificity, between-study heterogeneity accounted for as little as 61.51% [33.91%, 78.40%] in tests used to detect CIN2+ in ASC-US triage and as much as 77.45% [59.42%, 87.33%] in tests used to detect CIN2+ in LSIL triage.


Table 4. Posterior relative sensitivity and specificity and the corresponding 95% credible interval of other tests relative to HC2 in detecting cervical intraepithelial neoplasia of grade three or worse in women with low-grade squamous intraepithelial cervical lesions as estimated by the arm-based model.

                               Relative Sensitivity      Relative Specificity
Test                           Mean    Lower   Upper     Mean    Lower   Upper
HC2                            1       1       1         1       1       1
Conventional Cytology (CC)     0.73    0.32    1.00      1.93    1.10    2.78
Liquid-Based Cytology (LBC)    0.85    0.70    0.94      2.05    1.62    2.48
Generic PCR assays             0.81    0.67    0.92      1.72    1.24    2.21
Abbott RT PCR hrHPV            1.03    0.99    1.05      1.12    0.79    1.49
Linear Array                   1.03    1.00    1.05      0.90    0.66    1.19
Cobas-4800                     0.99    0.93    1.03      1.13    0.81    1.52
P16                            0.86    0.77    0.92      2.17    1.83    2.54
P16/Ki67                       0.98    0.91    1.03      1.86    1.41    2.32
HPV Proofer (mRNA)             0.77    0.64    0.86      2.73    2.38    3.08
APTIMA (mRNA)                  1.00    0.96    1.03      1.41    1.19    1.65

In other words, there was in general a stronger correlation between any two logit specificities in a given study than between any two logit sensitivities, as indicated by the larger between-study variability of logit specificity.
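The link between the share of between-study variability and the within-study correlation can be made concrete with a short sketch. Under a compound symmetry structure, the correlation between any two logit-transformed accuracies measured in the same study equals the between-study variance divided by the total variance. The function name and the variance values below are hypothetical and chosen only for illustration; they are not estimates from the supplementary material.

```python
def between_study_share(between_study_var: float, within_study_var: float) -> float:
    """Share of total variance due to between-study variability.

    Under compound symmetry this is also the correlation between any two
    logit-transformed accuracies of tests evaluated in the same study.
    """
    if between_study_var < 0 or within_study_var < 0:
        raise ValueError("variances must be non-negative")
    total = between_study_var + within_study_var
    if total == 0:
        raise ValueError("total variance must be positive")
    return between_study_var / total


if __name__ == "__main__":
    # Hypothetical variance components on the logit scale
    print(round(between_study_share(0.30, 0.10), 2))  # larger share, stronger correlation
    print(round(between_study_share(0.10, 0.30), 2))  # smaller share, weaker correlation
```

A larger between-study share therefore translates directly into a stronger correlation among the tests evaluated within one study.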

The correlation between logit sensitivity and logit specificity was negative but not statistically significant, except among tests used to detect CIN2+ in the LSIL triage group (ρ = -0.61 [-0.88, -0.26]). The non-significant correlation parameters suggest the absence of an overall study effect in the respective data.

Sensitivity analysis. The sensitivity analysis did not show any notable change in the mean structure under different priors for the variance-covariance parameters (see supplementary material: Results1b-1d.xlsx). Based on the MCMC error, the variance-covariance matrix Σ was better sampled and less autocorrelated with the LKJ and Cauchy prior distributions.

AB versus CB model (black and red vs. blue)

The data were re-analysed to compare the estimates from the AB and CB models. Studies included in the re-analysis evaluated at least two tests, one of them being HC2, which was also set as the common comparator. The network plot of the studies included in the re-analyses is shown in figure 6.

The posterior estimates of the re-analysis are in the supplementary materials. With a lower WAIC, the compound symmetry structure assumed in the AB model also fitted the reduced datasets more appropriately (see supplementary material: Results2a.xlsx vs. Results2b.xlsx). The CB model with unstructured variance-covariance structure was unidentifiable. The block-diagonal variance-covariance structure modelled by the CB model fitted the reduced datasets better than the simple diagonal variance structure (see supplementary material: Results3a.xlsx vs. Results3b.xlsx). Overall, the AB model had a lower WAIC than the best CB model and therefore fitted the reduced datasets more appropriately. A graphical summary of the results from the second and third analyses is presented in figures 2, 3, 4 and 5, where the red and blue diamonds represent the AB and CB models respectively. Overall, there are discrepancies between the locations of the black, red and blue diamonds.

[Figure 6: four network plots connecting HC2, CC, LBC, Generic PCR, Abbott, Linear Array, Cobas, p16, p16/Ki67, HPV Proofer and APTIMA. Panels: Triage: ASC-US, Outcome: CIN2+ (number of studies = 42); Triage: ASC-US, Outcome: CIN3+ (number of studies = 23); Triage: LSIL, Outcome: CIN2+ (number of studies = 35); Triage: LSIL, Outcome: CIN3+ (number of studies = 21).]

Figure 6. Network plot: Included studies evaluated at least two tests with HC2 as the common comparator.


The locations of the black diamonds are estimated from all available data, including studies evaluating single tests, while the locations of the red and blue diamonds are determined by studies evaluating at least two tests, one of them being HC2.

As a consequence of excluding some studies, the locations of the black (full datasets) and red (reduced datasets) diamonds are different, though obtained using the same AB model. This illustrates the obvious fact that exclusion of studies could lead to different conclusions and recommendations.

The posterior medians and the 95% equal-tailed credible intervals from the CB model are quite different from those of the AB model, even with similar datasets. The wider credible intervals from the CB model could be attributed to the loss of information from ignoring the correlation between the contrasts and, as a consequence, the correlation between sensitivity and specificity. Moreover, the best fitting CB covariance structure was the block-diagonal of order K2. The over-parameterisation of the covariance structure might also have contributed to the inefficient estimation and potentially poor assessment of the standard errors of the marginal means. Furthermore, difficulty in CB model identification is known to increase with the number of tests of interest in the model [4].

As a cascade effect, the ranking of the tests based on the DOR and the superiority index also changes (see supplementary material: Results1e.xlsx vs. Results2e.xlsx vs. Results3b.xlsx).

Discussion

In this paper, we propose a conceptually simple model to estimate sensitivity and specificity of multiple tests within a network meta-analysis framework analogous to a single-factor analysis of variance method with repeated measures.

The model is based on the assumption that all the tests were hypothetically used but missing at random. When the mechanism of missing data is not a crucial aspect of inference, models that ignore the missing-value mechanism and use only the observed data, as the proposed model does, provide valid answers under a missing at random (MAR) process. In contrast to the CB model, the proposed AB model uses all available data, in line with the principle of intention-to-treat (ITT) [2]. The missing 'unobservable' sensitivities and specificities are parameters that are estimated along with the other parameters in the model, based on the exchangeability assumption. The cost, however, is that the model assumptions cannot be formally checked from the data under analysis.

When the data were never intended to be collected in the first place, the MAR assumption has been shown to hold, as is the case in diagnostic studies where older tests become less used and new tests become progressively more available with time [35].

In the analysis, we included studies with at least one test. This is acceptable because such studies still provide partial information allowing estimation of the mean and the variance-covariance parameters; only the study effect estimates might have larger standard errors [30].

The proposed AB model allows for easy estimation of the marginal means and credible intervals for the intra-class correlation. Bayesian methods are known to be computationally intensive, but with efficient sampling algorithms such as the Hamiltonian Monte Carlo sampling implemented in Stan [31], convergence to a stationary distribution is accelerated even with poor initial values. Furthermore, parallel chain processing greatly reduces computational time.

With the logit transformation, it is assumed that the transformed data are approximately normal with constant variance. For binary data as well as proportions, the mean and variance depend on the underlying probability. Therefore, any factor affecting the probability will change the mean and the variance. This implies that a linear model where the predictors affect the mean but assume a constant variance will not be adequate. Nonetheless, when the model for the mean is correct but the true distribution is not normal, the maximum likelihood (ML) estimates of the model parameters will be consistent but the standard errors will be incorrect [36]. An alternative to the logit transformation would be a variance stabilizing angular transformation; however, the variance stabilizing property of the transform depends on each n being large [37].
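The dependence of the variance on the underlying probability can be illustrated with the usual large-sample (delta method) approximations, offered here as a sketch rather than as part of the fitted model: the variance of a logit-transformed proportion is approximately 1/(n p (1 − p)), whereas the variance of the angular transform arcsin(√p̂) is approximately 1/(4n). The function names are hypothetical.

```python
def approx_var_logit(p: float, n: int) -> float:
    """Delta-method variance of logit(p_hat) for a binomial proportion."""
    if not 0.0 < p < 1.0 or n <= 0:
        raise ValueError("need 0 < p < 1 and n > 0")
    return 1.0 / (n * p * (1.0 - p))


def approx_var_angular(n: int) -> float:
    """Delta-method variance of arcsin(sqrt(p_hat)); free of p for large n."""
    if n <= 0:
        raise ValueError("need n > 0")
    return 1.0 / (4.0 * n)


if __name__ == "__main__":
    n = 100
    for p in (0.5, 0.9, 0.99):
        # The logit variance grows as p approaches 1; the angular variance does not.
        print(p, round(approx_var_logit(p, n), 4), round(approx_var_angular(n), 4))
```

For a fixed n, the approximate logit variance increases sharply as the probability approaches 0 or 1, while the approximate angular variance stays constant, which is the variance stabilizing property referred to above.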

The natural and optimal modelling approach would be to use the beta distribution. This was the motivation behind our work on copula based bivariate beta distribution in meta-analysis of diagnostic data [38, 39]. Our further research will focus on how different mean and correlation structures are accommodated and modelled using the beta-binomial distribution in network meta-analysis of diagnostic data.

There were discrepancies in identifying the best test between the DOR and the superiority index. While both measures range from 0 to infinity, the DOR yields larger values than the superiority index. From the full dataset, the superiority index consistently identified p16/Ki67 as the best test in detecting cervical precancer in women with equivocal or mildly abnormal cervical cells. From the reduced data, the DOR identified tests with very low sensitivity but high specificity, or vice versa, as the best, in disagreement with the superiority index. This illustrates that the DOR cannot distinguish between tests with high sensitivity but low specificity and tests with low sensitivity but high specificity. In contrast, the superiority index gives more weight to tests performing relatively well on both diagnostic accuracy measures and less weight to tests performing poorly on both measures or performing better on one measure but poorly on the other [26]. Nonetheless, neither measure allows one parameter to be prioritised, which may be clinically appropriate. For example, the presence of CIN2+ and especially CIN3+ indicates a considerable risk of developing cervical cancer, and therefore cases should not be missed by a test. Consequently, higher test sensitivity has more clinical importance than specificity.
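The symmetry of the DOR in sensitivity and specificity can be checked with a few lines of arithmetic. The sketch below uses the standard definition of the diagnostic odds ratio [25]; plugging in point estimates, as done here, only approximates the posterior summaries reported in the supplementary material, and the function name is hypothetical.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """DOR = [sens / (1 - sens)] / [(1 - spec) / spec]."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    return (sensitivity / (1.0 - sensitivity)) / ((1.0 - specificity) / specificity)


if __name__ == "__main__":
    # Swapping sensitivity and specificity leaves the DOR unchanged.
    print(round(diagnostic_odds_ratio(0.9, 0.5), 2), round(diagnostic_odds_ratio(0.5, 0.9), 2))
    # Point estimates quoted in the text for CIN3+ in LSIL triage.
    print(round(diagnostic_odds_ratio(0.99, 0.28), 1))  # Abbott RT PCR hrHPV
    print(round(diagnostic_odds_ratio(0.95, 0.47), 1))  # p16/Ki67
```

A test with sensitivity 0.9 and specificity 0.5 has the same DOR as one with sensitivity 0.5 and specificity 0.9, so the DOR alone cannot express a clinical preference for sensitivity.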

Incoherence or inconsistency within NMA, where the direct and indirect evidence for the same contrast differ substantially, is a major concern. Lu and Ades (2006) [40], Dias et al. (2010) [41] and Krahn et al. (2014) [42] explain how to visualize, detect and handle inconsistencies. Since the AB model implicitly assumes consistency, the methods used to detect and quantify inconsistency in CB models need not be used in the AB models.

For the AB models, Hong et al. (2015) [43] measure inconsistency by the data-driven magnitude of bias, that is, the discrepancy between observed and imputed treatment (test) effects, while Piepho (2014) [44] groups studies according to the set of tests included and introduces an interaction term, design by test, to represent inconsistency.


From our viewpoint, inconsistency/incoherence is a form of heterogeneity between the studies which is often due to missing information in an outlying or influential study. In our model, the influence of the study on the mean is adequately captured by study effects, and the fact that the model hypothetically allows any two tests to be compared directly within each study makes inconsistency less of an issue. That said, it is important to assess and identify the influence of certain observations on the marginal mean. Detection of influential observations within the Bayesian framework is a computationally involved exercise and still an active research area.

This article does not consider individual-level data, for which the model adaptation is automatic. Future research includes a study on the impact of various aspects of data missingness on the robustness of the models.

Colposcopy, as used in the studies of our motivating datasets, is not a perfect standard. The proposed methodology does not take this into account; the results might be somewhat biased and important differences between tests may be obscured [45].

Approaches to minimize the bias due to the use of an imperfect gold standard include the application of latent class models [4], studying sensitivity and specificity in different populations, use of an expert review panel to arrive at a less error-prone diagnosis [8] and framing the problem in terms of clinical outcomes rather than just accuracy [45]. A review of these approaches is presented by Reitsma et al. (2009) [46]. The integration of such approaches within the AB network modelling is beyond the scope of this paper but is scheduled as future research.

Partial unbalanced verification of test-positive and test-negative subjects may yield substantial bias in the accuracy estimates. Overestimation of sensitivity and underestimation of specificity is the typical result when the majority of test-positives and only a small fraction of test-negatives are submitted to gold standard assessment. In the motivating dataset, partial verification was minimized in principle by restricting inclusion to studies with (nearly) complete assessment with the reference standard [12, 13]. There are other methods to adjust for verification bias [8, 47] when verification of all study subjects is not possible.
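The direction of the bias described above can be reproduced with a small worked example. The sketch assumes that verification depends only on the test result and works with expected counts; the counts themselves (a cohort of 1000 women with true sensitivity and specificity of 0.80) and the function name are hypothetical.

```python
def naive_accuracy(tp, fp, fn, tn, frac_pos_verified, frac_neg_verified):
    """Naive sensitivity and specificity computed from verified subjects only.

    tp, fp, fn, tn are the full-cohort counts; a fraction of test-positives and
    a fraction of test-negatives are assumed to be verified at random.
    """
    v_tp = tp * frac_pos_verified
    v_fp = fp * frac_pos_verified
    v_fn = fn * frac_neg_verified
    v_tn = tn * frac_neg_verified
    sensitivity = v_tp / (v_tp + v_fn)
    specificity = v_tn / (v_tn + v_fp)
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical cohort: 100 diseased (80 TP, 20 FN), 900 non-diseased (180 FP, 720 TN)
    print(naive_accuracy(80, 180, 20, 720, 1.0, 1.0))   # complete verification
    print(naive_accuracy(80, 180, 20, 720, 1.0, 0.1))   # only 10% of test-negatives verified
```

With complete verification both measures equal 0.80; when only a tenth of the test-negatives are verified, the naive sensitivity rises to about 0.98 and the naive specificity falls to about 0.29.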

Conclusion

The proposed AB model contributes to the knowledge on methods used in systematic reviews of diagnostic data in the presence of more than two competing tests. The AB model is more appealing than the CB model for meta-analyses of diagnostic studies because the model parameters permit a more straightforward interpretation and the model uses all available data. Furthermore, more natural variance-covariance matrix structures can be easily accommodated.

References

1. Higgins J and Whitehead A. Borrowing strength from external trials in a meta-analysis. Statistics in medicine 1996; 15(24): 2733–2749.


2. Fisher L, Dixon D, Herson J et al. Intention to treat in clinical trials. In E PK (ed.) Statisticalissues in drug research and development. New York: Marcel Dekker, 1989. pp. 331–350.

3. Lumley T. Network meta-analysis for indirect treatment comparisons. Statistics in medicine2002; 21(16): 2313–2324.

4. Menten J and Lesaffre E. A general framework for comparative bayesian meta-analysis ofdiagnostic studies. BMC medical research methodology 2015; 15(1): 1.

5. Senn S. The many modes of meta. Drug Information Journal 2000; 34(2): 535–549.6. Whitehead A. Meta-analysis of controlled clinical trials, volume 7. West Sussex: John Wiley

& Sons, 2002.7. Piepho HP, Williams E and Madden L. The use of two-way linear mixed models in

multitreatment meta-analysis. Biometrics 2012; 68(4): 1269–1277.8. Zhou XH, Obuchowski NA and McClish DK. Design of Diagnostic Accuracy Studies.

Hoboken, New Jersey: John Wiley & Sons, 2011.9. Sinclair JC and Bracken MB. Clinically useful measures of effect in binary analyses of

randomized trials. Journal of clinical epidemiology 1994; 47(8): 881–889.10. Davies HTO, Crombie IK and Tavakoli M. When can odds ratios mislead? Bmj 1998;

316(7136): 989–991.11. Zhang J, Carlin BP, Neaton JD et al. Network meta-analysis of randomized clinical trials:

Reporting the proper summaries. Clinical Trials 2014; 11(2): 246–262.12. Arbyn M, Ronco G, Anttila A et al. Evidence regarding human papillomavirus testing in

secondary prevention of cervical cancer. Vaccine 2012; 30: F88–F99.13. Arbyn M, Roelens J, Simoens C et al. Human papillomavirus testing versus repeat cytology

for triage of minor cytological cervical lesions. Cochrane Database of Systematic Reviews2013; 3.

14. Arbyn M, Roelens J, Cuschieri K et al. The aptima hpv assay versus the hybrid capture 2test in triage of women with asc-us or lsil cervical cytology: A meta-analysis of the diagnosticaccuracy. International Journal of Cancer 2013; 132(1): 101–108.

15. Roelens J, Reuschenbach M, von Knebel Doeberitz M et al. p16ink4a immunocytochemistryversus human papillomavirus testing for triage of women with minor cytologic abnormalities.Cancer cytopathology 2012; 120(5): 294–307.

16. Verdoodt F, Szarewski A, Halfon P et al. Triage of women with minor abnormal cervicalcytology: Meta-analysis of the accuracy of an assay targeting messenger ribonucleic acid of 5high-risk human papillomavirus types. Cancer cytopathology 2013; 121(12): 675–687.

17. Bosch F, Lorincz A, Munoz N et al. The causal relation between human papillomavirus andcervical cancer. Journal of clinical pathology 2002; 55(4): 244–265.

18. Arbyn M, Ronco G, Cuzick J et al. How to evaluate emerging technologies in cervical cancerscreening? International Journal of Cancer 2009; 125(11): 2489–2496.

19. Daniels MJ and Pourahmadi M. Modeling covariance matrices via partial autocorrelations.Journal of Multivariate Analysis 2009; 100(10): 2352–2363.

20. Reitsma JB, Glas AS, Rutjes AW et al. Bivariate analysis of sensitivity and specificity producesinformative summary measures in diagnostic reviews. Journal of clinical epidemiology 2005;58(10): 982–990.

21. Chu H and Cole SR. Bivariate meta-analysis of sensitivity and specificity with sparse data: a generalized linear mixed model approach. Journal of clinical epidemiology 2006; 59(12): 1331–1332.

22. Altham PM. Improving the precision of estimation by fitting a model. Journal of the Royal Statistical Society, Series B 1984; 46(1): 118–119.

23. Verbeke G and Molenberghs G. Inference for the variance components. In Linear Mixed Models for Longitudinal Data. New York: Springer Series in Statistics, 2002. p. 64.

24. Watanabe S. Asymptotic equivalence of bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research 2010;: 3571–3594.

25. Glas AS, Lijmer JG, Prins MH et al. The diagnostic odds ratio: a single indicator of test performance. Journal of clinical epidemiology 2003; 56(11): 1129–1135.

26. Deutsch R, Mindt MR, Xu R et al. Quantifying relative superiority among many binary-valued diagnostic tests in the presence of a gold standard. Journal of Data Science 2009; 7(2): 161–177.

27. Rubin DB. Inference and missing data. Biometrika 1976; 63(3): 581–592.

28. Little RJ and Rubin DB. Statistical analysis with missing data. New York: John Wiley & Sons, 1987.

29. Lewandowski D, Kurowicka D and Joe H. Generating random correlation matrices based on vines and extended onion method. Journal of multivariate analysis 2009; 100(9): 1989–2001.

30. Gelman A and Hill J. Multilevel linear models: varying slopes, non-nested models, and other complexities. In Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge, United Kingdom: Cambridge University Press, 2006. p. 286.

31. Carpenter B, Gelman A and Hoffman M. Stan: A probabilistic programming language. Journal of Statistical Software In press.

32. Hoffman MD and Gelman A. The no-u-turn sampler: Adaptively setting path lengths in hamiltonian monte carlo. The Journal of Machine Learning Research 2014; 15(1): 1593–1623.

33. Team RC. A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2015. URL http://www.R-project.org/.

34. Team SD. Stan: A C++ Library for Probability and Sampling, Version 2.9.0, 2015. URL http://mc-stan.org/.

35. Schafer JL and Graham JW. Missing data: our view of the state of the art. Psychological methods 2002; 7(2): 147.

36. Agresti A. Generalized linear models for counts. In Categorical Data Analysis, 2nd ed. New York: John Wiley & Sons, 2002. p. 131.

37. Crowder MJ. Beta-binomial anova for proportions. Applied Statistics 1978; : 34–37.

38. Nyaga V. CopulaDTA: Copula based bivariate beta-binomial model for diagnostic test accuracy studies, 2015. URL https://cran.r-project.org/package=CopulaDTA. R package version 0.0.2.

39. Nyaga V, Aerts M and Arbyn M. Marginal models for meta-analysis of diagnostic accuracy studies in frequentist and bayesian framework using rstan and copularemada. Archives of Public Health 2015; 73(Suppl 1): O4.

40. Lu G and Ades A. Combination of direct and indirect evidence in mixed treatment comparisons. Statistics in medicine 2004; 23(20): 3105–3124.


41. Dias S, Welton N, Caldwell D et al. Checking consistency in mixed treatment comparison meta-analysis. Statistics in medicine 2010; 29(7-8): 932–944.

42. Krahn U, Binder H and Konig J. Visualizing inconsistency in network meta-analysis by independent path decomposition. BMC medical research methodology 2014; 14(1): 131.

43. Hong H, Chu H, Zhang J et al. A bayesian missing data framework for generalized multiple outcome mixed treatment comparisons. Research Synthesis Methods 2015.

44. Piepho HP. Network-meta analysis made easy: detection of inconsistency using factorial analysis-of-variance models. BMC medical research methodology 2014; 14(1): 61.

45. Valenstein PN. Evaluating diagnostic tests with imperfect standards. American Journal of Clinical Pathology 1990; 93(2): 252–258.

46. Reitsma JB, Rutjes AW, Khan KS et al. A review of solutions for diagnostic accuracy studies with an imperfect or missing reference standard. Journal of clinical epidemiology 2009; 62(8): 797–806.

47. Arbyn M, Ronco G, Cuzick J et al. How to evaluate emerging technologies in cervical cancer screening? International journal of cancer 2009; 125(11): 2489–2496.

Funding

Nyaga V received financial support from the Scientific Institute of Public Health (Brussels) through the OPSADAC project. Arbyn M was supported by the COHEAHR project funded by the 7th Framework Programme of the European Commission (grant No 603019). Aerts M was supported by the IAP research network nr P7/06 of the Belgian Government (Belgian Science Policy).

Acknowledgements

We are grateful to the two anonymous reviewers for all their valuable comments that helped to improve the paper considerably.

Supplemental material

1. Model-code.txt: A text file with code of the fitted models in Stan language.

2. Model-fitting.txt: A text file with code to fit the models and reproduce the results.

3. mydata.csv: A file with the data used in this analysis.

4. Results1a-e.xlsx: A file with posterior diagnostic accuracy estimates as estimated by the AB model from all the available data.

5. Results2a-e.xlsx: A file with posterior diagnostic accuracy estimates as estimated by the AB model from studies that evaluated at least two tests with one of them being HC2.

6. Results3a-b.xlsx: A file with posterior diagnostic accuracy estimates as estimated by the CB model from studies that evaluated at least two tests with one of them being HC2.


5. Beta-binomial Analysis of Variance Model for Network Meta-analysis of Diagnostic Test Accuracy Data

This chapter has been published as: Nyaga VN, Arbyn M and Aerts M. Beta-binomial analysis of variance model for network meta-analysis of diagnostic test accuracy data. Stat Methods Med Res, 2016. Prepublished Jan 1, 2016; DOI: 10.1177/0962280216682532


Beta-binomial analysis of variance model for network meta-analysis of diagnostic test accuracy data

Journal Title XX(X): 1–18
© The Author(s) 2016
Reprints and permission: sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/ToBeAssigned
www.sagepub.com/

Victoria N Nyaga1,2, Marc Arbyn1 and Marc Aerts2

Abstract

There are several generalized linear mixed models to combine direct and indirect evidence on several diagnostic tests from related but independent diagnostic studies simultaneously, an approach also known as network meta-analysis. The popularity of these models is due to the attractive features of the normal distribution and the availability of statistical software to obtain parameter estimates. However, modeling the latent sensitivity and specificity using the normal distribution after transformation is neither natural nor computationally convenient.

In this article, we develop a meta-analytic model based on the bivariate beta distribution, which yields improved and direct estimates of the global sensitivities and specificities of all tests involved and simultaneously takes into account the intrinsic correlation between sensitivity and specificity and the overdispersion due to repeated measures.

Using the beta distribution in regression has the following advantages. First, the probabilities are modeled on their proper scale rather than through a monotonic transform of the probabilities. Secondly, the model is flexible as it allows for the asymmetry often present in the distribution of bounded variables such as proportions, which is the case with sparse data common in meta-analysis. Thirdly, the model provides parameters with direct meaningful interpretation since further integration is not necessary to obtain the meta-analytic estimates.

Keywords

network meta-analysis, diagnostic studies, generalized linear mixed models, beta distribution, proportions, meta-analysis

1 Scientific Institute of Public Health, Unit of Cancer Epidemiology, Belgian Cancer Center, Brussels, Belgium
2 Hasselt University, I-Biostat, Diepenbeek, Belgium

Corresponding author: Marc Arbyn
Email: [email protected]

Prepared using sagej.cls [Version: 2015/06/09 v1.01]


Introduction

Network meta-analyses combine direct and indirect evidence from related but independent studies on all outcomes simultaneously. This improves the estimation process by sharing information between studies and thus yields more precise estimates, especially for diagnostic tests evaluated in a small number of studies. The comprehensive and unified analysis uses the data more efficiently, decreases the chance of finding spurious significant treatment effects, and avoids the multiplicity problems where adjustment for multiple comparisons might be needed.1 Furthermore, comparison of more than two tests is of greater relevance to different stakeholders (clinicians, policy makers, epidemiologists) in deciding which set of tests to use in practice.

Several methods falling within the general framework of generalized linear mixed models (GLMM) for comparing more than two diagnostic tests have been proposed. The earliest proposal, by Siadaty et al.,2 adopted generalized estimating equations on studies with at least one test. The model treats the correlation as a nuisance parameter and corrects for overdispersion without modeling it. Achana et al.3 and Dimou et al.4 extended the classic random effects meta-analysis5 and fitted a general linear model to the logit transformed sensitivity and specificity for two or more diagnostic tests. Such models require use of an ad hoc continuity correction when the number of true positives, true negatives, false positives, or false negatives is zero in a study. Menten and Lesaffre6 proposed a contrast-based model based on the relative odds ratio of other tests against a common benchmark test and made the assumption that the two tests were assessed on paired patients within each study. This model requires studies to evaluate at least two diagnostic tests, focuses on the odds ratio, and therefore has certain limitations.6–8 To overcome these limitations, Nyaga, Aerts, and Arbyn9 proposed an arm-based (AB) logistic-binomial hierarchical model following the concept of repeated measures and the assumption that the missing tests/arms were missing at random.

GLMMs are part of the unified theory for modeling dependence structures, and they are popular owing to the attractive features of the normal distribution and the availability of statistical software to obtain parameter estimates from such models. These models assume that the transformed study-specific sensitivity and specificity (often logit transformed) have approximately a normal distribution with constant variance. For proportions such a model is structurally flawed because the variance depends on the underlying probability. The constant variance condition is rarely satisfied (indeed, as the probability moves to 0 or 1, the variance moves toward zero, and it is highest when the probability is 0.5). Furthermore, the dependence between the mean and variance implies that the parameter space for the two parameters is constrained. This is in contrast with the functionally independent and unbounded variance parameters in the normal distribution.
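The mean–variance dependence described above can be checked numerically. The following minimal Python sketch evaluates the binomial variance of an observed proportion, p(1 − p)/n, at several values of p. The function name `proportion_variance` and the study size of 100 are illustrative assumptions and are not taken from the original analysis.

```python
def proportion_variance(p, n):
    """Variance of an observed proportion Y/n when Y ~ Binomial(n, p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    return p * (1.0 - p) / n


if __name__ == "__main__":
    n = 100  # hypothetical number of diseased (or healthy) individuals
    for p in (0.01, 0.10, 0.50, 0.90, 0.99):
        # The variance peaks at p = 0.5 and shrinks toward 0 near the bounds.
        print(f"p = {p:.2f}  var = {proportion_variance(p, n):.6f}")
```

The variance is largest at p = 0.5 and vanishes at the boundaries, so a single constant variance cannot describe sensitivities or specificities that lie close to 1.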

The assumption of normality for the random effects is not natural and is difficult to defend empirically when there is a small number of studies. With a small number of studies, there could be problems estimating the between-study variability, requiring informative priors,10 which are often difficult to specify. Alternatively, the often unrealistic assumption of homogeneity is applied.11 Since the random effects are latent, it is difficult to validate a model, yet the particular parametric model for the random effects can possibly impact the conclusions of the meta-analysis.10;12 Furthermore, the parameters have a study-specific interpretation and do not naturally describe the overall mean sensitivity and specificity and the heterogeneity thereof. Finally, integrating out the random effects to obtain the mean sensitivity and specificity is complex. No analytical solution exists, so numerical integration using Markov chain Monte Carlo (MCMC) or other approximation techniques is required.

Since both sensitivity and specificity take values in the interval (0, 1), it is natural to model them using a bivariate beta without the need for a transformation. This allows for the asymmetry typically present in the distribution of variables with restricted range such as proportions and accommodates overdispersion appropriately. The beta distribution is conjugate to the binomial distribution and therefore the random effects are easily integrated out, resulting in a beta-binomial marginal distribution. Such a modeling approach yields direct marginal effects, averaged over all the studies, which are mostly of interest in meta-analyses.

Within the AB model framework, we develop a statistical model built from marginal beta distributions for sensitivity and specificity linked by a copula density. The resulting bivariate beta density describes the distribution of sensitivity and specificity jointly. The advantage of using the copula approach is that model estimation proceeds in two separate stages. First, the marginal distributions for sensitivity and specificity are estimated separately and then the dependence structure is estimated. Furthermore, different copula densities can be used, each resulting in a different bivariate beta density. The model builds further on the work of Nikoloulopoulos13 and our previous work on the use of copulas in meta-analysis of diagnostic accuracy data14 and AB network meta-analysis of diagnostic accuracy data.9

In the next section, we give a description of an example data set to which the model developed in this paper is applied. The data set is derived from a comprehensive series of meta-analyses on the accuracy of triage with human papillomavirus (HPV) assays, cervical cytology, or molecular markers applied on cervical specimens in women with minor cervical abnormalities. We then describe the proposed method, followed by a brief introduction of copula theory, missing data, and exchangeability. This is then followed by the parameter estimation procedure. We then present the results of the data, followed by a discussion and conclusion.

Data set

To demonstrate how the developed model can be applied to a real data set, we examine the accuracy of 11 tests in detecting cervical pre-cancer in women with equivocal or mildly abnormal cervical cells previously detected by a Pap smear. A Pap smear is a screening test used to detect cervical pre-cancer. When abnormalities in the Pap smear are not high grade, a triage test is needed to identify the women who need referral for further diagnostic work-up. There are several triage options, such as repetition of the Pap smear or triage with HPV assays, cervical cytology, or molecular markers.

We combine information from 125 studies from a series of meta-analyses15–19 that examined at least one and a maximum of six triage tests, making a total of 11 triage tests used for the detection of cervical intraepithelial neoplasia of grade two or worse (CIN2+) or of grade three or worse (CIN3+) in women with atypical squamous cells of unspecified significance (ASC-US) or low-grade squamous intraepithelial lesions (LSILs). Presence or absence of CIN2+ and CIN3+ was ascertained by colposcopy and/or biopsy. This was followed by a histological verification of biopsy specimens when colposcopy was positive, assuming absence of disease when colposcopy was negative.

[Figure 1: four network plots. Panels: triage ASC-US, outcome CIN2+ (number of studies = 107); triage ASC-US, outcome CIN3+ (number of studies = 51); triage LSIL, outcome CIN2+ (number of studies = 82); triage LSIL, outcome CIN3+ (number of studies = 44). Nodes in each panel: HC2, CC, LBC, Generic PCR, Abbott, Linear Array, Cobas, p16, p16/Ki67, HPV Proofer, APTIMA.]

Figure 1. Connected network plots showing that HC2 and APTIMA were the most evaluated diagnostic tests used to triage women with minor cervical cytological abnormalities. The size of the nodes and the width of the connecting lines are proportional to the number of test evaluations and the number of test comparisons, respectively.

Labeled 1 to 11, the tests were: high-risk human papillomavirus (hrHPV) DNA testing with hybrid capture-2 (HC2), Conventional Cytology (CC), Liquid-Based Cytology (LBC), generic PCRs targeting hrHPV DNA (PCR), and commercially available PCR-based hrHPV DNA assays such as Abbott RT-PCR hrHPV, Linear Array, and Cobas-4800; assays detecting mRNA transcripts of 5 (HPV Proofer) or 14 (APTIMA) HPV types; and protein markers identified by cyto-immunochemistry such as p16 and p16/Ki67, which are over-expressed as a consequence of a transforming HPV infection. The evidence network is plotted in Figure 1, with each node representing a test and its size proportional to the number of studies evaluating the test. The edges represent a pair of tests for which at least one comparison exists, with width proportional to the number of direct comparisons. The network plots show that HC2 and APTIMA are the most evaluated tests.

Methodology

Copula function

To model the correlation between sensitivity (say, x) and specificity (say, y) for a given test, the bivariate density f(x, y) needs to be defined. f(x, y) can be constructed in several ways, one of which follows copula theory.

Following Sklar’s theorem,20;21 there exists for every bivariate distribution a copula representation C, which is unique for continuous random variables. The bivariate cumulative distribution function of sensitivity and specificity can be written as

\[ F(x, y) = C(F(x), F(y), \omega) \tag{1} \]

where C is the copula function and ω a parameter to capture the dependence structure. By differentiating the copula, the bivariate probability density of sensitivity and specificity is obtained and written as

\[ f(x, y) = f(x)\, f(y)\, c(F(x), F(y), \omega) \tag{2} \]

where f(.) is the marginal density of sensitivity or specificity and c the copula density function corresponding to the copula function C. The copula theory avoids the assumption of normality when modeling non-normal data and allows the marginal distribution and the dependence structure to be estimated separately.

The correlation between x and y is captured by the parameter ω and can be quantified by both Spearman’s correlation ρ and Kendall’s tau τ as follows

\[ \rho = 12 \int_0^1\!\!\int_0^1 F(x)\, F(y)\, dC(F(x), F(y), \omega) - 3 \tag{3} \]

\[ \tau = 4 \int_0^1\!\!\int_0^1 C(F(x), F(y), \omega)\, dC(F(x), F(y), \omega) - 1 \tag{4} \]

While it is expected that sensitivity and specificity are negatively correlated, the correlation could also be positive in some situations; it is therefore advisable to consider copula functions that model negative correlation as well as those that model both negative and positive correlation. The analyst constructs the bivariate density by specifying a particular parametric form for f(x), f(y), and c(F(x), F(y), ω). It is not necessary that f(x) and f(y) have similar parametric form. There are many different copulas to express the dependence between sensitivity and specificity. A non-exhaustive list of candidate copula functions is available in Nelsen.21

In our analysis, we will consider Frank’s22 copula because it is extremely flexible in modeling the correlation parameter. The functional form of the copula is given by

\[ C(F(x), F(y), \omega) = -\frac{1}{\omega} \log\left(1 + \frac{(e^{-\omega F(x)} - 1)(e^{-\omega F(y)} - 1)}{e^{-\omega} - 1}\right), \qquad -\infty < \omega < \infty \tag{5} \]

Since Frank’s copula is absolutely continuous, the bivariate copula density can be written as

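\[ c(F(x), F(y), \omega) = \frac{\partial^2 C(F(x), F(y), \omega)}{\partial F(x)\, \partial F(y)} = \frac{\omega (1 - e^{-\omega})\, e^{-\omega (F(x) + F(y))}}{\left(1 - e^{-\omega} - (1 - e^{-\omega F(x)})(1 - e^{-\omega F(y)})\right)^2} \tag{6} \]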

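Spearman’s rho and Kendall’s tau are expressed as follows

\[ \rho = 1 - \frac{12\,[D_1(\omega) - D_2(\omega)]}{\omega}, \qquad \tau = 1 - \frac{4\,[1 - D_1(\omega)]}{\omega} \tag{7} \]

with

\[ D_j(\omega) = \frac{j}{\omega^j} \int_0^{\omega} \frac{t^j}{e^t - 1}\, dt, \qquad j = 1, 2 \tag{8} \]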

As ω approaches 0, the variables are independent. For ω > 0, the variables exhibit positive association and vice versa for ω < 0.
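The behaviour of Frank's copula described above can be explored with a small numerical sketch. The Python code below is an illustration only and is not the code used for the analysis: the function names are assumptions, the Debye integral is approximated with a simple midpoint rule, and the rank correlations use the standard Debye-function expressions for Frank's copula, τ = 1 − 4[1 − D1(ω)]/ω and ρ = 1 − 12[D1(ω) − D2(ω)]/ω, extended to ω < 0 by the sign symmetry of the family.

```python
import math


def frank_cdf(u, v, omega):
    """Frank copula C(u, v; omega) for u, v in [0, 1]."""
    if abs(omega) < 1e-12:          # limiting case: independence copula
        return u * v
    num = math.expm1(-omega * u) * math.expm1(-omega * v)
    return -math.log1p(num / math.expm1(-omega)) / omega


def frank_density(u, v, omega):
    """Frank copula density c(u, v; omega)."""
    if abs(omega) < 1e-12:
        return 1.0
    a = -math.expm1(-omega)         # 1 - exp(-omega)
    b = -math.expm1(-omega * u)     # 1 - exp(-omega * u)
    c = -math.expm1(-omega * v)     # 1 - exp(-omega * v)
    return omega * a * math.exp(-omega * (u + v)) / (a - b * c) ** 2


def debye(j, x, steps=2000):
    """Debye function D_j(x) for x > 0, approximated by the midpoint rule."""
    h = x / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t ** j / math.expm1(t)
    return j * total * h / x ** j


def frank_kendall_tau(omega):
    """Kendall's tau implied by the Frank copula parameter omega."""
    if abs(omega) < 1e-12:
        return 0.0
    x = abs(omega)
    tau = 1.0 - 4.0 * (1.0 - debye(1, x)) / x
    return math.copysign(tau, omega)   # tau is an odd function of omega


def frank_spearman_rho(omega):
    """Spearman's rho implied by the Frank copula parameter omega."""
    if abs(omega) < 1e-12:
        return 0.0
    x = abs(omega)
    rho = 1.0 - 12.0 * (debye(1, x) - debye(2, x)) / x
    return math.copysign(rho, omega)   # rho is an odd function of omega


if __name__ == "__main__":
    for omega in (-5.0, -1.0, 0.0, 1.0, 5.0):  # illustrative values only
        print(omega, round(frank_kendall_tau(omega), 4),
              round(frank_spearman_rho(omega), 4))
```

Negative values of ω give negative rank correlations and positive values give positive ones, which is why a single Frank parameter can cover both the expected negative and the occasional positive association between sensitivity and specificity.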

Overdispersion

Suppose there are K tests and I studies. Studies with k = 2 are called ‘two-arm’ studies while those with k > 2 are ‘multi-arm’ studies. For a certain study i, let (Yi1k, Yi2k) denote the observed true positives and observed true negatives, (Ni1k, Ni2k) the diseased and healthy individuals, and πjk the latent sensitivity (j = 1) and specificity (j = 2), respectively, of test k = 1, . . . , K.

Due to within- and between-study heterogeneity and other sources of variation, it is often the case that the latent variables πjk are not homogeneous but rather vary by study. As such the standard binomial model

\[ Y_{ijk} \sim \mathrm{bin}(\pi_{jk}, N_{ijk}) \tag{9} \]

will be inappropriate, as the model yields underestimated standard errors and consequently results in misleading inference for the regression parameters.

The data will generally exhibit greater variability than is specified by the implicit mean–variance relationship, a phenomenon called overdispersion. A general way to merely correct for overdispersion without modeling it is to assume a more general variance function by including a dispersion coefficient. The standard errors for πjk are then multiplied by the square root of the dispersion coefficient. This is the concept behind the quasi-likelihood models.

Alternatively, overdispersion is modeled in a hierarchical model. The latent variables are given a distribution leading to a compound probability model. Due to availability of the normal distribution in most common statistical software, the latent variables are usually assumed to be normally distributed following a transformation of the latent variables to the real line. Nonetheless, the resulting compound distribution usually has a complex form and numerical approximation methods are often used to obtain the parameter estimates. Such a GLMM using the logit link is illustrated by Nyaga, Aerts, and Arbyn9 in the following normal-binomial model

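\[
\begin{aligned}
Y_{ijk} \mid \pi_{ijk} &\sim \mathrm{bin}(\pi_{ijk}, N_{ijk}) \\
\mathrm{logit}(\pi_{ijk}) &= \gamma_{jk} + \eta_{ij} + \varepsilon_{ijk} \\
\begin{pmatrix} \eta_{i1} \\ \eta_{i2} \end{pmatrix} &\sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \Sigma \right) \\
\Sigma &= \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix} \\
(\varepsilon_{ij1}, \varepsilon_{ij2}, \ldots, \varepsilon_{ijK}) &\sim N(0, \mathrm{diag}(\tau_j^2)) \\
E(\mathrm{logit}(\pi_{ijk})) &= \gamma_{jk}
\end{aligned}
\tag{10}
\]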

where πijk = (πi1k, πi2k) represent the latent sensitivity and specificity, respectively, of test k in study i. logit⁻¹(γ1k) and logit⁻¹(γ2k) are, respectively, the mean sensitivity and specificity of the kth test in a hypothetical study with random effects equal to zero.7 ηij is the study effect for diseased (j = 1) or healthy individuals (j = 2) and represents the deviation of a particular study i from the mean sensitivity (j = 1) or specificity (j = 2) on the logit scale. εijk is the error associated with the sensitivity (j = 1) or specificity (j = 2) of test k in the ith study on the logit scale.

The regression parameters logit⁻¹(γjk) have a conditional or study-specific interpretation due to the nonlinearity of the logit link. The more useful population-averaged sensitivity and specificity of the kth test are derived through the following complex integration

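\[ E(\pi_{ijk}) = \int \mathrm{logit}^{-1}(\gamma_{jk} + \eta_{ij} + \varepsilon_{ijk})\, f(\eta_{i1}, \eta_{i2})\, f(\varepsilon_{ijk})\, d\eta_{i1}\, d\eta_{i2}\, d\varepsilon_{ijk} \tag{11} \]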

Using MCMC methods, E(πijk) is approximated as follows, where I is the number of values of (ηi1, ηi2) and εijk simulated from their respective distributions

\[ \hat{\pi}_{ijk} = \frac{\sum_{i=1}^{I} \mathrm{logit}^{-1}(\gamma_{jk} + \eta_{ij} + \varepsilon_{ijk})}{I} \tag{12} \]
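The Monte Carlo approximation above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions rather than the fitting code: the function names, the fixed seed, and the example values of γ and of the two standard deviations are hypothetical, and because only ηij enters the linear predictor for a given j, the study effect is drawn from its univariate normal margin instead of the full bivariate distribution.

```python
import math
import random


def inv_logit(x):
    """Inverse logit (expit) transformation."""
    return 1.0 / (1.0 + math.exp(-x))


def marginal_mean(gamma, sd_study, sd_error, n_draws=20000, seed=1):
    """Approximate E(pi) by averaging inv_logit over simulated random effects.

    gamma     -- fixed effect on the logit scale (hypothetical value)
    sd_study  -- standard deviation of the study effect eta
    sd_error  -- standard deviation of the error term epsilon
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        eta = rng.gauss(0.0, sd_study)
        eps = rng.gauss(0.0, sd_error)
        total += inv_logit(gamma + eta + eps)
    return total / n_draws


if __name__ == "__main__":
    gamma = 2.0  # hypothetical logit-scale mean
    print("conditional (random effects = 0):", round(inv_logit(gamma), 4))
    print("population-averaged (simulated): ",
          round(marginal_mean(gamma, 1.0, 0.5), 4))
```

For γ > 0 the population-averaged value lies between 0.5 and logit⁻¹(γ), and the gap widens as the random-effect variances grow, which is the difference between the study-specific and the marginal interpretation.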

In general, the values of logit⁻¹(γjk) are greater in absolute value than E(πijk) and increase with the variance of the random effects. The between-study variance involves two integrals and is derived as follows

\[ \mathrm{var}(\pi_{ijk}) = \int \left(\mathrm{logit}^{-1}(\gamma_{jk} + \eta_{ij} + \varepsilon_{ijk}) - E(\pi_{ijk})\right)^2 f(\eta_{i1}, \eta_{i2})\, f(\varepsilon_{ijk})\, d\eta_{i1}\, d\eta_{i2}\, d\varepsilon_{ijk} \tag{13} \]

Since πijk lies within 0 and 1, a more natural, flexible, and computationally more convenient distribution to model the latent variables is the beta distribution, which is conjugate to the binomial distribution. Following the copula theory (see equation (2)), the lower level of hierarchy is thus modeled as follows

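\[
\begin{aligned}
\begin{pmatrix} \pi_{i1k} \\ \pi_{i2k} \end{pmatrix} &\sim \left[ \prod_{j=1}^{2} f(\pi_{ijk}) \right] c(F(\pi_{i1k}), F(\pi_{i2k}), \omega_k) \\
f(\pi_{ijk} \mid \mu_{jk}, \theta_j, \delta_{jk}) &= \mathrm{Beta}(\mu_{jk}, \theta_j \delta_{jk}), \qquad 0 < \mu_{jk}, \theta_j, \delta_{jk} < 1
\end{aligned}
\tag{14}
\]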

The natural beta parameters (both restricted to be positive) are derived as

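\[ \alpha_{jk} = \mu_{jk} \left[ \frac{1 - \theta_j \delta_{jk}}{\theta_j \delta_{jk}} \right] \tag{15} \]

\[ \beta_{jk} = [1 - \mu_{jk}] \left[ \frac{1 - \theta_j \delta_{jk}}{\theta_j \delta_{jk}} \right] \tag{16} \]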

where µjk is the mean sensitivity (j = 1) or specificity (j = 2) of test k, ωk captures the association between sensitivity and specificity in test k, θj captures the common overdispersion among the sensitivities (j = 1) or specificities (j = 2) in a given study as a result of repeated tests in a study, and ρjk = θjδjk quantifies the extra-parametric variation in the data, of which log(δjk)/log(θjδjk) is attributed to between-study variability and log(θj)/log(θjδjk) to within-study variability. θjδjk = 0 corresponds to the case of no extra-binomial variation.
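The mapping from (µjk, θj, δjk) to the natural beta parameters can be made concrete with a short Python sketch. The function names and the numerical values below are illustrative assumptions. The sketch relies on two standard facts about the beta and beta-binomial distributions: the mean is α/(α + β), and the intra-class correlation 1/(α + β + 1) inflates the binomial variance by the factor 1 + (n − 1)/(α + β + 1). Under equations (15) and (16), that correlation equals θjδjk.

```python
def beta_shape_parameters(mu, theta, delta):
    """Map (mu, theta, delta) to the natural beta parameters (alpha, beta)."""
    for name, value in (("mu", mu), ("theta", theta), ("delta", delta)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    rho = theta * delta                 # extra-binomial variation
    scale = (1.0 - rho) / rho           # equals alpha + beta
    return mu * scale, (1.0 - mu) * scale


def beta_binomial_variance(n, alpha, beta):
    """Variance of Y ~ BetaBinomial(n, alpha, beta)."""
    mean_p = alpha / (alpha + beta)
    icc = 1.0 / (alpha + beta + 1.0)    # intra-class correlation
    return n * mean_p * (1.0 - mean_p) * (1.0 + (n - 1) * icc)


if __name__ == "__main__":
    mu, theta, delta = 0.9, 0.2, 0.5    # hypothetical values
    alpha, beta = beta_shape_parameters(mu, theta, delta)
    n = 50                              # hypothetical number of diseased individuals
    print("alpha, beta:", alpha, beta)
    print("mean:", alpha / (alpha + beta))
    print("binomial variance:     ", n * mu * (1.0 - mu))
    print("beta-binomial variance:", beta_binomial_variance(n, alpha, beta))
```

As θjδjk shrinks toward 0, α + β grows without bound and the beta-binomial variance approaches the binomial variance, in line with the statement that θjδjk = 0 corresponds to no extra-binomial variation.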

Missing data and exchangeability assumption

The model easily incorporates studies reporting one or more tests, but not necessarily all K tests, under a missing at random23 assumption, without the need to impute data on the missing binomial counts with debatable imputation methods. The imposed correlation structure allows borrowing information between the latent sensitivity and specificity across studies.

The observed information (Yijk, Nijk) generically represents a point estimate of πijk and contributes to the estimation of (µjk, θj, δjk). Conditional on (µjk, θj, δjk), the πijk are assumed exchangeable, allowing generation of the missing πijk without any knowledge about an unobserved (Yijk, Nijk). By exchangeability, we imply that the πijk are a priori similar yet nonidentical. As a consequence, inference about πijk uses the information from the particular study i and also draws on relevant information from other studies via the hierarchical structure.24 In the frequentist framework, this is equivalent to the direct likelihood methodology25 to analyze correlated data under an ignorable missing-data mechanism.26 By including both fixed and random effects, the correlation structure allows for the appropriate adjustments to the parameters even when the data are incomplete.27


Estimation

The models presented are fitted in the Bayesian framework using Stan28 within R 3.3.1,29 via the rstan 2.11.1 package. Stan is a probabilistic modeling language in which statisticians write models in a familiar notation. The model is translated to efficient C++ code and compiled into an executable program. This makes implementation of complex models easy and fast.

Within the Bayesian paradigm, model specification is completed by assigning prior distributions to the hyperparameters. Assigning proper prior distributions is a key aspect in Bayesian analysis because it reflects the amount of information known beforehand about a parameter and can influence the properties of the resulting posterior distribution.30

When there is prior information from a previous study or literature review, it can be assimilated into the estimation procedure by specifying informative priors; otherwise a vague or weakly informative prior is used to reflect prior ignorance.

Since the hyper-parameters (µjk, θj, δjk) all range between 0 and 1, beta(1, 1) = U(0, 1) prior distributions were applied. While it is expected that with vague prior distributions and large sample sizes, the influence of the prior distribution on the posterior inference will be minimal, it is good practice to do a sensitivity analysis by using different reasonable prior distributions and comparing the posterior inferences. Sensitivity analysis is necessary because it is not always clear when certain choices of prior distributions are vague or non-informative in relation to the size of the dataset. The second choice of prior distributions used for the sensitivity analysis was of the form (logit(µjk), logit(θj), logit(δjk)) ∼ N(0, 100).

Compared with other MCMC sampling algorithms, Stan's sampling algorithm requires fewer samples to reach the target distribution. As a consequence, we drew 4000/7000 samples from each of three chains, discarded the first 1000 samples, and kept every 5th/10th draw to make inference based on a total of 1800 samples. Trace plots, the potential scale reduction factor, the effective sample size, and the MCMC error were used as the standard diagnostics. With proper chain mixing and convergence to the target distribution, the potential scale reduction factor is expected to be less than 1.1, the MCMC error small relative to the parameters’ standard deviations (ratio ≤ 0.05), and the effective sample size approximately equal to the total number of post-warm-up samples.
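The sampling settings above can be verified with simple arithmetic, and the convergence criterion can be illustrated with a small sketch. The Python code below is a minimal illustration under stated assumptions: the function names are hypothetical, and `potential_scale_reduction` implements the classic Gelman-Rubin formula on equal-length chains, which is not necessarily the exact variant that Stan reports.

```python
import math
import statistics


def retained_draws(iterations, warmup, thin, chains):
    """Number of post-warm-up draws kept across all chains after thinning."""
    if warmup >= iterations:
        raise ValueError("warmup must be smaller than iterations")
    per_chain = math.ceil((iterations - warmup) / thin)
    return per_chain * chains


def potential_scale_reduction(chains):
    """Classic Gelman-Rubin statistic for a list of equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.mean(chain) for chain in chains]
    within = statistics.mean([statistics.variance(chain) for chain in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


if __name__ == "__main__":
    # Both settings described in the text give the same number of draws.
    print(retained_draws(4000, 1000, 5, 3))
    print(retained_draws(7000, 1000, 10, 3))
```

Both configurations retain 600 draws per chain, that is (4000 − 1000)/5 = (7000 − 1000)/10 = 600, which gives 1800 draws over three chains.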

Results

Detection of CIN2+ in triage of women with ASC-US

Abbott RT-PCR hrHPV was the most sensitive (0.93 [0.85, 0.98]) but the least specific (0.39 [0.26, 0.53]) test, while HPV Proofer (mRNA) was the most specific (0.74 [0.66, 0.82]) though among the least sensitive (0.73 [0.64, 0.81]) tests (Table 1).

Detection of CIN3+ in triage of women with ASC-US

Abbott RT-PCR hrHPV, Linear Array, and Cobas-4800 were the most sensitive but among the least specific tests, while HPV Proofer (mRNA) was the most specific (0.73 [0.61, 0.83]) without much loss in sensitivity (0.83 [0.79, 0.89]) (Table 2).


LBC, p16, and HPV Proofer (mRNA) were all less sensitive but more specific than HC2, while most of the other tests were as sensitive and as specific as HC2 in detecting CIN2+ or CIN3+ in triage of women with ASC-US.

Detection of CIN2+ in triage of women with LSIL

HC2 was the most sensitive (0.94 [0.92, 0.95]) but among the least specific (0.30 [0.26, 0.34]) tests, while HPV Proofer (mRNA) was the most specific (0.72 [0.64, 0.79]) but the least sensitive (0.68 [0.59, 0.75]) test (Table 3).

CC, LBC, p16, p16/Ki67, and HPV Proofer (mRNA) were all less sensitive but more specific than HC2, while all other tests except Generic PCR assays were as sensitive and as specific as HC2 in detecting CIN2+ in triage of women with LSIL.

Detection of CIN3+ in triage of women with LSILLinear array was the most sensitive (0.98 [0.93, 1.00]) but the least specific tests (0.24 [0.15, 0.36]),while HPV Proofer (mRNA) was the most specific (0.68 [0.59, 0.76]) test with sensitivity of 0.77[0.67, 0.84]) (Table 4). LBC, Generic PCR assays, p16, and HPV Proofer (mRNA) were all lesssensitive but more specific than HC2, while p16/ Ki67 and APTIMA (mRNA) were as sensitive asbut more specific than HC2. All other tests except CC were as sensitive as and as specific as HC2.

Table 1. Posterior absolute and relative (vs. hybrid capture-2) sensitivity and specificity and their corresponding 95% credible intervals of tests detecting intra-epithelial neoplasia lesions of grade 2 or worse in women with atypical squamous cells of unspecified significance.

                        Absolute sensitivity   Absolute specificity   Relative sensitivity   Relative specificity   Number of
Test                    Mean   Lower  Upper    Mean   Lower  Upper    Mean   Lower  Upper    Mean   Lower  Upper    studies
Hybrid Capture-2        0.92   0.90   0.93     0.53   0.49   0.57     1      1      1        1      1      1        75
Conventional Cytology   0.73   0.63   0.81     0.62   0.55   0.68     0.80   0.69   0.89     1.16   1.02   1.32     11
Liquid-Based Cytology   0.73   0.63   0.81     0.69   0.57   0.78     0.79   0.69   0.89     1.29   1.05   1.50     6
Generic PCR assays      0.88   0.81   0.93     0.57   0.50   0.63     0.96   0.89   1.02     1.07   0.91   1.21     12
Abbott RT PCR hrHPV     0.93   0.85   0.98     0.39   0.26   0.53     1.01   0.93   1.07     0.74   0.48   1.01     5
Linear Array            0.89   0.83   0.94     0.43   0.34   0.53     0.98   0.90   1.03     0.81   0.64   1.00     13
Cobas-4800              0.90   0.83   0.95     0.58   0.43   0.70     0.98   0.90   1.04     1.09   0.81   1.33     5
P16                     0.81   0.74   0.86     0.67   0.59   0.74     0.88   0.81   0.94     1.26   1.09   1.43     20
P16/Ki67                0.84   0.75   0.90     0.70   0.62   0.77     0.92   0.82   0.99     1.32   1.15   1.49     8
HPV Proofer (mRNA)      0.73   0.64   0.81     0.74   0.66   0.82     0.80   0.69   0.88     1.39   1.21   1.59     11
APTIMA (mRNA)           0.89   0.81   0.93     0.57   0.48   0.66     0.97   0.87   1.02     1.07   0.90   1.26     13

Correlation and overdispersion

There was generally negative but not significant correlation between sensitivity and specificity of different diagnostic tests (see supplementary materials: overdispersion.xlsx). There was more heterogeneity in specificity than in sensitivity, of which most was due to within- rather than between-study differences.


Table 2. Posterior absolute and relative (vs. hybrid capture-2) sensitivity and specificity and their corresponding 95% credible intervals of tests detecting intra-epithelial neoplasia lesions of grade 2 or worse in women with low-grade squamous intraepithelial lesions.

                        Absolute sensitivity   Absolute specificity   Relative sensitivity   Relative specificity   Number of
Test                    Mean   Lower  Upper    Mean   Lower  Upper    Mean   Lower  Upper    Mean   Lower  Upper    studies
Hybrid Capture-2        0.94   0.92   0.95     0.48   0.43   0.54     1      1      1        1      1      1        37
Conventional Cytology   0.65   0.26   0.93     0.62   0.34   0.86     0.70   0.28   0.99     1.29   0.68   1.81     1
Liquid-Based Cytology   0.80   0.68   0.88     0.70   0.53   0.83     0.85   0.73   0.94     1.45   1.06   1.79     4
Generic PCR assays      0.76   0.65   0.88     0.54   0.37   0.69     0.81   0.69   0.94     1.13   0.74   1.45     4
Abbott RT PCR hrHPV     0.95   0.87   0.99     0.39   0.27   0.54     1.02   0.93   1.06     0.80   0.55   1.13     3
Linear Array            0.95   0.91   0.98     0.40   0.28   0.51     1.02   0.97   1.05     0.83   0.57   1.08     7
Cobas-4800              0.95   0.88   0.99     0.49   0.33   0.69     1.01   0.94   1.06     1.01   0.66   1.45     3
P16                     0.82   0.75   0.88     0.61   0.57   0.66     0.88   0.80   0.94     1.28   1.11   1.45     10
P16/Ki67                0.93   0.79   0.99     0.68   0.52   0.82     0.99   0.84   1.06     1.41   1.08   1.74     3
HPV Proofer (mRNA)      0.83   0.75   0.89     0.73   0.61   0.83     0.88   0.80   0.95     1.52   1.23   1.77     8
APTIMA (mRNA)           0.92   0.88   0.96     0.52   0.43   0.61     0.99   0.94   1.03     1.08   0.87   1.29     13

Table 3. Posterior absolute and relative (vs. hybrid capture-2) sensitivity and specificity and their corresponding 95% credible intervals of tests detecting intra-epithelial neoplasia lesions of grade 3 or worse in women with atypical squamous cells of unspecified significance.

                        Absolute sensitivity   Absolute specificity   Relative sensitivity   Relative specificity   Number of
Test                    Mean   Lower  Upper    Mean   Lower  Upper    Mean   Lower  Upper    Mean   Lower  Upper    studies
Hybrid Capture-2        0.94   0.92   0.95     0.30   0.26   0.34     1      1      1        1      1      1        55
Conventional Cytology   0.79   0.64   0.91     0.47   0.35   0.60     0.84   0.68   0.97     1.59   1.16   2.06     5
Liquid-Based Cytology   0.79   0.66   0.89     0.54   0.38   0.67     0.84   0.70   0.95     1.80   1.25   2.33     5
Generic PCR assays      0.81   0.71   0.90     0.37   0.27   0.47     0.86   0.76   0.96     1.24   0.87   1.67     10
Abbott RT PCR hrHPV     0.92   0.82   0.97     0.33   0.21   0.49     0.98   0.87   1.04     1.10   0.69   1.68     4
Linear Array            0.90   0.83   0.95     0.30   0.24   0.37     0.96   0.88   1.02     1.00   0.77   1.28     10
Cobas-4800              0.91   0.81   0.96     0.27   0.19   0.38     0.97   0.86   1.02     0.92   0.64   1.27     5
P16                     0.77   0.69   0.84     0.59   0.49   0.68     0.82   0.73   0.89     1.99   1.58   2.41     17
P16/Ki67                0.87   0.82   0.91     0.60   0.50   0.69     0.93   0.87   0.97     2.02   1.62   2.45     10
HPV Proofer (mRNA)      0.68   0.59   0.75     0.72   0.64   0.79     0.72   0.63   0.80     2.42   2.05   2.82     10
APTIMA (mRNA)           0.90   0.86   0.94     0.38   0.31   0.45     0.96   0.91   1.00     1.28   1.01   1.60     13


Table 4. Posterior absolute and relative (vs. hybrid capture-2) sensitivity and specificity and their corresponding 95% credible intervals of tests detecting intra-epithelial neoplasia lesions of grade 3 or worse in women with low-grade squamous intraepithelial lesions.

                        Absolute sensitivity   Absolute specificity   Relative sensitivity   Relative specificity   Number of
Test                    Mean   Lower  Upper    Mean   Lower  Upper    Mean   Lower  Upper    Mean   Lower  Upper    studies
Hybrid Capture-2        0.96   0.94   0.98     0.26   0.22   0.30     1      1      1        1      1      1        31
Conventional Cytology   0.65   0.28   0.90     0.51   0.21   0.79     0.68   0.30   0.94     2.00   0.83   3.13     1
Liquid-Based Cytology   0.81   0.69   0.89     0.53   0.37   0.69     0.84   0.72   0.93     2.07   1.38   2.80     4
Generic PCR assays      0.78   0.67   0.90     0.42   0.28   0.56     0.81   0.69   0.93     1.66   1.07   2.23     5
Abbott RT PCR hrHPV     0.97   0.91   1.00     0.25   0.14   0.39     1.01   0.95   1.04     0.97   0.53   1.56     3
Linear Array            0.98   0.93   1.00     0.24   0.15   0.36     1.01   0.96   1.05     0.94   0.58   1.43     5
Cobas-4800              0.94   0.87   0.98     0.26   0.17   0.37     0.98   0.90   1.03     1.00   0.63   1.49     4
P16                     0.83   0.76   0.89     0.50   0.40   0.61     0.86   0.79   0.93     1.96   1.49   2.50     9
P16/Ki67                0.93   0.85   0.98     0.44   0.33   0.55     0.97   0.88   1.02     1.72   1.24   2.31     4
HPV Proofer (mRNA)      0.77   0.67   0.84     0.68   0.59   0.76     0.80   0.69   0.87     2.66   2.15   3.22     7
APTIMA (mRNA)           0.96   0.91   0.98     0.35   0.28   0.43     0.99   0.94   1.03     1.37   1.03   1.78     11

Bivariate beta-binomial versus bivariate logistic-binomial distribution versus univariate beta-binomial

With a lower Watanabe-Akaike information criterion (WAIC) in three of the four data sets (see Table 5), the bivariate beta-binomial distribution fitted the data better than the bivariate logistic-binomial distribution.

In Figure 2, the posterior mean estimates from the multivariate models show shorter credible intervals than the estimates from the univariate beta-binomial, especially in cases where the number of studies evaluating a certain test is small. This is a result of borrowing information between sensitivity and specificity and of the contribution of indirect evidence.

The credible intervals for the bivariate beta-binomial are mostly slightly shorter than those from the bivariate logistic-binomial. This could be due to the mean–variance relationship present in the beta distribution, where information about the mean parameters also contributes to estimation of the dispersion parameters. Furthermore, the constrained parameter space of the dispersion parameters implies that the credible intervals for the mean would be shortest close to 0/1 and widest close to 0.5. In the normal distribution, the variance–covariance parameters are functionally independent of the mean, and their precision depends greatly on the number of studies relative to the number of outcomes, the number of variance–covariance parameters to be estimated, and the choice of the prior distribution.
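The mean–variance link of the beta distribution can be made concrete with a short sketch. It assumes a mean/precision parameterisation, Beta(a, b) with mean a/(a + b) and precision a + b, which is one common convention and not necessarily the one used in the fitted models; the numerical values are hypothetical.

```python
def beta_variance(mean, precision):
    """Variance of Beta(a, b) written in terms of mean = a/(a+b) and precision = a+b."""
    return mean * (1.0 - mean) / (precision + 1.0)

# For a fixed precision, the variance is largest at 0.5 and shrinks towards 0 or 1.
for mu in (0.05, 0.50, 0.95):
    print(mu, beta_variance(mu, precision=20.0))
```

Because the variance is a function of the mean, data that inform the mean also constrain the spread, which is consistent with the narrower intervals near the boundaries described above.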

Discussion

Though perhaps the most formally rigorous analytic method for modeling probabilities, it is common to assume that the monotonically transformed true study effects are normally distributed. Such an assumption is only approximately correct and is often made for the sake of convenience, the lack of an unambiguous counterpart of the multivariate normal distribution, and the availability of software to fit GLMMs. Due to their latent nature, the validity of normally distributed random effects for non-Gaussian data is difficult to check. Furthermore, the posterior density of the random effects in the GLMM framework is in general not normal.31

[Figure 2: forest plot. Rows: HC2, CC, LBC, Generic PCR, Abbott, Linear Array, Cobas, p16, p16/Ki67, HPV Proofer, and APTIMA, shown separately for CIN2+ and CIN3+. Panels: ASC-US sensitivity, ASC-US specificity, LSIL sensitivity, LSIL specificity. Horizontal axis (0 to 1): posterior mean [95% equal-tailed credible interval]. Legend: univariate beta-binomial, bivariate logistic-binomial, bivariate beta-binomial.]

Figure 2. Posterior diagnostic estimates in triage of women with atypical squamous cells of unspecified significance (ASC-US) and low-grade squamous intraepithelial lesions (LSIL). Compared to the univariate model, multivariate models lead to a gain in precision, especially when the sample size is small. This is achieved by borrowing information from the other studies through the correlation structure.

It is often the case that different link functions in the GLMM lead to different results and hence different conclusions.7 Furthermore, inference consistency also depends on the correct specification of both the link function and the random effects distribution.32 This motivates the need for alternative flexible distributions that better fit the properties of the latent probabilities and do not require the choice of any link function. It is often the case that the study numbers are small or that latent sensitivity and specificity values are close to 0/1. Furthermore, the variance and mean in proportions are inextricably linked. Therefore, the distribution of these latent probabilities is often asymmetric and skewed.


Table 5. Watanabe-Akaike information criterion for models fitted to the four data sets. Lower values of WAIC imply higher predictive accuracy.

Random-effects distribution   Dataset   WAIC
Bivariate normal              1         2849314.8
Bivariate beta                1         2696930.5
Bivariate normal              2         9156341.9
Bivariate beta                2         7122583.9
Bivariate normal              3         3577491.8
Bivariate beta                3         4045364.9
Bivariate normal              4         24717187.4
Bivariate beta                4         23295427.7
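The comparison in Table 5 can be tabulated with a few lines of Python. The WAIC values are copied from the table; the dictionary layout and variable names are illustrative only.

```python
# WAIC values from Table 5, keyed by data set and random-effects distribution.
waic = {
    1: {"bivariate normal": 2849314.8, "bivariate beta": 2696930.5},
    2: {"bivariate normal": 9156341.9, "bivariate beta": 7122583.9},
    3: {"bivariate normal": 3577491.8, "bivariate beta": 4045364.9},
    4: {"bivariate normal": 24717187.4, "bivariate beta": 23295427.7},
}

# The preferred model in each data set is the one with the lower WAIC.
preferred = {dataset: min(values, key=values.get) for dataset, values in waic.items()}
beta_wins = sorted(d for d, name in preferred.items() if name == "bivariate beta")
print(beta_wins)  # [1, 2, 4]
```

The bivariate beta has the lower WAIC in data sets 1, 2, and 4, while the bivariate normal is preferred in data set 3.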

The beta distribution models the latent variables directly rather than indirectly via a monotonic transformation such as the logit link, which makes the choice of a link function unnecessary. The distribution is conjugate to that of the outcome and directly yields both the conditional and the population-averaged estimates. In contrast, the normal distribution yields conditional estimates whose interpretation is transformation dependent. Inference about the population mean is then not straightforward, because additional complex integration is necessary and one usually needs numerical averaging to obtain the estimates.
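The extra integration step needed under a normal random-effects distribution can be illustrated with a small sketch. It assumes a logit link and hypothetical values for the mean and standard deviation on the logit scale; the trapezoidal rule is used here only for transparency and is not the numerical method of the article.

```python
import math

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def logit_normal_mean(mu, sigma, n=2001, width=8.0):
    """E[expit(Z)] for Z ~ N(mu, sigma^2), approximated by the trapezoidal rule."""
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        z = lo + i * h
        density = math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n - 1) else 1.0  # half weight at the two end points
        total += weight * expit(z) * density * h
    return total

mu, sigma = 2.5, 1.0                       # hypothetical logit-scale mean and SD
conditional = expit(mu)                    # back-transformed mean of the logits
population = logit_normal_mean(mu, sigma)  # population-averaged probability
print(conditional, population)
```

The two quantities differ, so back-transforming the mean logit does not give the population-averaged probability. Under a Beta(a, b) random-effects distribution the population mean is simply a/(a + b) and no integration is needed.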

In contrast to using a univariate beta-binomial, akin to separate meta-analyses, one coherent meta-analysis leads to a gain in precision. Compared to the bivariate normal distribution, the bivariate beta yields shorter credible intervals for the posterior mean.

In the case study, we used Frank’s copula due to its flexibility. We acknowledge that it is challenging to select a candidate copula and, though it is difficult to detect copula misspecification, it might in general lead to biased results, especially when using a less flexible copula. Copula misspecification can be overcome by use of composite likelihood,33;34 which is beyond the scope of this article.

While the beta distribution is a natural distribution to model probabilities, it is nonetheless a parametric choice. Since it can be difficult to evaluate the distributional assumptions of the latent random effects, other approaches such as semi- or non-parametric methods35 or a finite mixture of normal distributions can be considered to safeguard against misspecification. However, these alternatives also have their share of shortcomings. The non-parametric approach is less efficient than a parametric assumption close to the real distribution,36 and therefore needs larger sample sizes to really show its strength. Use of finite mixtures of normal distributions is computationally expensive, and the number of mixture components is assumed for identifiability purposes.31 Especially for latent variables, these non- or semi-parametric approaches are expected to lead to computational and identifiability problems.

HPV testing is widely accepted for triaging Pap smear results categorized as ASC-US, but has limited use in triaging LSIL.37 Among the different high-risk panels or full-range HPV genotyping tests, Arbyn et al.15 concluded that there were subtle differences in sensitivity and specificity between them and HC2 (the most frequently used HPV assay). Furthermore, genotyping for a limited number of HPV types was more specific but less sensitive than HC2.38 Recent studies have in turn concluded that p16/Ki67 was comparable in sensitivity to but more specific than HC2 in detecting CIN2+ in both ASC-US and LSIL.37;39 In our analysis, p16/Ki67 was less sensitive but more specific than HC2 in detecting CIN2+ and CIN3+ in ASC-US, but as sensitive as and more specific than HC2 in detecting CIN2+ and CIN3+ in LSIL.

The disease outcome in the example data set was based on colposcopy combined with histology, despite its less-than-perfect accuracy. When the reference procedure is not 100% accurate at determining the presence or absence of disease, there might be imperfect gold standard bias. This manifests as biased estimates of the index test accuracy and of indices of association such as the relative risk, and as distorted p values.40
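The direction of this bias can be sketched numerically. The example assumes that the index test and the reference standard err independently given the true disease status, which is a simplifying assumption and need not hold for colposcopy and histology; the prevalence and accuracy values are hypothetical.

```python
def apparent_accuracy(prev, se_index, sp_index, se_ref, sp_ref):
    """Apparent sensitivity and specificity of an index test when judged
    against an imperfect reference (conditional independence assumed)."""
    # Probability that both index and reference are positive, and reference positive.
    both_pos = prev * se_index * se_ref + (1 - prev) * (1 - sp_index) * (1 - sp_ref)
    ref_pos = prev * se_ref + (1 - prev) * (1 - sp_ref)
    # Probability that both are negative, and reference negative.
    both_neg = prev * (1 - se_index) * (1 - se_ref) + (1 - prev) * sp_index * sp_ref
    ref_neg = 1 - ref_pos
    return both_pos / ref_pos, both_neg / ref_neg

# Hypothetical index test (Se 0.90, Sp 0.60) against an imperfect reference.
se_app, sp_app = apparent_accuracy(0.20, 0.90, 0.60, se_ref=0.85, sp_ref=0.98)
print(se_app, sp_app)
```

With a perfect reference (sensitivity and specificity both equal to 1) the function returns the true values of the index test; with the imperfect reference above, both apparent values fall below the true ones.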

Approaches to minimize the bias due to the use of an imperfect gold standard include the use of mathematical models, e.g. latent class models,41 studying sensitivity and specificity in different populations, use of an expert review panel to arrive at a less error-prone diagnosis, and framing the problem in terms of clinical outcomes rather than just accuracy.42 A review of these approaches for diagnostic accuracy studies is presented by Reitsma et al.43

Latent class models combine information from multiple imperfect diagnostic tests to uncover the unobserved disease structure.44 The fact that the analysis proceeds without a formal clinical definition of disease remains the main criticism of latent class analysis.45 Nonetheless, fitting the latent class model recognizes the fact that reference standards are often imperfect even when they may be the “best available” diagnostic tool. However, other biases (selection and spectrum of included patients, timing of testing, partial verification, incorporation bias, level of blinding between tests and reference, attrition bias, etc.) that may be as influential in the estimation of the true accuracy should also be considered. Therefore, isolated correction for imperfect verification, without adjustment for other types of bias and covariate information, does not necessarily provide a better estimate of the true accuracy.

The main limitation of the developed model is the lack of an implementation in popular statistical software. However, with improving computing hardware and flexible software such as Stan becoming more available, the implementation of the bivariate beta distribution is straightforward. Implementation and availability of the procedure as an R package within the Bayesian framework are foreseen. While we focused on aggregate data, model adaptation to include patient-level data as well as covariates follows quite automatically.

Conclusion

In this article, we proposed and developed models for network meta-analysis of diagnostic data that directly model the correlated latent probabilities using a bivariate beta distribution. While the model assumptions regarding the latent random variables cannot be directly assessed, we recommend the use of a beta distribution as it better fits the natural properties of the latent probabilities and better serves the purpose of population-based inference. It is more sensible to fit the correlated beta distribution, as it accounts for the extra variation appropriately, resulting in improved parameter estimates and standard errors.

Acknowledgements

We thank Mrs Lan Xu for providing a demonstration data set extracted from published meta-analyses on the accuracy of triage tests for CIN2+ in women with ASC-US as part of the supplementary material.

Author’s contribution

M. Arbyn designed the OPSADAC project (Optimization of statistical procedures to assess the diagnostic accuracy of cervical cancer screening tests). V. Nyaga and M. Aerts conceptualized and initiated the study. V. Nyaga wrote the code, analyzed the data, and drafted the manuscript. M. Arbyn and M. Aerts edited the manuscript. All authors reviewed and approved the final manuscript.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Nyaga V received financial support from the Scientific Institute of Public Health (Brussels) through the OPSADAC project. Arbyn M was supported by the COHEAHR project funded by the 7th Framework Programme of the European Commission (grant No 603019). Aerts M was supported by the IAP research network nr P7/06 of the Belgian Government (Belgian Science Policy).

References

1. Bender R, Bunce C, Clarke M et al. Attention should be given to multiplicity issues in systematic reviews. J Clin Epidemiol 2008; 61: 857–65. DOI:10.1016/j.jclinepi.2008.03.004.

2. Siadaty MS, Philbrick JT, Heim SW et al. Repeated-measures modeling improved comparison of diagnostic tests in meta-analysis of dependent studies. J Clin Epidemiol 2004; 57(7): 698–711. DOI:10.1016/j.jclinepi.2003.12.007.

3. Achana FA, Cooper NJ, Bujkiewicz S et al. Network meta-analysis of multiple outcome measures accounting for borrowing of information across outcomes. BMC Med Res Methodol 2014; 14: 92. DOI:10.1186/1471-2288-14-92.

4. Dimou NL, Adam M and Bagos PG. A multivariate method for meta-analysis and comparison of diagnostic tests. Stat Med 2016; 35: 3509–3523. DOI:10.1002/sim.6919.

5. Reitsma JB, Glas AS, Rutjes AW et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol 2005; 58(10): 982–990. DOI:10.1016/j.jclinepi.2005.02.022.

6. Menten J and Lesaffre E. A general framework for comparative Bayesian meta-analysis of diagnostic studies. BMC Med Res Methodol 2015; 15: 1–13. DOI:10.1186/s12874-015-0061-7.

7. Chu H, Nie L, Chen Y et al. Bivariate random effects models for meta-analysis of comparative studies with binary outcomes: Methods for the absolute risk difference and relative risk. Stat Methods Med Res 2010; 21(6): 621–633. DOI:10.1177/0962280210393712.


8. Zhang J, Carlin BP, Neaton JD et al. Network meta-analysis of randomized clinical trials: Reporting the proper summaries. Clin Trials 2014; 11(2): 246–262. DOI:10.1177/1740774513498322.

9. Nyaga V, Aerts M and Arbyn M. ANOVA model for network meta-analysis of diagnostic test accuracy data. Stat Methods Med Res 2016; 0: 1–19. DOI:10.1177/0962280216669182.

10. Higgins JPT, Thompson SG and Spiegelhalter DJ. A re-evaluation of random-effects meta-analysis. J R Stat Soc Ser A (Stat Soc) 2009; 172: 137–159. DOI:10.1111/j.1467-985X.2008.00552.x.

11. Chung Y, Rabe-Hesketh S and Choi IH. Avoiding zero between-study variance estimates in random-effects meta-analysis. Stat Med 2013; 32: 4071–4089. DOI:10.1002/sim.5821.

12. Aitkin M. Meta-analysis by random effect modelling in generalized linear models. Stat Med 1999; 18: 2343–2351. DOI:10.1002/(SICI)1097-0258(19990915/30)18:17/18<2343::AID-SIM260>3.0.CO;2-3.

13. Nikoloulopoulos AK. A mixed effect model for bivariate meta-analysis of diagnostic test accuracy studies using a copula representation of the random effects distribution. Stat Med 2015; 34(29): 3842–3865. DOI:10.1002/sim.6595.

14. Nyaga VN, Arbyn M and Aerts M. CopulaDTA: Copula Based Bivariate Beta-Binomial Models for Diagnostic Test Accuracy Studies in a Bayesian Framework. Journal of Statistical Software 2016; VV.

15. Arbyn M, Ronco G, Anttila A et al. Evidence regarding human papillomavirus testing in secondary prevention of cervical cancer. Vaccine 2012; 30 Suppl 5: F88–99. DOI:10.1016/j.vaccine.2012.06.095.

16. Arbyn M, Roelens J, Simoens C et al. Human papillomavirus testing versus repeat cytology for triage of minor cytological cervical lesions. Cochrane Database Syst Rev 2013; 3: CD008054. DOI:10.1002/14651858.CD008054.pub2.

17. Arbyn M, Roelens J, Cuschieri K et al. The APTIMA HPV assay versus the hybrid capture 2 test in triage of women with ASC-US or LSIL cervical cytology: A meta-analysis of the diagnostic accuracy. Int J Cancer 2013; 132: 101–108. DOI:10.1002/ijc.27636.

18. Roelens J, Reuschenbach M, Von Knebel Doeberitz M et al. P16INK4a immunocytochemistry versus human papillomavirus testing for triage of women with minor cytologic abnormalities: A systematic review and meta-analysis. Cancer Cytopathol 2012; 120: 294–307. DOI:10.1002/cncy.21205.

19. Verdoodt F, Szarewski A, Halfon P et al. Triage of women with minor abnormal cervical cytology: Meta-analysis of the accuracy of an assay targeting messenger ribonucleic acid of 5 high-risk human papillomavirus types. Cancer Cytopathol 2013; 121: 675–687. DOI:10.1002/cncy.21325.

20. Sklar M. Fonctions de répartition à n dimensions et leurs marges. Publications de l’Institut de Statistique de l’Université de Paris 1959; 8: 229–231.

21. Nelsen RB. Sklar’s theorem. In Bickel P, Diggle P, Fienberg S et al. (eds.) An introduction to copulas, 2 ed. NY, USA: Springer-Verlag New York. pp. 17–23.

22. Frank MJ. On the simultaneous associativity of F(x, y) and x + y - F(x, y). Aequationes Mathematicae 1979; 19: 194–226. DOI:10.1007/BF02189866.

23. Rubin DB. Inference and missing data. Biometrika 1976; 63: 581–592.

24. Jackman S. Hierarchical statistical models. Chichester, UK: Wiley Series in Probability and Statistics. pp. 301–362.

25. Molenberghs G and Kenward M. The direct likelihood method. In Stephen S and Vic B (eds.) Missing data in clinical studies, chapter 7. West Sussex, England: Wiley Series in Statistics in Practice, 2007. pp. 77–91.

26. Little R and Rubin D. Mixed normal and non-normal data with missing values, ignoring the missing-data mechanism. In Balding DJ, Bloomfield P, Cressie NAC et al. (eds.) Statistical analysis with missing data, 2 ed. Hoboken, New Jersey: Wiley Series in Probability and Statistics, 2002. pp. 292–309.

27. Beunckens C, Molenberghs G, Kenward M et al. Direct likelihood analysis versus simple forms of imputation for missing data in randomized clinical trials. Clin Trials 2005; 2: 379–386.

28. Carpenter B, Hoffman MD, Brubaker M et al. The Stan Math Library: Reverse-Mode Automatic Differentiation in C++. J Stat Softw In Press; 1509.07164.


29. R Core Team. R: A Language and Environment for Statistical Computing, 2016. URL http://www.r-project.org/.

30. Gelman A. Prior distribution. In El-Shaarawi AH and Piegorsch WW (eds.) Encyclopedia of Environmetrics, 2 ed. West Sussex, UK: John Wiley & Sons, Ltd., 2002. pp. 1634–1637. DOI:10.1002/9780470057339.vap039.

31. Molenberghs G and Verbeke G. The generalized linear mixed model (GLMM). In Bickel P, Diggle P, Fienberg S et al. (eds.) Models for discrete longitudinal data, 1 ed. NY, USA: Springer-Verlag New York, 2005. pp. 265–278.

32. Zeger SL, Liang KY and Albert PS. Models for Longitudinal Data: A Generalized Estimating Equation Approach. Biometrics 1988; 44: 1049–1060.

33. Chen Y, Liu Y, Ning J et al. A composite likelihood method for bivariate meta-analysis in diagnostic systematic reviews. Stat Methods Med Res 2014; 0: 1–17. DOI:10.1177/0962280214562146.

34. Chen Y, Hong C, Ning Y et al. Meta-analysis of studies with bivariate binary outcomes: a marginal beta-binomial model approach. Stat Med 2016; 35: 21–40. DOI:10.1002/sim.6620.

35. Zapf A, Hoyer A, Kramer K et al. Nonparametric meta-analysis for diagnostic accuracy studies. Stat Med 2015; 34: 3831–3841. DOI:10.1002/sim.6583.

36. Agresti A, Caffo B and Ohman-Strickland P. Examples in which misspecification of a random effects distribution reduces efficiency, and possible remedies. Comput Stat Data Anal 2004; 47: 639–653. DOI:10.1016/j.csda.2003.12.009.

37. Bergeron C, Ikenberg H, Sideri M et al. Prospective Evaluation of p16/Ki-67 Dual-Stained Cytology for Managing Women With Abnormal Papanicolaou Cytology: PALMS Study Results. Cancer Cytopathol 2015; 123: 373–381. DOI:10.1002/cncy.21542.

38. Arbyn M, Xu L, Verdoodt F et al. Genotyping for human papillomavirus types 16 and 18 in women with minor cervical lesions: a systematic review and meta-analysis. Ann Intern Med. Epub ahead of print 15 November 2016.

39. Schmidt D, Bergeron C, Denton KJ et al. p16/Ki-67 Dual-Stain Cytology in the Triage of ASCUS and LSIL Papanicolaou Cytology. Cancer Cytopathol 2011; 119: 158–166. DOI:10.1002/cncy.20140.

40. Walter SD and Irwig LM. Estimation of test error rates, disease prevalence and relative risk from misclassified data: a review. J Clin Epidemiol 1988; 41: 923–937. DOI:10.1016/0895-4356(88)90110-2.

41. Walter SD, Irwig L and Glasziou PP. Meta-analysis of diagnostic tests with imperfect reference standards. J Clin Epidemiol 1999; 52: 943–951. DOI:10.1016/S0895-4356(99)00086-4.

42. Valenstein PN. Evaluating diagnostic tests with imperfect standards. Am J Clin Pathol 1990; 93: 252–258.

43. Reitsma JB, Rutjes AWS, Khan KS et al. A review of solutions for diagnostic accuracy studies with an imperfect or missing reference standard. J Clin Epidemiol 2009; 62: 797–806. DOI:10.1016/j.jclinepi.2009.02.005.

44. van Smeden M, Oberski DL, Reitsma JB et al. Problems in detecting misfit of latent class models in diagnostic research without a gold standard were shown. J Clin Epidemiol 2015; 74: 158–166. DOI:10.1016/j.jclinepi.2015.11.012.

45. Pepe MS and Janes H. Insights into latent class analysis of diagnostic test performance. Biostatistics 2007; 8: 474–484. DOI:10.1093/biostatistics/kxl038.


Part III

Discussion


6. Discussion

In this thesis, we presented statistical methods to perform meta-analysis and network meta-analysis of diagnostic accuracy studies. The statistical methods developed herein contribute to assessing evidence on new screening and triage methods for cervical cancer prevention and consequently enable better decisions in the management of women with minor cytological lesions.

In chapter 2, we presented frequentist methods to perform meta-analysis of proportions. This chapter addressed the following problems encountered in pooling proportions: studies with an estimated proportion equal to zero or one, which would otherwise be excluded from the analysis, and confidence intervals exceeding the 0 to 1 range. We also modelled the within-study variances using the binomial distribution by fitting the binomial-logistic model. It is often assumed that the within-study standard errors, which are used in weighting the studies, are known. In practice, these variances are usually unknown and are best modelled using the binomial distribution [80].
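The two problems named above can be reproduced with a short sketch. The Wald interval illustrates the failure; the Wilson score interval is included only as one familiar alternative that respects the 0 to 1 range and is not claimed to be the method used in chapter 2. The counts are hypothetical.

```python
import math

def wald_interval(events, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = events / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def wilson_interval(events, n, z=1.96):
    """Wilson score confidence interval for a proportion."""
    p = events / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

print(wald_interval(19, 20))    # upper limit exceeds 1
print(wilson_interval(19, 20))  # stays inside the unit interval
print(wald_interval(20, 20))    # zero-width interval when the proportion is 1
```

A study with 20 events out of 20 has an estimated variance of zero under the normal approximation, so it cannot be assigned an inverse-variance weight; this is why such studies would otherwise drop out of the analysis.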

One difference between the fixed- and the random-effects model is the weighting of studies. The fixed-effects model typically uses weights inverse to the variance, such that larger studies have greater weights. In some instances, assigning more weight to studies with more statistical power is appropriate. However, for binary outcomes, weighting according to the variance may introduce biases, thereby distorting the combined effect, or may even lead to contradictory results when different effect measures are used [81]. The random-effects model takes a more conservative approach and assigns more even weights across studies.
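The effect of the between-study variance on the weights can be sketched as follows. The example uses the conventional inverse-variance weights 1/v for the fixed-effects model and 1/(v + tau2) for the random-effects model; the within-study variances and the value of tau2 are hypothetical.

```python
def normalised_weights(variances, tau2=0.0):
    """Inverse-variance weights, rescaled to sum to one.
    tau2 = 0 gives fixed-effects weights; tau2 > 0 gives random-effects weights."""
    raw = [1.0 / (v + tau2) for v in variances]
    total = sum(raw)
    return [w / total for w in raw]

variances = [0.01, 0.04, 0.25]  # hypothetical: large, medium, and small study
fixed = normalised_weights(variances)
random_effects = normalised_weights(variances, tau2=0.10)
print(fixed)
print(random_effects)
```

Adding the same between-study variance to every study shrinks the gap between the largest and the smallest weight, which is the sense in which the random-effects weights are more even.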

The most widely used measure of diagnostic accuracy is an often negatively correlated bivariate latent outcome consisting of sensitivity and specificity. In chapter 3, we developed a method to model the two latent outcomes jointly using a bivariate beta distribution within the Bayesian framework. In chapters 4 and 5, we extended the conventional meta-analysis comparing two interventions into network meta-analysis. The models developed in these two chapters used all available data, including studies which evaluated only one diagnostic test. The modelling approach assumed that all studies hypothetically evaluated all diagnostic tests, some of which were missing at random (MAR). Through the exchangeability and MAR assumptions, each study contributed to the estimation of its own study effect and, through the correlation structure, to the effects of other studies.

It is almost a convention to use the multivariate (bivariate) normal distribution to describe the joint distribution of a monotonic transform of sensitivity and specificity. This is the approach we took in chapter 4. Here, we decomposed the logit sensitivity and specificity into fixed effects for test, correlated study effects, and a random error associated with each test in a given study. The normal distribution was used to model the resulting multivariate distribution. The popularity of the normal distribution is partly due to its familiarity, its attractive features, the lack of native multivariate distributions analogous to non-normal univariate distributions, and limitations in statistical software. In spite of this, the choice of the transformation has an impact on the interpretation as well as the estimation of the model parameters. Furthermore, since correlation is not invariant to general transformations, e.g. the logit, the modelled linear correlation between the logit-transformed sensitivity and specificity says nothing about the correlation between sensitivity and specificity.
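The lack of invariance can be checked by simulation. The sketch below draws hypothetical study-level logit sensitivities and specificities from a bivariate normal distribution with an assumed correlation of -0.8 and compares the Pearson correlation before and after back-transformation; all parameter values are illustrative.

```python
import math
import random

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2017)
rho = -0.8  # assumed correlation on the logit scale
logit_se, logit_sp = [], []
for _ in range(20000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    logit_se.append(3.0 + 1.5 * z1)  # high sensitivity on average
    logit_sp.append(0.0 + 1.5 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))

r_logit = pearson(logit_se, logit_sp)
r_prob = pearson([expit(x) for x in logit_se], [expit(x) for x in logit_sp])
print(r_logit, r_prob)
```

In this setting the correlation on the probability scale is weaker in magnitude than on the logit scale, so the former cannot be read off from the latter.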

In chapter 5, we avoided the normality assumption and modelled the latent sensitivity and specificity without a transformation using a beta distribution in an over-dispersed model. The beta distribution is a more natural choice to model proportions and no transformation is required. As a consequence, the model parameters are easily interpreted on their natural scale. The distribution is flexible enough to allow for the asymmetry often present in proportions. Moreover, it is computationally convenient: the beta distribution is conjugate to the binomial distribution and therefore no further integration is required to obtain the marginal estimates. Furthermore, both conditional and marginal estimates are obtained from the model.
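The computational convenience of conjugacy can be illustrated with a small univariate sketch. The function names and parameter values below are assumptions made for illustration and are not taken from the thesis. With a Beta(a, b) distribution on the latent proportion and a binomial likelihood, the marginal distribution of the count is the beta-binomial, which is available in closed form, and the posterior of the proportion is again a beta distribution.

```python
import math

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(y, n, a, b):
    """P(Y = y) when Y | p ~ Binomial(n, p) and p ~ Beta(a, b).

    The integral over p has a closed form, so no numerical
    integration is needed."""
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return math.exp(log_choose + log_beta(y + a, n - y + b) - log_beta(a, b))

def posterior_params(y, n, a, b):
    """Conjugate update: p | y ~ Beta(a + y, b + n - y)."""
    return a + y, b + n - y

# Hypothetical values: 40 diseased subjects, Beta(8, 2) for the sensitivity.
n, a, b = 40, 8.0, 2.0
pmf = [beta_binomial_pmf(y, n, a, b) for y in range(n + 1)]
total = sum(pmf)
marginal_mean = sum(y * p for y, p in enumerate(pmf)) / n

print(f"pmf sums to {total:.6f}")
print(f"marginal mean proportion {marginal_mean:.4f}, a/(a+b) = {a / (a + b):.4f}")
print("posterior after 36 of 40 positive:", posterior_params(36, n, a, b))
```

The marginal mean proportion equals a/(a+b), which is why the marginal (population-averaged) estimate can be read directly off the beta parameters.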

In practice, the use of bivariate beta distributions to jointly model sensitivity and specificity has been limited because ‘known’ densities either model positive correlation only, or both positive and negative correlation but over a restricted range. Using copula theory, we were able to construct different bivariate beta distributions and model different dependence structures between sensitivity and specificity. The new bivariate beta densities were derived as a product of two beta marginal densities and a given copula density. An extra benefit of using a copula-based bivariate distribution is that estimation can proceed in two separate stages: the marginal distributions of sensitivity and specificity are estimated separately in the first step, followed by the dependence structure in a second step.
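The construction follows from Sklar's theorem [30]. As a sketch, in notation chosen here for illustration (x_1 for sensitivity, x_2 for specificity, θ for the copula parameter), the joint density is the copula density evaluated at the two marginal distribution functions, multiplied by the two beta marginal densities. The Farlie-Gumbel-Morgenstern copula is shown only as a simple example of a copula density; it is not implied to be the copula used in any particular chapter.

```latex
% Copula-based bivariate beta density (notation assumed for illustration)
f(x_1, x_2) \;=\; c\bigl(F_1(x_1),\, F_2(x_2);\, \theta\bigr)\; f_1(x_1)\; f_2(x_2),
\qquad 0 < x_1, x_2 < 1,

% Beta marginal density and distribution function, j = 1, 2
f_j(x_j) \;=\; \frac{x_j^{\alpha_j - 1}\,(1 - x_j)^{\beta_j - 1}}{B(\alpha_j, \beta_j)},
\qquad
F_j(x_j) \;=\; \int_0^{x_j} f_j(t)\, dt,

% Example copula density: Farlie-Gumbel-Morgenstern
c(u, v; \theta) \;=\; 1 + \theta\,(1 - 2u)(1 - 2v),
\qquad -1 \le \theta \le 1 .
```

Because the marginal parameters (α_j, β_j) enter the copula only through F_1 and F_2, they can be estimated first and then held fixed while θ is estimated, which is the two-stage procedure described above.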

Despite the simplicity and flexibility offered by the copula approach, choosing a copula that adequately captures the dependence between sensitivity and specificity without jeopardizing the attractive features of the marginal distributions deserves careful consideration. There are neither guidelines on choosing the copula function nor diagnostics to check whether the selected copula adequately represents the underlying dependence structure in the real data. As such, different choices of copula may yield different and possibly biased results. To circumvent this challenge, different copula densities can be fitted to the data in a sensitivity analysis to explore the robustness of the results to the choice of copula. To avoid the need to specify the copula, a pseudo-likelihood function can be constructed for the overall effect sizes using a working independence assumption when data are complete or MCAR [82].

It is worth noting that since random effects cannot be measured directly, the validity of any assumptions made about them is difficult to check [83]. To safeguard against misspecification or to relax the model assumptions, a non- or semi-parametric distribution can be used. However, using a non-parametric distribution leads to a loss of efficiency when a parametric assumption would otherwise not be badly violated [20].

The methods developed in this thesis for NMA assumed that all diagnostic tests were in principle evaluated in all studies, with some results missing by design and therefore missing at random. The MAR assumption means that the probability that an observation is missing can depend on the observed data but not on the missing data. Though pivotal, this assumption cannot be verified with the data. That said, the assumption is plausible whenever data are missing by design. In diagnostic studies, older tests become less used and new tests progressively more available with time, making the MAR assumption plausible. By further assuming that the missing-data mechanism was ignorable, the hierarchical models developed herein provided valid answers on the diagnostic accuracy of all tests using only the available data. This meant that we never needed to impute data on the missing tests in a given study.
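In symbols, and using generic missing-data notation that is assumed here rather than taken from the earlier chapters, let Y = (Y_obs, Y_mis) denote the complete data, R the missingness indicators, θ the parameters of the accuracy model and ψ the parameters of the missingness mechanism. The MAR and ignorability statements above can then be sketched as:

```latex
% Missing at random: missingness depends on the observed data only
P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \psi) \;=\; P(R \mid Y_{\mathrm{obs}}, \psi)

% Ignorability: under MAR, with \theta and \psi distinct, the observed-data
% likelihood factorises and inference on \theta can ignore the model for R
L(\theta, \psi \mid Y_{\mathrm{obs}}, R)
  \;=\; P(R \mid Y_{\mathrm{obs}}, \psi)\,
        \int f(Y_{\mathrm{obs}}, Y_{\mathrm{mis}} \mid \theta)\, dY_{\mathrm{mis}}
  \;\propto\; f(Y_{\mathrm{obs}} \mid \theta) \quad \text{as a function of } \theta .
```

This factorisation is what allows the models to be fitted to the available data alone, without an explicit imputation step.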

It is widely accepted that making comparisons based on studies that evaluate only one test may lead to bias when the studies evaluating test ‘A’ are performed in different populations than those evaluating test ‘B’. Having said that, by relying on the MAR assumption, our network meta-analysis models incorporated all available data, including studies that evaluated only one test. While this is not the convention, as long as these studies are similar to studies assessing many diagnostic tests, such studies contribute partially towards the estimation of the mean and the variance-covariance parameters, though their study-effect estimates might be uncertain. In fact, omitting such studies may have a profound impact if they have larger sample sizes among the few studies that compare tests with small frequencies [84]. This impact can be formally assessed by checking the validity of the homogeneity and consistency assumptions [39] as well as by using methods for outlier detection [84].

The main concern in meta-analysis of observational studies is that observational studies lack the experimental element of random allocation of interventions and are likely to be subject to unidentified sources of confounding and risk modification. It has therefore been argued that contrast-based (CB) models preserve the force and validity of a randomized trial in meta-analysis. By modelling the average outcome in each arm, the arm-based (AB) models are ‘said’ to ‘break randomization’ because the assumption of exchangeable absolute effects across studies cannot be guaranteed unless all trial arms can be thought of as a sample from a single, reasonably homogeneous super-population [85].

All in all, whenever information is combined across separate studies, the strength of the randomization procedure is nonetheless weakened [86]. Conceptually, the AB models assume that studies are independent (studies at random) while the CB models assume that effects are random but the studies are fixed: a more stringent assumption [87]. Either assumption can be argued to be unrealistic and, since there exists no test to (in)validate either assumption, it is only prudent to conduct NMA with the fewest assumptions. To achieve accuracy and usefulness of MA in assessing the strength of the available evidence, it is crucial that studies are similar enough to be grouped and that sources of heterogeneity are explored. Because bias can be present in the original studies, careful consideration should be given to specifying the eligibility criteria of studies included in NMA.

The OSPADAC project was designed for the meta-analysis of diagnostic accuracy studies. The methods developed here can be applied in other contexts. The most straightforward application is the use of the bivariate beta distributions as priors for correlated binomial random variables. Other everyday applications include the analysis of purchasing and consumption behaviour for different but potentially correlated products [88]. Bivariate correlated endpoints in the interval (0, 1) are also commonly encountered in clinical trials considering response and toxicity rates and in epidemiological studies. Another application area is the estimation of correlated effects in small area estimation.

6.1 Further Development and Research

Methods for network meta-analysis of diagnostic accuracy data are in general mathematically complex. On top of that, the not-so-easy-to-grasp concepts behind copula models add to this complexity. The fact that these procedures are not yet part of the routine procedures in statistical software implies that a certain level of statistical and programming expertise is required to fit them.

It would be of great interest to the scientific community to have these procedures implemented and disseminated as add-on packages for statistical software. Because Stan has been designed to operate efficiently with minimal user intervention, developing and disseminating the methods herein as software would free time for researchers and data analysts to develop models instead of worrying about how to fit them to the data. Dissemination of appropriate and optimal statistical methods, as well as further development of more user-friendly software, is essential. An international working group with such objectives would be very helpful in this respect.

While the MCMC technique implemented in Stan accelerates convergence to the stationary distribution, the large data sets in our network meta-analysis rendered it slow and some computations took several hours to finish. Faster alternatives would accelerate the uptake of network meta-analysis for diagnostic data. Variational inference has been shown to be typically faster than exact MCMC methods and can scale to massive datasets. Nonetheless, just like MCMC, its iterative nature still renders it slow compared to analytic approximation methods, e.g. the integrated nested Laplace approximation (INLA). INLA has been proposed as an alternative to MCMC methods that saves computation time without substantial loss in accuracy. However, this benefit is realised only when the number of hyper-parameters is small and when the method is applied to latent Gaussian models [89, 90]. Simulation studies could be conducted to shed more light on this issue.

The models developed in this thesis are only appropriate when the standard reference test is perfect. When the reference procedure is not 100% accurate at determining the presence/absence of disease, there might be imperfect gold standard bias. This bias can lead to overestimation or underestimation of the diagnostic accuracy of the index test, and latent class models (LCM) [91, 92, 13, 93] have been developed to correct for this bias mathematically. The bias is manifested in the estimates of the index test accuracy, in indices of association such as relative risk, and in distorted p-values [91, 92, 13]. If the reference and the index test are independent, i.e. make independent errors, as is the case in most situations [3], then the sensitivity and specificity of the index test will be underestimated. Exceptionally, the index and reference tests may be correlated, leading to overestimation of the test accuracy [94, 74]. Other approaches to minimize the bias due to the use of an imperfect gold standard include studying sensitivity and specificity in different populations, using an expert review panel to arrive at a less error-prone diagnosis, and framing the problem in terms of clinical outcomes rather than just accuracy [3].
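The direction of the bias under independent errors can be illustrated with a short calculation. The sketch below is ours, with hypothetical input values, and assumes that the index and reference tests err independently given the true disease status. It returns the sensitivity and specificity that would be observed if the imperfect reference were treated as the truth.

```python
def apparent_accuracy(prev, se_index, sp_index, se_ref, sp_ref):
    """Apparent (sensitivity, specificity) of the index test when it is
    scored against an imperfect reference test.

    Assumes the two tests make independent errors conditional on the
    true disease status; prev is the true disease prevalence."""
    # P(index positive, reference positive) and P(index negative, reference negative)
    pos_pos = prev * se_index * se_ref + (1 - prev) * (1 - sp_index) * (1 - sp_ref)
    neg_neg = prev * (1 - se_index) * (1 - se_ref) + (1 - prev) * sp_index * sp_ref
    # Marginal probabilities of the reference result
    ref_pos = prev * se_ref + (1 - prev) * (1 - sp_ref)
    ref_neg = 1 - ref_pos
    return pos_pos / ref_pos, neg_neg / ref_neg

# Hypothetical scenario: true index accuracy 0.90 / 0.90, prevalence 0.20.
perfect = apparent_accuracy(0.20, 0.90, 0.90, 1.00, 1.00)
imperfect = apparent_accuracy(0.20, 0.90, 0.90, 0.95, 0.95)

print("perfect reference:   se = %.3f, sp = %.3f" % perfect)
print("imperfect reference: se = %.3f, sp = %.3f" % imperfect)
```

With a perfect reference the true values are recovered, whereas with a reference that has 95% sensitivity and specificity both apparent values fall below 0.90 (roughly 0.76 and 0.89 in this scenario), in line with the underestimation described above.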


Our research opens up possibilities for extending the NMA models to cover cases where the standard reference is imperfect. As a matter of fact, NMA provides an ideal environment for fitting LCMs to assess the diagnostic accuracy of multiple tests when the standard reference is imperfect. The LCM would combine information from multiple imperfect diagnostic tests to uncover the unobserved disease structure and obtain valid estimates without a perfect classification of disease status. That said, LCMs are not always identifiable due to the large number of parameters involved and the label-switching problem, where the latent components are indistinguishable. To enable estimation of the parameters, a constraint is necessary. The constraint translates into the assumption that the classification errors in the reference and the index test are either independent or dependent conditional on the true disease status, after the label-switching problem has been addressed [95, 96].

Throughout this thesis, we used aggregated data and took the study as the unit of analysis. In the emerging era of personalized medicine, meta-analysis results relating to the ‘average’ patient are becoming insufficient. The models herein can be extended to include individual participant data (IPD). Two approaches for performing meta-analysis of IPD have been developed: the one-stage and the two-stage approach. The one-stage approach synthesizes the IPD from all studies in a hierarchical model that accounts for all sources of variability within and between studies [97, 23]. The two-stage approach is the most common. In the first stage, the relative effects and their variances are summarized separately in each study. They are then combined to produce overall summary estimates in the second stage [16, 98]. Though conceptually more complicated, the one-stage approach is recommended for a number of reasons: it uses the exact binomial distribution, so no ad hoc continuity correction is required; the asymptotic normality assumption is avoided; and both subject- and study-level sources of heterogeneity are appropriately explored [99]. Though considered the gold standard for meta-analysis, IPD are rarely available, and methods that combine IPD and aggregate data, using all available evidence to obtain more reliable estimates with better statistical properties, have been proposed [97, 100].
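To make the contrast concrete, the two-stage approach can be sketched in a few lines. The example below is a simplification under our own assumptions: it uses the logit sensitivity as the study-level summary, a continuity correction of 0.5 for studies with a zero cell, and common-effect inverse-variance pooling in the second stage, whereas a random-effects model would normally be preferred. The counts are hypothetical.

```python
import math

def stage_one(tp, fn, correction=0.5):
    """Stage 1: logit sensitivity and its approximate variance for one study.

    A continuity correction is added only when a cell is zero; this is
    the ad hoc step that the exact binomial likelihood avoids."""
    if tp == 0 or fn == 0:
        tp, fn = tp + correction, fn + correction
    return math.log(tp / fn), 1.0 / tp + 1.0 / fn

def stage_two(estimates):
    """Stage 2: inverse-variance (common-effect) pooling of
    (estimate, variance) pairs; relies on approximate normality."""
    weights = [1.0 / var for _, var in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    return pooled, 1.0 / sum(weights)

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical (true positives, false negatives) per study.
studies = [(45, 5), (30, 0), (88, 12), (19, 1)]
per_study = [stage_one(tp, fn) for tp, fn in studies]
pooled_logit, pooled_var = stage_two(per_study)
pooled_sensitivity = expit(pooled_logit)

print(f"pooled sensitivity: {pooled_sensitivity:.3f}")
```

The second study, with no false negatives, can only enter the analysis after the correction is applied, and its weight depends on the size of that correction. A one-stage model with a binomial likelihood would use the counts as observed.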


6.2 Summary

Cervical cancer is the fourth most common cancer in females, affecting more than half a million women worldwide each year, about half of whom die. Eighty-five percent of cervical cancer cases occur in low- and middle-income countries. It is the leading cause of death in women with cancer in sub-Saharan Africa. Cervical cancer is caused by high-risk types of the human papillomavirus (HPV).

A Pap smear is a screening test used to detect cervical precancer. Out of every 100 screened women, 90 to 98 will have a negative result, i.e. normal cervical cells. They are recommended to wait three to five years before the next screening. Approximately 3 to 20 per thousand women screened will have high-grade abnormal cells, meaning that they are at high risk of developing cervical cancer. They are referred immediately for further investigation and treatment. Two to ten percent of women will have inconclusive or low-grade Pap smear results. Cervical precancerous lesions, in particular those with low-grade or undetermined atypical findings, can clear without treatment. Hence, it is very important to identify the women who are at an increased risk of developing cervical (pre-)cancer and need further testing and treatment. This spares the remaining women the anxiety and discomfort related to diagnostic and/or therapeutic work-up. Furthermore, it reduces the financial burden on the women and the health-care system as a whole.

The accuracy of different tests for triaging women with minor Pap smear abnormalities (equivocal or low-grade) has been assessed in many diagnostic studies. These tests detect DNA or RNA of high-risk HPV types. Other tests used for triage of minor lesions include repeat Pap smear and protein markers indicative of a transforming HPV infection. The results from all the studies can be synthesized in a meta-analysis. This is done in a systematic review where evidence fitting pre-specified eligibility criteria is gathered to answer a specific research question. Meta-analyses offer a comprehensive method to gather information for clinical decision making. By conducting a meta-analysis, the precision of the accuracy estimates can be improved compared to separate studies. Moreover, controversies arising from apparently conflicting studies can be explained.

In this thesis, statistical methods to perform meta-analysis of diagnostic accuracy studies are developed. The methods are specific to binomial and proportion data, which are typical of diagnostic studies. The thesis addresses the breakdown of normal approximation procedures encountered when the estimated proportions are equal to zero or one, or when there are few studies.

A conventional diagnostic accuracy meta-analysis compares two tests only. This thesis also presents a comprehensive and unified inference framework, called network meta-analysis, that utilizes and synthesizes available data on all the different diagnostic tests for the same disease simultaneously. Network meta-analysis improves the estimation process by sharing information between studies, yielding more precise estimates, especially for diagnostic tests evaluated in a small number of studies. Moreover, it ensures more efficient use of data and decreases the chance of finding spuriously significant effects. Furthermore, the procedure yields all comparisons between any two tests. Such comparisons are of great relevance to different stakeholders (clinicians, policy makers, epidemiologists) in making decisions on which tests to use in practice.


6.3 Samenvatting (Summary)

Cervical cancer is the fourth most common cancer in women worldwide, affecting more than half a million women every year, about half of whom die. Eighty-five percent of cervical cancer cases are diagnosed in low- or middle-income countries. In sub-Saharan Africa it is the most frequent cause of death among women with cancer. Cervical cancer is caused by certain types of the human papillomavirus.

A cervical smear is used as a test to detect precursor stages of cervical cancer. Between 90% and 98% of screening smears are normal, in which case it is sufficient to take the next smear three to five years later. In approximately 3 to 20 per thousand screened women a high-grade lesion is found, which carries an increased risk of developing cervical cancer. These women must be referred for further investigation and treatment. Two to ten percent of screened women have a low-grade or unclear smear. Precursor stages of cervical cancer can disappear spontaneously without treatment, mainly when the smear was unclear or low-grade. Hence the importance of a triage test that distinguishes between women with an increased risk and those with a low risk of developing cervical cancer or its precursor stages. Such a triage test makes it possible to avoid the anxiety and discomfort that accompany the referral of a woman for further diagnostic investigation and/or treatment. In addition, the use of an accurate triage test reduces unnecessary costs for the health-care system.

The accuracy of different tests that can be used for triage of women with minor smear abnormalities has been evaluated in many published diagnostic studies. These triage tests detect DNA or RNA of high-risk HPV types. A repeat smear, or protein markers that indicate the presence or development of a precursor of cervical cancer, can also be used for triage. The results of such diagnostic studies can be summarized in a meta-analysis. Individual studies that meet specified selection criteria can be included in a meta-analysis. A meta-analysis can bring together all the information available in the literature to answer a clinical question. A meta-analysis yields more precise estimates of the accuracy of a test than individual studies do, and differences between individual studies can be explained. In this thesis, statistical methods were developed for the meta-analysis of diagnostic tests. These methods are suited to the analysis of binomial data or proportions (the outcome is positive or negative). The methods developed offer a solution to the failure of the normal approximation when sensitivity or specificity is 0% or 100%, or close to these extreme values, and when only a small number of studies is available.

A classical meta-analysis compares two different tests. This thesis offers a comprehensive framework for the simultaneous meta-analysis (network meta-analysis) of multiple tests for detecting a given disease. Network meta-analyses improve the estimation of the accuracy of a wide range of tests, even when only a few studies evaluate a particular test. Network meta-analyses ensure a more efficient use of the data, reduce the chance of spurious effects, and compare all possible tests with one another. The results give an overall picture and are therefore relevant to policy makers, clinicians and epidemiologists who have to make decisions on the use of tests.


Bibliography

[1] Hoekzema E, Barba-Müller E, Pozzobon C, et al. Pregnancy leads to long-lasting changes in human brain structure. Nature Neuroscience, 20(2):287–296, 2017.

[2] Lewandowski D, Kurowicka D, and Joe H. Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis, 100(9):1989–2001, 2009.

[3] Valenstein PN. Evaluating diagnostic tests with imperfect standards. American Journal of Clinical Pathology, 93(2):252–258, 1990.

[4] Eusebi P. Diagnostic accuracy measures. Cerebrovascular Diseases, 36(4):267–272, 2013.

[5] Reitsma JB, Glas AS, Rutjes AWS, et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. Journal of Clinical Epidemiology, 58(10):982–990, 2005.

[6] Greco T, Landoni G, Biondi-Zoccai G, D’Ascenzo F, and Zangrillo A. A Bayesian network meta-analysis for binary outcome: how to do it. Statistical Methods in Medical Research, page 0962280213500185, 2013.

[7] Congdon P. Hierarchical priors for pooling strength and in general linear model regression. In Balding DJ, Bloomfield P, Cressie NAC, et al., editors, Bayesian Statistical Modelling, chapter 5, page 152. John Wiley & Sons, Ltd, West Sussex, England, second edition, 2006.

[8] Law M, Jackson D, Turner R, Rhodes K, and Viechtbauer W. Two new methods to fit models for network meta-analysis with random inconsistency effects. BMC Medical Research Methodology, 16(1):87, 2016.

[9] Lee KJ and Thompson SG. The use of random effects models to allow for clustering in individually randomized trials. Clinical Trials, 2(2):163–173, 2005.

[10] Chung Y, Rabe-Hesketh S, and Choi IH. Avoiding zero between-study variance estimates in random-effects meta-analysis. Statistics in Medicine, 32(23):4071–4089, 2013.

[11] Lane PW. Meta-analysis of incidence of rare events. Statistical Methods in Medical Research, 22(2):117–32, 2013.

[12] Shuster JJ and Walker MA. Low-event-rate meta-analyses of clinical trials: Implementing good practices. Statistics in Medicine, 35(14), 2016.


[13] Eusebi P, Reitsma JB, and Vermunt JK. Latent Class Bivariate Model for the Meta-Analysis of Diagnostic Test Accuracy Studies. BMC Medical Research Methodology, 14:88, 2014.

[14] Abrams K and Sanso B. Approximate Bayesian inference for random effects meta-analysis. Statistics in Medicine, 17(2):201–218, 1998.

[15] Jackson D, Riley R, and White IR. Multivariate meta-analysis: Potential and promise. Statistics in Medicine, 30(20):2481–2498, 2011.

[16] Riley RD, Price MJ, Jackson D, et al. Multivariate meta-analysis using individual participant data. Research Synthesis Methods, 6(2):157–174, 2015.

[17] Riley RD, Abrams KR, Sutton AJ, Lambert PC, and Thompson JR. Bivariate random-effects meta-analysis and the estimation of between-study correlation. BMC Medical Research Methodology, 7:3, 2007.

[18] Neuhaus JM, Hauck WW, and Kalbfleisch JD. The Effects of Mixture Distribution Misspecification when Fitting Mixed-Effects Logistic Models. Biometrika, 79(4):755–762, 1992.

[19] Grilli L and Rampichini C. Specification of random effects in multilevel models: a review. Quality & Quantity, 49(3):967–976, 2015.

[20] Agresti A, Caffo B, and Ohman-Strickland P. Examples in which misspecification of a random effects distribution reduces efficiency, and possible remedies. Computational Statistics & Data Analysis, 47(3):639–653, 2004.

[21] Alonso A, Litiere S, and Laenen A. A Note on the Indeterminacy of the Random-Effects Distribution in Hierarchical Models. The American Statistician, 64(4):318–324, 2010.

[22] Jackson D. Confidence intervals for the between-study variance in random effects meta-analysis using generalised Cochran heterogeneity statistics. Research Synthesis Methods, 4(3):220–229, 2013.

[23] Simmonds MC and Higgins JPT. A general framework for the use of logistic regression models in meta-analysis. Statistical Methods in Medical Research, 25(6):2858–2877, 2016.

[24] Baker R and Jackson D. New models for describing outliers in meta-analysis. Research Synthesis Methods, 7(3):314–328, 2016.

[25] Kruschke JK. When there are few 1’s or 0’s in the data. In Doing Bayesian analysis: a tutorial with R, JAGS and Stan, chapter 20, page 454. Academic Press/Elsevier Inc, Burlington MA, 1st edition, 2011.

[26] Higgins JPT and Whitehead A. Borrowing strength from external trials in a meta-analysis. Statistics in Medicine, 15(24):2733–2749, 1996.

[27] Paolino P. Maximum Likelihood estimation of models with beta-distributed dependent variables. Political Analysis, 9(4):325–346, 2001.


[28] Ferrari A and Comelli M. A comparison of methods for the analysis of binomial proportion data in behavioral research. Journal of Neuroscience Methods, 247:131–140, 2016.

[29] Sarabia JM and Gomez-Deniz E. Construction of multivariate distributions: A review of some recent results. Statistics and Operations Research Transactions, 32(1):3–35, 2008.

[30] Sklar M. Fonctions de répartition à n dimensions et leurs marges. Publications de l’Institut de Statistique de l’Université de Paris, 8:229–231, 1959.

[31] Nelsen RB, Quesada-Molina JJ, Rodríguez-Lallena JA, and Úbeda-Flores M. Distribution functions of copulas: A class of bivariate probability integral transforms. Statistics and Probability Letters, 54(3):277–282, 2001.

[32] Riley RD, Abrams KR, Sutton AJ, Lambert PC, and Thompson JR. Bivariate random-effects meta-analysis and the estimation of between-study correlation. BMC Medical Research Methodology, 7:3, 2007.

[33] Jackson D, White IR, and Riley RD. A matrix-based method of moments for fitting the multivariate random effects model for meta-analysis and meta-regression. Biometrical Journal, 55(2):231–245, 2013.

[34] Chen H, Manning AK, and Dupuis J. A Method of Moments Estimator for Random Effect Multivariate Meta-Analysis. Biometrics, 68(4):1278–1284, 2012.

[35] Debray TPA, Moons KGM, van Valkenhoef G, et al. Get real in individual participant data (IPD) meta-analysis: A review of the methodology. Research Synthesis Methods, 6(4), 2015.

[36] Thompson SG and Higgins JPT. How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine, 21(11):1559–1573, 2002.

[37] Kanters S, Ford N, Druyts E, et al. Use of network meta-analysis in clinical guidelines. Bulletin of the World Health Organisation, 94(10):782–784, 2016.

[38] Salanti G, Giovane CD, Chaimani A, Caldwell DM, and Higgins JPT. Evaluating the quality of evidence from a network meta-analysis. PLoS ONE, 9(7):e99682, 2014.

[39] Donegan S, Williamson P, D’Alessandro U, and Tudur Smith C. Assessing key assumptions of network meta-analysis: a review of methods. Research Synthesis Methods, 4:291–323, 2013.

[40] White IR, Barrett JK, Jackson D, and Higgins JPT. Consistency and inconsistency in network meta-analysis: model estimation using multivariate meta-regression. Research Synthesis Methods, 3(2):111–25, 2012.

[41] Dias S, Welton NJ, Sutton AJ, et al. Evidence synthesis for decision making 4: inconsistency in networks of evidence based on randomized controlled trials. Medical Decision Making, 33(5), 2013.


[42] Jackson D, Barrett JK, Rice S, White IR, and Higgins JPT. A design-by-treatment interaction model for network meta-analysis with random inconsistency effects. Statistics in Medicine, 33(21):3639–3654, 2014.

[43] Jackson D, Boddington P, and White IR. The design-by-treatment interaction model: a unifying framework for modelling loop inconsistency in network meta-analysis. Research Synthesis Methods, 7(3):329–332, 2015.

[44] van Valkenhoef G, Dias S, Ades AE, et al. Automated generation of node-splitting models for assessment of inconsistency in network meta-analysis. Research Synthesis Methods, 7(1):80–93, 2016.

[45] Higgins JPT and Welton NJ. Network meta-analysis: A norm for comparative effectiveness? The Lancet, 386(9994):628–630, 2015.

[46] Efthimiou O, Mavridis D, Riley RD, Cipriani A, and Salanti G. Joint synthesis of multiple correlated outcomes in networks of interventions. Biostatistics, 16(1):84–97, 2015.

[47] Menten J and Lesaffre E. A general framework for comparative Bayesian meta-analysis of diagnostic studies. BMC Medical Research Methodology, 15:70, 2015.

[48] Lu G and Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Statistics in Medicine, 23(20):3105–3124, 2004.

[49] Hong H, Chu H, Zhang J, and Carlin BP. A Bayesian missing data framework for generalized multiple outcome mixed treatment comparisons. Research Synthesis Methods, 7(1):6–22, 2016.

[50] Piepho HP, Williams ER, and Madden LV. The Use of Two-Way Linear Mixed Models in Multitreatment Meta-Analysis. Biometrics, 68(4):1269–1277, 2012.

[51] Zhang J, Carlin BP, Neaton JD, et al. Network meta-analysis of randomized clinical trials: Reporting the proper summaries. Clinical Trials, 11(2):246–262, 2014.

[52] Jansen JP, Crawford B, Bergman G, and Stam W. Bayesian meta-analysis of multiple treatment comparisons: An introduction to mixed treatment comparisons. Value in Health, 11(5):956–964, 2008.

[53] Dias S and Ades AE. Absolute or relative effects? Arm-based synthesis of trial data. Research Synthesis Methods, 7(1):23–28, 2015.

[54] Piepho HP. Network-meta analysis made easy: detection of inconsistency using factorial analysis-of-variance models. BMC Medical Research Methodology, 14:61, 2014.

[55] Breslow NE and Clayton DG. Approximate Inference in Generalized Linear Mixed Models. Journal of the American Statistical Association, 88(421):9–25, 1993.

[56] Gasparrini A, Armstrong B, and Kenward MG. Multivariate meta-analysis for non-linear and other multi-parameter associations. Statistics in Medicine, 31(29):3821–3839, 2012.

[57] Doebler P. mada: Meta-Analysis of Diagnostic Accuracy, R package version 0.5.7, 2015. https://cran.r-project.org/package=mada.

[58] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2016. https://www.r-project.org/.

[59] Takwoingi Y and Deeks J. MetaDAS: A SAS macro for meta-analysis of diagnostic accuracy studies. User guide, 2010. http://srdta.cochrane.org/.

[60] Hong H, Carlin BP, Shamliyan TA, et al. Comparing Bayesian and frequentist approaches for multiple outcome mixed treatment comparisons. Medical Decision Making, 33(5):702–714, 2013.

[61] Verde PE. Meta-analysis of diagnostic test data: A bivariate Bayesian modeling approach. Statistics in Medicine, 29(30):3088–3102, 2010.

[62] Lunn D, Thomas A, Best N, and Spiegelhalter D. WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing, 10:325–337, 2000.

[63] Lunn D, Spiegelhalter D, Thomas A, and Best N. The BUGS project: Evolution, critique and future directions (with discussion). Statistics in Medicine, 28:3049–3082, 2009.

[64] Plummer M. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. Proceedings of the 3rd international workshop on distributed statistical computing, 124:125, 2003.

[65] Statisticat LLC. Bayesian Inference, 2016.

[66] Carpenter B, Gelman A, Hoffman MD, et al. Stan: A probabilistic programming language. Journal of Statistical Software, 76(1), 2017.

[67] Stan Development Team. RStan: the R interface to Stan, Version 2.10.1, 2016.

[68] Hoffman M and Gelman A. The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15:30, 2014.

[69] Gelman A, Lee D, and Guo J. Stan: A probabilistic programming language for Bayesian inference and optimization. Journal of Educational and Behavioral Statistics, 40(5):530–543, 2015.

[70] Tsu V and Jeronimo J. Saving the World’s Women from Cervical Cancer. The New England Journal of Medicine, 374(26):2509–2511, 2016.

[71] Newton CL and Mould TA. Invasive cervical cancer. Obstetrics, Gynaecology & Reproductive Medicine, 27(1):7–13, 2017.

[72] Arbyn M, Roelens J, Simoens C, et al. Human papillomavirus testing versus repeat cytology for triage of minor cytological cervical lesions. The Cochrane Database of Systematic Reviews, (3):CD008054, Mar 2013.

[73] Arbyn M, Buntinx F, Van Ranst M, et al. Virologic versus cytologic triage of women with equivocal Pap smears: a meta-analysis of the accuracy to detect high-grade intraepithelial neoplasia. Journal of the National Cancer Institute, 96(4):280–93, Feb 2004.

[74] Arbyn M, Ronco G, Cuzick J, Wentzensen N, and Castle PE. How to evaluate emerging technologies in cervical cancer screening? International Journal of Cancer, 125(11):2489–2496, 2009.

[75] Arbyn M, Sasieni P, Meijer CJ, et al. Chapter 9: Clinical applications of HPV testing: A summary of meta-analyses. Vaccine, 24:S78–S89, 2006.

[76] Arbyn M, Ronco G, Anttila A, et al. Evidence regarding human papillomavirus testing in secondary prevention of cervical cancer. Vaccine, 30 Suppl 5:F88–99, 2012.

[77] Lees BF, Erickson BK, and Huh WK. Cervical cancer screening: evidence behind the guidelines. American Journal of Obstetrics and Gynecology, 214(4):438–443, Apr 2016.

[78] Roelens J, Reuschenbach M, Von Knebel Doeberitz M, et al. P16INK4a immunocytochemistry versus human papillomavirus testing for triage of women with minor cytologic abnormalities: A systematic review and meta-analysis. Cancer Cytopathology, 120(5):294–307, 2012.

[79] Verdoodt F, Szarewski A, Halfon P, Cuschieri K, and Arbyn M. Triage of women with minor abnormal cervical cytology: Meta-analysis of the accuracy of an assay targeting messenger ribonucleic acid of 5 high-risk human papillomavirus types. Cancer Cytopathology, 121(12):675–687, 2013.

[80] Hamza TH, Arends LR, van Houwelingen HC, and Stijnen T. Multivariate random effects meta-analysis of diagnostic tests with multiple thresholds. BMC Medical Research Methodology, 9:73, 2009.

[81] Tang JL. Weighting bias in meta-analysis of binary outcomes. Journal of Clinical Epidemiology, 53(11):1130–1136, 2000.

[82] Chen Y, Hong C, and Riley RD. An alternative pseudolikelihood method for multivariate random-effects meta-analysis. Statistics in Medicine, 34(3):361–380, 2015.

[83] Litiere S, Alonso A, and Molenberghs G. The impact of a misspecified random-effects distribution on the estimation and the performance of inferential procedures in generalized linear mixed models. Statistics in Medicine, 27(16):3125–3144, 2007.

[84] Zhang J, Yuan Y, and Chu H. The Impact of Excluding Trials from Network Meta-Analyses - An Empirical Study. PLOS ONE, 11(12):e0165889, 2016.

[85] Hong H, Chu H, Zhang J, and Carlin BP. Rejoinder to the discussion of “a Bayesian missing data framework for generalized multiple outcome mixed treatment comparisons,” by S. Dias and A.E. Ades. Research Synthesis Methods, 7(1):6–22, 2016.

[86] Glenny AM, Altman DG, Song F, et al. Indirect comparisons of competing interventions. Health Technology Assessment, 9(26):65, 2005.

[87] Shuster JJ, Guo JD, and Skyler JS. Meta-analysis of safety for low event-rate binomial trials. Research Synthesis Methods, 3(1):30–50, 2012.

[88] Danaher PJ and Hardie BGS. Bacon with Your Eggs? Applications of a New Bivariate Beta-Binomial Distribution. The American Statistician, 59(4):282–286, 2005.

[89] Rue H, Martino S, and Chopin N. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society. Series B: Statistical Methodology, 71(2):319–392, 2009.

[90] Sauter R and Held L. Network meta-analysis with integrated nested Laplace approximations. Biometrical Journal, 57(6):1038–1050, Nov 2015.

[91] Walter SD and Irwig LM. Estimation of test error rates, disease prevalence and relative risk from misclassified data: a review. Journal of Clinical Epidemiology, 41(9):923–937, 1988.

[92] Walter SD, Irwig L, and Glasziou PP. Meta-analysis of diagnostic tests with imperfect reference standards. Journal of Clinical Epidemiology, 52(10):943–951, 1999.

[93] van Smeden M, Naaktgeboren CA, Reitsma JB, Moons KGM, and de Groot JAH. Latent Class Models in Diagnostic Studies When There is No Reference Standard–A Systematic Review. American Journal of Epidemiology, 179(4):423–431, 2014.

[94] Zhou XH, Obuchowski NA, and McClish DK. Statistical methods in diagnostic medicine. John Wiley & Sons, Hoboken, New Jersey, second edition, 2011.

[95] Chung H, Loken E, and Schafer JL. Difficulties in Drawing Inferences With Finite-Mixture Models. The American Statistician, 58(2):152–158, 2004.

[96] Jasra A, Holmes CC, and Stephens DA. Markov Chain Monte Carlo Methods and the Label Switching Problem in Bayesian Mixture Modeling. Statistical Science, 20(1):50–67, 2005.

[97] Riley RD, Thompson JR, and Abrams KR. An alternative model for bivariate random-effects meta-analysis when the within-study correlations are unknown. Biostatistics, 9(1):172–186, 2008.

[98] Idris NRN and Misran NA. A Modified Two-Stage Method for Combining the Aggregate-Data and Individual-Patient-Data in Meta-Analysis. Journal of Applied Sciences, 15(10):1231–1238, 2015.

[99] Debray TPA, Moons KGM, Abo-Zaid GMA, Koffijberg H, and Riley RD. Individual Participant Data Meta-Analysis for a Binary Outcome: One-Stage or Two-Stage? PLOS ONE, 8(4), 2013.

[100] Donegan S, Williamson P, D’Alessandro U, Garner P, and Smith CT. Combining individual patient data and aggregate data in mixed treatment comparison meta-analysis: Individual patient data may be beneficial if only for a subset of trials. Statistics in Medicine, 32(6):914–930, 2013.
