an observational study of deep learning and automated ...the system architecture is summarized in...

10
ARTICLE An Observational Study of Deep Learning and Automated Evaluation of Cervical Images for Cancer Screening Liming Hu, David Bell, Sameer Antani, Zhiyun Xue, Kai Yu, Matthew P. Horning, Noni Gachuhi, Benjamin Wilson, Mayoore S. Jaiswal, Brian Befano, L. Rodney Long, Rolando Herrero, Mark H. Einstein, Robert D. Burk, Maria Demarco, Julia C. Gage, Ana Cecilia Rodriguez, Nicolas Wentzensen, Mark Schiffman See the Notes section for the full list of authors’ affiliations. Correspondence to: Mark Schiffman, MD, MPH, National Cancer Institute, Room 6E544, 9609 Medical Center Drive, Rockville, MD 20850 (e-mail: [email protected]). Abstract Background: Human papillomavirus vaccination and cervical screening are lacking in most lower resource settings, where approximately 80% of more than 500 000 cancer cases occur annually. Visual inspection of the cervix following acetic acid application is practical but not reproducible or accurate. The objective of this study was to develop a “deep learning”-based visual evaluation algorithm that automatically recognizes cervical precancer/cancer. Methods: A population-based longitudinal cohort of 9406 women ages 18–94 years in Guanacaste, Costa Rica was followed for 7 years (1993–2000), incorporating multiple cervical screening methods and histopathologic confirmation of precancers. Tumor registry linkage identified cancers up to 18 years. Archived, digitized cervical images from screening, taken with a fixed-focus camera (“cervicography”), were used for training/validation of the deep learning-based algorithm. The resultant image prediction score (0–1) could be categorized to balance sensitivity and specificity for detection of precancer/cancer. All statistical tests were two-sided. Results: Automated visual evaluation of enrollment cervigrams identified cumulative precancer/cancer cases with greater accuracy (area under the curve [AUC] ¼ 0.91, 95% confidence interval [CI] ¼ 0.89 to 0.93) than original cervigram interpretation (AUC ¼ 0.69, 95% CI ¼ 0.63 to 0.74; P < .001) or conventional cytology (AUC ¼ 0.71, 95% CI ¼ 0.65 to 0.77; P < .001). A single visual screening round restricted to women at the prime screening ages of 25–49 years could identify 127 (55.7%) of 228 precancers (cervical intraepithelial neoplasia 2/cervical intraepithelial neoplasia 3/adenocarcinoma in situ [AIS]) diagnosed cumulatively in the entire adult population (ages 18–94 years) while referring 11.0% for management. Conclusions: The results support consideration of automated visual evaluation of cervical images from contemporary digital cameras. If achieved, this might permit dissemination of effective point-of-care cervical screening. Cervical cancer remains a leading cause of cancer mortality and morbidity worldwide (1). Approximately 80% of the half-million cases and 90% of the quarter-million deaths per year occur in low- and middle-income countries, where prevention programs are limited. In some low-resource countries, cervical cancer is the leading female malignancy, with lifetime cumulative inci- dence exceeding 5%. The number of new cases is projected to increase in the decades ahead as the global population grows and ages (2). Cervical cancer arises from persistent infection of the cervix with approximately a dozen carcinogenic types of human papil- lomavirus (HPV) (3). We can prevent it by prophylactic vaccina- tion and screening/treatment of cervical cancer precursor lesions (“precancer”). However, mainstays of cervical cancer ARTICLE Received: August 21, 2018; Revised: October 12, 2018; Accepted: December 3, 2018 Published by Oxford University Press 2019. This work is written by US Government employees and is in the public domain in the US. 1 JNCI J Natl Cancer Inst (2019) ( ): djy225 doi: 10.1093/jnci/djy225 Article 00 Downloaded from https://academic.oup.com/jnci/advance-article-abstract/doi/10.1093/jnci/djy225/5272614 by SuUB Bremen user on 29 January 2019

Upload: others

Post on 03-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An Observational Study of Deep Learning and Automated ...The system architecture is summarized in Figure 2 and de-tailed in the Supplementary Methods (available online). The au-tomated

ARTICLE

An Observational Study of Deep Learning and Automated

Evaluation of Cervical Images for Cancer Screening

Liming Hu David Bell Sameer Antani Zhiyun Xue Kai Yu Matthew P HorningNoni Gachuhi Benjamin Wilson Mayoore S Jaiswal Brian Befano L Rodney LongRolando Herrero Mark H Einstein Robert D Burk Maria Demarco Julia C GageAna Cecilia Rodriguez Nicolas Wentzensen Mark Schiffman

See the Notes section for the full list of authorsrsquo affiliationsCorrespondence to Mark Schiffman MD MPH National Cancer Institute Room 6E544 9609 Medical Center Drive Rockville MD 20850 (e-mail schiffmmmailnihgov)

Abstract

Background Human papillomavirus vaccination and cervical screening are lacking in most lower resource settings whereapproximately 80 of more than 500 000 cancer cases occur annually Visual inspection of the cervix following acetic acidapplication is practical but not reproducible or accurate The objective of this study was to develop a ldquodeep learningrdquo-basedvisual evaluation algorithm that automatically recognizes cervical precancercancerMethods A population-based longitudinal cohort of 9406 women ages 18ndash94 years in Guanacaste Costa Rica was followed for7 years (1993ndash2000) incorporating multiple cervical screening methods and histopathologic confirmation of precancersTumor registry linkage identified cancers up to 18 years Archived digitized cervical images from screening taken with afixed-focus camera (ldquocervicographyrdquo) were used for trainingvalidation of the deep learning-based algorithm The resultantimage prediction score (0ndash1) could be categorized to balance sensitivity and specificity for detection of precancercancer Allstatistical tests were two-sidedResults Automated visual evaluation of enrollment cervigrams identified cumulative precancercancer cases with greateraccuracy (area under the curve [AUC]frac14091 95 confidence interval [CI]frac14089 to 093) than original cervigram interpretation(AUCfrac14069 95 CIfrac14063 to 074 Plt 001) or conventional cytology (AUCfrac14071 95 CIfrac14065 to 077 Plt 001) A single visualscreening round restricted to women at the prime screening ages of 25ndash49 years could identify 127 (557) of 228 precancers(cervical intraepithelial neoplasia 2cervical intraepithelial neoplasia 3adenocarcinoma in situ [AIS]) diagnosed cumulativelyin the entire adult population (ages 18ndash94 years) while referring 110 for managementConclusions The results support consideration of automated visual evaluation of cervical images from contemporary digitalcameras If achieved this might permit dissemination of effective point-of-care cervical screening

Cervical cancer remains a leading cause of cancer mortality andmorbidity worldwide (1) Approximately 80 of the half-millioncases and 90 of the quarter-million deaths per year occur inlow- and middle-income countries where prevention programsare limited In some low-resource countries cervical cancer isthe leading female malignancy with lifetime cumulative inci-dence exceeding 5 The number of new cases is projected to

increase in the decades ahead as the global population growsand ages (2)

Cervical cancer arises from persistent infection of the cervixwith approximately a dozen carcinogenic types of human papil-lomavirus (HPV) (3) We can prevent it by prophylactic vaccina-tion and screeningtreatment of cervical cancer precursorlesions (ldquoprecancerrdquo) However mainstays of cervical cancer

AR

TIC

LE

Received August 21 2018 Revised October 12 2018 Accepted December 3 2018

Published by Oxford University Press 2019 This work is written by US Government employees and is in the public domain in the US

1

JNCI J Natl Cancer Inst (2019) ( ) djy225

doi 101093jncidjy225Article

0 0

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

screening programs in high-resource settings including cervicalcytology (Pap tests) and colposcopy require infrastructure andextensively trained personnel that are lacking in most lowerresource settings Newer powerful cervical cancer preventiontools like prophylactic HPV vaccination and screening with sen-sitive HPV tests could be very useful in low-resource settingsHowever again dissemination has been severely limited bylack of resources and organization

In search of programmatic simplicity and sustainable costsauthorities including the World Health Organization the USPresidentrsquos Emergency Plan for AIDS Relief and the Indian gov-ernment have endorsed screening of the cervix by visual in-spection after application (VIA) of acetic acid to highlightprecancerous or cancerous abnormalities (4ndash6) when more ad-vanced methods are not feasible The health worker performingVIA rates the cervical appearance as normal or abnormal withparticular attention to possible invasive cancer If the appear-ance is abnormal and if the size and position of the visible le-sion permit the cervical epithelium is destroyed by freezing orheating (ldquosee and treatrdquo) Excision is sometimes required for se-vere lesions As a screening test VIA is simple and inexpensiveand has been shown to find some invasive cancers while theyare still local and curable by surgery (7) However the main goalof screening is to prevent cancer by detection and treatment ofprecancers and VIA is not accurate in distinguishing precancerfrom much more common minor abnormalities leading to bothovertreatment and undertreatment (8ndash10) Increasingly it is rec-ognized that the visual identification of precancer by healthworkers even by experienced nurses and doctors using a colpo-scope the reference standard visual tool is too often unreliableand inaccurate (1112) Thus we currently still lack a practicalaccurate visual screening approach

In other similarly subjective medical diagnostic situationsnew methods of pattern recognition by computer (referred to asdeep or machine learning) have proven useful (13) Machinelearning-based approaches to cervical cancer screening haveyielded promising early results but have lacked a good measure-ment of precancer sufficient sample size or prospective follow-up (Supplementary Table 1 available online) (1415)

We applied a deep learning-based object detection method[Faster R-CNN or faster region-based convolutional neuralnetwork (16)] algorithm to cervical images taken during aNational Cancer Institute (NCI) prospective epidemiologic studywith long follow-up and rigorously defined precancer end-points to develop a detection algorithm that can identify cervi-cal precancer Here we demonstrate the proof-of-principle ofan ldquoautomated visual evaluationrdquo algorithm applied to archiveddigitized cervical images

Materials and Methods

Study Population

Conducted from 1993 to 2001 the NCI-funded ProyectoEpidemiologico Guanacaste accumulated approximately 30 000screening visits of more than 9000 participants (933 accep-tance) (Figure 1) This longitudinal cohort study of HPV and cer-vical cancer was conducted in a random sample of a moderate-risk poorly screened Costa Rican province (1718) Participantsaged 18ndash94 years were screened at baseline and periodically forup to 7 years using multiple methods at intervals determined bytheir screening results and resultant estimated risk of precancer(mean number of visitsfrac14 34 SDfrac14 25 approximately 90 of

women with some follow-up) (1718) In a subsequent linkagestudy the cancer registry was used to extend follow-up for inva-sive cancer up to 18 years (1819)

Cervicography

Each screening visit included multiple kinds of tests cytologyHPV testing and cervicography (18) Cervicography now discon-tinued was a visual screening method based on the interpreta-tion of a pair of cervical photographs (called cervigrams) (2021)Two sequential duplicate cervigrams of the cervix were takenat each screening visit after acetic acid application using afixed-focus ring-lit film camera called a cerviscope The rolls offilm were mailed regularly and developed into projection slidesin the United States by National Testing Laboratory Worldwide(Fenton MO) Each cervigram image was originally projected ona wall screen for magnification and classified by one of twohighly experienced National Testing Laboratory Worldwidephysician colposcopist evaluators as normal atypical positivefor minor low-grade HPV-induced changes or positive for pre-cancer or cancer (the last two combined here) After completionof the field effort the photographic images were digitized andthe files compressed for storage (1722)

Other Screening Tests in the Guanacaste Cohort

Cytology methods included conventional Pap smears a proto-type liquid-based method (23) and a first-generation automatedapproach incorporating an early version of a neural network-based method (24) The conventional Pap smear performed inCosta Rica best represents the kind typically performed inlower- and middle-resource regions HPV testing was performedby MY09-MY11 consensus primer PCR (25) but was not used forcolposcopy referral in this early epidemiologic demonstration ofHPV predictive value We defined positivity as detection of oneof 13 high-risk HPV per the International Agency For ResearchOn Cancer classification (26)

CIN21 Cases and Controls

Cases included women who were diagnosed with histologicCIN2 or worse (CIN2thorn) during enrollment or follow-up Womenfound at a cohort screening examination to have abnormal cy-tology or cervicography screening results were referred to col-poscopy performed by a single study gynecologist who biopsiedthe worst-appearing lesion Histologic CIN2 or worse (CIN2thorn)was treated by large loop excision of the transformation zoneall women with definite or even possible CIN2thorn (eg high-gradesquamous intraepithelial lesion cytology) were referred formanagement and censored from further study The histopatho-logic reference diagnosis of CIN2thorn was determined by majorityreview of biopsies and excision specimens by a panel of oneCosta Rica pathologist and one US consultant pathologist withdisagreements leading to another US pathologistrsquos review

Collection and use of the visual images were approved bythe Costa Rican and NCI ethical committees The images werecollected originally under written informed consent that cov-ered subsequent research use The specific use for machinelearning-based algorithms on study images was also approvedby the NCI Institutional Review Board

AR

TIC

LE

2 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

Automated Visual Evaluation Algorithm to EvaluateCervical Images

Three controls per case were chosen from among women whodid not present with CIN2thorn during active surveillance fre-quency matched to time of diagnosis of case (enrollmentfol-low-up) A random approximately 70 (197279) of cases andfrequency-matched controls were chosen for training and theremaining approximately 30 were reserved as the initial vali-dation test set We randomly chose a single image from eachpair of images available per prediagnostic case visit and controlvisit Of note after the initial validation (SupplementaryMethods available online) the main analysis focused onimages taken at enrollment visits

The system architecture is summarized in Figure 2 and de-tailed in the Supplementary Methods (available online) The au-tomated visual evaluation detection algorithm performs twomain functions detecting (locating) the cervix within the inputimage and predicting the probability that the image representsa case of CIN2thorn The algorithm based on Faster R-CNN performsobject (cervix) detection feature extraction (calculating the fea-tures of the object) and classification (predicting the case prob-ability score) We explored several different techniques asdescribed in the Supplementary Methods (available online) andFaster R-CNN algorithm provided the best combination of speedand accuracy (27) The inputs to train the model were the cervi-gram images cervix location (rectangle encompassing the

cervix) and the class ground truth (case of CIN2thorn or controlltCIN2)

To create the cervix locator function of the automated visualevaluation algorithm we first trained an independent cervix lo-cator algorithm using Faster R-CNN to separate the cervix fromthe background This involved annotating 2000 images (manu-ally drawing a rectangle around the cervix) using a softwaretool shown in Supplementary Figure 2 (available online) devel-oped for this task By incorporating the trained locator functionthe only input required for the automated visual evaluation al-gorithm was the cervigram image The final model providedboth the predicted cervix location and the case probability

CNN-based methods are data driven and require large quan-tities of training data to perform well however our datasetfrom a population cohort study was limited to the smaller num-ber of cases that actually occurred To compensate we per-formed transfer learning (28) by first initializing the CNNarchitecture with pretrained weights from a model trained withthe ImageNet dataset (29) a database of millions of (noncervi-gram) images of all kinds of natural objects We then retrainedwith the cervigram images in the training set We also aug-mented the image data (30) artificially increasing the numberof cervical images by performing minor distortions to the origi-nal images including rotation mirroring sheering and gammatransformation We used the faster R-CNN end-to-end trainingconfiguration of stochastic gradient descent optimization withthe parameters in Supplementary Table 2 (available online)

Women included in analysis (N=9406)

Total CIN2+ cases N=279 Total ltCIN2 controls N=9127

Training set cases randomly selected (N=189)

Training set controls randomly selected (N=555)

CIN2+ cases not in training (N=90)Validaon set does not include eight cancer upgrades aer random drawScreening set included only enrollment images five cases did not have enrollment images available

Controls not used due to lack of enrollment image (N=378)

Training set is one image per woman (N=744 189 cases and 555 controls)

Screening set is enrollment image for women (N=8259 85 cases and 8174

controls)

Validaon and screening set control pool (N= 8194)

Validaon set used last image during follow-up (N=242)

Screening set used enrollment image note 20 validaon set

controls did not have an enrollment image (N=8174)

Validaon Set is one image per woman (N=324 82 cases and 242 controls)

Enrolled in Guanacaste Costa Rica cohort (N=10080)

Excluded (N=674)No Image collected (N=630)Mulple colpo sessions (N=31)Inadequate histology (N=13)

Analysis sets

Figure 1 Cervical images used for training and validation The images were drawn from the Proyecto Epidemiologico Guanacaste a longitudinal cohort study of human

papillomavirus infection other screening tests and risk of cervical precancercancer (1993ndash2001) The training and initial validation made use of the last images taken

prior to case diagnosis and approximately 31 controls frequency matched to cases on time of study (enrollmentfollow-up) The main analysis focused on images (ex-

cluding all images from women in the training set) from cohort enrollment and examined how the automated image analysis of enrollment screening images per-

formed in prediction of cases found over the course of the entire cohort study In an analysis that counted the absolute numbers of cases detected and controls

referred it was necessary to reweight that is to multiply the findings in the randomly selected validation test by the inverse of the 30 sampling fraction to estimate

numbers for all cases and matched controls (see Methods) CIN frac14 cervical intraepithelial neoplasia

AR

TIC

LE

L Hu et al | 3

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

These techniques are further described in the SupplementaryMethods (available online)

National Library of Medicine (National Institutes of HealthUSA) collaborators independently checked the technique Thisseries of confirmatory experiments was not designed to standalone rather it was meant to increase credibility of the originalfindings by replication outside of the inventing group They firstfollowed the architecture of the proposed method to evaluatethe performance of the cervix region locator and of the visualevaluation algorithm independently evaluating different sub-sets of the Guanacaste cohort images for model training thanthe original They also tested the trained network with testimages that were altered to represent a different camera zoomand moderate image resizing

To visualize what the algorithm bases its prediction on wealso implemented a system that generates a heat map showingwhich regions of the image most strongly influence the predic-tion This system is described in Supplementary Methods (avail-able online)

Statistical Analyses

The automated visual evaluation detection algorithm yields ascore (ranging from 00 to 10) that predicts whether the imagerepresents a case of histologic CIN2thorn Examining first the repro-ducibility of the scores we compared 322 replicate pairs ofimages included for this purpose in the validation set thePearson correlation within pairs was 097 (95 confidence

interval [CI]frac14 097 to 098) the very high agreement justified us-ing only one image per pair for subsequent analyses

We conducted the analyses presented here using cohort en-rollment images from the 30 of women in the initial validationset plus enrollment images from the ldquoleftoverrdquo women in theGuanacaste cohort with ltCIN2 not chosen for either the train-ing or initial validation set (Figure 1) The main validation analy-sis focused on images taken at cohort enrollment visitsevaluated accuracy of the automated visual evaluation algo-rithm compared with the originally performed baseline screen-ing tests (cervicography cytology) and HPV testing introducedfor validation but not used as a screening test at that time

We evaluated the accuracy of the automated visual evalua-tion detection algorithm for identification of CIN2thorn cases in thevalidation set using Receiver Operating Characteristic (ROC)curves and its summary statistic area under the curve (AUC)We created ROC-like curves for the ordinal categories of theoriginal screening tests and a ROC curve of the continuous auto-mated visual evaluation empirically by the trapezoid methodThese AUC values were tested for statistical significance againstthe AUC for automated visual evaluation by two-sided chi-squared tests (31) For analyses requiring a categorical positivenegative result for automated visual evaluation we chose thecutpoint in the continuous score distribution specific to thatage group in age-stratified analyses that maximized Youdenrsquosindex (sensitivity thorn specificity ndash 1) (32)

In a separate analysis to estimate the impact of a singlescreening round for the full Guanacaste cohort we filled in forimages from women used in the training set and therefore

Figure 2 The system architecture of the automated visual evaluation algorithm Two models are trained a cervix locator (top) and the automated visual evaluation de-

tection algorithm (bottom) The final validation algorithm incorporated both cervix locator and automated visual evaluation

AR

TIC

LE

4 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

excluded from the validation of the algorithm To do so wereweighted specifically because the initial validation set was arandom sample of the training validation group we could mul-tiply the results for women in the validation set by the inverseof the sampling fraction (30) to represent all women used fortraining and validation We then added the remaining noncasesto reconstitute the full cohort

Cases that resulted in differences between automated visualevaluation algorithm results and original evaluator interpretationof the same cervigram images were rereviewed without maskingby an expert gynecologic oncologist and colposcopist (ME) to noteany subjective patterns that might explain discrepancies

Results

The cohort study population enrolled in 1993ndash1994 (Table 1) wasa random sample of adult women in Guanacaste Province rang-ing in age from 18 to 94 years (median 35 years) The provincewas mixed rural and small town Most women were married(761) and parity was high (923) Smoking was unusual (107were current smokers) but oral contraceptive use (present orpast) was common (777 had ever used oral contraceptives)The prevalence of carcinogenic HPV types was less than 20and declined sharply with age

There were 241 histologically confirmed cases of precancer(CIN2CIN3) and 38 cases of cancer observed among 9406women followed for up to 7 years in the population-basedGuanacaste cohort Glandular lesions were very uncommonand categorized with squamous lesions of equivalent severity(eg rare AIS was included with CIN3) The mean automated vi-sual evaluation algorithm result scores at enrollment wereequivalent for CIN2 vs CIN3 (070 and 069 Pfrac14 51) and statisti-cally nonsignificantly higher for the uncommon cancers (080Pfrac14 42) (Figure 3) In subsequent analyses we combined CIN2CIN3 and cancer in a single case group

The AUC for automated visual evaluation of the enrollmentcervical images for women of all ages was 091 (95 CIfrac14 089 to093) (Figure 4) Automated visual evaluation was statisticallysignificantly more accurate than cervigram result (AUCfrac14 06995 CIfrac14 063 to 074 Plt 001)

As shown in Figure 4 automated visual evaluation perfor-mance was also statistically significantly more accurate thanconventional Pap smears (AUCfrac14 071 95 CIfrac14 065 to 077Plt 001) liquid-based cytology (AUCfrac14 079 95 CIfrac14 073 to 084Plt 001) first-generation neural network-based cytology(AUCfrac14 070 95 CIfrac14 063 to 076 Plt 001) and HPV testing(AUCfrac14 082 95 CIfrac14 077 to 087 Plt 001)

We examined all 16 discrepant findings that were algorithmpositiveHPV negative The additional positives by automatedvisual evaluation tended to occur among younger women(medianfrac14 265 Pfrac14 06 by Wilcoxon test compared with the restof the cases in the cohort whose median age was 35 years)They were likely to be diagnosed with CIN2 (1216 [750] com-pared with 2061 [328] of other cases Pfrac14 008)

In anticipation of designing screening programs we consid-ered three distinct age ranges 18ndash24 years 25ndash49 years and 50thornyears with the intermediate age group of primary interest be-cause it coincides with the majority (130228) of CIN2-CIN3cases and higher sensitivity (127130 Plt 001 compared withyounger women) (Table 2) We estimated that a single auto-mated visual evaluation screening round targeting women atthe prime screening ages of 25ndash49 years could identify 557(127228) of precancers (CIN2CIN3AIS) diagnosed cumulatively

in the entire adult population To achieve this level of sensitiv-ity in the entire population by a single round of screeningwomen aged 25ndash49 years would require referring 110 (9828906) of the entire population (and 180 [9825460] of thoseaged 25ndash49 years) for treatment

Review of enrollment images from cases with discrepant hu-man vs automated visual evaluations (and a random tenth ofdiscrepant noncases) revealed that many of the automated vi-sual evaluation algorithm result-positivecervigram-negativecases of CIN2thorn (additional true positives) had suboptimalimages (eg poor focus or washed-out color during scanning offilm) or there were obstructing vaginal folds or blood No clearpatterns emerged from review of images with discrepant resultsamong women with ltCIN2

Table 1 Selected demographic features of the 9406 women in thisanalysis of the Guanacaste cohort and their enrollment screeningresults

Feature or test result No

Enrollment age y18ndash29 2598 (276)30ndash49 4357 (463)50ndash94 2451 (261)

Marital statusMarried 7157 (761)Separatedwidowed 508 (102)Single 1290 (137)

Age at first intercourse y16 3269 (348)17ndash18 2579 (274)19thorn 3550 (378)

Lifetime No of partners1 4934 (525)2ndash3 1947 (331)4thorn 1358 (144)

Lifetime No of pregnancies0 723 (77)1ndash2 2588 (275)3ndash4 2475 (263)5thorn 3620 (385)

Current smokerNo 8398 (893)Yes 1003 (107)

Ever use oral contraceptionYes 5677 (777)No 1634 (224)

Enrollment Pap resultNormal 7906 (841)ASC-US 4227 (84)LSIL 471 (50)High-grade 232 (25)

Enrollment cervicographyNegative 8176 (869)Atypical 868 (92)Low-grade 316 (34)High-gradecancer 46 (05)

Enrollment HPV resultNegative for carcinogenic types 6683 (711)Positive (not HPV16) 2314 (246)HPV16 393 (42)

Missing test results not shown but counted in totals such that percentages do

not sum to 100 ASC-US frac14 atypical squamous cells of undetermined signifi-

cance HPV frac14 human papillomavirus LSIL frac14 low-grade squamous intraepithelial

lesion

AR

TIC

LE

L Hu et al | 5

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

Some trends emerged based on analysis of the heat mapshighlighting regions of the image that most influence the algo-rithm prediction (Supplementary Figures 3ndash7 available online)The model usually based its prediction primarily on the regionsurrounding the os and performed best when the cervix wasoriented directly towards the camera It was more affected byregions with visible texture than by smooth regions In generalthe model was not excited by glare from the camerarsquos light orother distractors although in some cases the presence of cottonswabs or other noncervix objects did distract the model

Analysis of the long-term follow-up of the cohort includinglinkage to the Costa Rican Tumor Registry revealed 39 cancersMost were in the training set 17 were not and had enrollmentseverity scores (Table 3) Women with cancer missed by auto-mated visual evaluation tended to be diagnosed by the tumorregistry merge (mainly older women diagnosed after 7 years ofactive follow-up)

The automated visual evaluation detection algorithm couldtheoretically be used to triage women testing positive for HPVrather than for primary screening Training another automatedvisual evaluation model restricted to HPV-positive women (datanot shown) a sensitivity of 100 (1313) was achieved with aspecificity of 575 (103179) Therefore primary HPV testing us-ing self-sampling if it matched the HPV test performance of theearly assays we used in 1993ndash1994 followed by the automatedvisual evaluation algorithm restricted to positives could achievethe same aggregate sensitivity while more than halving thenumber of women requiring cervical examinations

Discussion

Using a deep learning approach called Faster R-CNN with exten-sive image augmentation based on a pretrained model wetrained and validated an image analyzer that performsldquoautomated visual evaluationrdquo of the cervix As a primaryscreening method the algorithm trained using digitized cervi-grams from the Guanacaste Cohort achieved excellent sensitiv-ity for detection of CIN2thorn in the age group at highest risk ofprecancers The performance surpassed colposcopist evaluatorinterpretations of the same images (cervicography) and com-pared favorably to conventional Pap smears (and alternativekinds of cytology) while matching the screening accuracy of anearly version of PCR-based HPV testing The additional positiveevaluations among women that were negative for HPV need fur-ther study before they can be believed logically it is hard to ac-cept and there was an indication that the cases were anunusually young group with incident CIN2 suggesting somepossibility of misclassification of both visual appearance andoutcome

Restricted to the age group at which risk of precancer peaksto achieve nearly perfect sensitivity for cases occurring up to 7years after examination generated a large number of false posi-tives among screened noncases More balanced cutpoints forpositivity might be chosen to limit excessive treatments al-though sensitivity would drop Another possibility for improv-ing specificity while retaining high sensitivity might becombination screening (commonly called ldquocotestingrdquo) with

recnaC3NIC2NIC

0

025

050

075

100

Cas

e pr

obab

ility

Figure 3 Comparison of the mean scores obtained by comparing enrollment images from women whose worst diagnosis within the cohort was cervical intraepithelial

neoplasia (CIN) 2 CIN3 or cancer These subsets of the cases were not statistically significantly different with regard to severity scores generated by the algorithm

Therefore we combined all into a single case group We show a box and whisker plot of 32 CIN2 38 CIN3 and seven cancers giving quartiles means and outliers of

case probability for each diagnosis within the cohort

AR

TIC

LE

6 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

automated visual evaluation and HPV testing when such testsbecome more affordable and more widely available than theycurrently are

To permit widespread use of automated visual evaluationdetection algorithms the method would need to be transferredfrom digitized cervigrams to contemporary digital cameras be-cause cervicography is obsolete and discontinued If the transferis achieved health workers would still need to visualize the cer-vix and take well-lit in-focus images for analysis with the

cervix in full view The minimal required equipment (in addi-tion to the algorithm software) would be acetic acid (vinegar)disposable specula (or sterilization equipment) and the imagingsystem such as a dedicated smart phone or digital camera

Images that resulted in discrepancies between the auto-mated visual evaluation algorithm and clinician interpretationrevealed that the performance was affected by image qualityand obstructions as well as traditional discordant evaluationsdue to human subjectivity So rather than focusing on training

Figure 4 Receiver operating characteristic (ROC) curve of automated visual evaluation of cervical images and comparison of performance in identification of cervical

intraepithelial neoplasia 2thorn ROC-like curves are shown for the categorical variables for simple visual and statistical comparison with automated visual evaluation

(two-sided chi-squared tests) The thresholds are listed on each curve showing the sensitivity and 1-specificity applicable to that cutpoint Automated visual evalua-

tion was as accurate or more than all of the screening tests used in the cohort study including A) automated visual evaluation B) cervicography area under the curve

(AUC) C) conventional Pap smear D) liquid-based cytology E) first-generation neural network-based cytology and F) MY09-MY11 PCR-based human papillomavirus

(HPV) testing ASC-US frac14 atypical squamous cells of undetermined significance HSIL frac14 high-grade squamous intraepithelial lesion LSIL frac14 low-grade squamous intrae-

pithelial lesion

AR

TIC

LE

L Hu et al | 7

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

Table 2 Estimated comparison of automated visual evaluation performance by age groups

Automated visual evaluation by age CIN2thorn No ltCIN2 No Total No Age-specific sensitivity Age-specific specificity

lt25 ythorn 46 226 272 821 772 10 765 775Age-specific total 56 991 1047

25ndash49 ythorn 127 855 982 977 840 3 4475 4478Age-specific total 130 5330 5460

50thorn ythorn 39 399 438 929 832 3 1969 1972Age-specific total 42 2368 2410

Total 228 8689 8917 mdash mdash

Table 3 Severity scores from automatic visual evaluation algorithm and screening results from enrollment visit of the Guanacaste cohortstudy for cases of invasive cancer

Years to diagnosis Enrollment age y HPV type result Pap smear result Cervigram result Algorithm severity score

Enrollment 21 16 52 Cancer Cancer TrainingEnrollment 26 16 18 51 Normal High-grade TrainingEnrollment 29 18 31 HSIL (CIN2) High-grade TrainingEnrollment 34 16 Microinvasive High-grade TrainingEnrollment 35 31 45 Microinvasive Cancer 098Enrollment 38 16 HSIL (CIN2) Cancer TrainingEnrollment 41 16 ASC-US Cancer 078Enrollment 47 18 Microinvasive Cancer TrainingEnrollment 54 16 Microinvasive Ccancer TrainingEnrollment 71 53 82v Microinvasive Negative 013Enrollment 73 33 Microinvasive Cancer 096Enrollment 74 35 HSIL (CIN3) Negative 035Enrollment 42 Equivocal Microinvasive Cancer 0841 y 37 18 Normal Negative Training1 y 61 Negative Normal Negative Training2 y 23 16 ASC-US Low-grade 0872 y 64 45 51 58 Normal Negative 0303 y 49 16 ASC-US Negative Training5 y 29 18 Normal Negative Training5 y 36 16 Normal Negative 0716 y 38 Negative HSIL (CIN2) Negative Training6 y 45 31 Normal Missing Missing6 y 48 Negative Normal Negative Training6 y 66 56 Microinvasive Negative 0387 y 50 Negative Normal Negative Training7 y 63 Negative Normal Negative Training8 y 35 Negative Normal Negative 0028 y 52 16 Normal Negative 0708 y 63 16 Normal Negative Training8 y 81 Negative Normal Negative 0349 y 28 18 Normal Negative Training10 y 65 16 HSIL (CIN3) Negative Training10 y 75 85 Microinvasive Negative Training11 y 75 Negative Normal Negative 01612 y 68 Negative Normal Negative 00213 y 56 Negative Normal Negative 07515 y 78 Missing Missing Atypical Training16 y 52 Negative Normal Negative Training17 y 21 Negative Inadequate Atypical 004

ASC-US frac14 atypical squamous cells of undetermined significance CIN2frac14 cervical intraepithelial neoplasia 2 CIN3frac14 cervical intraepithelial neoplasia 3 HSIL frac14 high-

grade squamous intraepithelial lesion

AR

TIC

LE

8 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

for the subtleties of visual appearance which is a difficult skillthe training for automated visual evaluation could highlightmore easily acquired skills of improving image quality andlighting and removing obstructions

We also anticipated a second approach to using the auto-mated visual evaluation detection algorithm specifically as atriage method restricted to women testing HPV positive if HPVtesting is the primary screen HPV testing offers the possibilityof cervicovaginal self-sampling (33) with only a small minorityof women testing HPV positive and requiring gynecologicexaminations Moreover HPV testing has proven very long-term negative predictive value that is reassurance when nega-tive (3435) The combination of self-sampled HPV screeningand triage use of automated visual evaluation detection couldgreatly reduce the need for speculum examinations Choice ofthe automated visual evaluation algorithm screening strategy(general screening vs triage of HPV positive women) will dependon setting and cost-effectiveness analyses Currently HPV testsstill take several hours and cost too much for many settings butlow-cost point-of-care and batch testing methods applicable toself-sampling are in advanced research and development andlikely will be available within a very few years

It is a strength of this study that it was conducted in a ran-dom population sample of a previously poorly screened popula-tion that resembles HPV-positive women in other low- andmiddle-income country (LMIC) target screening populationsDefinition of precancer was very good based on repeatedscreening during extended follow-up and panel review ofhistopathology

The major limitations are the following We studied a rela-tively small numbers of cases from a single cohort study Weincluded CIN2 cases in the case group whereas it would beideal to train on more definite cases of precancer (ie CIN3and AIS caused by types of HPV prevalent in invasive cervicalcancers) The images were captured by a small team of highlytrained nurses and not a wide variety of health workers Thework relied on images taken with a discontinued film cameratechnique rather than contemporary digital imagetechnology

Nonetheless the proof-of-principle strongly supports fur-ther evaluation of automated visual evaluation Rather thanresurrecting obsolete film camera technology to achieve the ob-served results we are currently working to transfer automatedvisual evaluation to images from contemporary phone camerasand other digital image capture devices to create an accurateand affordable point-of-care screening method that would sup-port the recently announced World Health Organization initia-tive to accelerate cervical cancer control

Our findings extend and improve upon the results obtainedin more preliminary reports listed in the Supplementary Table 1(available online) Internationally a sizable number of commer-cial and academic research groups are now exploring deeplearning algorithms for cervical screening In collaborative workmoving ahead we will be considering the following topicsassessing the international variability in cervical appearance(eg inflammatory changes) and need for region-specific train-ing examining which camera characteristics affect the algo-rithm and which devices are adequate for image capturetraining the camera to take an image automatically when a fo-cused adequately lit and visualized cervix is visible training analgorithm by segmentation studies focused on the squamoco-lumnar junction to assess when ablational therapy is possibleor whether (mainly endocervical) extent of lesion suggests needfor excision and evaluating how best to combine automated

visual evaluation with HPV testing and eventually with vacci-nation to accelerate control of cervical cancer

Funding

This work was supported by the National Institutes ofHealth NCI and National Library of Medicine IntramuralResearch Programs and the Global Good Fund

Notes

Affiliations of authors Intellectual Ventures Global Good FundBellevue WA (LH DB MPH NG BW MSJ) National Library ofMedicine NIH Bethesda MD (SA ZX LRL) Division of CancerEpidemiology and Genetics National Cancer Institute NIHBethesda MD (KY MD JCG NW MS) Information ManagementServices Calverton MD (BB) Early Detection and PreventionSection International Agency for Research on Cancer LyonFrance (RH) Rutgers New Jersey Medical School Newark NJ(MHE) Albert Einstein College of Medicine Bronx NY (RDB)Consultant to National Cancer Institute NIH Bethesda MD(ACR)

The funding sources had no role in the design of the studyinterpretation of the data the collection analysis and interpre-tation of the data or the writing of the manuscript

No author reports a conflict of interest with regard to thisresearch

We acknowledge the expert advice of Drs Emily Wu andCorey Casper who helped define the clinical outcomes in theinitial stage of the project

References1 de Martel C Plummer M Vignat J Franceschi S Worldwide burden of cancer

attributable to HPV by site country and HPV type Int J Cancer 2017141(4)664ndash670

2 Bray F Jemal A Grey N Ferlay J Forman D Global cancer transitions accord-ing to the Human Development Index (2008-2030) a population-based studyLancet Oncol 201213(8)790ndash801

3 Schiffman M Doorbar J Wentzensen N et al Carcinogenic human papillo-mavirus infection Nat Rev Dis Primers 20162(2)16086

4 Bagcchi S India launches plan for national cancer screening programmeBMJ 2016355i5574

5 WHO Guidelines Approved by the Guidelines Review Committee WHOGuidelines for Screening and Treatment of Precancerous Lesions for Cervical CancerPrevention Geneva World Health Organization 2013

6 PEPFAR 2019 Country Operational Plan Guidance for all PEPFAR Countries(Draft) Renewed Partnership to Help End AIDS and Cervical Cancer in Africahttpswwwpepfargovdocumentsorganization288160pdfpage=318

7 Shastri SS Mittra I Mishra GA et al Effect of VIA screening by primary healthworkers randomized controlled study in Mumbai India J Natl Cancer Inst2014106(3)dju009

8 Denny L Kuhn L De Souza M Pollack AE Dupree W Wright TC Jr Screen-and-treat approaches for cervical cancer prevention in low-resource settingsa randomized controlled trial JAMA 2005294(17)2173ndash2181

9 Sankaranarayanan R Nene BM Shastri SS et al HPV screening for cervicalcancer in rural India N Engl J Med 2009360(14)1385ndash1394

10 Catarino R Schafer S Vassilakos P Petignat P Arbyn M Accuracy of combi-nations of visual inspection using acetic acid or lugol iodine to detect cervicalprecancer a meta-analysis BJOG 2017125(5)545ndash553

11 Jeronimo J Schiffman M Colposcopy at a crossroads Am J Obstet Gynecol2006195(2)349ndash353

12 Perkins MJ RB Smith KM Gage JCet al ASCCP Colposcopy Standards Risk-Based Colposcopy Practice J Low Genit Tract Dis 2017 Oct21(4)230ndash234 doi101097LGT0000000000000334 PubMed PMID 28953111

13 Ting DSW Cheung CY Lim G et al Development and validation of a deeplearning system for diabetic retinopathy and related eye diseases using reti-nal images from multiethnic populations with diabetes JAMA 2017318(22)2211ndash2223

14 Xu T Zhang H Xin C et al Multi-feature based benchmark for cervical dys-plasia classification evaluation Pattern Recogn 201763468ndash475

AR

TIC

LE

L Hu et al | 9

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

15 Song D Kim E Huang X et al Multimodal entity coreference for cervical dys-plasia diagnosis IEEE Trans Med Imaging 201534(1)229ndash245

16 Ren SQ He KM Girshick R Sun J Faster R-CNN towards real-time object de-tection with region proposal networks IEEE Trans Pattern Anal Mach Intell201739(6)1137ndash1149

17 Bratti MC Rodriguez AC Schiffman M et al Description of a seven-year pro-spective study of human papillomavirus infection and cervical neoplasiaamong 10000 women in Guanacaste Costa Rica Rev Panam Salud Publica200415(2)75ndash89

18 Herrero R Schiffman MH Bratti C et al Design and methods of apopulation-based natural history study of cervical neoplasia in a rural prov-ince of Costa Rica the Guanacaste Project Rev Panam Salud Publica 19971(5)362ndash375

19 Rodriguez AC Avila C Herrero R et al Cervical cancer incidence after screen-ing with HPV cytology and visual methods 18-year follow-up of theGuanacaste cohort Int J Cancer 2017140(8)1926ndash1934

20 Schneider DL Herrero R Bratti C et al Cervicography screening for cervicalcancer among 8460 women in a high-risk population Am J Obstet Gynecol1999180(2)290ndash298

21 Schneider DL Burke L Wright TC et al Can cervicography be improved Anevaluation with arbitrated cervicography interpretations Am J Obstet Gynecol2002187(1)15ndash23

22 Jeronimo J Long LR Neve L Michael B Antani S Schiffman M Digital toolsfor collecting data from cervigrams for research and training in colposcopy JLow Genit Tract Dis 200610(1)16ndash25

23 Hutchinson ML Zahniser DJ Sherman ME et al Utility of liquid-based cytol-ogy for cervical carcinoma screening results of a population-based studyconducted in a region of Costa Rica with a high incidence of cervical carci-noma Cancer 199987(2)48ndash55

24 Sherman ME Schiffman M Herrero R et al Performance of a semiautomatedPapanicolaou smear screening system results of a population-based studyconducted in Guanacaste Costa Rica Cancer 199884(5)273ndash280

25 Castle PE Schiffman M Gravitt PE et al Comparisons of HPV DNA detectionby MY0911 PCR methods J Med Virol 200268(3)417ndash423

26 Bouvard V Baan R Straif K et al A review of human carcinogensmdashPart B bio-logical agents Lancet Oncol 200910(4)321ndash322

27 He H Garcia EA Learning from imbalanced data IEEE Trans Knowl Data Eng200921(9)1263ndash1284

28 Yosinski J Clune J Bengio Y Lipson H How transferable are features indeep neural networks In Proceedings of the 27th InternationalConference on Neural Information Processing Systems Volume 2December 8ndash13 2014 Montreal Canada

29 Russakovsky O Deng J Su H et al ImageNet large scale visual recognitionchallenge Int J Comput Vis 2015115(3)211ndash252

30 Wong SC Gatt A Stamatescu V McDonnell MD Understanding data aug-mentation for classification when to warp In 2016 InternationalConference on Digital Image Computing Techniques and Applications(Dicta) November 30ndashDecember 2 201659ndash64 Gold Coast Australia

31 DeLong ER DeLong DM Clarke-Pearson DL Comparing the areas under twoor more correlated receiver operating characteristic curves a nonparametricapproach Biometrics 198844(3)837ndash845

32 Youden WJ Index for rating diagnostic tests Cancer 19503(1)32ndash3533 Verdoodt F Jentschke M Hillemanns P Racey CS Snijders PJ Arbyn M

Reaching women who do not participate in the regular cervical cancerscreening programme by offering self-sampling kits a systematic review andmeta-analysis of randomised trials Eur J Cancer (Oxf Engl 1990) 201551(16)2375ndash2385

34 Schiffman M Glass AG Wentzensen N et al A long-term prospective studyof type-specific human papillomavirus infection and risk of cervical neopla-sia among 20000 women in the Portland Kaiser Cohort Study CancerEpidemiol Biomarkers Prev 201120(7)1398ndash1409

35 Chen HC Schiffman M Lin CY et al Persistence of type-specific human pap-illomavirus infection and increased long-term risk of cervical cancer J NatlCancer Inst 2011103(18)1387ndash1396

AR

TIC

LE

10 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

  • djy225-TF1
  • djy225-TF2
Page 2: An Observational Study of Deep Learning and Automated ...The system architecture is summarized in Figure 2 and de-tailed in the Supplementary Methods (available online). The au-tomated

screening programs in high-resource settings including cervicalcytology (Pap tests) and colposcopy require infrastructure andextensively trained personnel that are lacking in most lowerresource settings Newer powerful cervical cancer preventiontools like prophylactic HPV vaccination and screening with sen-sitive HPV tests could be very useful in low-resource settingsHowever again dissemination has been severely limited bylack of resources and organization

In search of programmatic simplicity and sustainable costsauthorities including the World Health Organization the USPresidentrsquos Emergency Plan for AIDS Relief and the Indian gov-ernment have endorsed screening of the cervix by visual in-spection after application (VIA) of acetic acid to highlightprecancerous or cancerous abnormalities (4ndash6) when more ad-vanced methods are not feasible The health worker performingVIA rates the cervical appearance as normal or abnormal withparticular attention to possible invasive cancer If the appear-ance is abnormal and if the size and position of the visible le-sion permit the cervical epithelium is destroyed by freezing orheating (ldquosee and treatrdquo) Excision is sometimes required for se-vere lesions As a screening test VIA is simple and inexpensiveand has been shown to find some invasive cancers while theyare still local and curable by surgery (7) However the main goalof screening is to prevent cancer by detection and treatment ofprecancers and VIA is not accurate in distinguishing precancerfrom much more common minor abnormalities leading to bothovertreatment and undertreatment (8ndash10) Increasingly it is rec-ognized that the visual identification of precancer by healthworkers even by experienced nurses and doctors using a colpo-scope the reference standard visual tool is too often unreliableand inaccurate (1112) Thus we currently still lack a practicalaccurate visual screening approach

In other similarly subjective medical diagnostic situationsnew methods of pattern recognition by computer (referred to asdeep or machine learning) have proven useful (13) Machinelearning-based approaches to cervical cancer screening haveyielded promising early results but have lacked a good measure-ment of precancer sufficient sample size or prospective follow-up (Supplementary Table 1 available online) (1415)

We applied a deep learning-based object detection method[Faster R-CNN or faster region-based convolutional neuralnetwork (16)] algorithm to cervical images taken during aNational Cancer Institute (NCI) prospective epidemiologic studywith long follow-up and rigorously defined precancer end-points to develop a detection algorithm that can identify cervi-cal precancer Here we demonstrate the proof-of-principle ofan ldquoautomated visual evaluationrdquo algorithm applied to archiveddigitized cervical images

Materials and Methods

Study Population

Conducted from 1993 to 2001 the NCI-funded ProyectoEpidemiologico Guanacaste accumulated approximately 30 000screening visits of more than 9000 participants (933 accep-tance) (Figure 1) This longitudinal cohort study of HPV and cer-vical cancer was conducted in a random sample of a moderate-risk poorly screened Costa Rican province (1718) Participantsaged 18ndash94 years were screened at baseline and periodically forup to 7 years using multiple methods at intervals determined bytheir screening results and resultant estimated risk of precancer(mean number of visitsfrac14 34 SDfrac14 25 approximately 90 of

women with some follow-up) (1718) In a subsequent linkagestudy the cancer registry was used to extend follow-up for inva-sive cancer up to 18 years (1819)

Cervicography

Each screening visit included multiple kinds of tests cytologyHPV testing and cervicography (18) Cervicography now discon-tinued was a visual screening method based on the interpreta-tion of a pair of cervical photographs (called cervigrams) (2021)Two sequential duplicate cervigrams of the cervix were takenat each screening visit after acetic acid application using afixed-focus ring-lit film camera called a cerviscope The rolls offilm were mailed regularly and developed into projection slidesin the United States by National Testing Laboratory Worldwide(Fenton MO) Each cervigram image was originally projected ona wall screen for magnification and classified by one of twohighly experienced National Testing Laboratory Worldwidephysician colposcopist evaluators as normal atypical positivefor minor low-grade HPV-induced changes or positive for pre-cancer or cancer (the last two combined here) After completionof the field effort the photographic images were digitized andthe files compressed for storage (1722)

Other Screening Tests in the Guanacaste Cohort

Cytology methods included conventional Pap smears a proto-type liquid-based method (23) and a first-generation automatedapproach incorporating an early version of a neural network-based method (24) The conventional Pap smear performed inCosta Rica best represents the kind typically performed inlower- and middle-resource regions HPV testing was performedby MY09-MY11 consensus primer PCR (25) but was not used forcolposcopy referral in this early epidemiologic demonstration ofHPV predictive value We defined positivity as detection of oneof 13 high-risk HPV per the International Agency For ResearchOn Cancer classification (26)

CIN21 Cases and Controls

Cases included women who were diagnosed with histologicCIN2 or worse (CIN2thorn) during enrollment or follow-up Womenfound at a cohort screening examination to have abnormal cy-tology or cervicography screening results were referred to col-poscopy performed by a single study gynecologist who biopsiedthe worst-appearing lesion Histologic CIN2 or worse (CIN2thorn)was treated by large loop excision of the transformation zoneall women with definite or even possible CIN2thorn (eg high-gradesquamous intraepithelial lesion cytology) were referred formanagement and censored from further study The histopatho-logic reference diagnosis of CIN2thorn was determined by majorityreview of biopsies and excision specimens by a panel of oneCosta Rica pathologist and one US consultant pathologist withdisagreements leading to another US pathologistrsquos review

Collection and use of the visual images were approved bythe Costa Rican and NCI ethical committees The images werecollected originally under written informed consent that cov-ered subsequent research use The specific use for machinelearning-based algorithms on study images was also approvedby the NCI Institutional Review Board

AR

TIC

LE

2 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

Automated Visual Evaluation Algorithm to EvaluateCervical Images

Three controls per case were chosen from among women whodid not present with CIN2thorn during active surveillance fre-quency matched to time of diagnosis of case (enrollmentfol-low-up) A random approximately 70 (197279) of cases andfrequency-matched controls were chosen for training and theremaining approximately 30 were reserved as the initial vali-dation test set We randomly chose a single image from eachpair of images available per prediagnostic case visit and controlvisit Of note after the initial validation (SupplementaryMethods available online) the main analysis focused onimages taken at enrollment visits

The system architecture is summarized in Figure 2 and de-tailed in the Supplementary Methods (available online) The au-tomated visual evaluation detection algorithm performs twomain functions detecting (locating) the cervix within the inputimage and predicting the probability that the image representsa case of CIN2thorn The algorithm based on Faster R-CNN performsobject (cervix) detection feature extraction (calculating the fea-tures of the object) and classification (predicting the case prob-ability score) We explored several different techniques asdescribed in the Supplementary Methods (available online) andFaster R-CNN algorithm provided the best combination of speedand accuracy (27) The inputs to train the model were the cervi-gram images cervix location (rectangle encompassing the

cervix) and the class ground truth (case of CIN2thorn or controlltCIN2)

To create the cervix locator function of the automated visualevaluation algorithm we first trained an independent cervix lo-cator algorithm using Faster R-CNN to separate the cervix fromthe background This involved annotating 2000 images (manu-ally drawing a rectangle around the cervix) using a softwaretool shown in Supplementary Figure 2 (available online) devel-oped for this task By incorporating the trained locator functionthe only input required for the automated visual evaluation al-gorithm was the cervigram image The final model providedboth the predicted cervix location and the case probability

CNN-based methods are data driven and require large quan-tities of training data to perform well however our datasetfrom a population cohort study was limited to the smaller num-ber of cases that actually occurred To compensate we per-formed transfer learning (28) by first initializing the CNNarchitecture with pretrained weights from a model trained withthe ImageNet dataset (29) a database of millions of (noncervi-gram) images of all kinds of natural objects We then retrainedwith the cervigram images in the training set We also aug-mented the image data (30) artificially increasing the numberof cervical images by performing minor distortions to the origi-nal images including rotation mirroring sheering and gammatransformation We used the faster R-CNN end-to-end trainingconfiguration of stochastic gradient descent optimization withthe parameters in Supplementary Table 2 (available online)

Women included in analysis (N=9406)

Total CIN2+ cases N=279 Total ltCIN2 controls N=9127

Training set cases randomly selected (N=189)

Training set controls randomly selected (N=555)

CIN2+ cases not in training (N=90)Validaon set does not include eight cancer upgrades aer random drawScreening set included only enrollment images five cases did not have enrollment images available

Controls not used due to lack of enrollment image (N=378)

Training set is one image per woman (N=744 189 cases and 555 controls)

Screening set is enrollment image for women (N=8259 85 cases and 8174

controls)

Validaon and screening set control pool (N= 8194)

Validaon set used last image during follow-up (N=242)

Screening set used enrollment image note 20 validaon set

controls did not have an enrollment image (N=8174)

Validaon Set is one image per woman (N=324 82 cases and 242 controls)

Enrolled in Guanacaste Costa Rica cohort (N=10080)

Excluded (N=674)No Image collected (N=630)Mulple colpo sessions (N=31)Inadequate histology (N=13)

Analysis sets

Figure 1 Cervical images used for training and validation The images were drawn from the Proyecto Epidemiologico Guanacaste a longitudinal cohort study of human

papillomavirus infection other screening tests and risk of cervical precancercancer (1993ndash2001) The training and initial validation made use of the last images taken

prior to case diagnosis and approximately 31 controls frequency matched to cases on time of study (enrollmentfollow-up) The main analysis focused on images (ex-

cluding all images from women in the training set) from cohort enrollment and examined how the automated image analysis of enrollment screening images per-

formed in prediction of cases found over the course of the entire cohort study In an analysis that counted the absolute numbers of cases detected and controls

referred it was necessary to reweight that is to multiply the findings in the randomly selected validation test by the inverse of the 30 sampling fraction to estimate

numbers for all cases and matched controls (see Methods) CIN frac14 cervical intraepithelial neoplasia

AR

TIC

LE

L Hu et al | 3

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

These techniques are further described in the SupplementaryMethods (available online)

National Library of Medicine (National Institutes of HealthUSA) collaborators independently checked the technique Thisseries of confirmatory experiments was not designed to standalone rather it was meant to increase credibility of the originalfindings by replication outside of the inventing group They firstfollowed the architecture of the proposed method to evaluatethe performance of the cervix region locator and of the visualevaluation algorithm independently evaluating different sub-sets of the Guanacaste cohort images for model training thanthe original They also tested the trained network with testimages that were altered to represent a different camera zoomand moderate image resizing

To visualize what the algorithm bases its prediction on wealso implemented a system that generates a heat map showingwhich regions of the image most strongly influence the predic-tion This system is described in Supplementary Methods (avail-able online)

Statistical Analyses

The automated visual evaluation detection algorithm yields ascore (ranging from 00 to 10) that predicts whether the imagerepresents a case of histologic CIN2thorn Examining first the repro-ducibility of the scores we compared 322 replicate pairs ofimages included for this purpose in the validation set thePearson correlation within pairs was 097 (95 confidence

interval [CI]frac14 097 to 098) the very high agreement justified us-ing only one image per pair for subsequent analyses

We conducted the analyses presented here using cohort en-rollment images from the 30 of women in the initial validationset plus enrollment images from the ldquoleftoverrdquo women in theGuanacaste cohort with ltCIN2 not chosen for either the train-ing or initial validation set (Figure 1) The main validation analy-sis focused on images taken at cohort enrollment visitsevaluated accuracy of the automated visual evaluation algo-rithm compared with the originally performed baseline screen-ing tests (cervicography cytology) and HPV testing introducedfor validation but not used as a screening test at that time

We evaluated the accuracy of the automated visual evalua-tion detection algorithm for identification of CIN2thorn cases in thevalidation set using Receiver Operating Characteristic (ROC)curves and its summary statistic area under the curve (AUC)We created ROC-like curves for the ordinal categories of theoriginal screening tests and a ROC curve of the continuous auto-mated visual evaluation empirically by the trapezoid methodThese AUC values were tested for statistical significance againstthe AUC for automated visual evaluation by two-sided chi-squared tests (31) For analyses requiring a categorical positivenegative result for automated visual evaluation we chose thecutpoint in the continuous score distribution specific to thatage group in age-stratified analyses that maximized Youdenrsquosindex (sensitivity thorn specificity ndash 1) (32)

In a separate analysis to estimate the impact of a singlescreening round for the full Guanacaste cohort we filled in forimages from women used in the training set and therefore

Figure 2 The system architecture of the automated visual evaluation algorithm Two models are trained a cervix locator (top) and the automated visual evaluation de-

tection algorithm (bottom) The final validation algorithm incorporated both cervix locator and automated visual evaluation

AR

TIC

LE

4 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

excluded from the validation of the algorithm To do so wereweighted specifically because the initial validation set was arandom sample of the training validation group we could mul-tiply the results for women in the validation set by the inverseof the sampling fraction (30) to represent all women used fortraining and validation We then added the remaining noncasesto reconstitute the full cohort

Cases that resulted in differences between automated visualevaluation algorithm results and original evaluator interpretationof the same cervigram images were rereviewed without maskingby an expert gynecologic oncologist and colposcopist (ME) to noteany subjective patterns that might explain discrepancies

Results

The cohort study population enrolled in 1993ndash1994 (Table 1) wasa random sample of adult women in Guanacaste Province rang-ing in age from 18 to 94 years (median 35 years) The provincewas mixed rural and small town Most women were married(761) and parity was high (923) Smoking was unusual (107were current smokers) but oral contraceptive use (present orpast) was common (777 had ever used oral contraceptives)The prevalence of carcinogenic HPV types was less than 20and declined sharply with age

There were 241 histologically confirmed cases of precancer(CIN2CIN3) and 38 cases of cancer observed among 9406women followed for up to 7 years in the population-basedGuanacaste cohort Glandular lesions were very uncommonand categorized with squamous lesions of equivalent severity(eg rare AIS was included with CIN3) The mean automated vi-sual evaluation algorithm result scores at enrollment wereequivalent for CIN2 vs CIN3 (070 and 069 Pfrac14 51) and statisti-cally nonsignificantly higher for the uncommon cancers (080Pfrac14 42) (Figure 3) In subsequent analyses we combined CIN2CIN3 and cancer in a single case group

The AUC for automated visual evaluation of the enrollmentcervical images for women of all ages was 091 (95 CIfrac14 089 to093) (Figure 4) Automated visual evaluation was statisticallysignificantly more accurate than cervigram result (AUCfrac14 06995 CIfrac14 063 to 074 Plt 001)

As shown in Figure 4 automated visual evaluation perfor-mance was also statistically significantly more accurate thanconventional Pap smears (AUCfrac14 071 95 CIfrac14 065 to 077Plt 001) liquid-based cytology (AUCfrac14 079 95 CIfrac14 073 to 084Plt 001) first-generation neural network-based cytology(AUCfrac14 070 95 CIfrac14 063 to 076 Plt 001) and HPV testing(AUCfrac14 082 95 CIfrac14 077 to 087 Plt 001)

We examined all 16 discrepant findings that were algorithmpositiveHPV negative The additional positives by automatedvisual evaluation tended to occur among younger women(medianfrac14 265 Pfrac14 06 by Wilcoxon test compared with the restof the cases in the cohort whose median age was 35 years)They were likely to be diagnosed with CIN2 (1216 [750] com-pared with 2061 [328] of other cases Pfrac14 008)

In anticipation of designing screening programs we consid-ered three distinct age ranges 18ndash24 years 25ndash49 years and 50thornyears with the intermediate age group of primary interest be-cause it coincides with the majority (130228) of CIN2-CIN3cases and higher sensitivity (127130 Plt 001 compared withyounger women) (Table 2) We estimated that a single auto-mated visual evaluation screening round targeting women atthe prime screening ages of 25ndash49 years could identify 557(127228) of precancers (CIN2CIN3AIS) diagnosed cumulatively

in the entire adult population To achieve this level of sensitiv-ity in the entire population by a single round of screeningwomen aged 25ndash49 years would require referring 110 (9828906) of the entire population (and 180 [9825460] of thoseaged 25ndash49 years) for treatment

Review of enrollment images from cases with discrepant hu-man vs automated visual evaluations (and a random tenth ofdiscrepant noncases) revealed that many of the automated vi-sual evaluation algorithm result-positivecervigram-negativecases of CIN2thorn (additional true positives) had suboptimalimages (eg poor focus or washed-out color during scanning offilm) or there were obstructing vaginal folds or blood No clearpatterns emerged from review of images with discrepant resultsamong women with ltCIN2

Table 1 Selected demographic features of the 9406 women in thisanalysis of the Guanacaste cohort and their enrollment screeningresults

Feature or test result No

Enrollment age y18ndash29 2598 (276)30ndash49 4357 (463)50ndash94 2451 (261)

Marital statusMarried 7157 (761)Separatedwidowed 508 (102)Single 1290 (137)

Age at first intercourse y16 3269 (348)17ndash18 2579 (274)19thorn 3550 (378)

Lifetime No of partners1 4934 (525)2ndash3 1947 (331)4thorn 1358 (144)

Lifetime No of pregnancies0 723 (77)1ndash2 2588 (275)3ndash4 2475 (263)5thorn 3620 (385)

Current smokerNo 8398 (893)Yes 1003 (107)

Ever use oral contraceptionYes 5677 (777)No 1634 (224)

Enrollment Pap resultNormal 7906 (841)ASC-US 4227 (84)LSIL 471 (50)High-grade 232 (25)

Enrollment cervicographyNegative 8176 (869)Atypical 868 (92)Low-grade 316 (34)High-gradecancer 46 (05)

Enrollment HPV resultNegative for carcinogenic types 6683 (711)Positive (not HPV16) 2314 (246)HPV16 393 (42)

Missing test results not shown but counted in totals such that percentages do

not sum to 100 ASC-US frac14 atypical squamous cells of undetermined signifi-

cance HPV frac14 human papillomavirus LSIL frac14 low-grade squamous intraepithelial

lesion

AR

TIC

LE

L Hu et al | 5

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

Some trends emerged based on analysis of the heat mapshighlighting regions of the image that most influence the algo-rithm prediction (Supplementary Figures 3ndash7 available online)The model usually based its prediction primarily on the regionsurrounding the os and performed best when the cervix wasoriented directly towards the camera It was more affected byregions with visible texture than by smooth regions In generalthe model was not excited by glare from the camerarsquos light orother distractors although in some cases the presence of cottonswabs or other noncervix objects did distract the model

Analysis of the long-term follow-up of the cohort includinglinkage to the Costa Rican Tumor Registry revealed 39 cancersMost were in the training set 17 were not and had enrollmentseverity scores (Table 3) Women with cancer missed by auto-mated visual evaluation tended to be diagnosed by the tumorregistry merge (mainly older women diagnosed after 7 years ofactive follow-up)

The automated visual evaluation detection algorithm couldtheoretically be used to triage women testing positive for HPVrather than for primary screening Training another automatedvisual evaluation model restricted to HPV-positive women (datanot shown) a sensitivity of 100 (1313) was achieved with aspecificity of 575 (103179) Therefore primary HPV testing us-ing self-sampling if it matched the HPV test performance of theearly assays we used in 1993ndash1994 followed by the automatedvisual evaluation algorithm restricted to positives could achievethe same aggregate sensitivity while more than halving thenumber of women requiring cervical examinations

Discussion

Using a deep learning approach called Faster R-CNN with exten-sive image augmentation based on a pretrained model wetrained and validated an image analyzer that performsldquoautomated visual evaluationrdquo of the cervix As a primaryscreening method the algorithm trained using digitized cervi-grams from the Guanacaste Cohort achieved excellent sensitiv-ity for detection of CIN2thorn in the age group at highest risk ofprecancers The performance surpassed colposcopist evaluatorinterpretations of the same images (cervicography) and com-pared favorably to conventional Pap smears (and alternativekinds of cytology) while matching the screening accuracy of anearly version of PCR-based HPV testing The additional positiveevaluations among women that were negative for HPV need fur-ther study before they can be believed logically it is hard to ac-cept and there was an indication that the cases were anunusually young group with incident CIN2 suggesting somepossibility of misclassification of both visual appearance andoutcome

Restricted to the age group at which risk of precancer peaksto achieve nearly perfect sensitivity for cases occurring up to 7years after examination generated a large number of false posi-tives among screened noncases More balanced cutpoints forpositivity might be chosen to limit excessive treatments al-though sensitivity would drop Another possibility for improv-ing specificity while retaining high sensitivity might becombination screening (commonly called ldquocotestingrdquo) with

recnaC3NIC2NIC

0

025

050

075

100

Cas

e pr

obab

ility

Figure 3 Comparison of the mean scores obtained by comparing enrollment images from women whose worst diagnosis within the cohort was cervical intraepithelial

neoplasia (CIN) 2 CIN3 or cancer These subsets of the cases were not statistically significantly different with regard to severity scores generated by the algorithm

Therefore we combined all into a single case group We show a box and whisker plot of 32 CIN2 38 CIN3 and seven cancers giving quartiles means and outliers of

case probability for each diagnosis within the cohort

AR

TIC

LE

6 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

automated visual evaluation and HPV testing when such testsbecome more affordable and more widely available than theycurrently are

To permit widespread use of automated visual evaluationdetection algorithms the method would need to be transferredfrom digitized cervigrams to contemporary digital cameras be-cause cervicography is obsolete and discontinued If the transferis achieved health workers would still need to visualize the cer-vix and take well-lit in-focus images for analysis with the

cervix in full view The minimal required equipment (in addi-tion to the algorithm software) would be acetic acid (vinegar)disposable specula (or sterilization equipment) and the imagingsystem such as a dedicated smart phone or digital camera

Images that resulted in discrepancies between the auto-mated visual evaluation algorithm and clinician interpretationrevealed that the performance was affected by image qualityand obstructions as well as traditional discordant evaluationsdue to human subjectivity So rather than focusing on training

Figure 4 Receiver operating characteristic (ROC) curve of automated visual evaluation of cervical images and comparison of performance in identification of cervical

intraepithelial neoplasia 2thorn ROC-like curves are shown for the categorical variables for simple visual and statistical comparison with automated visual evaluation

(two-sided chi-squared tests) The thresholds are listed on each curve showing the sensitivity and 1-specificity applicable to that cutpoint Automated visual evalua-

tion was as accurate or more than all of the screening tests used in the cohort study including A) automated visual evaluation B) cervicography area under the curve

(AUC) C) conventional Pap smear D) liquid-based cytology E) first-generation neural network-based cytology and F) MY09-MY11 PCR-based human papillomavirus

(HPV) testing ASC-US frac14 atypical squamous cells of undetermined significance HSIL frac14 high-grade squamous intraepithelial lesion LSIL frac14 low-grade squamous intrae-

pithelial lesion

AR

TIC

LE

L Hu et al | 7

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

Table 2 Estimated comparison of automated visual evaluation performance by age groups

Automated visual evaluation by age CIN2thorn No ltCIN2 No Total No Age-specific sensitivity Age-specific specificity

lt25 ythorn 46 226 272 821 772 10 765 775Age-specific total 56 991 1047

25ndash49 ythorn 127 855 982 977 840 3 4475 4478Age-specific total 130 5330 5460

50thorn ythorn 39 399 438 929 832 3 1969 1972Age-specific total 42 2368 2410

Total 228 8689 8917 mdash mdash

Table 3 Severity scores from automatic visual evaluation algorithm and screening results from enrollment visit of the Guanacaste cohortstudy for cases of invasive cancer

Years to diagnosis Enrollment age y HPV type result Pap smear result Cervigram result Algorithm severity score

Enrollment 21 16 52 Cancer Cancer TrainingEnrollment 26 16 18 51 Normal High-grade TrainingEnrollment 29 18 31 HSIL (CIN2) High-grade TrainingEnrollment 34 16 Microinvasive High-grade TrainingEnrollment 35 31 45 Microinvasive Cancer 098Enrollment 38 16 HSIL (CIN2) Cancer TrainingEnrollment 41 16 ASC-US Cancer 078Enrollment 47 18 Microinvasive Cancer TrainingEnrollment 54 16 Microinvasive Ccancer TrainingEnrollment 71 53 82v Microinvasive Negative 013Enrollment 73 33 Microinvasive Cancer 096Enrollment 74 35 HSIL (CIN3) Negative 035Enrollment 42 Equivocal Microinvasive Cancer 0841 y 37 18 Normal Negative Training1 y 61 Negative Normal Negative Training2 y 23 16 ASC-US Low-grade 0872 y 64 45 51 58 Normal Negative 0303 y 49 16 ASC-US Negative Training5 y 29 18 Normal Negative Training5 y 36 16 Normal Negative 0716 y 38 Negative HSIL (CIN2) Negative Training6 y 45 31 Normal Missing Missing6 y 48 Negative Normal Negative Training6 y 66 56 Microinvasive Negative 0387 y 50 Negative Normal Negative Training7 y 63 Negative Normal Negative Training8 y 35 Negative Normal Negative 0028 y 52 16 Normal Negative 0708 y 63 16 Normal Negative Training8 y 81 Negative Normal Negative 0349 y 28 18 Normal Negative Training10 y 65 16 HSIL (CIN3) Negative Training10 y 75 85 Microinvasive Negative Training11 y 75 Negative Normal Negative 01612 y 68 Negative Normal Negative 00213 y 56 Negative Normal Negative 07515 y 78 Missing Missing Atypical Training16 y 52 Negative Normal Negative Training17 y 21 Negative Inadequate Atypical 004

ASC-US frac14 atypical squamous cells of undetermined significance CIN2frac14 cervical intraepithelial neoplasia 2 CIN3frac14 cervical intraepithelial neoplasia 3 HSIL frac14 high-

grade squamous intraepithelial lesion

AR

TIC

LE

8 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

for the subtleties of visual appearance which is a difficult skillthe training for automated visual evaluation could highlightmore easily acquired skills of improving image quality andlighting and removing obstructions

We also anticipated a second approach to using the auto-mated visual evaluation detection algorithm specifically as atriage method restricted to women testing HPV positive if HPVtesting is the primary screen HPV testing offers the possibilityof cervicovaginal self-sampling (33) with only a small minorityof women testing HPV positive and requiring gynecologicexaminations Moreover HPV testing has proven very long-term negative predictive value that is reassurance when nega-tive (3435) The combination of self-sampled HPV screeningand triage use of automated visual evaluation detection couldgreatly reduce the need for speculum examinations Choice ofthe automated visual evaluation algorithm screening strategy(general screening vs triage of HPV positive women) will dependon setting and cost-effectiveness analyses Currently HPV testsstill take several hours and cost too much for many settings butlow-cost point-of-care and batch testing methods applicable toself-sampling are in advanced research and development andlikely will be available within a very few years

It is a strength of this study that it was conducted in a ran-dom population sample of a previously poorly screened popula-tion that resembles HPV-positive women in other low- andmiddle-income country (LMIC) target screening populationsDefinition of precancer was very good based on repeatedscreening during extended follow-up and panel review ofhistopathology

The major limitations are the following We studied a rela-tively small numbers of cases from a single cohort study Weincluded CIN2 cases in the case group whereas it would beideal to train on more definite cases of precancer (ie CIN3and AIS caused by types of HPV prevalent in invasive cervicalcancers) The images were captured by a small team of highlytrained nurses and not a wide variety of health workers Thework relied on images taken with a discontinued film cameratechnique rather than contemporary digital imagetechnology

Nonetheless the proof-of-principle strongly supports fur-ther evaluation of automated visual evaluation Rather thanresurrecting obsolete film camera technology to achieve the ob-served results we are currently working to transfer automatedvisual evaluation to images from contemporary phone camerasand other digital image capture devices to create an accurateand affordable point-of-care screening method that would sup-port the recently announced World Health Organization initia-tive to accelerate cervical cancer control

Our findings extend and improve upon the results obtainedin more preliminary reports listed in the Supplementary Table 1(available online) Internationally a sizable number of commer-cial and academic research groups are now exploring deeplearning algorithms for cervical screening In collaborative workmoving ahead we will be considering the following topicsassessing the international variability in cervical appearance(eg inflammatory changes) and need for region-specific train-ing examining which camera characteristics affect the algo-rithm and which devices are adequate for image capturetraining the camera to take an image automatically when a fo-cused adequately lit and visualized cervix is visible training analgorithm by segmentation studies focused on the squamoco-lumnar junction to assess when ablational therapy is possibleor whether (mainly endocervical) extent of lesion suggests needfor excision and evaluating how best to combine automated

visual evaluation with HPV testing and eventually with vacci-nation to accelerate control of cervical cancer

Funding

This work was supported by the National Institutes ofHealth NCI and National Library of Medicine IntramuralResearch Programs and the Global Good Fund

Notes

Affiliations of authors Intellectual Ventures Global Good FundBellevue WA (LH DB MPH NG BW MSJ) National Library ofMedicine NIH Bethesda MD (SA ZX LRL) Division of CancerEpidemiology and Genetics National Cancer Institute NIHBethesda MD (KY MD JCG NW MS) Information ManagementServices Calverton MD (BB) Early Detection and PreventionSection International Agency for Research on Cancer LyonFrance (RH) Rutgers New Jersey Medical School Newark NJ(MHE) Albert Einstein College of Medicine Bronx NY (RDB)Consultant to National Cancer Institute NIH Bethesda MD(ACR)

The funding sources had no role in the design of the studyinterpretation of the data the collection analysis and interpre-tation of the data or the writing of the manuscript

No author reports a conflict of interest with regard to thisresearch

We acknowledge the expert advice of Drs Emily Wu andCorey Casper who helped define the clinical outcomes in theinitial stage of the project

References1 de Martel C Plummer M Vignat J Franceschi S Worldwide burden of cancer

attributable to HPV by site country and HPV type Int J Cancer 2017141(4)664ndash670

2 Bray F Jemal A Grey N Ferlay J Forman D Global cancer transitions accord-ing to the Human Development Index (2008-2030) a population-based studyLancet Oncol 201213(8)790ndash801

3 Schiffman M Doorbar J Wentzensen N et al Carcinogenic human papillo-mavirus infection Nat Rev Dis Primers 20162(2)16086

4 Bagcchi S India launches plan for national cancer screening programmeBMJ 2016355i5574

5 WHO Guidelines Approved by the Guidelines Review Committee WHOGuidelines for Screening and Treatment of Precancerous Lesions for Cervical CancerPrevention Geneva World Health Organization 2013

6 PEPFAR 2019 Country Operational Plan Guidance for all PEPFAR Countries(Draft) Renewed Partnership to Help End AIDS and Cervical Cancer in Africahttpswwwpepfargovdocumentsorganization288160pdfpage=318

7 Shastri SS Mittra I Mishra GA et al Effect of VIA screening by primary healthworkers randomized controlled study in Mumbai India J Natl Cancer Inst2014106(3)dju009

8 Denny L Kuhn L De Souza M Pollack AE Dupree W Wright TC Jr Screen-and-treat approaches for cervical cancer prevention in low-resource settingsa randomized controlled trial JAMA 2005294(17)2173ndash2181

9 Sankaranarayanan R Nene BM Shastri SS et al HPV screening for cervicalcancer in rural India N Engl J Med 2009360(14)1385ndash1394

10 Catarino R Schafer S Vassilakos P Petignat P Arbyn M Accuracy of combi-nations of visual inspection using acetic acid or lugol iodine to detect cervicalprecancer a meta-analysis BJOG 2017125(5)545ndash553

11 Jeronimo J Schiffman M Colposcopy at a crossroads Am J Obstet Gynecol2006195(2)349ndash353

12 Perkins MJ RB Smith KM Gage JCet al ASCCP Colposcopy Standards Risk-Based Colposcopy Practice J Low Genit Tract Dis 2017 Oct21(4)230ndash234 doi101097LGT0000000000000334 PubMed PMID 28953111

13 Ting DSW Cheung CY Lim G et al Development and validation of a deeplearning system for diabetic retinopathy and related eye diseases using reti-nal images from multiethnic populations with diabetes JAMA 2017318(22)2211ndash2223

14 Xu T Zhang H Xin C et al Multi-feature based benchmark for cervical dys-plasia classification evaluation Pattern Recogn 201763468ndash475

AR

TIC

LE

L Hu et al | 9

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

15 Song D Kim E Huang X et al Multimodal entity coreference for cervical dys-plasia diagnosis IEEE Trans Med Imaging 201534(1)229ndash245

16 Ren SQ He KM Girshick R Sun J Faster R-CNN towards real-time object de-tection with region proposal networks IEEE Trans Pattern Anal Mach Intell201739(6)1137ndash1149

17 Bratti MC Rodriguez AC Schiffman M et al Description of a seven-year pro-spective study of human papillomavirus infection and cervical neoplasiaamong 10000 women in Guanacaste Costa Rica Rev Panam Salud Publica200415(2)75ndash89

18 Herrero R Schiffman MH Bratti C et al Design and methods of apopulation-based natural history study of cervical neoplasia in a rural prov-ince of Costa Rica the Guanacaste Project Rev Panam Salud Publica 19971(5)362ndash375

19 Rodriguez AC Avila C Herrero R et al Cervical cancer incidence after screen-ing with HPV cytology and visual methods 18-year follow-up of theGuanacaste cohort Int J Cancer 2017140(8)1926ndash1934

20 Schneider DL Herrero R Bratti C et al Cervicography screening for cervicalcancer among 8460 women in a high-risk population Am J Obstet Gynecol1999180(2)290ndash298

21 Schneider DL Burke L Wright TC et al Can cervicography be improved Anevaluation with arbitrated cervicography interpretations Am J Obstet Gynecol2002187(1)15ndash23

22 Jeronimo J Long LR Neve L Michael B Antani S Schiffman M Digital toolsfor collecting data from cervigrams for research and training in colposcopy JLow Genit Tract Dis 200610(1)16ndash25

23 Hutchinson ML Zahniser DJ Sherman ME et al Utility of liquid-based cytol-ogy for cervical carcinoma screening results of a population-based studyconducted in a region of Costa Rica with a high incidence of cervical carci-noma Cancer 199987(2)48ndash55

24 Sherman ME Schiffman M Herrero R et al Performance of a semiautomatedPapanicolaou smear screening system results of a population-based studyconducted in Guanacaste Costa Rica Cancer 199884(5)273ndash280

25 Castle PE Schiffman M Gravitt PE et al Comparisons of HPV DNA detectionby MY0911 PCR methods J Med Virol 200268(3)417ndash423

26 Bouvard V Baan R Straif K et al A review of human carcinogensmdashPart B bio-logical agents Lancet Oncol 200910(4)321ndash322

27 He H Garcia EA Learning from imbalanced data IEEE Trans Knowl Data Eng200921(9)1263ndash1284

28 Yosinski J Clune J Bengio Y Lipson H How transferable are features indeep neural networks In Proceedings of the 27th InternationalConference on Neural Information Processing Systems Volume 2December 8ndash13 2014 Montreal Canada

29 Russakovsky O Deng J Su H et al ImageNet large scale visual recognitionchallenge Int J Comput Vis 2015115(3)211ndash252

30 Wong SC Gatt A Stamatescu V McDonnell MD Understanding data aug-mentation for classification when to warp In 2016 InternationalConference on Digital Image Computing Techniques and Applications(Dicta) November 30ndashDecember 2 201659ndash64 Gold Coast Australia

31 DeLong ER DeLong DM Clarke-Pearson DL Comparing the areas under twoor more correlated receiver operating characteristic curves a nonparametricapproach Biometrics 198844(3)837ndash845

32 Youden WJ Index for rating diagnostic tests Cancer 19503(1)32ndash3533 Verdoodt F Jentschke M Hillemanns P Racey CS Snijders PJ Arbyn M

Reaching women who do not participate in the regular cervical cancerscreening programme by offering self-sampling kits a systematic review andmeta-analysis of randomised trials Eur J Cancer (Oxf Engl 1990) 201551(16)2375ndash2385

34 Schiffman M Glass AG Wentzensen N et al A long-term prospective studyof type-specific human papillomavirus infection and risk of cervical neopla-sia among 20000 women in the Portland Kaiser Cohort Study CancerEpidemiol Biomarkers Prev 201120(7)1398ndash1409

35 Chen HC Schiffman M Lin CY et al Persistence of type-specific human pap-illomavirus infection and increased long-term risk of cervical cancer J NatlCancer Inst 2011103(18)1387ndash1396

AR

TIC

LE

10 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

  • djy225-TF1
  • djy225-TF2
Page 3: An Observational Study of Deep Learning and Automated ...The system architecture is summarized in Figure 2 and de-tailed in the Supplementary Methods (available online). The au-tomated

Automated Visual Evaluation Algorithm to EvaluateCervical Images

Three controls per case were chosen from among women whodid not present with CIN2thorn during active surveillance fre-quency matched to time of diagnosis of case (enrollmentfol-low-up) A random approximately 70 (197279) of cases andfrequency-matched controls were chosen for training and theremaining approximately 30 were reserved as the initial vali-dation test set We randomly chose a single image from eachpair of images available per prediagnostic case visit and controlvisit Of note after the initial validation (SupplementaryMethods available online) the main analysis focused onimages taken at enrollment visits

The system architecture is summarized in Figure 2 and de-tailed in the Supplementary Methods (available online) The au-tomated visual evaluation detection algorithm performs twomain functions detecting (locating) the cervix within the inputimage and predicting the probability that the image representsa case of CIN2thorn The algorithm based on Faster R-CNN performsobject (cervix) detection feature extraction (calculating the fea-tures of the object) and classification (predicting the case prob-ability score) We explored several different techniques asdescribed in the Supplementary Methods (available online) andFaster R-CNN algorithm provided the best combination of speedand accuracy (27) The inputs to train the model were the cervi-gram images cervix location (rectangle encompassing the

cervix) and the class ground truth (case of CIN2thorn or controlltCIN2)

To create the cervix locator function of the automated visualevaluation algorithm we first trained an independent cervix lo-cator algorithm using Faster R-CNN to separate the cervix fromthe background This involved annotating 2000 images (manu-ally drawing a rectangle around the cervix) using a softwaretool shown in Supplementary Figure 2 (available online) devel-oped for this task By incorporating the trained locator functionthe only input required for the automated visual evaluation al-gorithm was the cervigram image The final model providedboth the predicted cervix location and the case probability

CNN-based methods are data driven and require large quan-tities of training data to perform well however our datasetfrom a population cohort study was limited to the smaller num-ber of cases that actually occurred To compensate we per-formed transfer learning (28) by first initializing the CNNarchitecture with pretrained weights from a model trained withthe ImageNet dataset (29) a database of millions of (noncervi-gram) images of all kinds of natural objects We then retrainedwith the cervigram images in the training set We also aug-mented the image data (30) artificially increasing the numberof cervical images by performing minor distortions to the origi-nal images including rotation mirroring sheering and gammatransformation We used the faster R-CNN end-to-end trainingconfiguration of stochastic gradient descent optimization withthe parameters in Supplementary Table 2 (available online)

Women included in analysis (N=9406)

Total CIN2+ cases N=279 Total ltCIN2 controls N=9127

Training set cases randomly selected (N=189)

Training set controls randomly selected (N=555)

CIN2+ cases not in training (N=90)Validaon set does not include eight cancer upgrades aer random drawScreening set included only enrollment images five cases did not have enrollment images available

Controls not used due to lack of enrollment image (N=378)

Training set is one image per woman (N=744 189 cases and 555 controls)

Screening set is enrollment image for women (N=8259 85 cases and 8174

controls)

Validaon and screening set control pool (N= 8194)

Validaon set used last image during follow-up (N=242)

Screening set used enrollment image note 20 validaon set

controls did not have an enrollment image (N=8174)

Validaon Set is one image per woman (N=324 82 cases and 242 controls)

Enrolled in Guanacaste Costa Rica cohort (N=10080)

Excluded (N=674)No Image collected (N=630)Mulple colpo sessions (N=31)Inadequate histology (N=13)

Analysis sets

Figure 1 Cervical images used for training and validation The images were drawn from the Proyecto Epidemiologico Guanacaste a longitudinal cohort study of human

papillomavirus infection other screening tests and risk of cervical precancercancer (1993ndash2001) The training and initial validation made use of the last images taken

prior to case diagnosis and approximately 31 controls frequency matched to cases on time of study (enrollmentfollow-up) The main analysis focused on images (ex-

cluding all images from women in the training set) from cohort enrollment and examined how the automated image analysis of enrollment screening images per-

formed in prediction of cases found over the course of the entire cohort study In an analysis that counted the absolute numbers of cases detected and controls

referred it was necessary to reweight that is to multiply the findings in the randomly selected validation test by the inverse of the 30 sampling fraction to estimate

numbers for all cases and matched controls (see Methods) CIN frac14 cervical intraepithelial neoplasia

AR

TIC

LE

L Hu et al | 3

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

These techniques are further described in the SupplementaryMethods (available online)

National Library of Medicine (National Institutes of HealthUSA) collaborators independently checked the technique Thisseries of confirmatory experiments was not designed to standalone rather it was meant to increase credibility of the originalfindings by replication outside of the inventing group They firstfollowed the architecture of the proposed method to evaluatethe performance of the cervix region locator and of the visualevaluation algorithm independently evaluating different sub-sets of the Guanacaste cohort images for model training thanthe original They also tested the trained network with testimages that were altered to represent a different camera zoomand moderate image resizing

To visualize what the algorithm bases its prediction on wealso implemented a system that generates a heat map showingwhich regions of the image most strongly influence the predic-tion This system is described in Supplementary Methods (avail-able online)

Statistical Analyses

The automated visual evaluation detection algorithm yields ascore (ranging from 00 to 10) that predicts whether the imagerepresents a case of histologic CIN2thorn Examining first the repro-ducibility of the scores we compared 322 replicate pairs ofimages included for this purpose in the validation set thePearson correlation within pairs was 097 (95 confidence

interval [CI]frac14 097 to 098) the very high agreement justified us-ing only one image per pair for subsequent analyses

We conducted the analyses presented here using cohort en-rollment images from the 30 of women in the initial validationset plus enrollment images from the ldquoleftoverrdquo women in theGuanacaste cohort with ltCIN2 not chosen for either the train-ing or initial validation set (Figure 1) The main validation analy-sis focused on images taken at cohort enrollment visitsevaluated accuracy of the automated visual evaluation algo-rithm compared with the originally performed baseline screen-ing tests (cervicography cytology) and HPV testing introducedfor validation but not used as a screening test at that time

We evaluated the accuracy of the automated visual evalua-tion detection algorithm for identification of CIN2thorn cases in thevalidation set using Receiver Operating Characteristic (ROC)curves and its summary statistic area under the curve (AUC)We created ROC-like curves for the ordinal categories of theoriginal screening tests and a ROC curve of the continuous auto-mated visual evaluation empirically by the trapezoid methodThese AUC values were tested for statistical significance againstthe AUC for automated visual evaluation by two-sided chi-squared tests (31) For analyses requiring a categorical positivenegative result for automated visual evaluation we chose thecutpoint in the continuous score distribution specific to thatage group in age-stratified analyses that maximized Youdenrsquosindex (sensitivity thorn specificity ndash 1) (32)

In a separate analysis to estimate the impact of a singlescreening round for the full Guanacaste cohort we filled in forimages from women used in the training set and therefore

Figure 2 The system architecture of the automated visual evaluation algorithm Two models are trained a cervix locator (top) and the automated visual evaluation de-

tection algorithm (bottom) The final validation algorithm incorporated both cervix locator and automated visual evaluation

AR

TIC

LE

4 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

excluded from the validation of the algorithm To do so wereweighted specifically because the initial validation set was arandom sample of the training validation group we could mul-tiply the results for women in the validation set by the inverseof the sampling fraction (30) to represent all women used fortraining and validation We then added the remaining noncasesto reconstitute the full cohort

Cases that resulted in differences between automated visualevaluation algorithm results and original evaluator interpretationof the same cervigram images were rereviewed without maskingby an expert gynecologic oncologist and colposcopist (ME) to noteany subjective patterns that might explain discrepancies

Results

The cohort study population enrolled in 1993ndash1994 (Table 1) wasa random sample of adult women in Guanacaste Province rang-ing in age from 18 to 94 years (median 35 years) The provincewas mixed rural and small town Most women were married(761) and parity was high (923) Smoking was unusual (107were current smokers) but oral contraceptive use (present orpast) was common (777 had ever used oral contraceptives)The prevalence of carcinogenic HPV types was less than 20and declined sharply with age

There were 241 histologically confirmed cases of precancer(CIN2CIN3) and 38 cases of cancer observed among 9406women followed for up to 7 years in the population-basedGuanacaste cohort Glandular lesions were very uncommonand categorized with squamous lesions of equivalent severity(eg rare AIS was included with CIN3) The mean automated vi-sual evaluation algorithm result scores at enrollment wereequivalent for CIN2 vs CIN3 (070 and 069 Pfrac14 51) and statisti-cally nonsignificantly higher for the uncommon cancers (080Pfrac14 42) (Figure 3) In subsequent analyses we combined CIN2CIN3 and cancer in a single case group

The AUC for automated visual evaluation of the enrollmentcervical images for women of all ages was 091 (95 CIfrac14 089 to093) (Figure 4) Automated visual evaluation was statisticallysignificantly more accurate than cervigram result (AUCfrac14 06995 CIfrac14 063 to 074 Plt 001)

As shown in Figure 4 automated visual evaluation perfor-mance was also statistically significantly more accurate thanconventional Pap smears (AUCfrac14 071 95 CIfrac14 065 to 077Plt 001) liquid-based cytology (AUCfrac14 079 95 CIfrac14 073 to 084Plt 001) first-generation neural network-based cytology(AUCfrac14 070 95 CIfrac14 063 to 076 Plt 001) and HPV testing(AUCfrac14 082 95 CIfrac14 077 to 087 Plt 001)

We examined all 16 discrepant findings that were algorithmpositiveHPV negative The additional positives by automatedvisual evaluation tended to occur among younger women(medianfrac14 265 Pfrac14 06 by Wilcoxon test compared with the restof the cases in the cohort whose median age was 35 years)They were likely to be diagnosed with CIN2 (1216 [750] com-pared with 2061 [328] of other cases Pfrac14 008)

In anticipation of designing screening programs we consid-ered three distinct age ranges 18ndash24 years 25ndash49 years and 50thornyears with the intermediate age group of primary interest be-cause it coincides with the majority (130228) of CIN2-CIN3cases and higher sensitivity (127130 Plt 001 compared withyounger women) (Table 2) We estimated that a single auto-mated visual evaluation screening round targeting women atthe prime screening ages of 25ndash49 years could identify 557(127228) of precancers (CIN2CIN3AIS) diagnosed cumulatively

in the entire adult population To achieve this level of sensitiv-ity in the entire population by a single round of screeningwomen aged 25ndash49 years would require referring 110 (9828906) of the entire population (and 180 [9825460] of thoseaged 25ndash49 years) for treatment

Review of enrollment images from cases with discrepant hu-man vs automated visual evaluations (and a random tenth ofdiscrepant noncases) revealed that many of the automated vi-sual evaluation algorithm result-positivecervigram-negativecases of CIN2thorn (additional true positives) had suboptimalimages (eg poor focus or washed-out color during scanning offilm) or there were obstructing vaginal folds or blood No clearpatterns emerged from review of images with discrepant resultsamong women with ltCIN2

Table 1 Selected demographic features of the 9406 women in thisanalysis of the Guanacaste cohort and their enrollment screeningresults

Feature or test result No

Enrollment age y18ndash29 2598 (276)30ndash49 4357 (463)50ndash94 2451 (261)

Marital statusMarried 7157 (761)Separatedwidowed 508 (102)Single 1290 (137)

Age at first intercourse y16 3269 (348)17ndash18 2579 (274)19thorn 3550 (378)

Lifetime No of partners1 4934 (525)2ndash3 1947 (331)4thorn 1358 (144)

Lifetime No of pregnancies0 723 (77)1ndash2 2588 (275)3ndash4 2475 (263)5thorn 3620 (385)

Current smokerNo 8398 (893)Yes 1003 (107)

Ever use oral contraceptionYes 5677 (777)No 1634 (224)

Enrollment Pap resultNormal 7906 (841)ASC-US 4227 (84)LSIL 471 (50)High-grade 232 (25)

Enrollment cervicographyNegative 8176 (869)Atypical 868 (92)Low-grade 316 (34)High-gradecancer 46 (05)

Enrollment HPV resultNegative for carcinogenic types 6683 (711)Positive (not HPV16) 2314 (246)HPV16 393 (42)

Missing test results not shown but counted in totals such that percentages do

not sum to 100 ASC-US frac14 atypical squamous cells of undetermined signifi-

cance HPV frac14 human papillomavirus LSIL frac14 low-grade squamous intraepithelial

lesion

AR

TIC

LE

L Hu et al | 5

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

Some trends emerged based on analysis of the heat mapshighlighting regions of the image that most influence the algo-rithm prediction (Supplementary Figures 3ndash7 available online)The model usually based its prediction primarily on the regionsurrounding the os and performed best when the cervix wasoriented directly towards the camera It was more affected byregions with visible texture than by smooth regions In generalthe model was not excited by glare from the camerarsquos light orother distractors although in some cases the presence of cottonswabs or other noncervix objects did distract the model

Analysis of the long-term follow-up of the cohort includinglinkage to the Costa Rican Tumor Registry revealed 39 cancersMost were in the training set 17 were not and had enrollmentseverity scores (Table 3) Women with cancer missed by auto-mated visual evaluation tended to be diagnosed by the tumorregistry merge (mainly older women diagnosed after 7 years ofactive follow-up)

The automated visual evaluation detection algorithm couldtheoretically be used to triage women testing positive for HPVrather than for primary screening Training another automatedvisual evaluation model restricted to HPV-positive women (datanot shown) a sensitivity of 100 (1313) was achieved with aspecificity of 575 (103179) Therefore primary HPV testing us-ing self-sampling if it matched the HPV test performance of theearly assays we used in 1993ndash1994 followed by the automatedvisual evaluation algorithm restricted to positives could achievethe same aggregate sensitivity while more than halving thenumber of women requiring cervical examinations

Discussion

Using a deep learning approach called Faster R-CNN with exten-sive image augmentation based on a pretrained model wetrained and validated an image analyzer that performsldquoautomated visual evaluationrdquo of the cervix As a primaryscreening method the algorithm trained using digitized cervi-grams from the Guanacaste Cohort achieved excellent sensitiv-ity for detection of CIN2thorn in the age group at highest risk ofprecancers The performance surpassed colposcopist evaluatorinterpretations of the same images (cervicography) and com-pared favorably to conventional Pap smears (and alternativekinds of cytology) while matching the screening accuracy of anearly version of PCR-based HPV testing The additional positiveevaluations among women that were negative for HPV need fur-ther study before they can be believed logically it is hard to ac-cept and there was an indication that the cases were anunusually young group with incident CIN2 suggesting somepossibility of misclassification of both visual appearance andoutcome

Restricted to the age group at which risk of precancer peaksto achieve nearly perfect sensitivity for cases occurring up to 7years after examination generated a large number of false posi-tives among screened noncases More balanced cutpoints forpositivity might be chosen to limit excessive treatments al-though sensitivity would drop Another possibility for improv-ing specificity while retaining high sensitivity might becombination screening (commonly called ldquocotestingrdquo) with

recnaC3NIC2NIC

0

025

050

075

100

Cas

e pr

obab

ility

Figure 3 Comparison of the mean scores obtained by comparing enrollment images from women whose worst diagnosis within the cohort was cervical intraepithelial

neoplasia (CIN) 2 CIN3 or cancer These subsets of the cases were not statistically significantly different with regard to severity scores generated by the algorithm

Therefore we combined all into a single case group We show a box and whisker plot of 32 CIN2 38 CIN3 and seven cancers giving quartiles means and outliers of

case probability for each diagnosis within the cohort

AR

TIC

LE

6 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

automated visual evaluation and HPV testing when such testsbecome more affordable and more widely available than theycurrently are

To permit widespread use of automated visual evaluationdetection algorithms the method would need to be transferredfrom digitized cervigrams to contemporary digital cameras be-cause cervicography is obsolete and discontinued If the transferis achieved health workers would still need to visualize the cer-vix and take well-lit in-focus images for analysis with the

cervix in full view The minimal required equipment (in addi-tion to the algorithm software) would be acetic acid (vinegar)disposable specula (or sterilization equipment) and the imagingsystem such as a dedicated smart phone or digital camera

Images that resulted in discrepancies between the auto-mated visual evaluation algorithm and clinician interpretationrevealed that the performance was affected by image qualityand obstructions as well as traditional discordant evaluationsdue to human subjectivity So rather than focusing on training

Figure 4 Receiver operating characteristic (ROC) curve of automated visual evaluation of cervical images and comparison of performance in identification of cervical

intraepithelial neoplasia 2thorn ROC-like curves are shown for the categorical variables for simple visual and statistical comparison with automated visual evaluation

(two-sided chi-squared tests) The thresholds are listed on each curve showing the sensitivity and 1-specificity applicable to that cutpoint Automated visual evalua-

tion was as accurate or more than all of the screening tests used in the cohort study including A) automated visual evaluation B) cervicography area under the curve

(AUC) C) conventional Pap smear D) liquid-based cytology E) first-generation neural network-based cytology and F) MY09-MY11 PCR-based human papillomavirus

(HPV) testing ASC-US frac14 atypical squamous cells of undetermined significance HSIL frac14 high-grade squamous intraepithelial lesion LSIL frac14 low-grade squamous intrae-

pithelial lesion

AR

TIC

LE

L Hu et al | 7

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

Table 2 Estimated comparison of automated visual evaluation performance by age groups

Automated visual evaluation by age CIN2thorn No ltCIN2 No Total No Age-specific sensitivity Age-specific specificity

lt25 ythorn 46 226 272 821 772 10 765 775Age-specific total 56 991 1047

25ndash49 ythorn 127 855 982 977 840 3 4475 4478Age-specific total 130 5330 5460

50thorn ythorn 39 399 438 929 832 3 1969 1972Age-specific total 42 2368 2410

Total 228 8689 8917 mdash mdash

Table 3 Severity scores from automatic visual evaluation algorithm and screening results from enrollment visit of the Guanacaste cohortstudy for cases of invasive cancer

Years to diagnosis Enrollment age y HPV type result Pap smear result Cervigram result Algorithm severity score

Enrollment 21 16 52 Cancer Cancer TrainingEnrollment 26 16 18 51 Normal High-grade TrainingEnrollment 29 18 31 HSIL (CIN2) High-grade TrainingEnrollment 34 16 Microinvasive High-grade TrainingEnrollment 35 31 45 Microinvasive Cancer 098Enrollment 38 16 HSIL (CIN2) Cancer TrainingEnrollment 41 16 ASC-US Cancer 078Enrollment 47 18 Microinvasive Cancer TrainingEnrollment 54 16 Microinvasive Ccancer TrainingEnrollment 71 53 82v Microinvasive Negative 013Enrollment 73 33 Microinvasive Cancer 096Enrollment 74 35 HSIL (CIN3) Negative 035Enrollment 42 Equivocal Microinvasive Cancer 0841 y 37 18 Normal Negative Training1 y 61 Negative Normal Negative Training2 y 23 16 ASC-US Low-grade 0872 y 64 45 51 58 Normal Negative 0303 y 49 16 ASC-US Negative Training5 y 29 18 Normal Negative Training5 y 36 16 Normal Negative 0716 y 38 Negative HSIL (CIN2) Negative Training6 y 45 31 Normal Missing Missing6 y 48 Negative Normal Negative Training6 y 66 56 Microinvasive Negative 0387 y 50 Negative Normal Negative Training7 y 63 Negative Normal Negative Training8 y 35 Negative Normal Negative 0028 y 52 16 Normal Negative 0708 y 63 16 Normal Negative Training8 y 81 Negative Normal Negative 0349 y 28 18 Normal Negative Training10 y 65 16 HSIL (CIN3) Negative Training10 y 75 85 Microinvasive Negative Training11 y 75 Negative Normal Negative 01612 y 68 Negative Normal Negative 00213 y 56 Negative Normal Negative 07515 y 78 Missing Missing Atypical Training16 y 52 Negative Normal Negative Training17 y 21 Negative Inadequate Atypical 004

ASC-US frac14 atypical squamous cells of undetermined significance CIN2frac14 cervical intraepithelial neoplasia 2 CIN3frac14 cervical intraepithelial neoplasia 3 HSIL frac14 high-

grade squamous intraepithelial lesion

AR

TIC

LE

8 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

for the subtleties of visual appearance which is a difficult skillthe training for automated visual evaluation could highlightmore easily acquired skills of improving image quality andlighting and removing obstructions

We also anticipated a second approach to using the auto-mated visual evaluation detection algorithm specifically as atriage method restricted to women testing HPV positive if HPVtesting is the primary screen HPV testing offers the possibilityof cervicovaginal self-sampling (33) with only a small minorityof women testing HPV positive and requiring gynecologicexaminations Moreover HPV testing has proven very long-term negative predictive value that is reassurance when nega-tive (3435) The combination of self-sampled HPV screeningand triage use of automated visual evaluation detection couldgreatly reduce the need for speculum examinations Choice ofthe automated visual evaluation algorithm screening strategy(general screening vs triage of HPV positive women) will dependon setting and cost-effectiveness analyses Currently HPV testsstill take several hours and cost too much for many settings butlow-cost point-of-care and batch testing methods applicable toself-sampling are in advanced research and development andlikely will be available within a very few years

It is a strength of this study that it was conducted in a ran-dom population sample of a previously poorly screened popula-tion that resembles HPV-positive women in other low- andmiddle-income country (LMIC) target screening populationsDefinition of precancer was very good based on repeatedscreening during extended follow-up and panel review ofhistopathology

The major limitations are the following We studied a rela-tively small numbers of cases from a single cohort study Weincluded CIN2 cases in the case group whereas it would beideal to train on more definite cases of precancer (ie CIN3and AIS caused by types of HPV prevalent in invasive cervicalcancers) The images were captured by a small team of highlytrained nurses and not a wide variety of health workers Thework relied on images taken with a discontinued film cameratechnique rather than contemporary digital imagetechnology

Nonetheless the proof-of-principle strongly supports fur-ther evaluation of automated visual evaluation Rather thanresurrecting obsolete film camera technology to achieve the ob-served results we are currently working to transfer automatedvisual evaluation to images from contemporary phone camerasand other digital image capture devices to create an accurateand affordable point-of-care screening method that would sup-port the recently announced World Health Organization initia-tive to accelerate cervical cancer control

Our findings extend and improve upon the results obtainedin more preliminary reports listed in the Supplementary Table 1(available online) Internationally a sizable number of commer-cial and academic research groups are now exploring deeplearning algorithms for cervical screening In collaborative workmoving ahead we will be considering the following topicsassessing the international variability in cervical appearance(eg inflammatory changes) and need for region-specific train-ing examining which camera characteristics affect the algo-rithm and which devices are adequate for image capturetraining the camera to take an image automatically when a fo-cused adequately lit and visualized cervix is visible training analgorithm by segmentation studies focused on the squamoco-lumnar junction to assess when ablational therapy is possibleor whether (mainly endocervical) extent of lesion suggests needfor excision and evaluating how best to combine automated

visual evaluation with HPV testing and eventually with vacci-nation to accelerate control of cervical cancer

Funding

This work was supported by the National Institutes ofHealth NCI and National Library of Medicine IntramuralResearch Programs and the Global Good Fund

Notes

Affiliations of authors Intellectual Ventures Global Good FundBellevue WA (LH DB MPH NG BW MSJ) National Library ofMedicine NIH Bethesda MD (SA ZX LRL) Division of CancerEpidemiology and Genetics National Cancer Institute NIHBethesda MD (KY MD JCG NW MS) Information ManagementServices Calverton MD (BB) Early Detection and PreventionSection International Agency for Research on Cancer LyonFrance (RH) Rutgers New Jersey Medical School Newark NJ(MHE) Albert Einstein College of Medicine Bronx NY (RDB)Consultant to National Cancer Institute NIH Bethesda MD(ACR)

The funding sources had no role in the design of the studyinterpretation of the data the collection analysis and interpre-tation of the data or the writing of the manuscript

No author reports a conflict of interest with regard to thisresearch

We acknowledge the expert advice of Drs Emily Wu andCorey Casper who helped define the clinical outcomes in theinitial stage of the project

References1 de Martel C Plummer M Vignat J Franceschi S Worldwide burden of cancer

attributable to HPV by site country and HPV type Int J Cancer 2017141(4)664ndash670

2 Bray F Jemal A Grey N Ferlay J Forman D Global cancer transitions accord-ing to the Human Development Index (2008-2030) a population-based studyLancet Oncol 201213(8)790ndash801

3 Schiffman M Doorbar J Wentzensen N et al Carcinogenic human papillo-mavirus infection Nat Rev Dis Primers 20162(2)16086

4 Bagcchi S India launches plan for national cancer screening programmeBMJ 2016355i5574

5 WHO Guidelines Approved by the Guidelines Review Committee WHOGuidelines for Screening and Treatment of Precancerous Lesions for Cervical CancerPrevention Geneva World Health Organization 2013

6 PEPFAR 2019 Country Operational Plan Guidance for all PEPFAR Countries(Draft) Renewed Partnership to Help End AIDS and Cervical Cancer in Africahttpswwwpepfargovdocumentsorganization288160pdfpage=318

7 Shastri SS Mittra I Mishra GA et al Effect of VIA screening by primary healthworkers randomized controlled study in Mumbai India J Natl Cancer Inst2014106(3)dju009

8 Denny L Kuhn L De Souza M Pollack AE Dupree W Wright TC Jr Screen-and-treat approaches for cervical cancer prevention in low-resource settingsa randomized controlled trial JAMA 2005294(17)2173ndash2181

9 Sankaranarayanan R Nene BM Shastri SS et al HPV screening for cervicalcancer in rural India N Engl J Med 2009360(14)1385ndash1394

10 Catarino R Schafer S Vassilakos P Petignat P Arbyn M Accuracy of combi-nations of visual inspection using acetic acid or lugol iodine to detect cervicalprecancer a meta-analysis BJOG 2017125(5)545ndash553

11 Jeronimo J Schiffman M Colposcopy at a crossroads Am J Obstet Gynecol2006195(2)349ndash353

12 Perkins MJ RB Smith KM Gage JCet al ASCCP Colposcopy Standards Risk-Based Colposcopy Practice J Low Genit Tract Dis 2017 Oct21(4)230ndash234 doi101097LGT0000000000000334 PubMed PMID 28953111

13 Ting DSW Cheung CY Lim G et al Development and validation of a deeplearning system for diabetic retinopathy and related eye diseases using reti-nal images from multiethnic populations with diabetes JAMA 2017318(22)2211ndash2223

14 Xu T Zhang H Xin C et al Multi-feature based benchmark for cervical dys-plasia classification evaluation Pattern Recogn 201763468ndash475

AR

TIC

LE

L Hu et al | 9

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

15 Song D Kim E Huang X et al Multimodal entity coreference for cervical dys-plasia diagnosis IEEE Trans Med Imaging 201534(1)229ndash245

16 Ren SQ He KM Girshick R Sun J Faster R-CNN towards real-time object de-tection with region proposal networks IEEE Trans Pattern Anal Mach Intell201739(6)1137ndash1149

17 Bratti MC Rodriguez AC Schiffman M et al Description of a seven-year pro-spective study of human papillomavirus infection and cervical neoplasiaamong 10000 women in Guanacaste Costa Rica Rev Panam Salud Publica200415(2)75ndash89

18 Herrero R Schiffman MH Bratti C et al Design and methods of apopulation-based natural history study of cervical neoplasia in a rural prov-ince of Costa Rica the Guanacaste Project Rev Panam Salud Publica 19971(5)362ndash375

19 Rodriguez AC Avila C Herrero R et al Cervical cancer incidence after screen-ing with HPV cytology and visual methods 18-year follow-up of theGuanacaste cohort Int J Cancer 2017140(8)1926ndash1934

20 Schneider DL Herrero R Bratti C et al Cervicography screening for cervicalcancer among 8460 women in a high-risk population Am J Obstet Gynecol1999180(2)290ndash298

21 Schneider DL Burke L Wright TC et al Can cervicography be improved Anevaluation with arbitrated cervicography interpretations Am J Obstet Gynecol2002187(1)15ndash23

22 Jeronimo J Long LR Neve L Michael B Antani S Schiffman M Digital toolsfor collecting data from cervigrams for research and training in colposcopy JLow Genit Tract Dis 200610(1)16ndash25

23 Hutchinson ML Zahniser DJ Sherman ME et al Utility of liquid-based cytol-ogy for cervical carcinoma screening results of a population-based studyconducted in a region of Costa Rica with a high incidence of cervical carci-noma Cancer 199987(2)48ndash55

24 Sherman ME Schiffman M Herrero R et al Performance of a semiautomatedPapanicolaou smear screening system results of a population-based studyconducted in Guanacaste Costa Rica Cancer 199884(5)273ndash280

25 Castle PE Schiffman M Gravitt PE et al Comparisons of HPV DNA detectionby MY0911 PCR methods J Med Virol 200268(3)417ndash423

26 Bouvard V Baan R Straif K et al A review of human carcinogensmdashPart B bio-logical agents Lancet Oncol 200910(4)321ndash322

27 He H Garcia EA Learning from imbalanced data IEEE Trans Knowl Data Eng200921(9)1263ndash1284

28 Yosinski J Clune J Bengio Y Lipson H How transferable are features indeep neural networks In Proceedings of the 27th InternationalConference on Neural Information Processing Systems Volume 2December 8ndash13 2014 Montreal Canada

29 Russakovsky O Deng J Su H et al ImageNet large scale visual recognitionchallenge Int J Comput Vis 2015115(3)211ndash252

30 Wong SC Gatt A Stamatescu V McDonnell MD Understanding data aug-mentation for classification when to warp In 2016 InternationalConference on Digital Image Computing Techniques and Applications(Dicta) November 30ndashDecember 2 201659ndash64 Gold Coast Australia

31 DeLong ER DeLong DM Clarke-Pearson DL Comparing the areas under twoor more correlated receiver operating characteristic curves a nonparametricapproach Biometrics 198844(3)837ndash845

32 Youden WJ Index for rating diagnostic tests Cancer 19503(1)32ndash3533 Verdoodt F Jentschke M Hillemanns P Racey CS Snijders PJ Arbyn M

Reaching women who do not participate in the regular cervical cancerscreening programme by offering self-sampling kits a systematic review andmeta-analysis of randomised trials Eur J Cancer (Oxf Engl 1990) 201551(16)2375ndash2385

34 Schiffman M Glass AG Wentzensen N et al A long-term prospective studyof type-specific human papillomavirus infection and risk of cervical neopla-sia among 20000 women in the Portland Kaiser Cohort Study CancerEpidemiol Biomarkers Prev 201120(7)1398ndash1409

35 Chen HC Schiffman M Lin CY et al Persistence of type-specific human pap-illomavirus infection and increased long-term risk of cervical cancer J NatlCancer Inst 2011103(18)1387ndash1396

AR

TIC

LE

10 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

  • djy225-TF1
  • djy225-TF2
Page 4: An Observational Study of Deep Learning and Automated ...The system architecture is summarized in Figure 2 and de-tailed in the Supplementary Methods (available online). The au-tomated

These techniques are further described in the SupplementaryMethods (available online)

National Library of Medicine (National Institutes of HealthUSA) collaborators independently checked the technique Thisseries of confirmatory experiments was not designed to standalone rather it was meant to increase credibility of the originalfindings by replication outside of the inventing group They firstfollowed the architecture of the proposed method to evaluatethe performance of the cervix region locator and of the visualevaluation algorithm independently evaluating different sub-sets of the Guanacaste cohort images for model training thanthe original They also tested the trained network with testimages that were altered to represent a different camera zoomand moderate image resizing

To visualize what the algorithm bases its prediction on wealso implemented a system that generates a heat map showingwhich regions of the image most strongly influence the predic-tion This system is described in Supplementary Methods (avail-able online)

Statistical Analyses

The automated visual evaluation detection algorithm yields ascore (ranging from 00 to 10) that predicts whether the imagerepresents a case of histologic CIN2thorn Examining first the repro-ducibility of the scores we compared 322 replicate pairs ofimages included for this purpose in the validation set thePearson correlation within pairs was 097 (95 confidence

interval [CI]frac14 097 to 098) the very high agreement justified us-ing only one image per pair for subsequent analyses

We conducted the analyses presented here using cohort en-rollment images from the 30 of women in the initial validationset plus enrollment images from the ldquoleftoverrdquo women in theGuanacaste cohort with ltCIN2 not chosen for either the train-ing or initial validation set (Figure 1) The main validation analy-sis focused on images taken at cohort enrollment visitsevaluated accuracy of the automated visual evaluation algo-rithm compared with the originally performed baseline screen-ing tests (cervicography cytology) and HPV testing introducedfor validation but not used as a screening test at that time

We evaluated the accuracy of the automated visual evalua-tion detection algorithm for identification of CIN2thorn cases in thevalidation set using Receiver Operating Characteristic (ROC)curves and its summary statistic area under the curve (AUC)We created ROC-like curves for the ordinal categories of theoriginal screening tests and a ROC curve of the continuous auto-mated visual evaluation empirically by the trapezoid methodThese AUC values were tested for statistical significance againstthe AUC for automated visual evaluation by two-sided chi-squared tests (31) For analyses requiring a categorical positivenegative result for automated visual evaluation we chose thecutpoint in the continuous score distribution specific to thatage group in age-stratified analyses that maximized Youdenrsquosindex (sensitivity thorn specificity ndash 1) (32)

In a separate analysis to estimate the impact of a singlescreening round for the full Guanacaste cohort we filled in forimages from women used in the training set and therefore

Figure 2 The system architecture of the automated visual evaluation algorithm Two models are trained a cervix locator (top) and the automated visual evaluation de-

tection algorithm (bottom) The final validation algorithm incorporated both cervix locator and automated visual evaluation

AR

TIC

LE

4 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

excluded from the validation of the algorithm To do so wereweighted specifically because the initial validation set was arandom sample of the training validation group we could mul-tiply the results for women in the validation set by the inverseof the sampling fraction (30) to represent all women used fortraining and validation We then added the remaining noncasesto reconstitute the full cohort

Cases that resulted in differences between automated visualevaluation algorithm results and original evaluator interpretationof the same cervigram images were rereviewed without maskingby an expert gynecologic oncologist and colposcopist (ME) to noteany subjective patterns that might explain discrepancies

Results

The cohort study population enrolled in 1993ndash1994 (Table 1) wasa random sample of adult women in Guanacaste Province rang-ing in age from 18 to 94 years (median 35 years) The provincewas mixed rural and small town Most women were married(761) and parity was high (923) Smoking was unusual (107were current smokers) but oral contraceptive use (present orpast) was common (777 had ever used oral contraceptives)The prevalence of carcinogenic HPV types was less than 20and declined sharply with age

There were 241 histologically confirmed cases of precancer(CIN2CIN3) and 38 cases of cancer observed among 9406women followed for up to 7 years in the population-basedGuanacaste cohort Glandular lesions were very uncommonand categorized with squamous lesions of equivalent severity(eg rare AIS was included with CIN3) The mean automated vi-sual evaluation algorithm result scores at enrollment wereequivalent for CIN2 vs CIN3 (070 and 069 Pfrac14 51) and statisti-cally nonsignificantly higher for the uncommon cancers (080Pfrac14 42) (Figure 3) In subsequent analyses we combined CIN2CIN3 and cancer in a single case group

The AUC for automated visual evaluation of the enrollmentcervical images for women of all ages was 091 (95 CIfrac14 089 to093) (Figure 4) Automated visual evaluation was statisticallysignificantly more accurate than cervigram result (AUCfrac14 06995 CIfrac14 063 to 074 Plt 001)

As shown in Figure 4 automated visual evaluation perfor-mance was also statistically significantly more accurate thanconventional Pap smears (AUCfrac14 071 95 CIfrac14 065 to 077Plt 001) liquid-based cytology (AUCfrac14 079 95 CIfrac14 073 to 084Plt 001) first-generation neural network-based cytology(AUCfrac14 070 95 CIfrac14 063 to 076 Plt 001) and HPV testing(AUCfrac14 082 95 CIfrac14 077 to 087 Plt 001)

We examined all 16 discrepant findings that were algorithmpositiveHPV negative The additional positives by automatedvisual evaluation tended to occur among younger women(medianfrac14 265 Pfrac14 06 by Wilcoxon test compared with the restof the cases in the cohort whose median age was 35 years)They were likely to be diagnosed with CIN2 (1216 [750] com-pared with 2061 [328] of other cases Pfrac14 008)

In anticipation of designing screening programs we consid-ered three distinct age ranges 18ndash24 years 25ndash49 years and 50thornyears with the intermediate age group of primary interest be-cause it coincides with the majority (130228) of CIN2-CIN3cases and higher sensitivity (127130 Plt 001 compared withyounger women) (Table 2) We estimated that a single auto-mated visual evaluation screening round targeting women atthe prime screening ages of 25ndash49 years could identify 557(127228) of precancers (CIN2CIN3AIS) diagnosed cumulatively

in the entire adult population To achieve this level of sensitiv-ity in the entire population by a single round of screeningwomen aged 25ndash49 years would require referring 110 (9828906) of the entire population (and 180 [9825460] of thoseaged 25ndash49 years) for treatment

Review of enrollment images from cases with discrepant hu-man vs automated visual evaluations (and a random tenth ofdiscrepant noncases) revealed that many of the automated vi-sual evaluation algorithm result-positivecervigram-negativecases of CIN2thorn (additional true positives) had suboptimalimages (eg poor focus or washed-out color during scanning offilm) or there were obstructing vaginal folds or blood No clearpatterns emerged from review of images with discrepant resultsamong women with ltCIN2

Table 1 Selected demographic features of the 9406 women in thisanalysis of the Guanacaste cohort and their enrollment screeningresults

Feature or test result No

Enrollment age y18ndash29 2598 (276)30ndash49 4357 (463)50ndash94 2451 (261)

Marital statusMarried 7157 (761)Separatedwidowed 508 (102)Single 1290 (137)

Age at first intercourse y16 3269 (348)17ndash18 2579 (274)19thorn 3550 (378)

Lifetime No of partners1 4934 (525)2ndash3 1947 (331)4thorn 1358 (144)

Lifetime No of pregnancies0 723 (77)1ndash2 2588 (275)3ndash4 2475 (263)5thorn 3620 (385)

Current smokerNo 8398 (893)Yes 1003 (107)

Ever use oral contraceptionYes 5677 (777)No 1634 (224)

Enrollment Pap resultNormal 7906 (841)ASC-US 4227 (84)LSIL 471 (50)High-grade 232 (25)

Enrollment cervicographyNegative 8176 (869)Atypical 868 (92)Low-grade 316 (34)High-gradecancer 46 (05)

Enrollment HPV resultNegative for carcinogenic types 6683 (711)Positive (not HPV16) 2314 (246)HPV16 393 (42)

Missing test results not shown but counted in totals such that percentages do

not sum to 100 ASC-US frac14 atypical squamous cells of undetermined signifi-

cance HPV frac14 human papillomavirus LSIL frac14 low-grade squamous intraepithelial

lesion

AR

TIC

LE

L Hu et al | 5

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

Some trends emerged based on analysis of the heat mapshighlighting regions of the image that most influence the algo-rithm prediction (Supplementary Figures 3ndash7 available online)The model usually based its prediction primarily on the regionsurrounding the os and performed best when the cervix wasoriented directly towards the camera It was more affected byregions with visible texture than by smooth regions In generalthe model was not excited by glare from the camerarsquos light orother distractors although in some cases the presence of cottonswabs or other noncervix objects did distract the model

Analysis of the long-term follow-up of the cohort includinglinkage to the Costa Rican Tumor Registry revealed 39 cancersMost were in the training set 17 were not and had enrollmentseverity scores (Table 3) Women with cancer missed by auto-mated visual evaluation tended to be diagnosed by the tumorregistry merge (mainly older women diagnosed after 7 years ofactive follow-up)

The automated visual evaluation detection algorithm couldtheoretically be used to triage women testing positive for HPVrather than for primary screening Training another automatedvisual evaluation model restricted to HPV-positive women (datanot shown) a sensitivity of 100 (1313) was achieved with aspecificity of 575 (103179) Therefore primary HPV testing us-ing self-sampling if it matched the HPV test performance of theearly assays we used in 1993ndash1994 followed by the automatedvisual evaluation algorithm restricted to positives could achievethe same aggregate sensitivity while more than halving thenumber of women requiring cervical examinations

Discussion

Using a deep learning approach called Faster R-CNN with exten-sive image augmentation based on a pretrained model wetrained and validated an image analyzer that performsldquoautomated visual evaluationrdquo of the cervix As a primaryscreening method the algorithm trained using digitized cervi-grams from the Guanacaste Cohort achieved excellent sensitiv-ity for detection of CIN2thorn in the age group at highest risk ofprecancers The performance surpassed colposcopist evaluatorinterpretations of the same images (cervicography) and com-pared favorably to conventional Pap smears (and alternativekinds of cytology) while matching the screening accuracy of anearly version of PCR-based HPV testing The additional positiveevaluations among women that were negative for HPV need fur-ther study before they can be believed logically it is hard to ac-cept and there was an indication that the cases were anunusually young group with incident CIN2 suggesting somepossibility of misclassification of both visual appearance andoutcome

Restricted to the age group at which risk of precancer peaksto achieve nearly perfect sensitivity for cases occurring up to 7years after examination generated a large number of false posi-tives among screened noncases More balanced cutpoints forpositivity might be chosen to limit excessive treatments al-though sensitivity would drop Another possibility for improv-ing specificity while retaining high sensitivity might becombination screening (commonly called ldquocotestingrdquo) with

recnaC3NIC2NIC

0

025

050

075

100

Cas

e pr

obab

ility

Figure 3 Comparison of the mean scores obtained by comparing enrollment images from women whose worst diagnosis within the cohort was cervical intraepithelial

neoplasia (CIN) 2 CIN3 or cancer These subsets of the cases were not statistically significantly different with regard to severity scores generated by the algorithm

Therefore we combined all into a single case group We show a box and whisker plot of 32 CIN2 38 CIN3 and seven cancers giving quartiles means and outliers of

case probability for each diagnosis within the cohort

AR

TIC

LE

6 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

automated visual evaluation and HPV testing when such testsbecome more affordable and more widely available than theycurrently are

To permit widespread use of automated visual evaluationdetection algorithms the method would need to be transferredfrom digitized cervigrams to contemporary digital cameras be-cause cervicography is obsolete and discontinued If the transferis achieved health workers would still need to visualize the cer-vix and take well-lit in-focus images for analysis with the

cervix in full view The minimal required equipment (in addi-tion to the algorithm software) would be acetic acid (vinegar)disposable specula (or sterilization equipment) and the imagingsystem such as a dedicated smart phone or digital camera

Images that resulted in discrepancies between the auto-mated visual evaluation algorithm and clinician interpretationrevealed that the performance was affected by image qualityand obstructions as well as traditional discordant evaluationsdue to human subjectivity So rather than focusing on training

Figure 4 Receiver operating characteristic (ROC) curve of automated visual evaluation of cervical images and comparison of performance in identification of cervical

intraepithelial neoplasia 2thorn ROC-like curves are shown for the categorical variables for simple visual and statistical comparison with automated visual evaluation

(two-sided chi-squared tests) The thresholds are listed on each curve showing the sensitivity and 1-specificity applicable to that cutpoint Automated visual evalua-

tion was as accurate or more than all of the screening tests used in the cohort study including A) automated visual evaluation B) cervicography area under the curve

(AUC) C) conventional Pap smear D) liquid-based cytology E) first-generation neural network-based cytology and F) MY09-MY11 PCR-based human papillomavirus

(HPV) testing ASC-US frac14 atypical squamous cells of undetermined significance HSIL frac14 high-grade squamous intraepithelial lesion LSIL frac14 low-grade squamous intrae-

pithelial lesion

AR

TIC

LE

L Hu et al | 7

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

Table 2 Estimated comparison of automated visual evaluation performance by age groups

Automated visual evaluation by age CIN2thorn No ltCIN2 No Total No Age-specific sensitivity Age-specific specificity

lt25 ythorn 46 226 272 821 772 10 765 775Age-specific total 56 991 1047

25ndash49 ythorn 127 855 982 977 840 3 4475 4478Age-specific total 130 5330 5460

50thorn ythorn 39 399 438 929 832 3 1969 1972Age-specific total 42 2368 2410

Total 228 8689 8917 mdash mdash

Table 3 Severity scores from automatic visual evaluation algorithm and screening results from enrollment visit of the Guanacaste cohortstudy for cases of invasive cancer

Years to diagnosis Enrollment age y HPV type result Pap smear result Cervigram result Algorithm severity score

Enrollment 21 16 52 Cancer Cancer TrainingEnrollment 26 16 18 51 Normal High-grade TrainingEnrollment 29 18 31 HSIL (CIN2) High-grade TrainingEnrollment 34 16 Microinvasive High-grade TrainingEnrollment 35 31 45 Microinvasive Cancer 098Enrollment 38 16 HSIL (CIN2) Cancer TrainingEnrollment 41 16 ASC-US Cancer 078Enrollment 47 18 Microinvasive Cancer TrainingEnrollment 54 16 Microinvasive Ccancer TrainingEnrollment 71 53 82v Microinvasive Negative 013Enrollment 73 33 Microinvasive Cancer 096Enrollment 74 35 HSIL (CIN3) Negative 035Enrollment 42 Equivocal Microinvasive Cancer 0841 y 37 18 Normal Negative Training1 y 61 Negative Normal Negative Training2 y 23 16 ASC-US Low-grade 0872 y 64 45 51 58 Normal Negative 0303 y 49 16 ASC-US Negative Training5 y 29 18 Normal Negative Training5 y 36 16 Normal Negative 0716 y 38 Negative HSIL (CIN2) Negative Training6 y 45 31 Normal Missing Missing6 y 48 Negative Normal Negative Training6 y 66 56 Microinvasive Negative 0387 y 50 Negative Normal Negative Training7 y 63 Negative Normal Negative Training8 y 35 Negative Normal Negative 0028 y 52 16 Normal Negative 0708 y 63 16 Normal Negative Training8 y 81 Negative Normal Negative 0349 y 28 18 Normal Negative Training10 y 65 16 HSIL (CIN3) Negative Training10 y 75 85 Microinvasive Negative Training11 y 75 Negative Normal Negative 01612 y 68 Negative Normal Negative 00213 y 56 Negative Normal Negative 07515 y 78 Missing Missing Atypical Training16 y 52 Negative Normal Negative Training17 y 21 Negative Inadequate Atypical 004

ASC-US frac14 atypical squamous cells of undetermined significance CIN2frac14 cervical intraepithelial neoplasia 2 CIN3frac14 cervical intraepithelial neoplasia 3 HSIL frac14 high-

grade squamous intraepithelial lesion

AR

TIC

LE

8 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

for the subtleties of visual appearance which is a difficult skillthe training for automated visual evaluation could highlightmore easily acquired skills of improving image quality andlighting and removing obstructions

We also anticipated a second approach to using the auto-mated visual evaluation detection algorithm specifically as atriage method restricted to women testing HPV positive if HPVtesting is the primary screen HPV testing offers the possibilityof cervicovaginal self-sampling (33) with only a small minorityof women testing HPV positive and requiring gynecologicexaminations Moreover HPV testing has proven very long-term negative predictive value that is reassurance when nega-tive (3435) The combination of self-sampled HPV screeningand triage use of automated visual evaluation detection couldgreatly reduce the need for speculum examinations Choice ofthe automated visual evaluation algorithm screening strategy(general screening vs triage of HPV positive women) will dependon setting and cost-effectiveness analyses Currently HPV testsstill take several hours and cost too much for many settings butlow-cost point-of-care and batch testing methods applicable toself-sampling are in advanced research and development andlikely will be available within a very few years

It is a strength of this study that it was conducted in a ran-dom population sample of a previously poorly screened popula-tion that resembles HPV-positive women in other low- andmiddle-income country (LMIC) target screening populationsDefinition of precancer was very good based on repeatedscreening during extended follow-up and panel review ofhistopathology

The major limitations are the following We studied a rela-tively small numbers of cases from a single cohort study Weincluded CIN2 cases in the case group whereas it would beideal to train on more definite cases of precancer (ie CIN3and AIS caused by types of HPV prevalent in invasive cervicalcancers) The images were captured by a small team of highlytrained nurses and not a wide variety of health workers Thework relied on images taken with a discontinued film cameratechnique rather than contemporary digital imagetechnology

Nonetheless the proof-of-principle strongly supports fur-ther evaluation of automated visual evaluation Rather thanresurrecting obsolete film camera technology to achieve the ob-served results we are currently working to transfer automatedvisual evaluation to images from contemporary phone camerasand other digital image capture devices to create an accurateand affordable point-of-care screening method that would sup-port the recently announced World Health Organization initia-tive to accelerate cervical cancer control

Our findings extend and improve upon the results obtainedin more preliminary reports listed in the Supplementary Table 1(available online) Internationally a sizable number of commer-cial and academic research groups are now exploring deeplearning algorithms for cervical screening In collaborative workmoving ahead we will be considering the following topicsassessing the international variability in cervical appearance(eg inflammatory changes) and need for region-specific train-ing examining which camera characteristics affect the algo-rithm and which devices are adequate for image capturetraining the camera to take an image automatically when a fo-cused adequately lit and visualized cervix is visible training analgorithm by segmentation studies focused on the squamoco-lumnar junction to assess when ablational therapy is possibleor whether (mainly endocervical) extent of lesion suggests needfor excision and evaluating how best to combine automated

visual evaluation with HPV testing and eventually with vacci-nation to accelerate control of cervical cancer

Funding

This work was supported by the National Institutes ofHealth NCI and National Library of Medicine IntramuralResearch Programs and the Global Good Fund

Notes

Affiliations of authors Intellectual Ventures Global Good FundBellevue WA (LH DB MPH NG BW MSJ) National Library ofMedicine NIH Bethesda MD (SA ZX LRL) Division of CancerEpidemiology and Genetics National Cancer Institute NIHBethesda MD (KY MD JCG NW MS) Information ManagementServices Calverton MD (BB) Early Detection and PreventionSection International Agency for Research on Cancer LyonFrance (RH) Rutgers New Jersey Medical School Newark NJ(MHE) Albert Einstein College of Medicine Bronx NY (RDB)Consultant to National Cancer Institute NIH Bethesda MD(ACR)

The funding sources had no role in the design of the studyinterpretation of the data the collection analysis and interpre-tation of the data or the writing of the manuscript

No author reports a conflict of interest with regard to thisresearch

We acknowledge the expert advice of Drs Emily Wu andCorey Casper who helped define the clinical outcomes in theinitial stage of the project

References1 de Martel C Plummer M Vignat J Franceschi S Worldwide burden of cancer

attributable to HPV by site country and HPV type Int J Cancer 2017141(4)664ndash670

2 Bray F Jemal A Grey N Ferlay J Forman D Global cancer transitions accord-ing to the Human Development Index (2008-2030) a population-based studyLancet Oncol 201213(8)790ndash801

3 Schiffman M Doorbar J Wentzensen N et al Carcinogenic human papillo-mavirus infection Nat Rev Dis Primers 20162(2)16086

4 Bagcchi S India launches plan for national cancer screening programmeBMJ 2016355i5574

5 WHO Guidelines Approved by the Guidelines Review Committee WHOGuidelines for Screening and Treatment of Precancerous Lesions for Cervical CancerPrevention Geneva World Health Organization 2013

6 PEPFAR 2019 Country Operational Plan Guidance for all PEPFAR Countries(Draft) Renewed Partnership to Help End AIDS and Cervical Cancer in Africahttpswwwpepfargovdocumentsorganization288160pdfpage=318

7 Shastri SS Mittra I Mishra GA et al Effect of VIA screening by primary healthworkers randomized controlled study in Mumbai India J Natl Cancer Inst2014106(3)dju009

8 Denny L Kuhn L De Souza M Pollack AE Dupree W Wright TC Jr Screen-and-treat approaches for cervical cancer prevention in low-resource settingsa randomized controlled trial JAMA 2005294(17)2173ndash2181

9 Sankaranarayanan R Nene BM Shastri SS et al HPV screening for cervicalcancer in rural India N Engl J Med 2009360(14)1385ndash1394

10 Catarino R Schafer S Vassilakos P Petignat P Arbyn M Accuracy of combi-nations of visual inspection using acetic acid or lugol iodine to detect cervicalprecancer a meta-analysis BJOG 2017125(5)545ndash553

11 Jeronimo J Schiffman M Colposcopy at a crossroads Am J Obstet Gynecol2006195(2)349ndash353

12 Perkins MJ RB Smith KM Gage JCet al ASCCP Colposcopy Standards Risk-Based Colposcopy Practice J Low Genit Tract Dis 2017 Oct21(4)230ndash234 doi101097LGT0000000000000334 PubMed PMID 28953111

13 Ting DSW Cheung CY Lim G et al Development and validation of a deeplearning system for diabetic retinopathy and related eye diseases using reti-nal images from multiethnic populations with diabetes JAMA 2017318(22)2211ndash2223

14 Xu T Zhang H Xin C et al Multi-feature based benchmark for cervical dys-plasia classification evaluation Pattern Recogn 201763468ndash475

AR

TIC

LE

L Hu et al | 9

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

15 Song D Kim E Huang X et al Multimodal entity coreference for cervical dys-plasia diagnosis IEEE Trans Med Imaging 201534(1)229ndash245

16 Ren SQ He KM Girshick R Sun J Faster R-CNN towards real-time object de-tection with region proposal networks IEEE Trans Pattern Anal Mach Intell201739(6)1137ndash1149

17 Bratti MC Rodriguez AC Schiffman M et al Description of a seven-year pro-spective study of human papillomavirus infection and cervical neoplasiaamong 10000 women in Guanacaste Costa Rica Rev Panam Salud Publica200415(2)75ndash89

18 Herrero R Schiffman MH Bratti C et al Design and methods of apopulation-based natural history study of cervical neoplasia in a rural prov-ince of Costa Rica the Guanacaste Project Rev Panam Salud Publica 19971(5)362ndash375

19 Rodriguez AC Avila C Herrero R et al Cervical cancer incidence after screen-ing with HPV cytology and visual methods 18-year follow-up of theGuanacaste cohort Int J Cancer 2017140(8)1926ndash1934

20 Schneider DL Herrero R Bratti C et al Cervicography screening for cervicalcancer among 8460 women in a high-risk population Am J Obstet Gynecol1999180(2)290ndash298

21 Schneider DL Burke L Wright TC et al Can cervicography be improved Anevaluation with arbitrated cervicography interpretations Am J Obstet Gynecol2002187(1)15ndash23

22 Jeronimo J Long LR Neve L Michael B Antani S Schiffman M Digital toolsfor collecting data from cervigrams for research and training in colposcopy JLow Genit Tract Dis 200610(1)16ndash25

23 Hutchinson ML Zahniser DJ Sherman ME et al Utility of liquid-based cytol-ogy for cervical carcinoma screening results of a population-based studyconducted in a region of Costa Rica with a high incidence of cervical carci-noma Cancer 199987(2)48ndash55

24 Sherman ME Schiffman M Herrero R et al Performance of a semiautomatedPapanicolaou smear screening system results of a population-based studyconducted in Guanacaste Costa Rica Cancer 199884(5)273ndash280

25 Castle PE Schiffman M Gravitt PE et al Comparisons of HPV DNA detectionby MY0911 PCR methods J Med Virol 200268(3)417ndash423

26 Bouvard V Baan R Straif K et al A review of human carcinogensmdashPart B bio-logical agents Lancet Oncol 200910(4)321ndash322

27 He H Garcia EA Learning from imbalanced data IEEE Trans Knowl Data Eng200921(9)1263ndash1284

28 Yosinski J Clune J Bengio Y Lipson H How transferable are features indeep neural networks In Proceedings of the 27th InternationalConference on Neural Information Processing Systems Volume 2December 8ndash13 2014 Montreal Canada

29 Russakovsky O Deng J Su H et al ImageNet large scale visual recognitionchallenge Int J Comput Vis 2015115(3)211ndash252

30 Wong SC Gatt A Stamatescu V McDonnell MD Understanding data aug-mentation for classification when to warp In 2016 InternationalConference on Digital Image Computing Techniques and Applications(Dicta) November 30ndashDecember 2 201659ndash64 Gold Coast Australia

31 DeLong ER DeLong DM Clarke-Pearson DL Comparing the areas under twoor more correlated receiver operating characteristic curves a nonparametricapproach Biometrics 198844(3)837ndash845

32 Youden WJ Index for rating diagnostic tests Cancer 19503(1)32ndash3533 Verdoodt F Jentschke M Hillemanns P Racey CS Snijders PJ Arbyn M

Reaching women who do not participate in the regular cervical cancerscreening programme by offering self-sampling kits a systematic review andmeta-analysis of randomised trials Eur J Cancer (Oxf Engl 1990) 201551(16)2375ndash2385

34 Schiffman M Glass AG Wentzensen N et al A long-term prospective studyof type-specific human papillomavirus infection and risk of cervical neopla-sia among 20000 women in the Portland Kaiser Cohort Study CancerEpidemiol Biomarkers Prev 201120(7)1398ndash1409

35 Chen HC Schiffman M Lin CY et al Persistence of type-specific human pap-illomavirus infection and increased long-term risk of cervical cancer J NatlCancer Inst 2011103(18)1387ndash1396

AR

TIC

LE

10 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

  • djy225-TF1
  • djy225-TF2
Page 5: An Observational Study of Deep Learning and Automated ...The system architecture is summarized in Figure 2 and de-tailed in the Supplementary Methods (available online). The au-tomated

excluded from the validation of the algorithm To do so wereweighted specifically because the initial validation set was arandom sample of the training validation group we could mul-tiply the results for women in the validation set by the inverseof the sampling fraction (30) to represent all women used fortraining and validation We then added the remaining noncasesto reconstitute the full cohort

Cases that resulted in differences between automated visualevaluation algorithm results and original evaluator interpretationof the same cervigram images were rereviewed without maskingby an expert gynecologic oncologist and colposcopist (ME) to noteany subjective patterns that might explain discrepancies

Results

The cohort study population enrolled in 1993ndash1994 (Table 1) wasa random sample of adult women in Guanacaste Province rang-ing in age from 18 to 94 years (median 35 years) The provincewas mixed rural and small town Most women were married(761) and parity was high (923) Smoking was unusual (107were current smokers) but oral contraceptive use (present orpast) was common (777 had ever used oral contraceptives)The prevalence of carcinogenic HPV types was less than 20and declined sharply with age

There were 241 histologically confirmed cases of precancer(CIN2CIN3) and 38 cases of cancer observed among 9406women followed for up to 7 years in the population-basedGuanacaste cohort Glandular lesions were very uncommonand categorized with squamous lesions of equivalent severity(eg rare AIS was included with CIN3) The mean automated vi-sual evaluation algorithm result scores at enrollment wereequivalent for CIN2 vs CIN3 (070 and 069 Pfrac14 51) and statisti-cally nonsignificantly higher for the uncommon cancers (080Pfrac14 42) (Figure 3) In subsequent analyses we combined CIN2CIN3 and cancer in a single case group

The AUC for automated visual evaluation of the enrollmentcervical images for women of all ages was 091 (95 CIfrac14 089 to093) (Figure 4) Automated visual evaluation was statisticallysignificantly more accurate than cervigram result (AUCfrac14 06995 CIfrac14 063 to 074 Plt 001)

As shown in Figure 4 automated visual evaluation perfor-mance was also statistically significantly more accurate thanconventional Pap smears (AUCfrac14 071 95 CIfrac14 065 to 077Plt 001) liquid-based cytology (AUCfrac14 079 95 CIfrac14 073 to 084Plt 001) first-generation neural network-based cytology(AUCfrac14 070 95 CIfrac14 063 to 076 Plt 001) and HPV testing(AUCfrac14 082 95 CIfrac14 077 to 087 Plt 001)

We examined all 16 discrepant findings that were algorithmpositiveHPV negative The additional positives by automatedvisual evaluation tended to occur among younger women(medianfrac14 265 Pfrac14 06 by Wilcoxon test compared with the restof the cases in the cohort whose median age was 35 years)They were likely to be diagnosed with CIN2 (1216 [750] com-pared with 2061 [328] of other cases Pfrac14 008)

In anticipation of designing screening programs we consid-ered three distinct age ranges 18ndash24 years 25ndash49 years and 50thornyears with the intermediate age group of primary interest be-cause it coincides with the majority (130228) of CIN2-CIN3cases and higher sensitivity (127130 Plt 001 compared withyounger women) (Table 2) We estimated that a single auto-mated visual evaluation screening round targeting women atthe prime screening ages of 25ndash49 years could identify 557(127228) of precancers (CIN2CIN3AIS) diagnosed cumulatively

in the entire adult population To achieve this level of sensitiv-ity in the entire population by a single round of screeningwomen aged 25ndash49 years would require referring 110 (9828906) of the entire population (and 180 [9825460] of thoseaged 25ndash49 years) for treatment

Review of enrollment images from cases with discrepant hu-man vs automated visual evaluations (and a random tenth ofdiscrepant noncases) revealed that many of the automated vi-sual evaluation algorithm result-positivecervigram-negativecases of CIN2thorn (additional true positives) had suboptimalimages (eg poor focus or washed-out color during scanning offilm) or there were obstructing vaginal folds or blood No clearpatterns emerged from review of images with discrepant resultsamong women with ltCIN2

Table 1 Selected demographic features of the 9406 women in thisanalysis of the Guanacaste cohort and their enrollment screeningresults

Feature or test result No

Enrollment age y18ndash29 2598 (276)30ndash49 4357 (463)50ndash94 2451 (261)

Marital statusMarried 7157 (761)Separatedwidowed 508 (102)Single 1290 (137)

Age at first intercourse y16 3269 (348)17ndash18 2579 (274)19thorn 3550 (378)

Lifetime No of partners1 4934 (525)2ndash3 1947 (331)4thorn 1358 (144)

Lifetime No of pregnancies0 723 (77)1ndash2 2588 (275)3ndash4 2475 (263)5thorn 3620 (385)

Current smokerNo 8398 (893)Yes 1003 (107)

Ever use oral contraceptionYes 5677 (777)No 1634 (224)

Enrollment Pap resultNormal 7906 (841)ASC-US 4227 (84)LSIL 471 (50)High-grade 232 (25)

Enrollment cervicographyNegative 8176 (869)Atypical 868 (92)Low-grade 316 (34)High-gradecancer 46 (05)

Enrollment HPV resultNegative for carcinogenic types 6683 (711)Positive (not HPV16) 2314 (246)HPV16 393 (42)

Missing test results not shown but counted in totals such that percentages do

not sum to 100 ASC-US frac14 atypical squamous cells of undetermined signifi-

cance HPV frac14 human papillomavirus LSIL frac14 low-grade squamous intraepithelial

lesion

AR

TIC

LE

L Hu et al | 5

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

Some trends emerged based on analysis of the heat mapshighlighting regions of the image that most influence the algo-rithm prediction (Supplementary Figures 3ndash7 available online)The model usually based its prediction primarily on the regionsurrounding the os and performed best when the cervix wasoriented directly towards the camera It was more affected byregions with visible texture than by smooth regions In generalthe model was not excited by glare from the camerarsquos light orother distractors although in some cases the presence of cottonswabs or other noncervix objects did distract the model

Analysis of the long-term follow-up of the cohort includinglinkage to the Costa Rican Tumor Registry revealed 39 cancersMost were in the training set 17 were not and had enrollmentseverity scores (Table 3) Women with cancer missed by auto-mated visual evaluation tended to be diagnosed by the tumorregistry merge (mainly older women diagnosed after 7 years ofactive follow-up)

The automated visual evaluation detection algorithm couldtheoretically be used to triage women testing positive for HPVrather than for primary screening Training another automatedvisual evaluation model restricted to HPV-positive women (datanot shown) a sensitivity of 100 (1313) was achieved with aspecificity of 575 (103179) Therefore primary HPV testing us-ing self-sampling if it matched the HPV test performance of theearly assays we used in 1993ndash1994 followed by the automatedvisual evaluation algorithm restricted to positives could achievethe same aggregate sensitivity while more than halving thenumber of women requiring cervical examinations

Discussion

Using a deep learning approach called Faster R-CNN with exten-sive image augmentation based on a pretrained model wetrained and validated an image analyzer that performsldquoautomated visual evaluationrdquo of the cervix As a primaryscreening method the algorithm trained using digitized cervi-grams from the Guanacaste Cohort achieved excellent sensitiv-ity for detection of CIN2thorn in the age group at highest risk ofprecancers The performance surpassed colposcopist evaluatorinterpretations of the same images (cervicography) and com-pared favorably to conventional Pap smears (and alternativekinds of cytology) while matching the screening accuracy of anearly version of PCR-based HPV testing The additional positiveevaluations among women that were negative for HPV need fur-ther study before they can be believed logically it is hard to ac-cept and there was an indication that the cases were anunusually young group with incident CIN2 suggesting somepossibility of misclassification of both visual appearance andoutcome

Restricted to the age group at which risk of precancer peaksto achieve nearly perfect sensitivity for cases occurring up to 7years after examination generated a large number of false posi-tives among screened noncases More balanced cutpoints forpositivity might be chosen to limit excessive treatments al-though sensitivity would drop Another possibility for improv-ing specificity while retaining high sensitivity might becombination screening (commonly called ldquocotestingrdquo) with

recnaC3NIC2NIC

0

025

050

075

100

Cas

e pr

obab

ility

Figure 3 Comparison of the mean scores obtained by comparing enrollment images from women whose worst diagnosis within the cohort was cervical intraepithelial

neoplasia (CIN) 2 CIN3 or cancer These subsets of the cases were not statistically significantly different with regard to severity scores generated by the algorithm

Therefore we combined all into a single case group We show a box and whisker plot of 32 CIN2 38 CIN3 and seven cancers giving quartiles means and outliers of

case probability for each diagnosis within the cohort

AR

TIC

LE

6 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

automated visual evaluation and HPV testing when such testsbecome more affordable and more widely available than theycurrently are

To permit widespread use of automated visual evaluationdetection algorithms the method would need to be transferredfrom digitized cervigrams to contemporary digital cameras be-cause cervicography is obsolete and discontinued If the transferis achieved health workers would still need to visualize the cer-vix and take well-lit in-focus images for analysis with the

cervix in full view The minimal required equipment (in addi-tion to the algorithm software) would be acetic acid (vinegar)disposable specula (or sterilization equipment) and the imagingsystem such as a dedicated smart phone or digital camera

Images that resulted in discrepancies between the auto-mated visual evaluation algorithm and clinician interpretationrevealed that the performance was affected by image qualityand obstructions as well as traditional discordant evaluationsdue to human subjectivity So rather than focusing on training

Figure 4 Receiver operating characteristic (ROC) curve of automated visual evaluation of cervical images and comparison of performance in identification of cervical

intraepithelial neoplasia 2thorn ROC-like curves are shown for the categorical variables for simple visual and statistical comparison with automated visual evaluation

(two-sided chi-squared tests) The thresholds are listed on each curve showing the sensitivity and 1-specificity applicable to that cutpoint Automated visual evalua-

tion was as accurate or more than all of the screening tests used in the cohort study including A) automated visual evaluation B) cervicography area under the curve

(AUC) C) conventional Pap smear D) liquid-based cytology E) first-generation neural network-based cytology and F) MY09-MY11 PCR-based human papillomavirus

(HPV) testing ASC-US frac14 atypical squamous cells of undetermined significance HSIL frac14 high-grade squamous intraepithelial lesion LSIL frac14 low-grade squamous intrae-

pithelial lesion

AR

TIC

LE

L Hu et al | 7

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

Table 2 Estimated comparison of automated visual evaluation performance by age groups

Automated visual evaluation by age CIN2thorn No ltCIN2 No Total No Age-specific sensitivity Age-specific specificity

lt25 ythorn 46 226 272 821 772 10 765 775Age-specific total 56 991 1047

25ndash49 ythorn 127 855 982 977 840 3 4475 4478Age-specific total 130 5330 5460

50thorn ythorn 39 399 438 929 832 3 1969 1972Age-specific total 42 2368 2410

Total 228 8689 8917 mdash mdash

Table 3 Severity scores from automatic visual evaluation algorithm and screening results from enrollment visit of the Guanacaste cohortstudy for cases of invasive cancer

Years to diagnosis Enrollment age y HPV type result Pap smear result Cervigram result Algorithm severity score

Enrollment 21 16 52 Cancer Cancer TrainingEnrollment 26 16 18 51 Normal High-grade TrainingEnrollment 29 18 31 HSIL (CIN2) High-grade TrainingEnrollment 34 16 Microinvasive High-grade TrainingEnrollment 35 31 45 Microinvasive Cancer 098Enrollment 38 16 HSIL (CIN2) Cancer TrainingEnrollment 41 16 ASC-US Cancer 078Enrollment 47 18 Microinvasive Cancer TrainingEnrollment 54 16 Microinvasive Ccancer TrainingEnrollment 71 53 82v Microinvasive Negative 013Enrollment 73 33 Microinvasive Cancer 096Enrollment 74 35 HSIL (CIN3) Negative 035Enrollment 42 Equivocal Microinvasive Cancer 0841 y 37 18 Normal Negative Training1 y 61 Negative Normal Negative Training2 y 23 16 ASC-US Low-grade 0872 y 64 45 51 58 Normal Negative 0303 y 49 16 ASC-US Negative Training5 y 29 18 Normal Negative Training5 y 36 16 Normal Negative 0716 y 38 Negative HSIL (CIN2) Negative Training6 y 45 31 Normal Missing Missing6 y 48 Negative Normal Negative Training6 y 66 56 Microinvasive Negative 0387 y 50 Negative Normal Negative Training7 y 63 Negative Normal Negative Training8 y 35 Negative Normal Negative 0028 y 52 16 Normal Negative 0708 y 63 16 Normal Negative Training8 y 81 Negative Normal Negative 0349 y 28 18 Normal Negative Training10 y 65 16 HSIL (CIN3) Negative Training10 y 75 85 Microinvasive Negative Training11 y 75 Negative Normal Negative 01612 y 68 Negative Normal Negative 00213 y 56 Negative Normal Negative 07515 y 78 Missing Missing Atypical Training16 y 52 Negative Normal Negative Training17 y 21 Negative Inadequate Atypical 004

ASC-US frac14 atypical squamous cells of undetermined significance CIN2frac14 cervical intraepithelial neoplasia 2 CIN3frac14 cervical intraepithelial neoplasia 3 HSIL frac14 high-

grade squamous intraepithelial lesion

AR

TIC

LE

8 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

for the subtleties of visual appearance which is a difficult skillthe training for automated visual evaluation could highlightmore easily acquired skills of improving image quality andlighting and removing obstructions

We also anticipated a second approach to using the auto-mated visual evaluation detection algorithm specifically as atriage method restricted to women testing HPV positive if HPVtesting is the primary screen HPV testing offers the possibilityof cervicovaginal self-sampling (33) with only a small minorityof women testing HPV positive and requiring gynecologicexaminations Moreover HPV testing has proven very long-term negative predictive value that is reassurance when nega-tive (3435) The combination of self-sampled HPV screeningand triage use of automated visual evaluation detection couldgreatly reduce the need for speculum examinations Choice ofthe automated visual evaluation algorithm screening strategy(general screening vs triage of HPV positive women) will dependon setting and cost-effectiveness analyses Currently HPV testsstill take several hours and cost too much for many settings butlow-cost point-of-care and batch testing methods applicable toself-sampling are in advanced research and development andlikely will be available within a very few years

It is a strength of this study that it was conducted in a ran-dom population sample of a previously poorly screened popula-tion that resembles HPV-positive women in other low- andmiddle-income country (LMIC) target screening populationsDefinition of precancer was very good based on repeatedscreening during extended follow-up and panel review ofhistopathology

The major limitations are the following We studied a rela-tively small numbers of cases from a single cohort study Weincluded CIN2 cases in the case group whereas it would beideal to train on more definite cases of precancer (ie CIN3and AIS caused by types of HPV prevalent in invasive cervicalcancers) The images were captured by a small team of highlytrained nurses and not a wide variety of health workers Thework relied on images taken with a discontinued film cameratechnique rather than contemporary digital imagetechnology

Nonetheless the proof-of-principle strongly supports fur-ther evaluation of automated visual evaluation Rather thanresurrecting obsolete film camera technology to achieve the ob-served results we are currently working to transfer automatedvisual evaluation to images from contemporary phone camerasand other digital image capture devices to create an accurateand affordable point-of-care screening method that would sup-port the recently announced World Health Organization initia-tive to accelerate cervical cancer control

Our findings extend and improve upon the results obtainedin more preliminary reports listed in the Supplementary Table 1(available online) Internationally a sizable number of commer-cial and academic research groups are now exploring deeplearning algorithms for cervical screening In collaborative workmoving ahead we will be considering the following topicsassessing the international variability in cervical appearance(eg inflammatory changes) and need for region-specific train-ing examining which camera characteristics affect the algo-rithm and which devices are adequate for image capturetraining the camera to take an image automatically when a fo-cused adequately lit and visualized cervix is visible training analgorithm by segmentation studies focused on the squamoco-lumnar junction to assess when ablational therapy is possibleor whether (mainly endocervical) extent of lesion suggests needfor excision and evaluating how best to combine automated

visual evaluation with HPV testing and eventually with vacci-nation to accelerate control of cervical cancer

Funding

This work was supported by the National Institutes ofHealth NCI and National Library of Medicine IntramuralResearch Programs and the Global Good Fund

Notes

Affiliations of authors Intellectual Ventures Global Good FundBellevue WA (LH DB MPH NG BW MSJ) National Library ofMedicine NIH Bethesda MD (SA ZX LRL) Division of CancerEpidemiology and Genetics National Cancer Institute NIHBethesda MD (KY MD JCG NW MS) Information ManagementServices Calverton MD (BB) Early Detection and PreventionSection International Agency for Research on Cancer LyonFrance (RH) Rutgers New Jersey Medical School Newark NJ(MHE) Albert Einstein College of Medicine Bronx NY (RDB)Consultant to National Cancer Institute NIH Bethesda MD(ACR)

The funding sources had no role in the design of the studyinterpretation of the data the collection analysis and interpre-tation of the data or the writing of the manuscript

No author reports a conflict of interest with regard to thisresearch

We acknowledge the expert advice of Drs Emily Wu andCorey Casper who helped define the clinical outcomes in theinitial stage of the project

References1 de Martel C Plummer M Vignat J Franceschi S Worldwide burden of cancer

attributable to HPV by site country and HPV type Int J Cancer 2017141(4)664ndash670

2 Bray F Jemal A Grey N Ferlay J Forman D Global cancer transitions accord-ing to the Human Development Index (2008-2030) a population-based studyLancet Oncol 201213(8)790ndash801

3 Schiffman M Doorbar J Wentzensen N et al Carcinogenic human papillo-mavirus infection Nat Rev Dis Primers 20162(2)16086

4 Bagcchi S India launches plan for national cancer screening programmeBMJ 2016355i5574

5 WHO Guidelines Approved by the Guidelines Review Committee WHOGuidelines for Screening and Treatment of Precancerous Lesions for Cervical CancerPrevention Geneva World Health Organization 2013

6 PEPFAR 2019 Country Operational Plan Guidance for all PEPFAR Countries(Draft) Renewed Partnership to Help End AIDS and Cervical Cancer in Africahttpswwwpepfargovdocumentsorganization288160pdfpage=318

7 Shastri SS Mittra I Mishra GA et al Effect of VIA screening by primary healthworkers randomized controlled study in Mumbai India J Natl Cancer Inst2014106(3)dju009

8 Denny L Kuhn L De Souza M Pollack AE Dupree W Wright TC Jr Screen-and-treat approaches for cervical cancer prevention in low-resource settingsa randomized controlled trial JAMA 2005294(17)2173ndash2181

9 Sankaranarayanan R Nene BM Shastri SS et al HPV screening for cervicalcancer in rural India N Engl J Med 2009360(14)1385ndash1394

10 Catarino R Schafer S Vassilakos P Petignat P Arbyn M Accuracy of combi-nations of visual inspection using acetic acid or lugol iodine to detect cervicalprecancer a meta-analysis BJOG 2017125(5)545ndash553

11 Jeronimo J Schiffman M Colposcopy at a crossroads Am J Obstet Gynecol2006195(2)349ndash353

12 Perkins MJ RB Smith KM Gage JCet al ASCCP Colposcopy Standards Risk-Based Colposcopy Practice J Low Genit Tract Dis 2017 Oct21(4)230ndash234 doi101097LGT0000000000000334 PubMed PMID 28953111

13 Ting DSW Cheung CY Lim G et al Development and validation of a deeplearning system for diabetic retinopathy and related eye diseases using reti-nal images from multiethnic populations with diabetes JAMA 2017318(22)2211ndash2223

14 Xu T Zhang H Xin C et al Multi-feature based benchmark for cervical dys-plasia classification evaluation Pattern Recogn 201763468ndash475

AR

TIC

LE

L Hu et al | 9

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

15 Song D Kim E Huang X et al Multimodal entity coreference for cervical dys-plasia diagnosis IEEE Trans Med Imaging 201534(1)229ndash245

16 Ren SQ He KM Girshick R Sun J Faster R-CNN towards real-time object de-tection with region proposal networks IEEE Trans Pattern Anal Mach Intell201739(6)1137ndash1149

17 Bratti MC Rodriguez AC Schiffman M et al Description of a seven-year pro-spective study of human papillomavirus infection and cervical neoplasiaamong 10000 women in Guanacaste Costa Rica Rev Panam Salud Publica200415(2)75ndash89

18 Herrero R Schiffman MH Bratti C et al Design and methods of apopulation-based natural history study of cervical neoplasia in a rural prov-ince of Costa Rica the Guanacaste Project Rev Panam Salud Publica 19971(5)362ndash375

19 Rodriguez AC Avila C Herrero R et al Cervical cancer incidence after screen-ing with HPV cytology and visual methods 18-year follow-up of theGuanacaste cohort Int J Cancer 2017140(8)1926ndash1934

20 Schneider DL Herrero R Bratti C et al Cervicography screening for cervicalcancer among 8460 women in a high-risk population Am J Obstet Gynecol1999180(2)290ndash298

21 Schneider DL Burke L Wright TC et al Can cervicography be improved Anevaluation with arbitrated cervicography interpretations Am J Obstet Gynecol2002187(1)15ndash23

22 Jeronimo J Long LR Neve L Michael B Antani S Schiffman M Digital toolsfor collecting data from cervigrams for research and training in colposcopy JLow Genit Tract Dis 200610(1)16ndash25

23 Hutchinson ML Zahniser DJ Sherman ME et al Utility of liquid-based cytol-ogy for cervical carcinoma screening results of a population-based studyconducted in a region of Costa Rica with a high incidence of cervical carci-noma Cancer 199987(2)48ndash55

24 Sherman ME Schiffman M Herrero R et al Performance of a semiautomatedPapanicolaou smear screening system results of a population-based studyconducted in Guanacaste Costa Rica Cancer 199884(5)273ndash280

25 Castle PE Schiffman M Gravitt PE et al Comparisons of HPV DNA detectionby MY0911 PCR methods J Med Virol 200268(3)417ndash423

26 Bouvard V Baan R Straif K et al A review of human carcinogensmdashPart B bio-logical agents Lancet Oncol 200910(4)321ndash322

27 He H Garcia EA Learning from imbalanced data IEEE Trans Knowl Data Eng200921(9)1263ndash1284

28 Yosinski J Clune J Bengio Y Lipson H How transferable are features indeep neural networks In Proceedings of the 27th InternationalConference on Neural Information Processing Systems Volume 2December 8ndash13 2014 Montreal Canada

29 Russakovsky O Deng J Su H et al ImageNet large scale visual recognitionchallenge Int J Comput Vis 2015115(3)211ndash252

30 Wong SC Gatt A Stamatescu V McDonnell MD Understanding data aug-mentation for classification when to warp In 2016 InternationalConference on Digital Image Computing Techniques and Applications(Dicta) November 30ndashDecember 2 201659ndash64 Gold Coast Australia

31 DeLong ER DeLong DM Clarke-Pearson DL Comparing the areas under twoor more correlated receiver operating characteristic curves a nonparametricapproach Biometrics 198844(3)837ndash845

32 Youden WJ Index for rating diagnostic tests Cancer 19503(1)32ndash3533 Verdoodt F Jentschke M Hillemanns P Racey CS Snijders PJ Arbyn M

Reaching women who do not participate in the regular cervical cancerscreening programme by offering self-sampling kits a systematic review andmeta-analysis of randomised trials Eur J Cancer (Oxf Engl 1990) 201551(16)2375ndash2385

34 Schiffman M Glass AG Wentzensen N et al A long-term prospective studyof type-specific human papillomavirus infection and risk of cervical neopla-sia among 20000 women in the Portland Kaiser Cohort Study CancerEpidemiol Biomarkers Prev 201120(7)1398ndash1409

35 Chen HC Schiffman M Lin CY et al Persistence of type-specific human pap-illomavirus infection and increased long-term risk of cervical cancer J NatlCancer Inst 2011103(18)1387ndash1396

AR

TIC

LE

10 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

  • djy225-TF1
  • djy225-TF2
Page 6: An Observational Study of Deep Learning and Automated ...The system architecture is summarized in Figure 2 and de-tailed in the Supplementary Methods (available online). The au-tomated

Some trends emerged based on analysis of the heat mapshighlighting regions of the image that most influence the algo-rithm prediction (Supplementary Figures 3ndash7 available online)The model usually based its prediction primarily on the regionsurrounding the os and performed best when the cervix wasoriented directly towards the camera It was more affected byregions with visible texture than by smooth regions In generalthe model was not excited by glare from the camerarsquos light orother distractors although in some cases the presence of cottonswabs or other noncervix objects did distract the model

Analysis of the long-term follow-up of the cohort includinglinkage to the Costa Rican Tumor Registry revealed 39 cancersMost were in the training set 17 were not and had enrollmentseverity scores (Table 3) Women with cancer missed by auto-mated visual evaluation tended to be diagnosed by the tumorregistry merge (mainly older women diagnosed after 7 years ofactive follow-up)

The automated visual evaluation detection algorithm couldtheoretically be used to triage women testing positive for HPVrather than for primary screening Training another automatedvisual evaluation model restricted to HPV-positive women (datanot shown) a sensitivity of 100 (1313) was achieved with aspecificity of 575 (103179) Therefore primary HPV testing us-ing self-sampling if it matched the HPV test performance of theearly assays we used in 1993ndash1994 followed by the automatedvisual evaluation algorithm restricted to positives could achievethe same aggregate sensitivity while more than halving thenumber of women requiring cervical examinations

Discussion

Using a deep learning approach called Faster R-CNN with exten-sive image augmentation based on a pretrained model wetrained and validated an image analyzer that performsldquoautomated visual evaluationrdquo of the cervix As a primaryscreening method the algorithm trained using digitized cervi-grams from the Guanacaste Cohort achieved excellent sensitiv-ity for detection of CIN2thorn in the age group at highest risk ofprecancers The performance surpassed colposcopist evaluatorinterpretations of the same images (cervicography) and com-pared favorably to conventional Pap smears (and alternativekinds of cytology) while matching the screening accuracy of anearly version of PCR-based HPV testing The additional positiveevaluations among women that were negative for HPV need fur-ther study before they can be believed logically it is hard to ac-cept and there was an indication that the cases were anunusually young group with incident CIN2 suggesting somepossibility of misclassification of both visual appearance andoutcome

Restricted to the age group at which risk of precancer peaksto achieve nearly perfect sensitivity for cases occurring up to 7years after examination generated a large number of false posi-tives among screened noncases More balanced cutpoints forpositivity might be chosen to limit excessive treatments al-though sensitivity would drop Another possibility for improv-ing specificity while retaining high sensitivity might becombination screening (commonly called ldquocotestingrdquo) with

recnaC3NIC2NIC

0

025

050

075

100

Cas

e pr

obab

ility

Figure 3 Comparison of the mean scores obtained by comparing enrollment images from women whose worst diagnosis within the cohort was cervical intraepithelial

neoplasia (CIN) 2 CIN3 or cancer These subsets of the cases were not statistically significantly different with regard to severity scores generated by the algorithm

Therefore we combined all into a single case group We show a box and whisker plot of 32 CIN2 38 CIN3 and seven cancers giving quartiles means and outliers of

case probability for each diagnosis within the cohort

AR

TIC

LE

6 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

automated visual evaluation and HPV testing when such testsbecome more affordable and more widely available than theycurrently are

To permit widespread use of automated visual evaluationdetection algorithms the method would need to be transferredfrom digitized cervigrams to contemporary digital cameras be-cause cervicography is obsolete and discontinued If the transferis achieved health workers would still need to visualize the cer-vix and take well-lit in-focus images for analysis with the

cervix in full view The minimal required equipment (in addi-tion to the algorithm software) would be acetic acid (vinegar)disposable specula (or sterilization equipment) and the imagingsystem such as a dedicated smart phone or digital camera

Images that resulted in discrepancies between the auto-mated visual evaluation algorithm and clinician interpretationrevealed that the performance was affected by image qualityand obstructions as well as traditional discordant evaluationsdue to human subjectivity So rather than focusing on training

Figure 4 Receiver operating characteristic (ROC) curve of automated visual evaluation of cervical images and comparison of performance in identification of cervical

intraepithelial neoplasia 2thorn ROC-like curves are shown for the categorical variables for simple visual and statistical comparison with automated visual evaluation

(two-sided chi-squared tests) The thresholds are listed on each curve showing the sensitivity and 1-specificity applicable to that cutpoint Automated visual evalua-

tion was as accurate or more than all of the screening tests used in the cohort study including A) automated visual evaluation B) cervicography area under the curve

(AUC) C) conventional Pap smear D) liquid-based cytology E) first-generation neural network-based cytology and F) MY09-MY11 PCR-based human papillomavirus

(HPV) testing ASC-US frac14 atypical squamous cells of undetermined significance HSIL frac14 high-grade squamous intraepithelial lesion LSIL frac14 low-grade squamous intrae-

pithelial lesion

AR

TIC

LE

L Hu et al | 7

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

Table 2 Estimated comparison of automated visual evaluation performance by age groups

Automated visual evaluation by age CIN2thorn No ltCIN2 No Total No Age-specific sensitivity Age-specific specificity

lt25 ythorn 46 226 272 821 772 10 765 775Age-specific total 56 991 1047

25ndash49 ythorn 127 855 982 977 840 3 4475 4478Age-specific total 130 5330 5460

50thorn ythorn 39 399 438 929 832 3 1969 1972Age-specific total 42 2368 2410

Total 228 8689 8917 mdash mdash

Table 3 Severity scores from automatic visual evaluation algorithm and screening results from enrollment visit of the Guanacaste cohortstudy for cases of invasive cancer

Years to diagnosis Enrollment age y HPV type result Pap smear result Cervigram result Algorithm severity score

Enrollment 21 16 52 Cancer Cancer TrainingEnrollment 26 16 18 51 Normal High-grade TrainingEnrollment 29 18 31 HSIL (CIN2) High-grade TrainingEnrollment 34 16 Microinvasive High-grade TrainingEnrollment 35 31 45 Microinvasive Cancer 098Enrollment 38 16 HSIL (CIN2) Cancer TrainingEnrollment 41 16 ASC-US Cancer 078Enrollment 47 18 Microinvasive Cancer TrainingEnrollment 54 16 Microinvasive Ccancer TrainingEnrollment 71 53 82v Microinvasive Negative 013Enrollment 73 33 Microinvasive Cancer 096Enrollment 74 35 HSIL (CIN3) Negative 035Enrollment 42 Equivocal Microinvasive Cancer 0841 y 37 18 Normal Negative Training1 y 61 Negative Normal Negative Training2 y 23 16 ASC-US Low-grade 0872 y 64 45 51 58 Normal Negative 0303 y 49 16 ASC-US Negative Training5 y 29 18 Normal Negative Training5 y 36 16 Normal Negative 0716 y 38 Negative HSIL (CIN2) Negative Training6 y 45 31 Normal Missing Missing6 y 48 Negative Normal Negative Training6 y 66 56 Microinvasive Negative 0387 y 50 Negative Normal Negative Training7 y 63 Negative Normal Negative Training8 y 35 Negative Normal Negative 0028 y 52 16 Normal Negative 0708 y 63 16 Normal Negative Training8 y 81 Negative Normal Negative 0349 y 28 18 Normal Negative Training10 y 65 16 HSIL (CIN3) Negative Training10 y 75 85 Microinvasive Negative Training11 y 75 Negative Normal Negative 01612 y 68 Negative Normal Negative 00213 y 56 Negative Normal Negative 07515 y 78 Missing Missing Atypical Training16 y 52 Negative Normal Negative Training17 y 21 Negative Inadequate Atypical 004

ASC-US frac14 atypical squamous cells of undetermined significance CIN2frac14 cervical intraepithelial neoplasia 2 CIN3frac14 cervical intraepithelial neoplasia 3 HSIL frac14 high-

grade squamous intraepithelial lesion

AR

TIC

LE

8 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

for the subtleties of visual appearance which is a difficult skillthe training for automated visual evaluation could highlightmore easily acquired skills of improving image quality andlighting and removing obstructions

We also anticipated a second approach to using the auto-mated visual evaluation detection algorithm specifically as atriage method restricted to women testing HPV positive if HPVtesting is the primary screen HPV testing offers the possibilityof cervicovaginal self-sampling (33) with only a small minorityof women testing HPV positive and requiring gynecologicexaminations Moreover HPV testing has proven very long-term negative predictive value that is reassurance when nega-tive (3435) The combination of self-sampled HPV screeningand triage use of automated visual evaluation detection couldgreatly reduce the need for speculum examinations Choice ofthe automated visual evaluation algorithm screening strategy(general screening vs triage of HPV positive women) will dependon setting and cost-effectiveness analyses Currently HPV testsstill take several hours and cost too much for many settings butlow-cost point-of-care and batch testing methods applicable toself-sampling are in advanced research and development andlikely will be available within a very few years

It is a strength of this study that it was conducted in a ran-dom population sample of a previously poorly screened popula-tion that resembles HPV-positive women in other low- andmiddle-income country (LMIC) target screening populationsDefinition of precancer was very good based on repeatedscreening during extended follow-up and panel review ofhistopathology

The major limitations are the following We studied a rela-tively small numbers of cases from a single cohort study Weincluded CIN2 cases in the case group whereas it would beideal to train on more definite cases of precancer (ie CIN3and AIS caused by types of HPV prevalent in invasive cervicalcancers) The images were captured by a small team of highlytrained nurses and not a wide variety of health workers Thework relied on images taken with a discontinued film cameratechnique rather than contemporary digital imagetechnology

Nonetheless the proof-of-principle strongly supports fur-ther evaluation of automated visual evaluation Rather thanresurrecting obsolete film camera technology to achieve the ob-served results we are currently working to transfer automatedvisual evaluation to images from contemporary phone camerasand other digital image capture devices to create an accurateand affordable point-of-care screening method that would sup-port the recently announced World Health Organization initia-tive to accelerate cervical cancer control

Our findings extend and improve upon the results obtainedin more preliminary reports listed in the Supplementary Table 1(available online) Internationally a sizable number of commer-cial and academic research groups are now exploring deeplearning algorithms for cervical screening In collaborative workmoving ahead we will be considering the following topicsassessing the international variability in cervical appearance(eg inflammatory changes) and need for region-specific train-ing examining which camera characteristics affect the algo-rithm and which devices are adequate for image capturetraining the camera to take an image automatically when a fo-cused adequately lit and visualized cervix is visible training analgorithm by segmentation studies focused on the squamoco-lumnar junction to assess when ablational therapy is possibleor whether (mainly endocervical) extent of lesion suggests needfor excision and evaluating how best to combine automated

visual evaluation with HPV testing and eventually with vacci-nation to accelerate control of cervical cancer

Funding

This work was supported by the National Institutes ofHealth NCI and National Library of Medicine IntramuralResearch Programs and the Global Good Fund

Notes

Affiliations of authors Intellectual Ventures Global Good FundBellevue WA (LH DB MPH NG BW MSJ) National Library ofMedicine NIH Bethesda MD (SA ZX LRL) Division of CancerEpidemiology and Genetics National Cancer Institute NIHBethesda MD (KY MD JCG NW MS) Information ManagementServices Calverton MD (BB) Early Detection and PreventionSection International Agency for Research on Cancer LyonFrance (RH) Rutgers New Jersey Medical School Newark NJ(MHE) Albert Einstein College of Medicine Bronx NY (RDB)Consultant to National Cancer Institute NIH Bethesda MD(ACR)

The funding sources had no role in the design of the studyinterpretation of the data the collection analysis and interpre-tation of the data or the writing of the manuscript

No author reports a conflict of interest with regard to thisresearch

We acknowledge the expert advice of Drs Emily Wu andCorey Casper who helped define the clinical outcomes in theinitial stage of the project

References1 de Martel C Plummer M Vignat J Franceschi S Worldwide burden of cancer

attributable to HPV by site country and HPV type Int J Cancer 2017141(4)664ndash670

2 Bray F Jemal A Grey N Ferlay J Forman D Global cancer transitions accord-ing to the Human Development Index (2008-2030) a population-based studyLancet Oncol 201213(8)790ndash801

3 Schiffman M Doorbar J Wentzensen N et al Carcinogenic human papillo-mavirus infection Nat Rev Dis Primers 20162(2)16086

4 Bagcchi S India launches plan for national cancer screening programmeBMJ 2016355i5574

5 WHO Guidelines Approved by the Guidelines Review Committee WHOGuidelines for Screening and Treatment of Precancerous Lesions for Cervical CancerPrevention Geneva World Health Organization 2013

6 PEPFAR 2019 Country Operational Plan Guidance for all PEPFAR Countries(Draft) Renewed Partnership to Help End AIDS and Cervical Cancer in Africahttpswwwpepfargovdocumentsorganization288160pdfpage=318

7 Shastri SS Mittra I Mishra GA et al Effect of VIA screening by primary healthworkers randomized controlled study in Mumbai India J Natl Cancer Inst2014106(3)dju009

8 Denny L Kuhn L De Souza M Pollack AE Dupree W Wright TC Jr Screen-and-treat approaches for cervical cancer prevention in low-resource settingsa randomized controlled trial JAMA 2005294(17)2173ndash2181

9 Sankaranarayanan R Nene BM Shastri SS et al HPV screening for cervicalcancer in rural India N Engl J Med 2009360(14)1385ndash1394

10 Catarino R Schafer S Vassilakos P Petignat P Arbyn M Accuracy of combi-nations of visual inspection using acetic acid or lugol iodine to detect cervicalprecancer a meta-analysis BJOG 2017125(5)545ndash553

11 Jeronimo J Schiffman M Colposcopy at a crossroads Am J Obstet Gynecol2006195(2)349ndash353

12 Perkins MJ RB Smith KM Gage JCet al ASCCP Colposcopy Standards Risk-Based Colposcopy Practice J Low Genit Tract Dis 2017 Oct21(4)230ndash234 doi101097LGT0000000000000334 PubMed PMID 28953111

13 Ting DSW Cheung CY Lim G et al Development and validation of a deeplearning system for diabetic retinopathy and related eye diseases using reti-nal images from multiethnic populations with diabetes JAMA 2017318(22)2211ndash2223

14 Xu T Zhang H Xin C et al Multi-feature based benchmark for cervical dys-plasia classification evaluation Pattern Recogn 201763468ndash475

AR

TIC

LE

L Hu et al | 9

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

15 Song D Kim E Huang X et al Multimodal entity coreference for cervical dys-plasia diagnosis IEEE Trans Med Imaging 201534(1)229ndash245

16 Ren SQ He KM Girshick R Sun J Faster R-CNN towards real-time object de-tection with region proposal networks IEEE Trans Pattern Anal Mach Intell201739(6)1137ndash1149

17 Bratti MC Rodriguez AC Schiffman M et al Description of a seven-year pro-spective study of human papillomavirus infection and cervical neoplasiaamong 10000 women in Guanacaste Costa Rica Rev Panam Salud Publica200415(2)75ndash89

18 Herrero R Schiffman MH Bratti C et al Design and methods of apopulation-based natural history study of cervical neoplasia in a rural prov-ince of Costa Rica the Guanacaste Project Rev Panam Salud Publica 19971(5)362ndash375

19 Rodriguez AC Avila C Herrero R et al Cervical cancer incidence after screen-ing with HPV cytology and visual methods 18-year follow-up of theGuanacaste cohort Int J Cancer 2017140(8)1926ndash1934

20 Schneider DL Herrero R Bratti C et al Cervicography screening for cervicalcancer among 8460 women in a high-risk population Am J Obstet Gynecol1999180(2)290ndash298

21 Schneider DL Burke L Wright TC et al Can cervicography be improved Anevaluation with arbitrated cervicography interpretations Am J Obstet Gynecol2002187(1)15ndash23

22 Jeronimo J Long LR Neve L Michael B Antani S Schiffman M Digital toolsfor collecting data from cervigrams for research and training in colposcopy JLow Genit Tract Dis 200610(1)16ndash25

23 Hutchinson ML Zahniser DJ Sherman ME et al Utility of liquid-based cytol-ogy for cervical carcinoma screening results of a population-based studyconducted in a region of Costa Rica with a high incidence of cervical carci-noma Cancer 199987(2)48ndash55

24 Sherman ME Schiffman M Herrero R et al Performance of a semiautomatedPapanicolaou smear screening system results of a population-based studyconducted in Guanacaste Costa Rica Cancer 199884(5)273ndash280

25 Castle PE Schiffman M Gravitt PE et al Comparisons of HPV DNA detectionby MY0911 PCR methods J Med Virol 200268(3)417ndash423

26 Bouvard V Baan R Straif K et al A review of human carcinogensmdashPart B bio-logical agents Lancet Oncol 200910(4)321ndash322

27 He H Garcia EA Learning from imbalanced data IEEE Trans Knowl Data Eng200921(9)1263ndash1284

28 Yosinski J Clune J Bengio Y Lipson H How transferable are features indeep neural networks In Proceedings of the 27th InternationalConference on Neural Information Processing Systems Volume 2December 8ndash13 2014 Montreal Canada

29 Russakovsky O Deng J Su H et al ImageNet large scale visual recognitionchallenge Int J Comput Vis 2015115(3)211ndash252

30 Wong SC Gatt A Stamatescu V McDonnell MD Understanding data aug-mentation for classification when to warp In 2016 InternationalConference on Digital Image Computing Techniques and Applications(Dicta) November 30ndashDecember 2 201659ndash64 Gold Coast Australia

31 DeLong ER DeLong DM Clarke-Pearson DL Comparing the areas under twoor more correlated receiver operating characteristic curves a nonparametricapproach Biometrics 198844(3)837ndash845

32 Youden WJ Index for rating diagnostic tests Cancer 19503(1)32ndash3533 Verdoodt F Jentschke M Hillemanns P Racey CS Snijders PJ Arbyn M

Reaching women who do not participate in the regular cervical cancerscreening programme by offering self-sampling kits a systematic review andmeta-analysis of randomised trials Eur J Cancer (Oxf Engl 1990) 201551(16)2375ndash2385

34 Schiffman M Glass AG Wentzensen N et al A long-term prospective studyof type-specific human papillomavirus infection and risk of cervical neopla-sia among 20000 women in the Portland Kaiser Cohort Study CancerEpidemiol Biomarkers Prev 201120(7)1398ndash1409

35 Chen HC Schiffman M Lin CY et al Persistence of type-specific human pap-illomavirus infection and increased long-term risk of cervical cancer J NatlCancer Inst 2011103(18)1387ndash1396

AR

TIC

LE

10 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

  • djy225-TF1
  • djy225-TF2
Page 7: An Observational Study of Deep Learning and Automated ...The system architecture is summarized in Figure 2 and de-tailed in the Supplementary Methods (available online). The au-tomated

automated visual evaluation and HPV testing when such testsbecome more affordable and more widely available than theycurrently are

To permit widespread use of automated visual evaluationdetection algorithms the method would need to be transferredfrom digitized cervigrams to contemporary digital cameras be-cause cervicography is obsolete and discontinued If the transferis achieved health workers would still need to visualize the cer-vix and take well-lit in-focus images for analysis with the

cervix in full view The minimal required equipment (in addi-tion to the algorithm software) would be acetic acid (vinegar)disposable specula (or sterilization equipment) and the imagingsystem such as a dedicated smart phone or digital camera

Images that resulted in discrepancies between the auto-mated visual evaluation algorithm and clinician interpretationrevealed that the performance was affected by image qualityand obstructions as well as traditional discordant evaluationsdue to human subjectivity So rather than focusing on training

Figure 4 Receiver operating characteristic (ROC) curve of automated visual evaluation of cervical images and comparison of performance in identification of cervical

intraepithelial neoplasia 2thorn ROC-like curves are shown for the categorical variables for simple visual and statistical comparison with automated visual evaluation

(two-sided chi-squared tests) The thresholds are listed on each curve showing the sensitivity and 1-specificity applicable to that cutpoint Automated visual evalua-

tion was as accurate or more than all of the screening tests used in the cohort study including A) automated visual evaluation B) cervicography area under the curve

(AUC) C) conventional Pap smear D) liquid-based cytology E) first-generation neural network-based cytology and F) MY09-MY11 PCR-based human papillomavirus

(HPV) testing ASC-US frac14 atypical squamous cells of undetermined significance HSIL frac14 high-grade squamous intraepithelial lesion LSIL frac14 low-grade squamous intrae-

pithelial lesion

AR

TIC

LE

L Hu et al | 7

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

Table 2 Estimated comparison of automated visual evaluation performance by age groups

Automated visual evaluation by age CIN2thorn No ltCIN2 No Total No Age-specific sensitivity Age-specific specificity

lt25 ythorn 46 226 272 821 772 10 765 775Age-specific total 56 991 1047

25ndash49 ythorn 127 855 982 977 840 3 4475 4478Age-specific total 130 5330 5460

50thorn ythorn 39 399 438 929 832 3 1969 1972Age-specific total 42 2368 2410

Total 228 8689 8917 mdash mdash

Table 3 Severity scores from automatic visual evaluation algorithm and screening results from enrollment visit of the Guanacaste cohortstudy for cases of invasive cancer

Years to diagnosis Enrollment age y HPV type result Pap smear result Cervigram result Algorithm severity score

Enrollment 21 16 52 Cancer Cancer TrainingEnrollment 26 16 18 51 Normal High-grade TrainingEnrollment 29 18 31 HSIL (CIN2) High-grade TrainingEnrollment 34 16 Microinvasive High-grade TrainingEnrollment 35 31 45 Microinvasive Cancer 098Enrollment 38 16 HSIL (CIN2) Cancer TrainingEnrollment 41 16 ASC-US Cancer 078Enrollment 47 18 Microinvasive Cancer TrainingEnrollment 54 16 Microinvasive Ccancer TrainingEnrollment 71 53 82v Microinvasive Negative 013Enrollment 73 33 Microinvasive Cancer 096Enrollment 74 35 HSIL (CIN3) Negative 035Enrollment 42 Equivocal Microinvasive Cancer 0841 y 37 18 Normal Negative Training1 y 61 Negative Normal Negative Training2 y 23 16 ASC-US Low-grade 0872 y 64 45 51 58 Normal Negative 0303 y 49 16 ASC-US Negative Training5 y 29 18 Normal Negative Training5 y 36 16 Normal Negative 0716 y 38 Negative HSIL (CIN2) Negative Training6 y 45 31 Normal Missing Missing6 y 48 Negative Normal Negative Training6 y 66 56 Microinvasive Negative 0387 y 50 Negative Normal Negative Training7 y 63 Negative Normal Negative Training8 y 35 Negative Normal Negative 0028 y 52 16 Normal Negative 0708 y 63 16 Normal Negative Training8 y 81 Negative Normal Negative 0349 y 28 18 Normal Negative Training10 y 65 16 HSIL (CIN3) Negative Training10 y 75 85 Microinvasive Negative Training11 y 75 Negative Normal Negative 01612 y 68 Negative Normal Negative 00213 y 56 Negative Normal Negative 07515 y 78 Missing Missing Atypical Training16 y 52 Negative Normal Negative Training17 y 21 Negative Inadequate Atypical 004

ASC-US frac14 atypical squamous cells of undetermined significance CIN2frac14 cervical intraepithelial neoplasia 2 CIN3frac14 cervical intraepithelial neoplasia 3 HSIL frac14 high-

grade squamous intraepithelial lesion

AR

TIC

LE

8 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

for the subtleties of visual appearance which is a difficult skillthe training for automated visual evaluation could highlightmore easily acquired skills of improving image quality andlighting and removing obstructions

We also anticipated a second approach to using the auto-mated visual evaluation detection algorithm specifically as atriage method restricted to women testing HPV positive if HPVtesting is the primary screen HPV testing offers the possibilityof cervicovaginal self-sampling (33) with only a small minorityof women testing HPV positive and requiring gynecologicexaminations Moreover HPV testing has proven very long-term negative predictive value that is reassurance when nega-tive (3435) The combination of self-sampled HPV screeningand triage use of automated visual evaluation detection couldgreatly reduce the need for speculum examinations Choice ofthe automated visual evaluation algorithm screening strategy(general screening vs triage of HPV positive women) will dependon setting and cost-effectiveness analyses Currently HPV testsstill take several hours and cost too much for many settings butlow-cost point-of-care and batch testing methods applicable toself-sampling are in advanced research and development andlikely will be available within a very few years

It is a strength of this study that it was conducted in a ran-dom population sample of a previously poorly screened popula-tion that resembles HPV-positive women in other low- andmiddle-income country (LMIC) target screening populationsDefinition of precancer was very good based on repeatedscreening during extended follow-up and panel review ofhistopathology

The major limitations are the following We studied a rela-tively small numbers of cases from a single cohort study Weincluded CIN2 cases in the case group whereas it would beideal to train on more definite cases of precancer (ie CIN3and AIS caused by types of HPV prevalent in invasive cervicalcancers) The images were captured by a small team of highlytrained nurses and not a wide variety of health workers Thework relied on images taken with a discontinued film cameratechnique rather than contemporary digital imagetechnology

Nonetheless the proof-of-principle strongly supports fur-ther evaluation of automated visual evaluation Rather thanresurrecting obsolete film camera technology to achieve the ob-served results we are currently working to transfer automatedvisual evaluation to images from contemporary phone camerasand other digital image capture devices to create an accurateand affordable point-of-care screening method that would sup-port the recently announced World Health Organization initia-tive to accelerate cervical cancer control

Our findings extend and improve upon the results obtainedin more preliminary reports listed in the Supplementary Table 1(available online) Internationally a sizable number of commer-cial and academic research groups are now exploring deeplearning algorithms for cervical screening In collaborative workmoving ahead we will be considering the following topicsassessing the international variability in cervical appearance(eg inflammatory changes) and need for region-specific train-ing examining which camera characteristics affect the algo-rithm and which devices are adequate for image capturetraining the camera to take an image automatically when a fo-cused adequately lit and visualized cervix is visible training analgorithm by segmentation studies focused on the squamoco-lumnar junction to assess when ablational therapy is possibleor whether (mainly endocervical) extent of lesion suggests needfor excision and evaluating how best to combine automated

visual evaluation with HPV testing and eventually with vacci-nation to accelerate control of cervical cancer

Funding

This work was supported by the National Institutes ofHealth NCI and National Library of Medicine IntramuralResearch Programs and the Global Good Fund

Notes

Affiliations of authors Intellectual Ventures Global Good FundBellevue WA (LH DB MPH NG BW MSJ) National Library ofMedicine NIH Bethesda MD (SA ZX LRL) Division of CancerEpidemiology and Genetics National Cancer Institute NIHBethesda MD (KY MD JCG NW MS) Information ManagementServices Calverton MD (BB) Early Detection and PreventionSection International Agency for Research on Cancer LyonFrance (RH) Rutgers New Jersey Medical School Newark NJ(MHE) Albert Einstein College of Medicine Bronx NY (RDB)Consultant to National Cancer Institute NIH Bethesda MD(ACR)

The funding sources had no role in the design of the studyinterpretation of the data the collection analysis and interpre-tation of the data or the writing of the manuscript

No author reports a conflict of interest with regard to thisresearch

We acknowledge the expert advice of Drs Emily Wu andCorey Casper who helped define the clinical outcomes in theinitial stage of the project

References1 de Martel C Plummer M Vignat J Franceschi S Worldwide burden of cancer

attributable to HPV by site country and HPV type Int J Cancer 2017141(4)664ndash670

2 Bray F Jemal A Grey N Ferlay J Forman D Global cancer transitions accord-ing to the Human Development Index (2008-2030) a population-based studyLancet Oncol 201213(8)790ndash801

3 Schiffman M Doorbar J Wentzensen N et al Carcinogenic human papillo-mavirus infection Nat Rev Dis Primers 20162(2)16086

4 Bagcchi S India launches plan for national cancer screening programmeBMJ 2016355i5574

5 WHO Guidelines Approved by the Guidelines Review Committee WHOGuidelines for Screening and Treatment of Precancerous Lesions for Cervical CancerPrevention Geneva World Health Organization 2013

6 PEPFAR 2019 Country Operational Plan Guidance for all PEPFAR Countries(Draft) Renewed Partnership to Help End AIDS and Cervical Cancer in Africahttpswwwpepfargovdocumentsorganization288160pdfpage=318

7 Shastri SS Mittra I Mishra GA et al Effect of VIA screening by primary healthworkers randomized controlled study in Mumbai India J Natl Cancer Inst2014106(3)dju009

8 Denny L Kuhn L De Souza M Pollack AE Dupree W Wright TC Jr Screen-and-treat approaches for cervical cancer prevention in low-resource settingsa randomized controlled trial JAMA 2005294(17)2173ndash2181

9 Sankaranarayanan R Nene BM Shastri SS et al HPV screening for cervicalcancer in rural India N Engl J Med 2009360(14)1385ndash1394

10 Catarino R Schafer S Vassilakos P Petignat P Arbyn M Accuracy of combi-nations of visual inspection using acetic acid or lugol iodine to detect cervicalprecancer a meta-analysis BJOG 2017125(5)545ndash553

11 Jeronimo J Schiffman M Colposcopy at a crossroads Am J Obstet Gynecol2006195(2)349ndash353

12 Perkins MJ RB Smith KM Gage JCet al ASCCP Colposcopy Standards Risk-Based Colposcopy Practice J Low Genit Tract Dis 2017 Oct21(4)230ndash234 doi101097LGT0000000000000334 PubMed PMID 28953111

13 Ting DSW Cheung CY Lim G et al Development and validation of a deeplearning system for diabetic retinopathy and related eye diseases using reti-nal images from multiethnic populations with diabetes JAMA 2017318(22)2211ndash2223

14 Xu T Zhang H Xin C et al Multi-feature based benchmark for cervical dys-plasia classification evaluation Pattern Recogn 201763468ndash475

AR

TIC

LE

L Hu et al | 9

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

15 Song D Kim E Huang X et al Multimodal entity coreference for cervical dys-plasia diagnosis IEEE Trans Med Imaging 201534(1)229ndash245

16 Ren SQ He KM Girshick R Sun J Faster R-CNN towards real-time object de-tection with region proposal networks IEEE Trans Pattern Anal Mach Intell201739(6)1137ndash1149

17 Bratti MC Rodriguez AC Schiffman M et al Description of a seven-year pro-spective study of human papillomavirus infection and cervical neoplasiaamong 10000 women in Guanacaste Costa Rica Rev Panam Salud Publica200415(2)75ndash89

18 Herrero R Schiffman MH Bratti C et al Design and methods of apopulation-based natural history study of cervical neoplasia in a rural prov-ince of Costa Rica the Guanacaste Project Rev Panam Salud Publica 19971(5)362ndash375

19 Rodriguez AC Avila C Herrero R et al Cervical cancer incidence after screen-ing with HPV cytology and visual methods 18-year follow-up of theGuanacaste cohort Int J Cancer 2017140(8)1926ndash1934

20 Schneider DL Herrero R Bratti C et al Cervicography screening for cervicalcancer among 8460 women in a high-risk population Am J Obstet Gynecol1999180(2)290ndash298

21 Schneider DL Burke L Wright TC et al Can cervicography be improved Anevaluation with arbitrated cervicography interpretations Am J Obstet Gynecol2002187(1)15ndash23

22 Jeronimo J Long LR Neve L Michael B Antani S Schiffman M Digital toolsfor collecting data from cervigrams for research and training in colposcopy JLow Genit Tract Dis 200610(1)16ndash25

23 Hutchinson ML Zahniser DJ Sherman ME et al Utility of liquid-based cytol-ogy for cervical carcinoma screening results of a population-based studyconducted in a region of Costa Rica with a high incidence of cervical carci-noma Cancer 199987(2)48ndash55

24 Sherman ME Schiffman M Herrero R et al Performance of a semiautomatedPapanicolaou smear screening system results of a population-based studyconducted in Guanacaste Costa Rica Cancer 199884(5)273ndash280

25 Castle PE Schiffman M Gravitt PE et al Comparisons of HPV DNA detectionby MY0911 PCR methods J Med Virol 200268(3)417ndash423

26 Bouvard V Baan R Straif K et al A review of human carcinogensmdashPart B bio-logical agents Lancet Oncol 200910(4)321ndash322

27 He H Garcia EA Learning from imbalanced data IEEE Trans Knowl Data Eng200921(9)1263ndash1284

28 Yosinski J Clune J Bengio Y Lipson H How transferable are features indeep neural networks In Proceedings of the 27th InternationalConference on Neural Information Processing Systems Volume 2December 8ndash13 2014 Montreal Canada

29 Russakovsky O Deng J Su H et al ImageNet large scale visual recognitionchallenge Int J Comput Vis 2015115(3)211ndash252

30 Wong SC Gatt A Stamatescu V McDonnell MD Understanding data aug-mentation for classification when to warp In 2016 InternationalConference on Digital Image Computing Techniques and Applications(Dicta) November 30ndashDecember 2 201659ndash64 Gold Coast Australia

31 DeLong ER DeLong DM Clarke-Pearson DL Comparing the areas under twoor more correlated receiver operating characteristic curves a nonparametricapproach Biometrics 198844(3)837ndash845

32 Youden WJ Index for rating diagnostic tests Cancer 19503(1)32ndash3533 Verdoodt F Jentschke M Hillemanns P Racey CS Snijders PJ Arbyn M

Reaching women who do not participate in the regular cervical cancerscreening programme by offering self-sampling kits a systematic review andmeta-analysis of randomised trials Eur J Cancer (Oxf Engl 1990) 201551(16)2375ndash2385

34 Schiffman M Glass AG Wentzensen N et al A long-term prospective studyof type-specific human papillomavirus infection and risk of cervical neopla-sia among 20000 women in the Portland Kaiser Cohort Study CancerEpidemiol Biomarkers Prev 201120(7)1398ndash1409

35 Chen HC Schiffman M Lin CY et al Persistence of type-specific human pap-illomavirus infection and increased long-term risk of cervical cancer J NatlCancer Inst 2011103(18)1387ndash1396

AR

TIC

LE

10 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

  • djy225-TF1
  • djy225-TF2
Page 8: An Observational Study of Deep Learning and Automated ...The system architecture is summarized in Figure 2 and de-tailed in the Supplementary Methods (available online). The au-tomated

Table 2 Estimated comparison of automated visual evaluation performance by age groups

Automated visual evaluation by age CIN2thorn No ltCIN2 No Total No Age-specific sensitivity Age-specific specificity

lt25 ythorn 46 226 272 821 772 10 765 775Age-specific total 56 991 1047

25ndash49 ythorn 127 855 982 977 840 3 4475 4478Age-specific total 130 5330 5460

50thorn ythorn 39 399 438 929 832 3 1969 1972Age-specific total 42 2368 2410

Total 228 8689 8917 mdash mdash

Table 3 Severity scores from automatic visual evaluation algorithm and screening results from enrollment visit of the Guanacaste cohortstudy for cases of invasive cancer

Years to diagnosis Enrollment age y HPV type result Pap smear result Cervigram result Algorithm severity score

Enrollment 21 16 52 Cancer Cancer TrainingEnrollment 26 16 18 51 Normal High-grade TrainingEnrollment 29 18 31 HSIL (CIN2) High-grade TrainingEnrollment 34 16 Microinvasive High-grade TrainingEnrollment 35 31 45 Microinvasive Cancer 098Enrollment 38 16 HSIL (CIN2) Cancer TrainingEnrollment 41 16 ASC-US Cancer 078Enrollment 47 18 Microinvasive Cancer TrainingEnrollment 54 16 Microinvasive Ccancer TrainingEnrollment 71 53 82v Microinvasive Negative 013Enrollment 73 33 Microinvasive Cancer 096Enrollment 74 35 HSIL (CIN3) Negative 035Enrollment 42 Equivocal Microinvasive Cancer 0841 y 37 18 Normal Negative Training1 y 61 Negative Normal Negative Training2 y 23 16 ASC-US Low-grade 0872 y 64 45 51 58 Normal Negative 0303 y 49 16 ASC-US Negative Training5 y 29 18 Normal Negative Training5 y 36 16 Normal Negative 0716 y 38 Negative HSIL (CIN2) Negative Training6 y 45 31 Normal Missing Missing6 y 48 Negative Normal Negative Training6 y 66 56 Microinvasive Negative 0387 y 50 Negative Normal Negative Training7 y 63 Negative Normal Negative Training8 y 35 Negative Normal Negative 0028 y 52 16 Normal Negative 0708 y 63 16 Normal Negative Training8 y 81 Negative Normal Negative 0349 y 28 18 Normal Negative Training10 y 65 16 HSIL (CIN3) Negative Training10 y 75 85 Microinvasive Negative Training11 y 75 Negative Normal Negative 01612 y 68 Negative Normal Negative 00213 y 56 Negative Normal Negative 07515 y 78 Missing Missing Atypical Training16 y 52 Negative Normal Negative Training17 y 21 Negative Inadequate Atypical 004

ASC-US frac14 atypical squamous cells of undetermined significance CIN2frac14 cervical intraepithelial neoplasia 2 CIN3frac14 cervical intraepithelial neoplasia 3 HSIL frac14 high-

grade squamous intraepithelial lesion

AR

TIC

LE

8 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

for the subtleties of visual appearance which is a difficult skillthe training for automated visual evaluation could highlightmore easily acquired skills of improving image quality andlighting and removing obstructions

We also anticipated a second approach to using the auto-mated visual evaluation detection algorithm specifically as atriage method restricted to women testing HPV positive if HPVtesting is the primary screen HPV testing offers the possibilityof cervicovaginal self-sampling (33) with only a small minorityof women testing HPV positive and requiring gynecologicexaminations Moreover HPV testing has proven very long-term negative predictive value that is reassurance when nega-tive (3435) The combination of self-sampled HPV screeningand triage use of automated visual evaluation detection couldgreatly reduce the need for speculum examinations Choice ofthe automated visual evaluation algorithm screening strategy(general screening vs triage of HPV positive women) will dependon setting and cost-effectiveness analyses Currently HPV testsstill take several hours and cost too much for many settings butlow-cost point-of-care and batch testing methods applicable toself-sampling are in advanced research and development andlikely will be available within a very few years

It is a strength of this study that it was conducted in a ran-dom population sample of a previously poorly screened popula-tion that resembles HPV-positive women in other low- andmiddle-income country (LMIC) target screening populationsDefinition of precancer was very good based on repeatedscreening during extended follow-up and panel review ofhistopathology

The major limitations are the following We studied a rela-tively small numbers of cases from a single cohort study Weincluded CIN2 cases in the case group whereas it would beideal to train on more definite cases of precancer (ie CIN3and AIS caused by types of HPV prevalent in invasive cervicalcancers) The images were captured by a small team of highlytrained nurses and not a wide variety of health workers Thework relied on images taken with a discontinued film cameratechnique rather than contemporary digital imagetechnology

Nonetheless the proof-of-principle strongly supports fur-ther evaluation of automated visual evaluation Rather thanresurrecting obsolete film camera technology to achieve the ob-served results we are currently working to transfer automatedvisual evaluation to images from contemporary phone camerasand other digital image capture devices to create an accurateand affordable point-of-care screening method that would sup-port the recently announced World Health Organization initia-tive to accelerate cervical cancer control

Our findings extend and improve upon the results obtainedin more preliminary reports listed in the Supplementary Table 1(available online) Internationally a sizable number of commer-cial and academic research groups are now exploring deeplearning algorithms for cervical screening In collaborative workmoving ahead we will be considering the following topicsassessing the international variability in cervical appearance(eg inflammatory changes) and need for region-specific train-ing examining which camera characteristics affect the algo-rithm and which devices are adequate for image capturetraining the camera to take an image automatically when a fo-cused adequately lit and visualized cervix is visible training analgorithm by segmentation studies focused on the squamoco-lumnar junction to assess when ablational therapy is possibleor whether (mainly endocervical) extent of lesion suggests needfor excision and evaluating how best to combine automated

visual evaluation with HPV testing and eventually with vacci-nation to accelerate control of cervical cancer

Funding

This work was supported by the National Institutes ofHealth NCI and National Library of Medicine IntramuralResearch Programs and the Global Good Fund

Notes

Affiliations of authors Intellectual Ventures Global Good FundBellevue WA (LH DB MPH NG BW MSJ) National Library ofMedicine NIH Bethesda MD (SA ZX LRL) Division of CancerEpidemiology and Genetics National Cancer Institute NIHBethesda MD (KY MD JCG NW MS) Information ManagementServices Calverton MD (BB) Early Detection and PreventionSection International Agency for Research on Cancer LyonFrance (RH) Rutgers New Jersey Medical School Newark NJ(MHE) Albert Einstein College of Medicine Bronx NY (RDB)Consultant to National Cancer Institute NIH Bethesda MD(ACR)

The funding sources had no role in the design of the studyinterpretation of the data the collection analysis and interpre-tation of the data or the writing of the manuscript

No author reports a conflict of interest with regard to thisresearch

We acknowledge the expert advice of Drs Emily Wu andCorey Casper who helped define the clinical outcomes in theinitial stage of the project

References1 de Martel C Plummer M Vignat J Franceschi S Worldwide burden of cancer

attributable to HPV by site country and HPV type Int J Cancer 2017141(4)664ndash670

2 Bray F Jemal A Grey N Ferlay J Forman D Global cancer transitions accord-ing to the Human Development Index (2008-2030) a population-based studyLancet Oncol 201213(8)790ndash801

3 Schiffman M Doorbar J Wentzensen N et al Carcinogenic human papillo-mavirus infection Nat Rev Dis Primers 20162(2)16086

4 Bagcchi S India launches plan for national cancer screening programmeBMJ 2016355i5574

5 WHO Guidelines Approved by the Guidelines Review Committee WHOGuidelines for Screening and Treatment of Precancerous Lesions for Cervical CancerPrevention Geneva World Health Organization 2013

6 PEPFAR 2019 Country Operational Plan Guidance for all PEPFAR Countries(Draft) Renewed Partnership to Help End AIDS and Cervical Cancer in Africahttpswwwpepfargovdocumentsorganization288160pdfpage=318

7 Shastri SS Mittra I Mishra GA et al Effect of VIA screening by primary healthworkers randomized controlled study in Mumbai India J Natl Cancer Inst2014106(3)dju009

8 Denny L Kuhn L De Souza M Pollack AE Dupree W Wright TC Jr Screen-and-treat approaches for cervical cancer prevention in low-resource settingsa randomized controlled trial JAMA 2005294(17)2173ndash2181

9 Sankaranarayanan R Nene BM Shastri SS et al HPV screening for cervicalcancer in rural India N Engl J Med 2009360(14)1385ndash1394

10 Catarino R Schafer S Vassilakos P Petignat P Arbyn M Accuracy of combi-nations of visual inspection using acetic acid or lugol iodine to detect cervicalprecancer a meta-analysis BJOG 2017125(5)545ndash553

11 Jeronimo J Schiffman M Colposcopy at a crossroads Am J Obstet Gynecol2006195(2)349ndash353

12 Perkins MJ RB Smith KM Gage JCet al ASCCP Colposcopy Standards Risk-Based Colposcopy Practice J Low Genit Tract Dis 2017 Oct21(4)230ndash234 doi101097LGT0000000000000334 PubMed PMID 28953111

13 Ting DSW Cheung CY Lim G et al Development and validation of a deeplearning system for diabetic retinopathy and related eye diseases using reti-nal images from multiethnic populations with diabetes JAMA 2017318(22)2211ndash2223

14 Xu T Zhang H Xin C et al Multi-feature based benchmark for cervical dys-plasia classification evaluation Pattern Recogn 201763468ndash475

AR

TIC

LE

L Hu et al | 9

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

15 Song D Kim E Huang X et al Multimodal entity coreference for cervical dys-plasia diagnosis IEEE Trans Med Imaging 201534(1)229ndash245

16 Ren SQ He KM Girshick R Sun J Faster R-CNN towards real-time object de-tection with region proposal networks IEEE Trans Pattern Anal Mach Intell201739(6)1137ndash1149

17 Bratti MC Rodriguez AC Schiffman M et al Description of a seven-year pro-spective study of human papillomavirus infection and cervical neoplasiaamong 10000 women in Guanacaste Costa Rica Rev Panam Salud Publica200415(2)75ndash89

18 Herrero R Schiffman MH Bratti C et al Design and methods of apopulation-based natural history study of cervical neoplasia in a rural prov-ince of Costa Rica the Guanacaste Project Rev Panam Salud Publica 19971(5)362ndash375

19 Rodriguez AC Avila C Herrero R et al Cervical cancer incidence after screen-ing with HPV cytology and visual methods 18-year follow-up of theGuanacaste cohort Int J Cancer 2017140(8)1926ndash1934

20 Schneider DL Herrero R Bratti C et al Cervicography screening for cervicalcancer among 8460 women in a high-risk population Am J Obstet Gynecol1999180(2)290ndash298

21 Schneider DL Burke L Wright TC et al Can cervicography be improved Anevaluation with arbitrated cervicography interpretations Am J Obstet Gynecol2002187(1)15ndash23

22 Jeronimo J Long LR Neve L Michael B Antani S Schiffman M Digital toolsfor collecting data from cervigrams for research and training in colposcopy JLow Genit Tract Dis 200610(1)16ndash25

23 Hutchinson ML Zahniser DJ Sherman ME et al Utility of liquid-based cytol-ogy for cervical carcinoma screening results of a population-based studyconducted in a region of Costa Rica with a high incidence of cervical carci-noma Cancer 199987(2)48ndash55

24 Sherman ME Schiffman M Herrero R et al Performance of a semiautomatedPapanicolaou smear screening system results of a population-based studyconducted in Guanacaste Costa Rica Cancer 199884(5)273ndash280

25 Castle PE Schiffman M Gravitt PE et al Comparisons of HPV DNA detectionby MY0911 PCR methods J Med Virol 200268(3)417ndash423

26 Bouvard V Baan R Straif K et al A review of human carcinogensmdashPart B bio-logical agents Lancet Oncol 200910(4)321ndash322

27 He H Garcia EA Learning from imbalanced data IEEE Trans Knowl Data Eng200921(9)1263ndash1284

28 Yosinski J Clune J Bengio Y Lipson H How transferable are features indeep neural networks In Proceedings of the 27th InternationalConference on Neural Information Processing Systems Volume 2December 8ndash13 2014 Montreal Canada

29 Russakovsky O Deng J Su H et al ImageNet large scale visual recognitionchallenge Int J Comput Vis 2015115(3)211ndash252

30 Wong SC Gatt A Stamatescu V McDonnell MD Understanding data aug-mentation for classification when to warp In 2016 InternationalConference on Digital Image Computing Techniques and Applications(Dicta) November 30ndashDecember 2 201659ndash64 Gold Coast Australia

31 DeLong ER DeLong DM Clarke-Pearson DL Comparing the areas under twoor more correlated receiver operating characteristic curves a nonparametricapproach Biometrics 198844(3)837ndash845

32 Youden WJ Index for rating diagnostic tests Cancer 19503(1)32ndash3533 Verdoodt F Jentschke M Hillemanns P Racey CS Snijders PJ Arbyn M

Reaching women who do not participate in the regular cervical cancerscreening programme by offering self-sampling kits a systematic review andmeta-analysis of randomised trials Eur J Cancer (Oxf Engl 1990) 201551(16)2375ndash2385

34 Schiffman M Glass AG Wentzensen N et al A long-term prospective studyof type-specific human papillomavirus infection and risk of cervical neopla-sia among 20000 women in the Portland Kaiser Cohort Study CancerEpidemiol Biomarkers Prev 201120(7)1398ndash1409

35 Chen HC Schiffman M Lin CY et al Persistence of type-specific human pap-illomavirus infection and increased long-term risk of cervical cancer J NatlCancer Inst 2011103(18)1387ndash1396

AR

TIC

LE

10 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

  • djy225-TF1
  • djy225-TF2
Page 9: An Observational Study of Deep Learning and Automated ...The system architecture is summarized in Figure 2 and de-tailed in the Supplementary Methods (available online). The au-tomated

for the subtleties of visual appearance which is a difficult skillthe training for automated visual evaluation could highlightmore easily acquired skills of improving image quality andlighting and removing obstructions

We also anticipated a second approach to using the auto-mated visual evaluation detection algorithm specifically as atriage method restricted to women testing HPV positive if HPVtesting is the primary screen HPV testing offers the possibilityof cervicovaginal self-sampling (33) with only a small minorityof women testing HPV positive and requiring gynecologicexaminations Moreover HPV testing has proven very long-term negative predictive value that is reassurance when nega-tive (3435) The combination of self-sampled HPV screeningand triage use of automated visual evaluation detection couldgreatly reduce the need for speculum examinations Choice ofthe automated visual evaluation algorithm screening strategy(general screening vs triage of HPV positive women) will dependon setting and cost-effectiveness analyses Currently HPV testsstill take several hours and cost too much for many settings butlow-cost point-of-care and batch testing methods applicable toself-sampling are in advanced research and development andlikely will be available within a very few years

It is a strength of this study that it was conducted in a ran-dom population sample of a previously poorly screened popula-tion that resembles HPV-positive women in other low- andmiddle-income country (LMIC) target screening populationsDefinition of precancer was very good based on repeatedscreening during extended follow-up and panel review ofhistopathology

The major limitations are the following We studied a rela-tively small numbers of cases from a single cohort study Weincluded CIN2 cases in the case group whereas it would beideal to train on more definite cases of precancer (ie CIN3and AIS caused by types of HPV prevalent in invasive cervicalcancers) The images were captured by a small team of highlytrained nurses and not a wide variety of health workers Thework relied on images taken with a discontinued film cameratechnique rather than contemporary digital imagetechnology

Nonetheless the proof-of-principle strongly supports fur-ther evaluation of automated visual evaluation Rather thanresurrecting obsolete film camera technology to achieve the ob-served results we are currently working to transfer automatedvisual evaluation to images from contemporary phone camerasand other digital image capture devices to create an accurateand affordable point-of-care screening method that would sup-port the recently announced World Health Organization initia-tive to accelerate cervical cancer control

Our findings extend and improve upon the results obtainedin more preliminary reports listed in the Supplementary Table 1(available online) Internationally a sizable number of commer-cial and academic research groups are now exploring deeplearning algorithms for cervical screening In collaborative workmoving ahead we will be considering the following topicsassessing the international variability in cervical appearance(eg inflammatory changes) and need for region-specific train-ing examining which camera characteristics affect the algo-rithm and which devices are adequate for image capturetraining the camera to take an image automatically when a fo-cused adequately lit and visualized cervix is visible training analgorithm by segmentation studies focused on the squamoco-lumnar junction to assess when ablational therapy is possibleor whether (mainly endocervical) extent of lesion suggests needfor excision and evaluating how best to combine automated

visual evaluation with HPV testing and eventually with vacci-nation to accelerate control of cervical cancer

Funding

This work was supported by the National Institutes ofHealth NCI and National Library of Medicine IntramuralResearch Programs and the Global Good Fund

Notes

Affiliations of authors Intellectual Ventures Global Good FundBellevue WA (LH DB MPH NG BW MSJ) National Library ofMedicine NIH Bethesda MD (SA ZX LRL) Division of CancerEpidemiology and Genetics National Cancer Institute NIHBethesda MD (KY MD JCG NW MS) Information ManagementServices Calverton MD (BB) Early Detection and PreventionSection International Agency for Research on Cancer LyonFrance (RH) Rutgers New Jersey Medical School Newark NJ(MHE) Albert Einstein College of Medicine Bronx NY (RDB)Consultant to National Cancer Institute NIH Bethesda MD(ACR)

The funding sources had no role in the design of the studyinterpretation of the data the collection analysis and interpre-tation of the data or the writing of the manuscript

No author reports a conflict of interest with regard to thisresearch

We acknowledge the expert advice of Drs Emily Wu andCorey Casper who helped define the clinical outcomes in theinitial stage of the project

References1 de Martel C Plummer M Vignat J Franceschi S Worldwide burden of cancer

attributable to HPV by site country and HPV type Int J Cancer 2017141(4)664ndash670

2 Bray F Jemal A Grey N Ferlay J Forman D Global cancer transitions accord-ing to the Human Development Index (2008-2030) a population-based studyLancet Oncol 201213(8)790ndash801

3 Schiffman M Doorbar J Wentzensen N et al Carcinogenic human papillo-mavirus infection Nat Rev Dis Primers 20162(2)16086

4 Bagcchi S India launches plan for national cancer screening programmeBMJ 2016355i5574

5 WHO Guidelines Approved by the Guidelines Review Committee WHOGuidelines for Screening and Treatment of Precancerous Lesions for Cervical CancerPrevention Geneva World Health Organization 2013

6 PEPFAR 2019 Country Operational Plan Guidance for all PEPFAR Countries(Draft) Renewed Partnership to Help End AIDS and Cervical Cancer in Africahttpswwwpepfargovdocumentsorganization288160pdfpage=318

7 Shastri SS Mittra I Mishra GA et al Effect of VIA screening by primary healthworkers randomized controlled study in Mumbai India J Natl Cancer Inst2014106(3)dju009

8 Denny L Kuhn L De Souza M Pollack AE Dupree W Wright TC Jr Screen-and-treat approaches for cervical cancer prevention in low-resource settingsa randomized controlled trial JAMA 2005294(17)2173ndash2181

9 Sankaranarayanan R Nene BM Shastri SS et al HPV screening for cervicalcancer in rural India N Engl J Med 2009360(14)1385ndash1394

10 Catarino R Schafer S Vassilakos P Petignat P Arbyn M Accuracy of combi-nations of visual inspection using acetic acid or lugol iodine to detect cervicalprecancer a meta-analysis BJOG 2017125(5)545ndash553

11 Jeronimo J Schiffman M Colposcopy at a crossroads Am J Obstet Gynecol2006195(2)349ndash353

12 Perkins MJ RB Smith KM Gage JCet al ASCCP Colposcopy Standards Risk-Based Colposcopy Practice J Low Genit Tract Dis 2017 Oct21(4)230ndash234 doi101097LGT0000000000000334 PubMed PMID 28953111

13 Ting DSW Cheung CY Lim G et al Development and validation of a deeplearning system for diabetic retinopathy and related eye diseases using reti-nal images from multiethnic populations with diabetes JAMA 2017318(22)2211ndash2223

14 Xu T Zhang H Xin C et al Multi-feature based benchmark for cervical dys-plasia classification evaluation Pattern Recogn 201763468ndash475

AR

TIC

LE

L Hu et al | 9

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

15 Song D Kim E Huang X et al Multimodal entity coreference for cervical dys-plasia diagnosis IEEE Trans Med Imaging 201534(1)229ndash245

16 Ren SQ He KM Girshick R Sun J Faster R-CNN towards real-time object de-tection with region proposal networks IEEE Trans Pattern Anal Mach Intell201739(6)1137ndash1149

17 Bratti MC Rodriguez AC Schiffman M et al Description of a seven-year pro-spective study of human papillomavirus infection and cervical neoplasiaamong 10000 women in Guanacaste Costa Rica Rev Panam Salud Publica200415(2)75ndash89

18 Herrero R Schiffman MH Bratti C et al Design and methods of apopulation-based natural history study of cervical neoplasia in a rural prov-ince of Costa Rica the Guanacaste Project Rev Panam Salud Publica 19971(5)362ndash375

19 Rodriguez AC Avila C Herrero R et al Cervical cancer incidence after screen-ing with HPV cytology and visual methods 18-year follow-up of theGuanacaste cohort Int J Cancer 2017140(8)1926ndash1934

20 Schneider DL Herrero R Bratti C et al Cervicography screening for cervicalcancer among 8460 women in a high-risk population Am J Obstet Gynecol1999180(2)290ndash298

21 Schneider DL Burke L Wright TC et al Can cervicography be improved Anevaluation with arbitrated cervicography interpretations Am J Obstet Gynecol2002187(1)15ndash23

22 Jeronimo J Long LR Neve L Michael B Antani S Schiffman M Digital toolsfor collecting data from cervigrams for research and training in colposcopy JLow Genit Tract Dis 200610(1)16ndash25

23 Hutchinson ML Zahniser DJ Sherman ME et al Utility of liquid-based cytol-ogy for cervical carcinoma screening results of a population-based studyconducted in a region of Costa Rica with a high incidence of cervical carci-noma Cancer 199987(2)48ndash55

24 Sherman ME Schiffman M Herrero R et al Performance of a semiautomatedPapanicolaou smear screening system results of a population-based studyconducted in Guanacaste Costa Rica Cancer 199884(5)273ndash280

25 Castle PE Schiffman M Gravitt PE et al Comparisons of HPV DNA detectionby MY0911 PCR methods J Med Virol 200268(3)417ndash423

26 Bouvard V Baan R Straif K et al A review of human carcinogensmdashPart B bio-logical agents Lancet Oncol 200910(4)321ndash322

27 He H Garcia EA Learning from imbalanced data IEEE Trans Knowl Data Eng200921(9)1263ndash1284

28 Yosinski J Clune J Bengio Y Lipson H How transferable are features indeep neural networks In Proceedings of the 27th InternationalConference on Neural Information Processing Systems Volume 2December 8ndash13 2014 Montreal Canada

29 Russakovsky O Deng J Su H et al ImageNet large scale visual recognitionchallenge Int J Comput Vis 2015115(3)211ndash252

30 Wong SC Gatt A Stamatescu V McDonnell MD Understanding data aug-mentation for classification when to warp In 2016 InternationalConference on Digital Image Computing Techniques and Applications(Dicta) November 30ndashDecember 2 201659ndash64 Gold Coast Australia

31 DeLong ER DeLong DM Clarke-Pearson DL Comparing the areas under twoor more correlated receiver operating characteristic curves a nonparametricapproach Biometrics 198844(3)837ndash845

32 Youden WJ Index for rating diagnostic tests Cancer 19503(1)32ndash3533 Verdoodt F Jentschke M Hillemanns P Racey CS Snijders PJ Arbyn M

Reaching women who do not participate in the regular cervical cancerscreening programme by offering self-sampling kits a systematic review andmeta-analysis of randomised trials Eur J Cancer (Oxf Engl 1990) 201551(16)2375ndash2385

34 Schiffman M Glass AG Wentzensen N et al A long-term prospective studyof type-specific human papillomavirus infection and risk of cervical neopla-sia among 20000 women in the Portland Kaiser Cohort Study CancerEpidemiol Biomarkers Prev 201120(7)1398ndash1409

35 Chen HC Schiffman M Lin CY et al Persistence of type-specific human pap-illomavirus infection and increased long-term risk of cervical cancer J NatlCancer Inst 2011103(18)1387ndash1396

AR

TIC

LE

10 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

  • djy225-TF1
  • djy225-TF2
Page 10: An Observational Study of Deep Learning and Automated ...The system architecture is summarized in Figure 2 and de-tailed in the Supplementary Methods (available online). The au-tomated

15 Song D Kim E Huang X et al Multimodal entity coreference for cervical dys-plasia diagnosis IEEE Trans Med Imaging 201534(1)229ndash245

16 Ren SQ He KM Girshick R Sun J Faster R-CNN towards real-time object de-tection with region proposal networks IEEE Trans Pattern Anal Mach Intell201739(6)1137ndash1149

17 Bratti MC Rodriguez AC Schiffman M et al Description of a seven-year pro-spective study of human papillomavirus infection and cervical neoplasiaamong 10000 women in Guanacaste Costa Rica Rev Panam Salud Publica200415(2)75ndash89

18 Herrero R Schiffman MH Bratti C et al Design and methods of apopulation-based natural history study of cervical neoplasia in a rural prov-ince of Costa Rica the Guanacaste Project Rev Panam Salud Publica 19971(5)362ndash375

19 Rodriguez AC Avila C Herrero R et al Cervical cancer incidence after screen-ing with HPV cytology and visual methods 18-year follow-up of theGuanacaste cohort Int J Cancer 2017140(8)1926ndash1934

20 Schneider DL Herrero R Bratti C et al Cervicography screening for cervicalcancer among 8460 women in a high-risk population Am J Obstet Gynecol1999180(2)290ndash298

21 Schneider DL Burke L Wright TC et al Can cervicography be improved Anevaluation with arbitrated cervicography interpretations Am J Obstet Gynecol2002187(1)15ndash23

22 Jeronimo J Long LR Neve L Michael B Antani S Schiffman M Digital toolsfor collecting data from cervigrams for research and training in colposcopy JLow Genit Tract Dis 200610(1)16ndash25

23 Hutchinson ML Zahniser DJ Sherman ME et al Utility of liquid-based cytol-ogy for cervical carcinoma screening results of a population-based studyconducted in a region of Costa Rica with a high incidence of cervical carci-noma Cancer 199987(2)48ndash55

24 Sherman ME Schiffman M Herrero R et al Performance of a semiautomatedPapanicolaou smear screening system results of a population-based studyconducted in Guanacaste Costa Rica Cancer 199884(5)273ndash280

25 Castle PE Schiffman M Gravitt PE et al Comparisons of HPV DNA detectionby MY0911 PCR methods J Med Virol 200268(3)417ndash423

26 Bouvard V Baan R Straif K et al A review of human carcinogensmdashPart B bio-logical agents Lancet Oncol 200910(4)321ndash322

27 He H Garcia EA Learning from imbalanced data IEEE Trans Knowl Data Eng200921(9)1263ndash1284

28 Yosinski J Clune J Bengio Y Lipson H How transferable are features indeep neural networks In Proceedings of the 27th InternationalConference on Neural Information Processing Systems Volume 2December 8ndash13 2014 Montreal Canada

29 Russakovsky O Deng J Su H et al ImageNet large scale visual recognitionchallenge Int J Comput Vis 2015115(3)211ndash252

30 Wong SC Gatt A Stamatescu V McDonnell MD Understanding data aug-mentation for classification when to warp In 2016 InternationalConference on Digital Image Computing Techniques and Applications(Dicta) November 30ndashDecember 2 201659ndash64 Gold Coast Australia

31 DeLong ER DeLong DM Clarke-Pearson DL Comparing the areas under twoor more correlated receiver operating characteristic curves a nonparametricapproach Biometrics 198844(3)837ndash845

32 Youden WJ Index for rating diagnostic tests Cancer 19503(1)32ndash3533 Verdoodt F Jentschke M Hillemanns P Racey CS Snijders PJ Arbyn M

Reaching women who do not participate in the regular cervical cancerscreening programme by offering self-sampling kits a systematic review andmeta-analysis of randomised trials Eur J Cancer (Oxf Engl 1990) 201551(16)2375ndash2385

34 Schiffman M Glass AG Wentzensen N et al A long-term prospective studyof type-specific human papillomavirus infection and risk of cervical neopla-sia among 20000 women in the Portland Kaiser Cohort Study CancerEpidemiol Biomarkers Prev 201120(7)1398ndash1409

35 Chen HC Schiffman M Lin CY et al Persistence of type-specific human pap-illomavirus infection and increased long-term risk of cervical cancer J NatlCancer Inst 2011103(18)1387ndash1396

AR

TIC

LE

10 | JNCI J Natl Cancer Inst 2019

Dow

nloaded from httpsacadem

icoupcomjnciadvance-article-abstractdoi101093jncidjy2255272614 by SuU

B Bremen user on 29 January 2019

  • djy225-TF1
  • djy225-TF2