

ARTICLE IN PRESS


Effect of malignancy rates on cost-effectiveness of routine gene expression classifier testing for indeterminate thyroid nodules

James X. Wu, MD,ᵃ Raymond Lam,ᵃ Mary Levin,ᵇ Jianyu Rao, MD,ᵇ Peggy S. Sullivan, MD,ᵇ and Michael W. Yeh, MD,ᵃ Los Angeles, CA

Background. The value of gene expression classifier (GEC) testing for cytologically indeterminate thyroid nodules lies in its negative predictive value, which is influenced by the prevalence of malignancy. We incorporated actual GEC test performance data from a tertiary referral center into a cost-effectiveness analysis of GEC testing.

Methods. We evaluated consecutive patients who underwent GEC testing for Bethesda category III and IV nodules from 2012 to 2014. Routine GEC testing was compared with conventional management by the use of a decision tree model. Additional model variables were determined via literature review. A cost-effectiveness threshold of $100,000 per quality-adjusted life-year (QALY) was used.

Results. The prevalence of malignancy was 24.3% (52/214). Sensitivity and specificity of GEC testing were 96% and 60%. Conventional management cost $11,119 and yielded 22.15 QALYs. Routine GEC testing was more effective and more costly, with an incremental cost-effectiveness ratio of $119,700/QALY, making it not cost-effective. At malignancy rates of 15, 25, or 35%, routine GEC testing became cost-effective when the cost of GEC testing fell below $3,167, $2,595, or $2,023.

Conclusion. The cost-effectiveness of routine GEC testing varies inversely with the underlying prevalence of malignancy in the tested population. The value of routine GEC testing should be assessed within the context of institution-specific malignancy rates. (Surgery 2015;j:j-j.)

From the Section of Endocrine Surgeryᵃ and Department of Pathology and Laboratory Medicine,ᵇ UCLA David Geffen School of Medicine, Los Angeles, CA

Funded by the H. H. Lee Research Award (Dr James Wu).
Presented at the 36th Annual Meeting of the American Association of Endocrine Surgeons, Nashville, TN, May 17–19, 2015.
Accepted for publication May 9, 2015.
Reprint requests: James X. Wu, MD, Section of Endocrine Surgery, 10833 Le Conte Ave, 72-228 CHS, Los Angeles, CA 90095. E-mail: [email protected].
0039-6060/$ - see front matter
© 2015 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.surg.2015.05.035

PALPABLE THYROID NODULES ARE PRESENT IN 4–7% OF THE US POPULATION and carry a malignancy risk of 5%.¹ Fine-needle aspiration can help distinguish benign from malignant nodules, but 15–30% of aspirates yield an indeterminate result (Bethesda categories III and IV).¹ Indeterminate thyroid nodules carry a 5–30% risk of malignancy² and are managed conventionally with diagnostic thyroid lobectomy. This is suboptimal, because the majority of patients suffer the cost, discomfort, and risk of thyroid resection for nodules ultimately found to be benign. Recently, a highly sensitive, although not highly specific, gene expression classifier (GEC) test for indeterminate thyroid nodules has been developed.³ The high negative predictive value of GEC testing permits the avoidance of surgery in many patients with GEC-benign nodules.

Previous cost-effectiveness analyses have found routine GEC testing to be cost-effective compared with morphologic cytological examination of indeterminate thyroid nodules alone⁴,⁵; however, the underlying prevalence of malignancy (malignancy rate) among indeterminate thyroid nodules varies widely from 5 to 49%.⁶ As a given center’s malignancy rate increases, the negative predictive value of routine GEC testing diminishes, calling into question whether GEC testing is cost-effective in a high malignancy rate setting.

The aims of this study were (1) to evaluate the cost-effectiveness of routine GEC testing using prospectively collected GEC performance data from a tertiary referral center, and (2) to assess the influence of malignancy rates on the cost-effectiveness of routine GEC testing.

Fig 1. Simplified decision-tree model.

METHODS

GEC test performance characteristics. After institutional review board approval, we prospectively evaluated the electronic medical records of patients with cytologically indeterminate thyroid nodules (Bethesda categories III and IV) who underwent GEC testing within UCLA Health between August 2012 and July 2014. Bethesda category III thyroid nodules are lesions reported as atypia of undetermined significance or follicular lesion of undetermined significance, and Bethesda category IV thyroid nodules are lesions reported as suspicious for follicular neoplasm, or follicular neoplasm.⁷ GEC testing of thyroid nodule aspirates was ordered routinely by the pathology department as a reflex test for Bethesda category III and IV cytologic results.

Variables analyzed included Bethesda diagnostic category, GEC test result, whether the patient underwent subsequent thyroid resection, and presence of malignancy in the index nodule. GEC testing recorded the location of the tested lesion in 1 of the following sectors of the thyroid: left superior, left mid, left inferior, right superior, right mid, or right inferior. When histopathology revealed papillary thyroid microcarcinoma, ie, a focus or foci of papillary thyroid carcinoma measuring less than 1 cm in diameter, the thyroid nodule was assessed as malignant if and only if the tumor was found in the same thyroid sector. The malignancy rate of indeterminate thyroid nodules was defined as the number of histopathologically malignant lesions divided by the total number of all thyroid nodules tested. Patients who did not undergo operative resection were assumed to have benign lesions and managed with interval ultrasonography examinations.

Decision tree model: Reference patient scenario. The reference case was defined as a healthy 45-year-old adult with an indeterminate thyroid nodule less than 4 cm in diameter, no history of radiation exposure, a negative family history, and no definitive clinical or sonographic features of malignancy. Our purpose was to model a patient who would likely undergo thyroid resection after a suspicious GEC result but would be observed after a benign GEC result.

Decision tree model: Model perspective and assumptions. We constructed a Markov transition state model using decision analysis software (TreeAge Pro, Williamstown, MA) to compare conventional management using routine morphologic cytology alone with routine GEC testing of indeterminate thyroid nodules (Fig 1). The model was constructed with the use of a limited societal perspective that used only direct medical costs. A time horizon of 38 years was used, which was equivalent to the average life expectancy of a 45-year-old woman.⁸ The preferred strategy was defined as that which produced the greatest utility without exceeding a threshold of $100,000 per quality-adjusted life-year (QALY).⁹

The model included several assumptions. When possible, we attempted to bias the model in favor of routine GEC testing, because routine GEC testing was found to be cost-effective in previous reports. First, we assumed that all patients with GEC-suspicious nodules would undergo thyroid lobectomy and that all GEC-benign nodules would be observed. Second, we assumed that all patients with false-negative GEC testing would eventually be identified during follow-up and undergo total thyroidectomy. Third, we assumed that all patients undergoing surgery would receive initial diagnostic lobectomy, followed by completion thyroidectomy if malignant histopathology was found. Finally, we assumed that the total cost of observation of benign nodules was equivalent to the total cost of postoperative surveillance for resected benign thyroid lesions. After resection of benign thyroid nodules, patients generally require endocrinology follow-up, thyroid-stimulating hormone testing, and surveillance with ultrasonography of the contralateral lobe. Similarly, observation of GEC-benign thyroid nodules requires endocrinology follow-up and periodic surveillance with ultrasonography. Because these costs are not well characterized and share significant overlap, we chose to assume there is no significant difference.
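To make the structure of such a model concrete, the following is a minimal sketch of how a decision tree is rolled back to an expected cost. It is not the TreeAge model used in this study: the node layout and the function name `expected_value` are illustrative assumptions, and the sketch covers operative costs only. It omits complications, levothyroxine, surveillance, discounting, and malignancies among observed nodules, so its output is not expected to reproduce the reported base-case cost. The index values are those listed in Tables I and II.

```python
# Toy rollback of the conventional-management arm (operative costs only).
# Assumption: this tree is an illustration, not the study's TreeAge model.

def expected_value(node):
    """Return the probability-weighted value of a decision-tree node."""
    if "branches" not in node:          # terminal node
        return node["value"]
    # A chance node may carry its own cost ("value") plus weighted children.
    own = node.get("value", 0.0)
    return own + sum(p * expected_value(child) for p, child in node["branches"])

P_MALIGNANT = 0.243      # index malignancy rate (Table I)
P_OBSERVED = 0.20        # observed without surgery, conventional arm (Table I)
COST_LOBECTOMY = 8667    # Table II
COST_COMPLETION = 11010  # Table II

conventional_arm = {
    "branches": [
        (P_OBSERVED, {"value": 0.0}),
        (1 - P_OBSERVED, {
            "value": COST_LOBECTOMY,
            "branches": [
                (P_MALIGNANT, {"value": COST_COMPLETION}),
                (1 - P_MALIGNANT, {"value": 0.0}),
            ],
        }),
    ]
}

operative_cost = expected_value(conventional_arm)
print(f"Expected operative cost per patient: ${operative_cost:,.2f}")
```

The same recursive rollback would apply to the GEC arm once its branches (suspicious, benign, no result) and their downstream costs are specified.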

Table I. Model variables: Probabilities

| Variable name | Index value | Range for Monte Carlo simulation | References |
|---|---|---|---|
| Underlying prevalence of malignancy of indeterminate thyroid nodules (Bethesda categories III and IV) | 24.3% | 6–48% | 3,6,19–21 |
| Proportion of thyroid nodules observed without GEC testing | 20% | 11–32% | 4,10 |
| Proportion of GEC-suspicious thyroid nodules observed | 0% | 0–29% | 4,10 |
| Probability of thyroid lobectomy (vs total thyroidectomy) without GEC testing | 100% | 50–100% | 22 |
| Probability of thyroid lobectomy (vs total thyroidectomy) with GEC testing: suspicious | 100% | 50–100% | n/a |
| Sensitivity of GEC testing | 96% | 74–100% | 3–5,16,17 |
| Specificity of GEC testing | 60% | 10–75% | 3–5,16,17 |
| Probability of "no result" on GEC testing | 7.47% | 3.74–11.2% | 3–5,16 |
| Probability of time-limited complication following thyroid lobectomy | 25% | 12.5–37.5% | 4,23–26 |
| Probability of permanent complication after thyroid lobectomy | 1% | 0.5–1.5% | 4,23–26 |
| Probability of time-limited complication after completion thyroidectomy | 37% | 19–56% | 4,23–26 |
| Probability of permanent complication after completion thyroidectomy | 7% | 3.5–10.5% | 4,23–26 |
| Probability of time-limited complication after total thyroidectomy | 37% | 19–56% | 4,23–26 |
| Probability of permanent complication after total thyroidectomy | 7% | 3.5–10.5% | 4,23–26 |

GEC, Gene expression classifier.

Table II. Model variables: Costs

| Variable name | Index value | Range for Monte Carlo simulation | References |
|---|---|---|---|
| Thyroid lobectomy | $8,667 | $4,333–13,000 | 4,5,12,27 |
| Completion thyroidectomy | $11,010 | $5,505–16,515 | 4,5,12,27 |
| Total thyroidectomy | $10,840 | $5,420–16,260 | 4,5,12,27 |
| Annual supply levothyroxine | $205 | $103–308 | 13 |
| Time-limited complication after thyroid lobectomy | $826 | $413–1,239 | 4,5 |
| Permanent complication after thyroid lobectomy | $5,685 | $2,842–8,527 | 4,5 |
| Time-limited complication after completion or total thyroidectomy | $826 | $413–1,239 | 4,5 |
| Permanent complication after completion or total thyroidectomy | $6,763 | $3,382–10,145 | 4,5 |
| GEC testing | $3,500 | $1,750–5,250 | 4,5 |

GEC, Gene expression classifier.


Decision tree model: Derivation of model variables. Index values for malignancy rate, GEC test sensitivity and specificity, and likelihood of insufficient GEC test result were obtained from prospective chart review as described previously. Other model probabilities were abstracted from literature review (Table I). For conventional management relying on morphologic cytology alone, 80% of patients with indeterminate nodules were assumed to undergo surgery. This value approximates clinical practice and has been used in prior cost-effectiveness analyses.⁴,¹⁰

Direct medical costs were obtained from the Medicare Physician Fee schedule, the Healthcare Cost and Utilization Project, and literature review (Table II).¹¹,¹² Costs of levothyroxine were based on wholesale prices listed in a commonly used physician reference.¹³ Costs are expressed in 2013 US dollars. Past costs were adjusted to 2013 dollars by use of the average increase in consumer price index for medical services.¹⁴ All future costs were discounted 3% annually.

Table III. Model variables: HRQoL factors

| Variable name | Index value | Range for Monte Carlo simulation | References |
|---|---|---|---|
| Status after thyroid lobectomy | 0.99 | 0.985–0.995 | 4,5 |
| Status after total or completion thyroidectomy | 0.97 | 0.95–0.995 | 4,5 |
| Time-limited complication after lobectomy or total thyroidectomy (1 yr) | 0.94 | 0.90–0.99 | 4,5,15 |
| Permanent unilateral RLN injury | 0.63 | 0.56–0.69 | 4,5,15 |
| Permanent bilateral RLN injury | 0.21 | 0.18–0.23 | 4,5,15 |
| Permanent hypoparathyroidism | 0.78 | 0.70–0.86 | 4,5,15 |

HRQoL, Health-related quality of life; RLN, recurrent laryngeal nerve.

Health-related quality of life (HRQoL) factors were obtained from a review of the literature (Table III). The effectiveness of each clinical outcome in our model was measured in QALYs, obtained by multiplying the HRQoL factor of a disease state by the duration of time a patient spent in that disease state, expressed in years. HRQoLs for health states associated with complications after thyroid resection were derived from a survey of 109 healthy patients by use of the time trade-off method.¹⁵ All future QALYs were discounted 3% annually.
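The QALY calculation and the 3% annual discounting described above can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the function name `discounted_total` is hypothetical, the health state is held constant for the full 38-year horizon, and the first year is left undiscounted, which is only one of several common conventions.

```python
def discounted_total(annual_value, years, rate=0.03):
    """Sum a constant annual value over `years`, discounting year t by (1 + rate)**t.

    Assumption: the first year (t = 0) is undiscounted; other conventions exist.
    """
    return sum(annual_value / (1 + rate) ** t for t in range(years))

HORIZON_YEARS = 38  # time horizon stated in the Methods

# QALYs accrued in the post-lobectomy health state (HRQoL factor 0.99, Table III)
qalys_after_lobectomy = discounted_total(0.99, HORIZON_YEARS)

# Lifetime levothyroxine cost at $205 per year (Table II)
levothyroxine_cost = discounted_total(205, HORIZON_YEARS)

print(f"Discounted QALYs: {qalys_after_lobectomy:.2f}")
print(f"Discounted levothyroxine cost: ${levothyroxine_cost:,.0f}")
```

Because costs and QALYs are discounted at the same rate, one helper serves both; a full model would apply it per health state and per transition rather than to a single constant stream.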

Sensitivity analysis. One-way sensitivity analysis was performed for each outcome probability by the use of a $100,000/QALY threshold. Sensitivity analyses of model probabilities and HRQoL factors were performed for values between 0 and 1, or the maximum values compatible with the model. For sensitivity analysis of costs, we allowed values to vary from $0 to twice the index value. Two-way sensitivity analyses were performed to examine the relationship between the malignancy rate of indeterminate thyroid nodules and the cost of routine GEC testing.

For probabilistic sensitivity analysis, all model variables (probabilities, costs, utilities) were set as static with triangular frequency distributions. We performed 10,000 separate Monte Carlo simulations during which each model variable was assigned a different value within its triangular distribution. The parameters for each probability’s triangular distribution are listed in Tables I–III. The range used for probabilistic sensitivity analysis was determined by the range of values found in the literature review or a maximum variance of ±50% of the index case value.
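The sampling step of such a probabilistic sensitivity analysis can be sketched with Python's standard library. The names `TRIANGULAR_PARAMS`, `half_range`, and `draw_scenario` are hypothetical, only three of the model variables are included, and the decision model that would consume each sampled scenario is not reproduced here; the index value is used as the mode of each distribution, which is an assumption.

```python
import random

# (low, mode, high) for three variables, taken from Tables I and II.
# Assumption: the index value is the mode of each triangular distribution.
TRIANGULAR_PARAMS = {
    "malignancy_rate": (0.06, 0.243, 0.48),
    "gec_specificity": (0.10, 0.60, 0.75),
    "gec_cost": (1750.0, 3500.0, 5250.0),
}

def half_range(index_value):
    """Default range of +/-50% around an index value."""
    return 0.5 * index_value, 1.5 * index_value

def draw_scenario(rng):
    """Draw one value per variable from its triangular distribution."""
    return {
        name: rng.triangular(low, high, mode)  # argument order: low, high, mode
        for name, (low, mode, high) in TRIANGULAR_PARAMS.items()
    }

rng = random.Random(0)  # fixed seed so the sketch is repeatable
scenarios = [draw_scenario(rng) for _ in range(10_000)]
mean_cost = sum(s["gec_cost"] for s in scenarios) / len(scenarios)
print(f"Mean sampled GEC cost: ${mean_cost:,.0f}")
```

In a complete analysis, each of the 10,000 scenarios would be run through the decision model and the preferred strategy tallied across runs.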

RESULTS

GEC testing performance data. From August 2012 to July 2014, a total of 214 indeterminate thyroid nodules in 199 patients underwent GEC testing at the study institution. Morphologic cytology of fine-needle aspirates was Bethesda category III in 187 (87.4%) patients, and Bethesda category IV in 27 (12.6%) patients. GEC results were suspicious in 107 cases (50%), benign in 90 (42.5%), and no result in 16 (7.5%). Among patients with a suspicious GEC test result, 81 (75.7%) underwent surgery, with 50 of 81 (61.7%) found to be malignant. Of the 26 patients (24.3%) who did not undergo surgery despite a suspicious GEC result, 7 were lost to follow-up, observation was recommended in 8 patients because of reassuring sonographic or clinical characteristics, and the remaining 11 patients did not undergo surgery because of patient preference. Among patients with a benign GEC test result, 14 (15.5%) underwent surgery, with 2 of 14 found to be malignant (14.3%). The malignancy rate for all indeterminate nodules was 24.3%. The sensitivity and specificity of GEC testing were 96% and 60%. The negative predictive value and positive predictive value of GEC testing were 97.8% and 44.9%. GEC test performance data subdivided by Bethesda category are reported in Table IV.
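The dependence of predictive values on prevalence can be illustrated with Bayes' theorem. The sketch below is an assumption-laden illustration rather than the study's calculation: the function name `predictive_values` is hypothetical, sensitivity and specificity are held fixed at the reported 96% and 60%, and "no result" tests are ignored. The reported predictive values were derived from observed counts, with unresected nodules assumed benign, so the formula is not expected to match them exactly.

```python
def predictive_values(prevalence, sensitivity=0.96, specificity=0.60):
    """Return (NPV, PPV) from Bayes' theorem for a given prevalence of malignancy."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    npv = true_neg / (true_neg + false_neg)
    ppv = true_pos / (true_pos + false_pos)
    return npv, ppv

malignancy_rate = 52 / 214   # malignant nodules / all nodules tested
sensitivity = 50 / 52        # malignant nodules with a suspicious GEC result

# Prevalence values spanning the 5-49% range cited in the introduction
for prevalence in (0.05, 0.15, 0.243, 0.35, 0.49):
    npv, ppv = predictive_values(prevalence)
    print(f"prevalence {prevalence:5.1%}: NPV {npv:.1%}, PPV {ppv:.1%}")
```

With test characteristics held constant, the negative predictive value falls as prevalence rises, which is the mechanism examined in the sensitivity analyses.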

Table IV. Actual GEC test performance

| Variable | Bethesda category III (AUS/FLUS) | Bethesda category IV (suspicious for FN) | Bethesda categories III and IV |
|---|---|---|---|
| Total no. | 187 | 27 | 214 |
| Prevalence of malignancy | 43 (23%) | 9 (33.3%) | 52 (24.3%) |
| GEC sensitivity | 95.3% | 100% | 96% |
| GEC specificity | 62.5% | 50% | 60% |
| GEC negative predictive value | 97.6% | 100% | 97.8% |
| GEC positive predictive value | 45.5% | 52% | 44.9% |

AUS, Atypia of undetermined significance; FLUS, follicular lesion of undetermined significance; FN, follicular neoplasm; GEC, gene expression classifier.

Table V. One-way sensitivity analyses

| Variable name | Index value | Values where routine GEC testing becomes cost-effective, $100,000/QALY threshold |
|---|---|---|
| Probabilities | | |
| Underlying malignancy rate of resected indeterminate thyroid nodules (Bethesda categories III and IV) | 24.3% | <9.2% |
| Proportion of thyroid nodules observed without GEC testing | 20% | <11.5% |
| Proportion of GEC-suspicious thyroid nodules observed | 0% | >15.8% |
| Probability of thyroid lobectomy (vs total thyroidectomy) without GEC testing | 100% | <96.9% |
| Specificity of GEC testing | 60% | >71.3% |
| Probability of time-limited complication after thyroid lobectomy | 25% | >75.4% |
| Costs | | |
| GEC test | $3,500 | <$2,640 |
| Thyroid lobectomy | $8,667 | >$12,160 |
| HRQoL factors | | |
| Status post thyroid lobectomy | 0.99 | <0.98 |
| Time-limited complication after thyroid lobectomy (1 yr) | 0.94 | <0.82 |

GEC, Gene expression classifier; HRQoL, health-related quality of life; QALY, quality-adjusted life years.

Reference case. Conventional management without GEC testing had an expected cost of $11,119 and produced 22.15 QALYs. Comparatively, management of indeterminate thyroid nodules with routine GEC testing cost $1,197 more and yielded 0.01 additional QALYs. Although routine GEC testing was the more effective strategy, the incremental cost-effectiveness ratio (ICER) was $119,700/QALY, slightly over the $100,000/QALY threshold. Thus, using base-case assumptions, conventional management was the preferred treatment strategy. When the model was applied to Bethesda III and IV nodules separately, routine GEC testing was more effective but more costly than conventional management in both cases, and the ICERs for routine GEC testing were $94,000/QALY and $241,500/QALY, respectively.
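The base-case ICER follows directly from the incremental cost and incremental QALYs reported above. A minimal sketch of the arithmetic, with the hypothetical helper name `icer`:

```python
def icer(incremental_cost, incremental_qalys):
    """Incremental cost-effectiveness ratio in dollars per QALY gained."""
    if incremental_qalys <= 0:
        raise ValueError("ICER is only meaningful here for a positive QALY gain")
    return incremental_cost / incremental_qalys

THRESHOLD = 100_000  # cost-effectiveness threshold, $/QALY

base_case = icer(incremental_cost=1197, incremental_qalys=0.01)
print(f"ICER: ${base_case:,.0f}/QALY")
print("Cost-effective" if base_case <= THRESHOLD else "Not cost-effective")
```

Because the incremental QALY gain is so small (0.01), modest changes in either the numerator or the denominator move the ratio across the threshold.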

Sensitivity analysis. The 3 model variables with the greatest potential impact on model outcomes were (1) the specificity of GEC testing, (2) the underlying prevalence of malignancy, and (3) the proportion of indeterminate thyroid nodules managed with observation during conventional management. Routine GEC testing became the preferred treatment strategy when the specificity of GEC testing exceeded 71.3%. Routine GEC testing also became the preferred treatment strategy when the malignancy rate of indeterminate nodules fell below 9.2%. Finally, routine GEC testing became the preferred treatment strategy when the rate of observation for cytologically indeterminate nodules in the conventional management arm fell below 15.4%. One-way sensitivity analyses of model probabilities capable of independently changing the model outcome appear in Table V. The ICER tornado diagram (Fig 2) depicts one-way sensitivity analyses of the 6 most influential probabilities and costs.

With respect to costs, one-way sensitivity analysis identified 2 costs that could independently change the model outcome: the cost of GEC testing and the cost of thyroid lobectomy. Routine GEC testing became the preferred treatment strategy when the cost of GEC testing fell below $2,640, or when the cost of thyroid lobectomy exceeded $12,160. Two HRQoL factors were capable of independently changing the model outcome: the HRQoL associated with thyroid lobectomy and the HRQoL associated with time-limited complication after thyroid lobectomy. Routine GEC testing became the preferred treatment strategy when the HRQoL after thyroid lobectomy fell below 0.98, or when the HRQoL associated with time-limited complication after a thyroid lobectomy fell below 0.82.

Fig 2. ICER tornado diagram, routine GEC testing compared with conventional management for indeterminate thyroid nodules. Each horizontal bar depicts the incremental cost-effectiveness ratio of routine GEC testing compared with conventional management across the range of values for each variable used in Monte Carlo simulations, noted in Tables I–III. Variables with the widest bars have the greatest potential impact on model outcome.

Fig 3. Two-way sensitivity analysis of cost of GEC testing compared with underlying malignancy rate of indeterminate thyroid nodules. The white area represents all combinations of cost of GEC testing and malignancy rate where conventional management is more cost-effective. The shaded (gray) area represents all combinations of cost of GEC testing and malignancy rate where routine GEC testing is more cost-effective. The strategy that yields the greatest amount of utility, measured in QALYs, at a cost less than $100,000/QALY is considered more cost-effective.


Two-way sensitivity analysis was performed to assess the relationship between malignancy rate and the cost-effectiveness of routine GEC testing. We found that as underlying malignancy rate increased, the test cost threshold below which routine GEC testing became cost-effective decreased (Fig 3). At malignancy rates of 10, 15, 20, 25, 30, and 35%, routine GEC testing became cost-effective when the cost of GEC testing fell below $3,453, $3,167, $2,881, $2,595, $2,309, or $2,023, respectively. Probabilistic sensitivity analysis demonstrated that conventional management was the preferred strategy compared with routine GEC testing in 53.2% of 10,000 Monte Carlo simulations (Fig 4).
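The cost thresholds reported for the two-way sensitivity analysis fall by a constant $286 for every 5-percentage-point rise in malignancy rate, so intermediate values can be estimated by linear interpolation. The sketch below interpolates the reported points only; it is not the decision model itself, and the function names `cost_threshold` and `routine_gec_cost_effective` are hypothetical.

```python
# Reported pairs: malignancy rate (%) -> GEC test cost below which routine
# testing became cost-effective (two-way sensitivity analysis).
REPORTED_THRESHOLDS = [
    (10, 3453), (15, 3167), (20, 2881), (25, 2595), (30, 2309), (35, 2023),
]

def cost_threshold(malignancy_rate_pct):
    """Linearly interpolate the GEC cost threshold between reported points."""
    points = REPORTED_THRESHOLDS
    if not points[0][0] <= malignancy_rate_pct <= points[-1][0]:
        raise ValueError("rate outside the reported 10-35% range")
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= malignancy_rate_pct <= x1:
            return y0 + (y1 - y0) * (malignancy_rate_pct - x0) / (x1 - x0)

def routine_gec_cost_effective(gec_cost, malignancy_rate_pct):
    """True when the test cost is below the interpolated threshold."""
    return gec_cost < cost_threshold(malignancy_rate_pct)

print(round(cost_threshold(24.3)))
print(routine_gec_cost_effective(3500, 24.3))
```

At the index malignancy rate of 24.3%, the interpolated threshold is about $2,635, close to the $2,640 obtained in the one-way sensitivity analysis of GEC test cost. An institution could substitute its own malignancy rate and negotiated test price to get a first approximation.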

DISCUSSION

At the index institution, the underlying prevalence of malignancy was 24.3%. GEC testing had a sensitivity of 96% and specificity of 60%. When these values were applied to our cost-effectiveness analysis, the decision model found that routine GEC testing was more effective but also more costly than conventional management. The incremental cost-effectiveness ratio was $119,700/QALY gained, making routine GEC testing not cost-effective based upon a $100,000/QALY threshold.

During probabilistic sensitivity analysis, model variables were sampled randomly along a predefined distribution of values to generate each unique simulation, thereby applying our model to 10,000 different possible clinical scenarios. This analysis revealed that conventional management was the preferred strategy compared with routine GEC testing in 53.2% of 10,000 Monte Carlo simulations. Put another way, within the universe of possible clinical scenarios, conventional management of indeterminate thyroid nodules would be preferred over routine GEC testing in a little over half of instances. Thus, our cost-effectiveness analysis cannot be confidently applied to other institutions whose site-specific probabilities and costs do not match our own, leading us to conclude that each individual institution must assess its own mosaic of probabilities and costs to determine the most cost-effective local strategy.

Fig 4. Probabilistic sensitivity analysis, routine GEC testing compared with conventional management for indeterminate thyroid nodules. Data points represent unique combinations of model probabilities, costs, and health utilities, meant to simulate possible clinical scenarios. Index values refer to the combination of variables used in the base case analysis.


Previous cost-effectiveness analyses have found routine GEC testing for indeterminate thyroid nodules to be cost-effective compared with conventional management.⁴,⁵ We note some important differences in model variables used by Li et al⁴ compared with our own. First, Li et al assumed the specificity of GEC testing for indeterminate thyroid nodules to be 75%, with a range of 64–91%. This high level of specificity exceeds values reported in the literature, which range from 10 to 63%.³,¹⁶,¹⁷ Second, the permanent complication rate after thyroid lobectomy, ie, the rate of permanent recurrent laryngeal nerve injury, was assumed to be 5%. This exceeds the rate of permanent RLN injury associated with both high-volume surgeons (<1%) and unselected surgeons (1–3%) reported in large-scale studies.²³⁻²⁵ Both of these assumptions heavily bias their model in favor of routine GEC testing. A second cost-effectiveness analysis of routine GEC testing for Bethesda category III thyroid nodules by Lee et al assumed GEC sensitivity and specificity to be 90% and 53.1%, more in keeping with values reported in current literature.³,⁵ In their probabilistic sensitivity analysis, conventional management was the preferred treatment strategy in approximately 20% of possible clinical scenarios, routine GEC testing was the preferred strategy in less than 10%, and the 3 remaining alternate strategies using gene mutation panel testing were each preferred in 20–35% of cases. No single strategy was superior in more than 35% of possible clinical scenarios.⁵ These findings are consistent with our own findings, which emphasize that the preferred treatment strategy for indeterminate thyroid nodules likely depends on local factors.

Taken together, these reports and ours demonstrate that small deviations in certain variables can heavily influence the cost-effectiveness of routine GEC testing. The 2 probabilities with the greatest potential impact on the model outcome were GEC test specificity and the underlying prevalence of malignancy. Only 2 costs could independently alter model outcomes: the cost of GEC testing and the cost of thyroid lobectomy. Intuitively, the ICER of routine GEC testing varies directly with the cost of the GEC test and varies inversely with the cost of thyroid lobectomy. Also, the quality of life after thyroid lobectomy and the quality of life associated with a time-limited complication of thyroid lobectomy were the only 2 HRQoL factors that could independently alter model outcomes. Both of these HRQoL factors varied inversely with the ICER of routine GEC testing.


We attribute the strong influence of GEC test specificity on our model to the wide range of values for GEC specificity found in the literature and to the fact that all false-positive GEC test results drive up the rate of unnecessary surgery, a major contributor to costs. We note that, over time, the specificity values reported in the literature are converging towards the 50–60% range as the quality and sample size of published studies increase. At our institution, GEC testing of 214 indeterminate nodules had a specificity of 60%. This finding is congruent with the large-scale, multicenter study by Alexander et al³ published in 2012, in which they examined 577 Bethesda category III, IV, and V thyroid nodules and reported a GEC test specificity of 52%. By definition, specificity is intrinsic to test performance and does not vary with the prevalence of disease; hence we expect that the body of literature will progressively approach the true specificity of GEC testing moving forward.

The underlying prevalence of malignancy among indeterminate nodules has been previously recognized to be an important determinant of cost-effectiveness.⁴,⁵ Conceptually, a treatment center with a high malignancy rate approaching 50% will receive a high proportion of GEC-suspicious results, which does not add value to patient care, as the next step in management (diagnostic thyroid lobectomy) is unchanged from conventional management of indeterminate nodules. Such a center may elect to bypass GEC testing altogether in favor of proceeding directly to thyroid lobectomy after morphologic cytology. Conversely, a center with a relatively low malignancy rate (<10%) would likely favor routine GEC testing, which would allow a large proportion of patients to avoid unnecessary surgery after a negative GEC result. With an underlying malignancy rate of 24.3%, routine GEC testing at our institution became the preferred treatment strategy only when the cost of GEC testing fell below $2,640. The actual cost of GEC testing is $3,500.
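The relationship between malignancy rate and the share of GEC-suspicious results can be sketched numerically. This is an illustration under stated assumptions: sensitivity and specificity are fixed at the values observed at our institution (96% and 60%), "no result" tests are ignored, and the function name `fraction_suspicious` is hypothetical.

```python
def fraction_suspicious(prevalence, sensitivity=0.96, specificity=0.60):
    """Expected share of tested nodules returning a GEC-suspicious result."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos + false_pos

for prevalence in (0.10, 0.243, 0.50):
    suspicious = fraction_suspicious(prevalence)
    print(f"prevalence {prevalence:.1%}: {suspicious:.1%} suspicious, "
          f"{1 - suspicious:.1%} benign")
```

Under these assumptions, the share of patients who could avoid surgery after a benign result shrinks as the malignancy rate rises, while the cost of testing is incurred for every patient.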

Our study has several limitations. The decision whether to resect or observe a thyroid nodule is a complex medical decision that cannot be fully captured in a necessarily simplified decision tree. For example, we assumed that all patients with GEC-suspicious thyroid nodules underwent surgery, all GEC-benign thyroid nodules were observed initially, and all patients with false-negative GEC tests would later be identified as having malignant disease and treated appropriately. These assumptions, which were made to purposefully bias the model in favor of routine GEC testing, reinforce our conclusion that routine GEC testing is not cost-effective under base-case assumptions. Furthermore, our model was constructed to best match actual practice at our institution, and may not reflect practice elsewhere. Although we recognize repeat biopsy is recommended for Bethesda III nodules in lieu of GEC testing by the guidelines of the 2009 American Thyroid Association, this has not been performed routinely at our institution, even prior to the adoption of GEC testing. This practice was not incorporated into our model, but would add the cost of an additional fine-needle aspiration procedure to conventional management of Bethesda III nodules, approximately $253.¹⁸

Finally, we considered foci of micropapillary thyroid carcinoma (microPTC) found in the same sector of the thyroid where GEC testing was performed to be evidence of malignancy. This is an imperfect method, given that the presence of an incidental microPTC in the same sector as a biopsied nodule may introduce inaccuracies. In the first year of data collection, we attempted to examine needle-track scars microscopically to determine whether the microPTC and the previously biopsied nodule were one and the same, but needle-track scars could not be identified in some cases. Thus, we found subdividing the thyroid into 6 sectors was the most reliable method of resolving these cases.

We conclude that the cost-effectiveness of routine GEC testing varies inversely with the underlying prevalence of malignancy in the tested population. In our experience, GEC testing has proven to be valuable to a large number of individual patients; however, the overall strategy of routine GEC testing has not proven cost-effective. In the absence of a significant decrease in the cost of GEC testing, improved risk stratification to select low-risk patients to undergo GEC testing while channeling high-risk patients directly to surgery may emerge as the superior treatment paradigm. Individual institutions should develop a working knowledge of the underlying prevalence of malignancy within their patient population to accurately assess the cost-effectiveness of routine GEC testing for indeterminate thyroid nodules in their local environment.
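The inverse relationship with prevalence can be made concrete with a minimal sketch based on Bayes' rule. It uses the sensitivity (96%) and specificity (60%) observed in this study and treats them as fixed across prevalences, which is an assumption; the function and key names are hypothetical, and the calculation is not part of the decision tree model.

```python
# Illustrative sketch: how malignancy prevalence shifts the yield of a
# rule-out test. Sensitivity and specificity are the values reported in this
# study; holding them constant across prevalences is an assumption.

SENSITIVITY = 0.96
SPECIFICITY = 0.60


def gec_yield(prevalence: float) -> dict:
    """Return expected result fractions and NPV at a given malignancy rate."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    true_neg = SPECIFICITY * (1.0 - prevalence)    # benign nodule, GEC-benign
    false_neg = (1.0 - SENSITIVITY) * prevalence   # malignant nodule, GEC-benign
    benign_fraction = true_neg + false_neg         # results that may avert surgery
    return {
        "gec_benign": benign_fraction,
        "gec_suspicious": 1.0 - benign_fraction,
        "npv": true_neg / benign_fraction,
    }


if __name__ == "__main__":
    for prev in (0.10, 0.243, 0.50):
        y = gec_yield(prev)
        print(
            f"prevalence {prev:.1%}: GEC-benign {y['gec_benign']:.1%}, "
            f"GEC-suspicious {y['gec_suspicious']:.1%}, NPV {y['npv']:.1%}"
        )
```

With these inputs, the expected share of GEC-benign results falls from about 54% at a 10% malignancy rate to 32% at a 50% malignancy rate, while the negative predictive value falls from about 99% to about 94%. Fewer patients are spared surgery for the same testing cost as prevalence rises.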

REFERENCES

1. Hegedüs L. The thyroid nodule. N Engl J Med 2004;351:1764-71.

2. Baloch ZW, LiVolsi VA, Asa SL, Rosai J, Merino MJ, Randolph G, et al. Diagnostic terminology and morphologic criteria for cytologic diagnosis of thyroid lesions: A synopsis of the National Cancer Institute Thyroid Fine-Needle Aspiration State of the Science Conference. Diagn Cytopathol 2008;36:425-37.

3. Alexander EK, Kennedy GC, Baloch ZW, Cibas ES, Chudova D, Diggans J, et al. Preoperative diagnosis of benign thyroid nodules with indeterminate cytology. N Engl J Med 2012;367:705-15.

4. Li H, Robinson KA, Anton B, Saldanha IJ, Ladenson PW. Cost-effectiveness of a novel molecular test for cytologically indeterminate thyroid nodules. J Clin Endocrinol Metab 2011;96:E1719-26.

5. Lee L, How J, Tabah RJ, Mitmaker EJ. Cost-effectiveness of molecular testing for thyroid nodules with atypia of undetermined significance cytology. J Clin Endocrinol Metab 2014;99:2674-82.

6. Wang CC, Friedman L, Kennedy GC, Wang H, Kebebew E, Steward DL, et al. A large multicenter correlation study of thyroid nodule cytopathology and histopathology. Thyroid 2011;21:243-51.

7. Cibas ES, Ali SZ. The Bethesda system for reporting thyroid cytopathology. Am J Clin Pathol 2009;132:658-65.

8. Hoyert DL, Xu J. Deaths: preliminary data for 2011. Natl Vital Stat Rep 2012;61:1-51.

9. Grosse SD. Assessing cost-effectiveness in healthcare: history of the $50,000 per QALY threshold. Expert Rev Pharmacoecon Outcomes Res 2008;8:165-78.

10. Duick DS, Klopper JP, Diggans JC, Friedman L, Kennedy GC, Lanman RB, et al. The impact of benign gene expression classifier test results on the endocrinologist–patient decision to operate on patients with thyroid nodules with indeterminate fine-needle aspiration cytopathology. Thyroid 2012;22:996-1001.

11. Physician Fee Schedule Search Tool. 2014. Available from: http://www.cms.gov/apps/physician-fee-schedule/search/search-criteria.aspx. Accessed August 4, 2014.

12. HCUP National Inpatient Sample (NIS). Rockville, MD: Agency for Healthcare Research and Quality; 2012.

13. Levothyroxine: Drug Information. 2015 [subscription required]. Available from: http://www.uptodate.com/contents/levothyroxine-drug-information. Accessed August 4, 2014.

14. Consumer Price Index - Medical Care. Bureau of Labor Statistics; 2014. Available from: http://data.bls.gov/timeseries/CUUR0000SAM?output_view=pct_12mths. Accessed August 4, 2014.

15. Sejean K, Calmus S, Durand-Zaleski I, Bonnichon P, Thomopoulos P, Cormier C, et al. Surgery versus medical follow-up in patients with asymptomatic primary hyperparathyroidism: a decision analysis. Eur J Endocrinol 2005;153:915-27.

16. McIver B, Castro MR, Morris JC, Bernet V, Smallridge R, Henry M, et al. An independent study of a gene expression classifier (Afirma) in the evaluation of cytologically indeterminate thyroid nodules. J Clin Endocrinol Metab 2014;99:4069-77.

17. Chudova D, Wilde JI, Wang ET, Wang H, Rabbee N, Egidio CM, et al. Molecular classification of thyroid nodules using high-dimensionality genomic data. J Clin Endocrinol Metab 2010;95:5296-304.

18. Khalid AN, Hollenbeak CS, Quraishi SA, Fan CY, Stack BC. The cost-effectiveness of iodine 131 scintigraphy, ultrasonography, and fine-needle aspiration biopsy in the initial diagnosis of solitary thyroid nodules. Arch Otolaryngol Head Neck Surg 2006;132:244-50.

19. Alexander EK, Schorr M, Klopper J, Kim C, Sipos J, Nabhan F, et al. Multicenter clinical experience with the Afirma gene expression classifier. J Clin Endocrinol Metab 2013;99:119-25.

20. Kiernan CM, Broome JT, Solorzano CC. The Bethesda system for reporting thyroid cytopathology: a single-center experience over 5 years. Ann Surg Oncol 2014;21:3522-7.

21. VanderLaan PA, Marqusee E, Krane JF. Clinical outcome for atypia of undetermined significance in thyroid fine-needle aspirations: should repeated FNA be the preferred initial approach? Am J Clin Pathol 2011;135:770-5.

22. Cooper DS, Doherty GM, Haugen BR, Kloos RT, Lee SL, Mandel SJ, et al. Revised American Thyroid Association management guidelines for patients with thyroid nodules and differentiated thyroid cancer: the American Thyroid Association (ATA) guidelines taskforce on thyroid nodules and differentiated thyroid cancer. Thyroid 2009;19:1167-214.

23. Bergenfelz A, Jansson S, Kristoffersson A, Martensson H, Reihner E, Wallin G, et al. Complications to thyroid surgery: results as reported in a database from a multicenter audit comprising 3,660 patients. Langenbeck’s Arch Surg 2008;393:667-73.

24. Sosa JA, Bowman HM, Tielsch JM, Powe NR, Gordon TA, Udelsman R. The importance of surgeon experience for clinical and economic outcomes from thyroidectomy. Ann Surg 1998;228:320.

25. Dralle H, Sekulla C, Haerting J, Timmermann W, Neumann HJ, Kruse E, et al. Risk factors of paralysis and functional outcome after recurrent laryngeal nerve monitoring in thyroid surgery. Surgery 2004;136:1310-22.

26. Rosato L, Avenia N, Bernante P, De Palma M, Gulino G, Nasi PG, et al. Complications of thyroid surgery: analysis of a multicentric study on 14,934 patients operated on in Italy over 5 years. World J Surg 2004;28:271-6.

27. Cassibba S, Pellegrino M, Gianotti L, Baffoni C, Baralis E, Attanasio R, et al. Silent renal stones in primary hyperparathyroidism: prevalence and clinical features. Endocr Pract 2014;20:1137-42.

DISCUSSION

Dr Paul G. Gauger (Ann Arbor, MI): My question is simple. As I look at your model and wonder about the patients who went into it, we have found that sometimes you will spend the money on the GEC testing and patients will still make a decision to have an operation based on local symptoms or family history or anxiety that persists, so I don’t think it’s just the background malignancy rate that threatens the model, but tell me about the vagaries of patient decision-making in your patients and how that would have affected it.

Dr James X. Wu: First, to clarify, no actual patient data go into the model except for GEC performance data. All the other model data come from literature review. We used cost from the Medicare physician fee schedule as well as the Healthcare Cost and Utilization Project inpatient cost data. To get more to the point of your question, when we build our model, we have to make a conscious decision of where we are going to bias it. We intentionally biased it toward favoring routine GEC testing because that is what was shown to be most cost-effective in 2 previous cost-effectiveness analyses. Thus, in our model, all patients who were GEC-suspicious underwent surgery. All patients who were GEC-benign were observed. There were no patients who, despite being GEC-benign, received surgery anyway, which would have increased costs for that arm. I think that only speaks more strongly to the conclusion that routine GEC testing is not universally cost-effective for all centers.

Dr Paul G. Gauger: Do you share that concern, though, in your own practice?

Dr James X. Wu: I think it’s a definite concern that many real-world clinical factors will result in less than perfect use of GEC testing, invariably decreasing its cost-effectiveness. It should be noted, however, that this cost-effectiveness model, like all cost-effectiveness models, is not meant to guide decision-making for an individual patient but may guide future studies of best practices.

Dr Salem Noureldine (Baltimore, MD): I have no disclosures. I have 2 questions. Did you correlate the cytology with radiology and histology reports to determine whether the nodule was indeed malignant or not? The second question is that you say that the malignancy rate at your institution for indeterminate nodules was 24% and therefore GEC testing was not cost-effective. However, I would like to point out that you may not have captured the true malignancy rate at your institution because you only know the outcome of the suspicious indeterminate nodules. You didn’t know the outcome for the other indeterminate nodules, the benign, for example, the Afirma benign ones or the Quest benign ones. Therefore, the malignancy rate at your institution may well be less than 24%. As you showed, the malignancy rate for the atypia of undetermined significance (AUS) nodules per the Bethesda classification is between 5 and 15%. Are you saying that routine or reflexive GEC testing for every AUS nodule is okay or cost-effective? Again, I would like to echo the point of what’s the value in getting this test in every patient? For example, for a patient who has compressive symptoms from this nodule but has an AUS cytology, or a patient in whom you see clinical or radiological suspicion for malignancy, why would you get a rule-out test?

Dr James X. Wu: Something that I didn’t expressly get into is that when we make these models, we always first define a reference case, an index case that all of the data entry, all of the model, is based around. We try to base it around a clinical scenario in which we have a 45-year-old woman who has a 2-cm lesion and no suspicious findings on ultrasonography, and your real clinical decision-making is going to be solely dependent on the GEC test itself, because that’s really the scenario in which GEC testing may give you the most added value. For the first question, we simply looked at the final operative pathology of all the patients who underwent operation. For all the patients who were GEC-benign and did not undergo operation, we chose to consider those nodules benign. I know that’s a limitation and a shortcoming of many recent series looking at the performance of Veracyte data. We did include a study by McIver that was an independent study of Veracyte performance, where they calculated malignancy rate and GEC performance data using only patients that had histologic confirmation. That has its own biases, because if you are operating on benign nodules, there’s probably something telling you that you should do surgery, such as other clinical features that are suggestive of possible malignancy. They found GEC test specificity of 10%. We did incorporate that into our model as a possible value. In terms of the malignancy rate, that kind of gets at the heart of the same question.

Dr Mira Milas (Portland, OR): No disclosures. The question for which I came to the microphone has in some shape already been raised, but maybe I would still like a clearer answer or confirmation that your definition of "routine" is that you would get the GEC automatically at the time of the first FNA on all patients.

Dr James X. Wu: Right. We did our best to model what is actually happening at our institution. If there is a fine-needle aspiration with Bethesda category 3 or 4, the pathologists reflexively order the Veracyte on those lesions.

Dr Mira Milas: I would just then propose an approach of caution. None of these tests are for serum potassium levels or serum calcium levels. Even on the audience response question, each of those molecular markers may be appropriate for the context. Even before you did your modeling, which may be relevant for other reasons, I think future work would really benefit from realizing that all of these tests need to be ordered in the context of what the patients need.

Dr James X. Wu: I think that’s a great point. Right now, we are looking into situations where reflex GEC testing is wasteful or inappropriate, and how GEC testing can be better utilized. When we audited our own Veracyte data, we noticed that for patients with lesions greater than 4 cm or nodules with very suspicious ultrasonographic characteristics, some are advised to undergo surgical resection despite a benign GEC result. If GEC testing will not impact management, it should not be ordered. On the other hand, it also made us question whether surgery despite a GEC-benign result is appropriate, even when the nodule is greater than 4 cm. Was the Veracyte accurate despite those worrisome characteristics? Can you trust a benign GEC result? I think that’s going to be a focus of future studies.

Dr Carrie C. Lubitz (Boston, MA): No disclosure. I have a couple of specific questions about the inputs that you put into your model. Specifically, how did you inform your quality of life or utility weights, and where did you obtain those data from, or did you have your own primary data? Second, what did you do with those patients, or did you input this into the model, who would potentially have a false-negative Afirma? Did you follow those patients? Were those patients able to recur in your model?

Dr James X. Wu: Let me address the second question first. In terms of false negatives, we did enter sensitivity and specificity into the model, so of course there were going to be some false negatives that resulted. To simplify our model, we made the assumption that all false negatives would be caught in the subsequent year and then receive appropriate surgical intervention without those nodules themselves progressing to more advanced disease. So that is a simplification of our model.

Dr Carrie C. Lubitz: Did you integrate the recurrent costs of another fine-needle aspiration and another ultrasonography?

Dr James X. Wu: Yes, all those added costs were accounted for. In terms of the quality of life data, I think that raises an important point. The health-related quality of life factors that we used mostly came from prior cost-effectiveness studies by Ladenson, Mitmaker, also by Zanocco and Sturgeon. I think that is a big area where there is a deficiency in the data. I think that’s a big area that needs to be filled in.

Dr Sally E. Carty (Pittsburgh, PA): A point, a question, and a request. The American Thyroid Association is bringing out in July, in Thyroid, the implementation of molecular markers for surgeons, a consensus statement that addresses the point that you made partially, which is that the malignancy rate in a given cytologic category varies with a lot more than referral pattern. It varies with the willingness of the cytologist to call it benign or cancer, it varies with the regional cancer prevalence, and it varies with other things as well. You have a methodologic issue, I think, in this study because the indeterminate categories are not just Bethesda III and IV; it’s Bethesda V that’s indeterminate as well, suspicious follicular neoplasm and follicular lesion of undetermined significance. A program needs to know its malignancy rate in each of those categories to assess whether GEC or any other test is valuable. My question is, because your abstract states that routine GEC testing is cost-effective when the malignancy rate of indeterminate nodules is less than 14%, you have shown, then, haven’t you, that it’s never cost-effective? Because less than 14% is highly unlikely. That’s my question. Then my request is, Dr. Kloos, who has valiantly been an endocrinologist at our meeting all along, has a really quick question when you’re done.

Dr James X. Wu: With regard to including Bethesda category V nodules, I agree that those are also indeterminate. But I think that at least at University of California, Los Angeles, their risk of malignancy is considered high enough that the endocrine surgeons would feel strongly about just proceeding to diagnostic lobectomy without any further testing for category V. This is why we limited our analysis of GEC testing to categories III and IV. Regarding the low threshold value for malignancy rate where routine GEC testing becomes cost-effective in our model, there are several scenarios where routine GEC testing can be cost-effective. First, certain community centers may have malignancy rates below our cost-effectiveness threshold, and simply need to audit their own GEC test performance to confirm it is indeed cost-effective. For centers with higher malignancy rates, these institutions can stratify who they are testing so that they are reducing their pretest probability to less than 14%. Please also keep in mind that when we set our threshold, $100,000 per quality-adjusted life-year, that was rather arbitrary. It’s based on the amount that Medicare will pay for dialysis, adjusted for inflation. Actual health care spending in the United States is probably $300,000 per quality-adjusted life-year. Depending on how wealthy the patient is, they may be willing to spend a lot more for that little gain in quality. I think our results must be viewed in that context.

Dr Richard T. Kloos (South San Francisco, CA): My disclosure is that I am the Senior Medical Director of Endocrinology at Veracyte, Inc. The issues of cost-effectiveness then include a long list of costs, the direct list of costs, and we didn’t give you time to show the many factors that you would have had to have figured in for that cost. In looking at that literature, I see a lot of these where the list is incomplete, where there are many factors left off, adding up into your direct costs. In addition to direct costs, there are the indirect costs of how long the patient is out from work, unable to do his or her normal function. I don’t know if your analysis will have included time missed from those activities of working or care of other jobs that they have to do. Then, the third factor, which I think was mentioned a little bit, is how do you value to the patient avoiding an operation, so that if each person in the room is offered a chance of "I have a 50-50 chance to get you out of an operation; what’s that worth to you personally?" Is that type of value figured into your analysis?

Dr James X. Wu: To answer the first comment, you’re right, we did only address direct medical costs. Indirect costs such as lost productivity or intangible costs such as anxiety about surgery or anxiety about having an unknown nodule are not included in our model. I think the reason why those are not included is because those concerns, to address your second comment, are not that well characterized yet in this patient population. I think that’s a big area that could be looked into. Not that they are perfectly parallel, but I’m also involved with a lot of work currently looking at nonoperative management of appendicitis. So far, with that early study, we found that a lot of patients, even though you tell them, take antibiotics, you will feel better, you can go home, you’re cured, even though they have no symptoms, a lot will come back requesting an interval appendectomy because they are worried. While patient anxieties about undergoing surgery and recovering from surgery are very real, conversely, the patient’s concern about having a mass with an unknown, albeit low, malignant potential in their body is equally real and can cause an equal amount of anxiety.