


CLINICAL GASTROENTEROLOGY AND HEPATOLOGY 2012;10:657–663

Liver Biopsy Analysis Has a Low Level of Performance for Diagnosis of Intermediate Stages of Fibrosis

THIERRY POYNARD,* GILLES LENAOUR,* JEAN CHRISTOPHE VAILLANT,* FREDERIQUE CAPRON,* MONA MUNTEANU,‡

DANIEL EYRAUD,* YEN NGO,‡ HELMI M’KADA,* VLAD RATZIU,* LAURENT HANNOUN,* and FREDERIC CHARLOTTE*

*Assistance Publique Hôpitaux de Paris, Université Pierre et Marie Curie Liver Center; and ‡Biopredictive, Paris, France


BACKGROUND & AIMS: There is controversy about the performance of noninvasive tests such as FibroTest in diagnosing intermediate stages of fibrosis. We investigated whether this controversy results from limitations of biopsy analysis for intermediate-stage fibrosis and inappropriate determination of the standard area under the receiver-operator characteristic curve (AUROC).

METHODS: To determine whether biopsy has a lower diagnostic performance for fibrosis stage F2 (few septa) vs F1 (fibrosis without septa), compared with its performance for F1 vs F0 or F4 vs F3, we determined the fibrotic areas of large surgical samples collected from 20 consecutive patients with chronic liver disease or normal liver tissue that surrounded tumors. We analyzed digitized images of 27,869 virtual biopsies of increasing length and also analyzed data from 6500 patients with interpretable FibroTest results who also underwent biopsy analysis.

RESULTS: The overall performance of biopsy analysis (by Obuchowski measure) increased with biopsy length from 0.885 for 5-mm to 0.912 for 30-mm samples (P < .0001). The performance of biopsy was lower for the diagnosis of F2 vs F1 samples (weighted AUROC [wAUROC] = 0.505) than for F1 vs F0 (wAUROC = 0.773; 53% difference; P < .0001) or F4 vs F3 (wAUROC = 0.700; 39% difference; P < .0001), even when 30-mm biopsy samples were used. The performance of FibroTest was also lower for the diagnosis of F2 vs F1 samples (wAUROC = 0.512) than for F1 vs F0 samples (wAUROC = 0.626; 22% difference; P < .0001) or F4 vs F3 (wAUROC = 0.628; 23% difference; P < .0001). However, the FibroTest had smaller percentage differences among wAUROC values than biopsy.

CONCLUSIONS: Biopsy has a low level of diagnostic performance for fibrosis stages F2 and F1. The recommendation for biopsy analysis, instead of a validated biomarker panel such as FibroTest, for the diagnosis of intermediate stages of fibrosis is therefore misleading.

Keywords: Biomarkers; FibroTest; FibroSure; Gray Zone; Obuchowski Measure.

Two noninvasive fibrosis biomarkers have been extensively validated [1,2]: FibroTest (Biopredictive, Paris, France) [3–7] and liver stiffness measurement by elastography (Fibroscan; Echosens, Paris, France) [4,8]. They are widely used in countries where they are available for the staging of patients with chronic liver diseases [8,9]. However, there is still controversy concerning their performance for the diagnosis of intermediate fibrosis stages in chronic liver disease [1,3], with 2 opposing recommendations.

The majority of reviews, from the first in 2002 [10] to the most recent in 2011 [1], have stated that “serum markers are best at predicting no or minimal fibrosis (METAVIR score F0–F1) or at predicting extensive fibrosis/cirrhosis (score F3–F4), and are poor predictors of intermediate levels of fibrosis (score F1–F2).” Therefore, the recommendation of these authors is to still perform biopsy after biomarkers such as FibroTest have predicted F1 or F2.

Independent from the tests’ inventors, a very small number of other authors recommend validated biomarkers such as FibroTest as first-line estimates of fibrosis and that decisions be made for all predicted fibrosis stages, discerning no difference between stage F1 or F2 compared with stages F0, F3, or F4. This is the strategy that has been recommended by the French Health Authority since 2006 [3].

Our group has published several multivariate fibrosis tests since 1991, including FibroTest in 2001 [3]. During this period, biopsy was considered to be a near perfect gold standard, and we supported the recommendation that biopsy be performed for the intermediate stages. Subsequently, cumulative evidence-based data convinced us that this recommendation was misleading because of the limitations of biopsy, which is not a gold standard, as well as inappropriate interpretation of the standard area under the receiver-operator characteristic curves (AUROC) [3,11–15].

The aim of the present study was to demonstrate that relative to biopsy performance, a biomarker such as FibroTest does not have lower diagnostic performance for METAVIR stage F2 (few septa) vs F1 (fibrosis without septa) compared with its performance for the diagnosis of F1 vs F0 or F4 vs F3. To support this claim, we first demonstrated, by using large surgical biopsies as a true gold standard, that biopsy had a lower diagnostic performance for F2 vs F1 compared with F1 vs F0 and F4 vs F3. We then demonstrated, by using a large integrated patient database, that FibroTest had a profile similar to that of biopsy for the diagnosis of F2 vs F1 compared with F1 vs F0 and F4 vs F3, but with a relatively lower decrease in the so-called gray zone.

Abbreviations used in this paper: AF, area of fibrosis; ALD, alcoholic liver disease; AUROC, area under the receiver-operator characteristic curve; CHB, chronic hepatitis B; CHC, chronic hepatitis C; HBV, hepatitis B virus; HCV, hepatitis C virus; NAFLD, nonalcoholic fatty liver disease; sAUROC, standard area under the receiver-operator characteristic curve; wAUROC, weighted area under the receiver-operator characteristic curve.

© 2012 by the AGA Institute
1542-3565/$36.00

http://dx.doi.org/10.1016/j.cgh.2012.01.023


658 POYNARD ET AL CLINICAL GASTROENTEROLOGY AND HEPATOLOGY Vol. 10, No. 6

Methods

Patients

Large surgical samples of resected livers from 20 consecutive patients with chronic liver diseases or normal liver surrounding tumors were prospectively included (Supplementary Table 1). These patients must have accepted and signed a consent form and undergone hepatectomy or liver transplantation. Resections were performed mainly because of liver tumors, and samples were taken from a distance at least 3 cm away from the tumor.

We combined new individual data with a previous integrated database [16] after exclusion of duplicates to assess the performance of FibroTest between adjacent stages. In these studies informed consent was obtained for all patients. FibroTest was performed according to published recommendations by using the standard cutoffs [9]. This study was conducted according to the principles expressed in the Declaration of Helsinki.

Biopsies

Liver surgical samples were fixed in formalin and paraffin embedded. Six paraffin blocks were chosen for each case. One 3-µm-thick section was taken from each block to produce 3 glass microscope slides. Each of these slides was built from 2 side-by-side sections of liver tissue, yielded an average area of 4 × 1.7 cm, and was stained with picrosirius red, which was performed in the same batch experiment. These sections were used for determination of the reference value of fibrosis and image analysis. According to the METAVIR scoring system [11], the study included 5 cases without fibrosis (F0), 2 with fibrosis without septa (F1), 4 with portal fibrosis and few septa (F2), 2 with septal fibrosis without cirrhosis (F3), and 7 with cirrhosis (F4). The METAVIR scoring system was used for patients with chronic hepatitis C (CHC) and chronic hepatitis B (CHB); for nonviral diseases we used an “extrapolated METAVIR” with stage F1 “minimal fibrosis regardless of its location but without septa” as previously described [6,7,17,18] (Supplementary Materials and Methods).

Image Analysis

Three slides from each large surgical specimen were scanned to produce digital slides (0.25 µm/pixel at 40× magnification) by using the Aperio Slide Scanning System (ScanScope CS; Aperio Technologies Inc, Vista, CA). The slides were viewed on a Barco Coronis fusion 6MP high-definition screen (Barco NV, Kortrijk, Belgium) to determine the detection thresholds and were analyzed with ICS software (Tribvn SA, Chatillon, France). The surface of the section occupied by the parenchyma was used as the measuring frame (reference area). The surface occupied by fibrosis (area of fibrosis [AF]) was measured within the earlier defined frame. The reference value of AF was measured on the total surgical sample. The whole digital section was overlaid with rectangular regions (5 × 1 mm). Multiple virtual biopsies were reconstructed by digital cutting out and juxtaposition of these elementary rectangular regions, mimicking needle biopsies. The same analysis was performed for biopsy specimens from 5 mm to 30 mm in length (Supplementary Materials and Methods).
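The reconstruction of virtual biopsies from elementary rectangles can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure implemented in the ICS software: the function names, the summary of each 5 × 1 mm rectangle as a (fibrosis area, parenchyma area) pair, and the sliding-window way of joining neighbouring rectangles are all hypothetical choices made for the example.

```python
from typing import List, Tuple

# Each elementary 5 x 1 mm rectangle is summarised by two areas (same units):
# (area occupied by fibrosis, area occupied by parenchyma = reference area).
Tile = Tuple[float, float]


def area_of_fibrosis(tiles: List[Tile]) -> float:
    """Return AF (%) = fibrosis area / reference area for a set of tiles."""
    fibrosis = sum(f for f, _ in tiles)
    reference = sum(r for _, r in tiles)
    if reference == 0:
        raise ValueError("reference area is zero")
    return 100.0 * fibrosis / reference


def virtual_biopsies(tiles: List[Tile], length_mm: int, tile_mm: int = 5) -> List[float]:
    """AF (%) of every virtual biopsy of `length_mm` built from contiguous tiles."""
    if length_mm % tile_mm != 0:
        raise ValueError("length must be a multiple of the tile length")
    n = length_mm // tile_mm  # number of 5-mm tiles juxtaposed per biopsy
    return [area_of_fibrosis(tiles[i:i + n]) for i in range(len(tiles) - n + 1)]


# Hypothetical strip of six tiles (fibrosis area, parenchyma area) in mm^2.
strip = [(0.2, 5.0), (0.6, 5.0), (0.1, 5.0), (0.5, 5.0), (0.3, 5.0), (0.4, 5.0)]
reference_af = area_of_fibrosis(strip)            # AF of the whole sample
af_5mm = virtual_biopsies(strip, length_mm=5)     # six 5-mm biopsies
af_30mm = virtual_biopsies(strip, length_mm=30)   # one 30-mm biopsy
```

In this toy strip the six 5-mm values scatter widely, whereas the single 30-mm biopsy reproduces the AF of the whole strip.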

Statistical Analysis

The main end point was the comparison between the performance of a 30-mm biopsy and FibroTest for the diagnosis of METAVIR F2 vs F1.

The performances were assessed by both the standard AUROC (sAUROC) and the weighted AUROCs (wAUROCs) derived from the Obuchowski measure, to account for both the spectrum effect and the ordinal scale of fibrosis staging [14]. This measure compares 2 biomarkers with a single test, which avoids the need to correct for type I error when comparing different stages (Supplementary Materials and Methods). The sAUROC was estimated by the method of DeLong et al [19] after exclusion of the nonadjacent stages. These measures were compared by using the Z test.
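The idea of an sAUROC restricted to two adjacent stages can be illustrated with the Mann-Whitney estimate of the AUROC. This sketch is an assumption-laden example, not the authors' code: it computes only the point estimate (the DeLong variance is not implemented), and the function names and data are hypothetical.

```python
from typing import Sequence


def auroc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Probability that a random `pos` value exceeds a random `neg` value.

    Ties count for one half (Mann-Whitney estimate of the AUROC).
    """
    if not pos or not neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def adjacent_auroc(values, stages, upper: int) -> float:
    """AUROC for stage `upper` vs stage `upper - 1`; other stages are excluded."""
    pos = [v for v, s in zip(values, stages) if s == upper]
    neg = [v for v, s in zip(values, stages) if s == upper - 1]
    return auroc(pos, neg)


# Hypothetical AF values (%) with their reference METAVIR stages.
af = [3.0, 6.4, 6.9, 6.5, 7.2, 6.6, 9.5, 18.0]
stage = [0, 1, 1, 1, 2, 2, 3, 4]
f2_vs_f1 = adjacent_auroc(af, stage, upper=2)
```

Only the F1 and F2 samples enter the F2 vs F1 estimate; the F0, F3, and F4 samples are ignored, which is what exclusion of nonadjacent stages means here.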

To directly compare the AUROCs, a meta-analysis of biopsy and FibroTest results was performed, stratified by each adjacent-stage comparison, by using a random effect model. The statistical comparisons used the Cochran Q heterogeneity test (Q).

To express the FibroTest performance for intermediate stages (F2 vs F1) compared with its performance for extreme stages (F1 vs F0 and F4 vs F3) relative to biopsy performance, the following estimates were assessed: (1) the difference in AUROCs relative to F2 vs F1, calculated as [AUROC(Fx vs Fx−1) − AUROC(F2 vs F1)]/AUROC(F2 vs F1), and (2) the relative decrease of FibroTest vs biopsy, calculated as [Difference relative to F2 vs F1 (FibroTest) − Difference relative to F2 vs F1 (Biopsy)]/Difference relative to F2 vs F1 (Biopsy). Strengths of correlation between FibroTest and AF (expressed as log10) according to biopsy length were assessed with the Pearson correlation coefficient and were compared with the Z test. Analyses were performed on NCSS software (Kaysville, UT) [20] and on R software.
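The two estimates defined above can be checked numerically with the wAUROCs reported in the abstract. The sketch below is only an illustration; the function and variable names are hypothetical, and the rounding of intermediate percentages follows the way the percentages are printed in the article rather than any stated rule.

```python
def relative_difference(auroc_extreme: float, auroc_f2_f1: float) -> float:
    """[AUROC(Fx vs Fx-1) - AUROC(F2 vs F1)] / AUROC(F2 vs F1), in percent."""
    return 100.0 * (auroc_extreme - auroc_f2_f1) / auroc_f2_f1


def relative_decrease(diff_fibrotest: float, diff_biopsy: float) -> float:
    """[difference(FibroTest) - difference(biopsy)] / difference(biopsy), in percent."""
    return 100.0 * (diff_fibrotest - diff_biopsy) / diff_biopsy


# wAUROCs reported in the abstract (30-mm biopsy and FibroTest).
biopsy = {"F1 vs F0": 0.773, "F2 vs F1": 0.505, "F4 vs F3": 0.700}
fibrotest = {"F1 vs F0": 0.626, "F2 vs F1": 0.512, "F4 vs F3": 0.628}

results = {}
for pair in ("F1 vs F0", "F4 vs F3"):
    d_biopsy = round(relative_difference(biopsy[pair], biopsy["F2 vs F1"]))
    d_fibrotest = round(relative_difference(fibrotest[pair], fibrotest["F2 vs F1"]))
    # Assumption: the second formula is applied to the rounded percentages.
    change = round(relative_decrease(d_fibrotest, d_biopsy))
    results[pair] = (d_biopsy, d_fibrotest, change)
```

With these inputs, `results` holds (53, 22, -58) for F1 vs F0 and (39, 23, -41) for F4 vs F3, which matches the 53%, 22%, 39%, and 23% differences and the 58% and 41% relative decreases quoted in the article.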

Results

Patients

Twenty patients were included (Supplementary Table 1) for the image analysis. Five patients had normal liver, 4 nonalcoholic fatty liver disease (NAFLD), 4 CHB, 3 CHC, 3 alcoholic liver disease (ALD), and 1 primary biliary cirrhosis.

A total of 6500 patients were included with interpretable FibroTest and simultaneous biopsy: 3720 patients with CHC, 1625 patients with CHB, 246 patients with ALD, and 909 patients with NAFLD (Supplementary Table 2).

Image Analysis

A total of 27,864 virtual biopsy specimens were obtained, with 4644 specimens for each of the 6 lengths from 5 mm to 30 mm; these included 8094 (6 × 1349) specimens for the intermediate stages of F1 and F2. The reference value of AF was between 1.6% and 23.7% for individual cases (Supplementary Table 1). The cutoffs for each METAVIR stage were 6.2% for F1, 6.8% for F2, 8.9% for F3, and 16.5% for F4 (Table 1).
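The cutoffs above define a simple mapping from AF to METAVIR stage, sketched here for illustration. The function name is hypothetical, and the handling of values that fall exactly on a cutoff is an assumption, because the article does not state it.

```python
from bisect import bisect_right

# Lower AF cutoffs (%) for F1, F2, F3 and F4, as reported in the text.
CUTOFFS = [6.2, 6.8, 8.9, 16.5]


def metavir_from_af(af_percent: float) -> int:
    """Map an area of fibrosis (%) to a METAVIR stage 0-4.

    Assumption: a value exactly on a cutoff is assigned to the higher stage;
    the source does not state how boundary values were handled.
    """
    if af_percent < 0:
        raise ValueError("AF cannot be negative")
    return bisect_right(CUTOFFS, af_percent)


# Mean AF values per stage (Table 1) map back to stages 0 to 4.
stages = [metavir_from_af(x) for x in (3.3, 6.6, 7.0, 9.5, 18.2)]
```

The F1 window is only 0.6 percentage points wide (6.2% to 6.8%), so a small sampling error in AF is enough to move a specimen between F1 and F2.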

Biopsy Performance

An increase of overall performance was observed according to biopsy length (Table 2). However, the only comparison without a significant increase according to length was the diagnosis of F2 vs F1; sAUROC was 0.547 for 5-mm and 0.549 for 30-mm biopsies. Similarly, wAUROCs were 0.509 for 5-mm and 0.505 for 30-mm biopsies.

June 2012 BIOPSY LOW PERFORMANCE FOR INTERMEDIATE STAGES 659

The dispersion of AF decreased sharply with the increasing length of the specimen (Figure 1, Supplementary Figures 1 and 2). Despite significant differences between means, these dot plots demonstrated the large overlap of AF between F2 and F1, even for a 30-mm biopsy.

By using sAUROC, biopsy performance for F2 vs F1 (0.549) was not lower compared with F1 vs F0 (0.562) but was significantly lower when compared with F4 vs F3 (0.862; P < .0001). The sAUROCs for advanced fibrosis and cirrhosis increased significantly according to biopsy length (Table 2) (Supplementary Figure 3).

By using wAUROC, biopsy performance was lower for F2 vs F1 (0.505) compared both with F1 vs F0 (0.773; P < .001) and with F4 vs F3 (0.700; P < .001).
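A Z test of the kind used for these comparisons can be sketched from two AUROCs and their standard errors. This is a simplified illustration, not the authors' computation: it assumes the two estimates are independent, which may not hold when both come from the same specimens, and the function name is hypothetical.

```python
import math


def z_test(auc_a: float, se_a: float, auc_b: float, se_b: float):
    """Two-sided Z test for the difference between two AUROCs.

    Assumption: the two estimates are independent, so the variances add.
    """
    z = (auc_a - auc_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p


# wAUROC (standard error) for 30-mm biopsies, as reported in Table 2:
# F1 vs F0 = 0.773 (0.013) against F2 vs F1 = 0.505 (0.013).
z, p = z_test(0.773, 0.013, 0.505, 0.013)
```

Under the independence assumption the difference is many standard errors wide, which is consistent with the P < .001 reported for this comparison.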

Table 1. Correspondence Between AF as Assessed by Image Analysis and METAVIR Stage

METAVIR stage   Virtual biopsies, no. (%)   Area of fibrosis by image analysis
                                            Mean (95% confidence interval)   Reference range
0               1026                        3.3 (3.1–3.5)                    0.0–6.2
1               407                         6.6 (6.2–7.0)                    6.2–6.8
2               942                         7.0 (6.8–7.3)                    6.8–8.9
3               540                         9.5 (8.9–10.2)                   8.9–16.5
4               1734                        18.2 (16.5–17.6)                 >16.5

Table 2. Performance of Biopsy for Diagnosis of Fibrosis Stage According to Length of Specimen

Standard AUROC, mean (standard error)

          Pair-wise: adjacent stages                                      Stages combined
Length    F1 vs F0        F2 vs F1        F3 vs F2        F4 vs F3        F234 vs F01     F4 vs F0123
Number    1428            1349            1482            2274            4644a           4644a
5 mm      0.541 (0.016)   0.547 (0.017)   0.595 (0.015)   0.787 (0.011)   0.862 (0.006)   0.884 (0.005)
10 mm     0.557 (0.016)   0.547 (0.017)   0.619 (0.015)   0.819 (0.010)   0.882 (0.005)   0.915 (0.004)
15 mm     0.561 (0.017)   0.548 (0.017)   0.639 (0.015)   0.836 (0.009)   0.893 (0.005)   0.930 (0.004)
20 mm     0.563 (0.017)   0.549 (0.017)   0.656 (0.015)   0.848 (0.009)   0.900 (0.005)   0.939 (0.004)
25 mm     0.562 (0.017)   0.549 (0.017)   0.670 (0.015)   0.857 (0.009)   0.905 (0.004)   0.944 (0.003)
30 mm     0.562 (0.017)   0.549 (0.017)   0.680 (0.014)   0.862 (0.009)   0.908 (0.004)   0.947 (0.003)

Weighted AUROC, mean (standard error)

          Pair-wise: adjacent stages                                      All pair-wise comparisons
Length    F1 vs F0        F2 vs F1        F3 vs F2        F4 vs F3        Obuchowski measure (standard error)b
5 mm      0.713 (0.013)   0.509 (0.013)   0.503 (0.014)   0.631 (0.010)   0.885 (0.002)   P < .0001
10 mm     0.723 (0.013)   0.510 (0.013)   0.504 (0.014)   0.644 (0.010)   0.896 (0.002)   P < .0001
15 mm     0.756 (0.013)   0.505 (0.013)   0.526 (0.014)   0.667 (0.010)   0.906 (0.002)   P = .007
20 mm     0.766 (0.013)   0.511 (0.013)   0.537 (0.014)   0.693 (0.010)   0.910 (0.002)   P = .09
25 mm     0.772 (0.013)   0.515 (0.013)   0.551 (0.014)   0.703 (0.010)   0.912 (0.002)   P = .71
30 mm     0.773 (0.013)   0.505 (0.013)   0.552 (0.014)   0.700 (0.010)   0.912 (0.002)

a Total is 4644 per class of length, because 5 virtual biopsies without contiguous specimens were excluded.
b Statistical significance of Z test comparing the measure between 2 consecutive lengths (P value).

Performance of FibroTest for Diagnosis of Each Fibrosis Stage

The same profiles were observed for FibroTest and for biopsy.

By using sAUROC, FibroTest performance was not significantly lower for F2 vs F1 (0.666) compared with F1 vs F0 (0.697) but was significantly lower compared with F4 vs F3 (0.680; P < .0001) (Table 3).

By using wAUROC, FibroTest performance was lower for F2 vs F1 (0.512) compared both with F1 vs F0 (0.626; P < .001) and with F4 vs F3 (0.628; P < .001).

The same trends were observed in patients with hepatitis C virus (HCV) and hepatitis B virus (HBV) but not in patients with ALD and NAFLD. In these diseases, the performance of FibroTest for F2 vs F1 was still lower than for F4 vs F3 but was greater than for the diagnosis of F1 vs F0, contrary to CHC and CHB.

Comparison Between Performances of FibroTest and Biopsy for Adjacent Stages

The comparisons of performances according to meta-analyses of AUROCs stratified by fibrosis stage are given in Figure 2A for sAUROC and in Figure 2B for wAUROCs.

The wAUROCs for both biopsy and FibroTest for the diagnosis of F2 vs F1 were not significant vs random and were significantly lower (all P < .001) than the wAUROCs for the extreme stage comparisons of F0 vs F1 and F4 vs F3.

For biopsy there was a heterogeneity between the sAUROCs (Q = 219; P < .0001) and between the wAUROCs (Q = 117; P < .0001). For FibroTest there was no significant heterogeneity between the sAUROCs (Q = 3; P = .42) but a significant heterogeneity between the wAUROCs (Q = 48; P < .0001).

The decrease in diagnostic performance for the intermediate stages (F2 vs F1) compared with the extreme stages (F1 vs F0 and F4 vs F3) was lower for FibroTest than for biopsy. The decrease in the wAUROC was 22% for FibroTest and 53% for biopsy compared with F1 vs F0 and 23% for FibroTest and 39% for biopsy compared with F4 vs F3. Relative to biopsy, the decrease in FibroTest wAUROCs for F2 vs F1 was 58% lower [(22% − 53%)/53%] compared with F1 vs F0 and 41% lower [(23% − 39%)/39%] compared with F4 vs F3 (Supplementary Table 3).

An increased correlation coefficient was observed between FibroTest and AF according to biopsy length from R = 0.685 to 0.777 (P < .001) (Table 4).

Discussion

We acknowledge that biopsy provides much more information than just quantification of fibrosis. Biopsy is the only direct estimate of hepatic features and is still useful when indirect estimates are noninterpretable or discordant. The point of the present study was to revisit the misleading usual statement that there is a gray zone for biomarkers relative to biopsy.

If biopsy were a true gold standard (no false positives/negatives), the analyses of published studies would conclude that biomarkers such as FibroTest had lower diagnostic performance for intermediate stages F2 vs F1 than for extreme stages F4 vs F3 and F1 vs F0. Because biopsy has an inherent 25% error, other methods are needed for interpreting these results.

The results of the present study confirm the a priori hypothesis that relative to biopsy, FibroTest has no gray zone. The apparent gray zone for the diagnosis of intermediate-stage F2 vs F1 is an artifact. This artifact, repeated in almost all reviews on biomarkers for the last 10 years [1,2,10,21,22], is mainly due to the lower diagnostic performance of biopsy for the intermediate stage and to the inappropriate interpretation of the standard statistic, sAUROC.

Figure 1. Distribution of AF (log10) according to biopsy length and METAVIR scoring system.

Limitations of Biopsy

In their landmark study with large surgical biopsies as a gold standard, Bedossa et al [12] already observed that the performance of biopsy was lower for the intermediate adjacent stages of F2 vs F1 and better for the extreme stages of F0 and F4. Our results confirm these observations, and we demonstrated that the magnitude of the differences between the intermediate and extreme stages for biopsy was greater than those observed for FibroTest.

Pathologist variability is rarely discussed but probably also contributes to the lower diagnostic performance of biopsy for F2 vs F1 compared with the extreme stages. There was a true gray zone with a U-shaped curve for the kappa agreement, which was lower for F1 (0.39) and F2 (0.37) than for F0 (0.52) and F4 (0.86) [13]. The most commonly noted causes of disagreement between pathologists for fibrosis staging concerned the diagnosis of F1 or F2: “uncertain distinction between no portal fibrosis and mild portal fibrosis, difficult assessment of true bridging fibrosis versus some normal large portal tract extension, and incompletely represented septum located in specimen periphery.” Therefore, regardless of the variability of a biomarker, the variability of the reference (biopsy) among the intermediate stages mathematically induced a decrease in the performance of the biomarker for these stages compared with the extreme stages.

Inappropriate Interpretation of Area Under the Receiver-Operator Characteristic Curve

The use of the sAUROC raises 2 issues. First, its use is based on the assumption that the gold standard is binary, whereas staging uses an ordinal scale. This difference implies that fibrosis stages in the study sample have to be aggregated into 2 groups, a process that can lead to discordant conclusions depending on how the groups are combined [3,14,15]. Second, sAUROC can also be biased in the way the proportion of each fibrosis stage in the sample fits the distribution in the reference population to which the indexes are applied. As a result, the comparison of sAUROCs based on samples with different stage distributions might be flawed (spectrum effect) [3,14,15].

An example of inappropriate interpretation is the following statement: “an AUROC >0.90 is excellent; an AUROC between 0.60 and 0.70 is poor” [21]. False. These statements are misleading, because according to the prevalence of each stage, the sAUROC for advanced fibrosis (F2 F3 F4 vs F0 F1) of the same test in the same disease varied from 0.67 to 0.98 [3,14,15].
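The spectrum effect can be made concrete with a small numerical sketch. It relies on the fact that a Mann-Whitney AUROC for two pooled groups equals the average of the pairwise stage-against-stage AUROCs, weighted by the number of patient pairs. All numbers below are hypothetical and chosen only for illustration; they are not data from the study, and the function name is an assumption.

```python
from itertools import product


def combined_auroc(pairwise, counts, low, high):
    """AUROC of stages `high` vs stages `low` from pairwise AUROCs.

    The pooled AUROC is the average of the pairwise AUROCs, each weighted by
    the number of (high-stage, low-stage) patient pairs it represents.
    """
    num = den = 0.0
    for hi, lo in product(high, low):
        weight = counts[hi] * counts[lo]
        num += weight * pairwise[(hi, lo)]
        den += weight
    return num / den


# Hypothetical pairwise AUROCs of one test: adjacent stages are hard to
# separate, distant stages are easy. Keys are (higher stage, lower stage).
pairwise = {(2, 0): 0.80, (2, 1): 0.60, (3, 0): 0.90,
            (3, 1): 0.80, (4, 0): 0.98, (4, 1): 0.95}

# Two hypothetical samples with different stage distributions (F0..F4).
mostly_intermediate = {0: 10, 1: 200, 2: 200, 3: 10, 4: 5}
mostly_extreme = {0: 200, 1: 10, 2: 5, 3: 10, 4: 200}

low, high = (0, 1), (2, 3, 4)
auc_mid = combined_auroc(pairwise, mostly_intermediate, low, high)
auc_ext = combined_auroc(pairwise, mostly_extreme, low, high)
```

The same hypothetical test yields an sAUROC for F2 F3 F4 vs F0 F1 of about 0.63 in the sample dominated by F1 and F2 and about 0.97 in the sample dominated by F0 and F4, although its ability to separate any given pair of stages is unchanged.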

Table 3. Performance of FibroTest for Diagnosis of Each Fibrosis Stage

                   Comparison between adjacent stages only                          All pair-wise comparisons,
Disease            F1 vs F0        F2 vs F1        F3 vs F2        F4 vs F3        Obuchowski measure

FibroTest, all
  n                2766 vs 943     1120 vs 2766    780 vs 1120     891 vs 780      6500
  sAUROC           0.697 (0.010)   0.666 (0.010)   0.669 (0.012)   0.680 (0.013)
  wAUROC           0.626 (0.009)   0.512 (0.013)   0.522 (0.011)   0.628 (0.013)   0.866 (0.002)

By disease

FibroTest HCV
  n                1550 vs 319     708 vs 1550     543 vs 708      600 vs 543      3720
  sAUROC           0.657 (0.016)   0.662 (0.012)   0.644 (0.016)   0.698 (0.015)
  wAUROC           0.606 (0.015)   0.505 (0.011)   0.510 (0.018)   0.631 (0.015)   0.860 (0.003)

FibroTest HBV
  n                801 vs 194      288 vs 801      171 vs 288      171 vs 171      1625
  sAUROC           0.685 (0.022)   0.649 (0.019)   0.723 (0.025)   0.534 (0.031)
  wAUROC           0.616 (0.020)   0.520 (0.018)   0.559 (0.023)   0.567 (0.029)   0.851 (0.005)

FibroTest ALD
  n                65 vs 16        48 vs 65        23 vs 48        94 vs 23        246
  sAUROC           0.500 (0.77)    0.654 (0.053)   0.663 (0.069)   0.826 (0.054)
  wAUROC           0.561 (0.064)   0.615 (0.045)   0.587 (0.056)   0.783 (0.050)   0.888 (0.011)

FibroTest NAFLD
  n                350 vs 414      76 vs 350       43 vs 76        26 vs 43        909
  sAUROC           0.607 (0.020)   0.671 (0.036)   0.716 (0.050)   0.558 (0.075)
  wAUROC           0.537 (0.013)   0.561 (0.032)   0.532 (0.046)   0.500 (0.066)   0.860 (0.005)

Advantages of the Study

This is a specific study of the performance of biopsy for the diagnosis of F2 vs F1 vs the other comparisons between adjacent stages. We used more recent imaging techniques and other liver diseases than CHC [12], a larger database, with more appropriate statistical methods than previously published [1,2,10].

Limitations of the Study

Biomarkers such as FibroTest have been validated by using biopsy as a reference. Because biopsy is not a perfect reference, the true performance of FibroTest is unknown by the standard methods that used biopsy as the reference. To make progress, there are 2 options; one is to use a true gold standard, and the other is to use methods without a gold standard.

To improve the estimate of fibrosis tests toward their true performances, we previously validated a methodology, the strength of concordance; we also recently applied the latent class method to assess the performance of biopsy, FibroTest, and Fibroscan [16,23]. In the present study we used this concept of the strength of concordance between one imperfect “indirect” estimate of liver fibrosis, ie, FibroTest, and AF of virtual biopsy as a second imperfect “direct” estimate. As expected, the strength of concordance of FibroTest with AF increased according to the length of biopsies. However, this method was limited because the “patient’s factor” was artificially exaggerated as a result of the small number of patients per disease, compared with the number of virtual biopsies. Other fibrosis biomarkers should also be tested.

One limitation is that the METAVIR scoring system was created for viral hepatitis. However, published evidence has demonstrated the applicability of the extrapolated score used for NAFLD and ALD [7,17,18] (Supplementary Materials and Methods).

The progress in the treatment for HBV and HCV has reduced the importance of the diagnosis of F2, which was the threshold for initiating treatment [22]. However, because of the cost of HBV and HCV treatments, it remains useful to discuss the efficiency of treating patients who have no progression of their baseline stage F0 or F1. In HCV patients re-treated by pegylated interferon and ribavirin, there was independent prognostic value of F4 vs F3 and F3 vs F2 [3], and it could be the same for tri-therapies.

Recent publications demonstrated that the identification of F2 is still important, at least in US guidelines and in many recommendations for CHC [1]. The American Association for the Study of Liver Diseases guidelines in 2009 still stated: “These markers are useful for establishing the two ends of the fibrosis spectrum (minimal fibrosis and cirrhosis) but are less helpful in assessing the mid-ranges of fibrosis or for tracking fibrosis progression” [22]. False.

The present results demonstrate that FibroTest is a contin-uous quantitative biomarker of fibrosis and is therefore usefulwhere biopsy is not. It would permit to assess the dynamic of

Figure 2. Meta-analysis of performance of biopsy and FibroTest for diagnosis of adjacent fibrosis stages by using AUROCs, standardized (A) and weighted (B). Horizontal lines indicate 95% confidence interval for mean difference between FibroTest and biopsy AUROCs and random (0.500). Vertical lines indicate equivalence line. When horizontal line crosses vertical line, there is no significant difference. Ave, average.

fibrosis progression or regression, particularly for estimating the impact of new treatments.

For ALD and NAFLD there are no approved treatments. These patients have multiple organ injuries, and FibroTest makes it possible to prioritize the different risks. In our experience, most ALD and NAFLD patients have steatosis presumed with SteatoTest and minimal fibrosis with FibroTest,3 and therefore the priority is the alcohol dependence for ALD or the cardiovascular risk for NAFLD.

Conclusion

To prevent misleading statements concerning the diagnostic performance of biomarkers for the intermediate stages, we suggest the following recommendations:

1. To take into account that the risk of biopsy error is greater for stage F2 vs F1 than for the extreme stages F1 vs F0 and F4 vs F3.

2. To assess the performance of a biomarker by the Obuchowski measure. As for biopsy, lower wAUROCs are expected for the intermediate stages, but this is not proof of a gray zone relative to biopsy.

3. To avoid the presentation of sAUROCs of multiple stage combinations in a table (ie, F4 vs F2 F3 F4; F4 F3 vs F2 F1 F0; F4 F3 F2 vs F1 F0) to prevent the spectrum effect.

Finally, guidelines for the indications of fibrosis tests should take into consideration these evidence-based data. At least for FibroTest, there is no scientific reason for recommending a biopsy for the diagnosis of intermediate fibrosis stages.1–3

Supplementary Material

Note: To access the supplementary material accompanying this article, visit the online version of Clinical Gastroenterology and Hepatology at www.cghjournal.org, and at doi:10.1016/j.cgh.2012.01.023.

References

1. Nguyen D, Talwalkar JA. Noninvasive assessment of liver fibrosis. Hepatology 2011;53:2107–2110.

2. European Association for the Study of the Liver. EASL Clinical Practice Guidelines: management of hepatitis C virus infection. J Hepatol 2011;55:245–264.

Table 4. Strength of Correlation Between FibroTest and AF (Expressed in Log10), According to Biopsy Length

Biopsy length,          Pearson       95% Confidence   Significance
n = 4374 per lengtha    coefficient   interval         (P value)
5 mm                    0.685         0.669–0.700      <.0001
10 mm                   0.728         0.714–0.742      .04
15 mm                   0.750         0.736–0.762      .06
20 mm                   0.763         0.750–0.776      .19
25 mm                   0.772         0.759–0.783      .13
30 mm                   0.777         0.765–0.789

aTotal number of virtual biopsies per length without missing reference FibroTest was 4374 and not 4644, because 1 patient had no FibroTest assessment.

3. Poynard T. First-line assessment of patients with chronic liver

June 2012 BIOPSY LOW PERFORMANCE FOR INTERMEDIATE STAGES 663

disease with non-invasive techniques and without recourse to liver biopsy. J Hepatol 2011;54:586–587.

4. Vergniol J, Foucher J, Terrebonne E, et al. Non-invasive tests for fibrosis and liver stiffness predict 5-year outcomes of patients with chronic hepatitis C. Gastroenterology 2011;140:1970–1979.

5. Poynard T, Ngo Y, Munteanu M, et al. Noninvasive markers of hepatic fibrosis in chronic hepatitis B. Curr Hepatol Rep 2011;10:87–97.

6. Naveau S, Gaudé G, Asnacios A, et al. Diagnostic and prognostic values of noninvasive biomarkers of fibrosis in patients with alcoholic liver disease. Hepatology 2009;49:97–105.

7. Poynard T, Lassailly G, Diaz E, et al. Performance of biomarkers FibroTest, ActiTest, SteatoTest, and NashTest in patients with severe obesity: meta analysis of individual patient data. PLoS One 2012;7:e30325.

8. Castéra L, Foucher J, Bernard PH, et al. Pitfalls of liver stiffness measurement: a 5-year prospective study of 13,369 examinations. Hepatology 2010;51:828–835.

9. Poynard T, Munteanu M, Deckmyn O, et al. Applicability and precautions of use of liver injury biomarker FibroTest: a reappraisal at 7 years of age. BMC Gastroenterol 2011;11:39.

10. Gebo KA, Herlong HF, Torbenson MS, et al. Role of liver biopsy in management of chronic hepatitis C: a systematic review. Hepatology 2002;36:S161–S172.

11. Bedossa P, Poynard T. An algorithm for the grading of activity in chronic hepatitis C: the METAVIR Cooperative Study Group. Hepatology 1996;24:289–293.

12. Bedossa P, Dargère D, Paradis V. Sampling variability of liver fibrosis in chronic hepatitis C. Hepatology 2003;38:1449–1457.

13. Rousselet MC, Michalak S, Dupré F, et al. Sources of variability in histological scoring of chronic viral hepatitis. Hepatology 2005;41:257–264.

14. Lambert J, Halfon P, Penaranda G, et al. How to measure the diagnostic accuracy of noninvasive liver fibrosis indices: the area under the ROC curve revisited. Clin Chem 2008;54:1372–1378.

15. Guha IN, Myers RP, Patel K, et al. Biomarkers of liver fibrosis: what lies beneath the receiver operating characteristic curve? Hepatology 2011;54:1454–1462.

16. Poynard T, De Ledinghen V, Zarski JP, et al. Relative performances of FibroTest, Fibroscan and biopsy for assessing the stage of liver fibrosis in patients with chronic hepatitis C: a step toward the truth in the absence of a gold standard. J Hepatol 2012;56:541–548.

17. Poynard T, Mathurin P, Lai CL, et al. A comparison of fibrosis progression in chronic liver diseases. J Hepatol 2003;38:257–265.

18. Michalak S, Rousselet MC, Bedossa P, et al. Respective roles of porto-septal fibrosis and centrilobular fibrosis in alcoholic liver disease. J Pathol 2003;201:55–62.

19. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837–845.

20. Hintze JL. NCSS 2007 user guide. Kaysville, UT: Number Cruncher Statistical Systems software, 2007.

21. Sebastiani G, Gkouvatsos K, Plebani M. Non-invasive assessment of liver fibrosis: it is time for laboratory medicine. Clin Chem Lab Med 2011;49:13–32.

22. Ghany MG, Strader DB, Thomas DL, et al. Diagnosis, management, and treatment of hepatitis C: an update. Hepatology 2009;49:1335–1374.

23. Poynard T, Ingiliz P, Elkrief L, et al. Concordance in a world without a gold standard: a new non-invasive methodology for improving accuracy of fibrosis markers. PLoS One 2008;3:e3857.

Reprint requests

Address requests for reprints to: Thierry Poynard, MD, PhD, 47 Boulevard de l’Hôpital, 75651 Paris Cedex 13, France. e-mail: [email protected]; fax: 33-1-42-16-14-27.

Conflicts of interest

These authors disclose the following: Thierry Poynard is the inventor of FibroTest (FibroSURE in United States), with a capital interest in Biopredictive, the company marketing the test. The patents belong to the public organization Assistance Publique Hôpitaux de Paris. Mona Munteanu and Yen Ngo are employees of Biopredictive. The remaining authors disclose no conflicts.

Funding

Supported by APHP UPMC Liver Center and Association pour la Recherche sur les Maladies Virales et Hépatiques.

Supplementary Materials and Methods

Applicability of METAVIR Fibrosis Scoring System (Five Stages From F0 to F4) for Nonalcoholic Fatty Liver Disease and Alcoholic Liver Disease

1. The METAVIR scoring system was first used for patients with CHC and CHB. For nonviral chronic liver diseases (ALD, NAFLD) we used as F1 minimal fibrosis regardless of its location but without septa, as previously described for natural history comparisons1 and biomarker validation by using meta-analysis between all causes,2 ALD,3 and NAFLD.4–7

The details of this “extrapolated common staging” were published in the 2003 article.1 A common fibrosis scoring system similar to the METAVIR scoring system (extrapolated common stage) including 5 stages was constructed: stage 0, no fibrosis; stage 1, fibrosis whatever its localization but without septa; stage 2, few septa; stage 3, many septa; and stage 4, cirrhosis. For primary biliary cirrhosis, the initial staging was as follows: 0, fibrosis absent; 1, portal and periportal fibrosis; 2, presence of numerous septa; and 3, cirrhosis. In comparison with the METAVIR system, a stage with few septa was missing. All histologic reports were re-read to differentiate few septa from portal fibrosis.

For ALD, after a preliminary reading of 30 biopsies, the following rules were adopted for the extrapolation: F0, no fibrosis; F1–F2, fibrosis without alcoholic hepatitis; F3, alcoholic hepatitis without cirrhosis; and F4, cirrhosis.

To validate this procedure, a random sample of 504 biopsies was obtained from the coordinator of each database and re-read by a central pathologist (Antonio Chedid) who staged fibrosis (common “observed” fibrosis in 5 stages from F0 to F4), blinded to the extrapolated score and clinical data.

The kappa concordance rates between the common observed and the common extrapolated score were all highly significant (P < 0.001). The strength of agreement was moderate for 3 stages (F0–F1 vs F2 vs F3–F4; 0.58), substantial for 2 stages (F0–F1 vs F2–F3–F4; 0.65), and almost perfect for cirrhosis vs noncirrhosis (0.84).

2. The bridging stages in HCV, HBV, NAFLD, and ALD are very similar including cirrhosis and were estimated in the same way by the METAVIR scoring system for advanced fibrosis.8

3. In NAFLD patients, the same FibroTest distribution or performance was observed when using the adapted METAVIR system or the standard scoring system for NAFLD, in which stage 1 (F1) = zone 3 perisinusoidal/perivenular fibrosis, stage 2 (F2) = zone 3 and periportal fibrosis, and stage 3 (F3) = septal/bridging fibrosis.6

4. Fibrosis stages and pathogenetic mechanisms are very similar in NAFLD and ALD.9

5. Among the 20 patients with large liver samples of the present study, the mean AF according to each METAVIR stage was not different when different causes were concerned (cases described in Supplementary Table S1), and the mean AFs measured in 20 patients with HCV by Bedossa et al10 were the following:

For stage F0, in the present study 4 cases without liver disease from 1.6% to 4.1% and in Bedossa et al study10 4 HCV cases 1.8%–2.2%. For stage F1, 2 NAFLD 5.0% and 7.4% and in Bedossa et al study 3 HCV cases 3.6%–3.6%. For stage F2, 2 NAFLD were 6.9% and 7.9%, 2 HBV were 4.8% and 6.9%, also in the range of the 4 F2 of Bedossa et al for patients with HCV, 4.8%–7%. For stage F3, 1 HBV 9.0% and 1 HCV 10.1%, and in Bedossa et al study, 3 HCV cases from 13.5% to 15.8%. For stage F4, 3 F4 with ALD 18.1%–19.0%, 2 HCV 14.7% and 17.0%, 1 HBV 19.9%, and in Bedossa et al study 3 HCV cases from 23.9% to 27.3%.
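The kappa agreement reported in item 1 between the extrapolated and the centrally observed staging can be illustrated with a minimal sketch. It assumes an unweighted Cohen's kappa (the text does not say whether a weighted variant was used) and uses the conventional Landis and Koch bands, which match the wording "moderate", "substantial", and "almost perfect". The function names and the ten ratings are hypothetical and are not study data.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two ratings of the same cases."""
    n = len(rater_a)
    # Proportion of cases on which the two ratings agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    # Agreement expected by chance from the marginal frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1.0 - expected)


def landis_koch(kappa: float) -> str:
    """Conventional verbal labels for kappa (Landis and Koch bands)."""
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "slight or poor"


# Hypothetical cirrhosis vs noncirrhosis calls for ten biopsies (not study data).
extrapolated = ["F4", "F4", "non", "non", "non", "F4", "non", "non", "F4", "non"]
observed_read = ["F4", "F4", "non", "non", "non", "non", "non", "non", "F4", "non"]
kappa = cohen_kappa(extrapolated, observed_read)
label = landis_koch(kappa)
```

Applied to the three kappa values quoted in item 1 (0.58, 0.65, 0.84), `landis_koch` returns the same three labels used in the text.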

Image Analysis

Three picrosirius red–stained glass slides from each large surgical specimen were scanned to produce high-quality resolution digital slides (0.25 µm/pixel at 40× magnification) by using the Aperio Slide Scanning System. The digital slides were viewed on a Barco Coronis fusion 6MP high-definition screen to determine the detection thresholds with high accuracy and were analyzed with ICS framework image analysis software.

On a given microscopic field, the surface of the section occupied by the hepatic parenchyma was used as the measuring frame (reference area). The surface occupied by fibrosis (AF) was measured within the earlier defined frame.

For each case, the reference value of AF was measured on the total cross section of the surgical sample. The whole digital section was overlaid with rectangular regions (5 × 1 mm). Multiple virtual biopsies were reconstructed by digital cutting out and juxtaposition of these elementary rectangular regions, mimicking needle liver biopsies. For example, all possible 30-mm-long virtual biopsy specimens from a whole surgical section were obtained by constructing all possible virtual images made up of the juxtaposition of 6 contiguous microscopic elementary rectangular regions. AF was measured as described previously on each virtual biopsy specimen. The same analysis was performed for virtual biopsy specimens from 5 to 30 mm in length.
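The construction of virtual biopsies described above can be sketched as a sliding window over contiguous elementary regions. This is only an illustration under stated assumptions: the regions are taken to lie in order along one strip, each is stored as a (fibrosis area, parenchyma reference area) pair, and AF is expressed as the percentage ratio of the two. The function name `virtual_biopsy_af` and the toy areas are hypothetical; the actual image analysis workflow is not reproduced here.

```python
from typing import List, Tuple

Region = Tuple[float, float]  # (fibrosis area, parenchyma reference area), same units


def virtual_biopsy_af(regions: List[Region], length_mm: int, region_mm: int = 5) -> List[float]:
    """Return AF (%) for every virtual biopsy of `length_mm` built from contiguous regions."""
    if length_mm % region_mm != 0:
        raise ValueError("length must be a multiple of the elementary region length")
    k = length_mm // region_mm  # number of juxtaposed elementary regions
    afs = []
    for start in range(len(regions) - k + 1):
        window = regions[start:start + k]
        fibrosis = sum(f for f, _ in window)
        reference = sum(r for _, r in window)
        afs.append(100.0 * fibrosis / reference)
    return afs


# Hypothetical strip of eight elementary 5 x 1 mm regions (areas in mm^2).
strip = [(0.2, 5.0), (0.4, 5.0), (0.1, 5.0), (0.6, 5.0),
         (0.3, 5.0), (0.5, 5.0), (0.2, 5.0), (0.1, 5.0)]
af_5mm = virtual_biopsy_af(strip, 5)    # one value per elementary region
af_30mm = virtual_biopsy_af(strip, 30)  # windows of 6 contiguous regions
```

In this toy strip there are eight 5-mm virtual biopsies but only three 30-mm ones, and the longer windows average out local variation, which is the effect the study quantifies across biopsy lengths.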

Statistical Methods: Obuchowski Measure

With N (= 5) categories of the gold standard outcome (histologic fibrosis stage), the estimate of the AUROC of diagnostic tests for differentiating between categories (wAUROC) is a weighted average of the N(N − 1)/2 (= 10) different AUROCs corresponding to all the pair-wise comparisons between 2 of the N categories. Each pair-wise comparison has been weighted to take into account the distance between fibrosis stages (ie, the number of units on the ordinal scale). A penalty function proportional to the difference in METAVIR units between grades was defined; it was 0.25 when the difference between stages was 1 (adjacent stages), 0.50 when the difference was 2, 0.75 when the difference was 3, and 1 when the difference was 4. We do not recalculate the wAUROC according to a reference prevalence of fibrosis stages, because there is no consensus for the different chronic liver diseases.
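The weighting scheme above can be illustrated with a short sketch. It is a simplification under stated assumptions: each pairwise AUROC is estimated by the Mann-Whitney statistic, and the overall value is the average of the 10 pairwise AUROCs with weights proportional to the penalty (normalized by the sum of penalties). The exact normalization used in the study is not given in this section, so this is not presented as the authors' implementation; all names and values are hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def pairwise_auroc(low: List[float], high: List[float]) -> float:
    """Mann-Whitney estimate of P(high > low), counting ties as 1/2."""
    wins = 0.0
    for h in high:
        for l in low:
            if h > l:
                wins += 1.0
            elif h == l:
                wins += 0.5
    return wins / (len(high) * len(low))


def penalty(stage_a: int, stage_b: int, n_stages: int = 5) -> float:
    """0.25 per METAVIR unit of distance: 0.25, 0.50, 0.75, 1.00 for N = 5."""
    return abs(stage_a - stage_b) / (n_stages - 1)


def penalty_weighted_auroc(values_by_stage: Dict[int, List[float]]) -> float:
    """Average the pairwise AUROCs with weights proportional to the penalty."""
    num = 0.0
    den = 0.0
    for a, b in combinations(sorted(values_by_stage), 2):
        w = penalty(a, b)
        num += w * pairwise_auroc(values_by_stage[a], values_by_stage[b])
        den += w
    return num / den


# Hypothetical marker values for stages F0..F4 (not study data).
toy = {
    0: [0.10, 0.20, 0.15],
    1: [0.25, 0.18, 0.30],
    2: [0.28, 0.35, 0.22],
    3: [0.50, 0.45, 0.60],
    4: [0.80, 0.75, 0.90],
}
n_pairs = len(list(combinations(toy, 2)))  # N(N - 1)/2 pairwise comparisons
overall = penalty_weighted_auroc(toy)
```

Because distant stage pairs carry larger weights, confusion between F0 and F4 lowers the overall value more than confusion between adjacent stages.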

Only the Obuchowski measure was used for statistical comparisons of the performance of biopsy for the diagnosis of fibrosis stage according to the length of specimen (Table 2).

The Obuchowski measure is not equivalent to the sAUROC and is expressed with its standard error. Statistical significance was assessed with a Z test comparing Obuchowski measures between 2 consecutive lengths (P value) (Table 2).
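A Z test on two estimates reported with standard errors can be sketched as follows. The sketch assumes the two estimates are independent, which may not hold for virtual biopsies cut from the same sections, so it should be read as an illustration of the arithmetic only. The two measures are the 5-mm and 30-mm values quoted in the abstract; the standard errors are hypothetical placeholders, not values from Table 2.

```python
import math


def z_test(est_a: float, se_a: float, est_b: float, se_b: float):
    """Two-sided Z test for the difference of two estimates, assuming independence."""
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Obuchowski measures for 30-mm and 5-mm biopsies as quoted in the abstract;
# the standard errors (0.002 and 0.003) are hypothetical placeholders.
z, p = z_test(0.912, 0.002, 0.885, 0.003)
```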

References

1. Poynard T, Mathurin P, Lai CL, et al. A comparison of fibrosis progression in chronic liver diseases. J Hepatol 2003;38:257–265.

2. Poynard T, Morra R, Halfon P, et al. Meta-analyses of FibroTest diagnostic value in chronic liver disease. BMC Gastroenterol 2007;7:40.

3. Naveau S, Raynard B, Ratziu V, et al. Biomarkers for the prediction of liver fibrosis in patients with chronic alcoholic liver disease. Clin Gastroenterol Hepatol 2005;3:167–174.

4. Ratziu V, Massard J, Charlotte F, et al; the LIDO Study Group and the CYTOL Study Group. Diagnostic value of biochemical markers (FibroTest-FibroSURE) for the prediction of liver fibrosis in patients with non-alcoholic fatty liver disease. BMC Gastroenterol 2006;6:6.

5. Lassailly G, Caiazzo R, Hollebecque A, et al. Validation of noninvasive biomarkers (FibroTest, SteatoTest and NashTest) for prediction of liver injury in patients with morbid obesity. Eur J Gastroenterol Hepatol 2011;23:499–506.

6. Adams LA, George J, Bugianesi E, et al. Complex non-invasive fibrosis models are more accurate than simple models in non-alcoholic fatty liver disease. J Gastroenterol Hepatol 2011;26:1536–1543.

7. Poynard T, Lassailly G, Diaz E, et al. Performance of biomarkers FibroTest, ActiTest, SteatoTest, and NashTest in patients with severe obesity: meta analysis of individual patient data. PLoS One 2012;7:e30325.

8. Michalak S, Rousselet MC, Bedossa P, et al. Respective roles of porto-septal fibrosis and centrilobular fibrosis in alcoholic liver disease. J Pathol 2003;201:55–62.

9. Lieber CS. CYP2E1: from ASH to NASH. Hepatol Res 2004;28:1–11.

10. Bedossa P, Dargère D, Paradis V. Sampling variability of liver fibrosis in chronic hepatitis C. Hepatology 2003;38:1449–1457.

Supplementary Figure 1. Absolute AF according to size of virtual biopsy specimens as measured by image analysis. Dot plots are stratified by biopsy length for each included case ranked by AF. (A) 5-mm–length biopsy; (B) 10-mm–length biopsy; (C) 15-mm–length biopsy; (D) 20-mm–length biopsy; (E) 25-mm–length biopsy; (F) 30-mm–length biopsy. In each panel the vertical axis is % fibrosis (0 to 80) and the horizontal axis lists subjects 1 to 20 (subjects 1-4 F0, 5-7 F1, 8-11 F2, 12-13 F3, 14-20 F4).

Supplementary Figure 2. AF according to biopsy length and METAVIR score. (A) AF (log10) according to biopsy length in patients scored METAVIR F0 (no fibrosis). (B) AF (log10) according to biopsy length in patients scored METAVIR F1 (portal fibrosis). (C) AF (log10) according to biopsy length in patients scored METAVIR F2 (few septa). (D) AF (log10) according to biopsy length in patients scored METAVIR F3 (many septa). (E) AF (log10) according to biopsy length in patients scored METAVIR F4 (cirrhosis). In each panel the vertical axis is % fibrosis on a log10 scale and the horizontal axis is biopsy length (Log5, Log10, Log15, Log20, Log25, Log30).

Supplementary Figure 3. (A) AUROC for advanced fibrosis (F2, F3, F4 vs F0, F1) according to biopsy length. (B) AUROC for cirrhosis (F4 vs F0, F1, F2, F3) according to biopsy length. In each panel sensitivity is plotted against 1-specificity (both 0.00 to 1.00), with one curve per biopsy length (5, 10, 15, 20, 25, and 30 mm).

Supplementary Table 1. Reference Values of Fibrosis Area Assessed by METAVIR Score and Image Analysis in 20 Liver Samples

Case no.  Age (y)  Gender  Diagnosis                         METAVIR score  FibroTest  FibroScan (kPa)  % AFa, mean (n)
No fibrosisb
1         63       Male    Colon cancer                      F0             0.22       9.3              1.6 (246)
2         68       Male    Colon cancer                      F0             0.25       5.3              3.9 (192)
3         40       Male    Adenoma                           F0             0.26       4.4              4.1 (150)
4         44       Female  Colon cancer                      F0             0.13       5.6              3.7 (168)
5         56       Female  Colon cancer                      F0             NA         NA               3.7 (270)
Chronic liver disease
6         67       Male    NAFLD, colon cancer               F1             0.44       7.2              5.0 (137)
7         64       Female  NAFLD, colon cancer               F1             0.33       6                7.4 (270)
8         65       Male    HBV, HCC                          F2             0.61       NA               4.8 (228)
9         57       Female  NAFLD, HCC                        F2             0.65       12.0             7.9 (270)
10        72       Male    NAFLD, colon cancer               F2             0.77       18.0             8.0 (270)
11        45       Male    HBV, HCC, transplanted            F2             0.39       13.4             6.9 (174)
12        60       Male    HBV, HCC, transplanted            F3             0.67       NA               9.0 (270)
13        60       Male    HCV, HCC, transplanted            F3             0.60       NA               10.1 (270)
14        50       Male    Biliary cirrhosis, transplanted   F4             0.94       NA               14.2 (258)
15        58       Female  HCV, HCC, transplanted            F4             0.89       NA               14.7 (180)
16        66       Female  HCV, HCC, transplanted            F4             0.95       13.0             17.0 (164)
17        61       Female  ALD, transplanted                 F4             0.98       61.5             18.1 (264)
18        63       Male    ALD, HCC, transplanted            F4             0.92       42.2             19.0 (240)
19        42       Male    HBV, HCC, transplanted            F4             0.96       NA               19.9 (270)
20        47       Male    ALD, HCC, transplanted            F4             0.98       NA               23.7 (258)

NOTE. A total of 22 consecutive patients were preincluded, because a surgical sample had been obtained with assessment of AF. Two patients were excluded because of final diagnosis of liver disease for which FibroTest had not been validated: 1 extrahepatic cholestasis (pancreatic cancer) and 1 hepatic oxalosis. The remaining 20 patients were included. For analysis of strength of concordance, 1 patient was not included because of absence of FibroTest measurement.
HCC, hepatocellular carcinoma; NA, not available.

aReference value for fibrosis estimate, calculated as mean/median AF from all 5-mm samples for patient surgical biopsies.

bThese patients had a liver tumor, without any risk of liver disease and absence of fibrosis in the nontumor liver.

Supplementary Table 2. Characteristics of 6500 Patients Included in Integrated Database According to Each Chronic Liver Disease

Characteristics                       HCV, n = 3720  HBV, n = 1625  ALD, n = 246  NAFLD, n = 909
Male sex, n (%)                       2195 (59)      1170 (72)      189 (77)      354 (39)
Age (y), mean (standard deviation)    48.7 (11.1)    39.9 (12.3)    47.5 (10.3)   46.1 (11.9)
Advanced fibrosis F2–F4               1851 (58%)     530 (39%)      165 (63%)     145 (16%)
Cirrhosis F4                          600 (16%)      171 (11%)      94 (38%)      26 (2%)

Supplementary Table 3. FibroTest Performance for Intermediate Stages Relative to Biopsy Performance and Performance for Extreme Stages

                                                  F1 vs F0       F2 vs F1       F3 vs F2       F4 vs F3
sAUROC
  FibroTest                                       0.697 (0.010)  0.666 (0.010)  0.669 (0.012)  0.680 (0.013)
  Difference relative to F2 vs F1a (%)            +5             0              +1             +2
  Biopsy 30-mm                                    0.562 (0.017)  0.549 (0.017)  0.680 (0.014)  0.862 (0.009)
  Difference relative to F2 vs F1a (%)            +2             0              +24            +57
  Relative decrease of FibroTest vs biopsyb (%)   +150           0              −113           −110
wAUROC
  FibroTest                                       0.626 (0.009)  0.512 (0.013)  0.522 (0.011)  0.628 (0.013)
  Difference relative to F2 vs F1a (%)            +22            0              +2             +23
  Biopsy 30-mm                                    0.773 (0.013)  0.505 (0.013)  0.552 (0.014)  0.700 (0.010)
  Difference relative to F2 vs F1a (%)            +53            0              +9             +39
  Relative decrease of FibroTest vs biopsy (%)    −58            0              −78            −41

a% calculated as [AUROC (Fx vs Fx−1) − AUROC (F2 vs F1)]/AUROC (F2 vs F1).

b% calculated as [Difference relative to F2 vs F1 (FibroTest) − Difference relative to F2 vs F1 (Biopsy)]/Difference relative to F2 vs F1 (Biopsy).
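The two footnote formulas can be checked with a few lines of arithmetic. The sketch below applies them to the weighted AUROCs of the F1 vs F0 column (FibroTest 0.626 and biopsy 0.773, against 0.512 and 0.505 for F2 vs F1), assuming that the second formula is applied to the rounded percentages; the function names are hypothetical.

```python
def relative_difference_pct(auroc: float, auroc_f2_vs_f1: float) -> float:
    """Footnote a: [AUROC(Fx vs Fx-1) - AUROC(F2 vs F1)] / AUROC(F2 vs F1), in percent."""
    return 100.0 * (auroc - auroc_f2_vs_f1) / auroc_f2_vs_f1


def relative_change_pct(diff_fibrotest: float, diff_biopsy: float) -> float:
    """Footnote b: [difference(FibroTest) - difference(biopsy)] / difference(biopsy), in percent."""
    return 100.0 * (diff_fibrotest - diff_biopsy) / diff_biopsy


# Weighted AUROCs for F1 vs F0 and F2 vs F1, as listed in the table.
ft = relative_difference_pct(0.626, 0.512)   # FibroTest
bx = relative_difference_pct(0.773, 0.505)   # biopsy, 30 mm
# Assumption: footnote b is applied to the rounded percentages.
change = relative_change_pct(round(ft), round(bx))
```

Rounded to whole percentages, these reproduce the 22%, 53%, and 58% entries of the F1 vs F0 column in the weighted block.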