Benefit of using motion compensated reconstructions for reducing inter-observer and intra-observer contouring variation for organs at risk in lung cancer patients
Running title: OAR contour variation for lung cancer patients
A McWilliam1,2, L Lee3, M Harris3, H Sheikh3, L Pemberton3, C Faivre-Finn1,2, M van Herk1,2. Joint last authors
1 Division of Molecular and Clinical Cancer Science, School of Medical Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, UK
2 Department of Radiotherapy Related Research, The Christie NHS Foundation Trust, Manchester, UK
3 Department of Clinical Oncology, The Christie NHS Foundation Trust, Manchester, UK
Corresponding author: Alan McWilliam
The Christie NHS Foundation Trust
Radiotherapy Related Research (Dept 58)
Wilmslow Road
Manchester, UK
M20 4BK
0161 918 7480
The authors declare no potential conflicts of interest
Abstract word count – 200
Manuscript word count – 2527 (2856 including references)
Figures – 4
Tables – 2
Keywords – Lung cancer, 4D CT scan reconstruction, inter-observer contour delineation, intra-observer contour delineation
Background and Purpose
In lung cancer patients, contouring accuracy is hampered by image artefacts introduced
by respiratory motion. With the widespread introduction of 4DCT, additional uncertainty
arises from the use of different reconstruction techniques, which influence contour
definition. This work assesses both inter- and intra-observer contour variation
on average and motion compensated (mid-position) reconstructions.
Material and Methods
Eight patients with early-stage non-small cell lung cancer who underwent 4DCT were selected,
and their scans were reconstructed as average and motion compensated datasets. Five observers
contoured the organs at risk (trachea, oesophagus, proximal bronchial tree, heart and
brachial plexus) for each patient and each reconstruction. Contours were compared against
a STAPLE volume using distance-to-agreement metrics. Intra-observer variation was
assessed by re-delineation after at least 4 months.
Results
The inter-observer variation was significantly smaller on the motion compensated
datasets for the trachea (p=0.006) and proximal bronchial tree (p=0.004). For intra-observer
variation, a reduction in contour variation was seen across all organs at risk when using motion
compensated reconstructions.
Conclusions
This work shows that there is benefit in using motion compensated reconstructions for
reducing both inter-observer and intra-observer contouring variation for organs at risk in lung
cancer patients.
Introduction
Survival outcomes for lung cancer are poor, with less than 20% of patients
surviving beyond 5 years (1, 2). Dose escalation studies have shown promise in improving
outcomes, although recent studies suggest that the limits of this approach have been
reached with standard dose fractionation. The RTOG 0617 trial showed worse outcomes in
the dose escalation arm (3), with the multivariate analysis indicating that dose to organs at
risk (OARs) such as the heart and lung was associated with poorer patient survival. However,
there remains uncertainty in the reporting of dose statistics to OARs, mainly due to variation
in contouring. A secondary analysis of heart contours in RTOG 0617 showed large
variation across observers, creating uncertainty in the dose delivered to OARs (4).
Contouring studies of tumour delineation in lung cancer patients have shown large
variability, particularly in contouring lymph nodes, with standard deviations of up to 1.5 cm
(5). The introduction of PET has significantly decreased inter-observer uncertainty (6),
potentially allowing smaller target margins. However, uncertainty remains due to
respiratory motion in the lungs. Four-dimensional computed tomography (4DCT) scans are
now standard for radiotherapy planning for the majority of lung cancer patients. These
capture tumour motion and offer the potential to reduce or personalise margins. The
simplest approach is to contour a motion-adapted GTV that encompasses the extent of the
tumour motion. An alternative is the mid-ventilation approach, where the phase closest to
the mid-position of the respiratory cycle is selected for treatment planning. More recently,
Wolthaus et al. introduced the mid-position concept, in which all anatomy is deformed to the
true mid-position of the respiratory cycle, i.e. a motion compensated (MC) reconstruction is
made (7). This approach allows margins to be personalised to an individual’s respiratory
amplitude, which in most cases produces smaller margins than a motion-adapted GTV
method (8). It also results in better image contrast, as all phases are
deformed to the same position and averaged, reducing noise. Motion artefacts due to
irregular respiration are also suppressed. Clinical use of this methodology has been
shown to be acceptable, with patient outcomes independent of motion amplitude (9).
The respiratory motion that results in uncertainty in delineation of the tumour volume will also
cause uncertainty in the delineation of OARs. In current practice, OARs are often contoured
on a reconstructed average dataset (AVG), which is considered closest to the mean position
but on which respiratory motion causes image blur. With a move towards more complex
conformal treatments, dose escalation and the wider adoption of stereotactic ablative body
radiotherapy (SABR), accuracy in OAR delineation becomes more critical. The number of
structures to be outlined is also increasing, with several located close to the
diaphragm, where respiration causes greater uncertainty, and in the mediastinum, where the
signal-to-noise ratio is low and motion compensation offers an additional benefit because
effectively all of the imaging dose of the 4DCT scan is used.
As for tumour delineation, implementation of MC workflows should reduce observer
variation in the contouring of OARs compared with the standard approach using an AVG
reconstruction. To our knowledge, there are no existing inter- or intra-observer delineation
studies for OARs in lung cancer patients. This paper performs such a study for the first time,
comparing contours drawn on AVG and MC reconstructions.
Materials and Methods
Eight representative patients diagnosed with early-stage or locally advanced non-small cell lung
cancer and treated with SABR were randomly selected. Each patient received a 4DCT scan,
from which an AVG scan was reconstructed for use in the planning process. The 4DCT
scan consisted of 10 phase bins and was acquired on a Philips Big Bore CT scanner
(Philips Healthcare). The phase bins were used to create the motion compensated
scans using ADMIRE (Elekta AB, Stockholm, Sweden) and a Lua script running on a
Conquest DICOM server. Additional image handling was done with tools from the Nifty
deformable registration package (Nifty, UCLH).
The following steps were used to create the motion compensated scans:
1. ADMIRE was used to deformably register each individual phase to a reference phase
and to export the resulting DICOM deformation vector field (DVF) to the Conquest DICOM
server. The reference phase was chosen at exhale to minimise the effect of
motion-induced image artefacts on the image registration.
2. Each individual phase dataset and its DVF were loaded into the Lua script, which
calculated the mean DVF and subtracted it from each individual DVF. These modified
DVFs were used to deform their associated datasets to the mid-position.
3. The four datasets showing the fastest motion (i.e. those on the inhale and exhale
slopes) were discarded to reduce motion artefacts.
4. The remaining deformed phase datasets were averaged to create the motion
compensated dataset, which was then exported.
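The four steps above can be illustrated with a minimal one-dimensional sketch in Python/NumPy. It is not the authors' Lua script and does not read ADMIRE output: it assumes each DVF is stored on the reference (exhale) grid and maps a reference point x to x + d_i(x) in phase i, resamples by linear interpolation, and picks the four "fastest" phases with a hypothetical speed criterion (central difference of neighbouring DVFs). The names `slowest_phases`, `mid_position_average` and `edge` are illustrative only.

```python
import numpy as np


def slowest_phases(dvfs, n_discard=4):
    """Indices of the phases kept after discarding the n_discard fastest-moving ones.

    Hypothetical speed criterion: mean absolute central difference of each
    phase's DVF with its two neighbours, treating the breathing cycle as periodic.
    """
    d = np.asarray(dvfs, dtype=float)                       # (n_phases, n_voxels)
    speed = np.abs(np.roll(d, -1, axis=0) - np.roll(d, 1, axis=0)).mean(axis=1) / 2.0
    fastest = set(np.argsort(speed)[len(d) - n_discard:].tolist())
    return [i for i in range(len(d)) if i not in fastest]


def mid_position_average(phases, dvfs, x, keep):
    """Warp every phase to the mid-position, then average the phases in `keep`.

    phases: (n_phases, n_voxels) intensities sampled on the 1D grid x
    dvfs:   (n_phases, n_voxels) displacements on the reference grid, such that
            reference point x[k] corresponds to x[k] + dvfs[i][k] in phase i
    """
    d = np.asarray(dvfs, dtype=float)
    mean_dvf = d.mean(axis=0)                               # reference -> mid-position
    warped = []
    for image, dvf in zip(phases, d):
        # Subtracting the mean DVF approximates the mid-position -> phase i mapping.
        # It is evaluated on the reference grid (the frame-of-reference
        # approximation); resampling is a linear-interpolation pull-back.
        warped.append(np.interp(x + (dvf - mean_dvf), x, image))
    warped = np.asarray(warped)
    return warped, warped[keep].mean(axis=0)


# Toy example: a smooth boundary (e.g. the diaphragm) moving rigidly over 10 phases.
def edge(pos):
    return 1.0 / (1.0 + np.exp(-(pos - 50.0) / 2.0))


x = np.linspace(0.0, 100.0, 401)                            # position in mm
s = 10.0 * (1 - np.cos(2 * np.pi * np.arange(10) / 10)) / 2   # phase 0 = exhale, 10 mm amplitude
phases = [edge(x - si) for si in s]
dvfs = [np.full_like(x, si - s[0]) for si in s]             # exact DVFs to the exhale reference
keep = slowest_phases(dvfs)
warped, mc = mid_position_average(phases, dvfs, x, keep)
avg = np.mean(phases, axis=0)                               # conventional AVG, for comparison
```

In this toy case every warped phase lands on the boundary at its mid-position (5 mm), while the plain average of all ten phases is a blurred edge. As in step 2, the mean DVF is computed from all phases; only the slower phases enter the final average.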
For each patient, the original AVG scan and the MC scan were loaded into Pinnacle v9.8
(Philips Radiation Oncology Systems, Fitchburg, WI). Scans were blinded so that the observers
did not know which AVG and MC scans belonged to the same patient. Five clinical
oncologists specialised in thoracic malignancies delineated the following OARs on each scan:
trachea, oesophagus, proximal bronchial tree (PBT), heart and brachial plexus. The
oesophagus, heart and brachial plexus were contoured using the mediastinal window and level.
The trachea and PBT were contoured using both the mediastinal and lung windows and levels.
OARs were delineated as described in the United Kingdom SABR Consortium
guidelines (10). Structure sets were exported as DICOM RTSTRUCT objects for analysis in
ADMIRE.
Inter-observer analysis was performed by first creating a STAPLE (Simultaneous Truth and
Performance Level Estimation) volume (11) from the five oncologist contours for each OAR
of each patient. All individual oncologist contours were compared against the STAPLE
volume, calculating the unsigned mean and maximum distance to agreement (DTA). The mean
DTA (mDTA) provides a comparison across the whole structure, while the maximum DTA
indicates the presence of outliers. The latter may highlight differences arising from sharper
boundaries on MC scans, e.g. between the heart and liver. Results were combined over
all observers on the AVG versus MC reconstructions and across all patients for each
structure. A paired Student's t-test was used to test for statistical significance: each OAR
was considered individually, and the per-patient mDTA, averaged over all observers, was
compared between the AVG and MC reconstructions. Secondly, contours were
cropped so that each structure started and finished on the same CT slices across all
observers for each patient. This analysis emphasises within-slice differences, highlighting
improvements between the two reconstruction techniques.
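For readers unfamiliar with STAPLE and the DTA metrics, the sketch below shows a binary STAPLE consensus (the expectation-maximisation scheme of Warfield et al. (11), here with a single global prior) followed by an unsigned mean and maximum DTA. It is not ADMIRE's implementation: measuring the DTA from each observer's boundary voxels to the nearest STAPLE boundary voxel (a directed distance), and defining the boundary with face-connected neighbours, are assumptions. Function names are illustrative.

```python
import numpy as np


def staple(decisions, max_iter=100, tol=1e-8, eps=1e-6):
    """Binary STAPLE estimated by expectation-maximisation.

    decisions: (n_voxels, n_raters) array of 0/1 labels, one column per observer.
    Returns the consensus mask (posterior >= 0.5) and per-rater sensitivity p
    and specificity q.
    """
    D = np.asarray(decisions, dtype=float)
    prior = D.mean()                                    # global prior P(T = 1)
    p = np.full(D.shape[1], 0.99)
    q = np.full(D.shape[1], 0.99)
    for _ in range(max_iter):
        # E-step: posterior W that each voxel belongs to the true structure
        log_a = np.log(prior) + D @ np.log(p) + (1 - D) @ np.log(1 - p)
        log_b = np.log(1 - prior) + (1 - D) @ np.log(q) + D @ np.log(1 - q)
        W = 1.0 / (1.0 + np.exp(np.clip(log_b - log_a, -700, 700)))
        # M-step: re-estimate each rater's sensitivity and specificity
        p_new = np.clip((W @ D) / W.sum(), eps, 1 - eps)
        q_new = np.clip(((1 - W) @ (1 - D)) / (1 - W).sum(), eps, 1 - eps)
        converged = max(np.abs(p_new - p).max(), np.abs(q_new - q).max()) < tol
        p, q = p_new, q_new
        if converged:
            break
    return W >= 0.5, p, q


def surface_points(mask, spacing):
    """Physical coordinates (mm) of the boundary voxels of a 2D or 3D binary mask."""
    m = np.pad(np.asarray(mask, dtype=bool), 1)
    interior = m.copy()
    for axis in range(m.ndim):                          # face-connected neighbours
        interior &= np.roll(m, 1, axis) & np.roll(m, -1, axis)
    surface = (m & ~interior)[tuple(slice(1, -1) for _ in range(m.ndim))]
    return np.argwhere(surface) * np.asarray(spacing, dtype=float)


def distance_to_agreement(test_mask, ref_mask, spacing):
    """Unsigned mean and maximum distance from the test surface to the reference surface."""
    a = surface_points(test_mask, spacing)
    b = surface_points(ref_mask, spacing)
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)).min(axis=1)
    return d.mean(), d.max()


# Toy example: five observers outline a disc; two deviate by about 2 voxels.
yy, xx = np.mgrid[0:40, 0:40]
r2 = (xx - 20) ** 2 + (yy - 20) ** 2
truth = r2 <= 100
observers = [truth, truth, truth, r2 <= 144, r2 <= 64]
votes = np.stack([m.ravel() for m in observers], axis=1)
consensus, sens, spec = staple(votes)
consensus = consensus.reshape(truth.shape)
dta = [distance_to_agreement(m, consensus, spacing=(1.0, 1.0)) for m in observers]
```

Here STAPLE recovers the majority outline, assigns the under-contouring observer a lower sensitivity and the over-contouring observer a lower specificity, and the per-observer (mDTA, max DTA) pairs can then be averaged over observers for each patient.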
Intra-observer analysis was also performed: each observer was allocated one patient to
re-contour after a minimum delay of four months. Observers re-contoured the same patient on
both the AVG and MC scans, allowing a direct measure of the intra-observer variation on
both reconstruction techniques. ADMIRE was used to calculate the unsigned mDTA between
the two sets of contours. The variation on AVG and MC was compared directly for each
observer and each structure across all patients, using a paired Student's t-test to test
statistical significance.
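Both the inter-observer comparison (pairs are patients, using the observer-averaged mDTA) and the intra-observer comparison (pairs are observers) reduce to a paired Student's t-test on the AVG minus MC differences. A minimal sketch of the statistic follows; the function names and the mDTA values are hypothetical, not study data. The two-sided p-value follows from Student's t distribution with n - 1 degrees of freedom, which is what `scipy.stats.ttest_rel` reports alongside the same statistic.

```python
import numpy as np


def observer_averaged(mdta):
    """Average mDTA over observers: (n_patients, n_observers) -> (n_patients,)."""
    return np.asarray(mdta, dtype=float).mean(axis=1)


def paired_t_statistic(avg, mc):
    """Paired Student's t statistic for AVG minus MC, with n - 1 degrees of freedom."""
    diffs = np.asarray(avg, dtype=float) - np.asarray(mc, dtype=float)
    n = diffs.size
    t = diffs.mean() / (diffs.std(ddof=1) / np.sqrt(n))
    return t, n - 1


# Hypothetical mDTA values in mm: 3 patients x 2 observers on each reconstruction.
avg_mdta = [[1.2, 1.6], [2.1, 2.5], [3.3, 3.1]]
mc_mdta = [[0.9, 1.1], [1.1, 1.5], [2.0, 2.4]]
t, dof = paired_t_statistic(observer_averaged(avg_mdta), observer_averaged(mc_mdta))
```

With these made-up values the per-patient differences are 0.4, 1.0 and 1.0 mm, giving t = 4.0 with 2 degrees of freedom; pairing removes the between-patient spread that an unpaired test would treat as noise.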
Finally, moving to MC reconstructions may result in smaller contour volumes: removing the
blurring caused by the respiratory cycle and increasing contrast could make contours
smaller. Any associated volume-based dose statistics would then differ from those
obtained on the AVG. Volumes of each structure were compared between the AVG and
MC to investigate whether this effect is present.
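The volume comparison reduces to voxel counting. The sketch below assumes both masks lie on the same voxel grid and takes the ratio as MC/AVG; the direction of the ratio reported in table 2 is not stated, so treat it as an assumption. Function names are hypothetical.

```python
import numpy as np


def volume_cc(mask, spacing_mm):
    """Volume of a binary mask in cm^3, given the voxel spacing in mm."""
    return np.count_nonzero(mask) * float(np.prod(spacing_mm)) / 1000.0


def volume_ratios(mc_masks, avg_masks, spacing_mm):
    """Per-observer MC/AVG volume ratio and its (min, max) range."""
    ratios = np.array([volume_cc(mc, spacing_mm) / volume_cc(avg, spacing_mm)
                       for mc, avg in zip(mc_masks, avg_masks)])
    return ratios, (ratios.min(), ratios.max())


# Toy example: one observer's structure is one slice shorter on MC than on AVG.
mc = np.zeros((20, 20, 20), dtype=bool)
mc[5:15, 5:15, 5:15] = True                    # 1000 voxels
avg = np.zeros_like(mc)
avg[5:16, 5:15, 5:15] = True                   # 1100 voxels
ratios, (lo, hi) = volume_ratios([mc], [avg], spacing_mm=(3.0, 1.0, 1.0))
```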
Results
Figure 1 shows example slices of the MC reconstructions, highlighting the improved
definition of the bronchial tree and the greater clarity of the boundary between the heart and the
liver.
Results show a significant improvement when using the MC scans for contouring the trachea
(paired t-test, p = 0.04) and some benefit for the remaining OARs (figure 2). After editing to
remove the uncertainty in the superior and inferior extent of the OAR contours, the trachea
remains significantly improved (p = 0.006) and the PBT (p = 0.004) also becomes significant.
There is an overall improvement in the mDTA, particularly for the trachea, PBT and
oesophagus, which are tubular in nature; all three are now sub-millimetre. Figures 2 and 3 also
show the standard deviation (SD) across observers as error bars. The MC reconstructions
show a smaller variation than the AVG, particularly for edited contours, indicating
improved inter-observer agreement. It is worth noting that the brachial plexus showed a
significant improvement in figure 2. However, a large variation remained between
observers (mDTA of 3.0 cm).
The heart shows little change in mDTA between MC and AVG. The heart is a
large organ, with a large semi-vertical border with the lung that is hardly affected by
breathing motion. This boundary will show little benefit from the MC reconstruction and
dominates the mDTA results. There may be some benefit at the heart-liver boundary, as
shown in figure 1, where the clearer definition could result in less variation in observer
contours. Indeed, visual inspection of the contours appeared to show an improvement. To
investigate this, the max DTA was calculated for the un-edited contours (table 1). A reduction
in observer variation at these boundaries could appear as a reduction in max DTA. However,
the paired t-test showed no significant difference for any OAR, with the max DTA similar on
both reconstructions for all OARs.
Intra-observer results showed an improvement for all OARs: the mDTA was reduced and the
standard deviation between observers was smaller (figure 4). The differences for the heart
(p < 0.001) and oesophagus (p = 0.05) were significant, and those for the trachea and PBT
approached significance. The brachial plexus showed an improvement, with the mDTA reduced
from 4.6 mm to 2.4 mm; however, a large variation remained on visual inspection.
Volume ratios are given in table 2; ratios were calculated from the volumes drawn by each
individual observer on the MC and AVG for each patient. There was no statistically
significant difference in volume between reconstructions. Table 2 also includes the range of
intra-observer variation to put these results into a wider context. These results indicate that
a single observer contouring an MC scan twice produces a similar uncertainty to contouring a
patient's scans reconstructed with MC and AVG.
Discussion
This work investigated inter- and intra-observer contour variation of OARs in lung cancer
patients. The OARs investigated were the oesophagus, heart, trachea, PBT and brachial
plexus. Five clinical oncologists specialised in thoracic malignancies contoured all OARs to
allow inter-observer variation to be investigated. In addition, each oncologist re-contoured
one patient, allowing intra-observer variation to be estimated. We performed this analysis
on contours drawn on both AVG and MC scans for the first time, and showed that MC scans
generally give superior agreement.
An MC approach has a proven benefit in the delineation of lung tumours, allowing
margins to be personalised to an individual patient’s respiratory cycle (8). Such an
approach is beginning to enter standard clinical use. MC scans deform all phases to the
mid-position of the respiratory cycle; in doing so, all anatomy is essentially frozen in that
position. The blurring from the respiratory cycle is removed, resulting in sharper scans
with improved contrast. This is most notable at the boundaries between organs, for example
the heart and liver. Such boundaries are indistinct on AVG scans but, as figure 1
shows, appear clearer on MC. Additionally, any OAR that may be affected by blurring
caused by respiration will benefit, in particular the trachea and PBT (figure 1).
Improved clarity should translate into an improved contouring experience and less observer
variation, as seen in this study. It is worth noting that the MC approach is based on
respiratory sorting, so other motion in the thorax, for example cardiac motion, is not
corrected in this manner. Such motion may also affect contouring accuracy; however, these
effects are smaller and more localised than the respiratory motion.
An additional benefit of using an MC approach will be an improvement in planning
workflows. In this scenario, all contours, tumour as well as OARs, can be generated on the
motion compensated dataset. Full phase information and AVG reconstructions will no longer
be required and, because no motion-adapted GTV is generated, no phases need to be
processed on the planning system. This will reduce the chance of selecting
incorrect datasets during the contouring process and eliminate the chance of systematic
errors arising from contour generation.
Results in this paper were presented in two ways: the un-edited contours (figure 2) and
contours edited to start and end on the same slice across all observers (figure 3). As the
results showed, for the un-edited contours only the trachea showed a significant benefit of
contouring on the MC dataset (p = 0.04). The mDTA for all other contours did not show any
significant improvement. However, the edited contours showed improvements for the
trachea (p = 0.006) and the PBT (p = 0.004), with the oesophagus (p = 0.07) approaching
significance. By editing the contours so that the superior and inferior extent was constant
across all observers, we removed the uncertainty in where a given structure starts or ends.
Clinically, the border between one structure and the next can be difficult to interpret. In this
study, the border between the trachea and PBT displayed an uncertainty, in the un-edited
contours, of 3-4 slices across the five observers (0.9-1.2 cm). Editing the contours removed
this uncertainty, allowing the inter-observer variation related to image quality
to be investigated within a set range of slices. This difference in extent may be due to
differing interpretation of the protocol among the five clinical oncologists. In addition,
table 2 shows that moving to an MC reconstruction does not change the volumes of the
contours significantly. Therefore, we believe that moving to an MC workflow will not result
in differences in the reported plan dosimetry for OARs.
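The slice-cropping step can be sketched as follows, assuming boolean masks on a shared grid with axial slices along the first array axis; the helper names are hypothetical and this is not the ADMIRE workflow used in the study. Each observer's contour is cut to the slice range that every observer contoured, so the subsequent mDTA no longer contains superior-inferior disagreement such as the 3-4 slice spread at the trachea/PBT border.

```python
import numpy as np


def occupied_slices(mask):
    """Indices of axial slices (axis 0) that contain any part of the structure."""
    return np.flatnonzero(np.asarray(mask, dtype=bool).any(axis=(1, 2)))


def crop_to_common_slices(masks):
    """Crop every observer's mask to the slice range shared by all observers.

    masks: list of non-empty (n_slices, ny, nx) boolean arrays on the same grid.
    If the observers' ranges do not overlap, every cropped mask is empty.
    """
    first = max(occupied_slices(m)[0] for m in masks)
    last = min(occupied_slices(m)[-1] for m in masks)
    cropped = []
    for m in masks:
        c = np.zeros(np.shape(m), dtype=bool)
        c[first:last + 1] = np.asarray(m, dtype=bool)[first:last + 1]
        cropped.append(c)
    return cropped, (int(first), int(last))


# Toy example: two observers disagree on where a tubular structure starts and ends.
a = np.zeros((12, 4, 4), dtype=bool)
a[2:9, 1:3, 1:3] = True                       # slices 2-8
b = np.zeros_like(a)
b[4:11, 1:3, 1:3] = True                      # slices 4-10
(a_c, b_c), common = crop_to_common_slices([a, b])
```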
Intra-observer variation was also investigated, with a delay of at least 4 months between
contouring sessions. As figure 4 shows, MC scans uniformly provided smaller intra-observer
variation. These results were mostly not statistically significant, most likely due to the small
number of observers. However, the mDTA ranges displayed in the table indicate that there
is a benefit.
Because the analysis uses the mDTA, which averages over the whole contour surface, the
reported benefit is small, although local differences can be large. For instance, the heart-liver
interface shows a visible improvement that is washed out by the rest of the organ surface,
which showed little improvement; figure 5 highlights this effect. The max DTA statistics
showed no difference between MC and AVG scans; however, the maximum DTA reflects only
a single point on the contour surface and is, by definition, determined by outliers. Further
analysis using local statistics may be useful in highlighting the benefit in these areas.
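The dilution argument can be made concrete: a region of large disagreement contributes to the mDTA only in proportion to its share of the surface. The sketch below computes a global mDTA and one restricted to a z-range, as one possible form of the local statistics suggested above. The z-range stands in for a region such as the heart-liver interface; the function name and all numbers are invented for illustration, not study data.

```python
import numpy as np


def regional_mdta(distances, points_zyx, z_range):
    """Global mDTA, mDTA inside a z-range, and the fraction of surface points in that range."""
    d = np.asarray(distances, dtype=float)
    z = np.asarray(points_zyx, dtype=float)[:, 0]
    inside = (z >= z_range[0]) & (z <= z_range[1])
    return d.mean(), d[inside].mean(), inside.mean()


# Toy example: 100 surface points 1 mm apart in z; the 5 lowest points disagree
# by 5 mm and the remaining 95 by 0.5 mm.
points = np.column_stack([np.arange(100.0), np.zeros(100), np.zeros(100)])
distances = np.where(points[:, 0] < 5, 5.0, 0.5)
global_mdta, local_mdta, fraction = regional_mdta(distances, points, z_range=(0.0, 4.0))
```

Here the global mDTA is 0.725 mm although the local mDTA is 5.0 mm, because the affected points make up only 5% of the surface.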
The brachial plexus showed no improvement in mDTA when using the MC reconstruction
compared with the AVG. This is not unexpected considering the lack of tissue definition for
contouring this OAR on CT. Anatomical surrogates such as blood vessels and muscles are
used to localise the brachial plexus. Moving to MC reconstructions was not expected to
improve the delineations, and the results confirm this expectation. To improve delineation of
the brachial plexus, a solution would be to include magnetic resonance imaging in the
planning process.
We are now planning to move this workflow towards clinical implementation in our
department for lung cancer patients. However, a robust quality control method will be
required to ensure the MC reconstruction has been performed correctly. This is most
important where motion artefacts from irregular breathing are present. Such artefacts may
obscure the full extent of the respiratory cycle, resulting in an incorrect MC dataset that
may in turn obscure relevant anatomy. Careful review of the phase information and MC scans
must be performed.
Our proposed method for calculating the MC reconstruction assumes that all DVFs remain in
the same frame of reference (FOR). However, deforming from the FOR of the reference
phase to the FOR of the MC reconstruction may result in a change in coordinates.
We believe that any residual error will remain small and would only become an
issue where the gradient of the displacement is large, i.e. in tissue near the diaphragm. This
assumption is similar to that described by Brehm et al. in creating a cyclic registration
approach for motion compensated cone beam CT [ref Brehm]. Viewing the Jacobian of the
transform as part of the quality control process would highlight patients that may require a
more detailed check.
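As a sketch of the suggested Jacobian check, the function below computes the voxel-wise determinant of the Jacobian of x → x + u(x) from a displacement field using finite differences (`numpy.gradient`). Values ≤ 0 indicate folding, and values far from 1 indicate strong local compression or expansion, which is where the frame-of-reference approximation is least safe. The array layout (components first, in grid-axis order, in mm), the function names and the 0.5/2.0 review thresholds are assumptions for illustration, not validated QC limits.

```python
import numpy as np


def jacobian_determinant(dvf, spacing):
    """Voxel-wise Jacobian determinant of x -> x + u(x).

    dvf: array of shape (ndim, *grid) holding displacement components u_i in mm,
         ordered to match the grid axes (ndim = 2 or 3).
    spacing: voxel spacing in mm for each grid axis.
    """
    u = np.asarray(dvf, dtype=float)
    ndim = u.shape[0]
    jac = np.zeros((ndim, ndim) + u.shape[1:])
    for i in range(ndim):
        grads = np.gradient(u[i], *spacing)          # du_i/dx_j for every axis j
        for j in range(ndim):
            jac[i, j] = grads[j] + (1.0 if i == j else 0.0)
    # Move the ndim x ndim matrices to the last two axes so det() broadcasts per voxel.
    return np.linalg.det(np.moveaxis(jac, (0, 1), (-2, -1)))


def flag_for_review(det_j, low=0.5, high=2.0):
    """Hypothetical QC rule: fraction of folding voxels and of strongly scaled voxels."""
    folding = np.mean(det_j <= 0)
    extreme = np.mean((det_j < low) | (det_j > high))
    return folding, extreme


# Toy example: a uniform 10% expansion on a grid with 2 x 1 x 1 mm voxels.
zz, yy, xx = np.meshgrid(np.arange(4) * 2.0, np.arange(5) * 1.0, np.arange(6) * 1.0,
                         indexing="ij")
expansion = 0.1 * np.stack([zz, yy, xx])
det_j = jacobian_determinant(expansion, spacing=(2.0, 1.0, 1.0))
```

For this uniform expansion the determinant is 1.1³ = 1.331 at every voxel, so nothing is flagged; in practice the map would be reviewed near the diaphragm, where displacement gradients are largest.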
We will also investigate the use of this technique at other sites where respiratory motion
causes uncertainties, such as the oesophagus, stomach and lower-abdominal oligometastatic
disease. With the introduction of MR-guided external beam radiotherapy and MR linacs,
it may also prove advantageous to apply these methodologies to these emerging technologies.
Conclusion
This paper presents the first observer study of OAR delineation in lung cancer patients.
We investigated intra- and inter-observer variation for AVG and MC reconstructions;
variation was generally smaller on the MC. The results showed significant benefit for OARs
where blurring due to the respiratory cycle is greatest, in particular the trachea and PBT.
Some benefit is also evident at horizontal boundaries close to the diaphragm, for example
the heart-liver interface, due to the increased soft tissue definition.
References
1. Aupérin A, Le Péchoux C, Rolland E, et al. Meta-analysis of concomitant versus sequential
radiochemotherapy in locally advanced non-small-cell lung cancer. J. Clin. Oncol.
2010;28:2181–2190.
2. Machtay M, Paulus R, Moughan J, et al. Defining Local-Regional Control and Its
Importance in Locally Advanced Non-small Cell Lung Carcinoma. J. Thorac. Oncol.
2012;7:716–722.
3. Bradley JD, Paulus R, Komaki R, et al. Standard-dose versus high-dose conformal
radiotherapy with concurrent and consolidation carboplatin plus paclitaxel with or without
cetuximab for patients with stage IIIA or IIIB non-small-cell lung cancer (RTOG 0617): A
randomised, two-by-two factorial phase 3 study. Lancet Oncol. 2015;16:187–199.
4. Gore EM, Hu C, Ad VB, et al. Impact of Incidental Cardiac Radiation on Cardiopulmonary
Toxicity and Survival for Locally Advanced Non-Small Cell Lung Cancer: Reanalysis of NRG
Oncology/RTOG 0617 With Centrally Contoured Cardiac Structures. Int. J. Radiat. Oncol.
Biol. Phys. 2017;96:S129–S130.
5. Steenbakkers RJHM, Duppen JC, Fitton I, et al. Observer variation in target volume
delineation of lung cancer related to radiation oncologist-computer interaction: A “Big
Brother” evaluation. Radiother. Oncol. 2005;77:182–190.
6. Steenbakkers RJHM, Duppen JC, Fitton I, et al. Reduction of observer variation using
matched CT-PET for lung cancer delineation: A three-dimensional analysis. Int. J. Radiat.
Oncol. Biol. Phys. 2006;64:435–448.
7. Wolthaus JWH, Schneider C, Sonke JJ, et al. Mid-ventilation CT scan construction from
four-dimensional respiration-correlated CT scans for radiotherapy planning of lung cancer
patients. Int. J. Radiat. Oncol. Biol. Phys. 2006;65:1560–1571.
8. Wolthaus JWH, Sonke J-J, van Herk M, et al. Comparison of Different Strategies to Use
Four-Dimensional Computed Tomography in Treatment Planning for Lung Cancer Patients.
Int. J. Radiat. Oncol. Biol. Phys. 2008;70:1229–1238.
9. Peulen H, Belderbos J, Rossi M, et al. Mid-ventilation based PTV margins in Stereotactic
Body Radiotherapy (SBRT): A clinical evaluation. Radiother. Oncol. 2014;110:511–516.
10. UK SABR Consortium guidelines. www.sabr.org.uk/consortium/.
11. Warfield SK, Zou KH, Wells WM. Simultaneous Truth and Performance Level Estimation
(STAPLE): An Algorithm for the Validation of Image Segmentation. IEEE Trans Med
Imaging. 2004;23:903–921.
Table 1. The maximum DTA metric is shown for each OAR investigated. The table provides
the mean of the max DTA for each organ at risk and the range across all five observers. A
paired t-test was performed, showing no statistically significant difference between the
average and motion compensated reconstruction contours.
Table 2. Ratios of volumes from contours drawn on the average and motion compensated
reconstructions, with their range. Intra-observer variation on the average and motion
compensated reconstructions is included for comparison.
Figure 1. Average and motion compensated reconstructions are shown, with the benefits of the
motion compensated approach highlighted. Removing the respiratory motion has
sharpened the image, with greater definition in the mediastinum (proximal bronchial tree
shown) and of the heart/liver interface.
Figure 2. Results of the analysis for each OAR investigated for each patient. The median of
the mDTA across all 5 observers is shown for the unedited contours, error bars show the
standard deviation between observers. The table summarises the results for each organ at
risk across all patients; results are compared with a paired Student's t-test.
Figure 3. Results of the analysis for each OAR investigated for each patient. The median of
the mDTA across all 5 observers is shown for the edited contours, error bars show the
standard deviation between observers (contours edited superior and inferior). The table
summarises the results for each organ at risk across all patients; results are compared with
a paired Student's t-test.
Figure 4. Intra-observer results: the mDTA across all observers for each
OAR was improved using MC. Intra-observer variation was analysed with at least a 4-month
gap between contouring sessions. A paired t-test was performed between the average and motion
compensated reconstructions.
Figure 5. A representative slice of the heart-liver boundary is shown for both AVG and MC
reconstructions. The inter-observer variation is displayed for the five observers and the
improved agreement in this region using the MC reconstruction is evident.