![Page 1: 1 The Pyramid Method at DUC05 Ani Nenkova Becky Passonneau Kathleen McKeown Other team members: David Elson, Advaith Siddharthan, Sergey Siegelman](https://reader035.vdocuments.net/reader035/viewer/2022062517/5697bf791a28abf838c8249a/html5/thumbnails/1.jpg)
1
The Pyramid Method at DUC05
Ani Nenkova
Becky Passonneau
Kathleen McKeown
Other team members: David Elson, Advaith Siddharthan, Sergey Siegelman
2
Overview
• Review of Pyramids (Kathy)
• Characteristics of the responses
• Analyses (Ani)
  • Scores and Significant Differences
  • Reliability of Pyramid scoring
    • Comparisons between annotators
    • Impact of editing on scores
    • Impact of Weight 1 SCUs
  • Correlation with responsiveness and Rouge
• Lessons learned
3
Pyramids
• Uses multiple human summaries
  • Previous data indicated 5 needed for score stability
• Information is ranked by its importance
• Allows for multiple good summaries
• A pyramid is created from the human summaries
  • Elements of the pyramid are content units
• System summaries are scored by comparison with the pyramid
4
Summarization Content Units
• Near-paraphrases from different human summaries
• Clause or less
• Avoids explicit semantic representation
• Emerges from analysis of human summaries
5
SCU: A cable car caught fire (Weight = 4)
A. The cause of the fire was unknown.
B. A cable car caught fire just after entering a mountainside tunnel in an alpine resort in Kaprun, Austria on the morning of November 11, 2000.
C. A cable car pulling skiers and snowboarders to the Kitzsteinhorn resort, located 60 miles south of Salzburg in the Austrian Alps, caught fire inside a mountain tunnel, killing approximately 170 people.
D. On November 10, 2000, a cable car filled to capacity caught on fire, trapping 180 passengers inside the Kitzsteinhorn mountain, located in the town of Kaprun, 50 miles south of Salzburg in the central Austrian Alps.
6
SCU: The cause of the fire is unknown (Weight = 1)
Drawn from the same four summaries (A to D) as the previous SCU.
7
SCU: The accident happened in the Austrian Alps (Weight = 3)
Drawn from the same four summaries (A to D) as the previous SCUs.
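The three SCU examples suggest that an SCU's weight is the number of human summaries that express it. A minimal sketch of that bookkeeping follows; the `SCU` class and the contributor sets are illustrative assumptions read off summaries A to D, not the annotation tool's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class SCU:
    """A Summarization Content Unit with the summaries that express it."""
    label: str
    # IDs of the human summaries contributing to this SCU (assumed reading).
    contributors: set = field(default_factory=set)

    @property
    def weight(self) -> int:
        # Weight = number of distinct human summaries expressing the SCU.
        return len(self.contributors)


scus = [
    SCU("A cable car caught fire", {"A", "B", "C", "D"}),
    SCU("The cause of the fire is unknown", {"A"}),
    SCU("The accident happened in the Austrian Alps", {"B", "C", "D"}),
]

weights = {scu.label: scu.weight for scu in scus}
for label, w in weights.items():
    print(f"{label}: weight {w}")
```

With four human summaries, the maximum possible weight in this toy pyramid is 4.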
8
Idealized representation
• Tiers of differentially weighted SCUs
• Top: few SCUs, high weight
• Bottom: many SCUs, low weight
[Figure: pyramid with tiers labeled W=1, W=2, W=3]
9
Creation of pyramids
• Done for each of 20 out of 50 sets
• Primary annotator, secondary checker
• Held round-table discussions of problematic constructions that occurred in this data set
  • Comma-separated lists
    • Extractive reserves have been formed for managed harvesting of timber, rubber, Brazil nuts, and medical plants without deforestation.
  • General vs. specific
    • Eastern Europe vs. Hungary, Poland, Lithuania, and Turkey
10
Characteristics of the Responses
• Proportion of SCUs of Weight 1 is large
  • 44% (D324) to 81% (D695)
• Mean SCU weight: 1.9
• Agreement among human responders is quite low
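The two statistics on this slide can be computed directly from the list of SCU weights in a pyramid. The sketch below uses a made-up list of weights (`toy_weights` is hypothetical, not DUC05 data) to show the arithmetic.

```python
def weight_profile(weights):
    """Return (mean SCU weight, share of SCUs with weight 1)."""
    n = len(weights)
    mean_weight = sum(weights) / n
    share_w1 = sum(1 for w in weights if w == 1) / n
    return mean_weight, share_w1


# Hypothetical pyramid: six SCUs of weight 1, two of weight 2,
# one of weight 3 and one of weight 7.
toy_weights = [1] * 6 + [2] * 2 + [3] + [7]

mean_weight, share_w1 = weight_profile(toy_weights)
print(f"mean weight = {mean_weight:.2f}, weight-1 share = {share_w1:.0%}")
```

A large weight-1 share pulls the mean weight down, which is the pattern the slide reports.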
11
SCU Weights
[Figure: # of SCUs at each weight]
12
Pyramids: DUC 2003
• 100 word summaries (vs. 250 word)
• 10 500-word articles per cluster (vs. 30 720-word articles)
• 3 clusters (vs. 20 clusters)
• Mean SCU Weight (7 models)
  • 2005: avg 1.9
  • 2003: avg 2.4
• Proportion of SCUs of W=1
  • 2005: avg 60%, 44% to 81%
  • 2003: avg 40%, 37% to 47%
13
DUC03 DUC05
[Figure: one panel each for DUC03 and DUC05]
14
Computing pyramid scores: Ideally informative summary
Does not include an SCU from a lower tier unless all SCUs from higher tiers are included as well
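One way to read this rule: for a given number of SCUs, an ideally informative summary takes SCUs from the top tier down. A minimal sketch, assuming the pyramid is represented simply as a list of SCU weights (the function name and the toy pyramid are illustrative):

```python
def ideal_summary(pyramid_weights, size):
    """Weights of an ideally informative summary containing `size` SCUs.

    Sorting by weight and taking a prefix guarantees that no SCU from a
    lower tier is chosen while a higher-tier SCU is still left out.
    """
    return sorted(pyramid_weights, reverse=True)[:size]


# Hypothetical pyramid: one SCU of weight 3, two of weight 2, three of weight 1.
pyramid = [3, 2, 2, 1, 1, 1]

ideal = ideal_summary(pyramid, 4)
print(ideal, "total weight:", sum(ideal))
```

Ties within a tier do not matter here, because every SCU in a tier carries the same weight.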
20
Original Pyramid Score
SCORE = D/MAX
D: Sum of the weights of the SCUs in a summary
MAX: Sum of the weights of the SCUs in an ideally informative summary
Measures the proportion of good information in the summary: precision
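A sketch of SCORE = D/MAX under one stated assumption: the ideally informative summary used for MAX has the same number of SCUs as the peer summary, counting SCUs that match nothing in the pyramid. The slide does not state the size, so treat that choice, and all names below, as illustrative.

```python
def max_weight(pyramid_weights, size):
    """Total weight of an ideally informative summary with `size` SCUs."""
    return sum(sorted(pyramid_weights, reverse=True)[:size])


def original_pyramid_score(matched_weights, n_summary_scus, pyramid_weights):
    """D / MAX, where D sums the weights of the pyramid SCUs found in the peer.

    n_summary_scus is the assumed size of the ideal summary: the number of
    SCUs in the peer, including those with no match in the pyramid.
    """
    d = sum(matched_weights)
    mx = max_weight(pyramid_weights, n_summary_scus)
    return d / mx if mx else 0.0


# Hypothetical pyramid built from four human summaries.
pyramid = [4, 3, 3, 2, 1, 1, 1]

# Peer summary with 5 SCUs, of which 3 match pyramid SCUs of weight 4, 2 and 1.
score = original_pyramid_score([4, 2, 1], 5, pyramid)
print(f"D = 7, MAX = {max_weight(pyramid, 5)}, score = {score:.3f}")
```

Unmatched SCUs raise MAX without raising D, which is why the slide describes the score as precision-like.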
21
Modified pyramid score (recall)
• EN = average number of SCUs in the human models
  • This is the number of content units humans chose to convey about the story
• W = weight of a maximally informative summary of size EN
• D/W is the modified pyramid score
  • Shows the proportion of expected good information
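The modified score can be sketched the same way as the original one, with the ideal summary sized by the human models instead of the peer. Rounding EN to an integer is an assumption made here for illustration; the slide does not say how a fractional average is handled.

```python
def modified_pyramid_score(matched_weights, model_scu_counts, pyramid_weights):
    """D / W, where W is the weight of an ideal summary with EN SCUs."""
    # EN: average number of SCUs per human model summary (rounded, an assumption).
    en = round(sum(model_scu_counts) / len(model_scu_counts))
    w = sum(sorted(pyramid_weights, reverse=True)[:en])
    d = sum(matched_weights)
    return d / w if w else 0.0


# Hypothetical pyramid from four human summaries containing 4, 4, 4 and 3 SCUs.
pyramid = [4, 3, 3, 2, 1, 1, 1]
model_scu_counts = [4, 4, 4, 3]

# Peer summary matching pyramid SCUs of weight 4, 2 and 1, so D = 7.
score = modified_pyramid_score([4, 2, 1], model_scu_counts, pyramid)
print(f"modified score = {score:.3f}")
```

Because W does not depend on the peer's length, SCUs in the peer that match nothing in the pyramid no longer affect the denominator.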
22
Scoring Methods
• Presents scores for the 20 pyramid sets
• Recompute Rouge for comparison
  • We compute Rouge using only 7 models
  • 8 and 9 reserved for computing human performance
  • Best because of significant topic effect
• Comparisons between Pyramid (original, modified), responsiveness, and Rouge-SU4
  • Pyramid scores computed from multiple humans
  • Responsiveness is just one human’s judgment
  • Rouge-SU4 equivalent to Rouge-2
23
Preview of Results
• Manual metrics
  • Large differences between humans and machines
  • No single system the clear winner
  • But a top group identified by all metrics
• Significant differences
  • Different predictions from manual and automatic metrics
• Correlations between metrics
  • Some correlation but one cannot be substituted for another
  • This is good
24
Human performance/Best sys

| | Pyramid | Modified | Resp | ROUGE-SU4 |
|---|---|---|---|---|
| Human | B: 0.5472 | B: 0.4814 | A: 4.895 | A: 0.1722 |
| Human | A: 0.4969 | A: 0.4617 | B: 4.526 | B: 0.1552 |
| Best system | 14: 0.2587 | 10: 0.2052 | 4: 2.85 | 15: 0.139 |

• Best system ~50% of human performance on manual metrics
• Best system ~80% of human performance on ROUGE
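The two percentages can be checked from the numbers on this slide. The sketch below divides the best system score by the higher of the two human scores for each metric; picking the higher human as the reference is an assumption.

```python
# Higher of the two human scores per metric, as listed on the slide.
human = {"Pyramid": 0.5472, "Modified": 0.4814, "Resp": 4.895, "ROUGE-SU4": 0.1722}

# Best system score per metric (systems 14, 10, 4 and 15 respectively).
best_system = {"Pyramid": 0.2587, "Modified": 0.2052, "Resp": 2.85, "ROUGE-SU4": 0.139}

ratios = {metric: best_system[metric] / human[metric] for metric in human}

for metric, ratio in ratios.items():
    print(f"{metric}: best system reaches {ratio:.0%} of human performance")
```

The three manual metrics land between roughly 40% and 60%, while ROUGE-SU4 comes out near 80%.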
25
| Pyramid original | Modified | Resp | Rouge-SU4 |
|---|---|---|---|
| 14: 0.2587 | 10: 0.2052 | 4: 2.85 | 15: 0.139 |
| 17: 0.2492 | 17: 0.1972 | 14: 2.8 | 4: 0.134 |
| 15: 0.2423 | 14: 0.1908 | 10: 2.65 | 17: 0.1346 |
| 10: 0.2379 | 7: 0.1852 | 15: 2.6 | 19: 0.1275 |
| 4: 0.2321 | 15: 0.1808 | 17: 2.55 | 11: 0.1259 |
| 7: 0.2297 | 4: 0.177 | 11: 2.5 | 10: 0.1278 |
| 16: 0.2265 | 16: 0.1722 | 28: 2.45 | 6: 0.1239 |
| 6: 0.2197 | 11: 0.1703 | 21: 2.45 | 7: 0.1213 |
| 32: 0.2145 | 6: 0.1671 | 6: 2.4 | 14: 0.1264 |
| 21: 0.2127 | 12: 0.1664 | 24: 2.4 | 25: 0.1188 |
| 12: 0.2126 | 19: 0.1636 | 19: 2.4 | 21: 0.1183 |
| 11: 0.2116 | 21: 0.1613 | 6: 2.4 | 16: 0.1218 |
| 26: 0.2106 | 32: 0.1601 | 27: 2.35 | 24: 0.118 |
| 19: 0.2072 | 26: 0.1464 | 12: 2.35 | 12: 0.116 |
| 28: 0.2048 | 3: 0.145 | 7: 2.3 | 3: 0.1198 |
| 13: 0.1983 | 28: 0.1427 | 25: 2.2 | 28: 0.1203 |
| 3: 0.1949 | 13: 0.1424 | 32: 2.15 | 27: 0.110 |
| 1: 0.1747 | 25: 0.1406 | 3: 2.1 | 13: 0.1097 |
29
Significant Differences
• Manual metrics
  • Few differences between systems
    • Pyramid: 23 is worse
    • Responsive: 23 and 31 are worse
  • Both humans better than all systems
• Automatic (Rouge-SU4)
  • Many differences between systems
  • One human indistinguishable from 5 systems
30
Multiple and pairwise comparisons
• Multiple comparisons
  • Tukey’s method
  • Control for the experiment-wise type I error
  • Show fewer significant differences
• Pairwise comparisons
  • Wilcoxon paired test
  • Controls the error for individual comparisons
  • Appropriate for seeing how your system did during development
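To see why the experiment-wise error matters, the sketch below counts the pairwise comparisons among 27 peers (25 systems plus 2 humans) and estimates the chance of at least one false positive when each comparison is tested separately. The per-test level of 0.05 and the independence of the tests are simplifying assumptions, not values taken from the slides.

```python
from math import comb


def familywise_error(alpha, n_comparisons):
    """Probability of at least one type I error across independent tests."""
    return 1 - (1 - alpha) ** n_comparisons


n_peers = 27                 # 25 systems + 2 humans
n_pairs = comb(n_peers, 2)   # number of distinct peer pairs
fwer = familywise_error(0.05, n_pairs)

print(f"{n_pairs} pairwise comparisons, family-wise error = {fwer:.4f}")
```

This is why a procedure that controls the experiment-wise error reports fewer significant differences than uncorrected pairwise tests.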
31
Modified pyramid: significant differences
• One system accounts for most of the differences
• Humans significantly better than all systems

| Peer | Better than |
|---|---|
| 21, 32, 6, 12, 19, 11, 16, 4, 15, 7, 14 | 23 |
| 17, 10 | 23, 20 |
| A, B | 23, 20, 30, 24, 31, 1, 27, 25, 28, 13, 26, 3, 21, 32, 6, 12, 19, 11, 16, 4, 15, 7, 14, 17, 10 |
32
Responsiveness 1: Significant differences
• Differences primarily between 2 systems
• Differences between humans and each system

| Peer | Better than |
|---|---|
| 26, 13, 20, 3, 32, 25, 7, 12, 27 | 23 |
| 6, 16, 19, 24, 21, 28, 11, 17 | 23, 31 |
| 15, 10 | 23, 31, 1 |
| 14 | 23, 31, 1, 30, 26, 13, 20 |
| 4 | 23, 31, 1, 30, 26, 13, 20, 3 |
| B, A | 23, 31, 1, 30, 26, 13, 20, 3, 32, 25, 7, 12, 27, 6, 16, 19, 24, 21, 28, 11, 17, 15, 10, 14, 4 |
33
Responsive-2
• Similar shape to original

| Peer | Better than |
|---|---|
| 16, 12, 15, 28, 3, 7, 4 | 23 |
| 14, 17, 10 | 23, 31, 20 |
| B, A | 23, 31, 1, 30, 26, 13, 20, 3, 32, 25, 7, 12, 27, 6, 16, 19, 24, 21, 28, 11, 17, 15, 10, 14, 4 |
34
Skip-bigram: significant differences
• Many more differences between systems than any manual metric
• No difference between human and 5 systems

| Peer | Better than |
|---|---|
| 20, 31, 26, 1 | 23 |
| 32 | 23, 20 |
| 11, 28, 13, 30, 27, 3 | 23, 20, 31 |
| 16, 21, 12, 24, 25, 7, 14, 6, 19, 10, 17, 4, 15 | 23, 20, 31, 26, 1 |
| B | 23, 20, 31, 26, 1, 32, 11, 28, 13, 30, 27, 3, 16, 21, 12, 24, 25, 7, 14, 6 |
| A | 23, 20, 31, 26, 1, 32, 11, 28, 13, 30, 27, 3, 16, 21, 12, 24, 25, 7, 14, 6, 19, 10, 17, 4, 15 |
36
Pairwise comparisons: Modified Pyramid

| Peer | Better than |
|---|---|
| 10 | 25, 27, 31, 24, 30, 20, 23 |
| 17 | 3, 25, 27, 24, 30, 20, 23 |
| 14 | 25, 27, 1, 24, 30, 20, 23 |
| 7 | 13, 25, 27, 31, 24, 30, 20, 23 |
| 15 | 3, 25, 27, 1, 24, 30, 20, 23 |
| 4 | 25, 27, 31, 24, 30, 20, 23 |
| 16 | 24, 30, 23 |
| 11 | 24, 30, 23 |
| 19 | 24, 30, 23 |
| 12 | 30, 23 |
| 6 | 31, 30, 23 |
| 32 | 24, 30, 20, 23 |
| 21 | 24, 30, 23 |
| 3 | 30, 23 |
| 26 | 23 |
| 13 | 23 |
| 28 | 30, 20, 23 |
37
Agreement between annotators

| | Overall | Low | High |
|---|---|---|---|
| Percent agreement | 95% | 90% | 96% |
| Kappa | .57 | .46 | .62 |
| Alpha | .57 | .41 | .59 |
| Alpha-Dice | .67 | .49 | .68 |
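The gap between 95% raw agreement and a Kappa of .57 is easier to read with the usual chance-corrected form, kappa = (P_o - P_e) / (1 - P_e). The sketch below assumes the slide's Kappa follows that form and solves for the chance agreement P_e it would imply; the function names are illustrative.

```python
def kappa(p_observed, p_expected):
    """Chance-corrected agreement: (P_o - P_e) / (1 - P_e)."""
    return (p_observed - p_expected) / (1 - p_expected)


def implied_chance_agreement(p_observed, k):
    """Solve kappa = (P_o - P_e) / (1 - P_e) for P_e."""
    return (p_observed - k) / (1 - k)


# Overall figures from the table: 95% agreement, Kappa .57.
p_e = implied_chance_agreement(0.95, 0.57)
print(f"implied chance agreement = {p_e:.3f}")
print(f"round trip kappa = {kappa(0.95, p_e):.2f}")
```

Under that assumption, chance agreement alone would already be high, so a 95% raw figure corresponds to only moderate chance-corrected agreement.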
38
Editing of participant annotations
• To correct obvious errors
• Ensures uniform checking
• Predominantly involved correctly splitting non-matching SCUs
• Average paired differences
  • Original: 0.0043
  • Modified: 0.0005
• Average magnitude of the difference
  • Original: 0.0115
  • Modified: 0.0032
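The two summary figures differ in whether positive and negative changes cancel. A minimal sketch with made-up scores (the `before` and `after` lists are hypothetical, not DUC05 values):

```python
def paired_difference_stats(before, after):
    """Return (mean signed difference, mean absolute difference)."""
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = sum(diffs) / len(diffs)
    mean_abs_diff = sum(abs(d) for d in diffs) / len(diffs)
    return mean_diff, mean_abs_diff


# Hypothetical pyramid scores for four summaries, before and after editing.
before = [0.20, 0.25, 0.18, 0.22]
after = [0.21, 0.24, 0.18, 0.23]

mean_diff, mean_abs_diff = paired_difference_stats(before, after)
print(f"average paired difference = {mean_diff:.4f}")
print(f"average magnitude         = {mean_abs_diff:.4f}")
```

The signed average shows whether editing shifts scores in one direction; the magnitude shows how much individual scores move regardless of direction.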
39
Excluding weight 1 SCUs
• Removing weight 1 SCUs improves agreement
  • Kappa: 0.64 (was 0.57)
• Annotating without weight 1 has negligible impact on scores
  • Set D324 done without weight 1 SCUs
  • Ave. magnitude between paired differences
    • On average 0.07 difference
40
Correlations: Pearson’s, 25 systems

| | Pyr-mod | Resp-1 | Resp-2 | R-2 | R-SU4 |
|---|---|---|---|---|---|
| Pyr-orig | 0.96 | 0.77 | 0.86 | 0.84 | 0.80 |
| Pyr-mod | | 0.81 | 0.90 | 0.90 | 0.86 |
| Resp-1 | | | 0.83 | 0.92 | 0.92 |
| Resp-2 | | | | 0.88 | 0.87 |
| R-2 | | | | | 0.98 |
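For reference, a minimal sketch of Pearson's r in plain Python. It is applied to eight systems that appear in both the original and modified pyramid columns of the system score table earlier in the deck; since it uses 8 of the 25 systems, it is not expected to reproduce the 0.96 reported here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Scores for systems 14, 17, 15, 10, 4, 7, 16, 6 (subset chosen for illustration).
pyr_orig = [0.2587, 0.2492, 0.2423, 0.2379, 0.2321, 0.2297, 0.2265, 0.2197]
pyr_mod = [0.1908, 0.1972, 0.1808, 0.2052, 0.1770, 0.1852, 0.1722, 0.1671]

r = pearson(pyr_orig, pyr_mod)
print(f"Pearson's r on 8 systems = {r:.2f}")
```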
41
Correlations: Pearson’s, 25 systems
Questionable that responsiveness could be a gold standard
42
Pyramid and responsiveness
High correlation, but the metrics are not mutually substitutable
43
Pyramid and Rouge
High correlation, but the metrics are not mutually substitutable
44
Lessons Learned
• Comparing content is hard
  • All kinds of judgment calls
  • We didn’t evaluate the NIST assessors in previous years
• Paraphrases
  • VP vs. NP
    • Ministers have been exchanged
    • Reciprocal ministerial visits
  • Length and constituent type
    • Robotics assists doctors in the medical operating theater
    • Surgeons started using robotic assistants
45
Modified scores better
• Easier peer annotation
  • Can drop weight 1 SCUs
• Better agreement
  • No emphasis on splitting non-matching SCUs
46
Agreement between annotators
• Participants can perform peer annotation reliably
• Absolute difference between scores
  • Original: 0.0555
  • Modified: 0.0617
  • Empirical prediction of difference: 0.06 (HLT 2004)
47
Correlations
Original and modified can substitute for each other
High correlation between manual and automatic, but automatic not yet a substitute
Similar patterns between pyramid and responsiveness
48
Current Directions
Automated identification of SCUs (Harnly et al 05)
Applied to DUC05 pyramid data set
Correlation of .91 with modified pyramid scores
49
Questions
• What was the experience annotating pyramids?
  • Does it shed insight on the problem?
• Are people willing to do it again?
• Would you have been willing to go through training?
• If you’ve done pyramid analysis, can you share your insights?
51
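| Annotators | Set ID | Alpha | Dice | Alpha-Dice |
|---|---|---|---|---|
| 102:218 | 324 | 0.59 | 0.71 | 0.67 |
| 108:120 | 400 | 0.45 | 0.72 | 0.53 |
| 109:122 | 407 | 0.41 | 0.59 | 0.49 |
| 112:126 | 426 | 0.54 | 0.74 | 0.63 |
| 116:124 | 633 | 0.58 | 0.87 | 0.68 |
| 121:125 | 695 | 0.51 | 0.75 | 0.61 |
| 102:123 | 324 | 0.6 | 0.82 | 0.69 |
| 218:123 | 324 | 0.49 | 0.66 | 0.56 |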
52
Correlations of Scores on Matched Sets

| Annotators | Set ID | Pearson's w/ Orig | Pearson's w/ Modif |
|---|---|---|---|
| 102:218 | 324 | 0.76 (.54-.89) | 0.83 (.66-.92) |
| 108:120 | 400 | 0.84 (.67-.92) | 0.89 (.77-.95) |
| 109:122 | 407 | 0.92 (.83-.96) | 0.91 (.80-.96) |
| 112:126 | 426 | 0.9 (.78-.95) | 0.95 (.90-.98) |
| 116:124 | 633 | 0.81 (.62-.91) | 0.78 (.57-.90) |
| 121:125 | 695 | 0.91 (.81-.96) | 0.92 (.83-.96) |
| 102:123 | 324 | 0.7 (.44-.85) | 0.73 (.48-.87) |
| 218:123 | 324 | 0.6 (.29-.80) | 0.77 (.55-.89) |
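The ranges in parentheses read like confidence intervals, though the slide does not say how they were obtained. One common construction is the Fisher z-transform interval, sketched below. The sample size of 27 (25 systems plus 2 humans) and the 95% level are assumptions, so the output is only expected to land near the reported ranges, not match them exactly.

```python
from math import atanh, sqrt, tanh


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for Pearson's r via Fisher's z.

    z_crit = 1.96 corresponds to a two-sided 95% interval.
    """
    z = atanh(r)               # transform r to the z scale
    se = 1 / sqrt(n - 3)       # standard error on the z scale
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


# First row of the table: r = 0.76, with an assumed n of 27 scored peers.
low, high = fisher_ci(0.76, 27)
print(f"r = 0.76, interval = ({low:.2f}, {high:.2f})")
```

The interval is asymmetric around r because the transform stretches values close to 1, which matches the shape of the ranges in the table.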