sthomas slides
![Page 1: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/1.jpg)
Modeling the Evolution of Topics in Source Code Histories
Stephen W. Thomas
Bram Adams
Ahmed E. Hassan
Dorothea Blostein
![Page 2: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/2.jpg)
[2]
![Page 3: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/3.jpg)
[3]
[Figure: topic popularity over time for the topics Linux Development and Audio Codecs]
What have the Skype developers been interested in?
Microsoft manager
![Page 4: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/4.jpg)
[4]
What are developers working on?
Option 1: Speak with every developer
Option 2: Use automated tool
[Figure: topic popularity over time for the topics Linux Development and Audio Codecs]
![Page 5: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/5.jpg)
[5]
Tool: Topic Evolution Models
[Figure: topic popularity over time across versions V1.0, V1.1, V1.2, V2.0, V4.0 for Topic “Linux”, Topic “codec”, Topic “GUI”, …]
Applied to Source Code Histories
![Page 6: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/6.jpg)
[6]
Success in Other Domains
Email Archives
Conference Proceedings
Newspaper Articles
![Page 7: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/7.jpg)
[7]
Topic Evolution on Source Code
![Page 8: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/8.jpg)
Background: The Hall Model
[8]
[Figure: pipeline with versions V1.0, V1.1, V1.2, V1.3 and stages labeled Topic Model, Mapping, Topics Over Time]
![Page 9: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/9.jpg)
[9]
[Figure: grid of files (File ID) by version (V1 to V5); each cell holds one of XML + File I/O, XML + GUI, or GUI + File I/O]
Expect:
Topic 1: XML
Topic 2: GUI
Topic 3: File I/O
Get:
Topic 1: XML + File I/O
Topic 2: XML + GUI
Topic 3: GUI + File I/O
Problem: Topics are muddled, not distinct
![Page 10: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/10.jpg)
[10]
[Figure: grid of files (File ID) by version (V1 to V5); each cell holds one of XML + File I/O, XML + GUI, or GUI + File I/O]
Expect: popularity curves for File I/O, XML, and GUI
Get: popularity curves for Topic 1, Topic 2, and Topic 3
Problem: Evolutions not sensitive or accurate
![Page 11: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/11.jpg)
[11]
Problems due to duplication
Topics are muddled, not distinct
Evolutions are not accurate
Found in Source Code Histories
![Page 12: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/12.jpg)
63% files don’t change
84% files don’t change
99.8% words don’t change
99.8% words don’t change
[12]
JHotDraw
Real-World Duplication
![Page 13: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/13.jpg)
The Diff Model
[13]
![Page 14: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/14.jpg)
The Diff Model
[14]
[Figure: pipeline with versions V1.0, V1.1, V1.2, V1.3 and stages labeled Diff, Topic Model, Reconstruction Step, Mapping, Topics Over Time]
![Page 15: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/15.jpg)
Diff Step

Version 5.3.7:
    ...
    if (vacstmt->options & VACOPT_VACUUM)
    {
        PreventTransactionChain(isTopLevel, stmttype);
        in_outer_xact = false;
    }
    ...

Version 5.3.8:
    ...
    // Don’t run VACUUM in user transition block!
    if (vacstmt->options & VACOPT_VACUUM)
    {
        PreventTransactionChain(isTopLevel, stmttype);
        in_inner_xact = false;
    }
    ...

Diff

Deleted lines:
    in_outer_xact = false;

Added lines:
    // Don’t run VACUUM in user transition block!
    in_inner_xact = false;
[15]
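The Diff step on this slide can be sketched in a few lines of Python. This is only an illustration, assuming a line-based diff computed with the standard library's `difflib.SequenceMatcher`; the slides do not say which diff tool was used, and the function name `diff_step` is hypothetical.

```python
import difflib


def diff_step(old_lines, new_lines):
    """Return (deleted, added) lines between two versions of one file.

    Hypothetical helper: a line-based diff is an assumption, not
    necessarily the tooling behind the slides.
    """
    deleted, added = [], []
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.extend(old_lines[i1:i2])   # lines only in the old version
        if tag in ("insert", "replace"):
            added.extend(new_lines[j1:j2])     # lines only in the new version
    return deleted, added


# The two versions shown on the slide, one list entry per source line.
old = [
    "if (vacstmt->options & VACOPT_VACUUM)",
    "{",
    "    PreventTransactionChain(isTopLevel, stmttype);",
    "    in_outer_xact = false;",
    "}",
]
new = [
    "// Don't run VACUUM in user transition block!",
    "if (vacstmt->options & VACOPT_VACUUM)",
    "{",
    "    PreventTransactionChain(isTopLevel, stmttype);",
    "    in_inner_xact = false;",
    "}",
]

deleted, added = diff_step(old, new)
```

Only the deleted and added lines would then be handed to the topic model, so unchanged text is not counted again for every version.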
![Page 16: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/16.jpg)
[16]
Reconstructing Topic Memberships
First Version (1000 lines), from Topic Model: GUI (90%), XML (10%)
Deleted Lines (200 lines), from Topic Model: GUI (100%), XML (0%)
Added Lines (150 lines), from Topic Model: GUI (20%), XML (80%)
Second Version (950 lines) = First Version - Deleted Lines + Added Lines
Infer:
GUI: (1000 * 90%) - (200 * 100%) + (150 * 20%) = 730 lines; 730 / 950 = 77%
XML: (1000 * 10%) - (200 * 0%) + (150 * 80%) = 220 lines; 220 / 950 = 23%
Second Version: GUI (77%), XML (23%)
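The arithmetic on this slide can be written as a small function. This is a minimal sketch that uses only the numbers shown on the slide; the function name `reconstruct_memberships` and the dictionary layout are assumptions made for illustration.

```python
def reconstruct_memberships(prev_lines, prev_topics,
                            deleted_lines, deleted_topics,
                            added_lines, added_topics):
    """Infer per-topic line counts and shares for the next version.

    Each *_topics argument maps a topic name to its membership (0.0 to 1.0)
    as reported by the topic model for that block of lines.
    """
    topics = set(prev_topics) | set(deleted_topics) | set(added_topics)
    counts = {}
    for topic in topics:
        counts[topic] = (prev_lines * prev_topics.get(topic, 0.0)
                         - deleted_lines * deleted_topics.get(topic, 0.0)
                         + added_lines * added_topics.get(topic, 0.0))
    total = sum(counts.values())
    shares = {topic: count / total for topic, count in counts.items()}
    return counts, shares


# Numbers from the slide: 1000-line first version, 200 deleted, 150 added.
counts, shares = reconstruct_memberships(
    1000, {"GUI": 0.90, "XML": 0.10},
    200, {"GUI": 1.00, "XML": 0.00},
    150, {"GUI": 0.20, "XML": 0.80},
)
```

With the slide's inputs this gives about 730 GUI lines and 220 XML lines out of 950, which is the GUI (77%), XML (23%) split shown for the second version.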
![Page 17: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/17.jpg)
Case Studies
[17]
JHotDraw
Drawing Application Framework (Java)
13 releases (5.2.0 – 7.5.1), 613 files, 84K SLOC
PSQL
Database Management System (C)
46 releases (7.0.0 – 8.3.5), 844 files, 501K SLOC
![Page 18: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/18.jpg)
I bet the Diff model discovers topics that are more distinct!
[18]
![Page 19: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/19.jpg)
Measuring Distinctness With KL-Divergence
[19]
[Figure: word-probability histograms over the words xml, fopen, button, element, menu, fclose, attribute]
XML topic vs. GUI topic: High KL divergence, High distinctness
XML + File IO topic vs. XML + GUI topic: Low KL divergence, Low distinctness
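A small numeric sketch of the idea on this slide follows. The word list comes from the slide, but every probability below is invented for illustration, and the function name `kl_divergence` is hypothetical. It uses the standard definition KL(p || q) = sum over words of p(w) * log(p(w) / q(w)).

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two word distributions over the same vocabulary.

    Assumes every probability in q is positive (as with smoothed topics).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


words = ["xml", "fopen", "button", "element", "menu", "fclose", "attribute"]

# Invented toy distributions (each sums to 1), one value per word above.
xml_topic      = [0.40, 0.01, 0.01, 0.28, 0.01, 0.01, 0.28]
gui_topic      = [0.01, 0.01, 0.47, 0.01, 0.48, 0.01, 0.01]
xml_file_topic = [0.25, 0.20, 0.01, 0.17, 0.01, 0.19, 0.17]
xml_gui_topic  = [0.25, 0.01, 0.20, 0.17, 0.19, 0.01, 0.17]

distinct_pair = kl_divergence(xml_topic, gui_topic)
muddled_pair = kl_divergence(xml_file_topic, xml_gui_topic)
print(distinct_pair > muddled_pair)
```

The two clean topics share almost no probability mass, so their divergence is large; the two muddled topics both put weight on the XML words, so their divergence is smaller.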
![Page 20: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/20.jpg)
[20]
Average Topic Distinctness
[Figure: Hall Topics and Diff Topics, each shown as two lists of Topic 1, Topic 2, Topic 3, Topic 4, Topic 5, …, Topic K]
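One plausible way to turn pairwise KL divergences into a single score per model is to average over all ordered pairs of topics. The slides do not spell out the exact averaging, so the definition below is an assumption, and the function names and toy distributions are invented for illustration.

```python
import math
from itertools import permutations


def kl_divergence(p, q):
    """KL(p || q); assumes all probabilities in q are positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def average_distinctness(topics):
    """Mean KL divergence over all ordered pairs of distinct topics.

    Assumed definition: the slides only show that every topic of a model
    is compared with the other topics of the same model.
    """
    pairs = list(permutations(range(len(topics)), 2))
    return sum(kl_divergence(topics[i], topics[j]) for i, j in pairs) / len(pairs)


# Invented toy models over three word groups (XML, GUI, File I/O words).
clean_topics = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
muddled_topics = [[0.45, 0.10, 0.45], [0.45, 0.45, 0.10], [0.10, 0.45, 0.45]]

print(average_distinctness(clean_topics) > average_distinctness(muddled_topics))
```

Computing the same score once for the Hall topics and once for the Diff topics would give the two numbers being compared.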
![Page 21: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/21.jpg)
+32% +38%
Diff makes more distinct topics
[21]
JHotDraw
![Page 22: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/22.jpg)
[22]
Topics are muddled, not distinct
Evolutions are not accurate
Diff makes more distinct topics
Problems due to duplication
Found in Source Code Histories
![Page 23: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/23.jpg)
I bet the Diff model discovers more accurate topic evolutions
[23]
![Page 24: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/24.jpg)
[24]
No oracle dataset
Measuring Accuracy
1. Create simulated scenario by hand (truth known)
2. Manually investigate evolutions in JHotDraw and PSQL (truth learned)
![Page 25: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/25.jpg)
Simulated Project
[Figure: PSQL backend.access copied from each version to the next, v1 through v10]
[25]
![Page 26: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/26.jpg)
Simulated Scenario 1
[Figure: versions v1 through v10; 3 files from PSQL timezone; timezone topic]
[26]
![Page 27: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/27.jpg)
Manual Investigation
[27]
[Figure: evolution of Topic 1]
1. Select change events
2. Validate against project documentation (commit logs, release notes, etc.)
![Page 28: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/28.jpg)
Diff makes more accurate topics
[28]
+25% precision
Simulated Project
+33% precision
JHotDraw
+47% precision
+100% recall
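For reference, precision and recall of detected change events can be computed as below. This is a sketch under assumptions: the slides do not say how a detected event is matched to a validated one, so events are modeled here as hypothetical (topic, version) pairs, and the sample data is invented.

```python
def precision_recall(detected, validated):
    """Precision and recall of detected change events.

    detected:  events reported by a topic evolution model
    validated: events confirmed by the known truth or project documentation
    Exact (topic, version) matching is an assumption made for illustration.
    """
    detected, validated = set(detected), set(validated)
    true_positives = len(detected & validated)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(validated) if validated else 0.0
    return precision, recall


# Invented example: four detected events, four validated events, three shared.
detected = {("timezone", "v3"), ("timezone", "v7"), ("gui", "v5"), ("xml", "v2")}
validated = {("timezone", "v3"), ("timezone", "v7"), ("xml", "v2"), ("xml", "v9")}

precision, recall = precision_recall(detected, validated)
```

In the simulated scenario the validated set is known by construction; in the manual investigation it would come from commit logs and release notes.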
![Page 29: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/29.jpg)
[29]
Topics are muddled, not distinct
Evolutions are not accurate
Diff makes more distinct topics
Diff makes more accurate evolutions
Problems due to duplication
Found in Source Code Histories
![Page 30: Sthomas slides](https://reader035.vdocuments.net/reader035/viewer/2022062503/58edc5991a28abd8508b4595/html5/thumbnails/30.jpg)
[30]
Summary