Risk-Based Attack Surface Approximation: How Much Data is Enough? [ICSE - SEIP 2017]
Risk-Based Attack Surface Approximation:
How Much Data is Enough?
Chris Theisen, Brendan Murphy, Kim Herzig, Laurie Williams
North Carolina State University
Microsoft Research
Introduction
What is the “Attack Surface”? Quoting the Open Web Application
Security Project…
• All paths for data and commands in a software system
• The data that travels these paths
• The code that implements and protects both
Concept used for security effort prioritization.
Introduction | Background | Methodology | Results | Conclusion
Crashes represent activity that puts the system under
stress.
Stack Traces tell us what happened.
foo!foobarDeviceQueueRequest+0x68
foo!fooDeviceSetup+0x72
foo!fooAllDone+0xA8
bar!barDeviceQueueRequest+0xB6
bar!barDeviceSetup+0x08
bar!barAllDone+0xFF
center!processAction+0x1034
center!dontDoAnything+0x1030
Risk-Based Attack Surface Approximation
(RASA)
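A minimal sketch of how frames like the ones above might be turned into an attack surface approximation. It assumes that any binary appearing on a crash stack trace is counted as part of the approximated attack surface, and that frames follow the `module!function+0xOFFSET` layout shown on the slide. The names `parse_frame` and `approximate_attack_surface` are hypothetical and do not come from the talk.

```python
import re
from collections import Counter

# Frame layout assumed from the slide: module!function+0xOFFSET
FRAME_RE = re.compile(
    r"^(?P<module>[^!\s]+)!(?P<function>[^+\s]+)\+0x(?P<offset>[0-9A-Fa-f]+)$"
)


def parse_frame(line):
    """Return (module, function, offset) for one frame, or None if it does not match."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return match.group("module"), match.group("function"), int(match.group("offset"), 16)


def approximate_attack_surface(traces):
    """Count, for each module, the number of traces it appears in at least once."""
    seen = Counter()
    for trace in traces:
        frames = (parse_frame(line) for line in trace)
        modules = {frame[0] for frame in frames if frame is not None}
        seen.update(modules)
    return seen


trace = [
    "foo!foobarDeviceQueueRequest+0x68",
    "foo!fooDeviceSetup+0x72",
    "foo!fooAllDone+0xA8",
    "bar!barDeviceQueueRequest+0xB6",
    "bar!barDeviceSetup+0x08",
    "bar!barAllDone+0xFF",
    "center!processAction+0x1034",
    "center!dontDoAnything+0x1030",
]
surface = approximate_attack_surface([trace])
print(sorted(surface))
```

Counting traces per module rather than frames per module keeps one deep call chain from inflating a module's weight; that choice is an assumption of this sketch.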
Previously…
• Previous RASA study used tens of millions of crashes.
• Previous study was per binary.
[SEIP ‘15] Crashes
% binaries: 48.4%
% vulnerabilities: 94.6%
Great! All done, right?
[SEIP ‘15] Chris Theisen, Kim Herzig, Pat Morrison, Brendan Murphy, and Laurie Williams, “Approximating Attack Surfaces with Stack Traces”, in Companion Proceedings of the 37th International Conference on Software Engineering (2015).
Practitioner Problems
• Previous RASA study used tens of millions of crashes.
• Previous study was per binary.
• Practitioners had some issues with it…
– “Binary prioritization isn’t actionable.”
– “We don’t have that much data!”
– “We don’t store every crash we received, we don’t
see the value in that.”
– “We don’t have historical vulnerabilities to use as a
goodness measure.”
Research Questions
• RQ1: Can the RASA approach be implemented at the
source code file level with actionable results?
• RQ2: How does random sampling of crash dump stack
traces affect RASA?
Data Sources
• Mozilla Firefox
– ~1M crashes
– Vulnerability data from Mozilla Security
Blog and bug tracker
• Windows 8.1
– ~9M crashes
– Vulnerability data from internal data
sources
Methodology - RASA
Methodology - Sampling
10% of… 20% of…
• Sample at each “level”
• Record the standard deviation of files and vulnerabilities covered
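The sampling step can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: each crash is modelled as the set of source files on its stack trace, the 10% and 20% levels come from the slide, and the number of repeated runs per level (`runs`), the fixed seed, and the function names are all hypothetical.

```python
import random
import statistics


def coverage(crashes, vulnerable_files, total_files):
    """Return (% of all files on the surface, % of vulnerable files on the surface)."""
    surface = set().union(*crashes)
    file_pct = 100.0 * len(surface) / total_files
    vuln_pct = 100.0 * len(surface & vulnerable_files) / len(vulnerable_files)
    return file_pct, vuln_pct


def sample_levels(crashes, vulnerable_files, total_files,
                  levels=(0.1, 0.2), runs=10, seed=0):
    """Draw `runs` random samples at each level and summarise the coverage."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    results = {}
    for level in levels:
        size = max(1, round(level * len(crashes)))
        per_run = [
            coverage(rng.sample(crashes, size), vulnerable_files, total_files)
            for _ in range(runs)
        ]
        files, vulns = zip(*per_run)
        results[level] = {
            "files_mean": statistics.mean(files),
            "files_stdev": statistics.stdev(files),
            "vulns_mean": statistics.mean(vulns),
            "vulns_stdev": statistics.stdev(vulns),
        }
    return results


# Toy data: three crashes, ten files in the system, two known-vulnerable files.
toy_crashes = [{"a.cpp", "b.cpp"}, {"a.cpp"}, {"c.cpp"}]
toy_vulnerable = {"a.cpp", "d.cpp"}
full = coverage(toy_crashes, toy_vulnerable, total_files=10)
summary = sample_levels(toy_crashes, toy_vulnerable, total_files=10, levels=(1.0,))
print(full, summary)
```

A small standard deviation at a given level would suggest that any one random sample of that size gives much the same file and vulnerability coverage as another.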
[Figure: Files and Vulnerabilities covered vs. Random Sample Size; axis ranges 12% to 17% and 70% to 75%]
[Figure: Files and Vulnerabilities covered vs. Random Sample Size; axis ranges 10% to 26% and 30% to 46%]
Why Does Sampling Work?
• Crashes tend not to happen in isolation.
– If something crashes once, it will likely crash again.
• For Firefox, only 6 files in the data set with a vulnerability
had only one crash occurrence.
– Against ~300 vulnerable files, 50,000 total files
• If foo.cpp crashes many times, random sampling is unlikely
to remove every occurrence of foo.cpp from the data set.
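The intuition in the last bullet can be checked with a short calculation. The sketch below assumes crashes are sampled uniformly without replacement and that each occurrence of a file sits in a different crash; `prob_file_missed` is a hypothetical helper name. It returns the chance that none of a file's crashes survive the sample.

```python
def prob_file_missed(total_crashes, sample_size, occurrences):
    """Probability that a sample of `sample_size` crashes, drawn without
    replacement from `total_crashes`, contains none of the `occurrences`
    crashes that involve a given file."""
    if occurrences > total_crashes - sample_size:
        return 0.0  # too few unsampled crashes to hide every occurrence
    probability = 1.0
    for i in range(occurrences):
        # Chance that the i-th occurrence is also left out of the sample.
        probability *= (total_crashes - sample_size - i) / (total_crashes - i)
    return probability


# Illustrative only: a 10% sample of one million crashes.
for k in (1, 10, 50):
    print(k, prob_file_missed(1_000_000, 100_000, k))
```

For a small sample from a large data set this is close to (1 - p)^k, where p is the sampling fraction and k the number of occurrences, so the miss probability falls off geometrically as a file crashes more often.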
Future Work
• We have a list of vulnerable files; now what?
– Further prioritization to assist developers.
• We’re looking at:
– How the attack surface changes over time.
– How the complexity of the attack surface predicts
vulnerabilities.
– How proximity to the boundary of a software
system predicts vulnerabilities.
Conclusions
• “Binary prioritization isn’t actionable.”
– RASA can prioritize security effort effectively at the
source code file level.
• “We don’t have that much data!”
– Orders of magnitude less data required compared
to previous studies.
• “We don’t store every crash we received, we don’t see
the value in that.”
– A naïve approach like random sampling still works.
• “We don’t have historical vulnerabilities to use as a
goodness measure.”
– Satisfied previous complaints with less data, naïve
sampling; evidence it will work on new systems.
@theisencr
theisencr.github.io
Expected Graduation: May 2018
Data Science, Security Analytics, Security Education