SATE 2010 Background
Vadim Okun, NIST
[email protected]
October 1, 2010
The SAMATE Project
http://samate.nist.gov/
Cautions on Using SATE Data
• Our analysis procedure has limitations
• In practice, users write special rules, suppress false positives, and write code in certain ways to minimize tool warnings
• There are many other factors that we did not consider: user interface, integration, etc.
• So do NOT use our analysis to rate/choose tools
Overview
[Diagram. Labels: Program; Tool A, Tool B, Tool C (tools that work on source code); warnings (Buf, Leak, Race, …); human findings; CVE entries; warning judgments (Security? Quality? Insignificant? False?)]
SATE 2010 timeline
• Choose test programs (2 C, 1 C++, 2 Java). Provide them to tool makers (28 June)
• Teams run their tools, return reports (30 July)
• Analyze the tool reports (22 Sept)
• Report at the workshop (1 Oct)
• Teams submit a research paper (Dec)
• Publish data (between Feb and May 2011)
Participating teams
• Armorize CodeSecure
• Concordia University Marfcat
• Coverity Static Analysis for C/C++
• Cppcheck
• Grammatech CodeSonar
• LDRA Testbed
• Red Lizard Software Goanna
• Seoul National University Sparrow
• SofCheck Inspector
• Veracode
– a service company
Test cases
• Dovecot: secure IMAP and POP3 server – C
• Pebble: weblog server – Java
• Wireshark – C
• Google Chrome – C++
• Apache Tomcat – Java
• Wireshark, Chrome, and Tomcat: CVE-based pairs (vulnerable and fixed)
• All are open source programs
• All have aspects relevant to security
• From 30k LoC (Pebble) to 4.7M LoC (Chrome)
Tool reports
• Teams converted reports from original tool formats (XML, HTML, DB, …) to SATE format
– Several teams also provided original reports
• Described environment in which they ran tool
• Some teams tuned their tools
• Several teams provided analysis of their tool warnings
Analysis procedure
[Diagram. Tool warnings (~60K warnings) → 3 selection methods (select randomly; related to human findings; related to CVEs) → selected warnings → analyze for correctness and associate → analyze the data]
Method 1 – Warning Selection
For Dovecot and Pebble only
• We assigned severity if a tool did not
• Mostly avoid warnings with severity 5 (lowest)
• Statistically select from each warning class
• Select more warnings from higher severities
• Select 30 warnings from each of 10 tool reports
– 1 report had only 6 warnings
– Did not analyze Marfcat warnings
• Total is 276
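The total follows from nine reports contributing 30 warnings each plus one report with only 6: 9 × 30 + 6 = 276. Below is a minimal Python sketch of a severity-weighted selection that reproduces this count. The weights, the function name `select_warnings`, and the sample reports are assumptions for illustration only; the slide does not give the actual selection procedure, and per-class stratification is omitted here.

```python
import random

# Hypothetical weights (not from SATE): severity 1 is the highest and is
# favored; severity 5 (lowest) is mostly avoided through a small weight.
SEVERITY_WEIGHTS = {1: 5, 2: 4, 3: 3, 4: 2, 5: 0.2}


def select_warnings(warnings, quota=30, seed=0):
    """Pick up to `quota` warnings from one tool report, without replacement.

    `warnings` is a list of (warning_id, severity) tuples.
    """
    rng = random.Random(seed)
    pool = list(warnings)
    selected = []
    while pool and len(selected) < quota:
        weights = [SEVERITY_WEIGHTS[severity] for _, severity in pool]
        index = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        selected.append(pool.pop(index))
    return selected


# Made-up data: nine reports with 100 warnings each and one with only 6.
reports = [[(f"r{r}-w{i}", 1 + i % 5) for i in range(100)] for r in range(9)]
reports.append([(f"r9-w{i}", 1 + i % 5) for i in range(6)])

total = sum(len(select_warnings(report)) for report in reports)
print(total)  # 276
```

A report smaller than the quota is simply exhausted, which is how the one 6-warning report lowers the total from 300 to 276.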
Method 2 – Human findings
For Dovecot and Pebble only
• Security experts analyze test cases
• A small number of findings
– Root cause, with an example trace
• Find related warnings from tools
• Goal: focus our analysis on weaknesses found most important by security experts
Method 3 – CVEs
For Wireshark, Chrome, and Tomcat
• Identify the CVEs
– Locations in code
• Find related warnings from tools
• Goal: focus our analysis on real-life exploitable vulnerabilities
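One way to picture "find related warnings" is as a location match between the code locations identified for a CVE and the locations in each tool warning's trace. The sketch below is only an illustration: the `Location` type, the `line_slack` tolerance, and the sample paths are hypothetical, and the slide does not say how the matching was actually performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    path: str
    line: int


def related_warnings(cve_locations, warnings, line_slack=0):
    """Return ids of warnings whose trace touches any CVE location.

    `warnings` maps a warning id to the list of Locations in its trace.
    A location matches when the path is equal and the line numbers differ
    by at most `line_slack`.
    """
    related = []
    for warning_id, trace in warnings.items():
        if any(
            loc.path == cve.path and abs(loc.line - cve.line) <= line_slack
            for loc in trace
            for cve in cve_locations
        ):
            related.append(warning_id)
    return related


# Made-up example data.
cve_locations = [Location("src/parser.c", 120)]
warnings = {
    "A-1": [Location("src/parser.c", 118), Location("src/parser.c", 120)],
    "B-7": [Location("src/util.c", 120)],
    "C-3": [Location("src/parser.c", 125)],
}

print(related_warnings(cve_locations, warnings))                # ['A-1']
print(related_warnings(cve_locations, warnings, line_slack=5))  # ['A-1', 'C-3']
```

An exact-line match is strict; a small slack catches warnings reported a few lines away from the CVE location, at the cost of more candidates to review by hand.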
Correctness categories
• True security weakness
• True quality weakness
• True but insignificant weakness
• Weakness status unknown
• Not a weakness
Differences from SATE 2009
• Add CVE-selected test cases
• Include a C++ test case
• Larger test cases: Chrome - 4.7 MLOC
• More correctness categories (true quality)
• More detailed guidelines for analysis
• Still, much can be improved…
Thanks
• Romain Gaucher, Ram Sugasi
• Aurelien Delaitre, Sue Wang, Paul Black, Charline Cleraux, and other SAMATE team members
• Most of all, the participating teams!
Questions
• What weaknesses exist in real programs?
• What do tools report for real programs?
• Do tools find important weaknesses?
• Focus on tools that work on source code
• Defects that may affect security
• Goal is NOT to choose the “best” tools
• This is the 3rd SATE (1st in 2008)
SATE goals
• To enable empirical research based on large test sets
• To encourage improvement and adoption of tools
• NOT to choose the “best” tools
SATE common tool output format
<weakness id="23">
  <name cweid="89">SQL Injection</name>  <!-- cweid is optional -->
  <location id="1" path="dir/f.c" line="71"/>  <!-- one or more traces -->
  …
  <grade severity="2" probability="0.5"/>  <!-- severity: 1 to 5, with 1 the highest; probability that it is true -->
  <output> Query is constructed with user supplied input … </output>
  …  <!-- and other annotation -->
</weakness>
Lessons learned
• Guidelines for analysis often ambiguous – need to be refined even more
• Our analysis has inconsistencies and lapses
• Careful analysis takes longer than expected
– We do not know the code well
• Tool interface is important to understand a weakness
Analysis procedure
• We cannot know all weaknesses in the test cases
• Impractical to analyze all tool warnings
So analyze the following:
• Method 1. A subset of warnings from each tool report
• Method 2. Tool warnings related to manually identified weaknesses
SATE tool output format
• Common format in XML
• For each weakness
– One or more traces: locations (line number and pathname)
– Name of weakness and (optional) CWE id
– Severity: 1 to 5 (ordinal scale), with 1 the highest
– Probability that the problem is a true positive
– Original message from the tool
– And other annotation
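As a rough illustration of these fields, the Python sketch below parses a single weakness record with the standard-library `xml.etree.ElementTree` module. The element and attribute names (`weakness`, `name`, `cweid`, `location`, `grade`, `output`) are assumed from the abbreviated sample in this deck; the full SATE schema may nest or name things differently, and `parse_weakness` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Sample record with illustrative values; the layout is an assumption.
SAMPLE = """
<weakness id="23">
  <name cweid="89">SQL Injection</name>
  <location id="1" path="dir/f.c" line="71"/>
  <grade severity="2" probability="0.5"/>
  <output>Query is constructed with user supplied input</output>
</weakness>
"""


def parse_weakness(xml_text):
    """Turn one <weakness> element into a plain dictionary."""
    elem = ET.fromstring(xml_text.strip())
    name = elem.find("name")
    grade = elem.find("grade")
    return {
        "id": elem.get("id"),
        "name": name.text,
        "cweid": name.get("cweid"),  # optional: None when the attribute is absent
        "locations": [
            (loc.get("path"), int(loc.get("line")))
            for loc in elem.findall("location")
        ],
        "severity": int(grade.get("severity")),          # 1 (highest) to 5
        "probability": float(grade.get("probability")),  # chance of a true positive
        "output": (elem.findtext("output") or "").strip(),
    }


weakness = parse_weakness(SAMPLE)
print(weakness["name"], weakness["severity"], weakness["locations"])
```

Keeping `cweid` as an optional value mirrors the format, where a tool may report a weakness name without a CWE id.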
Our analysis
• Correctness of warning
• Associate warnings that refer to the same (or similar) weakness
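The two analysis steps can be sketched as simple bookkeeping: each analyzed warning gets one of the five correctness categories, and warnings that point at the same place are grouped together. The sketch below is hypothetical throughout: the sample records are made up, and grouping by identical (path, line) is a simplifying assumption, since the slide also allows association of merely similar weaknesses, which requires human judgment.

```python
from collections import Counter, defaultdict
from enum import Enum


class Correctness(Enum):
    TRUE_SECURITY = "true security weakness"
    TRUE_QUALITY = "true quality weakness"
    TRUE_INSIGNIFICANT = "true but insignificant weakness"
    UNKNOWN = "weakness status unknown"
    NOT_A_WEAKNESS = "not a weakness"


# (tool, warning id, path, line, analyst judgment): made-up sample data.
analyzed = [
    ("Tool A", "a1", "dir/f.c", 71, Correctness.TRUE_SECURITY),
    ("Tool B", "b9", "dir/f.c", 71, Correctness.TRUE_SECURITY),
    ("Tool C", "c4", "dir/g.c", 12, Correctness.NOT_A_WEAKNESS),
]

# Step 1: count warnings per correctness category.
tally = Counter(judgment for *_, judgment in analyzed)

# Step 2: associate warnings that report the same location.
associated = defaultdict(list)
for tool, warning_id, path, line, _ in analyzed:
    associated[(path, line)].append((tool, warning_id))

print(tally[Correctness.TRUE_SECURITY])  # 2
print(associated[("dir/f.c", 71)])       # [('Tool A', 'a1'), ('Tool B', 'b9')]
```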