sate 2010 background vadim okun, nist [email protected] october 1, 2010 the samate project

SATE 2010 Background

Vadim Okun, [email protected]

October 1, 2010

The SAMATE Projecthttp://samate.nist.gov/

2

Cautions on Using SATE Data• Our analysis procedure has limitations• In practice, users write special rules, suppress

false positives, and write code in certain ways to minimize tool warnings.

• There are many other factors that we did not consider: user interface, integration, etc.

• So do NOT use our analysis to rate/choose tools

Overview

3

X

X

X

X Human findings

CVE entriesProgram

BufLeakRace…

Security?Quality?Insignificant?False??

Tool ATool BTool C

X

Tools that work on source code

4

SATE 2010 timeline

• Choose test programs (2 C, 1 C++, 2 Java). Provide them to tool makers (28 June)

• Teams run their tools, return reports (30 July)• Analyze the tool reports (22 Sept)• Report at the workshop (1 Oct)• Teams submit a research paper (Dec)• Publish data (between Feb and May 2011)

5

Participating teams• Armorize CodeSecure• Concordia University Marfcat• Coverity Static Analysis for C/C++• Cppcheck• Grammatech CodeSonar• LDRA Testbed• Red Lizard Software Goanna• Seoul National University Sparrow • SofCheck Inspector• Veracode

– a service company

6

Test cases

• Dovecot: secure IMAP and POP3 server – C• Pebble: weblog server – Java• Wireshark - C• Google Chrome – C++• Apache Tomcat - Java

• All are open source programs• All have aspects relevant to security• From 30k LoC (Pebble) to 4.7M LoC (Chrome)

CVE-based pairs(vulnerable and fixed)

7

Tool reports

XML HTML DB

Original tool formats• Teams converted reports to

SATE format– Several teams also provided

original reports

• Described environment in which they ran tool

• Some teams tuned their tools• Several teams provided

analysis of their tool warnings

…

SATEformat

8

Analysis procedure

Tool warnings~60K warnings

Selectrandomly

Relatedto humanfindings

3 Selection Methods

Analyze forcorrectness

and associate

Analyzethe data

Selectedwarnings

Relatedto CVEs

9

Method 1 – Warning SelectionFor Dovecot and Pebble only

• We assigned severity if a tool did not

• Mostly avoid warnings with severity 5 (lowest)

• Statistically select from each warning class

• Select more warnings from higher severities

• Select 30 warnings from each of 10 tool reports– 1 report had only 6 warnings– Did not analyze Marfcat warnings

• Total is 276

10

Method 2 – Human findingsFor Dovecot and Pebble only

• Security experts analyze test cases

• A small number of findings– Root cause, with an example trace

• Find related warnings from tools

• Goal: focus our analysis on weaknesses found most important by security experts

11

Method 3 - CVEsFor Wireshark, Chrome, and Tomcat

• Identify the CVEs– Locations in code

• Find related warnings from tools

• Goal: focus our analysis on real-life exploitable vulnerabilities

12

Correctness categories

• True security weakness• True quality weakness• True but insignificant weakness• Weakness status unknown• Not a weakness

13

Differences from SATE 2009

• Add CVE-selected test cases

• Include a C++ test case

• Larger test cases: Chrome - 4.7 MLOC

• More correctness categories (true quality)

• More detailed guidelines for analysis

• Still, much can be improved…

14

Thanks

• Romain Gaucher, Ram Sugasi

• Aurelien Delaitre, Sue Wang, Paul Black, Charline Cleraux, and other SAMATE team members

• Most of all, the participating teams!

16

Questions

• What weaknesses exist in real programs?• What do tools report for real programs?• Do tools find important weaknesses?

• Focus on tools that work on source code• Defects that may affect security• Goal is NOT too choose the “best” tools• This is the 3rd SATE (1st in 2008)

17

SATE goals

• To enable empirical research based on large test sets • To encourage improvement and adoption of tools

• NOT to choose the “best” tools

18

SATE common tool output format

<weakness id=“23”>

<name cweid=“79”>SQL Injection</name>

<location id=“1” path=“dir/f.c” line=“71”/>

…

<grade severity=“2” probability=“0.5”/>

<output> Query is constructed

with user supplied input … </output>

…

</weakness>

one or moretraces

1 to 5, with 1 – the highest

optional

that it is true

…and other annotation

19

Lessons learned

• Guidelines for analysis often ambiguous – need to be refined even more

• Our analysis has inconsistencies and lapses

• Careful analysis takes longer than expected– We do not know the code well

• Tool interface is important to understand a weakness

20

Analysis procedure

• We cannot know all weaknesses in the test cases• Impractical to analyze all tool warnings

So analyze the following:

• Method 1. A subset of warnings from each tool report

• Method 2. Tool warnings related to manually identified weaknesses

21

SATE tool output format

• Common format in XML• For each weakness

– One or more trace - locations - line number and pathname

– Name of weakness and (optional) CWE id

– Severity: 1 to 5 (ordinal scale), with 1 – the highest

– Probability that the problem is true positive

– Original message from the tool

– And other annotation

22

Our analysis

• Correctness of warning

• Associate warnings that refer to the same (or similar) weakness

sate 2010 background vadim okun, nist [email protected] october 1, 2010 the samate project

Documents

tools slide

weakness slide

cves slide

related warnings

tool b tool c x tools

fixed slide

improved slide

tool makers