Bringing Science to Digital Forensics with Standardized Forensic Corpora.
Digital Evaluation and Exploitation (DEEP) Group
http://domex.nps.edu/
February 2010
NPS is the Navy's Research University.
Location: Monterey, CA
Campus Size: 627 acres
Students: 1500
§ US Military (All 5 services)
§ US Civilian (Scholarship for Service & SMART)
§ Foreign Military (30 countries)
§ All students are fully funded
Schools:
§ Business & Public Policy
§ Engineering & Applied Sciences
§ Operational & Information Sciences
§ International Graduate Studies
Digital Forensics is at a turning point.
Yesterday's work was primarily reverse engineering.
Key technical challenges:
§ Evidence preservation.
§ File recovery (file system support); undeleting files.
§ Encryption cracking.
§ Keyword search.
Digital Forensics is at a turning point.
Today's work is increasingly scientific.
Evidence Reconstruction
§ Files (fragment recovery carving)
§ Timelines (visualization)
Clustering and data mining
Social network analysis
Sense-making
Drives #74 & #77: 25 CCNs in common (same community college)
Drives #171 & #172: 13 CCNs in common (same medical center)
Drives #179 & #206: 13 CCNs in common (same car dealership)
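The drive pairings above can be read as a set-intersection problem: for each pair of drives, count the identifiers they have in common. The Python sketch below illustrates that idea only. It assumes "CCNs" means credit card numbers already extracted into one set per drive; the function name `shared_features` and the sample identifiers are hypothetical and are not taken from the talk.

```python
from itertools import combinations

def shared_features(drive_features):
    """Map each pair of drives to the set of features they share.

    drive_features: dict of drive label -> set of extracted identifiers.
    Pairs with nothing in common are omitted.
    """
    shared = {}
    for a, b in combinations(sorted(drive_features), 2):
        overlap = drive_features[a] & drive_features[b]
        if overlap:
            shared[(a, b)] = overlap
    return shared

# Made-up identifiers standing in for extracted numbers (illustrative only).
drives = {
    "drive-A": {"id-001", "id-002", "id-003"},
    "drive-B": {"id-002", "id-003", "id-004"},
    "drive-C": {"id-999"},
}

pairs = shared_features(drives)
for (a, b), overlap in pairs.items():
    print(f"{a} & {b}: {len(overlap)} in common")
```

Pairs that share many identifiers are candidates for a common origin, which is the kind of link the slide labels "same community college" and so on.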
Science requires the scientific process.
Hallmarks of Science:
§ Controlled and repeatable experiments.
§ No privileged observers.
§ Publication of data and results.
§ Sharing of scientific materials.
Today's Digital Forensics is not Scientific!
§ Researchers work on their own data
  - Data can't be shared with other researchers (privacy)
  - Data can't be published (copyright)
§ Results can't be meaningfully compared.
Our solution: Standardized Corpora for Digital Forensics Research.
"Standardized"
§ Known contents
§ Documented provenance
"Corpora"
§ Many data sets
§ Realistic: lifelike, but no Personally Identifiable Information (PII)
§ Real: Public and Private
"Digital Forensics Research"
§ Created to enable research
§ Legally obtained (cf. wiretap law)
§ Publishable results
§ Specific attention to privacy and copyright issues
UNCLASSIFIED
Many different kinds of forensic corpora are needed.
Test Data
§ Constructed for the purpose of testing a specific feature.
§ Example: the CFReDS “Russian Tea Room floppy disk image,” used to validate Unicode search & display.
Sampled Data
§ A subset of a large data source, e.g., sampled web pages or packets.
§ Hard to randomly sample.
Realistic Data
§ Not “real”: made in a lab, not in the field.
Real and Restricted Data
§ Created by actual human beings during activities that were not performed for the purpose of creating forensic data.
§ Controlled for privacy reasons.
Real but Unrestricted
§ Released for some reason, e.g., the Enron Email Dataset.
§ Photos on Flickr; user profiles on Facebook.
http://domex.nps.edu/corp/files/govdocs1: 1 Million files available now
1 million(*) documents from US Government web servers
§ Specifically for file identification, data & metadata extraction.
§ Found by random word searches on Google & Yahoo
§ DOC, DOCX, HTML, ASCII, SWF, etc.
Free to use; Free to redistribute
§ No copyright issues: US Government work is not copyrightable.
§ Other files have simply been moved from one USG webserver to another.
§ No PII issues: these files were already released.
Distribution format: ZIP files
§ 1000 ZIP files with 1000 files each.
§ 10 “threads” of 1000 randomly chosen files for student projects.
§ Full provenance for every file (how found; when downloaded; SHA1; etc.)
(*) Approximately 3000 files redacted after release.
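One way a “thread” of 1000 randomly chosen files could be drawn is sketched below in Python. This is an illustration, not the procedure used to build the corpus. It assumes files are identified by the integers 0 to 999,999, that threads do not overlap, and that a fixed seed is wanted so the selection can be reproduced; the name `make_threads` is hypothetical.

```python
import random

def make_threads(n_files=1_000_000, n_threads=10, thread_size=1000, seed=0):
    """Draw n_threads disjoint random samples of thread_size file IDs each."""
    rng = random.Random(seed)  # fixed seed: the same threads every run
    # Sampling without replacement guarantees no file appears in two threads.
    picked = rng.sample(range(n_files), n_threads * thread_size)
    return [
        sorted(picked[i * thread_size:(i + 1) * thread_size])
        for i in range(n_threads)
    ]

threads = make_threads()
print(len(threads), "threads of", len(threads[0]), "files each")
```

Fixing the seed matters for student projects: two groups given the same thread number work on exactly the same files, so their results can be compared.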
"Test" and "Realistic" disk images
http://domex.nps.edu/corp/images/nps/
Test Images: designed to demonstrate a particular aspect
§ nps-2009-hfstest1 (HFS+)
§ nps-2009-ntfs1 (NTFS)
Realistic Images: like real life, but no personally identifiable info.
§ nps-2009-canon2 (FAT32)
§ nps-2009-UBNIST1 (FAT32)
§ nps-2009-casper-rw (embedded EXT3)
§ nps-2009-domexusers (NTFS)
Each image has:
§ Narrative of how the image was created and expected uses.
§ Image file in RAW/SPLITRAW, AFF and E01 formats
§ SHA1 of raw image
§ “Ground truth” report
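Because each image ships with the SHA1 of its raw form, a downloaded copy can be checked before use. The Python sketch below shows one way to do that with the standard `hashlib` module, reading in chunks so a multi-gigabyte image never has to fit in memory. The helper names `sha1_of_file` and `matches_published` are hypothetical, and the demo hashes a three-byte stand-in file rather than a real image.

```python
import hashlib
import os
import tempfile

def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path, published_sha1):
    """Compare a file's SHA-1 with a published digest, ignoring case and spaces."""
    return sha1_of_file(path) == published_sha1.strip().lower()

# Demo on a tiny stand-in file; a real raw image would be far larger.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_digest = sha1_of_file(tmp.name)
demo_ok = matches_published(tmp.name, hashlib.sha1(b"abc").hexdigest().upper())
os.remove(tmp.name)
print(demo_digest, demo_ok)
```

In practice the second argument would be the digest distributed alongside the image, and a mismatch would indicate a corrupted or altered download.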
Complete Scenarios
http://domex.nps.edu/corp/scenarios/
Typical scenarios include:
§ Distribution of simulated pornography ("kitty porn").
§ Theft of corporate data.
Nitroba University:
§ University harassment case
m57 theft
§ Theft of corporate data
m57 patents
§ 3-week simulation of a small business
§ Four computers
§ Daily disk and memory images
§ Complete Network Packet Capture
The Real Data Corpus: "Real Data from Real People."
Most forensic work is based on “realistic” data created in a lab.
We get real data from CN, IN, IL, MX, and other countries.
Real data provides:
§ Real-world experience with data management problems.
§ Unpredictable OS, software, & content
§ Unanticipated faults
We have multiple corpora:
§ Non-US Persons Corpus
§ US Persons Corpus (@Harvard)
§ Releasable Real Corpus
§ Realistic Corpus
IRB approval required for federally funded research.
Real Data Corpus: Current Status
Country HDs Flash Optical GB (uncomp)
BA 7 38
CA 73 1 1,064
CE 1 82
CH 2 5
CN 143 568 98 3,627
DE 36 1 755
GR 13 27
IL 229 4 2,226
IN 317 66 19,540
MX 175 1,110
NZ 1 4
PS 98 957
TH 1 13
UA 22 55
Total 1,118 643 98 30,008
RDC has been provided to a range of researchers.
Received and satisfied data sharing requests for Real Data:
§ CMU Software Engineering Institute
§ AccessData
§ I.D.E.A.L. Technology
Pending Agreements:
§ University of Texas San Antonio
§ University of California, Santa Cruz
§ Georgetown University
Data sharing for use in training:
§ West Point
§ DC3/DCCI
§ CMU Computer Science Department
Conclusion: Digital forensics needs digital corpora!
The National Research Council's 2009 report found a lack of “science” in forensics...
§ “Substantive information and testimony based on faulty forensic science analysis may have contributed to wrongful convictions of innocent people...”
§ “Moreover, imprecise or exaggerated expert testimony has sometimes contributed to the admission of erroneous or misleading evidence.”
(National Research Council, 2009)
Contact Information:
§ http://domex.nps.edu/deep
§ Joshua B. Gross
§ Simson L. Garfinkel
Questions?
PREPUBLICATION COPY
STRENGTHENING FORENSIC SCIENCE IN
THE UNITED STATES:
A PATH FORWARD
Committee on Identifying the Needs of the Forensic Science Community
Committee on Science, Technology, and Law
Policy and Global Affairs
Committee on Applied and Theoretical Statistics
Division on Engineering and Physical Sciences
THE NATIONAL ACADEMIES PRESS
Washington, D.C.
www.nap.edu
February 2009
This prepublication version has been provided to the public to facilitate timely access to the committee's report. Although the substance of the report is final, editorial changes will be made throughout the text, and the citations will be checked before publication.