HLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian Carrier


Upload: basis-technology

Post on 05-Dec-2014


DESCRIPTION

When digital forensics investigators come across multilingual documents during an examination, how do they quickly check the content without a translator in the room? Basis Technology has built a document triage solution that combines entity finding and translation with navigation aids to help the examiner quickly judge each document's priority. The solution is a module for Autopsy, an open source digital forensics platform with thousands of users and contributors.

TRANSCRIPT

Page 1: HLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian Carrier

Triaging Foreign Language Documents for MEDEX

Brian Carrier

VP Digital Forensics

Basis Technology

Page 2

Scenarios / Problem Statement

1. Media triage is performed in the field. Triage reveals dozens of non-English documents. The translator is busy talking with the suspect.

2. Medium-dive analysis is performed at a base. Even more documents are found. Limited translators are available.

How does the examiner or operator prioritize the documents for the translator?

Page 3

Ideal Solution: Translated Gist

▪ A several-page non-English document turns into an English executive summary.

▪ Allow user to understand who, what, and where are mentioned.

▪ No one provides that solution today.

Page 4

Our Proposed 70% Solution

▪ Show human generated gists when they are known.

▪ Use Rosette Named Entity software to find names of people, places, and organizations:

– Who and where

▪ Use name matching software to identify people on watch lists.

▪ Use dictionaries to find concepts (financial, drugs, IED).

– What

▪ Use graphical techniques to show relationships and context.
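The watch-list step above can be illustrated with a rough sketch. It does not use Rosette's name matching software; it substitutes a plain string-similarity ratio from Python's standard library. The watch list, the threshold, and the function names are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical watch list; real lists and matching rules would differ.
WATCH_LIST = ["Mohammed al-Rashid", "Viktor Petrov"]


def normalize_name(name: str) -> str:
    """Lowercase and turn punctuation into spaces so variants compare more evenly."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def watch_list_hits(candidate: str, watch_list=WATCH_LIST, threshold: float = 0.8):
    """Return (watch_list_name, score) pairs whose similarity meets the threshold.

    Assumption: a character-level similarity ratio stands in for a real
    cross-lingual name matcher.
    """
    hits = []
    for entry in watch_list:
        score = SequenceMatcher(
            None, normalize_name(candidate), normalize_name(entry)
        ).ratio()
        if score >= threshold:
            hits.append((entry, round(score, 2)))
    # Best matches first.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Calling `watch_list_hits("Muhammad Al Rashid")` would surface the first watch-list entry despite the spelling difference, while an unrelated name returns an empty list.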

Page 5

Names

▪ Rosette® Entity Extractor:

– Uses statistical models, regular expressions, and gazetteers to find names.

– Works on 17 languages.

▪ Rosette® Name Translator:

– Translates names from native language to English.

– Uses linguistic algorithms, dictionaries, and statistical inference.
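To make the regular-expression and gazetteer parts of name finding concrete, here is a toy sketch. It is not the Rosette Entity Extractor and omits the statistical models entirely; the gazetteer entries, the honorific pattern, and `find_entities` are made up for illustration.

```python
import re

# Hypothetical gazetteer: known surface forms mapped to an entity type.
GAZETTEER = {
    "Kabul": "LOCATION",
    "Red Crescent": "ORGANIZATION",
}

# Hypothetical pattern: an honorific followed by one or more capitalized words.
PERSON_PATTERN = re.compile(
    r"\b(?:Mr|Mrs|Ms|Dr)\.\s+[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*"
)


def find_entities(text: str):
    """Return (offset, surface_text, label) tuples sorted by offset."""
    entities = []
    # Pattern-based candidates.
    for match in PERSON_PATTERN.finditer(text):
        entities.append((match.start(), match.group(), "PERSON"))
    # Gazetteer lookups, matched on word boundaries.
    for surface, label in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            entities.append((match.start(), match.group(), label))
    return sorted(entities)
```

A production extractor would also need language-specific tokenization and a statistical model to catch names that appear in neither the patterns nor the gazetteer.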

Page 6

Concept Dictionary

▪ User-generated dictionary based on the concepts that are important to the user.

▪ Contains both native-language and English words.

▪ Text in documents is normalized using Rosette Base Linguistics.

▪ Concepts are identified in either the native language or English.
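A concept dictionary of this kind can be sketched in a few lines. The normalization below (Unicode NFKC plus case folding) is only a stand-in assumption for Rosette Base Linguistics, which is not modeled here, and the concepts and terms are invented examples.

```python
import re
import unicodedata

# Hypothetical user dictionary: each concept lists native and English terms.
CONCEPTS = {
    "financial": {"money", "bank", "dinero", "banco"},
    "drugs": {"cocaine", "cocaína"},
}


def normalize(token: str) -> str:
    """Stand-in normalization: Unicode NFKC plus case folding (no lemmatization)."""
    return unicodedata.normalize("NFKC", token).casefold()


# Reverse index from normalized term to concept name.
TERM_TO_CONCEPT = {
    normalize(term): concept
    for concept, terms in CONCEPTS.items()
    for term in terms
}


def find_concepts(text: str) -> dict:
    """Map each concept found in the text to the tokens that triggered it."""
    hits = {}
    # Normalize first so composed and decomposed accents tokenize the same way.
    for token in re.findall(r"\w+", unicodedata.normalize("NFKC", text)):
        concept = TERM_TO_CONCEPT.get(normalize(token))
        if concept:
            hits.setdefault(concept, []).append(token)
    return hits
```

Because this sketch does no stemming or lemmatization, inflected forms such as plurals would be missed; that is the gap a real linguistics layer is meant to close.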

Page 7

Navigation Techniques

▪ Goals:

– Provide summary of names and concepts.

– Provide context to know what was mentioned nearby.

▪ Finding the approach that works best is still an area of research.
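The two goals can be illustrated with a small sketch: a count of each name or concept for the summary, and a window of neighboring words for context. The mention format and both function names are assumptions made for this example, not the module's actual design.

```python
from collections import Counter


def summarize(mentions):
    """Count how often each (label, text) pair occurs.

    mentions: iterable of (token_index, label, text) tuples (hypothetical format).
    """
    return Counter((label, text) for _, label, text in mentions)


def context_window(tokens, index, width=3):
    """Return the mention at `index` with up to `width` tokens on each side."""
    start = max(0, index - width)
    return " ".join(tokens[start:index + width + 1])
```

For the token list of "the courier will meet Khan in Kabul on Friday", a window of width 2 around "Kabul" gives "Khan in Kabul on Friday", which shows the examiner who and when alongside the place name.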

Page 8

Concise, but no context

Page 9

Prototype Interface 1

Page 10

Prototype Interface 2

Page 11

Deployment Platform

▪ Autopsy™ is an open source digital forensics platform.

▪ Development started after our first Open Source Digital Forensics Conference (OSDFCon) in 2010.

▪ Community wanted an end-to-end platform instead of many stand-alone tools.

▪ Version 3.0 was released in September 2012.

▪ Received some US Army funding.

Page 12

Autopsy 3

Page 13

Autopsy Capabilities

▪ Ingests hard drives, media cards, and other digital media.

▪ Identifies suspicious files based on:

– Keywords

– Hash databases

– File types

▪ Allows operator to quickly focus on recent user activity:

– Web artifacts

– E-mail

▪ Provides fast results to enable field-based scenarios.
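The three flagging criteria above can be sketched as a single check per file. This is an illustration only, not Autopsy code: the function names are hypothetical, SHA-256 is an arbitrary choice of digest, and file type is approximated by extension rather than by file signature.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex digest used as the key into a hash database."""
    return hashlib.sha256(data).hexdigest()


def flag_file(name, data, known_hashes, keywords, extensions=(".exe",)):
    """Return the reasons (possibly none) a file looks suspicious.

    Assumptions: `known_hashes` is a set of hex digests, keywords are matched
    case-insensitively as substrings, and file type is judged by extension.
    """
    reasons = []
    if sha256_of(data) in known_hashes:
        reasons.append("hash")
    text = data.decode("utf-8", errors="ignore").lower()
    if any(keyword.lower() in text for keyword in keywords):
        reasons.append("keyword")
    if name.lower().endswith(tuple(extensions)):
        reasons.append("file type")
    return reasons
```

A set lookup keeps the hash check constant-time per file, which matters when the goal is fast results in the field.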

Page 14

Autopsy Extensibility

▪ Ingest modules analyze media on import

– Hash analysis, keyword search, registry, web artifacts

▪ Content viewers display files

– Text, image, text analytics, video triage, …

▪ Report modules generate final reports

– HTML, XML, …
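The division of labor between module types can be pictured with a generic plug-in sketch. The classes below are hypothetical and are not the Autopsy module API; they only show how ingest modules that analyze files and a report module that formats findings might be chained. Content viewers are left out.

```python
import html
from abc import ABC, abstractmethod


class IngestModule(ABC):
    """Hypothetical interface: analyze one file and return a list of findings."""

    @abstractmethod
    def process(self, name: str, data: bytes) -> list:
        ...


class KeywordIngest(IngestModule):
    def __init__(self, keywords):
        self.keywords = [k.lower() for k in keywords]

    def process(self, name, data):
        text = data.decode("utf-8", errors="ignore").lower()
        return [k for k in self.keywords if k in text]


class ReportModule(ABC):
    """Hypothetical interface: turn collected findings into a final report."""

    @abstractmethod
    def generate(self, findings: dict) -> str:
        ...


class HtmlReport(ReportModule):
    def generate(self, findings):
        rows = "".join(
            f"<li>{html.escape(name)}: {html.escape(', '.join(hits))}</li>"
            for name, hits in sorted(findings.items())
        )
        return f"<ul>{rows}</ul>"


def run_pipeline(files, ingest_modules, report_module):
    """Run every ingest module over every file, then build the report."""
    findings = {}
    for name, data in files.items():
        hits = []
        for module in ingest_modules:
            hits.extend(module.process(name, data))
        if hits:
            findings[name] = hits
    return report_module.generate(findings)
```

Keeping analysis and reporting behind separate interfaces is what lets a new capability, such as a text gisting step, be added without touching the rest of the pipeline.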

Page 15

Text Gisting Module

Page 16

Another Module Example

Page 17

Scenario 1: USB-based Triage

▪ USB drive from media triage.

– Logical files are added to Autopsy.

– User can navigate all documents and images.

Page 18

Review with Text Gist module

Page 19

Tag High Priority Files

Page 20

Translator Focuses on Tagged Files

Page 21

Scenario 2: Medium Dive

▪ Media card, hard drive, or cell phone is added.

▪ File system is analyzed.

▪ User navigates media using:

– Hash lookup

– Keyword search

– Web browser activity

– E-mail analysis

▪ Uses triage module to evaluate documents as they are found.

▪ Uses tags to flag priority files.
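The tagging step at the end of this workflow amounts to building a work queue for the translator. A minimal sketch follows; the `TriageItem` fields, the tag name, and the ordering by hit counts are all assumptions, since the slides do not say how tagged files are ordered.

```python
from dataclasses import dataclass, field


@dataclass
class TriageItem:
    """Hypothetical record for one document reviewed during triage."""
    path: str
    watch_list_hits: int = 0
    concept_hits: int = 0
    tags: set = field(default_factory=set)


def translator_queue(items, tag="High Priority"):
    """Return tagged items, most hits first.

    Assumption: watch-list hits outrank concept hits when ordering.
    """
    tagged = [item for item in items if tag in item.tags]
    return sorted(
        tagged,
        key=lambda item: (item.watch_list_hits, item.concept_hits),
        reverse=True,
    )
```

The examiner's tag stays the gate: an untagged file never reaches the translator, however many automatic hits it has.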

Page 22

80% Solution

▪ Entity resolution integration.

▪ Topic classifiers.

▪ More advanced analysis relating concepts and entities.

▪ More advanced interface approaches.

Page 23

Questions?

Brian Carrier

VP of Digital Forensics

Basis Technology

617-386-2000