February 2010: 8 Things You Can't Afford to Ignore About eDiscovery
DESCRIPTION
8 Things You Can't Afford to Ignore About eDiscovery. Unstructured content is growing at an unprecedented rate, an estimated 650% over five years, and Fortune 1000 companies now manage petabytes of data. With electronically stored information (ESI) formally covered under the Federal Rules of Civil Procedure (FRCP), organizations need new tools to manage, analyze, and review ESI effectively. This article presents 8 techniques and technologies that can lower costs and improve litigation success.
TRANSCRIPT
8 Things You Can’t Afford to
Ignore About eDiscovery
Brought to you by: John Wang, CCP
Product Manager and eDiscovery Specialist
February 25, 2010
AIIM 8 Things Series
About ZL Technologies
• Experts in Total Information Governance
– Unstructured Content Archiving
– eDiscovery
– Compliance
– Secure Email
– Scalability & Low TCO via Private Clouds
• Select Customers
About John Wang
• Experience / Roles
– 15+ years in Technology
– Product Manager, Solutions Architect, Developer
• Degrees
– M&T, MBA, Computer Science, Finance
• Industry Participation
– EDRM, AIIM, LexisNexis
– Project Leadership
– Search Guide Co-author
– Research proposal, execution, and presentation
– Certified Concordance Professional
Agenda
1. Early Case Assessment
2. Data Mapping
3. Investigative eDiscovery
4. Concept Search
5. Non-Linear Review
6. Parallel Search
7. End-to-End eDiscovery
8. Cloud Computing
Overview
Did you know?
5 Year Enterprise Data Growth Estimate
85% will be Unstructured!
Sources: Gartner
Overview
• ESI is discoverable
• ESI volume is growing at 55+% annually*
• Litigation is increasing
– 42% of US organizations expect more litigation (up from 34%)**
– 83% of US organizations were litigated against in 2008**
• Timelines have been shortened
• How do we handle this in an affordable way?
• Can we move from a reactive, bottom-up approach to a
strategic, top-down approach?
• This presentation shows us 8 technologies to do just that!
Sources:
* ESG
** Fulbright & Jaworski
Early Case Assessment
Did you know?
In-house eDiscovery
Payback Period
Sources: Gartner, Merrill Lynch
Early Case Assessment
3 Questions
– Does the complaint have merit?
– How much will this cost us?
– What has the org learned?
Overview
– Estimate risk to prosecute or
defend a case
– Formulate resolution in first 90-120 days
– Examine key facts, allegations,
applicable laws and venues
– Analyze and assess potential
trial themes for both sides
– Pursue the best course
Early Case Assessment Results
Item                 Achievement
Payback Period       3-6 months, or 1 large IP case
Litigation Success   76%**
Cost Reduction       50%**
[Chart: Cost of E-Discovery and Litigation Success Rate, Without ECA vs. With ECA; axis from 0% to 100%]
Sources:
** Cogent Research
Early Case Assessment
Assess ESI after Collection, Preservation, Processing and Analysis
[Diagram: Electronic Discovery Reference Model (EDRM), © 2009, v2.0, edrm.net. Stages: Information Management, Identification, Preservation, Collection, Processing, Review, Analysis, Production, Presentation. Axes: Volume, Relevance.]
Traditional Post-Collection ECA
Early Case Assessment
[Diagram: Electronic Discovery Reference Model (EDRM), © 2009, v2.0, edrm.net]
ECA “Now”
Compress timeline and assess before collection, reducing processing,
analysis and review
Early Case Assessment
Deployment
– In-house eDiscovery
– Allows faster and
iterative searching,
“going back to the
well”
Process
– Analysis
– Visualization
How does it affect you?
– Resolve cases faster
– Resolve cases more
favorably
– Reduce costs
Action Plan
– Evaluate solutions
– Try solutions on known
cases and case data
– Evaluate results
Data Mapping
Did you know?
Fortune 1000 Data per Firm
In potentially 100s of Repositories!
Sources: Industry Sources
Required by Rule 26(a)(1)(B) (now Rule 26(a)(1)(A)(ii))
• “… a copy of, or a description by
category and location of, all
documents, electronically stored
information, and tangible things”
• Requirements
– Repositories
– Types of ESI per repository
– Custodians
– Retention policy
– Preservation & disposition
– Legal hold enforcement
– Collection method
– Accessibility
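The requirements above can be pictured as one record per repository. The sketch below is an illustration only: the `RepositoryEntry` class, its field names, and the sample values are hypothetical and are not taken from any product or from the rule text.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class RepositoryEntry:
    """One row of a data map (hypothetical fields mirroring the list above)."""
    name: str                  # repository, e.g. a mail archive or file share
    esi_types: List[str]       # types of ESI held in this repository
    custodians: List[str]      # people whose data lives here
    retention_days: int        # retention policy, in days
    collection_method: str     # how data would be collected
    accessible: bool = True    # whether the data is reasonably accessible
    legal_holds: Set[str] = field(default_factory=set)  # matters with an active hold

    def can_dispose(self) -> bool:
        """Routine disposition is allowed only when no legal hold is active."""
        return not self.legal_holds


def repositories_for(data_map, custodian):
    """List the repository names that hold data for a given custodian."""
    return [r.name for r in data_map if custodian in r.custodians]


# Made-up sample data.
data_map = [
    RepositoryEntry("email-archive", ["email"], ["alice", "bob"], 2555, "archive export"),
    RepositoryEntry("file-server", ["documents"], ["alice"], 1095, "in-place index"),
]
data_map[0].legal_holds.add("matter-001")  # hold for a hypothetical matter

print(repositories_for(data_map, "alice"))            # ['email-archive', 'file-server']
print([r.name for r in data_map if r.can_dispose()])  # ['file-server']
```

Keeping legal holds on the same record as the retention policy is one possible way to make disposition checks and hold enforcement read from a single source.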
Take Advantage of Rule 37(f) (now Rule 37(e))
• Provides defense against sanctions for “routine, good-faith operation of an electronic information system.”
Data Mapping
The Three Ss of eDiscovery: Spoliation, “I’m Sorry”, Sanctions
Data Mapping
How does it affect you?
– Reduce sanction risk
– Reduce overhead from 10 hrs
to 30 min / week
– Reduce costs
– Automate collections and
legal holds
– Work with BCP/DR and
InfoSec/DLP
Action Plan
– Evaluate current solution and
available solutions
– Analyze options if there is a
gap
Data Mapping
[Diagram: Integrated Data Mapping, linking Legal Hold Notification, Culling, Collection, and Legal Hold]
Exclusionary ED
Approach
– Cull by Custodian
– Cull by Date
– Cull by File type
Limitations
– Blunt tool
– De-selects on secondary characteristics
– Find relevance late in process
– May need to go back to the source late in the process
– More false negatives as the collection grows
Investigative ED
• Approach
– Cull by Matter
– Roots in Forensics
• Benefits
– Finding highly relevant information early in the process
– Finds information not necessarily tied to custodians, e.g. file server data
– Supports ECA
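The contrast between the two approaches can be sketched on a toy collection. Everything below is hypothetical (the documents, the field names, and the matter terms), and simple substring matching stands in for a real search engine.

```python
from datetime import date

# Made-up mini-collection for illustration.
DOCS = [
    {"id": 1, "custodian": "alice", "date": date(2009, 3, 1), "type": "msg",
     "text": "Pricing agreement with the distributor"},
    {"id": 2, "custodian": "bob", "date": date(2009, 4, 2), "type": "msg",
     "text": "Lunch on Friday?"},
    {"id": 3, "custodian": None, "date": date(2009, 3, 15), "type": "xls",
     "text": "Distributor pricing schedule"},  # file server data, no custodian
]


def exclusionary_cull(docs, custodians, start, end, file_types):
    """Cull by custodian, date, and file type (secondary characteristics)."""
    return [d for d in docs
            if d["custodian"] in custodians
            and start <= d["date"] <= end
            and d["type"] in file_types]


def investigative_cull(docs, matter_terms):
    """Cull by matter: keep documents whose text mentions any matter term."""
    terms = [t.lower() for t in matter_terms]
    return [d for d in docs if any(t in d["text"].lower() for t in terms)]


excl = exclusionary_cull(DOCS, {"alice", "bob"},
                         date(2009, 1, 1), date(2009, 12, 31), {"msg"})
inv = investigative_cull(DOCS, ["pricing", "distributor"])

print([d["id"] for d in excl])  # [1, 2]
print([d["id"] for d in inv])   # [1, 3]
```

In this toy run the exclusionary filter keeps an irrelevant email (2) and drops the relevant file server spreadsheet (3), a false negative, while the matter-based filter finds it.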
Investigative eDiscovery
[Diagram: two funnels. Exclusionary: Cull by Custodian, Cull by Date, Cull by File type, then Review. Investigative: Cull by Matter, then Review.]
Investigative eDiscovery
How does it affect you?
– Higher Success Rates
– Lower Information Risk via Wider Safe Harbor
– Better results
– Successful ECA
Action Plan
– Evaluate past performance with respect to initially missed relevant email
– Calculate cost
– Investigate options
Key Technologies
– Billion document search
engines
– Index in-place
– Cloud / GRID scalability
Investigative eDiscovery is based
on the science of forensics, an
older and more complete
approach than traditional
eDiscovery.
New technologies make
Investigative eDiscovery a reality
again.
Concept Search
Did you know?
Keyword Search
Missed Relevant Documents
Sources: Blair & Maron
Concept Search
• Attorneys and paralegals are not familiar with the terms in use
– Many words can be used to mean the same thing
– Organizations often create special “code words”
Example: Subway Accident
– Subway company: “unfortunate incident”
– Victims: “disaster”
– Others: “event,” “incident,” “situation,” “problem,” “difficulty”
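The vocabulary problem in the subway example can be shown with a few lines of code. This is a minimal sketch that assumes a hand-built concept map; real concept search tools derive such relationships statistically rather than from a fixed list, and the `CONCEPTS` map and sample memos here are invented for illustration.

```python
# Hypothetical concept map: one concept, many surface terms.
CONCEPTS = {
    "accident": {"accident", "disaster", "unfortunate incident", "event",
                 "incident", "situation", "problem", "difficulty"},
}

# Made-up documents.
DOCS = {
    "memo-1": "Summary of the unfortunate incident at the station",
    "memo-2": "Victims describe the disaster",
    "memo-3": "Quarterly ridership figures",
}


def keyword_search(docs, term):
    """Return ids of documents containing the literal term."""
    term = term.lower()
    return sorted(doc_id for doc_id, text in docs.items() if term in text.lower())


def concept_search(docs, concept):
    """Return ids of documents containing any term mapped to the concept."""
    terms = CONCEPTS.get(concept, {concept})
    return sorted(doc_id for doc_id, text in docs.items()
                  if any(t in text.lower() for t in terms))


print(keyword_search(DOCS, "accident"))  # []
print(concept_search(DOCS, "accident"))  # ['memo-1', 'memo-2']
```

The literal keyword finds nothing because neither the company nor the victims used the word "accident", which is the gap concept search is meant to close.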
Concept Search
How does it affect you?
– Find more relevant
documents
– Discover case facts faster
– Recommended by courts
and the Sedona
Conference
Action Plan
– Evaluate test cases
– Get review teams involved
for real world analysis
Actively Researched and Developed Technology
Year   Technique
1763   Bayes' Theorem (Bayesian Inference)
1948   Shannon Entropy (Shannon Information Theory)
1951   k-Nearest Neighbors
1988   Latent Semantic Indexing (LSI)
1999   Probabilistic LSI
2003   Latent Dirichlet Allocation
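For reference, the two oldest entries in the table have compact standard statements. The symbols are chosen here for illustration and do not appear in the slides: $R$ stands for "the document is relevant", $d$ for an observed document, and $X$ for a random variable such as the next term in a text.

```latex
% Bayes' theorem: probability that a document is relevant given its content
P(R \mid d) = \frac{P(d \mid R)\, P(R)}{P(d)}

% Shannon entropy: average information (in bits) carried by X
H(X) = -\sum_{x} p(x) \log_2 p(x)
```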
Non-Linear Review
Did you know?
Legal Review Productivity
Increased Productivity from Non-Linear Review
Sources: Deloitte, Industry Sources
Non-Linear Review
Traditional Linear eDiscovery
– Grouped by source, custodian,
date, etc.
– Like documents are scattered
– 10,000s of docs / case
Non-Linear Review
– Grouped by concept, near-duplication
– Easy navigation via
visualization
– Less context switching
– Better sampling
– 1,000,000s of docs / case
Technologies
– Clustering
– Auto-Classification
– Concept Search
– Visualization
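Grouping by near-duplication can be sketched with word shingles and Jaccard similarity. This is one common technique, chosen here as an assumption; the slides do not say which method any product uses, and the 0.5 threshold and sample documents are arbitrary.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles in a text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(docs, threshold=0.5):
    """Greedy grouping: a doc joins the first group whose seed it resembles."""
    groups = []  # list of (seed_shingles, [doc_ids])
    for doc_id, text in docs.items():
        sh = shingles(text)
        for seed, members in groups:
            if jaccard(seed, sh) >= threshold:
                members.append(doc_id)
                break
        else:
            groups.append((sh, [doc_id]))
    return [members for _, members in groups]


# Made-up documents.
DOCS = {
    "a": "please review the attached contract draft before friday",
    "b": "please review the attached contract draft before monday",
    "c": "lunch menu for the cafeteria this week",
}
print(group_near_duplicates(DOCS))  # [['a', 'b'], ['c']]
```

A reviewer who sees "a" and "b" side by side can code both at once, which is the reduction in context switching the slide refers to.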
[Chart: eDiscovery Review Productivity; axis from 0 to 15,000]
Non-Linear Review
How does it affect you?
• Faster review drives
– Lower costs
– Faster results
– Better results
– Successful ECA
Action Plan
– Evaluate current
process and costs
– Justify investigation
– Review options
Key Statistics
• 72% of attorneys say review is the
most expensive part of ED
• Review is up to 80% of ED costs
• Can save $187,500 on a 1.5M-document case
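The savings figure above is easier to compare across cases when expressed per document. The arithmetic below uses only the two numbers on the slide; scaling the result linearly to other case sizes is an assumption made here, not a claim from the presentation.

```python
savings_usd = 187_500   # savings quoted on the slide
documents = 1_500_000   # case size quoted on the slide

savings_per_doc = savings_usd / documents
print(f"${savings_per_doc:.3f} saved per document")  # $0.125 saved per document


def projected_savings(doc_count, per_doc=savings_per_doc):
    """Scale the per-document figure linearly (an assumption, not a slide claim)."""
    return doc_count * per_doc
```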
[Chart legend: Traditional Linear Review, Non-Linear Review]
Parallel Search
Did you know?
Keyword Search is still advancing?
Term searches – in seconds to minutes
Source: Gartner
Parallel Search
• Keywords
• User names
• Email addresses
• Patent numbers
• SSNs
• etc…
How does it affect you?
– Take the guesswork out of choosing keywords
– Run queries as simulations
– Supports wildcard search, proximity search, etc.
Action Plan
– Review complex searches
– See if parallel search can provide new insights that could not be economically performed before.
Search
100,000 terms across
billions of documents
in seconds to minutes…
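The idea of submitting many terms at once and reading back hit counts can be sketched as a batch search over an inverted index. This is an illustration under stated assumptions: the index layout, the `parallel_search` name, and the sample data (including the patent number and email address) are made up, and the loop here is sequential where a production engine would distribute the work.

```python
from collections import defaultdict
from fnmatch import fnmatchcase
import re


def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for raw in re.findall(r"[\w@.\-]+", text.lower()):
            token = raw.strip(".-")  # drop sentence punctuation at the edges
            if token:
                index[token].add(doc_id)
    return index


def parallel_search(index, terms):
    """Evaluate a batch of terms against the index; '*' acts as a wildcard."""
    results = {}
    for term in terms:
        pattern = term.lower()
        if "*" in pattern:
            hits = set()
            for token, ids in index.items():
                if fnmatchcase(token, pattern):
                    hits |= ids
        else:
            hits = set(index.get(pattern, ()))
        results[term] = hits
    return results


# Made-up documents.
DOCS = {
    1: "Patent 7654321 was discussed with jsmith@example.com.",
    2: "Forward the patent filing to legal.",
    3: "SSN 123-45-6789 appears in the HR file.",
}
index = build_index(DOCS)
results = parallel_search(
    index, ["patent", "jsmith@example.com", "123-45-6789", "fil*", "merger"])
hit_counts = {term: len(ids) for term, ids in results.items()}
print(hit_counts)
```

Reading the per-term counts before committing to a keyword list is one way to treat queries as simulations: a term with zero hits, such as "merger" in this toy data, can be dropped or reworded early.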
End-to-End eDiscovery
Did you know?
eDiscovery Vendors
Offering Products and Services
Sources: Socha-Gelbmann 2009 E-Discovery Survey
[Diagram: Electronic Discovery Reference Model (EDRM), © 2009, v2.0, edrm.net]
End-to-End eDiscovery
Typical: Archive, Initial Search, Review Tool, Case Analytics
– 3.5 days to index 30TB
– 3 days to index 1.1TB
– 4 days to export 2M docs
• 25% of vendors (150+) will disappear by 2011
• More vendors are entering eDiscovery than leaving
Single / Multi-Vendor
End-to-End eDiscovery
• No data transfer between initial collection, review, and production
• No incompatibilities or inter-stage processing time delays
[Diagram: Electronic Discovery Reference Model (EDRM), © 2009, v2.0, edrm.net]
Single Platform
End-to-End eDiscovery
• True End-to-End eDiscovery is:
– Single platform
• Benefits
– Integrated Data Map &
Legal Hold
– Single Collection
– Enterprise-wide search in
review platform
– No intermediate
Productions
• Bottom Line
– Cost and Time Savings
How does it affect you?
– Faster
– More Reliable
– Lower Cost
– Institutional Memory
Action Plan
– Evaluate current process
and costs
– Justify investigation
– Review options
Cloud Computing
Did you know?
Cloud Computing
Market Forecast by 2011 & 2013!
Sources: Gartner, Merrill Lynch
Cloud Computing
Industry hype?
• Today:
– $56 billion
– 3% of enterprises using cloud
• By 2013:
– $150 billion market?
– 50+% of email archiving in the cloud?
Sources: Gartner, Forrester
The Good, The Bad, and The Solution …
The Good
1. Lower Cost
– Only pay for what you use
2. Scalability
– GRID / MapReduce
3. Increased Storage
– Virtualized file system
4. Flexibility
– Deploy new capability quickly
5. Automation
– Less manpower requirement
6. More mobility
– Inside and outside counsel
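The scalability point rests on splitting work across nodes and merging the partial results. A minimal single-process sketch of the MapReduce pattern follows, assuming term counting as the workload; the shard contents and function names are hypothetical, and a real deployment would run the map step on separate machines.

```python
from collections import Counter
from functools import reduce


def map_shard(shard):
    """Map step: count term occurrences within one shard of documents."""
    counts = Counter()
    for text in shard:
        counts.update(text.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


# Made-up shards, each a list of document texts.
SHARDS = [
    ["contract draft", "contract signed"],
    ["draft budget"],
]

partials = [map_shard(s) for s in SHARDS]  # each call could run on its own node
totals = reduce(reduce_counts, partials, Counter())
print(totals["contract"], totals["draft"], totals["budget"])  # 2 2 1
```

Because the map step touches only its own shard, adding nodes adds capacity without changing the code, which is the property the slide attributes to GRID / MapReduce designs.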
Cloud Computing
The Bad
1. Guaranteed service levels
– Some have no guarantees
– Data not under your control
2. Security & shared tenancy
– Provider capabilities vary
– Also may have no guarantees
3. Chain of custody
– Forensic examination?
4. Lock-in and pricing
– Ability to get data out?
5. Current adoption
– Only 3% of business users!
Cloud Computing
The Solution
Private Cloud Computing
• What is it?
– Cloud infrastructure deployed in-house
• Added Benefits
– Secure
– QoS / SLA
Cloud Computing
How does it affect you?
• Faster review drives
– Lower costs
– Better resource utilization
– Scales for one-time projects
Action Plan
– Check internal cloud strategy
– Run savings figures
“IT Organizations Will Spend More Money on Private Cloud Computing Investments Than on Offerings From Public Cloud Providers Through 2012” (Gartner)
8 Things You Can’t Afford to Ignore About eDiscovery
1. Early Case Assessment
2. Data Mapping
3. Investigative eDiscovery
4. Concept Search
5. Non-Linear Review
6. Parallel Search
7. End-to-End eDiscovery
8. Cloud Computing
More Information
• http://aiim.typepad.com/
• http://www.zlti.com/