TRANSCRIPT
Draining the Swamp: How to Plan and Practice Defensible Disposition
Richard Medina, Doculabs
January Greater Chattanooga Area Chapter ARMA Meeting
January 13, 2015
© Doculabs, Inc. 2014
Escape Now While You Can

How to Plan and Practice Defensible Disposition
• This session explains how to tackle the monster problem of over-retention of electronic information. Most organizations hoard and fail to destroy their piles of files in a legally defensible manner when business and law allow. The session shows how to develop and execute the four most important steps in defensible disposition: the Defensible Disposition Policy, Assessment Plan, Technology Plan, and Disposition Plan. It outlines business case development and tool selection.

Takeaways:
1. Learn how to develop and execute the four steps in defensible disposition: the Defensible Disposition Policy, Assessment Plan, Technology Plan, and Disposition Plan.
2. Learn which types of tools and technologies to use to analyze, sort, retain, and defensibly dispose of your information.
3. Learn how to develop a rigorous business case for defensible disposition.
Doculabs
Doculabs is a strategy consulting firm. Our clients rely on us to help them improve the way they manage information. We provide services such as developing strategic roadmaps and business cases, program management, and content migration assistance. Our consultants are experts in helping clients manage content such as Office documents, web content, email, customer communications, and records to improve operations, lower costs, increase revenue, and reduce risk.

Differentiators
• 20+ years of information management experience
• Objective recommendations
• Provide empirical data from over 1,200 engagements
• More than 550 customers in financial services, insurance, energy, manufacturing, and life sciences
Richard Medina
• Co-Founder and a Principal Consultant at Doculabs.
• In my 20+ years with Doculabs, I’ve consulted for organizations in a wide range of industries, including financial services, insurance, communications, utilities, and government.
• 312-953-9983
• blog: richardmedinadoculabs.com
• LinkedIn, Twitter
• www.doculabs.com
Issues
1. The problem
   – The sky is falling again
2. Break it into two problems
   – Day-forward versus historical content
3. How to address historical content
   – A defensible disposition methodology
4. Analysis and classification technology
   – Should you use it? Does it work?
5. Content assessment and disposition process
   – Approaches and results
Map of the Territory

ENTERPRISE INFORMATION MANAGEMENT (EIM)
How the organization uses all information assets to achieve business goals

ENTERPRISE CONTENT MANAGEMENT (ECM)
How the organization uses its unstructured content (including documents and collaborative/social content) to achieve its business goals

ENTERPRISE DATA MANAGEMENT
How the organization uses structured data (in databases) to achieve its business goals

INFORMATION GOVERNANCE

RECORDS MANAGEMENT
How the organization manages its information to ensure compliance with recordkeeping laws and regulations

E-DISCOVERY
How the organization finds, preserves, and produces information as needed in response to litigation, investigations, or other discovery requests

INFORMATION SECURITY AND PROTECTION
How the organization manages its information to ensure compliance with privacy and security laws and regulations and protect against loss or misuse
What’s the Scope of Information Governance?

• Information governance is the control of information to meet your legal, regulatory, and business requirements. (Robert Smallwood)
  – Great start because it's accurate and simple -- it avoids the trap of being a laundry list written in legalese.
• Information governance is the control of information to meet your legal, regulatory, and business risk requirements.
  – IG doesn't address all your business demands -- its primary focus is on "defensive" business requirements as opposed to "offensive" business requirements.
  – IG’s primary focus should be on controlling the risks and costs (primarily risk-related costs) of your information.
Three Big IG Challenges for 2015

1. The digital landfill problem.
   – 50, 100, or 1K TBs – or 10K PBs of files all over the place in your various systems
   – How do you sort through it and responsibly retain or dispose appropriately within your budget constraints?
2. The “systems of engagement” fragmentation problem.
   – How do you do IG on your dynamic, sometimes chaotic “systems of engagement”? They use social media, mobile devices, and the cloud.
   – Your problem has three parts:
     1. How do you meet your IG demands with your internal use of systems of engagement which you use for collaboration, interactive community building, etc.?
     2. How do you meet your IG demands with your use of external SOE beyond the firewall, with customers, vendors, and the public?
     3. How do you meet your IG demands in how you’re integrating your evolving SOE into your more mature systems of record, which help to run your core line of business processes?
3. The discovery problem.
   – How do you prepare for and respond to regulatory audit, litigation and other discovery, given #1 and #2 above?
The Problem is Over-Retention

Organizations have been over-retaining electronic information and failing to dispose of it in a legally defensible manner when business and law allow.

• Retaining everything forever
• Disposing of everything immediately
• Having employees make classification decisions
• Having technology make classification decisions
• Hybrid with technology and people
Why Over-Retention is the Problem

• Organizations keep non-required electronic content forever because:
  1. Classifying content (to determine what to keep and what to purge) is manual and expensive
  2. Content worth preserving is mixed with content that should be purged
  3. Legal -- and others -- are afraid of wrongfully deleting materials (spoliation)
  4. Additional storage is inexpensive, which makes it easy for corporations to buy more storage and defer addressing the problem
Recommendations for Day-Forward

• Day-forward information lifecycle management (ILM) is much easier to address than historical content
  – Even though addressing it messes with employees’ day-to-day business activities
• Day-forward: Initiate ILM practices on a “day-forward” basis first, so any new content created or saved is assigned a disposition period
  – Disposition horizons should begin to influence where content is stored (as users discover that materials saved in the “wrong” system will be purged)
• Guidance: Provide employees with explicit guidance for the acceptable use of available tools for dynamic content and their associated retention periods
  – For example, retain non-records for 3 years, retain official records per the retention schedule
• Historical: For historical content, analyze the feasibility of content analytics and autoclassification
  – Recognize that cleaning up TBs of content can take years. So conduct the analysis in 2014, begin the cleanup effort in earnest by 2015, and eliminate a large portion of dated content by 2018
Guidance Example for Day-Forward

System/Repository: Recommended Retention Period

Personal Network Drives (“P” drives)
• Provide each user with personal drive space of a limited size for their storage, for as long as the user is employed

Shared Network Drives (“G” drives)
• Make them read only (which means no network storage for collaboration; content will have to go into an ECM system)
• Exceptions include applications or systems that need to use network storage

ECM System
1. Default for non-records: retained for 3 years
2. Default for non-records that have long-term value: retained for 7 years
3. Official records: retained per the retention schedule

Social Community Sites
• No documents stored in communities (only links to documents in the ECM system)
• Consider retention periods for non-document content (e.g. 3 years)
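The ECM defaults in the guidance example can be expressed as a small lookup. This is only a sketch: the category names, the `disposition_date` function, and the `schedule_years` parameter are hypothetical, and the 3-year and 7-year values are the example defaults from the slide, not a recommendation for any particular organization.

```python
from datetime import date
from typing import Optional

# Example defaults taken from the guidance slide (years of retention).
DEFAULT_RETENTION_YEARS = {
    "non_record": 3,
    "non_record_long_term": 7,
}

def disposition_date(created: date, category: str,
                     schedule_years: Optional[int] = None) -> date:
    """Return the date on which an item becomes eligible for disposition."""
    if category == "official_record":
        # Official records follow the retention schedule, which is not
        # defined here, so the caller must supply the period.
        if schedule_years is None:
            raise ValueError("official records need a retention schedule period")
        years = schedule_years
    else:
        years = DEFAULT_RETENTION_YEARS[category]
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # Created on Feb 29 and the target year is not a leap year.
        return created.replace(year=created.year + years, day=28)

print(disposition_date(date(2015, 1, 13), "non_record"))
```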
The DD Methodology in a Nutshell

• You must satisfy 4 demands:
  1. Regulatory retention requirements
  2. Hold retention requirements
  3. Business retention requirements
  4. Cost impact of anything you do
• What you do has impact:
  1. What you do
  2. Effects of what you do
• You can do 2 things:
  1. Sort
  2. Dispose
• Your mission stated two ways:
  • Your mission is to satisfy your retention demands (1-3) while minimizing bad cost impact to yourself (4)
  • Your mission is to maximize good cost impact (4) while satisfying your retention requirements (1-3)
It’s Based on Reasonableness

• To determine what “satisfy your retention demands” really means for you, use the Principle of Reasonableness and act In Good Faith
  – Courts do not ask, expect or necessarily reward organizations for perfection. Courts do expect, however, that whatever information management tactics an organization undertakes are appropriate to how that particular entity is situated (size, financial resources, regulatory and litigation profile, etc.). (Jim McGann and Julie Colgan, “Implement a defensible deletion strategy to manage risk and control costs”, Inside Counsel)
Your DD Methodology Has 4 Parts

1. Defensible Disposition Policy
   – It’s your design specification, your business rules for DD, your decision tree
   – Specifies very clearly the objectives that your methodology will fulfill. It states clearly what you mean by your retention requirements and what you mean by reasonable costs when you are trying to fulfill your retention requirements.
2. Technology Approach
   – For Sorting and Disposing
   – You must use technology – it’s not an option
3. Assessment (Sorting) Plan
   – What information and systems you’re assessing
   – Your processing rules (decision plan)
   – It will be flexible
4. Disposition Plan
   – Evaluate your assessment results using your DD Policy
   – Dispose (which ranges from keeping forever to deleting right now with many options in between)
   – Refine your DD Policy (1) and continue as needed
Sidebar: A Simple Set of Rules
But Even Simple Rules Need Clarification

1. What’s a Legal Hold?
2. What are Records versus Non-Records?
3. What are Non-Records – which are still important for business purposes?
4. What about Non-Records that are not business-related?
5. Where do documents under Legal Hold fit? Are they Records, Non-Records, or what?
But Even Simple Rules Need Clarification

[Figure: Risk versus Manageability chart showing Likely Discoverable Information, Declared Records, and Other Business-related Information (OBRI)]

• Think about your ESI (electronically stored information) in terms of its Risk, Value, and Manageability.
• For simplicity, let’s just use Risk and Manageability.
What is the Scope of Records Management?

[Figure: Risk versus Manageability chart showing Electronically Stored Information (ESI)]

• For simplicity, let’s just use Risk and Manageability.
What is the Scope of Records Management?

[Figure: Risk versus Manageability chart showing Electronically Stored Information (ESI) and Likely Discoverable Information]

• One major source of risk for ESI is its “Likely Discoverability”.
• While all ESI is perhaps “discoverable”, we can prioritize the more likely and harmful ESI.
What is the Scope of Records Management?

[Figure: Risk versus Manageability chart showing Electronically Stored Information (ESI), Likely Discoverable Information, and Declared Records]

• Your RM program probably declares only a subset of your LDI and ESI as records – these are your most valuable, risky, and manageable electronic documents.
What is the Scope of Records Management?

[Figure: Risk versus Manageability chart of Physical Documents and Electronically Stored Information (ESI), showing Likely Discoverable Information, Declared Records, Other Business-related Information (OBRI), and Non-business-related Information (NBRI)]

1. But most of your content and documents are non-records -- and range from very low to very high risk and value.
2. Most of the ESI on your shared drives, hard drives, and in email is OBRI.
3. Some is NBRI.
4. It’s a mess.
Two Extreme Approaches to RM

[Figure: two Risk versus Manageability charts of Electronically Stored Information (ESI), each showing Likely Discoverable Information, Declared Records, Other Business-related Information (OBRI), and Non-business Information (NBI); one chart is labeled “Too Narrow” and the other “Too Wide”]
Use a Tiered Approach

• A much more effective approach is to divide your ESI into three “Tiers”.
• Tier 1 denotes your declared records, specified by a Records Retention Schedule.
• Tier 2 denotes the OBRI that is important to retain for business reasons.
• Tier 3 denotes the OBRI that is not important to retain for business reasons; it also denotes NBRI, which -- by definition -- is not important to retain for business reasons.

[Figure: Risk versus Manageability chart of Electronically Stored Information (ESI), showing Likely Discoverable Information, Declared Records, Other Business-related Information (OBRI), and Non-business Information (NBI), divided into Tier 1, Tier 2, and Tier 3]
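The three tier definitions reduce to a short decision function. The sketch below is illustrative only: the `Item` fields and the `assign_tier` name are hypothetical, and in practice each flag would come from the assessment process rather than being known up front. Legal holds are deliberately left out, since the deck raises their placement as an open question.

```python
from dataclasses import dataclass

@dataclass
class Item:
    declared_record: bool      # listed on the Records Retention Schedule
    business_related: bool     # False means NBRI
    important_to_retain: bool  # has business value worth keeping

def assign_tier(item: Item) -> int:
    """Map an item to Tier 1, 2, or 3 using the slide's definitions."""
    if item.declared_record:
        return 1  # Tier 1: declared records
    if item.business_related and item.important_to_retain:
        return 2  # Tier 2: OBRI that is important to retain
    return 3      # Tier 3: unimportant OBRI, and all NBRI

print(assign_tier(Item(declared_record=False, business_related=True,
                       important_to_retain=True)))
```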
“Treat them Differently”

• Tiered Approach
  – Different types of physical documents and ESI are handled differently
    1. Keep as records
    2. Keep as non-records, but move to rigorous ECM/RIM system
    3. Keep on (better managed) shared drives
    4. Don't worry about them; they aren't worth it – keep or dispose according to general rules

[Figure: Risk versus Manageability chart of Electronically Stored Information (ESI), showing Likely Discoverable Information, Declared Records, Other Business-related Information (OBRI), and Non-business Information (NBI), with Tier 1, Tier 2, and Tier 3 and handling options 1 through 4 marked]
Now This Tree Makes Sense
There’s an Awesome Business Case

• … if the technology works
• 50 TB = ~200 million documents (average of 250KB per document)
• The following table illustrates the time and effort required to classify 200 million documents

Classification Technique | Classification Rate | Pricing | Total Cost to Classify
Manual Classification | 10 seconds per document | $35 / hr. | $20 million
Auto Classification (with 95% machine and 5% human classified, via offshore labor) | Less than 1 second per document | $.005 per document for machine processing and $5 / hr. for those that require manual classification | $2 million
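The arithmetic behind the table can be checked with a short calculation. This is a rough sketch, not the method used to produce the slide: the function names are made up, and it assumes the 5% of documents routed to people take the same 10 seconds each as fully manual classification. Under those assumptions the manual figure comes to about $19.4 million, in line with the slide's $20 million, while the auto-classification figure comes to about $1.1 million; the slide's $2 million is higher, and the deck does not itemize the difference.

```python
# Inputs from the slide.
DOCS = int(50e12 / 250e3)  # 50 TB at 250 KB per document = 200 million

def manual_cost(docs: float, seconds_per_doc: float, hourly_rate: float) -> float:
    """Labor cost in dollars to classify `docs` documents by hand."""
    return docs * seconds_per_doc / 3600 * hourly_rate

def auto_cost(docs: float, machine_share: float, machine_price: float,
              seconds_per_doc: float, hourly_rate: float) -> float:
    """Machine processing for most documents plus manual review of the rest."""
    machine = docs * machine_share * machine_price
    human = manual_cost(docs * (1 - machine_share), seconds_per_doc, hourly_rate)
    return machine + human

manual = manual_cost(DOCS, seconds_per_doc=10, hourly_rate=35.0)
# Assumption: human-reviewed documents also take 10 seconds each.
auto = auto_cost(DOCS, machine_share=0.95, machine_price=0.005,
                 seconds_per_doc=10, hourly_rate=5.0)

print(f"manual: ${manual / 1e6:.1f}M, auto: ${auto / 1e6:.2f}M")
```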
Analysis and Classification Technologies

• Many different kinds of technology vendors are addressing analysis, classification, and disposition
  – File Analytics, Content Analytics, Content Classification, ECM, E-discovery, Search, Capture, DLP, Storage Management
  – Products, hosted solutions, service providers
  – Nuix, IBM/Stored IQ, HP/Autonomy, EMC/Kazeon, SAS, Kofax, Equivio, Rational Enterprise, Recommind, Index Engines, and others
• Most have a sweet spot where they will succeed (and deliver ROI)
  – But it’s highly dependent… on 8 factors or so
  – E.g., your business purposes, your ECM environment, your “information architecture”, your document types and their complexity and volume, the value and risk of the documents, your success criteria, etc., etc., etc.
Sidebar: How they Work

Before: <server XXX, drive G:> Forecast summary_121008.doc
After:
  Record = no
  Age = 2.5 years
  Document type = departmental forecast
  Keywords = forecast, 2008, draft
  Status = delete
  Confidence = 9.2 (out of 10)

1. Analyze the content and review the retention schedule
2. Establish classification rules and train the systems with examples
3. Crawlers and recognition engines evaluate the content and generate a classification
4. For content where a high machine confidence factor exists, content is automatically tagged and then staged for migration to the appropriate system for retention or disposition
5. For content with low confidence factors, documents are routed to clerical staff (onshore or offshore) for manual classification
6. The results of the manual identification are fed back into the automated algorithms to “teach” the systems better classification

Client Validation: Throughout the process, results and samples are routed to records management and legal professionals within the firm for validation and confirmation
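Steps 4 and 5 amount to routing on the machine's confidence score. The sketch below shows that split using the slide's example document. The `Classification` type, the `route` function, and especially the cutoff of 8.0 are assumptions for illustration; the deck does not say what threshold any product uses.

```python
from dataclasses import dataclass

AUTO_THRESHOLD = 8.0  # assumed cutoff on the 0-10 confidence scale

@dataclass
class Classification:
    path: str
    status: str        # e.g. "delete" or "retain"
    confidence: float  # 0 to 10, as in the slide's example

def route(result: Classification, threshold: float = AUTO_THRESHOLD) -> str:
    """High confidence: tag and stage automatically. Low: send to a person."""
    if result.confidence >= threshold:
        return "auto:" + result.status
    return "manual_review"

example = Classification("Forecast summary_121008.doc", "delete", 9.2)
print(route(example))  # auto:delete
```

Results from the manual queue would then be fed back as training examples (step 6), which this sketch does not model.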
Content Assessment Approaches

• There are three categories of attributes that can be used to determine what a file is:
  1. Environmental attributes around the file (e.g. file location, ownership)
  2. File attributes about the file (e.g. file type, age, author)
  3. Content attributes within the file (e.g. keywords, character strings, word proximity, word density)
• Various techniques and technologies, along with business rules, can be used to determine what a file is, and whether it is eligible for disposition
  – E.g. a DOC file created over 5 years ago and not accessed for a year may be purged
  – This type of purging could be done after giving users adequate notice (“move it or lose it” or “hold” for 90 days, then delete)
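The example rule (a DOC file created over 5 years ago and not accessed for a year) can be written as a predicate over file attributes. This is a minimal sketch: `FileInfo`, `is_purge_candidate`, and `purge_after` are hypothetical names, the attributes are passed in rather than read from disk because creation and access times are reported differently across file systems, and a year is approximated as 365 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class FileInfo:
    path: str
    created: datetime
    last_accessed: datetime

def is_purge_candidate(f: FileInfo, now: datetime,
                       min_age_years: int = 5, idle_days: int = 365) -> bool:
    """Apply the slide's rule: DOC file, old enough, and idle long enough."""
    is_doc = f.path.lower().endswith(".doc")
    old_enough = (now - f.created) > timedelta(days=365 * min_age_years)
    idle = (now - f.last_accessed) > timedelta(days=idle_days)
    return is_doc and old_enough and idle

def purge_after(notice_sent: datetime, hold_days: int = 90) -> datetime:
    """Earliest deletion time after a 'move it or lose it' notice."""
    return notice_sent + timedelta(days=hold_days)

now = datetime(2015, 1, 13)
stale = FileInfo("G:/accounting/forecast.doc",
                 created=datetime(2008, 12, 10),
                 last_accessed=datetime(2013, 6, 1))
print(is_purge_candidate(stale, now), purge_after(now).date())
```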
Content Assessment: Environmental Attributes

Environmental Attributes (around a file)

1. Ownership
   – Evaluation technique: Access Controls
   – Tool(s) used: Content Analytics, Data Loss Prevention, Storage Management
   – Examples: Permissions within LDAP list people and infer department or function
   – How used: Large collections of files can be assessed en masse based on access controls
2. Location
   – Evaluation technique: File Path
   – Tool(s) used: Content Analytics, Data Loss Prevention, Storage Management
   – Examples: G:/accounting/july2004/temp
   – How used: Stranded and orphaned locations are often easily eliminated
Content Assessment: File Attributes

File Attributes (about a file)

3. Duplicate
   – Evaluation technique: Hash Algorithm (exact duplicates); Block Read (near duplicates)
   – Tool(s) used: Content Analytics
   – How used: Exact duplicates can be easily eliminated. Near duplicates must be assessed in the context of other attributes.
4. File Type
   – Evaluation technique: Extension or MIME type
   – Tool(s) used: Content Analytics
   – Examples: .TMP, .MP3
   – How used: To identify file types that should not exist in a corporate setting
5. Metadata
   – Evaluation technique: Properties
   – Tool(s) used: Content Analytics
   – Examples: Age; Author; Security Profile (Confidential)
   – How used: To determine old materials, materials authored by individuals that have left the organization. Typically, these attributes must be combined with other attributes via a rule to take action.
6. File Name
   – Evaluation technique: Character Strings
   – Tool(s) used: Content Analytics
   – Examples: GL-USDIST31_093098.xls; FORMUB92_SMITH
   – How used: Use filename properties to determine type. Determine whether a file was system generated vs. human generated. Documents that are based on a specific form number can easily be identified.
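Exact-duplicate detection by hash is simple enough to sketch directly. The example below uses Python's standard `hashlib` with SHA-256; the choice of hash, the function names, and the directory-walk approach are assumptions for illustration, not a description of how any of the content analytics products work. It finds exact duplicates only; near duplicates need a different technique.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not loaded into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_exact_duplicates(root) -> list:
    """Group files under `root` by content hash; return groups of 2 or more."""
    groups = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            groups[sha256_of(p)].append(p)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Calling `find_exact_duplicates("G:/accounting")` (a hypothetical path) would return one list per set of byte-identical files, leaving the decision about which copy to keep to the disposition rules.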
Content Assessment: Content Attributes

Content Attributes (within a file)

7. Key Word
   – Evaluation technique: Character Strings
   – Tool(s) used: Content Analytics; Classification Module
   – Examples: “Enron”, “Guarantee”
   – How used: To determine if a document is on Hold via a word list per the hold request
8. Character or Word Patterns
   – Evaluation technique: “Classification” <pattern matching>
   – Tool(s) used and examples:
     • Classification Module: Word proximity
     • Classification Module: Word frequency
     • Content Analytics; Classification Module: “Privileged”
     • Content Analytics; DLP: SS#, Credit card #
   – How used: To determine the category in which a document may fit. Identification of PII. Regular Expression (RegEx) lists; determined entities for hold, security, IP, PHI, PII, DLP.
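A regular-expression list for the SS# and credit card examples might look like the following. It is a simplified sketch: the two patterns and the `find_pii` function are illustrative assumptions, the SSN pattern only catches the dashed format, and the Luhn checksum is added here as one common way to cut false positives on card numbers. Commercial DLP tools use much richer rule sets.

```python
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")

def luhn_ok(digits: str) -> bool:
    """Luhn checksum, used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_pii(text: str) -> list:
    """Return (kind, matched_text) pairs for likely SSNs and card numbers."""
    hits = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        if luhn_ok(re.sub(r"\D", "", m.group())):
            hits.append(("card", m.group()))
    return hits

print(find_pii("SSN 123-45-6789 and card 4111 1111 1111 1111"))
```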
Content Analytics: Assessment Results

Preservation Findings
• Unnecessary File Types (Executables, non-business pictures, movies, etc.): 13 to 15%
• Duplicates: 15 to 20%
• Near Duplicates: 9 to 30%

Risk Findings
• Files with PII: 10 to 16%
• Files with Sample Keywords: 3 to 5%

Operational Findings
• Files 10 years or older: 7 to 11%
• Files accessed within the last 18 months: 25 to 35%

Findings not mutually exclusive (e.g. a duplicate file could also be aged)
Summary

Technique | Status | % of Total | Total
Analytics | Unnecessary | 20% | 500 TB (.5 PB)
Classification | Record | 8% | 200 TB (.2 PB)
 | Non-Record, Business Reference | 28% | 700 TB (.7 PB)
 | Evaluated, Staged for Disposition (2018) | 44% | 1,100 TB (1.1 PB)
Total | | 100% | 2,500 TB (2.5 PB)

Findings | Enterprise Impact
Total that could be disposed | 20% of 2.5 PB
Enterprise Implications | .5 PB removed @ $5M per PB
Savings | $2.5M per year in storage expense
Implications

• Given the results, $2.5 million in storage expense could be saved annually on the disposition of historic content, resulting in $12.5 million over 5 years
• Going forward with newly created content, if similar techniques are applied, the saving grows to $34.8 million over 5 years
  – The current cost projections are based on the historical content growth rate of 30% per year
  – The expected cost projections are based on a content growth rate of 26% per year

@$5,000,000 per PB | 2014 | 2015 | 2016 | 2017 | 2018* | Total
Current Storage (PB) | 2.5 | 3.25 | 4.23 | 5.49 | 7.14 |
Current Cost (Mill) | $12.5 | $16.3 | $21.1 | $27.5 | $35.7 | $113.0
Expected Storage (PB) | 2 | 2.52 | 3.18 | 4.00 | 3.94 |
Expected Cost (Mill) | $10 | $12.6 | $15.9 | $20.0 | $19.7 | $78.2
Total Savings (Mill) | $2.5 | $3.65 | $5.25 | $7.46 | $16.00 | $34.8

*In 2018, the 1.1 PB or 44% of content from the 2014 historical content assessment can be disposed
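The projection table can be reproduced from the stated growth rates. The sketch below is a reconstruction, not the original spreadsheet: the `project` helper is a made-up name, and it assumes compound annual growth from the 2014 starting volumes with the 1.1 PB staged content removed in the final year. With those assumptions the yearly figures match the table, and the five-year savings come to about $34.85 million, which the slide reports as $34.8 million (the difference of the rounded totals).

```python
COST_PER_PB_M = 5.0  # $ millions per PB per year, from the slide

def project(start_pb: float, growth: float, years: int,
            drop_last_pb: float = 0.0) -> list:
    """Compound growth per year; optionally dispose of volume in the last year."""
    storage = [start_pb * (1 + growth) ** i for i in range(years)]
    storage[-1] -= drop_last_pb
    return storage

current = project(2.5, 0.30, 5)                      # no disposition
expected = project(2.0, 0.26, 5, drop_last_pb=1.1)   # 0.5 PB removed up front

current_cost = [pb * COST_PER_PB_M for pb in current]
expected_cost = [pb * COST_PER_PB_M for pb in expected]
savings = [c - e for c, e in zip(current_cost, expected_cost)]

for year, s in zip(range(2014, 2019), savings):
    print(year, f"${s:.2f}M")
print("total", f"${sum(savings):.2f}M")
```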
Conclusions

1. The business case for disposition is strong
   – Costs, risks, and benefits
2. Address information governance in phases
   – Starting today, the program will take years to mature
   – Set expectations accordingly
3. Probably address day-forward ILM before tackling historical content
4. Manual classification (alone) is not an option
5. The technologies are immature and varied, but you can be successful by matching the techniques and technologies to the kinds of files you want to target

Thank You
Doculabs, Inc.
(312) 433-7793
Richard Medina
• Co-Founder and a Principal Consultant at Doculabs.
• In my 20+ years with Doculabs, I’ve consulted for organizations in a wide range of industries, including financial services, insurance, communications, utilities, and government.
• 312-953-9983
• blog: http://www.richardmedinadoculabs.com
• http://www.linkedin.com/in/richmedinadoculabs
• Twitter: @richarddoculabs
• www.doculabs.com