The Good, the Bad, and the Ugly of Defensible Disposition
Post on 09-May-2015
#AIIM14
The Good, the Bad, and the Ugly of Defensible Disposition
Richard Medina, Co-founder and Principal Consultant, Doculabs | doculabs.com
rmedina@doculabs.com | richardmedinadoculabs.com | @richarddoculabs

Issues
1. The problem
§ The sky is falling again
2. Break it into two problems
§ Day-forward versus historical content
3. How to address historical content
§ A defensible disposition methodology
4. Analysis and classification technology
§ Should you use it? Does it work?
5. Doing the Assessment
§ Approaches and results

The Problem is Over-Retention
Organizations have been over-retaining electronic information and failing to dispose of it in a legally defensible manner when business and law allow. Approaches span a spectrum:
§ Retaining everything forever
§ Disposing of everything immediately
§ Having employees make classification decisions
§ Having technology make classification decisions
§ A hybrid of technology and people

Why Over-Retention is the Problem
§ Organizations keep non-required electronic content forever because:
1. Classifying content (to determine what to keep and what to purge) is manual and expensive
2. Content worth preserving is mixed with content that should be purged
3. Legal (and others) are afraid of wrongfully deleting materials (spoliation)
4. Additional storage is inexpensive, which makes it easy for corporations to buy more storage and defer addressing the problem

Issues
1. The problem
§ The sky is falling again
2. Break it into two problems
§ Day-forward versus historical content
3. How to address historical content
§ A defensible disposition methodology
4. Analysis and classification technology
§ Should you use it? Does it work?
5. Doing the Assessment
§ Approaches and results

Recommendations for Day-forward
§ Day-forward information lifecycle management (ILM) is much easier to address than historical content
§ Even though addressing it affects employees' day-to-day business activities
§ Day-forward: Initiate ILM practices on a "day-forward" basis first, so any new content created or saved is assigned a disposition period
§ Disposition horizons should begin to influence where content is stored (as users discover that materials saved in the "wrong" system will be purged)
§ Guidance: Provide employees with explicit guidance for the acceptable use of available tools for dynamic content and their associated retention periods
§ For example, retain non-records for 3 years, and retain official records per the retention schedule
§ Historical: For historical content, analyze the feasibility of content analytics and autoclassification
§ Recognize that cleaning up TBs of content can take years. So conduct the analysis in 2014, begin the cleanup effort in earnest by 2015, and eliminate a large portion of dated content by 2016

Guidance Example for Day-forward
System/Repository: Recommended Retention Period
§ Personal Network Drives ("P" drives): Provide each user with personal drive space of a limited size for their storage, for as long as the user is employed
§ Shared Network Drives ("G" drives): Make them read-only (which means no network storage for collaboration; content will have to go into an ECM system). Exceptions include applications or systems that need to use network storage
§ ECM System: 1. Default for non-records: retained for 3 years. 2. Default for non-records that have long-term value: retained for 7 years. 3. Official records: retained per the retention schedule
§ Social Community Sites: No documents stored in communities (only links to documents in the ECM system). Consider retention periods for non-document content (e.g., 3 years)
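The guidance above can be expressed as machine-readable default-retention rules. This is a minimal sketch, assuming hypothetical repository and content-class names; only the retention periods come from the example table.

```python
from datetime import timedelta

# Hypothetical encoding of the day-forward guidance: (repository, class)
# maps to a default retention period; None means the formal retention
# schedule governs (official records).
RETENTION_RULES = {
    ("ecm", "non_record"): timedelta(days=3 * 365),
    ("ecm", "non_record_long_term"): timedelta(days=7 * 365),
    ("ecm", "official_record"): None,
    ("social_community", "non_document"): timedelta(days=3 * 365),
}

def retention_period(repository, content_class):
    """Return the default retention period for newly saved content."""
    return RETENTION_RULES[(repository, content_class)]
```

A rule table like this is what lets disposition horizons be assigned automatically on a day-forward basis, rather than asking each employee to decide.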

Issues
1. The problem
§ The sky is falling again
2. Break it into two problems
§ Day-forward versus historical content
3. How to address historical content
§ A defensible disposition methodology
4. Analysis and classification technology
§ Should you use it? Does it work?
5. Doing the Assessment
§ Approaches and results

What's the Purpose of Your DD Methodology?
§ You must satisfy 4 demands:
1. Regulatory retention requirements
2. Hold retention requirements
3. Business retention requirements
4. Cost impact of anything you do
§ What you do has impact: 1. What you do 2. Effects of what you do
§ You can do 2 things: 1. Sort 2. Dispose
§ Your mission, stated two ways:
§ Your mission is to satisfy your retention demands (1-3) while minimizing bad cost impact to yourself (4)
§ Your mission is to maximize good cost impact (4) while satisfying your retention requirements (1-3)

It's Based on Reasonableness
§ To determine what "satisfy your retention demands" really means for you, use the Principle of Reasonableness and act in Good Faith
§ "Courts do not ask, expect or necessarily reward organizations for perfection. Courts do expect, however, that whatever information management tactics an organization undertakes are appropriate to how that particular entity is situated (size, financial resources, regulatory and litigation profile, etc.)." (Jim McGann and Julie Colgan, "Implement a defensible deletion strategy to manage risk and control costs", Inside Counsel)

Your DD Methodology Has 4 Parts
1. Defensible Disposition Policy
§ It's your design specification, your business rules for DD, your decision tree
§ Specifies very clearly the objectives that your methodology will fulfill. It states clearly what you mean by your retention requirements and what you mean by reasonable costs when you are trying to fulfill your retention requirements.
2. Technology Approach
§ For Sorting and Disposing
§ You must use technology; it's not an option
3. Assessment (Sorting) Plan
§ Do the legwork and look at what's there
§ What information and systems you're assessing
§ Your processing rules (decision plan)
§ It will be flexible
4. Disposition Plan
§ Evaluate your assessment results using your DD Policy
§ Dispose (which ranges from keeping forever to deleting right now, with many options in between)
§ Refine your DD Policy (1) and continue as needed

Issues
1. The problem
§ The sky is falling again
2. Break it into two problems
§ Day-forward versus historical content
3. How to address historical content
§ A defensible disposition methodology
4. Analysis and classification technology
§ Should you use it? Does it work?
5. Doing the Assessment
§ Approaches and results

There's an Awesome Business Case
§ 50 TB = ~200 million documents (average of 250 KB per document)
§ The following table illustrates the time and effort required to classify 200 million documents
Classification Technique: Manual Classification
§ Classification Rate: 10 seconds per document
§ Pricing: $35/hr.
§ Total Cost to Classify: $20 million
Classification Technique: Auto Classification (with 95% machine and 5% human classified, via offshore labor)
§ Classification Rate: Less than 1 second per document
§ Pricing: $0.005 per document for machine processing and $5/hr. for those that require manual classification
§ Total Cost to Classify: $2 million
§ ... if the technology works
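The figures above can be checked with back-of-envelope arithmetic. This sketch uses the rates from the table; the 10 s/doc pace for the offshore human share is an assumption (the slide gives only the $5/hr rate).

```python
DOCS = 200_000_000  # ~50 TB at 250 KB per document

# Manual: 10 seconds per document at $35/hr
manual_cost = (DOCS * 10 / 3600) * 35  # hours of labor times hourly rate

# Auto: 95% machine at $0.005/doc; remaining 5% classified manually
# offshore at $5/hr (assumed pace: the same 10 s/doc)
machine_cost = 0.95 * DOCS * 0.005
human_cost = (0.05 * DOCS * 10 / 3600) * 5
auto_cost = machine_cost + human_cost

print(round(manual_cost / 1e6, 1), round(auto_cost / 1e6, 2))
```

Manual classification comes out near $19.4M (the slide's rounded "$20 million"); the hybrid approach lands on the order of $1M, consistent with the order-of-magnitude saving the slide claims.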

Analysis and Classification Technologies
§ Many different kinds of technology vendors are addressing analysis, classification, and disposition
§ File Analytics, Content Analytics, Content Classification, ECM, E-discovery, Search, Capture, DLP, Storage Management
§ Products, hosted solutions, service providers
§ IBM/Stored IQ, HP/Autonomy, EMC Kazeon, SAS, Kofax, Equivio, Rational Retention, Recommind, Index Engines, and others
§ Most have a sweet spot where they will succeed
§ But it's highly dependent... on just about every factor you can think of
§ E.g., your business purposes, your ECM environment, your "information architecture", your document types and their complexity and volume, the value and risk of the documents, your success criteria, etc.

Sidebar: How Many of Them Work
Before: <server XXX, drive G:> Forecast summary_121008.doc
After (classification result): Record = no; Age = 2.5 years; Document type = departmental forecast; Keywords = forecast, 2008, draft; Status = delete; Confidence = 9.2 (out of 10)
1. Analyze the content and review the retention schedule
2. Establish classification rules and train the systems with examples
3. Crawlers and recognition engines evaluate the content and generate a classification
4. For content where a high machine confidence factor exists, content is automatically tagged and then staged for migration to the appropriate system or disposition
5. For content with low confidence factors, documents are routed to clerical staff (onshore or offshore) for manual classification
6. The results of the manual identification are fed back into the automated algorithms to "teach" the systems better classification
Throughout the process, results and samples are routed to records management and legal professionals within the firm (client validation) for confirmation
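Steps 3-5 of the workflow above reduce to a confidence-threshold routing decision. A minimal sketch, in which the threshold value and document fields are hypothetical (the slide's example shows a 9.2/10 confidence, but names no cutoff):

```python
# Assumed cutoff: scores at or above it are trusted; the slide does not
# specify a value, so 8.0 is purely illustrative.
CONFIDENCE_THRESHOLD = 8.0  # out of 10

def route(doc_type, confidence):
    """Tag high-confidence machine results automatically; send the
    rest to clerical staff for manual classification (steps 4-5)."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return f"auto-tagged as {doc_type}; staged for disposition"
    return "routed to clerical staff for manual classification"

print(route("departmental forecast", 9.2))
print(route("unknown", 4.1))
```

In practice the manual results from the low-confidence branch are fed back as training examples (step 6), so the share of documents crossing the threshold grows over time.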

Issues
1. The problem
§ The sky is falling again
2. Break it into two problems
§ Day-forward versus historical content
3. How to address historical content
§ A defensible disposition methodology
4. Analysis and classification technology
§ Should you use it? Does it work?
5. Doing the Assessment
§ Approaches and results

Assessment Approaches
§ There are three categories of attributes that can be used to determine what a file is:
1. Environmental attributes around the file (e.g., file location, ownership)
2. File attributes about the file (e.g., file type, age, author)
3. Content attributes within the file (e.g., keywords, character strings, word proximity, word density)
§ Various techniques and technologies, along with business rules, can be used to determine what a file is and whether it is eligible for disposition
§ E.g., a DOC file created over 5 years ago and not accessed for a year may be purged
§ This type of purging could be done after giving users adequate notice ("move it or lose it", or "hold" for 90 days, then delete)
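The example rule above (a DOC file over 5 years old and not accessed for a year) can be sketched directly against filesystem metadata. The thresholds come from the slide; everything else is illustrative, and modification time stands in for creation time, which portable Python cannot read on every platform:

```python
import os
import time

YEAR = 365 * 24 * 3600  # seconds

def purge_eligible(path, now=None):
    """Sketch of the slide's example rule: a .doc file last modified
    more than 5 years ago and not accessed for a year may be purged
    (after adequate notice to users)."""
    if not path.lower().endswith(".doc"):
        return False
    st = os.stat(path)
    now = time.time() if now is None else now
    return (now - st.st_mtime > 5 * YEAR) and (now - st.st_atime > 1 * YEAR)
```

A real assessment would combine several such rules (location, ownership, duplicates, keywords) before marking anything for disposition.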

#1: Environmental Attributes (around a file)
Attribute: Ownership
§ Evaluation Technique: Access Controls
§ Tool(s) Used: Content Analytics, Data Loss Prevention, Storage Management
§ Examples: Permissions within LDAP list people and infer department or function
§ How Used: Large collections of files can be assessed en masse based on access controls
Attribute: Location
§ Evaluation Technique: File Path
§ Tool(s) Used: Content Analytics, Data Loss Prevention, Storage Management
§ Examples: G:/accounting/july2004/temp
§ How Used: Stranded and orphaned locations are often easily eliminated

#2: File Attributes (about a file)
Attribute: Duplicate
§ Evaluation Technique: Hash Algorithm
§ Tool(s) Used: Content Analytics
§ Examples: Exact duplicates
§ How Used: Exact duplicates can be easily eliminated
Attribute: Near Duplicate
§ Evaluation Technique: Block Read
§ Tool(s) Used: Content Analytics
§ Examples: Near duplicates
§ How Used: Near duplicates must be assessed in the context of other attributes
Attribute: File Type
§ Evaluation Technique: Extension or MIME type
§ Tool(s) Used: Content Analytics
§ Examples: .TMP, .MP3
§ How Used: To identify file types that should not exist in a corporate setting
Attribute: Metadata Properties
§ Tool(s) Used: Content Analytics
§ Examples: Age, Author, Security Profile (Confidential)
§ How Used: To determine old materials, or materials authored by individuals who have left the organization. Typically these attributes must be combined with other attributes via a rule to take action. Use file properties to determine type
Attribute: File Name Character Strings
§ Tool(s) Used: Content Analytics
§ Examples: GL-USDIST31_093098.xls, FORMUB92_SMITH
§ How Used: Determine whether a file was system-generated vs. human-generated. Documents that are based on a specific form number can easily be identified
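The "Duplicate / Hash Algorithm" row above is the simplest of these techniques to illustrate: group files by a content digest, and any group of two or more is a set of exact duplicates. A minimal sketch (SHA-256 is one reasonable choice of hash; the slide names no specific algorithm):

```python
import hashlib
from collections import defaultdict

def find_exact_duplicates(paths):
    """Group files by SHA-256 digest of their bytes; return the groups
    with more than one member (sets of exact duplicates)."""
    groups = defaultdict(list)
    for path in paths:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        groups[digest].append(path)
    return [files for files in groups.values() if len(files) > 1]
```

Near-duplicate detection (the "Block Read" row) is harder: files differ in a few blocks, so they hash differently and must be compared chunk-by-chunk and judged alongside other attributes.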

#3: Content Attributes (within a file)
Attribute: Key Word / Character Strings
§ Tool(s) Used: Content Analytics; Classification Module
§ Examples: "Enron", "Guarantee", "Privileged"
§ How Used: To determine if a document is on Hold via a word list per the hold request
Attribute: Character or Word Patterns (pattern matching)
§ Tool(s) Used: Classification Module
§ Examples: Word proximity, word frequency
§ How Used: To determine the category in which a document may fit
Attribute: Identification of PII
§ Tool(s) Used: Content Analytics; DLP
§ Examples: SS#, credit card #
§ How Used: Regular Expression (RegEx) lists; determined entities for hold, security, IP, PHI, PII, DLP
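The RegEx-list approach in the PII row above can be sketched as follows. The two patterns are deliberately simplistic illustrations of SSN and credit-card formats, not production-grade detectors:

```python
import re

# Illustrative RegEx list; a real DLP tool ships far more robust patterns
# (checksum validation, more formats, context rules).
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def find_pii(text):
    """Return the matches for each PII pattern that appears in text."""
    return {name: pat.findall(text)
            for name, pat in PII_PATTERNS.items() if pat.search(text)}

print(find_pii("SSN 123-45-6789 and card 4111 1111 1111 1111"))
```

Files that match would be flagged for hold, security review, or DLP handling rather than routine disposition.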

Assessment Results
Preservation Findings
§ Unnecessary File Types (executables, non-business pictures, movies, etc.): 13 to 15%
§ Duplicates: 15 to 20%
§ Near Duplicates: 9 to 30%
Risk Findings
§ Files with PII: 10 to 16%
§ Files with Sample Keywords: 3 to 5%
Operational Findings
§ Files 10 years or older: 7 to 11%
§ Files accessed within the last 18 months: 25 to 35%
Findings are not mutually exclusive (i.e., a duplicate file could also be aged)

Assessment Summary
Technique: Analytics | Status: Unnecessary | % of Total: 20% | Total: 500 TB (0.5 PB)
Technique: Classification | Status: Record | % of Total: 8% | Total: 200 TB (0.2 PB)
Technique: Classification | Status: Non-Record, Business Reference | % of Total: 28% | Total: 700 TB (0.7 PB)
Technique: Classification | Status: Evaluated, Staged for Disposition (2016) | % of Total: 44% | Total: 1,100 TB (1.1 PB)
Total: 100% | 2,500 TB (2.5 PB)
Findings and Enterprise Impact
§ Total that could be disposed: 20% of 2.5 PB
§ Enterprise implications: 0.5 PB removed @ $5,000,000 per PB
§ Savings: $2,500,000 per year in storage expense

Assessment Implications
§ Given the results, $2.5 million in storage expense could be saved annually from the disposition of historical content, resulting in $12.5 million over 5 years
§ Going forward with newly created content, if similar techniques are applied, the savings grow to $34.8 million over 5 years
§ The current cost projections are based on the historical content growth rate of 30% per year
§ The expected cost projections are based on a content growth rate of 26% per year
@$5,000,000 per PB: 2012 | 2013 | 2014 | 2015 | 2016* | Total
Current Storage (PB): 2.5 | 3.25 | 4.23 | 5.49 | 7.14 |
Current Cost ($M): 12.5 | 16.3 | 21.1 | 27.5 | 35.7 | 113.0
Expected Storage (PB): 2 | 2.52 | 3.18 | 4.00 | 3.94 |
Expected Cost ($M): 10 | 12.6 | 15.9 | 20.0 | 19.7 | 78.2
Total Savings ($M): 2.5 | 3.65 | 5.25 | 7.46 | 16.00 | 34.8
*In 2016, the 1.1 PB (44%) of content from the 2012 historical content assessment can be disposed
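The projection table above can be reproduced from its stated assumptions: 2.5 PB in 2012 growing 30%/yr without disposition; 2.0 PB (after the initial 0.5 PB cleanup) growing 26%/yr with it; the 1.1 PB of staged content disposed in 2016; storage at $5M per PB. This sketch recovers the per-year savings to within rounding:

```python
PRICE_PER_PB = 5.0  # $M per PB

current, expected = 2.5, 2.0  # PB at the start of 2012
savings = []
for year in range(2012, 2017):
    if year == 2016:
        expected -= 1.1  # staged 2012 historical content disposed
    savings.append((current - expected) * PRICE_PER_PB)
    current *= 1.30   # 30%/yr growth without disposition
    expected *= 1.26  # 26%/yr growth with day-forward ILM

total = sum(savings)
print([round(s, 2) for s in savings], round(total, 1))
```

The computed total lands within a rounding step of the slide's $34.8M five-year figure (the slide sums already-rounded yearly values).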

Conclusions
1. The business case for disposition is strong
§ Costs, risks, and benefits
2. Information governance must be addressed in phases
§ Starting today, the program will take years to mature
§ Set expectations accordingly
3. You should probably address day-forward ILM before tackling historical content
4. Recognize that manual classification is not an option
5. The technologies are immature and varied, but you can be successful by matching the techniques and technologies to the kinds of files you want to target
6. Your DD methodology has 4 main parts: DD Policy, Technology Approach, Assessment Plan, Disposition Plan

Thank You
Richard Medina
Co-founder and Principal Consultant, Doculabs | doculabs.com
rmedina@doculabs.com | richardmedinadoculabs.com
@richarddoculabs
www.aiim.org/infochaos
Do YOU understand the business challenge of the next 10 years?
This ebook from AIIM President John Mancini explains.