
Predictive Analytics: Email Management Magic?

August 18, 2014

Thank you for being here today

Presenters:

Jason R. Baron

Of Counsel, Drinker Biddle & Reath

Avi Elkoni

Consultant, Equivio

Neil Etheridge

VP, Product Management, Recommind Inc.

Mark Olson

National Analytics Manager, Dataskill

Sandra E. Serkes

President & CEO, Valora Technologies Inc.

Why All The Fuss About Big Data?

How to deal? Email Management as a Case in Point

Valora’s Analytics & Data Mining Solutions for Email Management


What is “Big Data” Anyway?

4,079,442,984,960 bytes of data since 1/1/2014. That’s 25 GB per day!

Sandy’s Digital Footprint

EMC Digital Footprint Calculator

Hartford Union HS Digital Footprint Calculator

• Any amount of data that is overwhelming

• Any data whose contents, source or purpose are unknown

– Cannot answer why or whether you should have it

• Data that can harm the organization (liability)

• Data that can help the organization (asset)

Does your organization know exactly what your data says?

Why talk about this now?

• Big Data is the “talk of the town”

• Clients, investors, media, employees, management, government

• Increasing data breach events keep the conversation alive

• People now expect that organizations are routinely collecting and mining data on their behavior, purchases, searches, posts, etc.

• They are starting to demand ethical, compliant & competent management of that data

• Costs to perform large-scale, complex analytics & hosting have come down enough to be financially viable for most organizations

608,087,870 – total number of records containing sensitive personal information involved in security breaches in the United States since January 2005

Source: Privacy Rights Clearinghouse, June 2013

Facebook Tinkers With Users’ Emotions in News Feed Experiment (6/24/2014)

Google removes search results in wake of EU privacy ruling (6/26/2014)

Enterprise Email Management is a very good Case In Point

Universal Issue

Involves several key IG problems:

• Storage/hosting

• Content analysis & classification

• Context – correspondence, notification & record, date/time/file signatures, transmission & attachments, custodianship, etc.

• Administration, management & maintenance

Elements of Backfile and Day Forward records management

ESI is generally easier & lower cost to tackle than paper files

Because of Context, EEM is a hot button issue with real budgets available

• Investor & media attention

• Customer concerns

• Risk & compliance danger zone

How a computer classifies an email with data mining (analytics)

[Figure: sample email annotated with callouts: Implied matter: Passaro (34-6788); Author; Author validation & contact info; Matter indicator & validation; Doc type & implied attachment range]

Additional Info Analytics Determines

• What DocType is this?

– An email with an attachment

• Who created this? Who is the author?

– Stuart Trumbull, Partner at DCH

• Who is receiving this? Why?

– Roberta Halstrom, paralegal

– Work instruction/direction

• What is the Author-Recipient relationship?

– Supervisor-subordinate

• What are important words, patterns & concepts?

– “please file”

– “Motion in Limine”

– “Passaro matter”

– “34-6788”

• How is attachment related?

– Author match

– Passaro match

– Key Motion content

• What else is known about this party?

– Wrote 14 emails that day

– 94% of “Passaro” mentions include him as auth/recip/cc

– 7 instances as Pleading Author w/ Passaro matter(s)

– Halstrom & assistant/associate on 48% of Trumbull+Passaro content

• What other context can be inferred?

– Tuesday = 5/13/14

– 15 date-correlated instances of 5/13/14 with Passaro docket

– Tone is neutral-friendly, professionally appropriate

• What presents better visually?

– Topics over time

– Relationship between Trumbull & others

– Passaro matter against other matters
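The questions above can be sketched in code. The following is a minimal illustration only, not Valora's method: the email wording, the send date, the matter-number pattern and the phrase list are all assumptions made up for this example. It pulls out the doc type, the parties, the matter number and key phrases, and resolves a weekday mention to a calendar date.

```python
import re
from datetime import date, timedelta

# Hypothetical email modeled on the slide's example; the wording is invented.
EMAIL = {
    "from": "Stuart Trumbull",
    "to": "Roberta Halstrom",
    "sent": date(2014, 5, 9),  # assumed send date
    "body": ("Roberta, please file the attached Motion in Limine for the "
             "Passaro matter (34-6788) by Tuesday."),
    "attachments": ["motion_in_limine.doc"],
}

# Assumed conventions, not taken from any real system.
MATTER_NUMBER = re.compile(r"\b\d{2}-\d{4}\b")
KEY_PHRASES = ["please file", "motion in limine", "passaro matter"]
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def next_weekday(sent, name):
    """Return the first date after `sent` that falls on weekday `name`."""
    target = WEEKDAYS.index(name.lower())
    days_ahead = (target - sent.weekday() - 1) % 7 + 1
    return sent + timedelta(days=days_ahead)


def extract_features(email):
    """Collect simple content and context features from one email."""
    body = email["body"]
    lowered = body.lower()
    mentioned = [d for d in WEEKDAYS if re.search(rf"\b{d}\b", lowered)]
    return {
        "doc_type": "email with attachment" if email["attachments"] else "email",
        "author": email["from"],
        "recipient": email["to"],
        "matter_numbers": MATTER_NUMBER.findall(body),
        "key_phrases": [p for p in KEY_PHRASES if p in lowered],
        "resolved_dates": {d: next_weekday(email["sent"], d) for d in mentioned},
    }


features = extract_features(EMAIL)
for name, value in features.items():
    print(f"{name}: {value}")
```

A real system would add the cross-document statistics from the slide (emails per day, share of matter mentions, co-occurrence with other parties), which need a whole corpus rather than one message.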

Drawbacks to Classification without Context (Classification alone)

• Treats all document contents the same

• Misses the notion of document context (who, what, where, when and how)

• Email particularly has important attributes as a communication mechanism

• Makes retain/delete decisions on content only

• Oversimplifies inherent or explicit decisioning hierarchies

• Duplicative content vs. “best” content

• Assumes all content instances are equal

• Assumes Backfile/Batch methodology only

• Weak solution for Day Forward document creation or intake

• Does not inform creation or approval of email content

• Focused on cleanup, rather than asset management

• Assumes no future value of past email or content, ignores business value

• Does not prioritize results for future use

• Relegates to IT tool, rather than IG strategy

• Unable to adapt to ongoing maintenance or contextual changes

• Assumes technology is independent of policy creation & enforcement

• Ignores evolving capabilities to define policy and ensure compliance

• Ignores PII, PHI & other content sensitivities

• Assumes all content instances are equal

• Exists outside of data visualization

• Missed opportunity for information presentation, forward-use asset management

Predictive Analytics: Email Management Magic? Presented by Avi Elkoni - Consultant, Equivio

Use of predictive coding

Survey question: Is your company using or exploring “predictive coding” or other technologies for preservation, collection or production of ESI? (Yes / No)

76% reported using/exploring predictive coding or other technologies

Comments indicate actual usage is lower

• “We are in early experiments”

• “Very preliminary exploration”

• “We continue to experiment with predictive coding”

The Magic of Predictive coding

• AKA supervised classification or technology-assisted review (TAR)

• Trainable software

• Form of machine learning

• Widely used in industry/academia since 1970s

• Now well established in e-discovery

• Used for ECA, culling, prioritized review, QA

• Court approved

• Useful for sorting through volumes of documents including emails!
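To make "trainable software" concrete, here is a minimal sketch of supervised classification using a small naive Bayes text classifier. It is a generic textbook technique chosen for illustration, not the algorithm of any product named in this deck, and the training emails and the "retain"/"junk" labels are invented.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real systems tokenize more carefully."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, labeled_docs):
        for text, label in labeled_docs:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical reviewer-labeled examples (the "training" step).
TRAINING = [
    ("please file the attached motion for the client matter", "retain"),
    ("signed contract attached for your records", "retain"),
    ("lunch menu for the cafeteria this week", "junk"),
    ("free webinar register now for a discount", "junk"),
]

model = NaiveBayes()
model.train(TRAINING)
print(model.predict("please file the signed motion"))
print(model.predict("cafeteria lunch discount this week"))
```

In practice the training set is far larger, and predictions are checked against a reviewed sample before anyone acts on them.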

Predictive coding for IG - scenarios

Disposition scenarios

• Email (and other Records) retention

• Enables implementation of retention schedules

• Efficient / controlled / consistent

• Replaces “Trusted custodian” or “Do nothing”

• Legacy data remediation

• System migration

• Data hygiene

Predictive coding for IG - scenarios

Detection scenarios

• Pre-litigation

• E-discovery

• Investigations: regulatory / internal

• “Early warning” systems

Predictive coding for IG - challenges

• Handling low richness

• Training multiple categories

• Quantification of risk

• Federated architecture for centralized IG management

• Add-on approach to optimize use of legacy archiving and RIM systems

Defensibility

Sedona guidelines say: The requirement is for reasonableness, not perfection.

Corporate management says: “We have to quantify the imperfection.”

Defensibility

• Transparency: standard / repeatable / auditable

• Validation: standard step in the process

• Quantification

• “Right” statistics for IG environment

• ROI

• Risk retention trade-off: risk of under-retention vs. cost & risk of over-retention
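One common way to "quantify the imperfection" is to review a random sample of the system's decisions and report the observed error rate with a confidence interval. The sketch below uses the Wilson score interval; both the choice of statistic and the sample figures are assumptions for illustration, not something the slides specify.

```python
import math


def wilson_interval(errors, sample_size, z=1.96):
    """Wilson score interval for an error rate (z=1.96 is roughly 95% confidence)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = errors / sample_size
    denom = 1 + z ** 2 / sample_size
    center = (p + z ** 2 / (2 * sample_size)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / sample_size + z ** 2 / (4 * sample_size ** 2)
    ) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical validation sample: reviewers check 400 emails the system marked
# for deletion and find 8 that should have been retained.
low, high = wilson_interval(errors=8, sample_size=400)
print(f"observed under-retention rate: {8 / 400:.1%}")
print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

The upper bound of the interval is the figure to weigh against the cost and risk of over-retention; a larger sample narrows the interval.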

Case study


Drinker POC

Client: International bank

Categories: 2 retention, 3 junk

Days invested: 3 days

Defensible deletion: 6%

Projected deletion: 45%

Retention success rate: 95%

Neil Etheridge | VP, Product Management

EMAIL MANAGEMENT

A HISTORY OF INNOVATION

[Timeline, 2002 to 2014: CONCEPT SEARCH, CATEGORIZATION, PHRASE EXTRACTION, SMART FILTERS, DYNAMIC JOINS, ENTITY EXTRACTION, FOOTER DETECTION, LANGUAGE DETECTION, PREDICTIVE CODING, BEST MATCHES, HYPERGRAPH, EASY UPLOAD]

USE ANALYTICS AND MACHINE LEARNING TO BETTER CONNECT PEOPLE AND INFORMATION

INTELLIGENT EMAIL FILING – APPROACHES

SUGGESTED FILING

o Great for matter-centric filing

o Filing location suggested to the user

o No training = minimal set up time

AUTOMATIC FILING

o Great for topic-centric filing

o 100% adoption - no end users

o System is “trained” then deployed, sampling required

CASE STUDY - U.S. DEPARTMENT OF ENERGY

Category Driven Filing Needed

Previous efforts with manual filing had been unsuccessful for applying categories

Automated Process Implemented

Recommind predictive analytics categorize emails with no burden on end users

Higher Accuracy Attained

Recommind exceeded 70% project accuracy target with high 80’s to low 90’s across all categories

CASE STUDY – DAVIES WARD

Client-Matter Classification Needed

The firm needed a way to assign C/M’s to all emails, beyond what DMS could offer

Decisiv Email Implemented

Decisiv suggests likely C/M’s, with user confirmation or editing

High Accuracy Maintained

The combination of Recommind’s analytics and user intervention yields strong results

Natural Language Processing for Email Management Presented by Mark Olson


Presenter:

Mark R. Olson, Director, North American Watson Analytics, DataSkill, Inc.

Agenda

• Business Drivers for email management and many solutions

• Email Classification as the key activity and Dataskill’s application for Email Management

• About Dataskill

Business Drivers: ‘First to scream’ principle

Summary of typical client conversations about email management

All drivers for implementing an email management solution are ultimately financial, but they can still be classified depending on “who screams first”:

• Technical. IT complains that it is running out of storage and wants to buy more and more disk. The email system slows down due to the large volume of email data.

• Business. Line-of-business personnel cannot find or access relevant emails easily. They have to rely on personal knowledge, which leads to continuity issues when staff leave.

• Compliance. General Counsel requires legal hold management and defensible disposal capabilities to protect the organization and help make responding to legal enquiries more efficient and cost-effective.

Dataskill’s Natural Language Approach: Address the problem, not the symptom

The real problem is that the information required to classify emails is contained within the body of the email, in natural language.

“Mark and Denise are going to Paul and Sue’s for dinner.

Mark and Denise are going to Sue Paul over dinner.”

Normal classification and ILG solutions cannot understand the difference between these two sentences.

Dataskill Acumi for Email can.
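A short sketch shows why keyword-based classification struggles with this pair. It compares the two sentences as bags of words using Jaccard similarity; the tokenizer and the choice of similarity measure are assumptions for illustration and say nothing about how Acumi works internally.

```python
import re

SENTENCE_A = "Mark and Denise are going to Paul and Sue's for dinner."
SENTENCE_B = "Mark and Denise are going to Sue Paul over dinner."


def bag_of_words(text):
    """Lowercase, drop possessive 's, and return the set of word tokens."""
    cleaned = text.lower().replace("'s", "")
    return set(re.findall(r"[a-z]+", cleaned))


def jaccard(a, b):
    """Share of distinct words the two sets have in common."""
    return len(a & b) / len(a | b)


words_a = bag_of_words(SENTENCE_A)
words_b = bag_of_words(SENTENCE_B)
similarity = jaccard(words_a, words_b)

print(f"shared words: {sorted(words_a & words_b)}")
print(f"only in A: {sorted(words_a - words_b)}")
print(f"only in B: {sorted(words_b - words_a)}")
print(f"Jaccard similarity: {similarity:.2f}")
```

The two sentences share nine of eleven distinct words, so a keyword view sees them as near duplicates. Telling a dinner invitation from a lawsuit depends on recognizing "Sue" as a name in one sentence and a verb in the other, which requires grammatical analysis.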

ACUMI for Email Management

Because Acumi uses natural language processing, language dictionaries, and legal industry annotators, it can do a better job of classifying mail and documents.

And if that is not good enough, you can train it to be better.

What it does REALLY WELL

We’ll now open it up for questions

Questions

Thank You