Text Analytics World: Future Directions of Text Analytics

Tom Reamy, Chief Knowledge Architect
KAPS Group (Knowledge Architecture Professional Services)
http://www.kapsgroup.com


Page 1: Text Analytics World  Future Directions of Text Analytics

Text Analytics World Future Directions of Text Analytics

Tom Reamy, Chief Knowledge Architect

KAPS Group, Knowledge Architecture Professional Services

http://www.kapsgroup.com

Page 2: Text Analytics World  Future Directions of Text Analytics

Agenda
Introduction
  – Current State of Text Analytics
  – Survey
Roadblocks for Text Analytics
  – Complexity and Customization
Fast and Slow (Thinking) Text Analytics
  – Building Text Analytics Brains
New Methods for Text Analytics
  – Lessons from Watson
  – Some Wild New Ideas and Approaches
Questions

Page 3: Text Analytics World  Future Directions of Text Analytics

Introduction: KAPS Group
Knowledge Architecture Professional Services – Network of Consultants
Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies
Services:
  – Strategy – IM & KM – Text Analytics, Social Media, Integration
  – Taxonomy/Text Analytics development, consulting, customization
  – Text Analytics Quick Start – Audit, Evaluation, Pilot
  – Social Media: Text-based applications – design & development
Partners – SAS, Smart Logic, Expert System, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics
Projects – Portals, taxonomy, text analytics – news, expertise location, information strategy, text analytics evaluation, Quick Start in Text A.
Clients: Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, etc.
Presentations, Articles, White Papers – www.kapsgroup.com

Page 4: Text Analytics World  Future Directions of Text Analytics

Introduction: What is Text Analytics?
Text Mining – NLP, statistical, predictive, machine learning
Semantic Technology – ontology, fact extraction
Extraction – entities – known and unknown, concepts, events
  – Catalogs with variants, rule based
Sentiment Analysis
  – Objects and phrases – statistics & rules – Positive and Negative
Auto-categorization
  – Training sets, Terms, Semantic Networks
  – Rules: Boolean – AND, OR, NOT
  – Advanced – DIST(#), ORDDIST#, PARAGRAPH, SENTENCE
  – Disambiguation – Identification of objects, events, context
  – Build rules based, not simply Bag of Individual Words
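The Boolean and proximity operators listed above can be illustrated with a minimal sketch. The exact semantics of DIST and ORDDIST differ between vendors, so the interpretation here is an assumption (DIST = two terms within N tokens in any order, ORDDIST = the same but in order), and the rule, terms, and function names are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def positions(tokens, term):
    """Return every index at which `term` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == term]

def dist(tokens, a, b, max_gap):
    """Assumed DIST: `a` and `b` occur within `max_gap` tokens, in any order."""
    return any(abs(i - j) <= max_gap
               for i in positions(tokens, a)
               for j in positions(tokens, b))

def ord_dist(tokens, a, b, max_gap):
    """Assumed ORDDIST: `a` is followed by `b` within `max_gap` tokens."""
    return any(0 < j - i <= max_gap
               for i in positions(tokens, a)
               for j in positions(tokens, b))

def is_billing_complaint(text):
    """Hypothetical rule:
    (DIST(bill, wrong, 5) OR ORDDIST(over, charged, 1)) AND NOT refunded
    """
    tokens = tokenize(text)
    hit = dist(tokens, "bill", "wrong", 5) or ord_dist(tokens, "over", "charged", 1)
    return hit and "refunded" not in tokens
```

Unlike a bag of individual words, the rule fires only when the terms stand in the required relationship to each other, which is the point of the last bullet on this slide.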

Page 5: Text Analytics World  Future Directions of Text Analytics

Text Analytics World: Current State of Text Analytics
History – academic research, focus on NLP
Inxight – out of Xerox PARC
  – Moved TA from academic and NLP to auto-categorization, entity extraction, and search metadata
Explosion of companies – many based on Inxight extraction with some analytical-visualization front ends
  – Half from 2008 are gone – lucky ones got bought
Early applications – News aggregation and Enterprise Search – Second Wave = shift to sentiment analysis
Enterprise search down, taxonomy up – need for metadata – not great results from either – 10 years of effort for what?
Text Analytics is growing – But

Page 6: Text Analytics World  Future Directions of Text Analytics

Text Analytics World: Current State of Text Analytics
Current Market: 2012 – exceed $1 Bil for text analytics (10% of total Analytics)
Growing 20% a year
Search is 33% of total market
Other major areas:
  – Sentiment and Social Media Analysis, Customer Intelligence
  – Business Intelligence, Range of text based applications
Fragmented market place – full platform, low level, specialty
  – Embedded in content management, search
No clear leader.

Page 7: Text Analytics World  Future Directions of Text Analytics

Text Analytics World: Current State of Text Analytics: Vendor Space
Taxonomy Management – SchemaLogic, Pool Party
From Taxonomy to Text Analytics
  – Data Harmony, Multi-Tes
Extraction and Analytics
  – Linguamatics (Pharma), Temis, whole range of companies
Business Intelligence – Clear Forest, Inxight
Sentiment Analysis – Attensity, Lexalytics, Clarabridge
Open Source – GATE
Stand alone text analytics platforms – IBM, SAS, SAP, Smart Logic, Expert System, Basis, Open Text, Megaputer, Temis, Concept Searching
Embedded in Content Management, Search
  – Autonomy, FAST, Endeca, Exalead, etc.

Page 8: Text Analytics World  Future Directions of Text Analytics

Future Directions: Survey Results
28% just getting started, 11% not yet
What factors are holding back adoption of TA?
  – Lack of clarity about value of TA – 23.4%
  – Lack of knowledge about TA – 17.0%
  – Lack of senior management buy-in – 8.5%
  – Don’t believe TA has enough business value – 6.4%
Other factors
  – Financial Constraints – 14.9%
  – Other priorities more important – 12.8%
Lack of articulated strategic vision – by vendors, consultants, advocates, etc.

Page 9: Text Analytics World  Future Directions of Text Analytics

Text Analytics World: Primary Obstacle: Complexity
Usability of software is one element
More important is difficulty of models:
  – Conceptual and document models
General need – more structure but also more flexible kinds of structure and interactions
More modules and more ways of combining or interacting – IBM – select best answer but others
  – Competitive – learn and evolve – Feedback!
  – Cooperative – join together to form higher level structures

Page 10: Text Analytics World  Future Directions of Text Analytics

Text Analytics World: Primary Obstacle: Complexity: Partial Solutions
Build complex semantic networks – basic concepts – good for demo, gets a start, but very complex to build on
Library of taxonomies – but all need major customization and often are not a good starting point – different types of taxonomies – index vs. categorization
Customization – Text Analytics – heavily context dependent
  – Content, Questions, Taxonomy-Ontology
  – Level of specificity – Telecommunications
  – Specialized vocabularies, acronyms
  – Specialized relationships – conceptual and organizational
  – How to overcome?

Page 11: Text Analytics World  Future Directions of Text Analytics

Text Analytics World: Thinking Fast and Slow – Daniel Kahneman
System 1 and System 2 – Daniel Kahneman
System 1 – fast and automatic – little conscious control
Represents categories as prototypes – stereotypes
  – Norms for immediate detection of anomalies – distinguish the surprising from the normal
  – Fast detection of simple differences, detect hostility in a voice, find best chess move (if a master)
  – Priming / Anchoring – susceptible to systematic errors
    • Temperature Example
  – Biased to believe and confirm
  – Focuses on existing evidence (ignores missing – WYSIATI)

Page 12: Text Analytics World  Future Directions of Text Analytics

Text Analytics World: Thinking Fast and Slow
System 2 – Complex, effortful judgments and calculations
  – System 2 is the only one that can follow rules, compare objects on several attributes, and make deliberate choices
  – Understand complex sentences
  – Check the validity of a complex logical argument
  – Focus attention – can make people blind to all else – Invisible Gorilla
Similar to traditional dichotomies – Tacit – Explicit, etc.
Basic Design – System 1 is basic to most experiences, and System 2 takes over when things get difficult – conscious control
Text Analysis and Text Mining / Auto-Cat and TA Cat

Page 13: Text Analytics World  Future Directions of Text Analytics

Text Analytics World: System 1 & 2 – and Text Analytics Approaches
“Automatic Categorization” – System 1 prototypes
  – Limited value – only works in simple environments
  – Shallow categories with large differences
  – Not open to conscious control
System 2 – categories – complex, minute differences, deep categories
Together:
  – Choose one or other for some contexts
  – Combine both – need to develop new kinds of categories and/or new ways to combine?

Page 14: Text Analytics World  Future Directions of Text Analytics

Text Analytics World: Text Mining and Text Analytics
Text Analytics and Big Data enrich each other
  – Data tells you what people did, TA tells you why
Text Analytics – pre-processing for TM
  – Discover additional structure in unstructured text
  – Behavior Prediction – adding depth in individual documents
  – New variables for Predictive Analytics, Social Media Analytics
  – New dimensions – 90% of information, 50% using Twitter analysis
Text Mining for TA
  – Semi-automated taxonomy development
  – Apply data methods, predictive analytics to unstructured text
  – New Models – Watson ensemble methods, reasoning apps
Extraction – smarter extraction – sections of documents, Boolean, advanced rules – drug names, adverse events – major mention

Page 15: Text Analytics World  Future Directions of Text Analytics

Text Analytics World: Integration of Text and Data Analytics
Expertise Location: Case Study: Data and Text
Data Sources:
  – HR Information: Geography, Title-Grade, years of experience, education, projects worked on, hours logged, etc.
Text Sources:
  – Documents authored (major and minor authors) – data and/or text
  – Documents associated (teams, themes) – categorized to a taxonomy
  – Experience description – extract concepts, entities
Self-reported expertise – requires normalization, quality control
Complex judgments:
  – Faceted application
  – Ensemble methods – combine evaluations
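One way to picture the combination of data and text sources in this case study is a faceted filter followed by a weighted ensemble of evaluations. The sketch below is only an illustration under assumed inputs: the field names, the three evaluators, the caps (20 years, 10 documents), and the weights are all hypothetical and not taken from the actual project.

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    geography: str          # structured HR field
    years_experience: int   # structured HR field
    authored_topics: dict   # topic -> documents authored, from text categorization
    self_reported: set      # self-reported expertise terms, already normalized

# Each evaluator returns a value between 0 and 1 (assumed scales).
def experience_score(person, topic):
    return min(person.years_experience, 20) / 20

def authorship_score(person, topic):
    return min(person.authored_topics.get(topic, 0), 10) / 10

def self_report_score(person, topic):
    return 1.0 if topic in person.self_reported else 0.0

EVALUATORS = (experience_score, authorship_score, self_report_score)

def find_experts(people, topic, geography=None, weights=(0.2, 0.6, 0.2)):
    """Filter on the geography facet, then rank by the weighted ensemble score."""
    pool = [p for p in people if geography is None or p.geography == geography]
    scored = [(sum(w * f(p, topic) for w, f in zip(weights, EVALUATORS)), p.name)
              for p in pool]
    return sorted(scored, reverse=True)
```

Giving authored documents the largest weight reflects the slide's caution that self-reported expertise needs normalization and quality control; a real system would tune or learn these weights.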

Page 16: Text Analytics World  Future Directions of Text Analytics

Text Analytics World: Building on the Platform: Expertise Analysis
Expertise Characterization for individuals, communities, documents, and sets of documents
Experts prefer lower, subordinate levels
  – Novice & General – high and basic level
Experts’ language structure is different
  – Focus on procedures over content
Applications:
  – Business & Customer intelligence – add expertise to sentiment
  – Deeper research into communities, customers
  – Expertise location – generate automatic expertise characterization based on documents

Page 17: Text Analytics World  Future Directions of Text Analytics

Text Analytics World: New Approaches – Applied Watson
Key concept is that multiple approaches are required – and a way to combine them – confidence score
Aim = 85% accuracy of 50% of questions (Ken Jennings – 92% of 62%)
Used a combination of structure and text search
Massive parallelism, many experts, pervasive confidence estimation, integration of shallow and deep knowledge
Key step – fast filtering to get to top 100 (System 1)
Then – intense analysis to evaluate (System 2) – multiple scoring
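The two-step pattern described here (a cheap filter down to a shortlist, then several scorers combined into a confidence score) can be sketched as follows. This is not Watson's implementation: the word-overlap filter, the two toy scorers, the candidate fields (`text`, `answer`, `reliability`), and the threshold are assumptions chosen to keep the example small.

```python
def overlap(question, text):
    """Fraction of the question's words that also appear in `text` (0..1)."""
    q_words = set(question.lower().split())
    if not q_words:
        return 0.0
    return len(q_words & set(text.lower().split())) / len(q_words)

def fast_filter(question, candidates, k=100):
    """System 1 step: rank by the cheap overlap score and keep the top k."""
    ranked = sorted(candidates, key=lambda c: overlap(question, c["text"]), reverse=True)
    return ranked[:k]

# Two toy scorers (hypothetical); each returns a value between 0 and 1.
def passage_support(question, cand):
    return overlap(question, cand["text"])

def source_reliability(question, cand):
    return cand["reliability"]

def confidence(question, cand, scorers, weights):
    """System 2 step: weighted average of several independent scorers."""
    total = sum(w * s(question, cand) for s, w in zip(scorers, weights))
    return total / sum(weights)

def answer(question, candidates, scorers, weights, k=100, threshold=0.5):
    """Return (answer, confidence); the answer is None below the threshold."""
    shortlist = fast_filter(question, candidates, k)
    if not shortlist:
        return None, 0.0
    best = max(shortlist, key=lambda c: confidence(question, c, scorers, weights))
    conf = confidence(question, best, scorers, weights)
    return (best["answer"] if conf >= threshold else None), conf
```

Returning no answer when confidence is low is what makes a target such as "85% accuracy of 50% of questions" meaningful: the system chooses which questions to attempt.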

Page 18: Text Analytics World  Future Directions of Text Analytics

Text Analytics World: New Approaches – Applied Watson
Multiple sources – taxonomies, ontologies, etc.
Special modules – temporal and spatial reasoning – anomalies
Taxonomic, Geospatial, Temporal, Source Reliability, Gender, Name Consistency, Relational, Passage Support, Theory Consistency, etc.
Merge answer scores before ranking
3 Years, 20 researchers of all types
Got to 70% of 70% – in two hours
More difficult answers / more complete questions
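The "X% of Y%" figures quoted for Watson and Ken Jennings can be put on one scale with a little arithmetic. The sketch assumes they mean X% accuracy on the Y% of questions actually attempted, which is an interpretation rather than something the slides state; the labels are added here for readability.

```python
def correct_per_100(accuracy, coverage):
    """Expected correct answers per 100 questions when a system attempts
    `coverage` of the questions and gets `accuracy` of those right."""
    return 100 * accuracy * coverage

figures = {
    "Aim (85% of 50%)": (0.85, 0.50),
    "Ken Jennings (92% of 62%)": (0.92, 0.62),
    "Got to (70% of 70%)": (0.70, 0.70),
}

for label, (accuracy, coverage) in figures.items():
    print(f"{label}: {correct_per_100(accuracy, coverage):.1f} correct per 100 questions")
```

Under that reading the three figures work out to about 42.5, 57.0, and 49.0 correct answers per 100 questions respectively.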

Page 19: Text Analytics World  Future Directions of Text Analytics

Text Analytics World: New Approaches: Adding Structure to Content
Contexts – whole range of types of context
  – Document types-purpose, Textual complexity, formats
Categorization by page, sections (text markers) or even sentence or phrase
  – Key – remember what the last page was
  – [Key – documents are not unstructured – they have a variety of structures]
Use generic components – like the level of generality of terms or concepts (general and context specific)
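Categorizing by section rather than by whole document can be sketched in a few lines. The example assumes that section headings appear on their own line and act as the text markers; the marker names, keyword lists, and function names are hypothetical.

```python
import re

def split_sections(text, markers):
    """Split text into sections, starting a new one at each marker line."""
    sections = {}
    current = "preamble"  # text before the first marker
    for line in text.splitlines():
        heading = line.strip().rstrip(":").lower()
        if heading in markers:
            current = heading
            sections.setdefault(current, [])
        elif line.strip():
            sections.setdefault(current, []).append(line.strip())
    return {name: " ".join(lines) for name, lines in sections.items()}

def categorize(text, keywords):
    """Return the categories whose keyword set shares a word with the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(cat for cat, terms in keywords.items() if words & terms)

def categorize_by_section(text, markers, keywords):
    """Categorize each section separately, so context is kept per section."""
    return {name: categorize(body, keywords)
            for name, body in split_sections(text, markers).items()}
```

With markers `{"methods", "adverse events"}`, a drug name found under "Adverse Events" can be treated differently from the same name under "Methods", which a single document-level category would not capture.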

Page 20: Text Analytics World  Future Directions of Text Analytics

Text Analytics World: New Approaches
Idea – build a higher level language – like tutoring systems
  – More complex primitives
Idea – Crowd sourcing – to evolve better structures – how design to avoid design by committee – other side of wisdom of crowds
Design TA Game – 1,000’s to play and evolve
Partner with MOOC – example – better essay evaluation – avoid gaming the system – lots of multi-syllabic words – nonsense
  – Also to enhance software / modules

Page 21: Text Analytics World  Future Directions of Text Analytics

New Directions in Text Analytics: Conclusions
Text Analytics is growing – but
Big obstacles remain
  – Strategic Vision of text analytics in the enterprise, applications
  – Concrete and quick application to drive acceptance
  – Software still too complex, un-integrated
New models are being developed
Cognitive science – System 1 and 2, AI – brains that learn
Watson-like integrated approaches
Overcome complexity – modules (System 1 / Standard) with new ways of integrating (System 2 / Customized) – smarter and easier

Page 22: Text Analytics World  Future Directions of Text Analytics

Questions?
Tom Reamy
[email protected]
KAPS Group
http://www.kapsgroup.com
Upcoming:
  – Taxonomy Boot Camp – KMWorld – DC, Nov 3-6
  – Workshop on Text Analytics
  – Text Analytics World – San Francisco, March 17-19

Page 23: Text Analytics World  Future Directions of Text Analytics

Future Directions for Text Analytics: Social Media: Beyond Simple Sentiment
Analysis of Conversations – Higher level context
  – Techniques: self-revelation, humor, sharing of secrets, establishment of informal agreements, private language
  – Detect relationships among speakers and changes over time
  – Strength of social ties, informal hierarchies
Combination with other techniques
  – Expertise Analysis – plus Influencers
  – Quality of communication (strength of social ties, extent of private language, amount and nature of epistemic emotions – confusion+)
  – Experiments – Pronoun Analysis – personality types
  – Analysis of phrases, multiple contexts – conditionals, oblique

Page 24: Text Analytics World  Future Directions of Text Analytics

Introduction: Personal Deep Background
History of Ideas – dissertation – Models of Historical Knowledge
Artificial Intelligence research at Stanford AI Lab
Programming – designed two computer games, educational software
Started an Education Software company, CTO
  – Height of California recession
Information Architect – Chiron/Novartis, Schwab Intranet
  – Importance of metadata, taxonomy, search – Verity
From technology to semantics, usability
From library science to cognitive science
2002 – started consulting company