Text Analytics Past, Present & Future: An Industry View
Seth Grimes, Alta Plana Corporation, @sethgrimes
June 5, 2014
Keynote presentation to JADT.org, June 5, 2014

Text Analytics: An Industry View, JADT, June 5, 2014

Analytics is the systematic application of algorithmic methods that derive and deliver information, typically expressed quantitatively, whether in the form of indicators, tables, visualizations, or models.
Systematic means formal & repeatable.
Algorithmic contrasts with heuristic.

Text analytics past: Pioneers

Document input and processing
Knowledge handling is key
Desk Set (1957): Computer engineer Richard Sumner (Spencer Tracy) and television network librarian Bunny Watson (Katharine Hepburn) and the "electronic brain" EMERAC.

Hans Peter Luhn, A Business Intelligence System, IBM Journal, October 1958

Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the auto-abstract.
H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.

Pipelines and patterns

IBM's MedTAKMI, 1997-
http://www.research.ibm.com/trl/projects/textmining/index_e.htm

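
The Luhn passage quoted earlier describes a two-stage procedure: score words by frequency, then score sentences and extract the best ones. A minimal sketch of that idea follows. It is an illustration under stated assumptions, not Luhn's exact significance measure: the stop-word list, the naive sentence splitter, and the mean-frequency sentence score are all simplifications chosen here, and the function names are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not Luhn's common-word list).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "for", "on", "are", "that", "by"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def auto_abstract(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    # Word significance: raw frequency across the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    # Sentence significance: mean frequency of the sentence's content words
    # (a simplification of the measure described in the quoted passage).
    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    sample = (
        "Text analytics extracts information from text. "
        "The weather was pleasant. "
        "Analytics of text supports business analytics."
    )
    for line in auto_abstract(sample, n_sentences=2):
        print(line)
```

Averaging word frequencies keeps long sentences from winning on length alone; a closer reconstruction would score clusters of significant words within each sentence.
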
Exhaustive extraction

An (old) Attensity example: NLP to identify roles and relationships, for a law-enforcement application.

Language engineering

GATE: General Architecture for Text Engineering. http://gate.ac.uk/

Text analytics present: Business, technology, applications, and solutions

Organizations embracing text analytics all report having an epiphany moment when they suddenly knew more than before.
-- Philip Russom, the Data Warehousing Institute, 2007
http://tdwi.org/articles/2007/05/09-what-works/bi-search-and-text-analytics.aspx

Linguistics, statistics, and semantics

Text analytics (typically) involves linguistic modelling, statistical characterization, learned patterns, and semantic understanding of text-derived features:
Named entities: people, companies, places, etc.
Pattern-based features: e-mail addresses, phone numbers, etc.
Concepts: abstractions of entities.
Facts and relationships.
Events.
Concrete and abstract attributes (e.g., expensive & comfortable) including measure-value pairs.
Subjectivity in the forms of opinions, sentiments, and emotions: attitudinal data.
These are applied to business ends.

Sources

It's a truism that 80% of enterprise-relevant information originates in unstructured form:
E-mail and messages.
Web pages, online news & blogs, forum postings, and other social media.
Contact-center notes and transcripts.
Surveys, feedback forms, warranty claims.
Scientific literature, books, legal documents.
...
Non-text unstructured content?
Images
Audio including speech
Video
Value derives from patterns.

Value

What do we do with text, whether online, on-social, or in the enterprise?
1. Post/Publish, Manage, and Archive.
2. Index and Search.
3. Categorize and Classify according to metadata & contents.
4. Extract information and Analyze.

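
As a small illustration of the extraction step, the pattern-based features named among the text-derived features above (e-mail addresses, phone numbers) can be pulled out with regular expressions. The sketch below is hedged accordingly: the two patterns are deliberately loose assumptions made for this example (the phone pattern only covers North American style numbers), and the function name is hypothetical.

```python
import re

# Loose illustrative patterns (assumptions for this sketch, not validation-grade).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches forms such as 555-123-4567 or (555) 123-4567.
PHONE_RE = re.compile(r"(?:\(\d{3}\)\s*|\b\d{3}[-.\s])\d{3}[-.\s]\d{4}\b")


def extract_pattern_features(text):
    """Return the e-mail addresses and phone numbers found in text, in order."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


if __name__ == "__main__":
    note = "Contact jane.doe@example.com or call (555) 123-4567 or 555-987-6543."
    print(extract_pattern_features(note))
```

Named entities, concepts, and sentiment generally need linguistic or statistical models rather than fixed patterns, which is why the feature list treats pattern-based features as their own class.
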
Semantics, analytics, and IR

Text analytics generates semantics to bridge search, BI, and applications, enabling next-generation information systems.
Search
BI/Big Data
Applications
Search based applications (search + text + apps)
Information access (search + analytics)
Synthesis (text + BI)/(big data)
Text analytics (inner circle)
Semantic search (search + text)
NextGen CRM, EFM, MR, marketing, apps

Content, composites, connections 1

Content, composites, connections 2

Applications

Text analytics has applications in:
Intelligence & law enforcement.
Life sciences & clinical medicine.
Media & publishing including social-media analysis and contextual advertising.
Competitive intelligence.
Voice of the Customer: CRM, product management & marketing.
Public administration & policy.
Legal, tax & regulatory (LTR) including compliance.
Recruiting.

Opinion, sentiment & emotion

Sentiment analysis

A specialization, of relevance to:
Brand/reputation management.
Customer experience management (CEM).
Competitive intelligence.
Survey analysis (EFM = Enterprise Feedback Management).
Market research.
Product design/quality.
Trend spotting.

Data exploration via dashboards and workbenches.

Text analytics present: The market

http://altaplana.com/TA2014

What are your primary applications where text comes into play?
Military/national security/intelligence: 5%
Law enforcement: 6%
Intellectual property/patent analysis: 8%
Financial services/capital markets: 9%
Product/service design, quality assurance, or warranty claims: 10%
Other: 11%
Insurance, risk management, or fraud: 13%
E-discovery: 14%
Life sciences or clinical medicine: 15%
Online commerce including shopping, price intelligence, ...: 16%
Content management or publishing: 25%
Customer /CRM: 27%
Search, information access, or Question Answering: 29%
Competitive intelligence: 33%
Brand/product/reputation management: 38%
Research (not listed): 38%
Voice of the Customer / Customer Experience Management: 39%

Voice of the Customer

Text analytics is applied to improve customer service and boost satisfaction and loyalty.
Analyze customer interactions and opinions:
E-mail, contact-center notes, survey responses.
Forum & blog posting and other social media.
in order to:
Address customer product & service issues.
Improve quality.
Manage brand & reputation.
Assessment of qualitative information from text helps users:
Gain feedback on interactions.
Assess customer value.
Understand root causes.
Mine data for measures such as churn likelihood.

The commercial scene

Online commerce

Text analytics is applied for marketing, search optimization, competitive intelligence.
Analyze social media and enterprise feedback to understand the Voice of the Market:
Opportunities
Threats
Trends
Categorize product and service offerings for on-site search and faceted navigation and to enrich content delivery.
Annotate pages to enhance Web-search findability, ranking.
Scrape competitor sites for offers and pricing.
Analyze social and news media for competitive information.

E-Discovery and compliance

Text analytics is applied for compliance, fraud and risk, and e-discovery.
Regulatory mandates and corporate practices dictate:
Monitoring corporate communications
Managing electronically stored information for production in event of litigation
Sources include e-mail (!!), news, social media
Risk avoidance and fraud detection are key to effective decision making:
Text analytics mines critical data from unstructured sources
Integrated text-transactional analytics provides rich insights

What textual information are you analyzing or do you plan to analyze? (chart legend: 2014, 2011, 2009)
Web-site feedback: 16%
social media not listed above: 19%
chat: 20%
employee surveys: 20%
contact-center notes or transcripts: 22%
e-mail and correspondence: 26%
online reviews: 31%
scientific or technical literature: 31%
Facebook postings: 32%
on-line forums: 36%
customer/market surveys: 37%
comments on blogs and articles: 38%
news articles: 42%
blogs (long form+micro): 61%

What textual information are you analyzing or do you plan to analyze?
insurance claims or underwriting notes: 5%
point-of-service notes or transcripts: 5%
video or animated images: 5%
warranty claims/documentation: 5%
photographs or other graphical images: 7%
crime, legal, or judicial reports or evidentiary materials: 9%
field/intelligence reports: 11%
speech or other audio: 11%
patent/IP filings: 12%
other: 12%
text messages/instant messages/SMS: 12%
medical records: 13%
Web-site feedback: 16%
social media not listed above: 19%
chat: 20%
employee surveys: 20%
contact-center notes or transcripts: 22%
e-mail and correspondence: 26%
online reviews: 31%
scientific or technical literature: 31%
Facebook postings: 32%
on-line forums: 36%
customer/market surveys: 37%
comments on blogs and articles: 38%
news articles: 42%
blogs (long form) including Tumblr: 43%
Twitter, Sina Weibo, or other microblogs: 46%

Do you currently need (or expect to need) to extract or analyze...
Events: Current 33%, Expect 21%
Semantic annotations: Current 31%, Expect 24%
Other entities phone numbers, part/product ...: Current 34%, Expect 23%
Metadata such as document author, ...: Current 47%, Expect 23%
Concepts, that is, abstract groups of entities: Current 51%, Expect 28%
Named entities people, companies, ...: Current 56%, Expect 25%
Relationships and/or facts: Current 47%, Expect 33%
Sentiment, opinions, attitudes, emotions, ...: Current 54%, Expect 28%
Topics and themes: Current 66%, Expect 22%

The share rise in users who selected Arabic coincided with much of the civil unrest in Middle Eastern countries.
http://bits.blogs.nytimes.com/2014/03/09/the-languages-of-twitter-users/

Non-English language support?
Arabic: Current 10%, Within 2 years 17%
Bahasa Indonesia or Malay: Current 1%, Within 2 years 3%
Chinese: Current 16%, Within 2 years 28%
Dutch: Current 9%, Within 2 years 7%
French: Current 36%, Within 2 years 17%
German: Current 34%, Within 2 years 24%
Greek: Current 2%, Within 2 years 2%
Hindi, Urdu, Bengali, Punjabi, or ...: Current 2%, Within 2 years 10%
Italian: Current 18%, Within 2 years 11%
Japanese: Current 7%, Within 2 years 15%
Korean: Current 4%, Within 2 years 8%
Polish: Current 3%, Within 2 years 4%
Portuguese: Current 13%, Within 2 years 17%
Russian: Current 8%, Within 2 years 21%
Scandinavian or Baltic: Current 7%, Within 2 years 3%
Spanish: Current 38%, Within 2 years 20%
Turkish or Turkic: Current 3%, Within 2 years 4%
Other African: Current 2%, Within 2 years 0%
Other Arabic script (including Urdu, ...: Current 3%, Within 2 years 1%
Other East Asian: Current 2%, Within 2 years 1%
Other European or Slavic/Cyrillic: Current 5%, Within 2 years 2%
Other: Current 9%, Within 2 years 0%

Software & platform options

Text-analytics options may be grouped in general classes.
Installed text-analysis application, whether desktop or server or deployed in-database.
Data mining workbench.
Hosted.
Programming tool.
As-a-service, via an application programming interface (API).
Code library or component of a business/vertical application, for instance for CRM, e-discovery, search.
Text analytics is frequently embedded in search or other end-user applications.
The slides that follow next will present leading options in each category except Hosted.

What is important in a solution? (chart legend: 2014 (n=139), 2011 (n=136), 2009 (n=78))
media monitoring/analysis interface: 22%
hosted or Web service (on-demand "API") option: 25%
supports data fusion / unified analytics: 28%
sector adaptation (e.g., hospitality, insurance, retail, health care, ...: 30%
BI (business intelligence) integration: 32%
ability to create custom workflows or to create or change ...: 33%
big data capabilities, e.g., via Hadoop/MapReduce: 33%
predictive-analytics integration: 36%
open source: 37%
support for multiple languages: 40%
sentiment scoring: 41%
"real time" capabilities: 43%
low cost: 44%
deep sentiment/emotion/opinion/intent extraction: 45%
document classification: 53%
broad information extraction capability: 53%
ability to use specialized dictionaries, taxonomies, ontologies, or ...: 54%
ability to generate categories or taxonomies: 64%

User decision criteria

Primary considerations include:
Adaptation or specialization: To a business or cultural domain, language, information type (e.g., text, speech, images) & source (e.g., Twitter, e-mail, online news).
By-user customization possibilities: For instance, via custom taxonomies, rules, lexicons.
Sentiment resolution: Aggregate, message, or feature level. (What features? Topics, coreferenced entities?) What sentiment? Valence & what else? Emotion? Intent?
Outputs: E.g., annotated text, models, indicators, dashboards, exploratory data interfaces.
Usage mode: As-a-service (API), installed, or hosted/cloud.
Capacity: Volume, performance, throughput, latency.
Cost.

A few French companies

Academic spin-offs

People Pattern

Text analytics future: Synthesis and sensemaking.

New York Times, September 8, 1957

Emotion in text

Emotion and outcomes

Beyond Text

Audio including speech. Images. Video.
http://www.geekosystem.com/facebook-face-recognition/
http://www.sciencedirect.com/science/article/pii/S0167639312000118
http://flylib.com/books/en/2.495.1.54/1/

The world of big data

Machine data (e.g., logs, sensor outputs, clickstreams).
Actions, interactions, and transactions: geolocation and time.
Profiles: individual, demographic & behavioral.
Text, audio, images, and video.
Facts and feelings.

(Accessible) data everywhere

A big data analytics architecture (example)
http://www.geeklawblog.com/2011/12/lexis-advance-platform-launch-two.html

Sensemaking

It is convenient to divide the entire information access process into two main components: information retrieval through searching and browsing, and analysis and synthesis of results. This broader process is often referred to in the literature as sensemaking. Sensemaking refers to an iterative process of formulating a conceptual representation from a large volume of information.
Marti Hearst, 2009
http://searchuserinterfaces.com/

En route
http://www.businessweek.com/magazine/content/04_19/b3882029_mz072.htm