Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data Donderdag
DESCRIPTION
Presentation by Jeroen Kleinhoven, CEO of Treparel, on Big Content and Big Data at Data Donderdag on 4 September 2014.
TRANSCRIPT
Treparel Delftechpark 26 2628 XH Delft
The Netherlands www.treparel.com
Turn Big Content
into Business Insights
Jeroen Kleinhoven CEO
September 4, 2014
Gartner Hype Cycle for Emerging Technologies, July 2014: Where Are Content Analytics and Big Data?
Treparel KMX – All Rights Reserved 2014 www.treparel.com 2
Mainstream adoption:
• Content Analytics is 2 to 5 years away.
• Big Data is 5 to 10 years away.
About Treparel
• Company
– HQ in Delft (The Netherlands)
– R&D in an ecosystem with Delft University of Technology, Univ. of Paris and Sao Paulo
– Founded by a data scientist, a visualization professor and search/machine learning engineers; managed by a Gartner VP since 2013
• Treparel is a solution provider:
– Rooted in patent analytics & visualization, evolved into Big Content solutions
– Big Content and KMX: content-type-agnostic search, text analytics & visualization
– KMX (Knowledge Mapping eXploration) provides fast and accurate insights into Big Content (email, patents, literature, web, social) for making better-informed decisions
• 3 types of clients:
– End users: client/server application (download, install, run)
– Partners: Client/Server + Developer API (Download, Install, Run + Integrate)
– Independent developers and researchers: Developer API + C/S… OpenSource (tbd)
KMX - extract, analyze & visualize patterns in large content collections
1. Landscaping / Clustering: Examine a content cluster and extract entities or references to people, products, locations, and other concepts
2. Categorization/Classification: Group similar information together
Value from Big Content in Publishing
• Examples of added value for publishers:
1. Content dashboarding: Business Intelligence-style search, reporting, analytics and visualization
2. Interactive content navigation
3. Exploration of content that will not show up in a standard search query
As well as:
4. Article recommendations, smart collections, group tagging
1. Content Dashboard: ease-of-use navigation in large sets of content (Report – Search – Analyse – Visualize)
Easy access to research, patents, business news and legislation
Recorded demo: http://treparel.com/next-gen-ip-rd-dashboard/
2. Enhance users' ability to visually explore relevant (hidden) content - 2
Interactive taxonomy with multiple coupled views incl. integrated visualizations and search in large sets of documents
2. Enhance users' ability to visually explore content (example: search in research on Ebola)
Figure: clustering with automatic annotation and zooming on large sets of documents (zoom levels 1, 2 and 3)
3. Exploration through classification of content (that will not show up in a standard search query)
Figure: starting from the publishing database, queries feed the content dashboard. There, successive rounds of ranking and filtering narrow 10,000 documents to 1,000 and then to 10 before the final results are presented.
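The funnel in the figure alternates filtering and ranking until only a handful of documents remain. The sketch below illustrates that staging under stated assumptions. The slides do not say how KMX scores documents, so keyword matching and a term-count score stand in for its ranking here. `Doc`, `keyword_filter`, `rank` and `funnel` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: int
    text: str


def _terms(terms):
    return {t.lower() for t in terms}


def keyword_filter(docs, terms):
    """Filtering (assumed): keep documents containing at least one query term."""
    wanted = _terms(terms)
    return [d for d in docs if wanted & set(d.text.lower().split())]


def rank(docs, terms):
    """Ranking (assumed): sort by number of query-term occurrences; ties keep input order."""
    wanted = _terms(terms)
    return sorted(
        docs,
        key=lambda d: sum(w in wanted for w in d.text.lower().split()),
        reverse=True,
    )


def funnel(docs, stages):
    """Apply each (terms, keep) stage: filter, rank, then keep the top `keep` documents."""
    for terms, keep in stages:
        docs = rank(keyword_filter(docs, terms), terms)[:keep]
    return docs


# Hypothetical usage mirroring the 10,000 -> 1,000 -> 10 funnel.
collection = [Doc(i, "ebola vaccine trial" if i % 10 == 0 else "market report") for i in range(10_000)]
top10 = funnel(collection, [(["ebola"], 1_000), (["vaccine", "trial"], 10)])
```

Each stage shrinks the candidate set before the next, more specific stage runs. This is why the later, more expensive steps (in KMX, the classifier) only ever see a small subset.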
Key Takeaways
Treparel is interested in partnering to empower content-rich, search-driven solutions.
• Mail me your details at [email protected] if you are interested in:
1. Getting a 30-day free trial
2. Test-driving the KMX API in your content application, or
3. Being part of the pre-launch group for… KMX OpenSource.
APPENDIX
How to position KMX in Big Content Analytics
KMX & Developer API
Content Dashboard
Developer Partnerships
Key solutions:
1. Intellectual Property
2. eDiscovery
3. Publishing: Law, IP & Science
4. Risk & Compliance
5. Fraud & Forensics
Today’s topic
KMX Text Analytics Application overview (diagram components): Acquire documents; Text Preprocessing and Indexing; Semantic Analysis; Taxonomies, Ontologies; Clustering, Classification; Visualization; Present Results
KMX unique functions:
• Extract concepts in context using clustering and classification of documents
• Use classification to create ranked lists and to tag subsets
• Support for binary and multi-class classification
• Enterprise edition (server/cloud) & Professional edition (desktop)
• Integration with other applications through the KMX API
Query & Search Tools
Benefits: Get quick insights through automated visual clusters with annotations that enhance the discovery process:
1. Analyze the clusters and the relationships in the data
2. Explore outliers in the data
3. Find documents of interest
What it does: A visualization of clusters in which documents are displayed as points, and the distance between two points reflects how similar the documents are (closer means more similar).
What KMX delivers: use KMX to:
1. Perform text preprocessing (tokenization, stemming, etc.)
2. Calculate a similarity measure between all pairs of documents
3. Calculate the visualization (landscape) with automatic annotation
4. Create the visualization
– as a static image, or
– as an interactive view in which the user can zoom in and out, with support for adaptive annotation
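Steps 1 and 2 above, plus a simple form of automatic cluster annotation, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not KMX's method. The stop list and suffix stemmer are deliberately crude. TF-IDF with cosine similarity is one common similarity measure. Grouping documents whose similarity exceeds a threshold (single-link connected components) is a swapped-in clustering technique. The 2D landscape layout from step 3 is not covered. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "on", "for", "to", "is"}


def preprocess(text):
    """Step 1: lowercase, tokenize, drop stop words, apply crude suffix stemming."""
    stems = []
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in STOP_WORDS:
            continue
        for suffix in ("ing", "es", "s"):
            # Strip at most one suffix, and only if a reasonably long stem remains.
            if tok.endswith(suffix) and len(tok) > len(suffix) + 2:
                tok = tok[: -len(suffix)]
                break
        stems.append(tok)
    return stems


def tfidf_vectors(docs):
    """Sparse TF-IDF vector (term -> weight) per document, idf = ln(N / df) + 1."""
    counts = [Counter(preprocess(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)  # document frequency
    return [{t: tf * (math.log(n / df[t]) + 1.0) for t, tf in c.items()} for c in counts]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors):
    """Step 2: similarity between all pairs of documents."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def cluster(sim, threshold):
    """Single-link clustering: connected components of the graph where sim >= threshold."""
    labels = [-1] * len(sim)
    current = 0
    for start in range(len(sim)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(sim)):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def annotate(vectors, labels, top_k=2):
    """Label each cluster with its highest summed TF-IDF terms."""
    totals = {}
    for vec, label in zip(vectors, labels):
        totals.setdefault(label, Counter()).update(vec)
    return {label: [t for t, _ in c.most_common(top_k)] for label, c in totals.items()}
```

For example, four short abstracts, two about Ebola vaccines and two about battery patents, fall into two clusters labelled with their dominant stems. The pairwise similarity matrix grows quadratically with collection size, so a production system would need something smarter than this all-pairs loop for millions of documents.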
Clustering: User Unsupervised Analytics
Benefits: Fast, accurate and precise small result sets, plus trend reporting and alerting by reusing predefined categorization models:
1. Obtain a ranked list of the most relevant documents
2. Separate the important documents from the irrelevant documents (noise)
How it works: Produces a list of the documents that are relevant from the user's perspective.
What KMX delivers: use KMX to:
1. Tag (label) a small number of relevant and irrelevant documents
– use search to identify documents that need to be tagged
– perform manual tagging
– select documents interactively from the visualization
2. Create a classifier (categorizer) from the tagged documents
3. Automatically classify all documents
4. Obtain the important documents ranked high and the irrelevant documents ranked low
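The learn-by-example workflow above (tag a few documents, train, classify everything, rank) can be sketched with a binary multinomial Naive Bayes classifier. This is a swapped-in technique chosen for brevity. The slides do not say which learning algorithm KMX uses, and `NaiveBayesRanker` is a hypothetical name. The log-odds score doubles as a ranking key, so relevant documents sort to the top and noise sinks to the bottom.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesRanker:
    """Binary multinomial Naive Bayes with Laplace smoothing, used to rank documents.

    Training needs at least one relevant (True) and one irrelevant (False) example.
    """

    def fit(self, texts, labels):
        """Step 2: build the classifier from the user's tagged documents."""
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(tokens(text))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        self.totals = {c: sum(self.counts[c].values()) for c in (True, False)}
        return self

    def score(self, text):
        """Log-odds of 'relevant' vs 'irrelevant'; higher means more relevant."""
        v = len(self.vocab)
        s = math.log(self.n_docs[True]) - math.log(self.n_docs[False])  # prior odds
        for tok in tokens(text):
            if tok not in self.vocab:
                continue  # words never seen in training carry no evidence here
            p_rel = (self.counts[True][tok] + 1) / (self.totals[True] + v)
            p_irr = (self.counts[False][tok] + 1) / (self.totals[False] + v)
            s += math.log(p_rel) - math.log(p_irr)
        return s

    def rank(self, texts):
        """Steps 3 and 4: score every document and return the most relevant first."""
        return sorted(texts, key=self.score, reverse=True)
```

Usage, with four hypothetical tagged examples:

```python
model = NaiveBayesRanker().fit(
    ["ebola vaccine trial results", "ebola virus outbreak response",
     "quarterly market report", "stock market earnings"],
    [True, True, False, False],
)
model.rank(["earnings report for retailers", "new ebola vaccine approved", "market outlook"])
```

The multi-class case mentioned earlier in the deck would keep one word-count table per class instead of two.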
Classification: User Supervised Analytics
KMX API: Embed Advanced Text Analytics functions
Clustering: Provides users with unsupervised analytics and automatically identifies inherent themes or information clusters. Through a dynamic, hierarchical topic view of search results, it lets users quickly focus on annotated subjects rather than scroll through long result lists.
Classification: Supervised analytics that help users automatically categorize large sets of documents. The classification process can learn by example from a small set of tagged documents. By sorting documents by topic, relevance and keywords, users can apply their own models or rules for classification.
Visualization: Advanced visual knowledge discovery for displaying, exporting and sharing data results, ranked document lists, labeled and enriched data, or interactive visualizations. Terms can be extracted for use in building thesauri or taxonomies.
KMX API:
• XML-RPC and REST (JSON)
• Python Pickle protocol
Server:
• User/tenant management
• User object management (datasets, workspaces, classifiers, stop lists, etc.)
Databases: Oracle, PostgreSQL
Client application:
• Native Windows (for creating analysis pipelines)
• Qt for the GUI, OpenGL for visualizations
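Because the API is offered over both XML-RPC and REST (JSON), one logical call can be encoded either way. The sketch below uses only Python's standard library to show the two encodings side by side. The method name `classifier.apply`, the parameter fields and the localhost URLs are hypothetical placeholders, not documented KMX endpoints. Pickle is deliberately left out: unpickling data from an untrusted source can execute arbitrary code.

```python
import json
import urllib.request
import xmlrpc.client

# Hypothetical method name, fields and endpoint; these are NOT the real KMX API.
METHOD = "classifier.apply"
PARAMS = {"classifier_id": 42, "dataset_id": 7, "top_n": 10}


def as_xmlrpc(method, params):
    """Encode a call as an XML-RPC <methodCall> document with one struct argument."""
    return xmlrpc.client.dumps((params,), methodname=method)


def rest_request(url, params):
    """Build (but do not send) a REST POST request carrying the params as JSON."""
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


# Sending would look like this (commented out because it needs a running server):
# proxy = xmlrpc.client.ServerProxy("http://localhost:8000/RPC2")
# result = proxy.classifier.apply(PARAMS)
# with urllib.request.urlopen(rest_request("http://localhost:8000/api/classify", PARAMS)) as resp:
#     result = json.load(resp)
```

XML-RPC carries the method name inside the document. REST typically encodes the operation in the URL and HTTP verb and sends only the parameters as the JSON body.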
Industry Thought Leaders about KMX
“Treparel KMX’s visualization capabilities around its auto-categorization and clustering offer immediate insight into unstructured data sets and appear to be adaptable and customizable to customer needs. Its approach to auto-categorization utilizes statistical principles and machine learning that require significantly less training and tuning on the part of customers than other approaches.” David Schubmehl, IDC
“As we acquire more and more information, we need tools that will guide us through the data maze. Analysts need tools to help them understand patterns and define clusters. Users need to explore data to uncover relationships from scattered sources. Treparel’s KMX serves both these needs with its ability to cluster and categorize collections of data with a high degree of accuracy, and its interactive visualization tools that enable exploration of large data sets.” Sue Feldman, Synthexis.com (author: The Answer Machine)