Text Analytics
A Tool for Taxonomy Development
Tom Reamy
Chief Knowledge Architect
KAPS Group
Program Chair – Text Analytics World
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
Introduction
Project: Update ACM taxonomy – after 12+ years
Information Environment
Text Mining / Text Analytics – Multiple Methods / Reports
Conclusion
Introduction: KAPS Group
Knowledge Architecture Professional Services – Network of Consultants
Applied Theory – Faceted & emotion taxonomies, natural categories
Services:
– Strategy – IM & KM - Text Analytics, Social Media, Integration
– Taxonomy/Text Analytics, Social Media development, consulting
– Text Analytics Quick Start – Audit, Evaluation, Pilot
Partners – Smart Logic, Expert Systems, SAS, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics
Clients: Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, Dept. of Transportation, etc.
Program Chair – Text Analytics World – March 29-April 1 - SF
Presentations, Articles, White Papers – www.kapsgroup.com
Current – Book – Text Analytics: How to Conquer Information Overload, Get Real Value from Social Media, and Add Smart Text to Big Data
Introduction: Approach
Is Automatic Taxonomy Development Here Yet?
Not yet, but it is getting closer
Hybrid:
– Taxonomists, SMEs, database analysts, text analysts
– Text Mining software – basic text analysis – power
– Text analytics software – brains
New taxonomy terms & structure
– Old = indexing, authors adding tags & keywords
– New = auto-tagging, applications
Information Environment
Existing Taxonomy: Computing Classification System
Content:
– Database export of Guide to the Computing Literature bibliographic records (.txt; approximately 7GB in 58 files)
– Statistical distribution of CCS categories across the Digital Library and Guide to Computing Literature (Excel; 4 files)
– ACM Digital Library full text files (PDFs and XML metadata, including CCS categories; approximately 170GB in 240,000 files)
– Ralston Encyclopedia of Computer Science (PDFs and HTML of each article with XML metadata, including CCS categories; approximately 350MB in 1,850 files)
Text Analytics in Taxonomy Development
Case Study – Multiple Methods
Text Mining - terms in documents – frequency, date, source, etc.
– Text Preparation – Create multiple filters
Quality – important terms, co-occurring terms
Time savings – only feasible way to scan documents
Clustering – suggested categories, chunking for editors
– Clustering within clusters - explore
Entity Extraction – people, organizations, programming languages, hardware/devices, etc.
Joint Work Sessions – interactive exploration
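The "co-occurring terms" idea above can be illustrated with a minimal sketch. It is not the tooling used in the project; the sample keyword lists and the function name `cooccurrence_counts` are hypothetical, and the only assumption is that each document is reduced to a list of terms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: each inner list holds the terms found in one document.
docs = [
    ["taxonomy", "text mining", "clustering"],
    ["text mining", "clustering", "entity extraction"],
    ["taxonomy", "text mining"],
]

def cooccurrence_counts(keyword_lists):
    """Count how often each unordered pair of terms appears in the same document."""
    pairs = Counter()
    for keywords in keyword_lists:
        # Deduplicate and sort so each pair has one canonical ordering.
        unique_terms = sorted(set(keywords))
        pairs.update(combinations(unique_terms, 2))
    return pairs

pair_counts = cooccurrence_counts(docs)
for (term_a, term_b), count in pair_counts.most_common(3):
    print(f"{term_a} + {term_b}: {count}")
```

Pairs that co-occur often are candidates for related terms or a shared parent category, which a taxonomist would then review.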
Case Study – Taxonomy Development
Multiple Sets of Reports
Keyword Frequency
– First Pass – 3,026
– Total – 508,941 (Get from Big Database)
– Sub-Totals
• Year Pre-1998, By Year, By 5 year blocks
• Map to other variables – Journals, Authors – basis for communities
Keywords in Abstract/Title
Cluster analysis of keyword-abstract-title
Search Terms in keyword-abstract-title
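As a rough illustration of the keyword frequency sub-totals, the sketch below groups keyword counts into a "Pre-1998" bucket and 5-year blocks. The records are invented, and the block boundaries (starting at 1998) are an assumption; the slides do not say how the blocks were aligned.

```python
from collections import Counter, defaultdict

# Hypothetical (year, keyword) records standing in for the bibliographic export.
records = [
    (1996, "databases"),
    (1999, "databases"),
    (2001, "data mining"),
    (2003, "data mining"),
    (2004, "databases"),
    (2009, "cloud computing"),
]

def block_label(year, cutoff=1998, width=5):
    """Map a year to 'Pre-1998' or a 5-year block label (alignment is assumed)."""
    if year < cutoff:
        return f"Pre-{cutoff}"
    start = cutoff + ((year - cutoff) // width) * width
    return f"{start}-{start + width - 1}"

def frequency_by_block(rows):
    """Build {block label: Counter of keyword frequencies}."""
    table = defaultdict(Counter)
    for year, keyword in rows:
        table[block_label(year)][keyword] += 1
    return table

report = frequency_by_block(records)
for block in sorted(report):
    print(block, dict(report[block]))
```

The same grouping could be keyed by journal or author instead of date block to approximate the "map to other variables" reports.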
Entity Extraction – Company, Internet, Organization, Title
Multiple Methods - Reports
Spreadsheets – static reports
Database query reports
– Create multiple slices, views, filters
Working reports – eliminate more noise words
Multiple mapping – extractions, author tags & keywords
Map – frequency in abstracts, titles, articles
Search logs – terms and phrases
Date ranges – trend reports – per terms, new words
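One way to picture the "new words" trend report is to record the first year each term appears and list the terms that first show up in a given date range. This is only a sketch under that assumption; the observations and function names are hypothetical.

```python
# Hypothetical (year, term) observations, e.g. from author keywords or search logs.
observations = [
    (2000, "grid computing"),
    (2004, "grid computing"),
    (2006, "cloud computing"),
    (2008, "cloud computing"),
    (2008, "big data"),
]

def first_seen(rows):
    """Return {term: earliest year in which the term was observed}."""
    first = {}
    for year, term in rows:
        if term not in first or year < first[term]:
            first[term] = year
    return first

def new_terms_in_range(rows, start, end):
    """List terms whose first appearance falls within [start, end], sorted by name."""
    return sorted(term for term, year in first_seen(rows).items() if start <= year <= end)

print(new_terms_in_range(observations, 2005, 2009))
```

Terms that surface only in recent ranges are candidates for new taxonomy nodes, subject to editorial review.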
Conclusions
Auto-taxonomy not here - Yet
Scale requires semi-automated solution
Human effort – initial design, text preparation
– Now would add more auto-categorization
Human effort – analysis & refinement – of queries, text mining, and taxonomy
Simple taxonomies are better – part of information ecosystem
– Lower levels of terms – into auto-tagging rules
Early 2015: New Book:
– Text Analytics: Everything You Need to Know to Conquer Information Overload, Mine Social Media for Real Value, and Turn Big Text Into Big Data
– Title might be shorter but it will cover all you need to know
Questions?
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com