AutoClassification - Rules versus Machine Learning
Jeff Fried, CTO, BA Insight
@jefffried #tbc2016
Rules-Based vs. Document-Based Bake-off
About Jeff Fried
Longtime Search Nerd
• CTO, BA Insight
• Senior PM, Microsoft
• VP, FAST
• SVP, LingoMotors
Focused on Search and SharePoint since 2004
Passionate About
• Search
• SharePoint
• Search-driven applications
• Information Strategy
Blog: BAinsight.com/blog
TechNet Column: “A View from the Crawlspace”
About BA Insight
– Connectivity
– Applications
– Classification
– Analytics
Metadata Drives Great User Experiences
Documents from many sources: all client- or matter-relevant documents are integrated.
Rich metadata: content is annotated automatically with concepts, categories, citations, matters, clients, etc.
Navigation controls: explore, discover, drill down
Manual Tagging is impractical
and remarkably inconsistent
Automation
Also called: AutoClassification, AutoTagging, Metadata Generation, Text Analytics, …
Complicators
Common Techniques across Applications
Rules-based Approach
Enhanced content, enriched with metadata and content types, for search, visualization, and workflow
Rule-based Classifier (Example)

| Name | Blood Type | Give Birth | Can Fly | Live in Water | Class |
|---|---|---|---|---|---|
| human | warm | yes | no | no | mammals |
| python | cold | no | no | no | reptiles |
| salmon | cold | no | no | yes | fishes |
| whale | warm | yes | no | yes | mammals |
| frog | cold | no | no | sometimes | amphibians |
| komodo | cold | no | no | no | reptiles |
| bat | warm | yes | yes | no | mammals |
| pigeon | warm | no | yes | no | birds |
| cat | warm | yes | no | no | mammals |
| leopard shark | cold | yes | no | yes | fishes |
| turtle | cold | no | no | sometimes | reptiles |
| penguin | warm | no | no | sometimes | birds |
| porcupine | warm | yes | no | no | mammals |
| eel | cold | no | no | yes | fishes |
| salamander | cold | no | no | sometimes | amphibians |
| gila monster | cold | no | no | no | reptiles |
| platypus | warm | no | no | no | mammals |
| owl | warm | no | yes | no | birds |
| dolphin | warm | yes | no | yes | mammals |
| eagle | warm | no | yes | no | birds |

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
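The slide does not say how overlapping rules are resolved. The sketch below assumes one common convention: the rules form an ordered list and the first rule that fires decides the class. It checks R1 to R5 against every row of the table and reports which rules fired for each miss; the field names and the `unknown` default are choices made for this example.

```python
from collections import namedtuple

Animal = namedtuple("Animal", "name blood_type give_birth can_fly live_in_water label")

# The vertebrate table from the slide, one row per line.
ROWS = """\
human,warm,yes,no,no,mammals
python,cold,no,no,no,reptiles
salmon,cold,no,no,yes,fishes
whale,warm,yes,no,yes,mammals
frog,cold,no,no,sometimes,amphibians
komodo,cold,no,no,no,reptiles
bat,warm,yes,yes,no,mammals
pigeon,warm,no,yes,no,birds
cat,warm,yes,no,no,mammals
leopard shark,cold,yes,no,yes,fishes
turtle,cold,no,no,sometimes,reptiles
penguin,warm,no,no,sometimes,birds
porcupine,warm,yes,no,no,mammals
eel,cold,no,no,yes,fishes
salamander,cold,no,no,sometimes,amphibians
gila monster,cold,no,no,no,reptiles
platypus,warm,no,no,no,mammals
owl,warm,no,yes,no,birds
dolphin,warm,yes,no,yes,mammals
eagle,warm,no,yes,no,birds"""

ANIMALS = [Animal(*line.split(",")) for line in ROWS.splitlines()]

# R1..R5 as (rule id, required attribute values, predicted class).
RULES = [
    ("R1", {"give_birth": "no", "can_fly": "yes"}, "birds"),
    ("R2", {"give_birth": "no", "live_in_water": "yes"}, "fishes"),
    ("R3", {"give_birth": "yes", "blood_type": "warm"}, "mammals"),
    ("R4", {"give_birth": "no", "can_fly": "no"}, "reptiles"),
    ("R5", {"live_in_water": "sometimes"}, "amphibians"),
]

def fires(conditions, animal):
    """True if every attribute test in the rule holds for this animal."""
    return all(getattr(animal, attr) == value for attr, value in conditions.items())

def matching_rules(animal):
    """All rules that fire; more than one means the rules overlap."""
    return [rid for rid, conds, _ in RULES if fires(conds, animal)]

def classify(animal, default="unknown"):
    """Assumed conflict policy: first rule that fires wins (ordered decision list)."""
    for _, conds, label in RULES:
        if fires(conds, animal):
            return label
    return default

if __name__ == "__main__":
    correct = 0
    for a in ANIMALS:
        predicted = classify(a)
        correct += predicted == a.label
        if predicted != a.label:
            print(f"{a.name}: predicted {predicted}, actual {a.label}, fired {matching_rules(a)}")
    print(f"{correct}/{len(ANIMALS)} correct")
```

Run as-is, this scores 15 of 20. Frog, salamander and penguin fire both R4 and R5, platypus falls into R4, and the leopard shark fires no rule at all. Because every decision traces back to a named rule, gaps like these are easy to spot and fix, which is the "transparent and debuggable" property of the rules-based approach.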
Example Rules Engine UI
Examples of Rules
Boolean
• "IT" OR "Information Technology" OR "MIS"
• ("Expert" OR "Witness") NOT "police"
• "New York" AND "environmental policy"
• *work
• "legal" -briefs
• "Legal" NEAR(5) "issue"
Property-based
• filetype:docx
• title:"2029 L.P" OR title:2030
• footer="BA Insight Confidential" OR footer:proprietary OR footer:BA*
Overriding/changing Linguistics
• NOSTEM("illumination")
• CASE("prerequisites")
• SOUNDLIKE("prerech")
Regular expressions
• title:REGEX([0-4])
• REGEX("\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))")
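The rule syntax above belongs to the classification tool being demonstrated. As a rough illustration of what a Boolean, a proximity, and a regular-expression rule each evaluate to, here is a plain-Python sketch. The tokenizer, the reading of NEAR(n) as "at most n words in between", the year pattern, and the category names are all hypothetical; this does not reproduce any product's query language.

```python
import re

def tokens(text):
    """Lower-cased word tokens: a crude stand-in for a search engine's tokenizer."""
    return re.findall(r"\w+", text.lower())

def has_term(text, phrase):
    """True if the phrase's tokens appear contiguously in the text."""
    toks, p = tokens(text), tokens(phrase)
    return any(toks[i:i + len(p)] == p for i in range(len(toks) - len(p) + 1))

def near(text, a, b, n):
    """True if single words a and b occur with at most n other words between them."""
    toks = tokens(text)
    pos_a = [i for i, t in enumerate(toks) if t == a.lower()]
    pos_b = [i for i, t in enumerate(toks) if t == b.lower()]
    return any(abs(i - j) - 1 <= n for i in pos_a for j in pos_b)

# Hypothetical rule set: category name -> predicate over the document text.
RULES = {
    # ("Expert" OR "Witness") NOT "police"
    "expert-witness": lambda t: (
        (has_term(t, "expert") or has_term(t, "witness")) and not has_term(t, "police")
    ),
    # "Legal" NEAR(5) "issue"
    "legal-issue": lambda t: near(t, "legal", "issue", 5),
    # Regex rule: a four-digit year 2000-2099 (illustrative pattern only)
    "mentions-year": lambda t: re.search(r"\b20\d{2}\b", t) is not None,
}

def classify(text):
    """Return the sorted names of every category whose rule matches."""
    return sorted(name for name, rule in RULES.items() if rule(text))

if __name__ == "__main__":
    print(classify("The expert raised a legal and procedural issue in 2016."))
    print(classify("A police witness testified."))
```

The first call matches all three categories; the second matches none, because the NOT clause vetoes the "witness" hit.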
Controlling scores & thresholds
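The slide's screenshot is not in the transcript. One common way such tools expose scores is sketched below as an assumption: each matching rule adds a weight to its category, and a category is assigned only when its total reaches a threshold. The trigger words, weights, and default threshold are hypothetical.

```python
import re

# Hypothetical weighted rules: (category, trigger word, weight).
WEIGHTED_RULES = [
    ("finance", "invoice", 0.6),
    ("finance", "payment", 0.3),
    ("finance", "budget", 0.2),
    ("legal", "contract", 0.5),
    ("legal", "liability", 0.4),
]

def category_scores(text):
    """Add up the weight of every rule whose trigger word occurs in the text."""
    words = set(re.findall(r"\w+", text.lower()))
    scores = {}
    for category, word, weight in WEIGHTED_RULES:
        if word in words:
            scores[category] = scores.get(category, 0.0) + weight
    return scores

def assign(text, threshold=0.5):
    """Assign only the categories whose total score reaches the threshold."""
    return sorted(c for c, s in category_scores(text).items() if s >= threshold)

if __name__ == "__main__":
    doc = "Payment terms are set out in the contract."
    print(category_scores(doc))
    print(assign(doc))
    print(assign(doc, threshold=0.2))
```

With the default threshold, the sample document is tagged `legal` only; lowering the threshold to 0.2 also tags it `finance`. Lowering the threshold catches more documents (fewer false negatives) at the cost of more false positives, and raising it does the opposite.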
Taxonomy Management is often included with Auto-Classification Tools
Where do you get Taxonomies?
Semantics! Machine Learning! AI!
Key Concepts
False positives vs. false negatives: look at the impact of each in your context
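Precision and recall are the usual way to put numbers on this trade-off: precision drops as false positives grow, and recall drops as false negatives grow. A minimal sketch that computes both (plus their harmonic mean, F1) for one category, given the set of documents the classifier tagged and the set that should have been tagged; the document IDs are made up.

```python
def precision_recall_f1(predicted, actual):
    """predicted/actual: sets of document IDs tagged with one category."""
    tp = len(predicted & actual)  # correctly tagged
    fp = len(predicted - actual)  # tagged but should not be (false positives)
    fn = len(actual - predicted)  # should be tagged but missed (false negatives)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

if __name__ == "__main__":
    predicted = {"doc1", "doc2", "doc3", "doc4"}
    actual = {"doc1", "doc2", "doc5"}
    print(precision_recall_f1(predicted, actual))
```

Here two of four tagged documents are right (precision 0.5) and two of three relevant documents were found (recall 2/3). Which number matters more depends on whether a wrong tag or a missed tag is costlier in your context.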
Machine Learning Approach
Example: identify people as good or bad from their appearance
Decision Tree Classifier
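The transcript keeps only this slide's title. To make the idea concrete, here is a small ID3-style decision tree (splits chosen by information gain, one of several ways to grow a tree) trained on the vertebrate table from the rule-based example. It is a sketch under stated assumptions: categorical attributes only, no pruning, and a fallback to the node's majority class when a value was never seen in training.

```python
import math
from collections import Counter

ATTRS = ["blood_type", "give_birth", "can_fly", "live_in_water"]
ROWS = """\
human,warm,yes,no,no,mammals
python,cold,no,no,no,reptiles
salmon,cold,no,no,yes,fishes
whale,warm,yes,no,yes,mammals
frog,cold,no,no,sometimes,amphibians
komodo,cold,no,no,no,reptiles
bat,warm,yes,yes,no,mammals
pigeon,warm,no,yes,no,birds
cat,warm,yes,no,no,mammals
leopard shark,cold,yes,no,yes,fishes
turtle,cold,no,no,sometimes,reptiles
penguin,warm,no,no,sometimes,birds
porcupine,warm,yes,no,no,mammals
eel,cold,no,no,yes,fishes
salamander,cold,no,no,sometimes,amphibians
gila monster,cold,no,no,no,reptiles
platypus,warm,no,no,no,mammals
owl,warm,no,yes,no,birds
dolphin,warm,yes,no,yes,mammals
eagle,warm,no,yes,no,birds"""

# Each example is (name, {attribute: value}, class label).
EXAMPLES = []
for line in ROWS.splitlines():
    name, *values, label = line.split(",")
    EXAMPLES.append((name, dict(zip(ATTRS, values)), label))

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def info_gain(examples, attr):
    """Entropy reduction from splitting the examples on one attribute."""
    labels = [label for _, _, label in examples]
    remainder = 0.0
    for value in {feats[attr] for _, feats, _ in examples}:
        subset = [label for _, feats, label in examples if feats[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(examples, attrs):
    """Leaf = class label (str); internal node = (attribute, {value: subtree}, majority)."""
    labels = [label for _, _, label in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: info_gain(examples, a))
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in {feats[best] for _, feats, _ in examples}:
        subset = [ex for ex in examples if ex[1][best] == value]
        branches[value] = build_tree(subset, rest)
    return (best, branches, majority)

def predict(tree, feats):
    """Walk from the root to a leaf; unseen values fall back to the node's majority."""
    while not isinstance(tree, str):
        attr, branches, majority = tree
        if feats.get(attr) not in branches:
            return majority
        tree = branches[feats[attr]]
    return tree

if __name__ == "__main__":
    tree = build_tree(EXAMPLES, ATTRS)
    print(tree)
    correct = sum(predict(tree, feats) == label for _, feats, label in EXAMPLES)
    print(f"{correct}/{len(EXAMPLES)} training examples fit")
```

Unlike the hand-written R1 to R5, the tree learns its tests from the labeled rows. It fits 19 of the 20 rows: frog, salamander and turtle have identical attribute values but different classes, so no tree over these four attributes can separate them, and the leaf takes the majority class (amphibians).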
Building an accurate classifier
Training and Test Data
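The usual practice is to hold out part of the labeled data, so the classifier is scored on documents it never saw during training. A minimal sketch of a shuffled holdout split using only the standard library; the 80/20 ratio and the fixed seed are arbitrary choices for this example.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=42):
    """Shuffle a copy of the labeled examples and split off a held-out test set."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

if __name__ == "__main__":
    labeled = [(f"doc{i}", "contract" if i % 2 else "invoice") for i in range(10)]
    train, test = train_test_split(labeled)
    print(len(train), len(test))
```

With small or imbalanced label sets, a stratified split, which keeps the same class proportions in both parts, is usually safer than a purely random one.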
Choosing the algorithm
Rules-based
+ Easy to get started
+ Transparent and debuggable
+ Easily controlled (when the number of rules is not too large)
- Need taxonomies
- Rule maintenance effort
- Harder to cover a domain fully and to switch domains
Machine learning
+ Don’t need taxonomies
+ Improves without manual maintenance
+ Handles new data types/domains more easily
- Need a training set
- Opaque, usually can’t be debugged
- Can’t specify or control specific examples
What would you use for
Case Study: Content Identification and Movement
Benchmarks
Large scale example
Combinations of Techniques usually work better
Examples of hybrid configurations
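The transcript does not list the configurations on this slide. One simple hybrid is sketched here as an assumption: hand-written rules decide whenever they fire (keeping precise control over known cases), and a trained model's prediction is used only otherwise, and only if its confidence clears a threshold. `toy_model` is a hypothetical placeholder for whatever classifier is actually available.

```python
from typing import Callable, List, Optional, Tuple

# A rule is a predicate over the text plus the category it assigns.
Rule = Tuple[Callable[[str], bool], str]

def hybrid_classify(
    text: str,
    rules: List[Rule],
    ml_predict: Callable[[str], Tuple[str, float]],
    min_confidence: float = 0.7,
) -> Optional[str]:
    """Rules take precedence; otherwise trust the model only above min_confidence."""
    for predicate, category in rules:
        if predicate(text):
            return category
    label, confidence = ml_predict(text)
    return label if confidence >= min_confidence else None  # None: leave for review

def toy_model(text: str) -> Tuple[str, float]:
    """Hypothetical stand-in for a trained classifier: returns (label, confidence)."""
    return ("finance", 0.9) if "invoice" in text.lower() else ("other", 0.4)

RULES: List[Rule] = [(lambda t: "confidential" in t.lower(), "restricted")]

if __name__ == "__main__":
    for doc in ["CONFIDENTIAL invoice", "Invoice for May", "Meeting notes"]:
        print(doc, "->", hybrid_classify(doc, RULES, toy_model))
```

This ordering keeps the strengths from both columns of the comparison: rules give control over specific cases, while the model covers documents no rule anticipated.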
Example: clustering combined with rules
Open Source & Platform packages offer an easy way to play
• Carrot2
How to get started
• Set up a metadata framework (keep it simple)
• Develop or acquire managed vocabularies for critical elements
• Start with rule-driven automation
• Test out ML-based techniques as you grow
www.BAinsight.com
@jefffried