automated text coding: humans and machines learning...

31
Automated Text Coding: Humans and Machines Learning Together Dr. Stuart W. Shulman Founder & CEO, Texifter @stuartwshulman “…a wealth of information creates a poverty of attention.” - Herbert Simon, 1971

Upload: others

Post on 09-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Automated Text Coding:Humans and Machines Learning Together

Dr. Stuart W. ShulmanFounder & CEO, Texifter

@stuartwshulman

“…a wealth of information creates a poverty of attention.”- Herbert Simon, 1971

Page 2: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Text Classification

A 2500 year-old problem

Plato argued it would be frustrating. It still is.

Page 3: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is
Page 4: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is
Page 5: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Fall 1999

Page 6: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Grimmer & Stewart “Text as Data” Political Analysis (2013)

Volume is a problem for scholarsCoders are expensive

Groups struggle to accurately label text at scaleValidation of both humans and machines is “essential”

Some models are easier to validate than othersAll models are wrong

Automated models enhance/amplify, but don’t replace humansThere is no one right way to do this

“Validate, validate, validate”“What should be avoided then, is the blind use of any method without a validation step.”

Page 7: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Free, Open-Source, Web-based Text Analytics Toolkit

Page 8: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Original Software Kernel: Tools for Measurement

Page 9: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Avoid Tennis Elbow

Items load to the screen and the coder hits the keystroke

Page 10: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Keystroke Human Coding

Human coding can be distributed to individuals, groups & crowds

Page 11: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Computer Science & NSF Influence: Measure Everything

How fast?How reliable?

How accurate?

Page 12: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Annotator Speed

Redacted

Page 13: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Interrater Reliability: A Critical Measurement

Page 14: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Adjudication

Page 15: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

CoderRank for enhanced machine-learning is our key innovation

Patent issued March 1, 2016

Page 16: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

CoderRank for Enhanced Machine-learning

CoderRank is to text analytics what PageRank was to search. Just as Google said not all web pages are created equal, Texifter argues that not all humans are created equal. When training machines, it is best to rely most on the humans most likely to create a valid observation. We proposed a unique way to rank humans on trust and knowledge vectors.

Page 17: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

The Five Pillars of Text Analytics

SearchFiltering

De-duplication and ClusteringHuman Coding

Machine-Learning

Page 18: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Pillar #1: Search

Page 19: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Pillar #2: Filters

Page 20: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Pillar #3: Deduplication & clustering

Page 21: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Pillar #4: Human coding (a.k.a. labeling or tagging)

Page 22: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Pillar#5: Machine-learning

Page 23: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

ActiveLearning engines and human coding tools combine…

what humans do best… with what computers do best

Humans and machines learning together

It is always good to keep humans “in-the-loop”

Page 24: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Word sense disambiguation (relevance)

Page 25: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Word sense disambiguation (relevance)

Page 26: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Word sense disambiguation (relevance)

Page 27: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Word sense disambiguation (relevance)

Page 28: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is
Page 29: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Human coding can be converted into machine classifiers

Accumulated human coding becomes training data via machine-learning

Page 30: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Crowdsourcing accelerates the insight generation process through machine-learning

Distributed for synchronous & asynchronous collaboration

Page 31: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is

Thank-you for listening!

@[email protected]