CoderRank: Creating Gold Standards

Upload: stuart-shulman

Post on 20-Mar-2017

Page 1: CoderRank: Creating Gold Standards

CoderRank: Creating Gold Standards

Dr. Stuart W. Shulman

Founder & CEO, Texifter

@stuartwshulman

“…a wealth of information creates a poverty of attention.” - Herbert Simon, 1971

Page 2: CoderRank: Creating Gold Standards

Text Classification

A 2,500-year-old problem. Plato argued it would be frustrating. It still is.

Page 3: CoderRank: Creating Gold Standards

Grimmer & Stewart “Text as Data” Political Analysis (2013)

Volume is a problem for scholars

Coders are expensive

Groups struggle to accurately label text at scale

Validation of both humans and machines is “essential”

Some models are easier to validate than others

All models are wrong

Automated models enhance/amplify, but don’t replace humans

There is no one right way to do this

“Validate, validate, validate”

“What should be avoided then, is the blind use of any method without a validation step.”

Page 4: CoderRank: Creating Gold Standards

Free, Open-Source, Web-based Text Analytics Toolkit

Page 5: CoderRank: Creating Gold Standards

Original Software Kernel: Tools for Measurement

Page 6: CoderRank: Creating Gold Standards

Avoid Tennis Elbow

Items load to the screen and the coder hits the keystroke

Page 7: CoderRank: Creating Gold Standards

Keystroke Human Coding

Human coding can be distributed to individuals, groups & crowds

Page 8: CoderRank: Creating Gold Standards

Computer Science & National Science Foundation: Measure Everything

How fast?

How reliable?

How accurate?

Page 9: CoderRank: Creating Gold Standards

Annotator Speed

Redacted

Page 10: CoderRank: Creating Gold Standards

Interrater Reliability: A Critical Measurement
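The slide does not say which reliability statistic the toolkit reports. As one common option, here is a minimal sketch of Cohen's kappa for two coders who labeled the same items; the function name, the label values, and the sample codings are all assumptions made for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: share of items where the two coders chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each coder's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical codings of six items by two coders.
coder_1 = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant", "irrelevant"]
coder_2 = ["relevant", "irrelevant", "irrelevant", "relevant", "irrelevant", "relevant"]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

Here the coders agree on four of six items, but half of that agreement is expected by chance, so kappa is well below the raw agreement rate.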

Page 11: CoderRank: Creating Gold Standards

Adjudication

Page 12: CoderRank: Creating Gold Standards

CoderRank for enhanced machine-learning is our key innovation

Patent issued March 1, 2016

Page 13: CoderRank: Creating Gold Standards

CoderRank for Enhanced Machine-learning

CoderRank is to text analytics what PageRank was to search. Just as Google said not all web pages are created equal, Texifter argues that not all humans are created equal. When training machines, it is best to rely most on the humans most likely to create a valid observation. We proposed a unique way to rank humans on trust and knowledge vectors.
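The slides do not describe how CoderRank computes its trust and knowledge vectors, so the following is not the patented method. It is only a sketch of the general idea of relying most on the most reliable coders: each coder gets a trust score equal to their accuracy on adjudicated items, and new items are labeled by a trust-weighted vote. All names and data are hypothetical.

```python
from collections import defaultdict

def coder_trust(codings, adjudicated):
    """Share of each coder's labels that match the adjudicated label.

    codings: list of (coder, item_id, label) tuples.
    adjudicated: dict mapping item_id to the agreed gold label.
    """
    hits = defaultdict(int)
    seen = defaultdict(int)
    for coder, item, label in codings:
        if item in adjudicated:
            seen[coder] += 1
            hits[coder] += int(label == adjudicated[item])
    return {coder: hits[coder] / seen[coder] for coder in seen}

def weighted_label(item, codings, trust, default_trust=0.5):
    """Pick the label with the largest sum of coder trust (assumed voting rule)."""
    votes = defaultdict(float)
    for coder, coded_item, label in codings:
        if coded_item == item:
            votes[label] += trust.get(coder, default_trust)
    if not votes:
        raise ValueError(f"no codings for item {item!r}")
    return max(votes, key=votes.get)

# Hypothetical data: three coders, items 1-3 adjudicated, item 4 not yet.
codings = [
    ("ann", 1, "relevant"), ("bob", 1, "irrelevant"), ("cy", 1, "irrelevant"),
    ("ann", 2, "relevant"), ("bob", 2, "relevant"), ("cy", 2, "irrelevant"),
    ("ann", 3, "irrelevant"), ("bob", 3, "relevant"), ("cy", 3, "relevant"),
    ("ann", 4, "relevant"), ("bob", 4, "irrelevant"), ("cy", 4, "irrelevant"),
]
adjudicated = {1: "relevant", 2: "relevant", 3: "irrelevant"}

trust = coder_trust(codings, adjudicated)
label_for_4 = weighted_label(4, codings, trust)
print(trust, label_for_4)
```

In this toy data a simple majority would label item 4 "irrelevant", while the trust-weighted vote follows the one coder who matched every adjudicated item.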

Page 14: CoderRank: Creating Gold Standards

ActiveLearning engines and human coding tools combine…

what humans do best… with what computers do best.

Humans and machines learning together

It is always good to keep humans “in-the-loop”
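One common way to combine the two is uncertainty sampling: the machine scores the unlabeled items and the ones it is least sure about go to human coders first. The sketch below assumes a binary classifier that outputs P(relevant) per item; the slides do not say which selection strategy the ActiveLearning engine uses, and the function name and scores are hypothetical.

```python
def most_uncertain(probabilities, batch_size):
    """Return the item ids whose predicted P(relevant) is closest to 0.5."""
    ranked = sorted(probabilities, key=lambda item: abs(probabilities[item] - 0.5))
    return ranked[:batch_size]

# Hypothetical classifier scores for five unlabeled items.
scores = {"t1": 0.97, "t2": 0.52, "t3": 0.08, "t4": 0.41, "t5": 0.66}
queue = most_uncertain(scores, batch_size=2)
print(queue)
```

Items the model already scores near 0 or 1 stay out of the human queue, so coder attention goes where a label is most informative.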

Page 15: CoderRank: Creating Gold Standards

Word sense disambiguation (relevance)

Page 16: CoderRank: Creating Gold Standards

Word sense disambiguation (relevance)

Page 17: CoderRank: Creating Gold Standards

Word sense disambiguation (relevance)

Page 18: CoderRank: Creating Gold Standards

Word sense disambiguation (relevance)

Page 19: CoderRank: Creating Gold Standards
Page 20: CoderRank: Creating Gold Standards

Human coding converts into machine classifiers

Accumulated human coding becomes training data via machine-learning
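To illustrate how coded items can become a classifier, here is a minimal multinomial naive Bayes sketch in plain Python. The slides do not name the learning algorithm the toolkit uses, so naive Bayes is a stand-in, and the function names and example texts are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train(coded_items):
    """Build word and label counts from (text, label) pairs produced by coders."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in coded_items:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability under naive Bayes."""
    word_counts, label_counts, vocab = model
    total_items = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_items)  # label prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing so unseen (label, word) pairs are not zero.
            score += math.log((word_counts[label][token] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical human-coded items for a relevance task on the word "apple".
coded = [
    ("apple pie recipe with cinnamon", "irrelevant"),
    ("fresh apple juice and cider", "irrelevant"),
    ("apple releases new iphone", "relevant"),
    ("apple stock rises after iphone launch", "relevant"),
]
model = train(coded)
print(classify("new iphone launch", model))
print(classify("apple cider recipe", model))
```

Each additional coded item updates the counts, so the classifier improves as human coding accumulates.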

Page 21: CoderRank: Creating Gold Standards

Crowdsourcing accelerates the insight generation

Distribute coding for synchronous & asynchronous collaboration

Page 22: CoderRank: Creating Gold Standards

Thank you for listening!

@stuartwshulman