TRANSCRIPT
Better Machine Learning Through Data
Saleema Amershi, Machine Teaching Group, Microsoft Research
August 14, 2016
Making better sense of data.
Better data makes better machine learning.
Data + Algorithm = Model
Machine learning research often takes the data as given.
When Algorithms Discriminate – The New York Times, 2015
Big Data’s all-too-human failings – Reuters, 2016
Artificial Intelligence’s White Guy Problem – The New York Times, 2016
Mapping Crime – Or Stirring Hate? – Financial Times, 2014
Most of the influence practitioners have on machine learning is through data.
In research, data is often taken as given.
In practice, the algorithm is often taken as given.
“Data scientists, according to interviews and expert estimates, spend 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data.”
- New York Times, 2014
[Patel et al., CHI 2008]
Iterations are driven by evaluating models on data.
In practice, most effort is spent crafting input data.
Machine learning in theory
Collect & Label Samples → Create Features → Algorithm → Evaluate Results
Machine learning in practice
Collect & Label Samples → Create Features → Algorithm → Evaluate Results → back to Collect & Label (an iterative loop)
Structured Labeling [CHI 2014]: Collect & Label Samples
Feature Insight [VAST 2015]: Create Features
ModelTracker [CHI 2015, VAST 2016]: Evaluate Results
Traditional Labeling
Pre-defined high-level categories: Cat / Not Cat.
Is this a Cat?
Does not support concept evolution (refining the target concept as data is observed).
How common is concept evolution?
Nine machine learning experts labeled the same 200 pages in two sessions, four weeks apart.
Average consistency: 81.7% (SD=6.8%).
Six out of nine participants’ labels changed significantly (chi-square test of symmetry).
[Chart: label consistency (%) per participant]
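The two measures in the study can be sketched in a few lines. This is a minimal sketch with hypothetical session data; for binary labels, the chi-square test of symmetry reduces to McNemar's test.

```python
# Minimal sketch: label consistency across two labeling sessions, and
# McNemar's test of symmetry (the binary-label case of the chi-square
# test of symmetry). The session data below is hypothetical.
import math

def consistency(session1, session2):
    """Fraction of items given the same label in both sessions."""
    same = sum(a == b for a, b in zip(session1, session2))
    return same / len(session1)

def mcnemar(session1, session2):
    """Chi-square statistic and p-value for asymmetric label changes."""
    n01 = sum(a == 0 and b == 1 for a, b in zip(session1, session2))
    n10 = sum(a == 1 and b == 0 for a, b in zip(session1, session2))
    if n01 + n10 == 0:
        return 0.0, 1.0
    stat = (n01 - n10) ** 2 / (n01 + n10)
    p = math.erfc(math.sqrt(stat / 2))  # chi-square survival function, df=1
    return stat, p

s1 = [1, 1, 0, 0, 1, 0, 1, 0]  # labels from session 1 (hypothetical)
s2 = [1, 0, 0, 0, 1, 1, 1, 1]  # labels from session 2, weeks later
print(consistency(s1, s2))
print(mcnemar(s1, s2))
```

A participant whose concept drifts in one direction (items migrating from Not Cat to Cat, say) produces an asymmetric change table and a small p-value.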
Proposed Solution – Structured Labeling
Enable people to explicitly organize their concept via grouping and tagging within a traditional labeling scheme.
Structured Labeling
Is this a Cat?
Grouping within high-level categories (Cat / Not Cat), e.g., Definitely Cat, Definitely Not Cat, Cat Poster, Lions, Blogs.
User-provided tags on groups aid recall.
Can move, merge, and split groups as desired.
Assisted Structured Labeling
Grouping recommendations to improve label consistency.
Similar items to help users make decisions.
Findings
People revised labels significantly more with structured labeling.
People labeled more consistently.
People preferred it over traditional labeling.
[Charts: Label Consistency, Mean # Groups, # Revisions; χ²=6.53 (p<.038), χ²=20.19 (p<.001), χ²=12 (p<.002), df=2 each]
Structured Labeling Summary
Current tools do not support concept evolution.
Structured labeling helps people refine their concepts by surfacing labeling decisions and aiding recall.
People used structured labeling when it was available and labeled more consistently.
Structure contains additional information (e.g., group-related features, group-related accuracy, decisions made…)
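The core idea of groups with labels that can be merged as the concept evolves can be sketched as a small data model. This is a hypothetical illustration, not the actual Structured Labeling tool.

```python
# Hypothetical sketch of a structured-labeling data model: each item is
# filed under a high-level label plus a finer-grained, user-named group;
# groups can be merged as the labeler's concept evolves.
from collections import defaultdict

class StructuredLabels:
    def __init__(self):
        self.items = defaultdict(list)  # group name -> items in that group
        self.label = {}                 # group name -> high-level label

    def add(self, item, label, group):
        """File an item under a high-level label and a named group."""
        self.label[group] = label
        self.items[group].append(item)

    def merge(self, src, dst):
        """Fold one group into another, e.g., merging similar groups."""
        self.items[dst].extend(self.items.pop(src))

    def count(self, label):
        """Total items whose group carries the given high-level label."""
        return sum(len(v) for g, v in self.items.items()
                   if self.label[g] == label)

labels = StructuredLabels()
labels.add("photo1.jpg", "Cat", "definitely cat")
labels.add("photo2.jpg", "Not Cat", "cat poster")
labels.add("photo3.jpg", "Not Cat", "blogs")
labels.merge("blogs", "cat poster")
print(labels.count("Not Cat"))
```

Because groups survive alongside the final binary labels, the extra structure (which groups exist, which were merged) remains available downstream, as the summary notes.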
Structured labeling improves consistency [CHI 2014]
“At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used.” [Domingos, CACM 2012]
…yet, little guidance or best practices exist.
How do people come up with features?
Look for features used in related domains.
Use intuition or domain knowledge.
Apply automated techniques.
Feature ideation – think of and experiment with custom features (a “black art”).
Proposed Solution – Feature Insight
Support compare and contrast of data.
What makes a cat a cat?
Comparing pairs vs sets?
Comparing Pairs vs Sets
Sets may help people think of generalizable features.
[Diagram: one positive vs. one negative item, versus a set of positives vs. a set of negatives]
Raw data vs visual summaries?
Looking at Raw Data vs. Visual Summaries
Visual summaries may reveal relevant characteristics and hide irrelevant noise.
[Diagram: raw data vs. visual summary]
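One way to build a set-level summary of this kind is to aggregate token frequencies across each set and rank tokens by how strongly they separate the two. A toy sketch with hypothetical data, not Feature Insight's actual visualization:

```python
# Toy sketch: summarize sets of positive vs. negative text examples by
# ranking tokens on their difference in per-document frequency. A set-level
# view like this can surface generalizable features that comparing a single
# pair of examples would miss. Hypothetical data.
from collections import Counter

def set_summary(positives, negatives, top=3):
    pos = Counter(w for doc in positives for w in doc.split())
    neg = Counter(w for doc in negatives for w in doc.split())
    score = {w: pos[w] / len(positives) - neg[w] / len(negatives)
             for w in set(pos) | set(neg)}
    return sorted(score, key=score.get, reverse=True)[:top]

cat_pages = ["cat purrs softly", "my cat naps"]
other_pages = ["dog barks loudly", "the dog runs"]
print(set_summary(cat_pages, other_pages))
```

Tokens frequent across the whole positive set but rare in the negative set rise to the top; noise unique to any one example is averaged away, which is the intuition behind visual summaries hiding irrelevant characteristics.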
[Study: 2×2 conditions crossing Raw Data vs. Visual Summaries with Individual vs. Set Comparison: Raw + Individual, Raw + Set, Visual + Individual, Visual + Set]
[Charts: Classifier Performance (p<.01), Feature Count, Preference Rank (smaller is better, p=.03)]
Findings
Visual summaries led to better features.
Visual summaries were preferred over looking at raw data.
Sets were useful only in combination with visual summaries.
Feature Insight Summary
Feature engineering is arguably the most important step in machine learning, but there is little guidance on feature ideation.
Feature Insight supports error comparison, examination of sets, and visual summaries.
Visual summaries help people create better quality features.
How do people evaluate performance?
Summary statistics: 0.71, 0.67, 0.70, ???

Confusion matrix:
                   Predicted Positive   Predicted Negative
Actual Positive          143                   72
Actual Negative           35                  190
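Reading that confusion matrix into code shows how the familiar summary numbers are derived, and how much they compress: a minimal sketch using the four cell counts from the slide.

```python
# Minimal sketch: deriving summary statistics from the slide's confusion
# matrix. Each statistic collapses the four cells into a single number,
# hiding which individual examples the model gets wrong.
tp, fn = 143, 72   # actual positives: predicted positive / negative
fp, tn = 35, 190   # actual negatives: predicted positive / negative

accuracy = (tp + tn) / (tp + fn + fp + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Two models with identical statistics can still fail on entirely different examples, which is exactly the information these aggregates throw away.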
Summary statistics hide important information about model behavior.
Switching tools to examine data is disruptive and leads to a trial-and-error approach [Patel et al., AAAI 2008].
Example: Predicting Income Levels
Decision Tree: 86% accuracy
Support Vector Machine: 85% accuracy
ModelTracker Demo
Significantly faster and more accurate performance analysis
ModelTracker Common Confusion Matrix
ModelTracker Summary
Current tools for performance analysis and debugging hide a lot of important information about model behavior.
ModelTracker supports estimating performance at multiple levels of granularity while enabling direct access to data.
People are significantly faster and more accurate at performance analysis with ModelTracker.
Collect → Clean → Label → Feature → Train → Tune → Evaluate → Deploy
Many more opportunities to better support machine learning in practice and theory.
Making better sense of data.
Better data means better machine learning.
Most of the influence practitioners have on machine learning is through data.
Many more opportunities!
Better Machine Learning Through Data
Saleema Amershi, [email protected] Teaching GroupMicrosoft Research
August 14, 2016
Thanks! Questions?