How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python PyData November 27th, 2017 [email protected]

Upload: stuart-myles

Post on 22-Jan-2018


TRANSCRIPT

Page 1: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python

How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python

PyData ✤ November 27th, 2017 ✤ [email protected]

Page 2

Classification

Parrots

Sandwiches

Page 3

Tags

Why do you want tags on your text content?

● Search, navigation, recommendations
● Aggregation, routing
● Discoverability
  ○ properties
  ○ relationships

Page 4

Taxonomy

Page 5

Taxonomy

Jordan Larson

<http://cv.ap.org/id/9A7FD8FA87AD4A43BDD522B65147A808> ,
    ap:associatedState <http://cv.ap.org/id/8083[Nebraska]43E>;
    ap:displayLabel "Jordan Larson (Women's volleyball)"@en;
    ap:hometown "Hooper, NE"@en;
    ap:olympicTeam2016 <http://cv.ap.org/id/46[United States Olympic Team]B73H>;
    ap:sport <http://cv.ap.org/id/DA[Volleyball]C8EA>;
    dbprop:birthdate "1986-10-16"^^xsd:date;
    dcterms:created "2012-07-11T14:30:26-04:00"^^xsd:dateTime;
    dcterms:modified "2017-07-25T10:37:49-04:00"^^xsd:dateTime;
    a <http://cv.ap.org/c/ProfessionalAthlete>, skos:Concept;
    skos:broader <http://cv.ap.org/id/384[Professional Athlete]88>;
    skos:definition "American volleyball player."@en;
    skos:inScheme <http://cv.ap.org/a#person>;
    skos:prefLabel "Jordan Larson"@en;
    foaf:gender "Female"@en.

Page 6

Applying taxonomy to text

Manually

Airlines Industry

Pan American Airlines Co.

Travel

Page 7

<Hurricane Harvey>

(AND,
  (MINOC_2,
    (SENT,
      (NOTIN,
        (OR,"Harvey_C","HARVEY_C"),
        (OR,"[Fullname female]","[Fullname male]","[Person]")),
      (OR,"texas","landfall","storm","hurricane","nws","National weather service",
          "evacuate@","surge@","flood@","rain@N","coastal","sandbag@N"...
      )
    )
  )...

Applying taxonomy to text

Rules-based classifier

https://www.flickr.com/photos/notionscapital/15556898221/

Page 8

Applying taxonomy to text

Statistical classifier

Training data → Training engine → Trained model
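The training flow on this slide can be sketched in a few lines. This is an illustration only, not the talk's implementation: a tiny multinomial Naive Bayes written in plain Python stands in for the training engine, and the four training sentences, the `dragons` and `politics` tags, and the names `train` and `classify` are all invented for the example.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Training engine: turn (text, tag) pairs into a trained model (a plain dict)."""
    word_counts = defaultdict(Counter)  # tag -> word frequencies
    doc_counts = Counter()              # tag -> number of training documents
    vocab = set()
    for text, tag in examples:
        words = text.lower().split()
        word_counts[tag].update(words)
        doc_counts[tag] += 1
        vocab.update(words)
    return {"word_counts": dict(word_counts), "doc_counts": doc_counts, "vocab": vocab}


def classify(model, text):
    """Return the tag with the highest Naive Bayes log-probability for the text."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_tag, best_score = None, -math.inf
    for tag, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][tag]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # class prior
        for word in text.lower().split():
            if word not in model["vocab"]:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for words unseen in this tag.
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag


# Hypothetical training data: correctly tagged example documents.
TRAINING_DATA = [
    ("the dragon breathed fire over the castle", "dragons"),
    ("a winged dragon guarded its hoard of gold", "dragons"),
    ("the senate passed the budget bill", "politics"),
    ("voters elected a new senate majority", "politics"),
]

MODEL = train(TRAINING_DATA)
```

Calling `classify(MODEL, "a dragon guarded the castle")` then tags a new document with the trained model. A real system would more likely use a library such as scikit-learn, which the talk mentions later.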

Page 9

AP Metadata Services

Tag with AP taxonomy

APMS Custom Tagging

Simple four-step REST API

Add your own tags and taxonomy

Page 10

Let’s create a classifier! For dragons

What if I like the AP Taxonomy but I want to classify with some additional tags?

In this case, documents about dragons

Page 11

A taxonomy of dragons

(borrowed from screencrush.com)

New documents about dragons

To be classified

Page 12

A map (with some * )

A fully automated workflow for training and deploying a Lambda-based classifier

Sadly, the expression hic sunt dracones (here be dragons) is an anachronism, but it does appear at least once, on the Hunt-Lenox globe (ca 1510).

The Hunt-Lenox Globe (NYPL)

* Dragon emojis indicate problems found and (mostly) solved

Page 13

Creating a classifier

[Architecture diagram]

Client
Workflow: Step Functions
Scaling: Auto Scaling, EC2
Worker: EC2 (Download training data, Download dependencies, Train model, Deploy model)
Classifier: classifier.py, classifier.pkl, tags.json, API Gateway, Lambda

Page 14

A Lambda-based classifier

• AWS Lambda: run event-driven code without provisioning or managing servers
  • Cost-efficient solution to ensure capacity meets demand
• What do we need?
  • Code to invoke classifier and return results to user
  • Code dependencies (e.g. scikit-learn)
  • Other supporting artifacts (the trained model, the taxonomy)
  • Permissions for Lambda function to interact with other AWS services
  • API endpoint for accessing Lambda function
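The first item on that list, code to invoke the classifier and return results, might look roughly like the handler below. It is a sketch under assumptions: the event shape is the one API Gateway's Lambda proxy integration sends (a JSON string in `body`), the request field `text` is made up, and `KeywordModel` is a hypothetical stand-in for the unpickled `classifier.pkl` so that the example runs without any dependencies.

```python
import json


class KeywordModel:
    """Hypothetical stand-in for the trained model loaded from classifier.pkl."""

    def predict(self, texts):
        return ["dragons" if "dragon" in t.lower() else "other" for t in texts]


# Loaded once per container, outside the handler, so warm invocations reuse it.
# A real deployment would unpickle classifier.pkl here instead.
MODEL = KeywordModel()


def _response(status, payload):
    """Build a response in the shape API Gateway's Lambda proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def lambda_handler(event, context):
    """Classify the text in the request body and return the predicted tags."""
    try:
        payload = json.loads(event.get("body") or "{}")
        text = payload["text"]  # assumed request field name
    except (ValueError, KeyError, TypeError):
        return _response(400, {"error": "expected a JSON body with a 'text' field"})
    tags = MODEL.predict([text])
    return _response(200, {"tags": list(tags)})
```

The dependencies, artifacts, permissions and API endpoint from the list are deployment concerns and are not shown here.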

Page 15

Processing user requests

[Architecture diagram repeated from page 13: Client; Workflow (Step Functions); Scaling (Auto Scaling); Worker (EC2); Classifier (API Gateway, Lambda)]

Page 16

Processing user requests

Validate and train

Adding complexity: a workflow for algorithm selection

AWS Step Functions: use visual workflows to coordinate microservices into a single application

Triggers auto-scaling, sends training request to worker in the cloud.
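A Step Functions workflow is defined in the Amazon States Language, a JSON document of named states. The sketch below builds such a definition as a Python dict for a validate, scale, then train sequence. The state names, the account ID and the Lambda function names in the ARNs are placeholders invented for illustration. The talk's actual state machine is not shown in the slides.

```python
import json

# Placeholder ARN prefix; the account ID and function names are hypothetical.
ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"


def task(function_name, next_state=None):
    """Build a Task state that invokes a Lambda function, then moves on or ends."""
    state = {"Type": "Task", "Resource": ARN_PREFIX + function_name}
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


STATE_MACHINE = {
    "Comment": "Validate a request, scale up workers, send the training request",
    "StartAt": "ValidateRequest",
    "States": {
        "ValidateRequest": task("validate-request", "TriggerAutoScaling"),
        "TriggerAutoScaling": task("trigger-auto-scaling", "SendTrainingRequest"),
        "SendTrainingRequest": task("send-training-request"),
    },
}


def execution_order(machine):
    """Follow the Next links from StartAt to list states in the order they run."""
    order, name = [], machine["StartAt"]
    while name is not None:
        order.append(name)
        name = machine["States"][name].get("Next")
    return order


# The JSON text is what would be supplied as the state machine definition.
DEFINITION_JSON = json.dumps(STATE_MACHINE, indent=2)
```

Keeping the definition as data makes it easy to check, for example with `execution_order`, before it is handed to AWS.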

Page 17

Training and deploying

[Architecture diagram repeated from page 13: Client; Workflow (Step Functions); Scaling (Auto Scaling); Worker (EC2); Classifier (API Gateway, Lambda)]

Page 18

Training in the cloud

• AWS EC2: scalable computing capacity in the cloud
• Register an Amazon Machine Image (AMI) specifically for training
  • Speeds up provisioning your server
  • Ensures versions match between dependencies and your model
• Prepare dependencies ahead of time to beat AWS Lambda’s size limits
  • If you are using scikit-learn, sklearn-build-lambda can generate an appropriately sized zip
• Save model and taxonomy to disk, add to dependency zip
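The last step, saving the model and taxonomy and adding them to the dependency zip, could be done with the standard library alone. The file names `classifier.pkl` and `tags.json` come from the architecture diagram. The function name `add_artifacts` and its arguments are assumptions made for this sketch.

```python
import json
import pickle
import zipfile
from pathlib import Path


def add_artifacts(bundle_zip, model, tags, workdir):
    """Save the model and taxonomy to disk, then append both to the dependency zip."""
    workdir = Path(workdir)
    model_path = workdir / "classifier.pkl"
    tags_path = workdir / "tags.json"

    with open(model_path, "wb") as fh:
        pickle.dump(model, fh)
    tags_path.write_text(json.dumps(tags), encoding="utf-8")

    # Mode "a" appends to an existing zip (for example one holding the prebuilt
    # dependencies) and creates the archive if it does not exist yet.
    with zipfile.ZipFile(bundle_zip, "a", zipfile.ZIP_DEFLATED) as bundle:
        bundle.write(model_path, arcname=model_path.name)
        bundle.write(tags_path, arcname=tags_path.name)
        return bundle.namelist()
```

Because the model is pickled on the training machine and unpickled inside Lambda, the library versions on both sides need to match, which is the point the AMI bullet makes.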

Page 19

Automating deployments

• Serverless Framework: Node.js application for rapid deployment of serverless architectures
• Simplifies the task of creating (and deleting) our classifier Lambdas
  • Provider agnostic, though you may not be
  • Zip artifact support for Lambda creation

Page 20

Classifying with AWS Lambda

[Architecture diagram repeated from page 13: Client; Workflow (Step Functions); Scaling (Auto Scaling); Worker (EC2); Classifier (API Gateway, Lambda)]

Page 21

Classifying with AWS Lambda

• Be mindful of cold starts
  • Allocating more memory may help
• Store large models in S3 and take advantage of container reuse
  • Download assets to /tmp
  • Check /tmp for cached data before invocation

Item                                   Limit
Deployment package (compressed)        50 MB
Deployment package (uncompressed)      250 MB
Non-persistent disk space in /tmp      500 MB

Page 22

How do I measure results?

Confusion matrix

                  Predicted Eagles   Predicted Doves   Predicted Pigeons   Sum of items = 300
Actual Eagles            95                  3                  2          100 Eagles
Actual Doves              3                 72                 25          100 Doves
Actual Pigeons            2                 23                 75          100 Pigeons

Page 23

How do I measure results?


Measure your model’s performance per class

• Precision (number of correct predictions for a class divided by the total number of items predicted as that class)
• Recall (number of correct predictions for a class divided by the total number of items actually in that class)

                  Predicted Eagles   Predicted Doves   Predicted Pigeons   Sum of items = 300
Actual Eagles            95                  3                  2          100 Eagles
Actual Doves              3                 72                 25          100 Doves
Actual Pigeons            2                 23                 75          100 Pigeons

Model accuracy (correct predictions divided by the total number of items):

242 / 300 ≈ 80.7%
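The per-class figures can be worked out directly from the confusion matrix on this slide. The sketch below uses the slide's numbers, with rows as actual classes and columns as predicted classes. The function names are chosen for this example.

```python
LABELS = ["Eagles", "Doves", "Pigeons"]

# Rows are actual classes, columns are predicted classes (values from the slide).
MATRIX = [
    [95, 3, 2],    # actual Eagles
    [3, 72, 25],   # actual Doves
    [2, 23, 75],   # actual Pigeons
]


def per_class_metrics(matrix, labels):
    """Return precision and recall for each class of a square confusion matrix."""
    metrics = {}
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        predicted_as_class = sum(row[i] for row in matrix)  # column total
        actually_in_class = sum(matrix[i])                  # row total
        metrics[label] = {
            "precision": true_positives / predicted_as_class if predicted_as_class else 0.0,
            "recall": true_positives / actually_in_class if actually_in_class else 0.0,
        }
    return metrics


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all items."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


METRICS = per_class_metrics(MATRIX, LABELS)
```

For this matrix, recall is 95/100, 72/100 and 75/100 for Eagles, Doves and Pigeons, while precision is 95/100, 72/98 and 75/102. The overall accuracy hides how often Doves and Pigeons are confused with each other.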

Page 24

How do I improve results?

Training data

• Correctly tagged - quality matters
• Quantity matters too - as long as it’s ‘good’ data!
• Balanced training sets across classes

Page 25

How do I improve results?

Taxonomy

• Clean taxonomy nodes and structure
• Distinct semantics, use relationships
• Avoid overlapping concepts between nodes
