Real-World Smart Applications with Amazon Machine Learning

Alex Ingerman, Sr. Manager, Tech. Product Management, Amazon Machine Learning

2/25/2016

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Agenda

• Why social media + machine learning = happy customers

• Using Amazon ML to find important social media conversations

• Building an end-to-end application to act on these conversations

Application details

Goal: build a smart application for social media listening in the cloud

Full source code and documentation are on GitHub:

http://bit.ly/AmazonMLCodeSample

Amazon Kinesis, AWS Lambda, Amazon Machine Learning, Amazon SNS, Amazon Mechanical Turk

Motivation for listening to social media

• Customer is reporting a possible service issue

• Customer is making a feature request

• Customer is angry or unhappy

• Customer is asking a question

Why do we need machine learning for this?

The social media stream is high-volume, and most of the messages are not CS-actionable

Amazon Machine Learning in one slide

• Easy to use, managed machine learning service built for developers

• Robust, powerful machine learning technology based on Amazon’s internal systems

• Create models using your data already stored in the AWS cloud

• Deploy models to production in seconds

Formulating the problem

We would like to…

Instantly find new tweets mentioning @awscloud, ingest and analyze each one to predict whether a customer service agent should act on it, and, if so, send that tweet to customer service agents.

Twitter API → Amazon Kinesis → AWS Lambda → Amazon Machine Learning → Amazon SNS

Building smart applications

1. Pick the ML strategy

2. Prepare dataset

3. Create ML model

4. Write and configure code

5. Try it out!

Picking the machine learning strategy

Question we want to answer:

Is this tweet customer service-actionable, or not?

Our dataset:

Text and metadata from past tweets mentioning @awscloud

Machine learning approach:

Create a binary classification model to answer a yes/no question, and provide a confidence score

Retrieve past tweets

Twitter API can be used to search for tweets containing our company’s handle (e.g., @awscloud)

import twitter

twitter_api = twitter.Api(**twitter_credentials)
twitter_handle = 'awscloud'
search_query = '@' + twitter_handle + ' -from:' + twitter_handle
results = twitter_api.GetSearch(term=search_query, count=100, result_type='recent')

# We can go further back in time by issuing additional search requests
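One way to issue those additional search requests is to page backwards with the search API's max_id parameter. The sketch below is only an illustration under stated assumptions: the helper name fetch_older_tweets and its pages/page_size arguments are hypothetical, and the search callable is assumed to accept the same keyword arguments as python-twitter's GetSearch and to return objects with an id attribute.

```python
def fetch_older_tweets(search, query, pages=3, page_size=100):
    """Collect up to `pages` pages of results, walking backwards in time.

    `search` is any callable that accepts the keyword arguments used below
    (term, count, result_type, max_id), e.g. twitter_api.GetSearch.
    """
    collected = []
    max_id = None  # None means "start from the newest tweet"
    for _ in range(pages):
        batch = search(term=query, count=page_size,
                       result_type='recent', max_id=max_id)
        if not batch:
            break  # no older tweets are available
        collected.extend(batch)
        # max_id is inclusive, so subtract 1 to ask only for older tweets.
        max_id = min(status.id for status in batch) - 1
    return collected
```

With the objects from the slide, this could be called as fetch_older_tweets(twitter_api.GetSearch, search_query).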

Good news: data is well-structured and clean

Bad news: tweets are not categorized (labeled) for us

Labeling past tweets

Why label tweets?

(Many) machine learning algorithms work by discovering patterns connecting data points and labels

How many tweets need to be labeled?

Several thousand to start with

Can I pay someone to do this?

Yes! Amazon Mechanical Turk is a marketplace for tasks that require human intelligence

Creating the Mechanical Turk task

Publishing the task

Preview labeling results

Sample tweets from our previously collected dataset + their labels

The label column was created from Mechanical Turk responses

Preview labeling results

Sample tweets and labels (most metadata fields removed for clarity)

Amazon ML process, in a nutshell

1. Create your datasources
Two API calls to create your training and evaluation data
Sanity-check your data in the service console

2. Create your ML model
One API call to build a model, with smart defaults or custom settings

3. Evaluate your ML model
One API call to compute your model’s quality metric

4. Adjust your ML model
Use the console to align performance trade-offs to your business goals

Create the data schema string

{
    "dataFileContainsHeader": true,
    "dataFormat": "CSV",
    "targetAttributeName": "trainingLabel",
    "attributes": [
        {
            "attributeName": "description",
            "attributeType": "TEXT"
        },
        <additional attributes here>,
        {
            "attributeName": "trainingLabel",
            "attributeType": "BINARY"
        }
    ]
}

Schemas communicate metadata about your dataset:

• Data format

• Attributes’ names, types, and order

• Names of special attributes
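The schema string can also be generated rather than typed by hand. The following is a minimal sketch, not the sample application's code: build_schema is a hypothetical helper, and the extra attributes (text, followers_count) and their types are assumptions chosen for illustration.

```python
import json

def build_schema(feature_attributes, target='trainingLabel'):
    """Return a data schema string for a CSV file that has a header row.

    `feature_attributes` is a list of (name, type) pairs in column order;
    the binary target attribute is appended as the last column.
    """
    attributes = [{'attributeName': name, 'attributeType': kind}
                  for name, kind in feature_attributes]
    attributes.append({'attributeName': target, 'attributeType': 'BINARY'})
    return json.dumps({
        'dataFileContainsHeader': True,
        'dataFormat': 'CSV',
        'targetAttributeName': target,
        'attributes': attributes,
    }, indent=2)

# Attribute names other than "description" are illustrative assumptions.
data_schema = build_schema([('description', 'TEXT'),
                            ('text', 'TEXT'),
                            ('followers_count', 'NUMERIC')])
```

Building the string with json.dumps avoids quoting mistakes and keeps the attribute order aligned with the CSV columns.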

Create the training datasource

import boto

ml = boto.connect_machinelearning()

data_spec = {
    'DataLocationS3': s3_uri,      # E.g.: s3://my-bucket/dir/data.csv
    'DataSchema': data_schema }    # Schema string (previous slide)

# Use only the first 70% of the datasource for training.
data_spec['DataRearrangement'] = '{ "splitting": {"percentBegin": 0, "percentEnd": 70 } }'

ml.create_data_source_from_s3( data_source_id = "ds-tweets-train",
                               data_source_name = "Tweet training data (70%)",
                               data_spec = data_spec,
                               compute_statistics = True)

Create the evaluation datasource

import boto

ml = boto.connect_machinelearning()

data_spec = {
    'DataLocationS3': s3_uri,      # E.g.: s3://my-bucket/dir/data.csv
    'DataSchema': data_schema }    # Schema string (previous slide)

# Use the last 30% of the datasource for evaluation.
data_spec['DataRearrangement'] = '{ "splitting": {"percentBegin": 70, "percentEnd": 100 } }'

ml.create_data_source_from_s3( data_source_id = "ds-tweets-eval",
                               data_source_name = "Tweet evaluation data (30%)",
                               data_spec = data_spec,
                               compute_statistics = True)
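Because the training and evaluation specs differ only in their DataRearrangement string, both can be derived from one place so the two ranges never overlap. This is a sketch under assumptions: split_rearrangement and train_eval_specs are hypothetical helper names, and the 70/30 split point is simply the value used on the slides.

```python
import json

def split_rearrangement(percent_begin, percent_end):
    """Build the DataRearrangement JSON string for one slice of the data."""
    return json.dumps({'splitting': {'percentBegin': percent_begin,
                                     'percentEnd': percent_end}})

def train_eval_specs(s3_uri, data_schema, train_percent=70):
    """Return (training_spec, evaluation_spec) covering disjoint slices."""
    base = {'DataLocationS3': s3_uri, 'DataSchema': data_schema}
    train = dict(base, DataRearrangement=split_rearrangement(0, train_percent))
    evaluation = dict(base, DataRearrangement=split_rearrangement(train_percent, 100))
    return train, evaluation
```

Each returned dictionary can be passed as the data_spec argument of create_data_source_from_s3.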

Visually inspecting training data

Create the ML model

import boto

ml = boto.connect_machinelearning()

ml.create_ml_model( ml_model_id = "ml-tweets",
                    ml_model_name = "Tweets screening model",
                    ml_model_type = "BINARY",
                    training_data_source_id = "ds-tweets-train")

Input data location is looked up from the training datasource ID

Default model parameters and automatic data transformations are used, or you can provide your own

Evaluate the ML model

import boto

ml = boto.connect_machinelearning()

ml.create_evaluation( evaluation_id = "ev-tweets",
                      evaluation_name = "Evaluation of tweet screening model",
                      ml_model_id = "ml-tweets",
                      evaluation_data_source_id = "ds-tweets-eval")

Input data location is looked up from the evaluation datasource ID

Amazon ML automatically selects and computes an industry-standard evaluation metric based on your ML model type
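For binary models the metric reported is AUC (area under the ROC curve). To build intuition for what that number measures, here is a small pure-Python sketch. It is not how the service computes the metric; the pairwise approach is chosen only for readability, and the function name auc is an illustrative choice.

```python
def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    Equals the probability that a randomly chosen positive example is
    scored higher than a randomly chosen negative one (ties count half).
    Quadratic in the sample size, so only suitable for small samples.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError('need at least one example of each class')
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to random ranking and 1.0 to a model that ranks every actionable tweet above every non-actionable one.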

Visually inspecting and adjusting the ML model
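Adjusting a binary model mostly means choosing the score threshold above which a tweet is treated as actionable. The sketch below shows the trade-off on a labeled sample. It assumes that scores at or above the threshold are predicted as 1, and precision_recall is a hypothetical helper rather than part of any AWS SDK.

```python
def precision_recall(labels, scores, threshold):
    """Return (precision, recall) when scores >= threshold are predicted as 1."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(labels, predicted))
    true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
    false_pos = sum(1 for y, p in pairs if y == 0 and p == 1)
    false_neg = sum(1 for y, p in pairs if y == 1 and p == 0)
    # Guard against division by zero when nothing is predicted or present.
    precision = true_pos / float(true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / float(true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall
```

Lowering the threshold forwards more tweets to agents (higher recall, more false alarms); raising it forwards fewer but more reliable ones.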

Reminder: Our data flow

Twitter API → Amazon Kinesis → AWS Lambda → Amazon Machine Learning → Amazon SNS

Create an Amazon ML endpoint for retrieving real-time predictions

import boto

ml = boto.connect_machinelearning()

ml.create_realtime_endpoint("ml-tweets")

# Endpoint information can be retrieved using the get_ml_model() method. Sample output:
# "EndpointInfo": {
#     "CreatedAt": 1424378682.266,
#     "EndpointStatus": "READY",
#     "EndpointUrl": "https://realtime.machinelearning.us-east-1.amazonaws.com",
#     "PeakRequestsPerSecond": 200}
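The endpoint is not usable until its status is READY, so a caller may want to poll before sending predictions. This is a minimal sketch assuming the response shape shown in the sample output above; wait_for_endpoint and its attempts, delay_seconds and sleep parameters are hypothetical.

```python
import time

def wait_for_endpoint(get_model, model_id, attempts=20, delay_seconds=5,
                      sleep=time.sleep):
    """Poll until the model's real-time endpoint is READY and return its URL.

    `get_model` is any callable that takes a model ID and returns a dict
    containing an "EndpointInfo" entry, e.g. ml.get_ml_model.
    """
    for _ in range(attempts):
        info = get_model(model_id).get('EndpointInfo', {})
        if info.get('EndpointStatus') == 'READY':
            return info['EndpointUrl']
        sleep(delay_seconds)  # endpoint still being created; try again later
    raise RuntimeError('endpoint for %s was not READY in time' % model_id)
```

With the client from the slide this could be called as wait_for_endpoint(ml.get_ml_model, "ml-tweets").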

Create an Amazon Kinesis stream for receiving tweets

import boto

kinesis = boto.connect_kinesis()

kinesis.create_stream(stream_name = 'tweetStream', shard_count = 1)

# Each open shard can support up to 5 read transactions per second, up to a
# maximum total of 2 MB of data read per second. Each shard can support up to
# 1000 records written per second, up to a maximum total of 1 MB data written
# per second.
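Once the stream exists, the component that reads from the Twitter API has to write each tweet into it. A hedged sketch of that step: send_tweet is a hypothetical helper, the sid field is assumed to hold the tweet ID, and the client is assumed to expose put_record(stream_name, data, partition_key) as the boto Kinesis connection does.

```python
import json

def send_tweet(kinesis, tweet, stream_name='tweetStream'):
    """Serialize one tweet as JSON and write it to the Kinesis stream.

    The tweet ID is used as the partition key so that records spread
    across shards if more shards are added later.
    """
    data = json.dumps(tweet)
    partition_key = str(tweet['sid'])  # "sid" as the tweet ID is an assumption
    return kinesis.put_record(stream_name, data, partition_key)
```

Passing the client in as an argument keeps the function easy to exercise with a stand-in object before pointing it at a real stream.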

Set up AWS Lambda to coordinate the data flow

The Lambda function is our application’s backbone. We will:

1. Write the code that will process and route tweets

2. Configure the Lambda execution policy (what is it allowed to do?)

3. Add the Kinesis stream as the data source for the Lambda function

Create Lambda functions

// These are our function’s signatures and globals only. See GitHub repository for full source.
var AWS = require('aws-sdk');

var ml = new AWS.MachineLearning();
var endpointUrl = '';
var mlModelId = 'ml-tweets';
var snsTopicArn = 'arn:aws:sns:{region}:{awsAccountId}:{snsTopic}';
var snsMessageSubject = 'Respond to tweet';
var snsMessagePrefix = 'ML model '+mlModelId+': Respond to this tweet: https://twitter.com/0/status/';

var processRecords = function() {…} // Base64 decode the Kinesis payload and parse JSON
var callPredict = function(tweetData) {…} // Call Amazon ML real-time prediction API
var updateSns = function(tweetData) {…} // Publish CS-actionable tweets to SNS topic
var checkRealtimeEndpoint = function(err, data) {…} // Get Amazon ML endpoint URI
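The overall control flow behind these signatures can be sketched in a few lines. The version below is written in Python purely for illustration and is not the sample's Node.js source: decode_records and route_tweets are hypothetical names, the predict and publish callables stand in for the Amazon ML and SNS calls, and the label "1" is assumed to mean CS-actionable.

```python
import base64
import json

def decode_records(event):
    """Base64-decode each Kinesis record in a Lambda event and parse its JSON."""
    tweets = []
    for record in event['Records']:
        payload = base64.b64decode(record['kinesis']['data'])
        tweets.append(json.loads(payload.decode('utf-8')))
    return tweets

def route_tweets(tweets, predict, publish):
    """Call `predict` for each tweet and `publish` the ones labeled "1".

    Returns the number of tweets that were forwarded.
    """
    forwarded = 0
    for tweet in tweets:
        prediction = predict(tweet)['Prediction']
        if prediction['predictedLabel'] == '1':
            publish(tweet)
            forwarded += 1
    return forwarded
```

Keeping decoding separate from routing mirrors the processRecords / callPredict / updateSns split above.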

Configure Lambda execution policy

{ "Statement": [
    { "Action": [ "logs:*" ],
      "Effect": "Allow",
      "Resource": "arn:aws:logs:{region}:{awsAccountId}:log-group:/aws/lambda/{lambdaFunctionName}:*"
    },
    { "Action": [ "sns:publish" ],
      "Effect": "Allow",
      "Resource": "arn:aws:sns:{region}:{awsAccountId}:{snsTopic}"
    },
    { "Action": [ "machinelearning:GetMLModel", "machinelearning:Predict" ],
      "Effect": "Allow",
      "Resource": "arn:aws:machinelearning:{region}:{awsAccountId}:mlmodel/{mlModelId}"
    },
    { "Action": [ "kinesis:ReadStream", "kinesis:GetRecords", "kinesis:GetShardIterator", "kinesis:DescribeStream", "kinesis:ListStreams" ],
      "Effect": "Allow",
      "Resource": "arn:aws:kinesis:{region}:{awsAccountId}:stream/{kinesisStream}"
    }
] }

• Statement 1 (logs:*): Allow request logging in Amazon CloudWatch

• Statement 2 (sns:publish): Allow publication of notifications to SNS topic

• Statement 3 (machinelearning:GetMLModel, machinelearning:Predict): Allow calls to Amazon ML real-time prediction APIs

• Statement 4 (kinesis:*): Allow reading of data from Kinesis stream

Connect Kinesis stream and Lambda function

import boto

aws_lambda = boto.connect_awslambda()

aws_lambda.add_event_source(
    event_source = 'arn:aws:kinesis:' + region + ':' + aws_account_id + ':stream/' + "tweetStream",
    function_name = "process_tweets",
    role = 'arn:aws:iam::' + aws_account_id + ':role/' + lambda_execution_role)

Amazon ML real-time predictions test

Here is a tweet, as a JSON blob:

{
    "statuses_count": "8617",
    "description": "Software Developer",
    "friends_count": "96",
    "text": "`scala-aws-s3` A Simple Amazon #S3 Wrapper for #Scala 1.10.20 available : https://t.co/q76PLTovFg",
    "verified": "False",
    "geo_enabled": "True",
    "uid": "3800711",
    "favourites_count": "36",
    "screen_name": "turutosiya",
    "followers_count": "640",
    "user.name": "Toshiya TSURU",
    "sid": "647222291672100864"
}

Let’s use the AWS Command Line Interface to request a prediction for this tweet:

aws machinelearning predict \
    --predict-endpoint https://realtime.machinelearning.us-east-1.amazonaws.com \
    --ml-model-id ml-tweets \
    --record '<json_blob>'

{
    "Prediction": {
        "predictedLabel": "0",
        "predictedScores": {
            "0": 0.012336540967226028
        },
        "details": {
            "PredictiveModelType": "BINARY",
            "Algorithm": "SGD"
        }
    }
}
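An application consuming this response usually needs just two things: the raw score and a yes/no decision. The sketch below assumes the response shape shown above, where predictedScores holds a single raw score keyed by the predicted label. The function name is_actionable and the default threshold of 0.5 are illustrative assumptions.

```python
def is_actionable(response, threshold=0.5):
    """Return (decision, score) for a binary-model prediction response.

    Assumes `predictedScores` contains exactly one entry: the model's raw
    score, where higher values mean "more likely to need a response".
    """
    prediction = response['Prediction']
    score = list(prediction['predictedScores'].values())[0]
    return score >= threshold, score

sample_response = {
    'Prediction': {
        'predictedLabel': '0',
        'predictedScores': {'0': 0.012336540967226028},
        'details': {'PredictiveModelType': 'BINARY', 'Algorithm': 'SGD'},
    }
}
decision, score = is_actionable(sample_response)
```

For the sample tweet the score is far below the threshold, which matches the predicted label "0": a library announcement is not something a customer service agent needs to act on.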

Recap: Our application’s data flow

Twitter API → Amazon Kinesis → AWS Lambda → Amazon Machine Learning → Amazon SNS

End-to-end application demo

Generalizing to more feedback channels

Amazon Kinesis → AWS Lambda → Model 1 / Model 2 / Model 3 → Amazon SNS
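One simple way to put several models behind the same Lambda function is a lookup from feedback channel to model ID. Everything in this sketch is hypothetical: the channel field, the channel names, and every model ID other than ml-tweets are invented for illustration and do not come from the sample application.

```python
def pick_model(record, models_by_channel, default_model=None):
    """Choose the model ID to score a record with, based on its channel."""
    channel = record.get('channel')
    model_id = models_by_channel.get(channel, default_model)
    if model_id is None:
        raise KeyError('no model configured for channel %r' % channel)
    return model_id

# Hypothetical mapping; only "ml-tweets" appears in the original slides.
MODELS_BY_CHANNEL = {
    'twitter': 'ml-tweets',
    'forum': 'ml-forum-posts',
    'email': 'ml-support-email',
}
```

The rest of the pipeline stays unchanged: the chosen model ID is what gets passed to the real-time prediction call.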

What’s next?

Try the service:

http://aws.amazon.com/machine-learning/

Download the Social Media Listening application code:

http://bit.ly/AmazonMLCodeSample

Get in touch!

[email protected]

Thank you!