
Lab 0 – Do this at the start of the workshop

1. Download and extract content.zip from https://s3.amazonaws.com/mae303/content/content.zip

2. In the Management Console, in the upper right section, choose the N. Virginia region.

3. In the Management Console, navigate to CloudFormation

4. Click on

5. When choosing a template, navigate to the extracted content.zip /template folder and choose "media-analysis-deploy.template"

6. Click on

7. Name the Stack “media-analysis”

8. For the email parameter, enter an email address; this is where the Amazon Cognito temporary password will be sent.

9. Leave Elasticsearch at the default setting of "Small", then click on

10. Leave the options at their defaults, then click on

11. On the Review page, check the box "I acknowledge that AWS CloudFormation might create IAM resources with custom names," then click on

12. In the CloudFormation console, click on Refresh and review the stack creation to see the components of the Media Analysis Solution get created.

13. In approximately 12-20 minutes you will get an email with the subject "Your Media Analysis Solution demo login", which will contain a link to your serverless Media Analysis Solution website, with temporary credentials provided by Amazon Cognito. Note: the link can also be found in the CloudFormation stack output under the key "DemoWebsite".


Lab 1

1. This lab utilizes videos that have audio; please lower your volume and/or use headphones for the video components.

2. Go to the link in the email "Your Media Analysis Solution demo login" that you received from the Media Analysis Solution. Upon logging in, create your password and log in to the website. NOTE: the login for the website will not be your email address; the username will look similar to your email address with the @ replaced by a period (.).

3. On the Media Analysis Solution website, Click on

4. On the right side of the page, under "Add to Collection", click on , select Jane_Smith.jpg from the workshop content /media folder, and name the indexed face "Jane_Smith". Click on . Here we are adding a custom face that will be indexed and detected in the content we upload. Note: the site will not report that the upload is complete when the face is indexed. If you want to validate that the face has been indexed, you can use the CLI and run aws rekognition describe-collection --collection-id <<insert-your-collection-id>>

5. On the left side of the page, under "Analyze New Media", click on and select AlexaSuperBowlCommercial.mp4 from the workshop content. Click on ; you will see a status page that updates as the Step Function runs.

6. This process will run for several minutes on the "Media Analysis Progress" page. While it runs, we will review the state machine: click on , which will take you to AWS Step Functions, and review the Step Functions branches. Note: we will customize this later in the workshop.


7. When Step Functions completes, return to the Media Analysis Solution serverless website via the link from the email and click on . Note: the link can also be found in the CloudFormation stack output under the key "DemoWebsite".

8. Select under the video you uploaded; the tab will be populated automatically. Scroll over some of the labels. Note: the confidence score of each label is shown when you hover over it. This is the confidence score returned by Amazon Rekognition.

9. Select to play the video. While it is playing, click on and/or ; note that the label you select is highlighted in the video.

10. Click on to restart the video and click on ; note that the indexed face you uploaded earlier (Jane_Smith) is highlighted in the video.

11. Click on to restart the video and click on ; note that the captions are provided by Amazon Transcribe.

12. Click on the tab below the video to see the full transcript. Note that it may have some inaccuracies; these will likely have lower confidence scores from Amazon Transcribe. This highlights the need for human verification/correction of low-confidence words. We will correct this in step 16.

13. In the AWS Management Console, go to CloudFormation. Click on the media-analysis stack (it will have the description "(SO0042) Media Analysis Solution"), select , and click on the bucket listed under resource MediaAnalysisBucket (it should be media-analysis-us-east-1-ACCOUNTNUMBER).

14. In the media analysis bucket, navigate to private, then into the only folder listed (it should look like us-east-1:randomguid; this correlates to your Cognito user), then into media, then into the single random GUID folder (this correlates to the single video you uploaded), then into the results folder. The full path will look something like media-analysis-us-east-1-xxxxxxxxxxxxx/private/us-east-1:xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/media/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx/results. Note: if you uploaded multiple videos, there will be multiple GUIDs; to validate that you are browsing to the correct folder, compare it against the media analysis website: after you browse to a video to view its results, the GUID is appended to the URL after /result and corresponds to the GUID of the folder. In that folder, you will find all of the JSON results that were returned by Amazon Rekognition, Amazon Transcribe, and Amazon Comprehend.

15. Click on transcript.json and click on ; open it with a notepad or JSON editor. Note: there is a full transcript at the top, followed by the individual words, including timecode, confidence, punctuation, and alternatives. A high-confidence result will look similar to this:

{"start_time":"0.174","end_time":"0.674","alternatives":[{"confidence":"0.9918","content":"Austin"}],"type":"pronunciation"},{"alternatives":[{"confidence":null,"content":","}],"type":"punctuation"}

while a low-confidence result will look similar to this:

{"start_time":"4.066","end_time":"4.536","alternatives":[{"confidence":"0.2283","content":"Klaxon"}],"type":"pronunciation"}

To increase the confidence of known words, you can create Custom Vocabularies; see https://docs.aws.amazon.com/transcribe/latest/dg/what-is-transcribe.html#what-custom-vocabulary for more detail. To deal with low-confidence words, you could flag them to be validated by a human, or pass them automatically to SageMaker Ground Truth or Amazon Mechanical Turk for human validation (a minimal sketch of such flagging follows this lab's steps).

16. Upload transcript.json from the content.zip/results folder, replacing the transcript.json that is currently there. This is a corrected transcript.json, with the low-confidence words fixed.

17. Return to the media solution website and click on your uploaded video from the section.

18. Click on to restart the video and click on ; note that the captions have been corrected.
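If you want to automate finding the low-confidence words called out in step 12, the following minimal Node.js sketch scans a downloaded transcript.json and prints every word below a threshold for human review. It is not part of the Media Analysis Solution; the file name, the 0.5 cutoff, and reading word items from results.items (the usual Amazon Transcribe output layout) are assumptions for illustration.

// flagLowConfidence.js - illustrative sketch, not part of the Media Analysis Solution
var fs = require('fs');

var THRESHOLD = 0.5; // assumed cutoff; tune to your accuracy needs

// Amazon Transcribe output normally keeps word-level detail under results.items
var transcript = JSON.parse(fs.readFileSync('transcript.json', 'utf8'));

transcript.results.items.forEach(function(item) {
  if (item.type !== 'pronunciation') { return; } // skip punctuation entries
  var best = item.alternatives[0];
  if (parseFloat(best.confidence) < THRESHOLD) {
    console.log(best.content + ' (' + item.start_time + 's to ' + item.end_time +
      's): confidence ' + best.confidence + ' - flag for human review');
  }
});

The words printed by a sketch like this are the candidates you would route to SageMaker Ground Truth or Mechanical Turk, as described in step 15.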


Bonus:

1. Click on the tab below the video to see all celebrities detected. Note that some of the celebrities will be detected inaccurately; these will likely have lower confidence scores from Amazon Rekognition, and this highlights the need for human verification/correction of low-confidence results. We will view the confidence scores and correct this in step 5.

2. In the media analysis bucket from Lab 1 step 14, click on celebs.json and click on ; open it with a notepad or JSON editor. Note: it contains details of every celebrity detected in the video, including timecode, confidence, bounding box, and details on facial landmarks and the pitch of the face. This is the result of Amazon Rekognition celebrity recognition. A high-confidence result will look similar to this:

{"Timestamp":7440,"Celebrity":{"Urls":["www.imdb.com/name/nm1757263"],"Name":"Jeff Bezos","Id":"1SK7cR8M","Confidence":89,"BoundingBox":{"Width":0.25468748807907104,"Height":0.675000011920929,"Left":0.30781251192092896,"Top":0.19027778506278992}}}

while a low-confidence result will look similar to this:

{"Timestamp":63763,"Celebrity":{"Urls":["www.imdb.com/name/nm0870038"],"Name":"Harry Townes","Id":"3Ad1ce6J","Confidence":50,"BoundingBox":{"Width":0.4000000059604645,"Height":0.4722222089767456,"Left":0.19296875596046448,"Top":0.3097222149372101}}}

To deal with low-confidence labels, you could flag them to be validated by a human, or pass them automatically to Mechanical Turk to be validated by Mechanical Turk workers and returned to the workflow once verified (see the sketch after this bonus section's steps).

3. Upload celebs.json from content.zip/results folder.

4. Return to the media solution website and click on your uploaded video from the section.

5. Click on to restart the video and click on ; note that the celebrities have been corrected.
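The same review idea applies to the celebrity results. This is a minimal sketch, assuming a downloaded celebs.json containing detection entries shaped like the samples in step 2; the top-level layout of the file is an assumption, so the code handles both a bare array and a Celebrities array.

// lowConfidenceCelebs.js - illustrative sketch, not part of the Media Analysis Solution
var fs = require('fs');

var THRESHOLD = 60; // Rekognition celebrity confidence is 0-100; the cutoff is an assumed value

var data = JSON.parse(fs.readFileSync('celebs.json', 'utf8'));
// Assumption: the file is either an array of detections or an object with a Celebrities array
var detections = Array.isArray(data) ? data : (data.Celebrities || []);

detections
  .filter(function(d) { return d.Celebrity && d.Celebrity.Confidence < THRESHOLD; })
  .forEach(function(d) {
    console.log(d.Celebrity.Name + ' at ' + d.Timestamp + 'ms: confidence ' +
      d.Celebrity.Confidence + ' - route to a human or Mechanical Turk for verification');
  });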

Lab 2

1. In AWS Console, go to S3


2. Go to the media-analysis-us-east-1-xxxxxxxxxxxxx bucket (xxxxxxxxxxxxx will be your account number) that was created by the Media Analysis Solution in Lab 1.

3. Upload model.tar.gz from content.zip /model to the root of the media-analysis-us-east-1-xxxxxxxxxxxxx bucket.

4. In AWS Console, go to Sagemaker

5. On left-hand pane, Click on , then

6. On the right hand side, Click on

7. Name the model “imageclassification”

8. Under IAM role, Click on

9. Under S3 buckets you specify, add the media-analysis-us-east-1-xxxxxxxxxxxxx bucket that was created in Lab 1

10. Click on

11. Scroll down on the Sagemaker create model page. For Location of inference code image, enter:

811284229777.dkr.ecr.us-east-1.amazonaws.com/image-classification:latest

Note: this corresponds to the latest container for the image classification algorithm in us-east-1. The containers for other regions are:

us-east-1: 811284229777.dkr.ecr.us-east-1.amazonaws.com/image-classification:latest
us-east-2: 825641698319.dkr.ecr.us-east-2.amazonaws.com/image-classification:latest
us-west-2: 433757028032.dkr.ecr.us-west-2.amazonaws.com/image-classification:latest
eu-west-1: 685385470294.dkr.ecr.eu-west-1.amazonaws.com/image-classification:latest

12. For Location of model artifacts: s3://media-analysis-us-east-1-xxxxxxxxxxxx/model.tar.gz (using the name of the media analysis bucket from Lab 1). NOTE: use s3://, not https://, to reference the model object.

13. Click on

14. On left-hand pane skip over Endpoint Configuration and click on

15. On the right hand side, Click on

16. For Endpoint Name: imageclassification

17. For Endpoint Configuration:

18. For Endpoint configuration name: imageclassification-endpointcfg


19. Leave Encryption key at default

20. Click on

21. On the Add Model screen, choose the imageclassification model you created earlier

22. Under (scroll to the right) click on to the right of the imageclassification model you just added

23. Under , change the instance type from ml.m4.4xlarge to ml.t2.medium (leaving the rest at their defaults)

24. Click on

25. Click on

26. Click on . This will take 5-7 minutes; you can continue with the lab while the endpoint is created.

27. In the AWS Console, navigate to IAM, Click on Policies

28. Click on then the tab

29. Paste the JSON code from content.zip/code/sagemakerinvokeIAMpolicy.json into the JSON editor (NOTE: replace the existing JSON policy with the JSON from sagemakerinvokeIAMpolicy.json; the file contents are reproduced below for reference).


30. Click on , name the policy “SagemakerInvoke”, and Click on

31. In the IAM Console, click on , and look for an IAM role named MediaAnalysis-MediaAnalysisFunctionRole-xxxxxxxxxxxxxx (for the exact role name, see the Resources section of the MediaAnalysis CloudFormation stack). Open the MediaAnalysis-MediaAnalysisFunctionRole-xxxxxxxxxxxxx role and click on

32. Search for the SagemakerInvoke policy you just created, select it and Click on

33. In AWS Console, go to S3

34. Navigate to the media-analysis-us-east-1-xxxxxxxxxxxxx bucket (the bucket from Lab 1). Click on

35. Upload amazongocommercial1.jpg from content.zip/media to the root of your media-analysis-us-east-1-xxxxxxxxxxxxx bucket (the bucket from Lab 1).

36. In the AWS Console, navigate to Lambda, Click on

{ "Version": "2012-10-17", "Statement": [ { "Sid": "SagmakerAPI", "Effect": "Allow", "Action": [ "sagemaker:ListEndpointConfigs", "sagemaker:ListTags", "sagemaker:ListModels", "sagemaker:ListTrainingJobs", "sagemaker:ListNotebookInstances", "sagemaker:ListEndpoints", "sagemaker:InvokeEndpoint" ], "Resource": "*" }, { "Sid": "SagemakerEndpoint", "Effect": "Allow", "Action": "sagemaker:InvokeEndpoint", "Resource": "arn:aws:sagemaker:*:*:endpoint/*" } ] }


37. In the Create Function section, click on and enter:

Name: Sagemaker-imageclassification
Runtime: Node.js 8.10
Role: Choose an existing role
Existing role: MediaAnalysis-MediaAnalysisFunctionRole-xxxxxxxxxxxxx

Click on

38. Once in the function, paste the code from content.zip\code\sagemakerinvokenode.txt into the Function Code section. For reference, the contents of sagemakerinvokenode.txt:

var AWS = require('aws-sdk');
var s3 = new AWS.S3();
var sagemakerruntime = new AWS.SageMakerRuntime();

// Endpoint created in Lab 2; could also be read from an environment variable
var endpointName = "imageclassification"; // process.env['SMEP']
// Categories the model was trained on, in the order of the model's output vector
var objectCategories = ['Amazon Logo', 'Whole Foods Logo', 'Zappos Logo', 'NoLogo'];

function detectCustomLabels(bucket, key, callback) {
  // Fetch the image from S3 and send the raw bytes to the SageMaker endpoint
  s3.getObject({ Bucket: bucket, Key: key }, function(err, data) {
    if (err) {
      console.log("Error getting object: " + err);
      callback(err, null);
    } else {
      var payload = data.Body;
      sagemakerruntime.invokeEndpoint({
        EndpointName: endpointName,
        ContentType: 'application/x-image',
        Body: payload
      }, function(err, data) {
        if (err) {
          console.log("Error calling sagemaker endpoint: " + err);
          callback(err, null);
        } else {
          // The endpoint returns an array of per-class probabilities;
          // pick the class with the highest score
          var response = JSON.parse(data.Body.toString('utf8'));
          var index = response.indexOf(Math.max.apply(null, response));
          callback(null, { Labels: [{ Name: objectCategories[index], Confidence: response[index].toString() }] });
        }
      });
    }
  });
}

exports.handler = function(event, context, callback) {
  // Triggered by an S3 event; analyze the object that was uploaded
  var bucket = event.Records[0].s3.bucket.name;
  var key = decodeURI(event.Records[0].s3.object.key);
  detectCustomLabels(bucket, key, callback);
};

39. Click on , and click on

40. In the Configure Test Event section, click on , name the event "S3Imageupload", and replace the entire section with the code from content.zip/code/testevent.json, changing the bucket from media-analysis-xx-xxxx-x-xxxxxxxxxxxxx to the media analysis bucket from Lab 1 and the region to the correct region. The contents of testevent.json:

{
  "Records": [
    {
      "eventVersion": "2.0",
      "eventTime": "1970-01-01T00:00:00.000Z",
      "requestParameters": { "sourceIPAddress": "127.0.0.1" },
      "s3": {
        "configurationId": "testConfigRule",
        "object": {
          "eTag": "0123456789abcdef0123456789abcdef",
          "key": "amazongocommercial1.jpg",
          "sequencer": "0A1B2C3D4E5F678901",
          "size": 1024
        },
        "bucket": {
          "ownerIdentity": { "principalId": "EXAMPLE" },
          "name": "media-analysis-xx-xxxx-x-xxxxxxxxxxxx",
          "arn": "arn:aws:s3:::mybucket"
        },
        "s3SchemaVersion": "1.0"
      },
      "responseElements": {
        "x-amz-id-2": "EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH",
        "x-amz-request-id": "EXAMPLE123456789"
      },
      "awsRegion": "us-east-1",
      "eventName": "ObjectCreated:Put",
      "userIdentity": { "principalId": "EXAMPLE" },
      "eventSource": "aws:s3"
    }
  ]
}

41. Test the function by selecting ; you should see something similar to the following result.


Lab 3

1. In the AWS Management Console, go to CloudFormation. Click on the media-analysis stack (it will have the description "(SO0042) Media Analysis Solution"), select , and click on the Lambda function listed under resource MediaAnalysisFunction (it should be media-analysis-MediaAnalysisFunction-xxxxxxxxxxxxxxx).

2. In the AWS Console, back in media-analysis-MediaAnalysisFunction-xxxxxxxxxxxxxxx, navigate down to and select , then click on and select the updatedmediaanalysisfunction.zip file from the content.zip/function folder. Click on to upload the code. This code adds Sagemaker as an additional endpoint to generate labels.

3. Under , add a new environment variable named SAGEMAKER_ENDPOINT and enter imageclassification (the endpoint from Lab 2). Click on to save the environment variable change.

4. On the Media Analysis Solution website from lab 1, Click on

5. Under "Analyze New Media", click on Browse and select amazongocommercial1.jpg from the workshop content /media directory. Click on ; you will see a status page that updates as the Step Function runs.

6. When Step Functions completes click on

7. Select the image you uploaded and click on , then scroll over some of the labels. Note: most of the labels you see will be from Amazon Rekognition; at the very end of the labels you will see labels that were generated by your Sagemaker endpoint.


8. Some of the labels may not be accurate and will have lower confidence scores. In order to provide the most accurate results, a confidence threshold should be decided on. To change this we will utilize the API function that searches for results. In the Management Console, navigate to Lambda.

9. Click the function named media-analysis-MediaAnaly-MediaAnalysisApiFunction-xxxxxxxxxxxxx (there will be a random guid at the end)

10. Scroll down to , and change the environment variable CONFIDENCE_SCORE from 0 to 80. (A sketch of the kind of filtering this enables follows this lab's steps.)

11. Click on

12. Return to the Media Analysis Solution website, and click on

13. Under the image you just uploaded, choose ; note that the labels are now filtered to only show results above the confidence threshold.
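The solution's API function itself is not reproduced in this guide, but the effect of the CONFIDENCE_SCORE variable from step 10 amounts to a filter like the following minimal sketch (the function name and label shape are assumptions based on the label results seen in Lab 1, not the solution's actual code):

// Illustrative confidence filter, not the Media Analysis Solution's actual API code
var CONFIDENCE_SCORE = parseInt(process.env.CONFIDENCE_SCORE || '0', 10);

// Keep only labels at or above the configured threshold
function filterLabels(labels) {
  return labels.filter(function(label) {
    return label.Confidence >= CONFIDENCE_SCORE;
  });
}

// Example: with CONFIDENCE_SCORE set to 80,
// filterLabels([{ Name: 'Dog', Confidence: 92 }, { Name: 'Cat', Confidence: 41 }])
// returns only the 'Dog' label.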

Clean-up

1. In the Management Console Navigate to CloudFormation

2. Check the box next to the media-analysis stack (Note: not the nested stacks), and select , then


3. The stack deletion should take approximately 6 minutes and will end with a DELETE_FAILED status, because the S3 buckets are not empty and the media analysis function role has been modified.

4. In the Management Console, Navigate to S3

5. Check the box next to the media-analysis-us-east-1-xxxxxxxxxxxxx bucket and select ; you will have to type the name of the bucket to complete the deletion.

6. In the Management Console, Navigate to IAM

7. Choose , and check the box next to the MediaAnalysis-MediaAnalysisFunctionRole-xxxxxxxxxxxxxx and click on

8. In the Management Console, Navigate to Sagemaker

9. On the left side, click on

10. Click on the next to the imageclassification endpoint you created and choose and , on the delete verification pop-up click on

11. On the left side, click on

12. Click on the next to the imageclassification-endpointcfg endpoint configuration you created and choose and , on the delete verification pop-up click on

13. On the left side, click on

14. Click on the next to the imageclassification model you created and choose and , on the delete verification pop-up click on

Appendix (to be done outside of the workshop): In Lab 2 you used a model that was already trained in Sagemaker, and created the model and endpoint in the console from the model artifacts. Below are instructions on how to build and train a model and create an endpoint in Sagemaker using the Jupyter notebook for image classification.

1. In AWS Console, go to Sagemaker


2. On left-hand pane, Click on Notebook Instances

3. On right-hand side, Click on

4. Name your notebook “MediaAnalysis”.

5. Choose the AmazonSageMaker-ExecutionRole-xxxxxxxxxxxxxxxxx you created earlier in the lab

6. Click on

7. In 3-4 minutes your notebook will show Status of , Click on

8. In Jupyter Notebook, Click on

9. Under Introduction to Amazon Algorithms, to the right of Image-classification-transfer-learning.ipynb, click on , leave the name as default and Click on

10. Once in the Jupyter notebook, you will need to replace <<bucket-name>> with your media-analysis-us-east-1-xxxxxxxxxx bucket. To advance each cell, click on , or press Ctrl-Enter. Follow the instructions in the notebook to download a dataset, train a model, deploy that model, create an endpoint, and run an inference against the endpoint. NOTE: it may not be possible to train a model due to your account limits. You will see the following error if you do not have the P2 limit set to 1 or above:

The account-level service limit 'ml.p2.xlarge for training job usage' is 0 Instances, with current utilization of 0 Instances and a request delta of 1 Instances. Please contact AWS support to request an increase for this limit.

If you get the above error, you will need to increase your limit for ml.p2.xlarge instances.

11. Once you have successfully deployed an endpoint, verify that it is in service by going to the AWS Console, navigating to Sagemaker, and selecting Endpoints. You should see an endpoint titled DEMO-imageclassification-ep-timestamp with a status of . Copy the endpoint name to a notepad; it will be utilized later.