Machine Learning for Chickens, Autonomous Driving, and a 3-Year-Old Who Won’t Sleep


Upload: mapr-data-technologies

Post on 21-Jan-2018


© 2017 MapR Technologies. MapR Confidential.

Machine Learning for Chickens, Autonomous Driving, and a 3-Year-Old Who Won’t Sleep


Introduction

Benjamin Burrus

[email protected]

• Father of 4, husband of 1

• Solutions engineer with MapR

• Previously worked for Oracle & IBM: 15+ years in Business Intelligence and Analytics

• Love the outdoors, enjoy coaching (baseball & basketball) and working on the house


Agenda

• @Tensorchicken: A simple walk-through of machine learning

• A peek inside an autonomous driving project

• If You Give a Mouse a Sensor: an architecture for big data projects


Ian Downard


Machine Learning Basic Flow

Build Model: Gather Data, Train algorithm, Find Patterns

Deploy Model

Use Model: New Data, Recognize Patterns, Prediction
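
The flow on this slide can be sketched in a few lines of Python. This is only a toy illustration, not the method used in the talk: the nearest-centroid classifier, the two-number feature vectors, and the labels are all assumptions chosen to keep the example self-contained.

```python
from collections import defaultdict
from math import dist


def build_model(labeled_data):
    """Train: average the feature vectors of each label into one centroid."""
    groups = defaultdict(list)
    for features, label in labeled_data:
        groups[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*rows))
        for label, rows in groups.items()
    }


def predict(model, features):
    """Use: return the label whose centroid is closest to the new data."""
    return min(model, key=lambda label: dist(model[label], features))


# Gather data: hypothetical, hand-labeled feature vectors.
training_data = [
    ((1.0, 1.2), "chicken"),
    ((0.8, 1.0), "chicken"),
    ((5.0, 4.8), "blue jay"),
    ((5.2, 5.1), "blue jay"),
]

model = build_model(training_data)        # build the model
prediction = predict(model, (4.9, 5.0))   # new data in, prediction out
print(prediction)
```

In a real project the "deploy" step would ship the `model` object to wherever new data arrives; here both steps simply run in the same process.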


@Tensorchicken Flow

Build Model: Gather Data (5000+ labeled images), Train algorithm, Find Patterns

Deploy Model

Use Model: New Data, Recognize Patterns


Patterns: How the algorithm sees things

What’s this? High probability that it is

100101001011010010110 101011101110111001001...

..I mean, Blue Jay
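
To make the slide concrete: an image reaches the model as numbers, and the model answers with a probability per label. The sketch below is an illustration under assumptions, not the talk's code; the three pixel values, the raw scores, and the label names are made up, and softmax is used because it is a common way for image classifiers to turn scores into probabilities.

```python
from math import exp


def softmax(scores):
    """Turn raw per-label scores into probabilities that sum to 1."""
    highest = max(scores.values())  # subtract the max for numerical stability
    exps = {label: exp(score - highest) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# What the algorithm "sees": pixel values, i.e. just bits.
pixels = bytes([37, 75, 150])  # hypothetical pixel values
bits = "".join(f"{value:08b}" for value in pixels)

# Hypothetical raw scores a classifier might produce for this image.
raw_scores = {"blue jay": 4.0, "chicken": 1.0, "squirrel": 0.5}
probabilities = softmax(raw_scores)
best_label = max(probabilities, key=probabilities.get)

print(bits)
print(best_label, round(probabilities[best_label], 3))
```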


Install TensorFlow on a Raspberry Pi

    sudo apt-get update
    sudo apt-get install python3-pip python3-dev
    wget https://github.com/samjabrahams/tensorflow-on-raspberry-pi/releases/download/v1.1.0/tensorflow-1.1.0-cp34-cp34m-linux_armv7l.whl
    sudo pip3 install tensorflow-1.1.0-cp34-cp34m-linux_armv7l.whl
    sudo pip3 uninstall mock
    sudo pip3 install mock

Set up the webcam

    sudo apt-get update
    sudo apt-get upgrade
    sudo apt-get remove libavcodec-extra-56 libavformat56 libavresample2 libavutil54
    wget https://github.com/ccrisan/motioneye/wiki/precompiled/ffmpeg_3.1.1-1_armhf.deb
    sudo dpkg -i ffmpeg_3.1.1-1_armhf.deb
    sudo apt-get install curl libssl-dev libcurl4-openssl-dev libjpeg-dev libx264-142 libavcodec56 libavformat56 libmysqlclient18 libswscale3 libpq5
    wget https://github.com/Motion-Project/motion/releases/download/release-4.0.1/pi_jessie_motion_4.0.1-1_armhf.deb
    sudo dpkg -i pi_jessie_motion_4.0.1-1_armhf.deb
    sudo vi /etc/motion/motion.conf

Build the model

    python retrain.py \
      --bottleneck_dir=bottlenecks-chickens \
      --how_many_training_steps=500 \
      --model_dir=inception \
      --summaries_dir=training_summaries-chickens/basic \
      --output_graph=retrained_graph-chickens.pb \
      --output_labels=retrained_labels-chickens.txt \
      --image_dir=chicken_photos

Run the model

    #!/bin/bash
    # Sync the system clock; the current date is used below to find today's images.
    sntp -s time.google.com
    sleep 5
    # Take the most recent of today's snapshots, classify it, and tweet the result.
    find /home/pi/motion/ -type f -name "*`date +'%Y%m%d'`*.jpg" | sort | tail -n 1 | while read line; do
      date
      echo "Tweeting file '$line'"
      CLASSIFICATION=`python3 /home/pi/tf_files/label_image-chickens.py $line | head -n 3`
      PUBLICIP=`curl -s ifconfig.co | tr '.' '-'`
      MESSAGE=`echo -e "${CLASSIFICATION}\nLive video: ${PUBLICIP}.ptld.qwest.net:8081"`
      MEDIA_ID=`twurl -H upload.twitter.com -X POST "/1.1/media/upload.json" --file $line --file-field "media" | jq -r '.media_id_string'`
      twurl "/1.1/statuses/update.json?tweet_mode=extended" -d "media_ids=$MEDIA_ID&status=$MESSAGE"
    done

http://www.bigendiandata.com/2017-07-12-Tensor_Chicken/
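
The `find | sort | tail -n 1` step of the run script picks the newest of today's snapshots. A rough Python equivalent is sketched below for readers who would rather keep that logic next to the classifier. It assumes, as the shell pattern does, that each snapshot filename contains a YYYYMMDD date; the function name `latest_snapshot` is hypothetical and not part of the original project.

```python
from datetime import date
from pathlib import Path


def latest_snapshot(motion_dir, today=None):
    """Return the last .jpg (in sorted path order) whose name contains today's date.

    Mirrors: find DIR -type f -name "*YYYYMMDD*.jpg" | sort | tail -n 1
    Returns None when no snapshot matches.
    """
    stamp = (today or date.today()).strftime("%Y%m%d")
    candidates = sorted(
        (p for p in Path(motion_dir).rglob(f"*{stamp}*.jpg") if p.is_file()),
        key=str,  # sort by the full path string, like `sort` does
    )
    return candidates[-1] if candidates else None
```

Calling `latest_snapshot("/home/pi/motion/")` on the Pi would return the path that the shell script passes to the labeling script, or `None` if nothing was captured today.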

Small-scale, tangible machine learning


JOHN DEERE SPENT $305 MILLION ON A LETTUCE-FARMING ROBOT

Blue River’s key technology is called “see and spray.” It’s a set of cameras that fix onto crop sprayers and use deep learning to identify plants. If it sees a weed, it’ll hit it with pesticide; if it sees a crop, it’ll drop some fertilizer.

Source: https://www.wired.com/story/why-john-deere-just-spent-dollar305-million-on-a-lettuce-farming-robot/


How do you deal with the scale of raw data streams?

How do you reliably archive raw data and make it searchable?

Where do you run substantial machine learning workloads?

Search Google: “TensorFlow on MapR Tutorial”

https://community.mapr.com/community/exchange/blog/2017/03/23/tensorflow-on-mapr-tutorial


Connected Car > Autonomous Driving

High Frequency Decisions

Advanced Driver Assistance Systems (ADAS)

Computer Aided Driving

Vehicle Health

Fleet Management

By 2020, more than 250 million vehicles will be connected globally

- Gartner


State of Connected Car & Autonomous Driving

• Current phase is testing & development

• Gathering data from test cars and test facilities

• Machine learning models are trained and built centrally

• Models deployed/downloaded to individual cars


Machine Learning: Test Bed for Autonomous Driving

Build Model: Gather Data, Train algorithm, Find Patterns

Deploy Model

Use Model: New Data, Recognize Patterns, Prediction

Each Car:

50-100 Cameras

Many small sensors

Gathers ~2 GB/second

A few hundred cars,

several test sites

Each Car:

½ Exabyte of data

Need to react in real time

Stay in sync with updates
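
A quick back-of-the-envelope calculation shows why these numbers matter. The ~2 GB/second rate and the ½ exabyte figure come from the slide; the fleet size of 300 is an assumption standing in for "a few hundred cars", and decimal units (1 GB = 10^9 bytes) are assumed throughout.

```python
GB = 10**9
TB = 10**12
EB = 10**18

rate_per_car = 2 * GB   # bytes per second, from the slide (~2 GB/second)
target_volume = EB // 2 # half an exabyte, from the slide
fleet_size = 300        # assumption: "a few hundred cars"

# Data produced by one car in one hour of recording.
tb_per_car_hour = rate_per_car * 3600 / TB

# Continuous recording time needed to accumulate half an exabyte.
days_one_car = target_volume / rate_per_car / 86_400
days_fleet = days_one_car / fleet_size

print(f"{tb_per_car_hour:.1f} TB per car per hour")
print(f"{days_one_car:,.0f} days for one car to record 0.5 EB")
print(f"{days_fleet:.1f} days for {fleet_size} cars to record 0.5 EB")
```

At this rate a single car fills multiple terabytes every hour, which is the practical reason the data cannot simply be streamed back to a central site in real time.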


Connected Car Production Concepts

Each car will generate so much data that data becomes immobile

2. Move the algorithms, don’t move the data

Build Model

Use Model

1. Communicate anomalies

Hundreds of thousands of cars

Use Machine Learning to understand changing environments

Packages for smart city initiatives

Add: location relevant content

Core: ready for any environment
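
The idea of "communicate anomalies" and "move the algorithms, don't move the data" can be sketched as a filter that runs on the car and forwards only unusual readings. Everything below is a hypothetical illustration: the `Reading` type, the sensor name, and the simple z-score rule are assumptions, not the detection method used in any MapR project.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Reading:
    sensor: str
    value: float


def find_anomalies(readings, threshold=3.0):
    """Return readings more than `threshold` standard deviations from the mean."""
    values = [r.value for r in readings]
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all readings identical: nothing stands out
    return [r for r in readings if abs(r.value - mu) / sigma > threshold]


# Runs on the car: the bulk of the data stays local, only anomalies are sent.
local_readings = [Reading("wheel_speed", 1.0) for _ in range(20)]
local_readings.append(Reading("wheel_speed", 100.0))

to_upload = find_anomalies(local_readings)
print(f"kept {len(local_readings)} readings locally, uploading {len(to_upload)}")
```

The point of the design is the ratio in the last line: the algorithm travels to the data, and only a small fraction of the data travels back.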


More Details on MapR for Connected Car

NorCom Selects MapR for Deep Learning in Autonomous Driving:

https://mapr.com/company/press-releases/norcom-selects-mapr-deep-learning/

Safe Driving for Autonomous Systems:

https://mapr.com/blog/safe-driving-self-driving-enterprise/

Demystifying AI, Machine Learning and Deep Learning

https://mapr.com/blog/demystifying-ai-ml-dl/


Enterprise Machine Learning Challenges

Build Model: Gather Data, Train algorithm, Find Patterns

Deploy Model

Use Model: New Data, Recognize Patterns, Prediction

Data comes from many sources, some very large

Needs ETL and cleaning

Feature Vectors

Finding the best algorithm and parameters can use a lot of CPU

Needs to run on a server

Predictions are used by another system...

Real time data?

Production data from many sources?

Needs Labels

Multiple Models

Multiple Users

Multiple Uses

Security!
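
To see why finding the best algorithm and parameters can use a lot of CPU, it helps to count the training runs a modest search requires. The parameter names, value ranges, and fold count below are hypothetical; the point is only how quickly the combinations multiply.

```python
from itertools import product

# Hypothetical search space for a single algorithm.
param_grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "max_depth": [2, 4, 8, 16],
    "n_trees": [50, 100, 200],
}
cv_folds = 5  # assumption: each combination is evaluated with 5-fold cross-validation

# Every combination of parameter values is one candidate model.
combinations = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]
total_training_runs = len(combinations) * cv_folds

print(f"{len(combinations)} combinations x {cv_folds} folds = {total_training_runs} training runs")
```

Adding one more parameter with four candidate values would quadruple the total, and comparing several algorithms multiplies it again.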


If You Give a Mouse a Sensor....

Spark / Hadoop Cluster

Storage Systems

Classic Data Warehouse

NoSQL DB

Application Server

Message Cluster

Search Server


Spark / Hadoop Cluster

Storage Systems

Classic Data Warehouse

NoSQL DB

Application Server

Message Cluster

Search Server

It’s not the circles, it’s the lines that are hard

Expensive to stitch together

Difficult to manage

Limited in scale, not global

Many security models

Many points of failure

Multiple data silos

90% of Machine Learning is Data Logistics


An Ideal Platform For Enterprise ML

• Scales with you and your data

• Gives freedom to use any tool

– Open source tools: Zeppelin, Spark, H2O, TensorFlow, …

– Legacy/local tools: NLP tools, scikit-learn, R

• Allows data to be versioned and kept reliably

• Limits data movement, limits silos of data

• Supports both model building and model deployment

• Supports security when & where needed


MapR Converged Data Platform

CONVERGED DATA PLATFORM: High Availability, Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace

EXISTING ENTERPRISE APPLICATIONS

BATCH & INTERACTIVE ANALYTICS

INTELLIGENT APPLICATIONS

ANALYTICS & MACHINE LEARNING

ON-PREMISE, MULTI-CLOUD, EDGE

IOT & EDGE

CLOUD-SCALE DATA STORE

OPERATIONAL DATABASE

GLOBAL EVENT STREAMS

The best platform for Enterprise Machine Learning


Specific Features of MapR for Machine Learning

• Unlimited capacity & scale on a converged data platform

– Storage, database, streams, ML engines

• Real files: read/write file access in a high-scale distributed system

– Not all tools can read from HDFS, but all applications can read/write on MapR

• Avoid silos of data, limit data movement, simplify logistics

– Leave the data in place in a central cluster

– Do machine learning directly without moving the data

• The modern approach to ML is to leverage containers

– MapR provides the platform for containers to directly (and securely) access all data

– Containers work directly with real files at massive scale with persistent state
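
"Real files" means that ordinary file APIs are enough, with no HDFS-specific client library. The sketch below illustrates the idea under an assumption: that the cluster is visible to the application or container as a mounted directory. The mount path in the comment is hypothetical, and a temporary directory stands in for it so the example runs anywhere.

```python
import csv
import tempfile
from pathlib import Path


def append_labels(data_root, rows):
    """Append (image, label) rows to labels.csv using plain file I/O."""
    path = Path(data_root) / "labels.csv"
    with path.open("a", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return path


def read_labels(path):
    """Read the label file back with the standard csv module."""
    with Path(path).open(newline="") as handle:
        return [tuple(row) for row in csv.reader(handle)]


# On a cluster, data_root would be a mount point (hypothetical example:
# "/mapr/<cluster>/projects/chickens"). A temp directory stands in for it here.
with tempfile.TemporaryDirectory() as data_root:
    labels_path = append_labels(data_root, [("img001.jpg", "chicken")])
    append_labels(data_root, [("img002.jpg", "blue jay")])
    labels = read_labels(labels_path)

print(labels)
```

Because nothing in the code is storage-specific, the same functions work unchanged for a legacy tool, a notebook, or a container, which is the point the slide makes about read/write access.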


Large Healthcare Group: Fraud Analytics

Enterprise Data Science on MapR

DISCOVERY, PREPARATION, DATA SCIENCE, DEPLOYMENT, REPORTING

Data Engineers

Data Scientists


https://mapr.com/ebook/machine-learning-logistics/

Machine Learning Logistics: Model Management in the Real World (eBook)

Ted Dunning & Ellen Friedman

• Why successful machine learning projects involve many models

• The capabilities needed for a stream-first microservice approach

• How to achieve rapid and seamless deployment of new models

• The role of DataOps and global data fabrics in making logistics for machine learning much easier


Best Practices: Implementing DataOps with a Data Science Platform

Date: November 7th

Time: 10am PT/ 1pm ET / 6pm BST

http://info.mapr.com/WB_Implementing-DataOps-BestPractices_Global_DG_17.11.07_RegistrationPage.html

With the growing number of data-driven organizations, new approaches are needed to drive innovation in scaling and implementing data science. We will discuss how data and data science platforms take advantage of what we are calling DataOps. We will share background on this approach and how it supports putting data science models into production. We will provide best practices and a roadmap on how to implement these techniques to become a leader in machine learning and data science.

Join us for a complimentary webinar with experts from DataScience.com & MapR to:

• Learn about the benefits of applying a DataOps approach to your data science workflow

• Review best practices for how IT teams can support their data science teams

• Hear how customers have reaped the benefits of this new approach


Self-Service Data Science for Leveraging ML & AI on All of Your Data

Date: November 16th

Time: 10am PT/ 1pm ET / 6pm BST

http://info.mapr.com/WB_Data-Science-Refinery_Global_DG_17.11.16_RegistrationPage.html

Introducing the MapR Data Science Refinery

MapR has launched the MapR Data Science Refinery, which provides a scalable data science notebook with native platform access, out-of-the-box security, and access to global event streaming and a multi-model NoSQL database.

Join us Thursday, November 16th at 10 AM PST for a complimentary webinar (including demo) covering our complete, open, secure, and converged solution that can scale to fit the needs of all types of data science teams.

Rachel Silver

Product Manager, Data Science & Analytics

MapR Technologies


Q&A

ENGAGE WITH US

@mapr

[email protected]


Search Google: “TensorFlow on MapR Tutorial” or:
https://community.mapr.com/community/exchange/blog/2017/03/23/tensorflow-on-mapr-tutorial

NorCom Selects MapR for Deep Learning in Autonomous Driving:
https://mapr.com/company/press-releases/norcom-selects-mapr-deep-learning/

Safe Driving for Autonomous Systems:
https://mapr.com/blog/safe-driving-self-driving-enterprise/

Demystifying AI, Machine Learning and Deep Learning:
https://mapr.com/blog/demystifying-ai-ml-dl/

Machine Learning eBook: Machine Learning Logistics
https://mapr.com/ebook/machine-learning-logistics/

Register for Webinar: Best Practices: Implementing DataOps with a Data Science Platform
http://info.mapr.com/WB_Implementing-DataOps-BestPractices_Global_DG_17.11.07_RegistrationPage.html

Register for Webinar: Self-Service Data Science for Leveraging ML & AI on All of Your Data
http://info.mapr.com/WB_Data-Science-Refinery_Global_DG_17.11.16_RegistrationPage.html