machine learning for chickens, autonomous driving and a 3-year-old who won’t sleep
© 2017 MapR TechnologiesMapR Confidential 1
Machine learning for Chickens,
Autonomous Driving,
And a 3-year-old who won’t sleep
Introduction
Benjamin Burrus
• Father of 4, husband of 1
• Solutions engineer with MapR
• Previously worked for Oracle & IBM -- 15+ years in
Business Intelligence and Analytics
• Love the outdoors, enjoy coaching (baseball &
basketball) and working on the house
Agenda
• @Tensorchicken: A simple walk-through of machine learning
• A peek inside an autonomous driving project
• If You Give a Mouse a Sensor: an architecture for big data
projects
Machine Learning Basic Flow
Build Model: Gather Data -> Train algorithm (Find Patterns) -> Deploy Model
Use Model: New Data -> Recognize Patterns -> Prediction
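The two halves of this flow (build a model from gathered data, then use it on new data) can be sketched in a few lines of Python. This is only an illustration, not the method used in the talk: the nearest-centroid "model", the function names `train` and `predict`, and the toy measurements are all assumptions made for the example.

```python
from statistics import mean


def train(labeled_data):
    """Build Model: compute one average point (centroid) per label."""
    by_label = {}
    for features, label in labeled_data:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Use Model: return the label whose centroid is closest to the new data."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


# Gather Data: hypothetical (weight_kg, height_cm) measurements with labels.
training_data = [
    ((2.5, 40.0), "chicken"),
    ((3.0, 45.0), "chicken"),
    ((0.08, 25.0), "blue jay"),
    ((0.10, 28.0), "blue jay"),
]

model = train(training_data)               # Train algorithm / find patterns
prediction = predict(model, (2.8, 42.0))   # New data -> prediction
print(prediction)
```

A real image classifier replaces the centroid step with a trained neural network, but the build/use split stays the same.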
@Tensorchicken Flow
Build Model: Gather Data -> Train algorithm (Find Patterns) -> Deploy Model
Use Model: New Data -> Recognize Patterns
5000+ labeled images
Patterns: How the algorithm sees things
What’s this?
High probability that it is 100101001011010010110 101011101110111001001...
...I mean, Blue Jay
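To the algorithm an image is only numbers, and so is its answer: a score per label that is turned into a probability. A minimal sketch of that last step, assuming a softmax over raw scores (a common final layer in image classifiers); the labels and scores below are hypothetical, not output from the talk's model.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical labels and raw scores for a single image.
labels = ["blue jay", "chicken", "squirrel"]
scores = [4.0, 1.0, 0.5]

probabilities = softmax(scores)
ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
best_label, best_prob = ranked[0]

for label, prob in ranked:
    print(f"{label}: {prob:.2f}")
```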
Install TensorFlow on a Raspberry Pi
sudo apt-get update
sudo apt-get install python3-pip python3-dev
wget https://github.com/samjabrahams/tensorflow-on-raspberry-pi/releases/download/v1.1.0/tensorflow-1.1.0-cp34-cp34m-linux_armv7l.whl
sudo pip3 install tensorflow-1.1.0-cp34-cp34m-linux_armv7l.whl
sudo pip3 uninstall mock
sudo pip3 install mock
Set up the webcam
sudo apt-get update
sudo apt-get upgrade
sudo apt-get remove libavcodec-extra-56 libavformat56 libavresample2 libavutil54
wget https://github.com/ccrisan/motioneye/wiki/precompiled/ffmpeg_3.1.1-1_armhf.deb
sudo dpkg -i ffmpeg_3.1.1-1_armhf.deb
sudo apt-get install curl libssl-dev libcurl4-openssl-dev libjpeg-dev libx264-142 libavcodec56 libavformat56 libmysqlclient18 libswscale3 libpq5
wget https://github.com/Motion-Project/motion/releases/download/release-4.0.1/pi_jessie_motion_4.0.1-1_armhf.deb
sudo dpkg -i pi_jessie_motion_4.0.1-1_armhf.deb
sudo vi /etc/motion/motion.conf
Build the model
python retrain.py \
  --bottleneck_dir=bottlenecks-chickens \
  --how_many_training_steps=500 \
  --model_dir=inception \
  --summaries_dir=training_summaries-chickens/basic \
  --output_graph=retrained_graph-chickens.pb \
  --output_labels=retrained_labels-chickens.txt \
  --image_dir=chicken_photos
Run the model
#!/bin/bash
# Sync the system clock, then wait briefly.
sntp -s time.google.com
sleep 5
# Take the most recent image captured today, classify it, and tweet the result.
find /home/pi/motion/ -type f -name "*`date +'%Y%m%d'`*.jpg" | sort | tail -n 1 | while read line; do
  date
  echo "Tweeting file '$line'"
  # Keep the top three lines of classifier output.
  CLASSIFICATION=`python3 /home/pi/tf_files/label_image-chickens.py "$line" | head -n 3`
  PUBLICIP=`curl -s ifconfig.co | tr '.' '-'`
  MESSAGE=`echo -e "${CLASSIFICATION}\nLive video: ${PUBLICIP}.ptld.qwest.net:8081"`
  # Upload the image first, then post the status that references it.
  MEDIA_ID=`twurl -H upload.twitter.com -X POST "/1.1/media/upload.json" --file "$line" --file-field "media" | jq -r '.media_id_string'`
  twurl "/1.1/statuses/update.json?tweet_mode=extended" -d "media_ids=$MEDIA_ID&status=$MESSAGE"
done
http://www.bigendiandata.com/2017-07-12-Tensor_Chicken/
Small-scale, tangible machine learning
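For readers who find the shell pipeline dense, the two steps that matter (pick today's newest image, then build the tweet text from the top three classifier lines) can be sketched in Python. This is an illustration rather than the project's code: the function names, the sample file names, the classifier output lines, and the host `example.invalid` are all hypothetical.

```python
import os
import tempfile
from datetime import date


def latest_image_for(day, directory):
    """Return the last .jpg (sorted by path) whose name contains YYYYMMDD, or None."""
    stamp = day.strftime("%Y%m%d")
    matches = sorted(
        os.path.join(root, name)
        for root, _dirs, files in os.walk(directory)
        for name in files
        if stamp in name and name.endswith(".jpg")
    )
    return matches[-1] if matches else None


def build_status(classification_lines, video_host):
    """Keep the top three classification lines and append the live video link."""
    top_three = list(classification_lines[:3])
    return "\n".join(top_three + [f"Live video: {video_host}:8081"])


# Demo with throwaway files standing in for the motion capture directory.
with tempfile.TemporaryDirectory() as motion_dir:
    for name in (
        "01-20170712080000-00.jpg",
        "02-20170712093000-00.jpg",
        "01-20170711120000-00.jpg",
    ):
        open(os.path.join(motion_dir, name), "w").close()
    newest = latest_image_for(date(2017, 7, 12), motion_dir)
    newest_name = os.path.basename(newest)

status = build_status(
    ["chicken (score = 0.97)", "rooster (score = 0.02)", "empty (score = 0.01)", "extra line"],
    "example.invalid",
)
print(newest_name)
print(status)
```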
JOHN DEERE SPENT $305 MILLION ON A LETTUCE-FARMING ROBOT
Blue River’s key technology is called “see and spray.” It’s a set of cameras that fix onto crop sprayers and use deep learning to identify plants. If it sees a weed, it’ll hit it with pesticide; if it sees a crop, it’ll drop some fertilizer.
Source: https://www.wired.com/story/why-john-deere-just-spent-dollar305-million-on-a-lettuce-farming-robot/
How do you deal with the scale of raw data streams?
How do you reliably archive raw data and make it searchable?
Where do you run substantial machine learning workloads?
Search Google: “TensorFlow on MapR Tutorial”
https://community.mapr.com/community/exchange/blog/2017/03/23/tensorflow-on-mapr-tutorial
Connected Car > Autonomous Driving
High Frequency Decisions
Advanced Driver Assistance
Systems (ADAS)
Computer Aided Driving
Vehicle Health
Fleet Management
By 2020, more than 250 million vehicles will be connected globally
- Gartner
State of Connected Car & Autonomous Driving
• Current phase is testing & development
• Gathering data from test cars and test facilities
• Machine learning and training models are built centrally
• Models deployed/downloaded to individual cars
Machine Learning: Test Bed for Autonomous Driving
Build Model: Gather Data -> Train algorithm (Find Patterns) -> Deploy Model
Use Model: New Data -> Recognize Patterns -> Prediction
Each Car:
50-100 Cameras
Many small sensors
Gathers ~2 GB/second
A few hundred cars,
several test sites
Each Car:
½ Exabyte of data
Need to react in real time
Stay in sync with updates
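A quick back-of-the-envelope calculation shows how the per-car rate relates to the half-exabyte figure. The rate and the total come from the slide; the fleet size of 300 (one reading of "a few hundred cars") and the use of decimal units are assumptions.

```python
GB = 10**9
TB = 10**12
EB = 10**18

rate_bytes_per_s = 2 * GB   # "~2 GB/second" per car, from the slide
target_bytes = EB // 2      # "1/2 Exabyte of data", from the slide
fleet_size = 300            # assumption: one reading of "a few hundred cars"

# Data produced by one car in one hour of recording.
tb_per_car_hour = rate_bytes_per_s * 3600 / TB

# Total recording time needed to reach the target, summed over all cars.
car_hours_needed = target_bytes / rate_bytes_per_s / 3600

# The same total spread evenly over the assumed fleet.
hours_per_car = car_hours_needed / fleet_size

print(f"{tb_per_car_hour:.1f} TB per car-hour")
print(f"{car_hours_needed:,.0f} car-hours to reach 0.5 EB")
print(f"{hours_per_car:.0f} hours per car across {fleet_size} cars")
```

Under these assumptions a single car would need years of continuous recording to produce half an exabyte, while a fleet of a few hundred gets there in a few hundred hours of driving each.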
Connected Car Production Concepts
Each car will generate so much data that the data becomes immobile
Hundreds of thousands of cars
Build Model
Use Model
1. Communicate anomalies
2. Move the algorithms, don’t move the data
Use Machine Learning to understand changing environments
Packages for smart city initiatives
Add: location relevant content
Core: ready for any environment
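"Communicate anomalies" means the car keeps its raw data local and sends only the readings that look unusual. The talk does not say how anomalies are detected; the sketch below assumes a simple z-score rule as a stand-in, and the function name, threshold, and sensor readings are hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(readings, threshold=3.0):
    """Return (index, value) pairs lying more than `threshold`
    sample standard deviations from the mean of the readings."""
    if len(readings) < 2:
        return []
    center = mean(readings)
    spread = stdev(readings)
    if spread == 0:
        return []
    return [
        (i, value)
        for i, value in enumerate(readings)
        if abs(value - center) > threshold * spread
    ]


# Hypothetical sensor readings that stay on the car.
readings = [20.1, 19.8, 20.0, 20.3, 19.9, 20.2, 35.0, 20.0, 19.7, 20.1]

# Only the anomalies would be sent to the central cluster.
to_upload = find_anomalies(readings, threshold=2.0)
print(f"uploading {len(to_upload)} of {len(readings)} readings: {to_upload}")
```

The central cluster can then retrain on the reported anomalies and ship an updated model back to the cars, which is the "move the algorithms" half of the slide.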
More Details on MapR for Connected Car
NorCom Selects MapR for Deep Learning in Autonomous Driving:
https://mapr.com/company/press-releases/norcom-selects-mapr-deep-learning/
Safe Driving for Autonomous Systems:
https://mapr.com/blog/safe-driving-self-driving-enterprise/
Demystifying AI, Machine Learning and Deep Learning
https://mapr.com/blog/demystifying-ai-ml-dl/
Enterprise Machine Learning Challenges
Build Model: Gather Data -> Train algorithm (Find Patterns) -> Deploy Model
Use Model: New Data -> Recognize Patterns -> Prediction
• Data comes from many sources, some very large
• Needs ETL and cleaning
• Feature Vectors
• Finding the best algorithm and parameters can use a lot of CPU
• Needs to run on a server
• Predictions are used by another system...
• Real time data? Production data from many sources?
• Needs Labels
• Multiple Models, Multiple Users, Multiple Uses, Security!
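Several of these challenges (ETL and cleaning, feature vectors, labels) show up even in a toy pipeline. A minimal sketch, assuming raw records arrive as dictionaries of strings; the field names `speed_kmh`, `temp_c`, and `label` and the sample records are hypothetical.

```python
def to_feature_vector(record):
    """Clean one raw record into ([features], label), or None if unusable."""
    try:
        speed = float(record["speed_kmh"])
        temp = float(record["temp_c"])
    except (KeyError, TypeError, ValueError):
        return None  # drop records with missing or malformed fields
    label = (record.get("label") or "").strip().lower()
    if not label:
        return None  # supervised training needs a label
    return [speed, temp], label


# Hypothetical raw records from different sources, with typical defects.
raw_records = [
    {"speed_kmh": "52.5", "temp_c": "21", "label": " Normal "},
    {"speed_kmh": "n/a", "temp_c": "20", "label": "normal"},   # malformed number
    {"speed_kmh": "130", "temp_c": "95", "label": "FAULT"},
    {"speed_kmh": "48", "temp_c": "22", "label": ""},          # missing label
]

cleaned = [row for row in map(to_feature_vector, raw_records) if row is not None]
print(cleaned)
```

Half of the sample records are discarded before any training happens, which is the kind of data logistics work the slide is pointing at.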
If You Give a Mouse a Sensor...
Spark / Hadoop Cluster
Storage Systems
Classic Data Warehouse
NoSQL DB
Application Server
Message Cluster
Search Server
Spark / Hadoop Cluster
Storage Systems
Classic Data Warehouse
NoSQL DB
Application Server
Message Cluster
Search Server
It’s not the circles, it’s the lines that are hard
Expensive to stitch together
Difficult to manage
Limited in scale, not global
Many security models
Many points of failure
Multiple data silos
90% of Machine Learning is Data Logistics
An Ideal Platform For Enterprise ML
• Scales with you and your data
• Gives freedom to use any tool
– Open source tools: Zeppelin, Spark, H2O, TensorFlow, ...
– Legacy/local tools: NLP tools, scikit-learn, R
• Allows data to be versioned and kept reliably
• Limits data movement, limits silos of data
• Supports both model building and model deployment
• Supports security when & where needed
MapR Converged Data Platform
CONVERGED DATA PLATFORM: High Availability, Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace
EXISTING ENTERPRISE APPLICATIONS
BATCH & INTERACTIVE ANALYTICS
INTELLIGENT APPLICATIONS
ANALYTICS & MACHINE LEARNING
ON-PREMISE, MULTI-CLOUD, EDGE
IOT & EDGE
CLOUD-SCALE DATA STORE
OPERATIONAL DATABASE
GLOBAL EVENT STREAMS
The best platform for Enterprise Machine Learning
Specific Features of MapR for Machine Learning
• Unlimited capacity & scale on a converged data platform – storage, database, streams, ML engines
• Real files. Read/write file access in high-scale distributed system.
– Not all tools can read from HDFS. But all applications can read/write on MapR.
• Avoid silos of data, limit data movement, simplify logistics.
– Leave the data in place in a central cluster
– Do machine learning directly without moving the data.
• Modern approach to ML is to leverage Containers
– MapR provides the platform for containers to directly (and securely) access all data.
– Containers work directly with real files at massive scale with persistent state.
Large Healthcare Group: Fraud Analytics
Enterprise Data Science on MapR
DISCOVERY PREPARATION DATA SCIENCE DEPLOYMENT REPORTING
Data Engineers
Data Scientists
Machine Learning Logistics (eBook)
Machine Learning Logistics: Model Management in the Real World
Ted Dunning & Ellen Friedman
https://mapr.com/ebook/machine-learning-logistics/
• Why successful machine learning projects involve many models
• The capabilities needed for a stream-first microservice approach
• How to achieve rapid and seamless deployment of new models
• The role of DataOps and global data fabrics in making logistics for machine learning much easier
Best Practices: Implementing DataOps with a Data Science
Platform
Date: November 7th
Time: 10am PT/ 1pm ET / 6pm BST
http://info.mapr.com/WB_Implementing-DataOps-BestPractices_Global_DG_17.11.07_RegistrationPage.html
With the growing number of data-driven organizations, new approaches are needed to drive innovation in scaling and implementing data science. We will discuss how data and data science platforms take advantage of what we are calling DataOps. We will share background on this approach and how it supports putting data science models into production. We will provide best practices and a roadmap on how to implement these techniques to become a leader in machine learning and data science.
Join us for a complimentary webinar with experts from DataScience.com & MapR to:
• Learn about the benefits of applying a DataOps approach to your data science workflow
• Review best practices for how IT teams can support their data science teams
• Hear how customers have reaped the benefits of this new approach
Self-Service Data Science for Leveraging ML & AI on All of Your
Data
Date: November 16th
Time: 10am PT/ 1pm ET / 6pm BST
http://info.mapr.com/WB_Data-Science-Refinery_Global_DG_17.11.16_RegistrationPage.html
Introducing the MapR Data Science Refinery
MapR has launched the MapR Data Science Refinery, which combines a scalable data science notebook with native platform access, superior out-of-the-box security, and access to global event streaming and a multi-model NoSQL database.
Join us Thursday, November 16th at 10 AM PST for a complimentary webinar (including demo) covering our complete, open, secure, and converged solution that can scale to fit the needs of all types of data science teams.
Rachel Silver, Product Manager, Data Science & Analytics, MapR Technologies
Search Google: “TensorFlow on MapR Tutorial” or:
https://community.mapr.com/community/exchange/blog/2017/03/23/tensorflow-on-mapr-tutorial
NorCom Selects MapR for Deep Learning in Autonomous Driving:
https://mapr.com/company/press-releases/norcom-selects-mapr-deep-learning/
Safe Driving for Autonomous Systems:
https://mapr.com/blog/safe-driving-self-driving-enterprise/
Demystifying AI, Machine Learning and Deep Learning:
https://mapr.com/blog/demystifying-ai-ml-dl/
Machine Learning eBook: Machine Learning Logistics
https://mapr.com/ebook/machine-learning-logistics/
Register for Webinar: Best Practices: Implementing DataOps with a Data Science Platform
http://info.mapr.com/WB_Implementing-DataOps-BestPractices_Global_DG_17.11.07_RegistrationPage.html
Register for Webinar: Self-Service Data Science for Leveraging ML & AI on All of Your Data
http://info.mapr.com/WB_Data-Science-Refinery_Global_DG_17.11.16_RegistrationPage.html