TRANSCRIPT
Big Data and Internet of Things (IoT)
Project Morpheus (Beaconstac Analytics)
Garima Batra, Core Platform Engineer | MobStac
May 2015
A quick intro to Beaconstac
Beaconstac is a proximity marketing and analytics platform for beacons
Several beacon-specific events are defined to aid proximity marketing
The events include the camp-on event, beacon exit event, region enter, region exit, etc.
The Beaconstac analytics platform makes it easy for managers, marketers, and developers to analyze event data
Components include the Beaconstac iOS/Android SDKs and the Beaconstac portal
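To make the events above concrete, here is the rough shape of a single event log record as the SDK might report it. The field names are illustrative assumptions, not the actual Beaconstac schema:

```python
# Illustrative shape of one beacon event log record
# (field names are assumptions, not the actual Beaconstac schema).
import json
import time

def make_event(event_type, beacon_id, user_id, custom_attrs=None):
    """Build one event record as the SDK might report it."""
    return {
        "event": event_type,            # e.g. "camp_on", "beacon_exit",
                                        # "region_enter", "region_exit"
        "beacon_id": beacon_id,
        "user_id": user_id,
        "timestamp": int(time.time()),  # seconds since epoch
        "custom_attributes": custom_attrs or {},
    }

record = make_event("camp_on", "beacon-42", "user-7", {"store": "NYC"})
print(json.dumps(record))
```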
Why Hadoop?
Collect event logs generated from Beaconstac SDK usage
Needed a system to answer queries like
o Heat map of beacons by the number of visits received in a specified time interval
o Heat map of beacons by the amount of time spent in a specified time interval
o Average time spent by users near different beacons
o Last seen per user
o Last seen per beacon
o Analyzing data with custom attribute filters
o Traversed path in an area by individual users
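The first query (visits per beacon in a time window) can be sketched as a Hadoop Streaming job in Python. The tab-separated log format and field order here are assumptions for illustration, not the actual Beaconstac log schema:

```python
# Streaming-style mapper and reducer for "visits per beacon in a
# time window". The log line format (event\tbeacon_id\tuser_id\tts)
# is an assumption for illustration.

def mapper(lines, t_start, t_end):
    """Emit 'beacon_id\t1' for each camp-on event inside the window."""
    for line in lines:
        event, beacon_id, _user, ts = line.rstrip("\n").split("\t")
        if event == "camp_on" and t_start <= int(ts) < t_end:
            yield "%s\t1" % beacon_id

def reducer(pairs):
    """Sum counts per beacon; Hadoop delivers keys already sorted."""
    current, total = None, 0
    for pair in pairs:
        key, count = pair.split("\t")
        if key != current:
            if current is not None:
                yield "%s\t%d" % (current, total)
            current, total = key, 0
        total += int(count)
    if current is not None:
        yield "%s\t%d" % (current, total)

# In a real job each script runs standalone (-mapper mapper.py,
# -reducer reducer.py), reading sys.stdin and printing to stdout;
# Hadoop handles the sort between the two phases.
```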
Leveraging Amazon's EMR for Beaconstac Analytics
Amazon's Streaming API for writing mapper and reducer functions in Python
Input – copy programs to Amazon S3
Output – copy the processed/output data to S3
Initial tests were run using Amazon's EMR console. Here you can define the following:
1) Cluster configuration – name, termination protection, logging, log location on S3, etc.
2) Software configuration – Hadoop AMI version, applications to be installed on startup, etc.
3) Hardware configuration – types of nodes: master, core, and task
4) Security keys, allowed users
5) Bootstrap actions – configure Hadoop, custom actions, etc.
6) Steps – streaming program, Hive program, Pig program
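The same console settings map onto a programmatic launch. A minimal sketch of the arguments for boto3's `run_job_flow`, where the bucket name, instance types, and S3 paths are placeholder assumptions:

```python
# Sketch of launching a streaming cluster programmatically. Bucket
# names, instance types, and paths are placeholders, not real values.

def streaming_job_flow(name, bucket):
    """Build run_job_flow arguments mirroring the console settings."""
    return {
        "Name": name,                                  # cluster configuration
        "LogUri": "s3://%s/logs/" % bucket,
        "AmiVersion": "3.8.0",                         # software configuration
        "Instances": {                                 # hardware configuration
            "MasterInstanceType": "m1.medium",
            "SlaveInstanceType": "m1.medium",
            "InstanceCount": 3,
            "TerminationProtected": False,
        },
        "Steps": [{                                    # streaming step
            "Name": "beacon-visits",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "/home/hadoop/contrib/streaming/hadoop-streaming.jar",
                "Args": [
                    "-mapper", "s3://%s/code/mapper.py" % bucket,
                    "-reducer", "s3://%s/code/reducer.py" % bucket,
                    "-input", "s3://%s/input/" % bucket,
                    "-output", "s3://%s/output/" % bucket,
                ],
            },
        }],
    }

# The actual call needs AWS credentials, so it is left commented:
# import boto3
# emr = boto3.client("emr", region_name="us-east-1")
# emr.run_job_flow(**streaming_job_flow("morpheus-batch", "my-bucket"))
```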
Integrating EMR in production
Batch processing for Morpheus
AWS Data pipeline
Deep dive into EMR startup and job submission
How Does AWS Data Pipeline Work?
Pipeline definition – specifies the business logic of your data management
AWS Data Pipeline web service – interprets the pipeline definition and assigns tasks to workers to move and transform data
Task runner – polls the AWS Data Pipeline web service for tasks and then performs those tasks
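The three pieces above come together in a pipeline definition. A minimal sketch in the AWS Data Pipeline object format, with a daily schedule driving an EmrActivity; the ids, paths, and bucket names are illustrative assumptions, not the actual Morpheus definition:

```python
# Minimal pipeline-definition sketch in the AWS Data Pipeline object
# format (ids and S3 paths are placeholders for illustration).
pipeline_definition = {
    "objects": [
        {"id": "Default", "name": "Default",
         "scheduleType": "cron", "schedule": {"ref": "DailySchedule"}},
        {"id": "DailySchedule", "name": "DailySchedule", "type": "Schedule",
         "period": "1 day", "startDateTime": "2015-05-01T00:00:00"},
        {"id": "EmrCluster", "name": "EmrCluster", "type": "EmrCluster",
         "amiVersion": "3.8.0",
         "masterInstanceType": "m1.medium",
         "coreInstanceType": "m1.medium"},
        # The EmrActivity's "step" field is a comma-separated streaming
        # invocation: jar, then the mapper/reducer/input/output args.
        {"id": "RunEmrJobs", "name": "RunEmrJobs", "type": "EmrActivity",
         "runsOn": {"ref": "EmrCluster"},
         "step": ("/home/hadoop/contrib/streaming/hadoop-streaming.jar,"
                  "-mapper,s3://bucket/code/mapper.py,"
                  "-reducer,s3://bucket/code/reducer.py,"
                  "-input,s3://bucket/input/,"
                  "-output,s3://bucket/output/")},
    ]
}
```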
Morpheus version of Data Pipeline
o Copy logs from Kafka to S3 – runs every hour; requires a Kafka consumer script
o Run EMR jobs – runs once every day; processes each job and produces output; each job comprises mapper and reducer scripts
o Copy the output to Elasticsearch – runs once every day; inserts the output into Elasticsearch
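The hourly Kafka-to-S3 copy step might look like the sketch below. The batching logic is a pure function so it runs without a broker; the consumer and S3 calls are indicated in comments, assuming the kafka-python and boto3 client libraries (names and paths here are illustrative, not the actual Morpheus script):

```python
# Sketch of the hourly Kafka-to-S3 copy step. Topic, bucket, and key
# layout are assumptions for illustration.

def batch_for_s3(messages, hour_key):
    """Join raw log messages into one newline-delimited S3 object body."""
    key = "logs/%s/events.log" % hour_key   # e.g. logs/2015-05-01-13/...
    body = "\n".join(messages)
    return key, body

# The live consumer loop, left commented so the sketch stays runnable:
# from kafka import KafkaConsumer              # assumed client library
# import boto3
# consumer = KafkaConsumer("beacon-events",
#                          bootstrap_servers="broker:9092")
# msgs = [m.value.decode("utf-8") for m in consumer]  # drain the hour
# key, body = batch_for_s3(msgs, "2015-05-01-13")
# boto3.client("s3").put_object(Bucket="morpheus-logs",
#                               Key=key, Body=body)
```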
Settings file in each job
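The slide does not show the file itself; a plausible shape for a per-job settings module, tying together the mapper/reducer scripts, S3 paths, and Elasticsearch target, where every name is a hypothetical assumption:

```python
# Hypothetical settings.py for one EMR job. Every key and value here
# is an assumption for illustration, not the actual Morpheus file.
SETTINGS = {
    "job_name": "visits_per_beacon",
    "mapper": "mapper.py",
    "reducer": "reducer.py",
    "input": "s3://morpheus-logs/input/",
    "output": "s3://morpheus-logs/output/visits_per_beacon/",
    "es_index": "beacon-analytics",     # target Elasticsearch index
}
```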
Questions?