Big Data and Internet of Things (IoT)

Upload: dezyre

Post on 31-Jul-2015


TRANSCRIPT

Page 1: Internet of Things

Big Data and Internet of Things (IoT)

Page 2: Internet of Things

Project Morpheus (Beaconstac Analytics)

Garima Batra, Core Platform Engineer | MobStac

May 2015

Page 3: Internet of Things

A quick intro about Beaconstac

Beaconstac is a proximity marketing and analytics platform for beacons.

Several beacon-specific events are defined to aid proximity marketing.

The events include camp-on, beacon exit, region enter, and region exit, among others.

The Beaconstac analytics platform makes it easy for managers, marketers, and developers to analyze event data.

Components include the Beaconstac iOS and Android SDKs and the Beaconstac portal.

Page 4: Internet of Things

Why Hadoop?

Collect event logs generated from Beaconstac SDK usage

Needed a system to answer queries like

o Heat map of beacons by the number of visits received in a specified time interval

o Heat map of beacons by the amount of time spent in a specified time interval

o Average time spent by users near different beacons

o Last seen per user

o Last seen per beacon

o Analyzing data with custom attribute filters

o Traversed path in an area by individual users
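Two of the queries above (average time spent near a beacon, and last seen per user or per beacon) can be sketched in plain Python to show what the analytics have to compute. This is a minimal illustration, not the Morpheus implementation: the record layout `(user_id, beacon_id, event_type, timestamp)`, the event names `camp_on` and `exit`, and the function names are all assumptions, since the slides do not show the actual log schema.

```python
from collections import defaultdict

# Hypothetical event records: (user_id, beacon_id, event_type, unix_timestamp).
# The real Beaconstac log schema is not shown in the slides.
EVENTS = [
    ("u1", "b1", "camp_on", 100),
    ("u1", "b1", "exit", 160),
    ("u2", "b1", "camp_on", 200),
    ("u2", "b1", "exit", 230),
    ("u1", "b2", "camp_on", 300),
    ("u1", "b2", "exit", 420),
]


def average_dwell_seconds(events):
    """Average time between a camp-on and the matching exit, per beacon."""
    open_visits = {}                      # (user, beacon) -> camp-on timestamp
    totals = defaultdict(lambda: [0, 0])  # beacon -> [total seconds, visits]
    for user, beacon, kind, ts in sorted(events, key=lambda e: e[3]):
        key = (user, beacon)
        if kind == "camp_on":
            open_visits[key] = ts
        elif kind == "exit" and key in open_visits:
            totals[beacon][0] += ts - open_visits.pop(key)
            totals[beacon][1] += 1
    return {b: total / count for b, (total, count) in totals.items()}


def last_seen(events, field):
    """Latest timestamp per user (field=0) or per beacon (field=1)."""
    latest = {}
    for event in events:
        key, ts = event[field], event[3]
        latest[key] = max(ts, latest.get(key, ts))
    return latest
```

On a single machine this is a few lines; the case for Hadoop comes from running the same grouping over the full event log volume.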

Page 5: Internet of Things

Leveraging Amazon's EMR for Beaconstac Analytics

Amazon's Streaming API for writing mapper and reducer functions in Python

Input - Copy programs to Amazon S3

Output - Copy the processed/output data to S3

Initial tests were run using Amazon's EMR console. Here you can define the following -

1) Cluster configuration - Name, Termination protection, Logging, logs location on S3 etc.

2) Software configuration - Hadoop AMI version, applications to be installed on startup etc.

3) Hardware configuration - Types of nodes - Master, Core and Task

4) Security keys, allowed users

5) Bootstrap actions - Configure Hadoop, Custom actions etc.

6) Steps - Streaming program, Hive program, Pig program
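To make the streaming step concrete, here is a minimal sketch of a Python mapper and reducer for the "visits per beacon" query. With Hadoop Streaming the mapper and reducer read lines on stdin and write tab-separated key/value lines on stdout, and the reducer receives its input sorted by key. The JSON log format, the field names `event` and `beacon_id`, and the single-script `map`/`reduce` argument are assumptions for illustration; the actual Morpheus scripts are not shown in the slides.

```python
import json
import sys
from itertools import groupby


def map_lines(lines):
    """Emit 'beacon_id<TAB>1' for every camp-on event in the raw log lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except ValueError:
            continue  # skip malformed records rather than fail the whole job
        if not isinstance(event, dict):
            continue
        if event.get("event") == "camp_on" and "beacon_id" in event:
            yield "%s\t1" % event["beacon_id"]


def reduce_lines(lines):
    """Sum counts per key; input is expected to arrive sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for beacon_id, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (beacon_id, sum(int(count) for _, count in group))


# Hypothetical entry point: run as "script.py map" or "script.py reduce".
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out_line in step(sys.stdin):
        print(out_line)
```

The two functions can be checked locally by piping a sample log through the map step, `sort`, and the reduce step before the scripts are copied to S3.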

Page 6: Internet of Things

Integrating EMR in production

Page 7: Internet of Things

Batch processing for Morpheus

AWS Data Pipeline

Page 8: Internet of Things

Deep dive into EMR startup and job submission


Page 9: Internet of Things

How Does AWS Data Pipeline Work?

Pipeline definition - specifies the business logic of your data management.

AWS Data Pipeline web service - interprets the pipeline definition and assigns tasks to workers to move and transform data.

Task runner - polls the AWS Data Pipeline web service for tasks and then performs those tasks.
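The division of labour between the three components can be illustrated with a small in-memory model. This is a toy sketch only: it does not call AWS, and the class names `PipelineService` and `TaskRunner`, the method names, and the shape of `PIPELINE_DEFINITION` are hypothetical stand-ins chosen for the illustration.

```python
from collections import deque

# Pipeline definition: an ordered list of named tasks (hypothetical structure).
PIPELINE_DEFINITION = [
    {"name": "copy_logs", "action": lambda data: data + ["logs copied"]},
    {"name": "transform", "action": lambda data: data + ["logs transformed"]},
]


class PipelineService:
    """Stands in for the web service: turns a definition into a task queue."""

    def __init__(self, definition):
        self._queue = deque(definition)  # copy, so the definition is untouched

    def poll_for_task(self):
        # Hand out the next pending task, or None when nothing is left.
        return self._queue.popleft() if self._queue else None


class TaskRunner:
    """Stands in for the task runner: polls for work and performs it."""

    def __init__(self, service):
        self.service = service
        self.completed = []

    def run_until_idle(self, data):
        while True:
            task = self.service.poll_for_task()
            if task is None:
                return data
            data = task["action"](data)
            self.completed.append(task["name"])
```

The point of the split is that the definition only describes what should happen, the service decides which task is due, and the runner is the only part that touches the data.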

Page 10: Internet of Things

Morpheus version of Data Pipeline

Copy logs from Kafka to S3

o Runs every hour

o Requires a Kafka consumer script

Run EMR jobs

o Runs once every day

o Processes each job and produces output

o Each job comprises mapper and reducer scripts

Copy the output to Elasticsearch

o Runs once every day

o Inserts output in Elasticsearch
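The last stage, loading job output into Elasticsearch, can be sketched as building a request body for the Elasticsearch bulk API, which takes newline-delimited JSON: an action line followed by a document line for each record, with a trailing newline. The sketch below only builds the body and sends nothing. The index name, document fields, `day` parameter, and ID scheme are assumptions, and older Elasticsearch versions (such as those current in 2015) also expected a `_type` in the action line.

```python
import json


def bulk_body(reducer_lines, index_name, day):
    """Build a newline-delimited bulk body from 'beacon_id<TAB>count' lines."""
    out = []
    for line in reducer_lines:
        beacon_id, count = line.rstrip("\n").split("\t", 1)
        # Hypothetical ID scheme: one document per beacon per day, so a rerun
        # of the daily job overwrites instead of duplicating.
        action = {"index": {"_index": index_name, "_id": "%s:%s" % (day, beacon_id)}}
        doc = {"beacon_id": beacon_id, "visits": int(count), "day": day}
        out.append(json.dumps(action, sort_keys=True))
        out.append(json.dumps(doc, sort_keys=True))
    return "\n".join(out) + "\n" if out else ""
```

For example, `bulk_body(["b1\t2"], "beacon_visits", "2015-05-01")` yields two lines, the index action and the document, ready to be posted to the cluster's `_bulk` endpoint.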

Page 11: Internet of Things

Settings file in each job


Questions?