
Big Data and the Internet of Things (IoT)

Project Morpheus (Beaconstac Analytics)

Garima Batra, Core Platform Engineer | MobStac

May 2015

A quick intro to Beaconstac

Beaconstac is a proximity marketing and analytics platform for beacons

Several beacon-specific events are defined to aid proximity marketing

The events include the camp-on event, beacon exit event, region enter, region exit, etc.

The Beaconstac analytics platform makes it easy for managers, marketers, and developers to analyze event data

Components include the Beaconstac iOS/Android SDKs and the Beaconstac portal
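The beacon events above could be modeled as a small enum plus a log-record helper. This is a sketch only: the event identifiers and field names are assumptions for illustration, not the SDK's actual schema.

```python
from enum import Enum

class BeaconEvent(Enum):
    # Event names are illustrative; the SDK's actual identifiers may differ.
    CAMP_ON = "camp_on"          # user dwells in a beacon's immediate range
    BEACON_EXIT = "beacon_exit"  # user leaves a beacon's range
    REGION_ENTER = "region_enter"
    REGION_EXIT = "region_exit"

def make_event_log(event, user_id, beacon_id, timestamp):
    """Build one event record of the kind the SDK might log."""
    return {
        "event": event.value,
        "user_id": user_id,
        "beacon_id": beacon_id,
        "timestamp": timestamp,
    }

record = make_event_log(BeaconEvent.CAMP_ON, "user-42", "beacon-7", 1431648000)
```

Records of this shape are what the analytics queries on the next slides aggregate over.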

Why Hadoop?

Collect event logs generated from Beaconstac SDK usage

Needed a system to answer queries like:

o Heat map of beacons by the number of visits received in a specified time interval
o Heat map of beacons by the amount of time spent in a specified time interval
o Average time spent by users near different beacons
o Last seen per user
o Last seen per beacon
o Analyzing data with custom attribute filters
o Traversed path in an area by individual users
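A query like the visits-per-beacon heat map maps naturally onto a Hadoop Streaming job. Below is a minimal Python mapper/reducer sketch; the tab-separated log layout and the camp_on event name are assumptions. In an actual Streaming job each function would read sys.stdin line by line and print to stdout.

```python
from itertools import groupby

def mapper(lines):
    """Emit 'beacon_id<TAB>1' for every camp-on event in the raw log.
    Assumes tab-separated log lines: timestamp, user_id, beacon_id, event."""
    for line in lines:
        ts, user_id, beacon_id, event = line.rstrip("\n").split("\t")
        if event == "camp_on":
            yield "%s\t1" % beacon_id

def reducer(lines):
    """Sum counts per beacon. Hadoop's shuffle phase guarantees the
    reducer input is sorted by key, which is what groupby relies on."""
    for beacon_id, group in groupby(lines, key=lambda l: l.split("\t")[0]):
        total = sum(int(l.split("\t")[1]) for l in group)
        yield "%s\t%d" % (beacon_id, total)
```

The other queries (time spent, last seen) would follow the same pattern with different keys and aggregation functions.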

Leveraging Amazon's EMR for Beaconstac Analytics

Amazon's Streaming API for writing mapper and reducer functions in Python

o Input – copy programs to Amazon S3
o Output – copy the processed/output data to S3

Initial tests were run using Amazon's EMR console. Here you can define the following:

1) Cluster configuration – name, termination protection, logging, logs location on S3, etc.
2) Software configuration – Hadoop AMI version, applications to be installed on startup, etc.
3) Hardware configuration – types of nodes: master, core, and task
4) Security keys, allowed users
5) Bootstrap actions – configure Hadoop, custom actions, etc.
6) Steps – streaming program, Hive program, Pig program
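The six console settings above can also be expressed programmatically. This sketch builds the equivalent request for boto3's run_job_flow; the bucket names, instance types, key name, and AMI version are illustrative placeholders, and the actual API call is left commented out.

```python
# Illustrative EMR job-flow definition mirroring the console settings 1)-6).
job_flow = {
    # 1) Cluster configuration
    "Name": "beaconstac-analytics",
    "LogUri": "s3://beaconstac-logs/emr/",           # logs location on S3
    # 2) Software configuration (Hadoop AMI version, circa 2015)
    "AmiVersion": "3.8.0",
    # 3) Hardware configuration: master, core, and task nodes
    "Instances": {
        "MasterInstanceType": "m1.medium",
        "SlaveInstanceType": "m1.medium",
        "InstanceCount": 3,
        "TerminationProtected": False,
        "Ec2KeyName": "analytics-key",               # 4) security key
    },
    # 5) Bootstrap actions: configure Hadoop, custom scripts
    "BootstrapActions": [{
        "Name": "install-deps",
        "ScriptBootstrapAction": {"Path": "s3://beaconstac-bootstrap/setup.sh"},
    }],
    # 6) Steps: a streaming program with Python mapper/reducer scripts on S3
    "Steps": [{
        "Name": "visits-per-beacon",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "/home/hadoop/contrib/streaming/hadoop-streaming.jar",
            "Args": [
                "-mapper", "s3://beaconstac-code/mapper.py",
                "-reducer", "s3://beaconstac-code/reducer.py",
                "-input", "s3://beaconstac-logs/raw/",
                "-output", "s3://beaconstac-output/visits/",
            ],
        },
    }],
}

# The request would then be submitted with:
# import boto3
# boto3.client("emr", region_name="us-east-1").run_job_flow(**job_flow)
```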

Integrating EMR in production

Batch processing for Morpheus

AWS Data Pipeline

Deep dive into EMR startup and job submission


How Does AWS Data Pipeline Work?

o Pipeline definition – specifies the business logic of your data management
o AWS Data Pipeline web service – interprets the pipeline definition and assigns tasks to workers to move and transform data
o Task runner – polls the AWS Data Pipeline web service for tasks and then performs those tasks
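The poll-perform-report cycle of the task runner can be sketched as a simple loop. The three callables below are stand-ins for AWS Data Pipeline API calls (such as PollForTask and SetTaskStatus), not a real client.

```python
import time

def run_task_loop(poll_for_task, perform, report_done, max_polls=None):
    """Illustrative shape of the task-runner pattern: poll the web service
    for a task, perform it, then report the result back.

    poll_for_task -- returns the next task, or None if none is ready
    perform       -- runs the activity (a copy, an EMR job, ...)
    report_done   -- reports completion back to the service
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        task = poll_for_task()
        if task is None:
            time.sleep(1)        # nothing to do yet; back off and re-poll
            continue
        result = perform(task)
        report_done(task, result)
```

With max_polls=None the loop runs forever, which is how a long-lived task runner behaves.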

Morpheus version of the Data Pipeline

Copy logs from Kafka to S3
o Runs every hour
o Requires a Kafka consumer script

Run EMR jobs
o Runs once every day
o Processes each job and produces output
o Each job comprises mapper and reducer scripts

Copy the output to Elasticsearch
o Runs once every day
o Inserts output into Elasticsearch
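The hourly Kafka-to-S3 stage needs a deterministic object key per batch so that each pipeline run picks up exactly one prefix. A minimal sketch, assuming an hour-partitioned key scheme (the layout is an assumption, not Morpheus's actual one):

```python
from datetime import datetime

def s3_key_for_batch(topic, ts):
    """Hour-partitioned S3 key for one batch of consumed Kafka messages."""
    return "raw-logs/%s/%s.log" % (topic, ts.strftime("%Y/%m/%d/%H"))

def batch_messages(messages):
    """Join consumed messages into one newline-delimited S3 object body."""
    return "\n".join(messages) + "\n"

key = s3_key_for_batch("beacon-events", datetime(2015, 5, 20, 14))
# The body built by batch_messages(msgs) would then be uploaded with
# boto3.client("s3").put_object(Bucket=..., Key=key, Body=body),
# and the daily EMR run would take s3://<bucket>/raw-logs/... as -input.
```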

Settings file in each job


Questions?

