MongoDB World 2016: The Best IoT Analytics with MongoDB

The Best IoT Analytics with MongoDB
Jake Angerman, Sr. Solutions Architect, MongoDB

Post on 08-Jan-2017

Page 2: MongoDB World 2016: The Best IoT Analytics with MongoDB

Track Overview

Sessions:

1. Building an IoT Application that Will Work Next Year

2. Building IoT Applications the Right Way

3. The Best IoT Analytics with MongoDB

Page 3: MongoDB World 2016: The Best IoT Analytics with MongoDB

Introduction

Page 4: MongoDB World 2016: The Best IoT Analytics with MongoDB

#MDBW16

Morpheus: time series data is everywhere

Morpheus picture

Page 5: MongoDB World 2016: The Best IoT Analytics with MongoDB

Automatic Dependent Surveillance Broadcast (ADS-B)

Diagram: primary radar; Secondary Surveillance Radar (interrogation on 1030 MHz, replies on 1090 MHz); software defined radio receiving on 1090 MHz.

Page 6: MongoDB World 2016: The Best IoT Analytics with MongoDB

Tin Can Reveal

homemade antenna (6.9 cm quarter-wave whip)

• NooElec NESDR Mini 2 SDR: $23.00
• USB extension cable: $10.00
• RF cable, RG316 female to MCX male: $5.50
• Tin can: $2.87

Total: $41.37

6.9cm antenna

USB SDR

dump1090
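The 6.9 cm whip length follows from the 1090 MHz ADS-B frequency. A minimal sketch of the arithmetic, assuming a free-space quarter wavelength with no velocity-factor correction (the function name is made up for illustration):

```python
# Speed of light in metres per second (exact, by SI definition).
SPEED_OF_LIGHT_M_S = 299_792_458


def quarter_wave_length_cm(frequency_hz: float) -> float:
    """Return the free-space quarter wavelength, in centimetres."""
    wavelength_m = SPEED_OF_LIGHT_M_S / frequency_hz
    return wavelength_m / 4 * 100


ADS_B_FREQUENCY_HZ = 1090e6
antenna_cm = quarter_wave_length_cm(ADS_B_FREQUENCY_HZ)
print(round(antenna_cm, 1))
```

A practical whip may be trimmed slightly shorter or longer than this figure, which is why the result is best read as a starting length.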

Page 7: MongoDB World 2016: The Best IoT Analytics with MongoDB

dump1090

Page 8: MongoDB World 2016: The Best IoT Analytics with MongoDB

Antenna Range approximately 250 miles (400 km)

> db.tincan.aggregate([
    { $geoNear: {
        // GeoJSON points are ordered [longitude, latitude]
        near: { type: "Point", coordinates: [ center_y, center_x ] },
        distanceField: "meters",
        minDistance: 394289,   // metres
        limit: 100,
        spherical: true
    }},
    // keep only the farthest of the returned observations
    { $sort: { "meters": -1 } },
    { $limit: 1 }
  ])

Page 10: MongoDB World 2016: The Best IoT Analytics with MongoDB

ADS-B BaseStation data format

MSG,7,111,11111,A3DC34,111111,2016/03/28,21:42:25.875,2016/03/28,21:42:25.865,,36975,,,,,,,,,,0
MSG,7,111,11111,A3DC34,111111,2016/03/28,21:42:25.884,2016/03/28,21:42:25.865,,36975,,,,,,,,,,0
MSG,8,111,11111,A33AA7,111111,2016/03/28,21:42:25.898,2016/03/28,21:42:25.865,,,,,,,,,,,,0
MSG,5,111,11111,A33AA7,111111,2016/03/28,21:42:25.961,2016/03/28,21:42:25.931,,28225,,,,,,,0,,0,0
MSG,3,111,11111,A678EF,111111,2016/03/28,21:42:26.013,2016/03/28,21:42:25.996,,34000,,,30.58369,-98.75438,,,,,,0
MSG,4,111,11111,A678EF,111111,2016/03/28,21:42:26.013,2016/03/28,21:42:25.996,,,417,283,,,0,,,,,0
MSG,3,111,11111,0D081C,111111,2016/03/28,21:42:26.280,2016/03/28,21:42:26.258,,35975,,,29.86456,-98.24018,,,,,,0
MSG,4,111,11111,0D081C,111111,2016/03/28,21:42:26.280,2016/03/28,21:42:26.258,,,429,206,,,0,,,,,0
MSG,8,111,11111,0D0648,111111,2016/03/28,21:42:26.358,2016/03/28,21:42:26.324,,,,,,,,,,,,0
MSG,3,111,11111,A678EF,111111,2016/03/28,21:42:26.454,2016/03/28,21:42:26.390,,34000,,,30.58389,-98.75544,,,,,,0
MSG,8,111,11111,A33AA7,111111,2016/03/28,21:42:26.478,2016/03/28,21:42:26.455,,,,,,,,,,,,0
MSG,7,111,11111,A678EF,111111,2016/03/28,21:42:26.679,2016/03/28,21:42:26.651,,34000,,,,,,,,,,0
MSG,7,111,11111,0D081C,111111,2016/03/28,21:42:26.759,2016/03/28,21:42:26.717,,35975,,,,,,,,,,0

Highlighted fields: message type, ICAO hex, date & time stamp, altitude, speed, lat/long

Page 11: MongoDB World 2016: The Best IoT Analytics with MongoDB

ADS-B in JSON

{
  "timestamp" : ISODate("2016-01-31T20:54:35.000+0000"),
  "icao" : "AC4144",
  "callsign" : "N889WM",
  "altitude" : 9350,
  "bearing" : 150,
  "position" : [-98.62762, 30.03657],
  "ground_speed" : 152,
  "vertical_rate" : 192
}
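A minimal sketch of turning one BaseStation line into a document shaped like the JSON above. The field positions are an assumption: they follow the commonly documented BaseStation column layout and agree with the sample messages in this deck. The function name `parse_basestation` and the `msg_type` key are made up for illustration, and using the "generated" date and time (rather than the "logged" pair) is also a guess.

```python
from datetime import datetime


def parse_basestation(line):
    """Parse one BaseStation 'MSG' line into a dict, skipping empty fields."""
    f = line.strip().split(",")
    if len(f) < 22 or f[0] != "MSG":
        return None  # not a complete MSG record
    doc = {
        "msg_type": int(f[1]),
        "icao": f[4],
        # Fields 6 and 7: date and time the message was generated (assumed).
        "timestamp": datetime.strptime(f[6] + " " + f[7], "%Y/%m/%d %H:%M:%S.%f"),
    }
    if f[10].strip():
        doc["callsign"] = f[10].strip()
    if f[11]:
        doc["altitude"] = int(f[11])
    if f[12]:
        doc["ground_speed"] = int(f[12])
    if f[13]:
        doc["bearing"] = int(f[13])
    if f[14] and f[15]:
        # Stored as [longitude, latitude], matching the JSON document.
        doc["position"] = [float(f[15]), float(f[14])]
    if f[16]:
        doc["vertical_rate"] = int(f[16])
    return doc


position_msg = parse_basestation(
    "MSG,3,111,11111,A678EF,111111,2016/03/28,21:42:26.013,"
    "2016/03/28,21:42:25.996,,34000,,,30.58369,-98.75438,,,,,,0"
)
velocity_msg = parse_basestation(
    "MSG,4,111,11111,A678EF,111111,2016/03/28,21:42:26.013,"
    "2016/03/28,21:42:25.996,,,417,283,,,0,,,,,0"
)
```

Because each message type carries only some of the fields, the sketch omits missing keys instead of storing nulls; a single aircraft's full state has to be assembled from several messages.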

Page 12: MongoDB World 2016: The Best IoT Analytics with MongoDB

dump1090 data flow

Diagram: dump1090 holds aircraft state in a linked list in RAM and exposes it two ways: AJAX JSON over HTTP :8080 and BaseStation messages over TCP :30003.

AJAX JSON sample:

[{"hex":"ac741c", "squawk":"6234", "flight":"AAL2417 ", "lat":30.619176, "lon":-97.755963, "validposition":1, "altitude":35975, "vert_rate":0, "track":202, "validtrack":1, "speed":438, "messages":557, "seen":0}]

Page 13: MongoDB World 2016: The Best IoT Analytics with MongoDB

dump1090 data flow

Diagram: dump1090 (linked list in RAM) serves AJAX JSON over HTTP :8080 and BaseStation messages over TCP :30003; ingest.py reads the BaseStation feed and writes to MongoDB over TCP.

ingest.py input sample:

MSG,7,111,11111,A3DC34,111111,2016/03/28,21:42:25.875,2016/03/28,21:42:25.865,,36975

Page 14: MongoDB World 2016: The Best IoT Analytics with MongoDB

What Types of Analytics Can We Do?

• Real-time dashboards (<1 second latency) = Aggregation framework
• Ad-hoc queries = Aggregation framework
• Historical reports = Aggregation framework or BI Connector
• Batch processing = Hadoop
• Machine learning = Spark

Page 15: MongoDB World 2016: The Best IoT Analytics with MongoDB

Analytics without Data Migration

Diagram: devices feed a database; ETL jobs copy the data into separate DBs that serve dashboards and historical analysis.

Page 17: MongoDB World 2016: The Best IoT Analytics with MongoDB

Analytics without Data Migration

Diagram: devices, dashboards, and historical analysis all use the same database.

• No bulk or incremental ETL required
• One language for both real-time and ad-hoc queries

Page 18: MongoDB World 2016: The Best IoT Analytics with MongoDB

Workload Isolation

Diagram: a replica set with one primary and two secondaries, serving devices, dashboards, and historical analysis.
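One common way to get this kind of isolation is to let analytics clients read from secondaries while ingest keeps writing to the primary. The sketch below only builds connection strings; the host names and the replica set name `rs0` are made up, and whether the talk's deployment was configured this way is an assumption. `replicaSet` and `readPreference` are standard MongoDB connection string options.

```python
from urllib.parse import urlencode

# Hypothetical host names, for illustration only.
HOSTS = "db1.example.net:27017,db2.example.net:27017,db3.example.net:27017"


def mongo_uri(database, **options):
    """Build a MongoDB connection string with the given URI options."""
    return "mongodb://{}/{}?{}".format(HOSTS, database, urlencode(options))


# Device ingest: writes always go to the primary (default read preference).
ingest_uri = mongo_uri("adsb", replicaSet="rs0")

# Long-running analytics: ask the driver to read from a secondary instead.
analytics_uri = mongo_uri("adsb", replicaSet="rs0", readPreference="secondary")

print(analytics_uri)
```

Reads from a secondary can lag slightly behind the primary, which is usually acceptable for historical analysis but worth keeping in mind for a live dashboard.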

Page 19: MongoDB World 2016: The Best IoT Analytics with MongoDB

Aggregation Framework

Page 20: MongoDB World 2016: The Best IoT Analytics with MongoDB

Aggregation framework

Page 21: MongoDB World 2016: The Best IoT Analytics with MongoDB

dump1090 dashboard

Diagram: the dump1090 data flow again (linked list in RAM, AJAX JSON over HTTP :8080, BaseStation TCP :30003, ingest.py writing to MongoDB over TCP), now also showing MongoDB's WT cache.

Page 22: MongoDB World 2016: The Best IoT Analytics with MongoDB

Real-time Dashboards

• Current Radar: the last 5 minutes' worth of aircraft data

import datetime

pipeline = [
    # $match comes first so that it can use an index
    {"$match": {"t": {"$gte": datetime.datetime.utcnow() - datetime.timedelta(minutes=5)}}},
    {"$sort": {"icao": 1, "t": 1}},
    # $push pre-builds one array per aircraft, avoiding clumsy looping in the application
    {"$group": {
        "_id": {"icao": "$icao"},
        "events": {"$push": {
            "flight": "$callsign",
            "altitude": "$a",
            "track": "$b",
            "speed": "$s",
            "lon": {"$arrayElemAt": ["$p", 0]},
            "lat": {"$arrayElemAt": ["$p", 1]},
            "vert_rate": "$v"
        }},
        "sum": {"$sum": 1}
    }},
    {"$project": {"_id": 0, "icao": "$_id.icao", "events": "$events", "sum": "$sum"}}
]
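For comparison, this is roughly the looping an application would need if the server returned flat observations instead of one pre-built array per aircraft. It is a plain-Python sketch, not code from the talk: `group_events` and the sample observations are made up, and only two of the event fields are carried over to keep it short.

```python
from itertools import groupby
from operator import itemgetter


def group_events(docs):
    """Group flat observations per aircraft, like $sort followed by $group/$push."""
    ordered = sorted(docs, key=itemgetter("icao", "t"))
    result = []
    for icao, rows in groupby(ordered, key=itemgetter("icao")):
        events = [{"altitude": r["a"], "speed": r["s"]} for r in rows]
        result.append({"icao": icao, "events": events, "sum": len(events)})
    return result


# Made-up observations using the short field names from the slides.
docs = [
    {"icao": "A678EF", "t": 2, "a": 34000, "s": 417},
    {"icao": "0D081C", "t": 1, "a": 35975, "s": 429},
    {"icao": "A678EF", "t": 1, "a": 33975, "s": 415},
]
radar = group_events(docs)
```

Doing the grouping in the pipeline means the dashboard receives one document per aircraft and can draw each track directly.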

Page 23: MongoDB World 2016: The Best IoT Analytics with MongoDB

Ad hoc aggregations

Which aircraft has the most observations?

> db.tincan.aggregate([
    { $group: { _id: "$icao", "sum": {$sum: 1}, "callsigns": {"$addToSet": "$callsign"} }},
    { $sort: { "sum": -1 }},
    { $limit: 1 }
  ])

Sample document:

{
  "_id": ObjectId("5755..."),
  "icao": "ADE201",
  "callsign": "N994FE",
  "a": 8600,
  "b": 104,
  "p": [-98.99888, 30.93031],
  "s": 164,
  "t": ISODate("2016-02-09T02:33:01Z")
}

Page 24: MongoDB World 2016: The Best IoT Analytics with MongoDB

Which aircraft has the most observations?

"result": [
  {
    "_id": "ADE201",
    "sum": 14373,
    "callsigns": [ "N994FE" ]
  }
]

Page 25: MongoDB World 2016: The Best IoT Analytics with MongoDB

ICAO aircraft collection

$ mongoimport -d adsb -c aircraft --type csv --headerline aircraft_db.csv

icao,regid,mdl,type,operator
000334,PU-PLS,ULAC,EDRA SUPER PETREL LS,PRIVATE OWNER
000D77,PU-VGA,WT9,WT-9 DYNAMIC,PRIVATE OWNER
000D82,PU-DCT,WT9,AEROSPOOL WT9 DYNAMIC,PRIVATE OWNER
001100,-,320,UNKNOWN / VARIOUS,CODE USED BY SEVERAL AIRCRAFT
001108,EJC-1108,AC90,GULFSTREAM 690D,EJERCITO DE COLOMBIA
001411,PU-BGC,RV9,AMATEUR VANS RV-9A,PRIVATE OWNER
002008,LV-S004,P208,TECNAM P-2008,PRIVATE OWNER
003106,PU-FUA,ULAC,AMATEUR GFLY,PRIVATE OWNER
004003,Z-WPB,B732,BOEING 737-2N0,AIR ZIMBABWE
...

Page 26: MongoDB World 2016: The Best IoT Analytics with MongoDB

$lookup to find aircraft model

> db.tincan.aggregate([
    { $group: { _id: "$icao", "sum": {$sum: 1}, "callsigns": {"$addToSet": "$callsign"} }},
    { $sort: { "sum": -1 }},
    { $limit: 1 },
    { $lookup: { from: "aircraft", localField: "_id", foreignField: "icao", as: "description" }}
  ])

Page 27: MongoDB World 2016: The Best IoT Analytics with MongoDB

$lookup to find aircraft model

"result": [
  {
    "_id": "ADE201",
    "sum": 14373,
    "callsigns": [ "N994FE" ],
    "description": [
      {
        "_id": ObjectId("575074300cf625050f2e730e"),
        "icao": "ADE201",
        "regid": "N994FE",
        "mdl": "C208",
        "type": "CESSNA 208B GRAND CARAVAN"
      }
    ]
  }
]

Page 28: MongoDB World 2016: The Best IoT Analytics with MongoDB

FedEx

Page 29: MongoDB World 2016: The Best IoT Analytics with MongoDB

Which aircraft is seen on the greatest number of days?

> db.tincan.aggregate([
    { $group: { _id: { icao: "$icao", dayOfYear: { $dateToString: { format: "%Y%m%d", date: "$t" }}}}},
    { $group: { _id: "$_id.icao", sum: { $sum: 1 }}},
    { $sort: { "sum": -1 }},
    { $limit: 1 },
    { $lookup: { from: "aircraft", localField: "_id", foreignField: "icao", as: "description" }}
  ])

Page 30: MongoDB World 2016: The Best IoT Analytics with MongoDB

Which aircraft is seen on the greatest number of days?

"result": [
  {
    "_id": "A35969",
    "sum": 63,
    "description": [
      {
        "_id": ObjectId("5762e9cf6ecfc147a0503894"),
        "icao": "A35969",
        "regid": "N315AE",
        "mdl": "B06",
        "type": "BELL 206L-1 LONGRANGER II",
        "operator": "AIR EVAC EMS"
      }
    ]
  }
]
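The two consecutive $group stages are the key idea in that pipeline: the first collapses observations to unique (aircraft, day) pairs, and the second counts the pairs per aircraft. A plain-Python sketch of the same logic, using a made-up helper name and made-up observations:

```python
from collections import Counter
from datetime import datetime


def days_seen(observations):
    """Count the distinct calendar days on which each aircraft was observed."""
    # First $group: collapse to unique (icao, day) pairs.
    unique_days = {(o["icao"], o["t"].strftime("%Y%m%d")) for o in observations}
    # Second $group: count the remaining pairs per aircraft.
    return Counter(icao for icao, _day in unique_days)


# Made-up observations, for illustration only.
observations = [
    {"icao": "A35969", "t": datetime(2016, 2, 8, 5, 25)},
    {"icao": "A35969", "t": datetime(2016, 2, 8, 17, 0)},
    {"icao": "A35969", "t": datetime(2016, 2, 9, 9, 30)},
    {"icao": "ADE201", "t": datetime(2016, 2, 9, 2, 33)},
]
counts = days_seen(observations)
print(counts.most_common(1))
```

Grouping on the raw timestamp instead of the formatted day would count observations rather than days, which is the question the earlier pipeline already answered.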

Page 31: MongoDB World 2016: The Best IoT Analytics with MongoDB

Page 32: MongoDB World 2016: The Best IoT Analytics with MongoDB

Business Intelligence Connector

Page 33: MongoDB World 2016: The Best IoT Analytics with MongoDB

BI Connector

• New in MongoDB 3.2 Enterprise Advanced
• Mapping and transformation layer
• Projects smaller parts of large data sets for reporting

Page 34: MongoDB World 2016: The Best IoT Analytics with MongoDB

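BI Connector Data flow

Diagram: application data stored as documents, e.g. {name: "Andrew", address: {street: …}}, is queried with the MongoDB Query Language; the MongoDB BI Connector uses mapping metadata to present documents as tables, which analytics & visualization tools query with SQL.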

Page 35: MongoDB World 2016: The Best IoT Analytics with MongoDB

FedEx N994FE Flight Paths

Page 36: MongoDB World 2016: The Best IoT Analytics with MongoDB

Observations per Operator

Page 37: MongoDB World 2016: The Best IoT Analytics with MongoDB

Altitude vs Speed

• Two predictable clusters:
  • turbine aircraft at cruising altitude
  • piston aircraft at lower altitude

Page 39: MongoDB World 2016: The Best IoT Analytics with MongoDB

Altitude vs Speed

• Two predictable clusters:
  • turbine aircraft at cruising altitude
  • piston aircraft at lower altitude
• Outliers are Cessnas reporting 51,000+ ft

Page 40: MongoDB World 2016: The Best IoT Analytics with MongoDB

Spark

Page 41: MongoDB World 2016: The Best IoT Analytics with MongoDB

Spark Overview

• Fast, general data processing engine
• Interactive shell
• Scala, Java, Python
• Machine learning libraries (MLlib)
• Supports streaming
• HDFS not required

Page 42: MongoDB World 2016: The Best IoT Analytics with MongoDB

Spark Connector

Connector

BSON Files

MapReduce & HDFS

Page 43: MongoDB World 2016: The Best IoT Analytics with MongoDB

Spark Connector Diagram

• MongoDB Connector for Hadoop (with Spark plug-in): https://github.com/mongodb/mongo-hadoop
• MongoDB Connector for Spark: https://github.com/mongodb/mongo-spark

Page 44: MongoDB World 2016: The Best IoT Analytics with MongoDB

Spark Machine Learning

Supervised
• Classification: Naive Bayes, Support Vector Machines, Random Decision Forests
• Regression: Linear, Logistic

Unsupervised
• Clustering: K-means
• Dimensionality Reduction: Principal Component Analysis, Singular Value Decomposition

Page 45: MongoDB World 2016: The Best IoT Analytics with MongoDB

K-Means Clustering

The K-Means algorithm aims to minimize the sum of squares of the distance between the points and the centroid of each cluster.

source: Lovro Iliassich, toptal.com

Page 46: MongoDB World 2016: The Best IoT Analytics with MongoDB

K-Means Clustering

>>> mongo_rdd = sc.mongoRDD('mongodb://localhost:27017/adsb.tincan')

OR specify a filter:

>>> input_conf = {
...     "mongo.job.input.format": "com.mongodb.hadoop.MongoInputFormat",
...     "mongo.input.uri": "mongodb://localhost:27017/adsb.tincan",
...     "mongo.input.query": '{"t":{"$lte":{"$date":1455494400000}}}'
... }
>>> mongo_rdd = sc.newAPIHadoopRDD(inputFormatClassName, keyClassName, valueClassName,
...                                None, None, input_conf)

Page 48: MongoDB World 2016: The Best IoT Analytics with MongoDB

K-Means Clustering

>>> mongo_rdd = sc.mongoRDD('mongodb://localhost:27017/adsb.tincan')
>>> mongo_rdd.first()
{u'icao': u'A06690', u'a': 11975, u'b': 150, u'_id': ObjectId('5755bb862355da56d87895cf'), u't': datetime.datetime(2016, 2, 8, 5, 25, 4), u'p': [-98.41437, 30.29066], u's': 285, u'v': -1152}
>>> parsed_rdd = mongo_rdd.map(parseData)
>>> parsed_rdd.first()
[5, 25, 4, 1, 11975, 150, 285, -1152, -98.14857, 30.92651]
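The slides do not show `parseData` itself. The sketch below is a hypothetical reconstruction that reproduces the first eight values of the printed feature list from the printed record; the ordering (hour, minute, second, ISO weekday, then altitude, bearing, speed, vertical rate, longitude, latitude) is inferred from that output and may not match the author's function.

```python
from datetime import datetime


def parseData(doc):
    """Hypothetical reconstruction: flatten one observation into numeric features."""
    t = doc["t"]
    lon, lat = doc["p"]
    return [
        t.hour, t.minute, t.second,
        t.isoweekday(),  # Monday == 1 (assumed meaning of the fourth value)
        doc["a"], doc["b"], doc["s"], doc["v"],
        lon, lat,
    ]


# The record printed by mongo_rdd.first(), minus _id.
doc = {
    "icao": "A06690", "a": 11975, "b": 150, "s": 285, "v": -1152,
    "t": datetime(2016, 2, 8, 5, 25, 4), "p": [-98.41437, 30.29066],
}
features = parseData(doc)
```

Documents that lack a field (the sample on an earlier slide has no `v`) would raise a KeyError here, so a real version would need a default or a filter step first.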

Page 49: MongoDB World 2016: The Best IoT Analytics with MongoDB

Choosing K

$$\mathrm{WSSSE} = \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$$

Chart: Within Set Sum of Squared Error (WSSSE, axis from 0 to 14,000,000) plotted against k (axis from 0 to 200).
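A minimal sketch of the quantity on the chart's vertical axis, assuming squared Euclidean distance and that each point is charged to its nearest centroid. The points and centroids are made up, and the function is plain Python instead of the Spark code used in the talk.

```python
def wssse(points, centroids):
    """Within Set Sum of Squared Error: each point is charged to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c))
            for c in centroids
        )
    return total


# Made-up 2-D points forming two obvious groups.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]

one_cluster = wssse(points, [(5, 1)])            # k = 1
two_clusters = wssse(points, [(0, 1), (10, 1)])  # k = 2
print(one_cluster, two_clusters)
```

WSSSE keeps falling as k grows, so the usual approach is to look for the k beyond which the curve flattens, not for the minimum.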

Page 50: MongoDB World 2016: The Best IoT Analytics with MongoDB

Standard Scaling

$$z = \frac{x - \mu}{\sigma}$$

>>> parsed_rdd.first()
[5, 25, 4, 1, 11975, 150, 285, -1152, -98.14857, 30.92651]
>>> scaled_features.first()
[-1.036, -1.1089, -0.2617, 0.6821, -0.8202, 0.4057, 0.8537, -1.6502, -0.6559, 0.6876]
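A plain-Python sketch of standard scaling applied column by column. It uses the population standard deviation, which is an assumption; the scaler used in the talk may normalise differently, so the numbers will not necessarily match the slide. The rows reuse altitude and speed values that appear elsewhere in this deck.

```python
from statistics import mean, pstdev


def standard_scale(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stdevs = [pstdev(col) for col in columns]
    return [
        # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
        [(value - m) / s if s else 0.0 for value, m, s in zip(row, means, stdevs)]
        for row in rows
    ]


# Columns: altitude (ft), speed (kt).
rows = [[11975, 285], [34000, 417], [35975, 429]]
scaled = standard_scale(rows)
```

Without scaling, altitude (tens of thousands) would dominate the squared distances and features such as hour or bearing would barely influence the clusters.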

Page 51: MongoDB World 2016: The Best IoT Analytics with MongoDB

K-Means Clustering

>>> k = 10
>>> clusters = KMeans.train(parsed_rdd, k, maxIterations=10, runs=1, initializationMode="random")
>>> cluster_sizes = parsed_rdd.map(lambda e: clusters.predict(e)).countByValue()
>>> cluster_sizes
defaultdict(<type 'int'>, {0: 70122, 1: 350890, 2: 118596, 3: 104609, 4: 254759, 5: 175840, 6: 166789, 7: 68309, 8: 147826, 9: 495102})

Page 52: MongoDB World 2016: The Best IoT Analytics with MongoDB

Save Results Back to MongoDB

def labelData(array):
    result = {}
    result['cluster'] = clusters.predict(array)
    result['daystamp'] = str(array[0])
    result['dayofweek'] = array[1]
    result['hour'] = array[2]
    result['minute'] = array[3]
    result['second'] = array[4]
    result['a'] = array[5]
    result['b'] = array[6]
    result['s'] = array[7]
    result['v'] = array[8]
    result['p'] = [ array[9], array[10] ]
    return result

>>> labeled_rdd = parsed_rdd.map(labelData)
>>> labeled_rdd.saveToMongoDB('mongodb://localhost:27017/adsb.labeled')

Page 53: MongoDB World 2016: The Best IoT Analytics with MongoDB

K-Means Clustering

>>> cluster_sizes
defaultdict(<type 'int'>, {0: 70122, 1: 350890, 2: 118596, 3: 104609, 4: 254759, 5: 175840, 6: 166789, 7: 68309, 8: 147826, 9: 495102})

Hypothesis: largest cluster #9 is cruising altitude

Page 54: MongoDB World 2016: The Best IoT Analytics with MongoDB

Hypothesis: largest cluster #9 is cruising altitude

adsb> db.labeled.aggregate([
    { $match: { cluster: 9 }},
    { $group: { _id: "summary", "avg_alt": {$avg: "$a"}, "min_alt": {$min: "$a"}, "max_alt": {$max: "$a"} }}
  ])

Page 55: MongoDB World 2016: The Best IoT Analytics with MongoDB

Hypothesis: largest cluster #9 is cruising altitude

"result": [
  {
    "_id": "summary",
    "avg_alt": 33630,
    "min_alt": 30675,
    "max_alt": 35825
  }
]

Page 56: MongoDB World 2016: The Best IoT Analytics with MongoDB

Anomaly Detection

Page 57: MongoDB World 2016: The Best IoT Analytics with MongoDB

Anomaly!

• Plane appears at 12,000 ft out of nowhere

Page 58: MongoDB World 2016: The Best IoT Analytics with MongoDB

planefinder.net video

Page 59: MongoDB World 2016: The Best IoT Analytics with MongoDB

Don't Worry, He's OK

•  4 days later…

Page 60: MongoDB World 2016: The Best IoT Analytics with MongoDB

Summary

MongoDB

Machine Learning

Devices

Historical Reporting

Real-time Dashboard

Page 61: MongoDB World 2016: The Best IoT Analytics with MongoDB

https://github.com/kerneljake/adsb
