![Page 1: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/1.jpg)
Mongo Analytics – Learn aggregation by example
Exploratory Analytics and Visualization using Flight Data
www.jsonstudio.com
![Page 2: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/2.jpg)
Analyzing Flight Data
• JSON data imported from CSV downloaded from:http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236
• Will build aggregation pipelines and visualize data using JSON Studio (www.jsonstudio.com)
• Sample document for a flight:
{ "_id": { "$oid": "534205f61c479f6149a92709" }, "YEAR": 2013, "QUARTER": 1, "MONTH": 1, "DAY_OF_MONTH": 18, "DAY_OF_WEEK": 5, "FL_DATE": "2013-01-18”, "UNIQUE_CARRIER": "DL”, "AIRLINE_ID": 19790, "CARRIER": "DL", "TAIL_NUM": "N325US”, "FL_NUM": 1497, "ORIGIN_AIRPORT_ID": 14100, "ORIGIN_AIRPORT_SEQ_ID": 1410002, "ORIGIN_CITY_MARKET_ID": 34100, "ORIGIN": "PHL", "ORIGIN_CITY_NAME": "Philadelphia, PA", "ORIGIN_STATE_ABR": "PA”, "ORIGIN_STATE_FIPS": 42, "DEST_AIRPORT_ID": 13487, "DEST_AIRPORT_SEQ_ID": 1348702, "DEST_CITY_MARKET_ID": 31650, "DEST": "MSP", "DEST_CITY_NAME": "Minneapolis, MN", "DEST_STATE_ABR": "MN", "DEST_STATE_FIPS": 27, "DEST_STATE_NM": "Minnesota", "DEST_WAC": 63, "CRS_DEP_TIME": 805, "DEP_TIME": 758,
"DEP_DELAY": -7, "DEP_DELAY_NEW": 0, "DEP_DEL15": 0, "DEP_DELAY_GROUP": -1, "DEP_TIME_BLK": "0800-0859", "TAXI_OUT": 24, "WHEELS_OFF": 822, "WHEELS_ON": 958, "TAXI_IN": 4, "CRS_ARR_TIME": 1015, "ARR_TIME": 1002, "ARR_DELAY": -13, "ARR_DELAY_NEW": 0, "ARR_DEL15": 0, "ARR_DELAY_GROUP": -1, "ARR_TIME_BLK": "1000-1059", "CANCELLED": 0, "CANCELLATION_CODE": "",
"DIVERTED": 0, "CRS_ELAPSED_TIME": 190, "ACTUAL_ELAPSED_TIME": 184, "AIR_TIME": 156, "FLIGHTS": 1, "DISTANCE": 980, "DISTANCE_GROUP": 4, "CARRIER_DELAY": "", "WEATHER_DELAY": "", "NAS_DELAY": "", "SECURITY_DELAY": "", "LATE_AIRCRAFT_DELAY": "", "FIRST_DEP_TIME": "", "TOTAL_ADD_GTIME": "", "LONGEST_ADD_GTIME": "", "": "" }
![Page 3: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/3.jpg)
MongoDB aggregation steps/stages
• Grouping• Matching/filtering• Projection• Sorting• Unwind• Limit, skip• Added in 2.6– Out– Redact
![Page 4: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/4.jpg)
Who are the largest carriers?
![Page 5: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/5.jpg)
Which airports have the most cancellations?
![Page 6: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/6.jpg)
Which carriers are most at fault for cancellations?
![Page 7: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/7.jpg)
Arrival delays by distance
![Page 8: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/8.jpg)
Delays by distance by carrier
![Page 9: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/9.jpg)
Delays by distance by carrier – long haul only
![Page 10: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/10.jpg)
Order Does Matter
![Page 11: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/11.jpg)
An example for $unwind
• DB: companies• Collection: companies• Pipeline: movers
![Page 12: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/12.jpg)
Hub airports – try1
![Page 13: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/13.jpg)
Hub airports – try2
![Page 14: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/14.jpg)
Hub airports – try 3
{ $group: { _id: { ORIGIN: "$ORIGIN", CARRIER: "$CARRIER" }, count: { $sum: 1 } } }, { $project: { airport: "$_id.ORIGIN", carrier: "$_id.CARRIER", "count": 1 } }, { $match: { "count": { $gte: "$$hub_threshold" } } }, { $group: {
_id: { airport: "$airport" }, airlines: { $sum: 1 }, flights: { $sum: "$count" }, avg_airline: { $avg: "$count" }, max_airline: { $max: "$count" } } },
{ $project: { "airlines": 1, "flights": 1, "avg_airline": 1, "max_airline": 1, "avg_no_max": { $divide: [ { $subtract: [ "$flights", "$max_airline" ] }, "$airlines" ] } } },
{ $sort: { "flights": -1 } }
![Page 15: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/15.jpg)
Hub airports
![Page 16: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/16.jpg)
Visualizing route data (from/to & main routes)
{ $group: { _id: { UNIQUE_CARRIER: "$UNIQUE_CARRIER", ORIGIN: "$ORIGIN", DEST: "$DEST" }, count: { $sum: 1 } } },
{ $match: { "count": { $gt: "$$count_threshold" } } }
![Page 17: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/17.jpg)
Hub visualization (using routes – from/to, $$count=1, origin treemap)
![Page 18: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/18.jpg)
From-to Insensitive{ $group: { _id: { UNIQUE_CARRIER: "$UNIQUE_CARRIER", ORIGIN: "$ORIGIN",
DEST: "$DEST" }, count: { $sum: 1 } } }, { $match: { "count": { $gt: "$$count_threshold" } } },
{ $project: { _id_UNIQUE_CARRIER: "$_id.UNIQUE_CARRIER", "count": 1,
rroute: { $cond: [
{ $lt: [ { $cmp: [ "$_id.ORIGIN", "$_id.DEST" ] }, 0 ] }, { $concat: [ "$_id.ORIGIN", "$_id.DEST" ] }, { $concat: [ "$_id.DEST", "$_id.ORIGIN" ] }
] } } },
{ $group: { _id: { _id_UNIQUE_CARRIER: "$_id_UNIQUE_CARRIER", rroute: "$rroute" }, _sum_count: { $sum: "$count" } } }
![Page 19: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/19.jpg)
All routes treemap – airline fragmentation (rroute treemap)
![Page 20: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/20.jpg)
New in 2.6 aggregation
• Cursors• MaxTimeMS• $out• $redact• $let/$map• Misc - $cond with either object or array, set
expression, ..
![Page 21: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/21.jpg)
Quiz - NYC Airports
1. Import data – mongoimport (flights, airports, planes)2. Make join collection:
3. Query-Insert for all NYC data - ~500K flights per year originating in JFK/EWR/LGA
{ "Year": 2008, … "LateAircraftDelay": "NA", "to": { "city": "Detroit", "state": "MI", "lat": 42.21205889, "long": -83.34883583 }, "from": { "city": "Newark", "state": "NJ", "lat": 40.69249722, "long": -74.16866056 }, "aircraft": { "manufacturer": "EMBRAER", "model": "EMB-145EP", "year": 1997 }}
![Page 22: MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data](https://reader034.vdocuments.net/reader034/viewer/2022051610/549c987bac7959ba2a8b4780/html5/thumbnails/22.jpg)
NYC Flights – Quiz Questions• Of the three airports, who has the most flights?
– Nyc1
• Who has the most cancellations and highest cancellation ratio?– Nyc2
• Taxi in/out times?– Nyc3
• What about delays?– Nyc4
• How do delays differ by month?– Nyc5 + nyc5– (summer vs. winter / bubble size vs. y-axis)
• What about weather delays only? Which months are worse? Are the three airports equivalent?– Nyc7 + nyc7
• Where can I fly to if I work for Boeing and am very loyal (and on which aicraft)?– Nyc8 + map