Sherlock Demo 2019 v13 Screenshots - NASA Big Data System + Spark + Jupyter • Big Data Cluster...
SHERLOCK DATA WAREHOUSE
April 8, 2019
https://ntrs.nasa.gov/search.jsp?R=20190025090 2020-05-21T07:52:27+00:00Z
Take-aways
• Sherlock contains a valuable collection of flight, air traffic management, and weather data
• Sherlock is more than just a data archive
• Pique interest for follow-on workshops:
  – Data visualization & analytics with MicroStrategy
  – Processing using the Big Data system from Jupyter Notebook
Outline
• What is Sherlock?
• How are data stored? (Why should I care?)
• Sherlock information and data access
• Summary of archived data
• Overviews and demos of resources
What is Sherlock?
• Platform for collecting, processing, and archiving air traffic management data
• Sherlock resources include:
  – Sherlock home page (New!)
  – File download web UI (Reorganized!)
  – Data visualization & analytics (New!)
  – Jupyter Notebook on Big Data system (New!)
  – ATM Knowledge Graph (experimental)
  – Hue Browser
  – THREDDS Data Server
  – GeoServer
Demos
Backup slides
How are data stored? (Why should I care?)
Where are data stored?
• File System: Linux file system, /home/data
• Operational Data Store (ODS): traditional database
• Big Data System: distributed data storage and processing
How are data stored?
Table: storage formats compared by functionality and by where each is available
• Formats: File System (flat files); ODS (tables); Big Data (Apache Parquet)
• Functionality: select rows/columns of interest; view data before download; download only the data you want; good for small data sets; good for large data sets
• Available from: file system (/home/data); file download in web UI; tables and charts in web UI; data visualization & analytics; Jupyter Notebook; Hue browser
[Yes/No comparison grid not recoverable from this transcript]
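Because tabular data can be queried before export, only the rows and columns of interest need to be downloaded. The sketch below illustrates that idea using Python's built-in `sqlite3` as a stand-in for a tabular store; the table `flight_summary`, its columns, and its rows are hypothetical and do not reflect the actual ODS schema or interface.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for a tabular (ODS-style) store; schema and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flight_summary (flight_id TEXT, origin TEXT, dest TEXT, taxi_out_min REAL)"
)
conn.executemany(
    "INSERT INTO flight_summary VALUES (?, ?, ?, ?)",
    [
        ("F1", "APT_A", "APT_B", 18.5),
        ("F2", "APT_B", "APT_A", 31.0),
        ("F3", "APT_A", "APT_C", 12.0),
    ],
)

# Select only the rows (departures from APT_A) and the columns of interest.
rows = conn.execute(
    "SELECT flight_id, dest FROM flight_summary WHERE origin = ? ORDER BY flight_id",
    ("APT_A",),
).fetchall()

# "Download" just that subset as CSV instead of exporting the whole table.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["flight_id", "dest"])
writer.writerows(rows)
print(buf.getvalue())
```

With flat files, the equivalent filtering happens only after the whole file has been downloaded.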
Sherlock information and data access
Sherlock resources
https://atmweb.arc.nasa.gov/
Demo: Sherlock home page
Summary of archived data
FAA SWIM data in Sherlock
System Wide Information Management (SWIM) Program
• National Airspace System (NAS)-wide information system that supports Next Generation Air Transportation System (NextGen) goals
• Increased common situational awareness among various stakeholders
• Single point of access for aviation data
– Producers of data publish it once
– Users access the information they need through a single connection
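The "publish once, access through a single connection" model described above is a publish/subscribe pattern. The sketch below is a conceptual, in-process illustration only; it is not SWIM's actual interface, and the class `MessageBus`, the topic names, and the messages are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str], None]


class MessageBus:
    """Hypothetical single access point: producers publish once, consumers subscribe by topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        # The producer sends the message once; the bus fans it out to every subscriber.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = MessageBus()
archive: List[str] = []    # e.g. an archiving consumer
dashboard: List[str] = []  # e.g. a display consumer
bus.subscribe("surface", archive.append)
bus.subscribe("surface", dashboard.append)
delivered = bus.publish("surface", "position update")
```

The producer never needs to know how many consumers exist; adding a consumer is one `subscribe` call on the shared connection.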
• Flight data
  – STDDS/ASDE-X: Surface data
  – TAIS: TRACON data, including VFR flights
  – SFDPS: En route flight data
  – TFMData: NAS-wide flight data, flow constraints
  – TBFM: Operational metering data
• Airport data
  – APDS: Airport Data Service, Runway Visual Range info
  – NOTAM: Notices to Airmen
• Weather data
  – ITWS: Terminal convective weather
Data in Sherlock
• Weather data
  – CIWS: Convective weather forecasts
  – METAR: Airport current surface weather conditions
  – RR: NOAA Rapid Refresh forecasts
  – CCFP: Simplified convective weather polygons
  – METAR: Airport weather reports
  – TAF: Terminal Aerodrome Forecast
  – WITI: Weather Impacted Traffic Index
  – PIREP: Pilot reports
• Obsolete weather data (stored but no longer updated)
  – RUC: NOAA Rapid Update Cycle forecasts
• Flight data
  – CTAS: Text-based format, in Center/TRACON pairs
• Obsolete flight data (stored but no longer updated)
  – ASDI: Flight data
• Traffic management
  – ATCSCC: Strategic advisories
• Facility reports
  – OPSNET: Statistics
Archived derived data produced by ATAC
Processed data from ATAC in Sherlock
• Flight data
  – IFF: Flight plan and track
  – EV: Flight event
  – RD: Flight summary
Analysis-ready track, flight plan, and metadata for 94 individual facilities
• TRACON / Terminal: Legacy STARS (1/2014 to present); SWIM STARS (TAIS) (11/2017 to present)
• TRACON / Terminal: ARTS (available from 1/1/2014 to ~11/2015)
• ARTCC / Enroute: Legacy HOST/ERAM (from 1/2014); SWIM ERAM (SFDPS) (from 5/2017)
• ATCT / Surface: SWIM ASDE-X (from 1/2016)
USA merged data – available from 1/1/2014 (*note: data prior to 1/16/2016 do not contain surface data)
End-to-end trajectories, flight plans, and metadata
Performance reports/data in Sherlock

Facility reports

| Report | Scope (facility type) | Description |
| --- | --- | --- |
| Go Arounds | TRACON/Terminal | Counts, runways, altitude, return time |
| Turns to Final | TRACON/Terminal | Overshoots, glideslope speed/altitude deviations, turn-on angle |
| Fix Passing | TRACON/Terminal | Fix/waypoint throughput |
| Runway Usage | ATCT/Surface | Runway throughput, arrival/departure rates |
| Taxi Time | ATCT/Surface | Taxi-out and taxi-in times |
| Instantaneous Counts | ARTCC/Enroute | Number of aircraft in sectors in 15-minute bins, summary statistics |
| Sector Stats | ARTCC/Enroute | Sector count, flight time/distance, and transitions between sectors |
| Sector Activity | ARTCC/Enroute | Sector entries/exits/counts in 15-minute bins |
| Field10 Reroute | ARTCC/Enroute | Diversions and reroutes |

NAS-wide reports

| Report | Scope | Description |
| --- | --- | --- |
| Jet Airway Flow | CONUS | Jet airway counts, flight time, and distance |
| Best Flight Plan | CONUS | Synthesized flight plan route closest to what was actually flown |
| CCFP Sector/ARTCC Coverage | CONUS | Percentage of sector/ARTCC covered by CCFP convective weather (per 2 hr) |
| CWAM Sector/ARTCC Coverage | CONUS | Percentage of sector/ARTCC covered by CWAM convective weather (per 15 min) |
| CCFP Jet Airway Coverage | CONUS | Percentage of jet airway covered by CCFP convective weather (per 15 min) |
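Several of the reports above (Instantaneous Counts, Sector Activity) aggregate into 15-minute bins. A minimal sketch of the binning step, assuming each record is a (sector, entry time) pair; the sector IDs and timestamps are hypothetical, and the reports' actual rules (for example, how an aircraft present across several bins is counted) are not described here.

```python
from collections import Counter
from datetime import datetime


def bin_start(t: datetime) -> datetime:
    """Floor a timestamp to the start of its 15-minute bin."""
    return t.replace(minute=t.minute - t.minute % 15, second=0, microsecond=0)


def entries_per_bin(events):
    """Count sector entries per (sector, 15-minute bin) pair."""
    return Counter((sector, bin_start(t)) for sector, t in events)


# Hypothetical sector-entry events: (sector id, entry time).
events = [
    ("S10", datetime(2014, 7, 15, 12, 3)),
    ("S10", datetime(2014, 7, 15, 12, 14, 59)),
    ("S10", datetime(2014, 7, 15, 12, 15)),
    ("S42", datetime(2014, 7, 15, 12, 7)),
]
counts = entries_per_bin(events)
```

Flooring to the bin start makes the bin boundary half-open: 12:14:59 falls in the 12:00 bin, while 12:15:00 starts the next one.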
Overviews and demos of resources
Sherlock resources
https://atmweb.arc.nasa.gov/
Demo: File Download Web UI
Data visualization & analytics
• What is MicroStrategy?
  – Enterprise business intelligence (BI) application software
  – Allows users to create custom data tables and visualizations
• Sherlock data visualization and analytics with MicroStrategy:
  – Visualize Sherlock data without downloading
  – Visualize user-generated data on Sherlock's Big Data system
  – Create custom visualizations for unique needs
  – Perform some basic data analytics
Data visualization
Demo: Data Visualization & Analytics
Graph-structured, queryable repository of disparate ATM data (experimental, with limited data available)
What is the ATM Knowledge Graph?
• Stored in a graph database
  – Not in the traditional Operational Data Store (ODS)
• Accessible via:
  – Web interface
    • Query editor/executor
    • Visualization tool
  – Programmatic API
  – MicroStrategy
• Highly interconnected, network-structured data store, where:
  – Nodes
    • Represent ATM entities (flights, airports, facilities, aircraft, routes, ...)
    • Store properties/data of entities
  – Links represent interrelationships
Figure: subgraph describing a flight
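A knowledge graph of this kind can be pictured as a property graph: typed nodes carry properties, and labeled links connect them. The minimal in-memory sketch below assumes hypothetical node IDs, property names, and link labels (`FLOWN_BY`, `HAS_FLIGHT_PLAN`, `DEPARTS_FROM`); they are not the ATM Knowledge Graph's actual schema or API. The flight and aircraft identifiers come from the demo subgraph, and the departure airport is a placeholder.

```python
from collections import defaultdict


class PropertyGraph:
    """Minimal in-memory property graph (hypothetical, for illustration)."""

    def __init__(self):
        self.nodes = {}                     # node id -> {"type": ..., **properties}
        self.out_links = defaultdict(list)  # node id -> [(link label, target node id)]

    def add_node(self, node_id, node_type, **props):
        self.nodes[node_id] = {"type": node_type, **props}

    def add_link(self, source, label, target):
        self.out_links[source].append((label, target))

    def neighbors(self, node_id, label=None):
        """Ids of nodes linked from node_id, optionally filtered by link label."""
        return [t for lbl, t in self.out_links[node_id] if label is None or lbl == label]


flight = "flight:UAL535:2014-07-15"
g = PropertyGraph()
g.add_node(flight, "Flight", callsign="UAL535", date="2014-07-15")
g.add_node("aircraft:N589UA", "Aircraft", registration="N589UA")
g.add_node("flightplan:1", "FlightPlan")  # hypothetical id
g.add_node("airport:DEP", "Airport")      # placeholder for the departure airport
g.add_link(flight, "FLOWN_BY", "aircraft:N589UA")
g.add_link(flight, "HAS_FLIGHT_PLAN", "flightplan:1")
g.add_link(flight, "DEPARTS_FROM", "airport:DEP")

# Traverse one link to answer "which aircraft flew UAL535 on 2014-07-15?"
aircraft_ids = g.neighbors(flight, "FLOWN_BY")
print(g.nodes[aircraft_ids[0]]["registration"])  # N589UA
```

Answering a question becomes a traversal along links rather than a join across separate tables.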
What is the value of the ATM Knowledge Graph?
• Sherlock is not a unified database; it is a data repository
  – Queries generally cannot span multiple data tables or data sources
• The Knowledge Graph merges, integrates, and unifies data from multiple sources into one large graph structure to enable cross-source querying
• Result: you can:
  ✓ Query Sherlock as a unified database
  ✓ Visualize and navigate through the data graph
  ✓ Download integrated data
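Cross-source querying works because records from different sources are attached to shared entities. The minimal sketch below assumes two hypothetical sources: flight records that name a departure airport and hour, and weather observations keyed by airport and hour. Merging both onto shared airport entities lets one query answer "what weather was observed at departure?", which neither source answers alone. All identifiers and observations are placeholders, not Sherlock data.

```python
from collections import defaultdict

# Hypothetical records from two separate sources (placeholders, not Sherlock data).
flights = [
    {"flight": "FLT1", "dep_airport": "APT_A", "dep_hour": "2014-07-15T12"},
    {"flight": "FLT2", "dep_airport": "APT_B", "dep_hour": "2014-07-15T13"},
]
weather_obs = [
    {"airport": "APT_A", "hour": "2014-07-15T12", "report": "OBS_A_12"},
    {"airport": "APT_B", "hour": "2014-07-15T12", "report": "OBS_B_12"},
]

# Merge step: both sources attach to shared airport entities in one structure.
airport_entities = defaultdict(lambda: {"departures": [], "weather": {}})
for f in flights:
    airport_entities[f["dep_airport"]]["departures"].append(f)
for w in weather_obs:
    airport_entities[w["airport"]]["weather"][w["hour"]] = w["report"]


def weather_at_departure(flight_id):
    """Cross-source query: locate the flight, then read weather from its airport entity."""
    for entity in airport_entities.values():
        for f in entity["departures"]:
            if f["flight"] == flight_id:
                return entity["weather"].get(f["dep_hour"])
    return None
```

A flight with no matching observation for its departure hour returns `None` rather than raising an error.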
What data are stored in the ATM Knowledge Graph?
• ASDI: Flight track data
• TFM Advisories: GDPs, reroutes, Ground Stops, …
• METAR/TAF: Airport weather observation & forecast data
• ERAM adaptations: NAS infrastructure data (facilities, routes, SIDs/STARs, fixes, airways, sectors, …)
• ASPM: Airport performance (traffic counts, delay stats, …)
• FAA Aircraft Registry: Aircraft characteristics (registration, certification, ownership, aircraft & engine models)
• CAST/ICAO Aircraft Taxonomy: Aircraft models and manufacturers
• Airlines, Airport Terminals/Gates
Sherlock sources
Non-Sherlock sources
Experimental: Currently very limited data in the Knowledge Graph! (only July 2014 for ZNY)
Demo: ATM Knowledge Graph
Figure: subgraph for flight UAL535 on 2014-07-15, with nodes for aircraft N589UA, the flight plan, the departure airport, the trajectory, and its track points
Figure: flight trajectory
Big Data system
• What is a Big Data system?
  – Built on commodity hardware
  – Massive storage for any kind of data
  – Enormous processing power
  – Ability to handle virtually limitless concurrent tasks
• Sherlock Big Data system
  – 32-node cluster in Building N233
  – SuperMicro engineered system
    • Total of 576 CPU cores
    • Total of 800 TB storage
  – Cloudera distribution of Hadoop
Data sources currently on Big Data system
• Facility flight data
  – IFF: flight plan and track
  – EV: flight event
  – RD: flight summary
• USA merged flight data
  – IFF: flight plan and track
  – EV: flight event
  – RD: flight summary
Please contact the Sherlock team if you would like to access a particular data source from our Big Data system.
How to access Big Data
• Query Big Data
  – Hue browser
• Process data on Big Data system
  – Requires a cluster-computing framework
  – Many to choose from!
  – Sherlock team recommends: Apache Spark
What is Apache Spark™?
• Unified analytics engine for large-scale data processing
• Performs fault-tolerant distributed computing and parallel processing services on a cluster
• Supports 4 languages:
  – Scala (native)
  – Java (fast)
  – Python (easy)
  – R
Big Data development workflow
Traditional:
1. Write code on local machine
2. Zip code and scp to sherlock.arc.nasa.gov
3. Run a Spark submit job to execute code on the Big Data system
Drawbacks:
• Develop with a small local copy of data
• Difficult to debug when deployed on the Big Data system
Big Data development workflow
Jupyter Notebook:
1. Develop code in web browser on sherlock.arc.nasa.gov
2. Break code into segments
3. Run and see output in-line
Advantages:
• Code runs on the Big Data system while you develop
• Debugging is easier in Jupyter than in the traditional workflow
Jupyter Notebook
• Open-source web application for interactive computing
• Execute code interactively in the browser
• View results and plots inline in the Notebook
Figure: notebook screenshot with a code cell input and on-the-fly inline plots.
Machine learning use case: flight track clustering
1. Join and filter track data from the Big Data system
2. Process data to generate features
3. Detect and remove outliers
4. Cluster the data using K-Means
5. Evaluate resulting clustering and create plots
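The workshop implements these steps with Spark and SparkML on the Big Data system. The sketch below is plain Python (standard library only, so it runs without a cluster) and only mirrors the five steps on toy data. The feature choice (resampling each track to a fixed number of lat/lon points), the median-distance outlier rule, and the farthest-point initialization for K-Means are assumptions for illustration, not the Sherlock team's method.

```python
import math
import statistics


def resample(track, n=4):
    """Step 2: pick n evenly spaced (lat, lon) points and flatten them into one
    feature vector, so every flight has the same number of features."""
    idx = [round(i * (len(track) - 1) / (n - 1)) for i in range(n)]
    return [coord for i in idx for coord in track[i]]


def remove_outliers(vectors, k=3.0):
    """Step 3 (assumed rule): keep the indices of vectors whose distance to the
    coordinate-wise median is at most k times the median distance."""
    center = [statistics.median(col) for col in zip(*vectors)]
    dists = [math.dist(v, center) for v in vectors]
    cutoff = k * statistics.median(dists)
    return [i for i, d in enumerate(dists) if d <= cutoff]


def kmeans(vectors, k, iters=100):
    """Step 4: Lloyd's K-Means with deterministic farthest-point initialization."""
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors, key=lambda v: min(math.dist(v, c) for c in centers)))
    for _ in range(iters):
        # Assign every vector to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(v, centers[j])) for v in vectors]
        # Move each center to the mean of its members (keep it if it has none).
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            new_centers.append(
                [sum(col) / len(members) for col in zip(*members)] if members else centers[j]
            )
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return labels, centers


def wcss(vectors, labels, centers):
    """Step 5: within-cluster sum of squares (lower means tighter clusters)."""
    return sum(math.dist(v, centers[lab]) ** 2 for v, lab in zip(vectors, labels))


# Step 1 stand-in: toy (lat, lon) tracks. A* share one route, B* another,
# and OUT is a deliberately odd track.
tracks = {
    "A1": [(0, 0), (0, 1), (0, 2), (0, 3)],
    "A2": [(0.1, 0), (0.1, 1), (0.1, 2), (0.1, 3)],
    "A3": [(-0.1, 0), (-0.1, 1), (-0.1, 2), (-0.1, 3)],
    "B1": [(0, 0), (1, 0), (2, 0), (3, 0)],
    "B2": [(0, 0.1), (1, 0.1), (2, 0.1), (3, 0.1)],
    "B3": [(0, -0.1), (1, -0.1), (2, -0.1), (3, -0.1)],
    "OUT": [(0, 0), (-10, 10), (-20, 20), (-30, 30)],
}
names = list(tracks)
features = [resample(t) for t in tracks.values()]
keep = remove_outliers(features)
kept = [features[i] for i in keep]
labels, centers = kmeans(kept, k=2)
print([names[i] for i in keep], labels)  # ['A1', 'A2', 'A3', 'B1', 'B2', 'B3'] [0, 0, 0, 1, 1, 1]
print(round(wcss(kept, labels, centers), 3))
```

On the cluster, each of these steps would run as distributed DataFrame and SparkML operations instead of Python loops.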
Outlier detection step: processing time
• 25,000 flights: personal machine 1 h 26 min; Big Data system 0 h 13 min
• 50,000 flights: personal machine crashed; Big Data system 0 h 32 min
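Converting the 25,000-flight times to minutes gives the speedup of the Big Data system over the personal machine for the outlier detection step. The second ratio compares only the two Big Data runs, because the 50,000-flight run on the personal machine crashed:

```latex
\[
\text{speedup}_{25{,}000} = \frac{1\,\text{h}\;26\,\text{min}}{13\,\text{min}}
  = \frac{86\,\text{min}}{13\,\text{min}} \approx 6.6
\]
\[
\frac{T_{\text{Big Data}}(50{,}000)}{T_{\text{Big Data}}(25{,}000)}
  = \frac{32\,\text{min}}{13\,\text{min}} \approx 2.5
\]
```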
Benefits of using Big Data system
• No need to copy hundreds of GB of data to the user's machine
• Fast processing (distributed processing with powerful CPUs)
Big Data System + Spark + Jupyter
• Big Data cluster provides enormous processing power
• Spark is a large-scale distributed data processing engine that processes large amounts of data in memory
• Jupyter is a great tool to code, test, prototype, and share PySpark programs
Demo: Jupyter Notebook for Big Data
Coming soon …
Workshops
• MicroStrategy: create custom visualizations and analytics
• Big Data system: implement a machine learning use case using Spark and SparkML in Jupyter Notebook
Acknowledgements
• Management: Heather Arneson, Antony Evans, Paul Cobb, Karen Gundy-Burlet
• Developers: Pallavi Hegde, Michael La Scola
• Database & Big Data admin: Dat Duong, Eric Wang
• Data collection, archiving, monitoring: Joe Cisek, Pat O’Neal
• Windows, Linux admin: Matt Ma
• Graph database: Rich Keller
• ATAC data: John Schade, Kennis Chan, Cindy Wong, (the other) Eric Wang
Home page: https://atmweb.arc.nasa.gov/
These slides: Home page → ABOUT → Overview
User support: [email protected]
BACKUP SLIDES
What is Sherlock?
Architecture
How are data stored? Flat files
File System
• No insight into the data
• Download full file
• Available from:
  – File system /home/data
  – File Download Pages in Web UI
How are data stored? Tables
Operational Data Store (ODS)
• Select rows and columns of interest
• View data before download
• Download only the data you want
• Good for small data sets
• Available from:
  – Tables and Charts in Web UI
  – Data Visualization & Analytics
How are data stored? Apache Parquet format
Big Data System
• Select rows and columns of interest
• View data before download
• Download only the data you want
• Good for large data sets
• Available from:
  – Hue Browser
  – Jupyter Notebook
  – Data Visualization & Analytics
Archived data produced by ATAC
Flight data and report products available in Sherlock
1. Analysis-ready track, flight plan, and metadata for 94 individual facilities (available on a next-day basis)
   • Data types (IFF, EV, RD)
2. End-to-end trajectories, flight plans, and metadata (available within 10 days)
3. Performance Reports (available daily for individual facilities)
4. Aggregated Trend Databases (STREND) (updated daily)
5. Traffic and weather coverage metrics (available in CSV format)
   • Sector transition metrics
   • CCFP/CWAM Sector/ARTCC coverage
   • Sector/ARTCC counts by weather coverage
   • CCFP jet route coverage
6. Data completeness tool
Metadata examples: events in the EV file
• EV_GOA (Go-Around Event): occurs when a go-around is detected
• EV_HOFF (Handoff Event): occurs when a handoff is initiated, accepted, or cancelled
• EV_INIT (Initialization Event): occurs at the beginning of a flight
• EV_LND (Landing Event): occurs when a flight crosses the threshold of its landing runway
• EV_LOOP (Looping Event): occurs when a flight track crosses back over itself
• EV_MOF (Mode of Flight Event): occurs when the mode of flight changes (e.g., level to descent)
• EV_PASS (Passing Event): occurs when a flight passes by a defined navigational element
• EV_STOH (Stop Handoff Event): occurs when a handoff stops
• EV_STOL (Stop Loop Event): occurs when a loop stops
• EV_STOP (Stop Event): occurs at the end of a flight
• EV_TOC (Top of Climb Event): occurs when a flight reaches initial cruise altitude
• EV_TOD (Top of Descent Event): occurs when a flight begins descent from cruise
• EV_TOF (Take Off Event): occurs when a flight crosses the threshold of its takeoff runway
• EV_USER (User Defined Event): occurs based on the metrics defined in the metrics file
• EV_XING (Crossing Event): occurs when a flight crosses an airspace volume boundary
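As an example of using these event codes, a flight's cruise duration can be estimated from its top-of-climb and top-of-descent events. This is a minimal sketch under stated assumptions: the `(time_s, code)` event records and the `cruise_duration` helper are hypothetical simplifications, not the actual EV file layout (which is described in the ATAC documentation), and "first EV_TOC to last EV_TOD" is one plausible definition of cruise.

```python
# Event codes and names from the list above.
EVENT_TYPES = {
    "EV_GOA": "Go-Around",
    "EV_HOFF": "Handoff",
    "EV_INIT": "Initialization",
    "EV_LND": "Landing",
    "EV_LOOP": "Looping",
    "EV_MOF": "Mode of Flight",
    "EV_PASS": "Passing",
    "EV_STOH": "Stop Handoff",
    "EV_STOL": "Stop Loop",
    "EV_STOP": "Stop",
    "EV_TOC": "Top of Climb",
    "EV_TOD": "Top of Descent",
    "EV_TOF": "Take Off",
    "EV_USER": "User Defined",
    "EV_XING": "Crossing",
}


def cruise_duration(events):
    """Seconds from the first top of climb (EV_TOC) to the last top of descent
    (EV_TOD), or None if either is missing.

    `events` is a list of (time_s, code) pairs: a hypothetical, simplified
    layout, not the actual EV file format."""
    for _, code in events:
        if code not in EVENT_TYPES:
            raise ValueError(f"unknown event code: {code}")
    toc = [t for t, code in events if code == "EV_TOC"]
    tod = [t for t, code in events if code == "EV_TOD"]
    if not toc or not tod:
        return None
    return max(tod) - min(toc)


flight = [(0, "EV_INIT"), (60, "EV_TOF"), (900, "EV_TOC"),
          (4500, "EV_TOD"), (6000, "EV_LND"), (6100, "EV_STOP")]
print(cruise_duration(flight))  # 3600
```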
Event type example
A vertical mode of flight event (EV_MOF) occurs when there is a change in the vertical profile of a flight. The processing detects any transition between any two of the three possible vertical states: Climb, Level, and Descent.
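The transition logic described above can be sketched as follows. The actual ATAC business rules are documented separately; this sketch assumes a simple rule in which the vertical rate between consecutive altitude samples is classified, with a hypothetical ±200 ft/min band treated as Level. Real processing would also need to smooth noisy altitude reports.

```python
LEVEL_BAND_FPM = 200.0  # assumed: |vertical rate| <= 200 ft/min counts as Level


def vertical_state(rate_fpm):
    """Classify a vertical rate (ft/min) as Climb, Level, or Descent."""
    if rate_fpm > LEVEL_BAND_FPM:
        return "Climb"
    if rate_fpm < -LEVEL_BAND_FPM:
        return "Descent"
    return "Level"


def mode_of_flight_events(samples):
    """samples: (time_s, altitude_ft) pairs sorted by time. Returns one
    (time_s, from_state, to_state) tuple per vertical-state transition,
    stamped with the start time of the segment where the new state begins."""
    events, prev = [], None
    for (t0, a0), (t1, a1) in zip(samples, samples[1:]):
        state = vertical_state((a1 - a0) / (t1 - t0) * 60.0)
        if prev is not None and state != prev:
            events.append((t0, prev, state))
        prev = state
    return events


track = [(0, 0), (60, 2000), (120, 4000), (180, 4000), (240, 4000), (300, 2500), (360, 1000)]
print(mode_of_flight_events(track))
# [(120, 'Climb', 'Level'), (240, 'Level', 'Descent')]
```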
Figure: generalized data flow (data feed → reports). Components shown: NASA Data Feed (Legacy/SWIM), Collector and Parser, multiple Flight Processors, Bond /home/data/atac (Live Bay), /home/transfer/atac, iff, ev, and rd files, Report Generation, /home/data/atac/SVAutoReports (xlsm, csv), and STREND; the labels tb0 and if0 also appear.
Facility report example: turn-to-final
Sherlock Performance Report data: turn-to-final (TTF)
• Overshoots
• Final approach path intercept
Figure: turn-to-final plot showing the arrival runway, direction of flight, FAF, ERC, max overshoot, angle at intercept, ground speed/altitude at intercept, and distance at intercept (12.3 nm); green = ground speed < 180.
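The geometry behind the intercept annotations can be sketched in a few lines. This is not the Sherlock report's algorithm: it assumes positions in a local flat x/y frame in nautical miles, a hypothetical runway with a 270° final approach course and its threshold at the origin, and measures distance to the threshold (the figure does not say what the 12.3 nm is measured to). The intercept angle is the smallest difference between the aircraft's course and the final course; a signed cross-track distance from the extended centerline is one plausible way to measure overshoot.

```python
import math


def heading_deg(p0, p1):
    """Course from p0 to p1 in degrees clockwise from north, in [0, 360).
    Points are (x_east_nm, y_north_nm) in a local flat frame (an assumption)."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def intercept_angle(track_heading, final_course):
    """Smallest angle between the aircraft's course and the final approach
    course, in [0, 180] degrees."""
    diff = abs(track_heading - final_course) % 360.0
    return min(diff, 360.0 - diff)


def cross_track_nm(point, threshold, final_course):
    """Signed distance (nm) from the extended runway centerline through
    `threshold`; positive means right of the inbound final approach course."""
    c = math.radians(final_course)
    ux, uy = math.sin(c), math.cos(c)  # unit vector along the inbound course
    dx, dy = point[0] - threshold[0], point[1] - threshold[1]
    return dx * uy - dy * ux


# Hypothetical runway: 270-degree final course, threshold at the origin.
course = 270.0
hdg = heading_deg((5.0, -3.0), (4.0, -2.0))  # aircraft turning toward final
print(round(hdg, 1), round(intercept_angle(hdg, course), 1))  # 315.0 45.0
print(round(math.dist((4.0, -2.0), (0.0, 0.0)), 2))  # 4.47 (nm to threshold)
```

Under this convention, the maximum overshoot could be taken as the largest cross-track distance on the far side of the centerline after the aircraft first crosses it.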
Products Coming Soon
• Real-Time Merged Trajectories available in live format to downstream applications
• Automated Anomaly Detection of Airport Arrival Trajectories (SMARTNAS 2.4 NRA)
Figure: real-time pipeline from the FAA SWIM Feed (Solace) through NASA Collection and Forwarding (ActiveMQ), Transformation and Normalization (Apache NiFi), and Real Time Flight Merging (Apache Flink) to "Your applications here".
Documentation
• IFF, EV, RD data file and report documentation and business rules can be found here:
  https://atmjira.arc.nasa.gov:9443/conf/display/ctas/ATAC+File+Format+and+Reporting+Documentation
• Other questions or assistance: John Schade, [email protected], 408-736-2822
• Thank You!
Big Data system
What is Apache Spark™
• Unified analytics engine for large-scale data processing
• Performs fault-tolerant distributed computing and parallel processing services on a cluster
• Supports 4 languages:
  – Scala (native)
  – Java (fast)
  – Python (easy)
  – R (working)
Figure: distributed workflow. The Driver Node sends a resource request to the Cluster Manager; resource allocation and task scheduling connect them to the Worker Nodes (executors), which process the partitioned data.
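Spark's scheduler is far more involved, but the shape of the workflow in the figure can be sketched in plain Python: a driver splits the data into partitions, runs one task per partition on a small pool of workers (threads stand in for executors here), retries a failed task a few times, and merges the partial results. Spark itself retries failed tasks (see the `spark.task.maxFailures` setting) and recomputes lost partitions from their lineage; the retry loop below is a simplification. The function names and the `dep` field are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n):
    """Split records into n contiguous partitions whose sizes differ by at most one."""
    size, extra = divmod(len(records), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        parts.append(records[start:end])
        start = end
    return parts


def count_by_key(part):
    """Task executed once per partition: count flights per departure airport."""
    counts = {}
    for rec in part:
        counts[rec["dep"]] = counts.get(rec["dep"], 0) + 1
    return counts


def run_job(records, task=count_by_key, n_partitions=4, n_workers=2, max_attempts=3):
    """Driver: partition the data, run one task per partition on a pool of
    workers, retry a failed task, then merge the partial results."""

    def attempt(part):
        for i in range(max_attempts):
            try:
                return task(part)
            except Exception:
                if i == max_attempts - 1:
                    raise  # give up: the whole job fails

    parts = partition(records, n_partitions)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:  # threads stand in for executors
        partials = list(pool.map(attempt, parts))
    merged = {}
    for partial in partials:
        for key, n in partial.items():
            merged[key] = merged.get(key, 0) + n
    return merged


flights = [{"dep": d} for d in ["SFO", "LAX", "SFO", "DEN", "LAX", "SFO", "SEA"]]
print(run_job(flights))  # {'SFO': 3, 'LAX': 2, 'DEN': 1, 'SEA': 1}
```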
GeoServer
GeoServer
• Java-based software server that allows users to view, edit, and share geospatial data
• Enables users to visualize airspace features from the latest adaptation data
• Built with the open-source GeoServer tool, which includes the PostGIS database
• Connects to the vast amounts of airspace data stored in Sherlock, as well as polygon representations of the CIWS data
• Users can form complex queries, view the data, and export it in many digital and image-based formats using the GeoServer web interface
https://geowiz.arc.nasa.gov/geoserver/web/?wicket:bookmarkablePage=:org.geoserver.web.demo.MapPreviewPage
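Besides the web interface, GeoServer answers standard OGC requests, so a map or an export can also be scripted. This is a minimal sketch that only builds a WMS 1.1.1 GetMap URL. The endpoint is assumed from the web UI URL above, the layer name `sherlock:fixes` is hypothetical, and the KML format string is the one GeoServer lists among its WMS output formats (check the server's GetCapabilities response to confirm what is available).

```python
from urllib.parse import urlencode

# Assumed WMS endpoint, inferred from the GeoServer web UI URL above.
GEOSERVER_WMS = "https://geowiz.arc.nasa.gov/geoserver/wms"


def bbox_around(lat, lon, half_deg=0.5):
    """(min_lon, min_lat, max_lon, max_lat) box centered on a point."""
    return (lon - half_deg, lat - half_deg, lon + half_deg, lat + half_deg)


def getmap_url(layer, bbox, width=800, height=600, fmt="image/png"):
    """Build a WMS 1.1.1 GetMap request. In WMS 1.1.1 with EPSG:4326 the bbox
    is given in lon/lat (x/y) order."""
    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": layer,
        "styles": "",
        "srs": "EPSG:4326",
        "bbox": ",".join(f"{v:.4f}" for v in bbox),
        "width": width,
        "height": height,
        "format": fmt,
    }
    return f"{GEOSERVER_WMS}?{urlencode(params)}"


# Hypothetical layer name; LAX is at roughly 33.94 N, 118.41 W.
lax_box = bbox_around(33.94, -118.41)
print(getmap_url("sherlock:fixes", lax_box))
# KML output, e.g. for viewing fixes in Google Earth:
print(getmap_url("sherlock:fixes", lax_box, fmt="application/vnd.google-earth.kml+xml"))
```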
GeoServer
Example: Fixes around LAX on Google Earth
(Credit: Google, map data USGS)
THREDDS Data Server (TDS)
THREDDS Data Server (TDS)
• The THREDDS Data Server (TDS) is a web server that provides metadata and data access for scientific datasets, using OPeNDAP, Open Geospatial Consortium (OGC) Web Map Service (WMS), Web Coverage Service (WCS), HTTP, and other remote data access protocols.
• Sherlock Data Warehouse stores parsed weather data (CIWS, RUC, CWAM) in a variety of binary gridded formats (NetCDF, Grib1, Grib2, HDF5) on the THREDDS server; each format has its own mechanisms for parsing, viewing, and downloading.
THREDDS Data Server (TDS)
• Sherlock has access to all binary datasets.
• Users may query by hand or write scripts to query across large amounts of data and export the data in many digital and image formats.
• Some users are interested in the content of binary weather files such as the Rapid Refresh (RR) and CIWS data. For example, they might want to find an RR file with high winds over a certain fix. The UCAR THREDDS server provides the ability to query binary data written in standard scientific formats such as HDF5 or GRIB: https://geowiz.arc.nasa.gov/thredds/catalog.html
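A scripted version of the "high winds over a fix" query could use the TDS NetCDF Subset Service (NCSS), which returns variable values at a lat/lon point over a time window. This is a sketch under several assumptions: that NCSS is enabled on this server under `/thredds/ncss/` (the exact path depends on the server configuration, so check the dataset's access links in the catalog), and that the dataset path, variable names, and fix location below are hypothetical. Parsing the CSV response is left out; `high_wind_times` assumes the u/v wind components have already been extracted.

```python
import math
from urllib.parse import urlencode

# Assumed base URL (the catalog above is served under /thredds). The dataset
# path, variable names, and fix coordinates below are hypothetical.
TDS_BASE = "https://geowiz.arc.nasa.gov/thredds"


def ncss_point_url(dataset_path, variables, lat, lon, time_start, time_end, accept="csv"):
    """Build a NetCDF Subset Service (NCSS) request for the values of
    `variables` at one lat/lon point over a time window."""
    params = {
        "var": ",".join(variables),
        "latitude": lat,
        "longitude": lon,
        "time_start": time_start,
        "time_end": time_end,
        "accept": accept,
    }
    return f"{TDS_BASE}/ncss/{dataset_path}?{urlencode(params)}"


def high_wind_times(rows, threshold_ms):
    """rows: (time, u, v) wind components in m/s, already parsed from the
    response; returns times where the speed sqrt(u^2 + v^2) exceeds the threshold."""
    return [t for t, u, v in rows if math.hypot(u, v) > threshold_ms]


url = ncss_point_url(
    "rr/example_dataset.grib2",  # hypothetical dataset path
    ["u_wind", "v_wind"],  # hypothetical variable names
    lat=37.6, lon=-122.4,  # hypothetical fix location
    time_start="2019-04-01T00:00:00Z",
    time_end="2019-04-01T06:00:00Z",
)
print(url)
print(high_wind_times([("00Z", 3.0, 4.0), ("03Z", 30.0, 40.0)], threshold_ms=25.0))  # ['03Z']
```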
THREDDS Data Server (TDS)
• Open-source THREDDS software reads weather datasets (CIWS, RR, CWAM)
• Supported binary formats: NetCDF, Grib1, Grib2, HDF5
• WMS query, visualization, export
TDS Architecture
TDS