More Than A Buzzword: Big Data in the Environmental Arena
2015 National Environmental Monitoring Conference | July 15, 2015
Brooke Roecker Senior Environmental Data Analyst Mark Packard, PG, CPG President/CEO
Presentation Outline
• Big Data Defined
• Environmental Data Past & Present
• Today’s Tools and Approaches
• Example Projects
• Future Considerations
Big Data Defined: (yet again)
"Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."
Laney, Douglas. "The Importance of 'Big Data': A Definition". Gartner. Retrieved 21 June 2012.
Big Data in Environmental Monitoring/Remediation
• Velocity: high frequency data
• Variety: mixed data/attributes
• Volume: very large datasets
• VERACITY: Accuracy of data
Diya Soubra, “The 3Vs that define Big Data”. http://www.datasciencecentral.com/forum/topics/the-3vs-that-define-big-data
Where have we come from?
• Hand-written field logs
• Text files
• Spreadsheets
• Simple Reports
Where are we today?
• Older technologies remain
• Database storage
• Out of the box storage/analysis tools
  – EQuIS™
  – ENFOS
  – Locus
  – Project Portal™
• Limitations?
• What data are you not managing/analyzing?
“Bigger” data tools available today
• High frequency advancements:
  – EQuIS Live
  – Project Portal Analytics Module
• Analysis and modeling tools
  – Spatial: ArcGIS, EVS
  – Visualization: Tableau
• Custom scripting (R, Python, T-SQL…)
  – SSAS, Weka
  – Matplotlib (Python), ggplot2 (R)
Project Examples
Project: Surface Water Monitoring
• High velocity data, 5-minute intervals
• Teledyne ISCO samplers
• Historical data archived in raw MS Excel files
Challenge:
• Dataset too large for “big picture” trends
• Storing/archiving data long-term
• Centralized access for project team
Project: Surface Water Monitoring
Solution:
• Streamlined data resampling & import routine
  – Resample from 5-minute to 12-hour averages (or totals)
[Figure panels: Raw Data; Resampled Data]
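The resampling step on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the routine the project team used: the function name `resample_12h`, the choice of bins starting at 00:00 and 12:00, and the sample readings are all hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def resample_12h(readings, how="mean"):
    """Aggregate (timestamp, value) pairs into 12-hour bins.

    Bins are assumed to start at 00:00 and 12:00 of each day.
    `how` is "mean" for averages or "sum" for totals.
    """
    bins = {}
    for ts, value in readings:
        # Floor the timestamp to the start of its 12-hour window.
        start = ts.replace(hour=0 if ts.hour < 12 else 12,
                           minute=0, second=0, microsecond=0)
        bins.setdefault(start, []).append(value)
    agg = mean if how == "mean" else sum
    return [(start, agg(values)) for start, values in sorted(bins.items())]


# Hypothetical data: one day of 5-minute readings (288 samples).
t0 = datetime(2015, 7, 15)
raw = [(t0 + timedelta(minutes=5 * i), float(i)) for i in range(288)]
resampled = resample_12h(raw)          # two rows, one per 12-hour window
totals = resample_12h(raw, how="sum")  # same windows, summed instead
```

Averages suit level-type measurements, while totals suit accumulated quantities such as rainfall; which one applies depends on the parameter being resampled.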
Project: Surface Water Monitoring
Solution:
• Data available to project team via Project Portal
  – Environmental Database module for resampled data
  – Documents module for raw data
Project: RAD Site Monitoring
Challenge:
• High volume, high velocity sensor data w/ telemetry
• Automated database storage
• Visual analysis of high volume weather data
Project: RAD Site Monitoring
Solution:
• Automated data import via web upload
• Database available to field staff via Project Portal
• Wind rose graphics to visualize data via EnviroInsite
• QA of erroneous data points (particulates)
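A wind rose is built from counts of observations per compass sector and speed class. The sketch below shows that binning step only. It is not how EnviroInsite works internally; the 16-sector layout, the speed class edges, and the sample observations are assumptions chosen for illustration.

```python
from collections import Counter

SECTORS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def sector(direction_deg):
    """Map a wind direction in degrees to one of 16 compass sectors."""
    # Each sector spans 22.5 degrees centered on its compass point,
    # so shift by half a sector before dividing.
    index = int(((direction_deg % 360) + 11.25) // 22.5) % 16
    return SECTORS[index]


def wind_rose_counts(observations, speed_edges=(2.0, 5.0, 10.0)):
    """Count observations per (sector, speed class).

    Speed class k means the speed is at or above k of the edges.
    The default edges are hypothetical.
    """
    counts = Counter()
    for direction_deg, speed in observations:
        speed_class = sum(speed >= edge for edge in speed_edges)
        counts[(sector(direction_deg), speed_class)] += 1
    return counts


# Hypothetical (direction in degrees, speed) observations.
obs = [(0, 1.0), (350, 3.0), (10, 3.5), (90, 6.0), (180, 12.0), (181, 0.5)]
rose = wind_rose_counts(obs)
```

Each petal of the rose then corresponds to one sector, with its length split by speed class according to these counts.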
Project: Mine Tunnel Monitoring
• High volume, high velocity data
  – On-site sensor data from PLC system
  – Public big data streams
Challenge:
• Centralized database storage
• Real-time data access
• Real-time notifications/alarms
See next slide
Project: Mine Tunnel Monitoring
Solution:
• Generic database design optimized for high frequency data
• Predictive trend modeling calculations
• Data available via Project Portal
• Email alerts when incoming data parameters out of spec.
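One common way to make a sensor database "generic" is a narrow table with one row per reading, so new sensors or parameters need no schema change. The sketch below pairs that layout with an out-of-spec query that produces alert text. It is an assumption about what such a design could look like, not the schema used on this project: the table names, column names, limits, and readings are all hypothetical, and sending the email itself is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical narrow layout: one row per reading.
    CREATE TABLE reading (
        sensor_id   TEXT NOT NULL,
        parameter   TEXT NOT NULL,
        measured_at TEXT NOT NULL,
        value       REAL NOT NULL,
        PRIMARY KEY (sensor_id, parameter, measured_at)
    );
    -- Hypothetical acceptable range per parameter.
    CREATE TABLE spec_limit (
        parameter TEXT PRIMARY KEY,
        low       REAL NOT NULL,
        high      REAL NOT NULL
    );
""")
conn.executemany("INSERT INTO spec_limit VALUES (?, ?, ?)",
                 [("pH", 6.5, 8.5), ("flow_gpm", 0.0, 500.0)])
conn.executemany("INSERT INTO reading VALUES (?, ?, ?, ?)", [
    ("T1", "pH", "2015-07-15T00:00:00", 7.1),
    ("T1", "pH", "2015-07-15T00:05:00", 9.2),
    ("T1", "flow_gpm", "2015-07-15T00:05:00", 120.0),
])


def out_of_spec(connection):
    """Return one alert message per reading outside its parameter's limits."""
    rows = connection.execute("""
        SELECT r.sensor_id, r.parameter, r.measured_at, r.value, s.low, s.high
        FROM reading AS r
        JOIN spec_limit AS s ON s.parameter = r.parameter
        WHERE r.value < s.low OR r.value > s.high
        ORDER BY r.measured_at
    """).fetchall()
    return [f"{sensor} {param} = {value} at {ts} (spec {low} to {high})"
            for sensor, param, ts, value, low, high in rows]


alerts = out_of_spec(conn)
```

The composite primary key also serves as the index for time-ordered lookups per sensor and parameter, which is the usual access pattern for high frequency data.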
Project: O&M Site Monitoring
High velocity, automated SVE and GW treatment systems
Challenge:
• Centralized storage
• Centralized monitoring
• System troubleshooting…
[Figure: Groundwater Treatment System]
Project: O&M Site Monitoring
System Solution:
• Multivariate analysis to review system variables
• Secondary analysis to identify fluctuations
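The slide does not say which multivariate method was used. As one simple possibility, pairwise correlation between system variables shows which ones move together. The sketch below assumes that approach; the variable names and readings are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(series):
    """Pairwise correlations for a dict of variable name -> list of values."""
    names = sorted(series)
    return {(a, b): pearson(series[a], series[b]) for a in names for b in names}


# Hypothetical readings from a treatment system.
system = {
    "flow": [10.0, 12.0, 14.0, 16.0, 18.0],
    "pressure": [5.0, 6.0, 7.0, 8.0, 9.0],
    "vacuum": [9.0, 8.0, 7.0, 6.0, 5.0],
}
corr = correlation_matrix(system)
```

Values near 1 or -1 point to variables that rise and fall together or in opposition, which can narrow down where to look when troubleshooting.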
What’s Next?
• Use of emerging technologies:
  – Distributed data sourcing
    • Hadoop HDFS
    • NoSQL
  – Distributed processing
    • Batch processing (MapReduce, Apache Hive)
    • Real-time processing/streaming (Cloudera Impala, Apache S4)
    • PolyBase (cross-querying HDFS and SQL Server)
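The MapReduce pattern named above can be illustrated in a single process: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step combines each group. This is a toy sketch of the idea only and does not use the Hadoop API; the function names and the well readings are hypothetical.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs; here, one pair per sensor reading."""
    station, value = record
    yield station, value


def reduce_phase(key, values):
    """Combine all values for one key; here, the maximum reading."""
    return key, max(values)


def map_reduce(records):
    """Run map, shuffle (group by key), and reduce in one process."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            grouped[key].append(value)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical (station, reading) records.
readings = [("MW-1", 3.2), ("MW-2", 1.1), ("MW-1", 4.8), ("MW-2", 0.9)]
peaks = map_reduce(readings)
```

In a distributed framework the map and reduce steps run on many machines at once, which is what makes the pattern useful for datasets too large for one server.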
Summary / Key Takeaways
• Free big data – go out and use it!
• Big data to investigate the unknown
• Greater project intelligence & decision making
linkedin.com/pulse/auto-industry-bored-big-data-brian-pasch
Brooke Roecker [email protected] Mark Packard, PG, CPG [email protected]
Thank you!
Contributors: Myles Hook, ddms; Jon Turner, ddms; Heidi Gaedy, ddms; Ed Larson, ddms; Angela Remer, ddms; Emily Mulford, Earthsoft Inc.