Bridging Informatics and Earth Science: a Look at Gregory Leptoukh’s Contributions
Past, present (and future)
https://ntrs.nasa.gov/search.jsp?R=20130011027 2020-06-17T16:30:11+00:00Z
Outline
1. Early Contributions
2. Recent Work
3. Unfinished business...
Science background...
• Tbilisi State University
– 1975: M.S. in Theoretical Physics
– 1985: Ph.D. in Cosmic Ray Physics*
– Monte Carlo simulation of cosmic ray propagation
• 1994: North Carolina State
– Molecular dynamics computer simulation
• 1997: NASA, Goddard Distributed Active Archive Center
– Data browser for SeaWiFS
*Includes work at Moscow State University and Moscow Institute of Theoretical and Experimental Physics
In the beginning, size was the issue...
Volume Requirements
40 GB/day (!)
vs.
Distribution Capacity
Approach: Reduce volume of useless bits flowing out
Greg was always out front, informatics-wise...
Email from Greg, circa 1998
The Other Data Problem: Usability
• Data in Hierarchical Data Format (HDF)
– Also with an HDF-EOS layer on top
• Required use of an API (C or Fortran) to read data
• Terminology
– Data variable names
– Stride, offset, grid, swath, fill value, SDS, …
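To make the terminology above concrete, here is a minimal pure-Python sketch of what offset, stride, and fill value mean when reading a gridded variable. It does not use the HDF API; the grid contents, the fill value of -9999, and the helper names `read_subset` and `valid_mean` are all illustrative assumptions.

```python
# Hypothetical 4x6 grid standing in for one science data set (SDS).
FILL_VALUE = -9999.0  # assumed marker for "no valid retrieval"
sds = [
    [0.10, 0.20, FILL_VALUE, 0.40, 0.50, 0.60],
    [0.15, FILL_VALUE, 0.35, 0.45, 0.55, 0.65],
    [0.20, 0.30, 0.40, FILL_VALUE, 0.60, 0.70],
    [0.25, 0.35, 0.45, 0.55, 0.65, FILL_VALUE],
]


def read_subset(grid, offset, stride):
    """Return cells starting at offset (row, col), stepping by stride (row, col)."""
    (row0, col0), (drow, dcol) = offset, stride
    return [row[col0::dcol] for row in grid[row0::drow]]


def valid_mean(grid, fill_value):
    """Average all cells that are not the fill value; None if none are valid."""
    values = [v for row in grid for v in row if v != fill_value]
    return sum(values) / len(values) if values else None


# Start at row 0, column 1; take every row and every 2nd column.
subset = read_subset(sds, offset=(0, 1), stride=(1, 2))

# Averaging without checking the fill value silently corrupts the result.
all_cells = [v for row in subset for v in row]
naive_mean = sum(all_cells) / len(all_cells)
mean_valid = valid_mean(subset, FILL_VALUE)
print(f"naive mean = {naive_mean:.2f}, fill-aware mean = {mean_valid:.4f}")
```

The gap between the two means illustrates why a user who does not know a product's fill value can get badly wrong answers.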
Leptoukh – Kaufman Collaboration
• Yoram wanted to explore MODIS Aerosol data without having to download data or write code
• Kick-started the MODIS Online Visualization and Analysis System (MOVAS)
– Seed funding
– Initial requirements
– Became Giovanni
Yoram Kaufman, 1948 - 2006
The Giovanni Way
[Timeline figure, Jan through Oct, dividing a research project into DO SCIENCE and Pre-Science tasks]
DO SCIENCE
• Exploration
• Use the best data for the final analysis
• Write the paper
• Initial Analysis
• Derive conclusions
• Submit the paper
Pre-Science
• Find data
• Retrieve high volume data
• Learn formats + develop readers
• Extract parameters
• Perform spatial + other subsetting
• Identify quality + other flags and constraints
• Perform filtering/masking
• Develop analysis + visualization
• Accept/discard/get more data
The Giovanni Way
Web-based tools like Giovanni let scientists shrink the time needed for pre-science preliminary tasks: data discovery, access, manipulation, visualization, and basic statistical analysis.
[Timeline figure, Jan through Oct, repeating the Pre-Science and DO SCIENCE task lists, with these annotations]
• Web-based Services: Minutes
• Days for exploration
The Giovanni Way:
• Read Data
• Subset Spatially
• Filter Quality
• Reformat
• Analyze
• Explore
• Reproject
• Visualize
• Extract Parameter
Scientists have more time to do science!
User-Driven Development
Add <Feature>?
Some Giovanni Features
Visualization & Analysis
• Time-averaged map
• Area-averaged Time series
• Hovmoller
• Animation
• Vertical profile
• Vertical cross-section
• Climatology + Anomaly
Comparison Features
• Scatterplot + linear regression
• Difference Map
• Difference Time Series
• Correlation Map
• Map Overlays
Other Features
• Download data: netCDF, ASCII
• Create KML file
• Show lineage
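Two of the features listed above, the time-averaged map and the area-averaged time series, reduce the same data cube along different axes. The sketch below shows the idea in plain Python. It is not Giovanni's implementation: the tiny data cube, the latitudes, and the cos(latitude) weighting (a common choice for regular latitude-longitude grids) are assumptions made for illustration.

```python
import math

# Hypothetical data cube: data[t][i][j] for 3 time steps, 2 latitudes, 2 longitudes.
# None marks a missing value.
lats = [0.0, 60.0]  # assumed grid-cell center latitudes, in degrees
data = [
    [[1.0, 3.0], [2.0, None]],
    [[2.0, 4.0], [4.0, 6.0]],
    [[3.0, 5.0], [None, 6.0]],
]


def time_averaged_map(cube):
    """Average each grid cell over time, skipping missing values."""
    n_lat, n_lon = len(cube[0]), len(cube[0][0])
    result = []
    for i in range(n_lat):
        row = []
        for j in range(n_lon):
            vals = [step[i][j] for step in cube if step[i][j] is not None]
            row.append(sum(vals) / len(vals) if vals else None)
        result.append(row)
    return result


def area_averaged_series(cube, latitudes):
    """Average each time step over the grid, weighting cells by cos(latitude)."""
    series = []
    for step in cube:
        num = den = 0.0
        for i, row in enumerate(step):
            weight = math.cos(math.radians(latitudes[i]))
            for value in row:
                if value is not None:
                    num += weight * value
                    den += weight
        series.append(num / den if den else None)
    return series


avg_map = time_averaged_map(data)            # one value per grid cell
avg_series = area_averaged_series(data, lats)  # one value per time step
print(avg_map)
print([round(v, 3) for v in avg_series])
```

The map collapses the time axis and keeps the spatial grid; the time series does the opposite, which is why the two views complement each other when exploring an event.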
Basic Visualization: Time-averaged Maps
Dust Storm, 30 Apr 2003
Exploring in time: Area-Averaged Time Series
Dust Storm, 30 Apr 2003
Multi-sensor Analysis: X-Y Scatterplot
How well do they correlate?
AirNow PM 2.5 vs. MODIS Terra Aerosol Optical Depth 01 Aug 2007 to 05 Aug 2007
Exploring Correlation with Correlation Maps: MODIS Aerosols, Aqua v. Terra
Decorrelation
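A correlation map can be thought of as a Pearson correlation computed independently at every grid cell across the time axis. The following is a minimal sketch under that assumption. The helper names and the two tiny "Terra" and "Aqua" cubes are made up for illustration and are not real MODIS values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of paired samples, skipping missing pairs.

    Returns None when fewer than two pairs exist or a series is constant.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def correlation_map(cube_a, cube_b):
    """Correlate two cubes (indexed [t][i][j]) cell by cell over time."""
    n_lat, n_lon = len(cube_a[0]), len(cube_a[0][0])
    return [
        [
            pearson([step[i][j] for step in cube_a],
                    [step[i][j] for step in cube_b])
            for j in range(n_lon)
        ]
        for i in range(n_lat)
    ]


# Hypothetical 1x2 grids observed on 4 days.
terra = [[[0.1, 0.4]], [[0.2, 0.3]], [[0.3, 0.2]], [[0.4, 0.1]]]
aqua = [[[0.2, 0.1]], [[0.4, 0.2]], [[0.6, 0.3]], [[0.8, 0.4]]]
corr = correlation_map(terra, aqua)
print(corr)
```

In this toy case the first cell tracks closely between the two sensors while the second moves in opposite directions; on a real map, regions of low correlation are the ones worth investigating.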
Feature: Download Data
• Giovanni as workflow tool
• DIY publication-ready figures
Feature: DIY OMI Gridded Products
Feature: Lineage
DLR
Standards Support
• Giovanni Output
– Download Formats
   • Keyhole Markup Language
   • netCDF / CF1
– OGC Service Formats
   • Web Map Service
   • Web Coverage Service
• Giovanni Input
– Web Coverage Service
– OPeNDAP
WCS Server
Web Portal
NASA
Giovanni
OMI data
WCS
WMS
Giovanni Impact
• Science Research
• Applications:
– Air Quality
– Agriculture
– Water Resources
• Education: Data-enhanced Investigations for Climate Change Education (DICCE)
• Informatics: ...coming up next
[Chart: Publications by year, 2004 to 2012; vertical axis from 0 to 200]
Giovanni as Informatics Laboratory
• Does it make it “too easy” to get results?
– Spurious comparisons in multi-sensor analysis?
• Motivates attention to documentation
– Provenance and citation of dynamic results
– Documentation for variables
Recent Work
• Multi-Sensor Analysis
• Community Work
• Collaboration Enablement
Multi-Sensor Analysis
• Multi-sensor Data Synergy Advisor (MDSA)
• AeroStat
MDSA Problem Statement
Same measurement (Aerosol Optical Depth)
Same space & time
Different provenance
Different results – why?
MODIS MERIS
MDSA & Semantic Web Adoption
Collaborators @ RPI: Peter Fox, Stephan Zednik, Patrick West, + students
MDSA provides users with key info about similarities and known issues when comparing or merging datasets
AeroStat Goals
• Compare and combine aerosol data
• Compare aerosol satellite with ground truth
• Merge aerosol measurements from multiple instruments
• Adjust for inter-instrument biases when merging
Collaborators: C. Ichoku, R. Kahn, L. Remer, R. Levy, et al. @ GSFC; David Lary, et al. @ UT-Dallas
AeroStat System
No QC QC+Bias Adjustment
Quality-filtered
AeroStat Problem Statement
MODIS Deep Blue MODIS Dark Target
Terra
Aqua
MISR
Merged Maps: 1, 2, 3, 4, and 5 Mar 2003
Collaborations in Informatics
• Community Giovanni Portals
– Northern Eurasia Earth Science Partnership Initiative (NEESPI)
• Community Participation
– Earth and Space Science Informatics (ESSI)
– Earth Science Information Partners (ESIP)
– NASA Earth Science Data Systems Working Group (ESDSWG)
– GEWEX Aerosol Panel
• Mentor for interns and fellows
• Collaboration Features
– gSocial
Collaborative Annotation with gSocial*
• Results
– Annotation of results graphics
– Discussion of results
– Reproducing results
• Designed for AeroStat, but…
• …can be integrated with other REST-based services
*Implemented by Daniel DaSilva, NASA/GSFC
Annotating the result’s graphic
Discussing the Result
Reproducing the Result
Unfinished Business...
• Underway
– Data Quality
– Giovanni-4: Faster, better, cheaper
• Still nascent...
– myGiovanni
Data Quality
• Characterizing Quality
– Variations in bias
– Quality needs for different user types
– Additional quality dimensions
• Collaborations on Quality
– QA4EO
– ESIP Information Quality Cluster
• Now chaired by Brent Maddux (U. Wisconsin)
Characterizing Quality with Quality Labels
Generated for a request for 20-90 deg N, 0-180 deg E
Spatial Completeness: MODIS Aqua AOD Average Daily Spatial Coverage by Region + Season

Region         DJF   MAM   JJA   SON
Global         38%   42%   45%   41%
Arctic          0%    5%   19%    4%
Subarctic       3%   26%   49%   25%
N Temperate    43%   43%   51%   52%
Tropics        46%   48%   49%   44%
S Temperate    45%   59%   60%   49%
Subantarctic   32%   17%   10%   24%
Antarctic       5%    0%    0%    1%

This table and chart are Quality Evidence for the Spatial Completeness (Quality Property) of the MODIS Aqua Dataset
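One plausible way to arrive at an "average daily spatial coverage" figure is to compute, for each day, the fraction of grid cells in the region that hold a valid value, and then average those fractions over the season. The slides do not state the exact method, so the sketch below is an assumption: it treats every cell as equal (no area weighting), and the three tiny daily grids are invented.

```python
def daily_coverage(grid):
    """Fraction of cells in one daily grid that hold a valid (non-None) value."""
    cells = [value for row in grid for value in row]
    return sum(value is not None for value in cells) / len(cells)


def average_daily_coverage(daily_grids):
    """Mean of the per-day coverage fractions, expressed as a percentage."""
    fractions = [daily_coverage(grid) for grid in daily_grids]
    return 100.0 * sum(fractions) / len(fractions)


# Hypothetical 2x2 region observed on three days; None means no retrieval.
days = [
    [[0.2, None], [None, None]],  # 1 of 4 cells valid
    [[0.2, 0.3], [None, None]],   # 2 of 4 cells valid
    [[0.2, 0.3], [0.1, None]],    # 3 of 4 cells valid
]
coverage_pct = average_daily_coverage(days)
print(f"average daily spatial coverage: {coverage_pct:.0f}%")
```

Averaging per-day fractions, rather than counting cells that were seen at least once in the season, keeps the number sensitive to how often a region is actually observed.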
Giovanni-4: Next Generation
• Emphasis on data exploration
• Collaboration with end users...
Exploring Data: Interactive Scatterplot
Zoom in on scatterplot
Examine a subregion
Current G3 Performance...
for > 1 year of MODIS Daily AOD
Initial G3-G4 Performance Tests
Participatory Design: myGiovanni
[Diagram built up across several slides; labels from the final build]
• Data sources: GES DISC, other data center, my data
• Steps: fetch data, mask data, analyze data, visualize data, select
• User additions, introduced progressively: my portal, my palette, my mask, my data, my algorithm
Charge to the Informatics Community
• Carry on:
– Multi-sensor Data Analysis
– Data Quality
– Participatory design
• Build Bridges
– Data and Science
– Informatics and Earth Science
http://qa4eo.org/workshop_harwell11.html
Greg at QA4EO Meeting
Thank you