gregory leptoukh’s contributions to informatics€¦ · the other data problem: usability •data...

66
Bridging Informatics and Earth Science: a Look at Gregory Leptoukh’s Contributions Past, present (and future) https://ntrs.nasa.gov/search.jsp?R=20130011027 2020-06-17T16:30:11+00:00Z

Upload: others

Post on 10-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Bridging Informatics and Earth Science: a Look at Gregory Leptoukh’s Contributions

Past, present (and future)

https://ntrs.nasa.gov/search.jsp?R=20130011027 2020-06-17T16:30:11+00:00Z

Page 2: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Outline

1. Early Contributions

2. Recent Work

3. Unfinished business...

Page 3: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Science background...

• Tbilisi State University – 1975: M.S. in Theoretical Physics

– 1985: Ph. D. in Cosmic Ray Physics*

– Monte-Carlo simulation of cosmic ray propagation

• 1994: North Carolina State – Molecular dynamics computer simulation

• 1997: NASA, Goddard Distributed Active Archive Center – Data browser for SeaWiFS

*Includes work at Moscow State University and Moscow Institute of Theoretical and Experimental Physics

Page 4: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

In the beginning, size was the issue...

Volume Requirements

40 GB/day (!)

vs.

Distribution Capacity

Approach: Reduce volume of useless bits flowing out

Page 5: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Greg was always out front, informatics-wise...

Email from Greg, circa 1998

Page 6: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

The Other Data Problem: Usability

• Data in Hierarchical Data Format (HDF)

– Also with an HDF-EOS layer on top

• Required use of an API (C or Fortran) to read data

• Terminology

– Data variable names

– Stride, offset, grid, swath, fill value, SDS, …

Page 7: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Leptoukh – Kaufman Collaboration

• Yoram wanted to explore MODIS

Aerosol data without having to

download data or write code

• Kick-started the MODIS Online

Visualization and Analysis

System (MOVAS)

– Seed funding

– Initial requirements

– Became Giovanni Yoram Kaufman 1948 - 2006

Page 8: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

The Giovanni Way

DO SCIENCE

Jan

Feb

Mar

May

Jun

Apr

Exploration

Use the best data for the final analysis

Write the paper

Initial Analysis

Derive conclusions

Submit the paper

Jul

Aug

Sep

Oct

Find data

Retrieve high volume data

Learn formats + develop readers

Extract parameters

Perform spatial + other subsetting

Identify quality + other flags and constraints

Perform filtering/masking

Develop analysis + visualization

Accept/discard/get more data

Pre-Science

Page 9: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

The Giovanni Way

Web-based tools like Giovanni let scientists shrink time needed for pre-

science preliminary tasks: data discovery, access, manipulation,

visualization, and basic statistical analysis.

DO SCIENCE

Submit the paper

Minutes

Web-based Services: Jan

Feb

Mar

May

Jun

Apr

Days for exploration

Use the best data for the final analysis

Write the paper

Derive conclusions

Exploration

Use the best data for the final analysis

Write the paper

Initial Analysis

Derive conclusions

Submit the paper

Jul

Aug

Sep

Oct

The Giovanni Way:

Read Data

Subset Spatially

Filter Quality

Reformat

Analyze

Explore

Reproject

Visualize

Extract Parameter Gio

vann

i

Scientists have more time to do science!

Find data

Retrieve high volume data

Learn formats + develop readers

Extract parameters

Perform spatial + other subsetting

Identify quality + other flags and constraints

Perform filtering/masking

Develop analysis + visualization

Accept/discard/get more data

Pre-Science

DO SCIENCE

Page 10: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

User-Driven Development

Add <Feature>?

Page 11: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

User-Driven Development

Page 12: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

User-Driven Development

Add <Feature>?

Page 13: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

User-Driven Development

Page 14: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

User-Driven Development

Add <Feature>?

Page 15: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Some Giovanni Features

Visualization & Analysis

• Time-averaged map

• Area-averaged Time series

• Hovmoller

• Animation

• Vertical profile

• Vertical cross-section

• Climatology + Anomaly

Comparison Features

• Scatterplot + linear regresion

• Difference Map

• Difference Time Series

• Correlation Map

• Map Overlays

Other Features

• Download data: netCDF, ASCII

• Create KML file

• Show lineage

Page 16: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Basic Visualization: Time-averaged Maps

Dust Storm, 30 Apr 2003

Page 17: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Exploring in time: Area-Averaged Time Series

Dust Storm, 30 Apr 2003

Page 18: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Multi-sensor Analysis: X-Y Scatterplot

How well do they correlate?

AirNow PM 2.5 vs. MODIS Terra Aerosol Optical Depth 01 Aug 2007 to 05 Aug 2007

Page 19: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Exploring Correlation with Correlation Maps: MODIS Aerosols, Aqua v. Terra

Decorrelation

Page 20: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Feature: Download Data

• Giovanni as workflow tool

• DIY publication-ready figures

Page 21: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Feature: DIY OMI Gridded Products

Page 22: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Feature: Lineage

Page 23: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

DLR

Standards Support

• Giovanni Output – Download Formats

• Keyhole Markup Language

• netCDF / CF1

– OGC Service Formats

• Web Map Service

• Web Coverage Service

• Giovanni Input – Web Coverage Service

– OPeNDAP

WCS Server

Web Portal

NASA

Giovanni

OMI data

WCS

WMS

Page 24: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Giovanni Impact

• Science Research

• Applications: – Air Quality – Agriculture – Water Resources

• Education: Data-enhanced Investigations for Climate Change Education (DICCE)

• Informatics: ...coming up next

0

50

100

150

200

2004 2005 2006 2007 2008 2009 2010 2011 2012

Publications

Page 25: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Giovanni as Informatics Laboratory

• Does it make it “too easy” to get results?

– Spurious comparisons in multi-sensor analysis?

• Motivates attention to documentation

– Provenance and citation of dynamic results

– Documentation for variables

Page 26: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Recent Work

• Multi-Sensor Analysis

• Community Work

• Collaboration Enablement

Page 27: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Multi-Sensor Analysis

• Multi-sensor Data Synergy Advisor (MDSA)

• AeroStat

Page 28: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

MDSA Problem Statement

Same measurement (Aerosol Optical Depth)

Same space & time

Different provenance Different results – why?

MODIS MERIS

Page 29: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

MDSA & Semantic Web Adoption

Collaborators @ RPI: Peter Fox, Stephan Zednik, Patrick West, + students

Page 30: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

MDSA provides users with key info about similarities and known issues when comparing or merging datasets

Page 31: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

MDSA provides users with key info about similarities and known issues when comparing or merging datasets

Page 32: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

MDSA provides users with key info about similarities and known issues when comparing or merging datasets

Page 33: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

MDSA provides users with key info about similarities and known issues when comparing or merging datasets

Page 34: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

AeroStat Goals

• Compare and combine aerosol data

• Compare aerosol satellite with ground truth

• Merge aerosol measurements from multiple instruments

• Adjust for inter-instrument biases when merging

Collaborators: C. Ichoku, R.Kahn, L. Remer, R. Levy, et al. @GSFC; David Lary, et al. @ UT-Dallas

Page 35: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

AeroStat System

No QC QC+Bias Adjustment

Quality-filtered

Page 36: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Aerostat Problem Statement

MODIS Deep Blue MODIS Dark Target

Terra

Aqua

MISR

Page 37: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Merged Maps: 1 Mar 2003

Page 38: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

2 Mar 2003

Page 39: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

3 Mar 2003

Page 40: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

4 Mar 2003

Page 41: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

5 Mar 2003

Page 42: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Collaborations in Informatics

• Community Giovanni Portals – Northern Eurasian Earth Science Partnership Initiative

(NEESPI)

• Community Participation – Earth and Space Science Informatics (ESSI) – Earth Science Information Partners (ESIP) – NASA Earth Science Data Systems Working Group

(ESDSWG) – GEWEX Aerosol Panel

• Mentor for interns and fellows • Collaboration Features

– gSocial

Page 43: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Collaborative Annotation with gSocial*

• Results

– Annotation of results graphics

– Discussion of results

– Reproducing results

• Designed for AeroStat, but…

• …Can be integrated with other REST based services

*Implemented by Daniel DaSilva, NASA/GSFC

Page 44: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Annotating the result’s graphic

Page 45: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Discussing the Result

Page 46: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Discussing the Result

Page 47: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Reproducing the Result

Page 48: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Unfinished Business...

• Underway

– Data Quality

– Giovanni-4: Faster, better, cheaper

• Still nascent...

– myGiovanni

Page 49: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Data Quality

• Characterizing Quality

– Variations in bias

– Quality needs for different user types

– Additional quality dimensions

• Collaborations on Quality

– QA4EO

– ESIP Information Quality Cluster

• Now chaired by Brent Maddux (U. Wisconsin)

Page 50: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Characterizing Quality with Quality Labels

Generated for a request for 20-90 deg N, 0-180 deg E

Page 51: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Spatial Completeness MODIS Aqua AOD Average Daily Spatial Coverage

by Region + Season

MODIS Aqua AOD

DJF MAM JJA SON

Global 38% 42% 45% 41%

Arctic 0% 5% 19% 4%

Subarctic 3% 26% 49% 25%

N Temperate 43% 43% 51% 52%

Tropics 46% 48% 49% 44%

S Temperate 45% 59% 60% 49%

Subantarctic 32% 17% 10% 24%

Antarctic 5% 0% 0% 1%

This table and chart is Quality Evidence for the Spatial Completeness (Quality Property) of MODIS Aqua Dataset

Page 52: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Giovanni-4: Next Generation

• Emphasis on data exploration

• Collaboration with end users...

Page 53: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Exploring Data: Interactive Scatterplot

53

Page 54: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Zoom in on scatterplot

54

Page 55: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Examine a subregion

55

Page 56: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Current G3 Performance...

for > 1 year of MODIS Daily

AOD

Page 57: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Initial G3-G4 Performance Tests

Page 58: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Participatory Design: myGiovanni

GES DISC

other data center

fetch data

mask data

analyze data

visualize data

select

other data center

Participatory Design: myGiovanni

Page 59: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

GES DISC

other data center

fetch data

mask data

analyze data

visualize data

select

other data center

my portal

Participatory Design: myGiovanni Participatory Design: myGiovanni

Page 60: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

myGiovanni

GES DISC

other data center

fetch data

mask data

analyze data

visualize data

select

other data center

my portal

my palette

Participatory Design: myGiovanni

Page 61: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

myGiovanni

GES DISC

other data center

my data

fetch data

mask data

analyze data

visualize data

select

other data center

my portal

my mask

my palette

Participatory Design: myGiovanni

Page 62: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

myGiovanni

GES DISC

other data center

my data

fetch data

mask data

analyze data

visualize data

select

other data center

my portal

my mask

my palette

my data

Participatory Design: myGiovanni

Page 63: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

myGiovanni

GES DISC

other data center

my data

fetch data

mask data

analyze data

visualize data

select

other data center

my portal

my mask

my palette

my data

my algorithm

Participatory Design: myGiovanni

Page 64: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Charge to the Informatics Community

• Carry on:

– Multi-sensor Data Analysis

– Data Quality

– Participatory design

Page 65: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Charge to the Informatics Community

• Carry on:

– Multi-sensor Data Analysis

– Data Quality

– Participatory design

• Build Bridges

– Data and Science

– Informatics and Earth Science

http://qa4eo.org/workshop_harwell11.html

Greg at QA4EO Meeting

Page 66: Gregory Leptoukh’s Contributions to Informatics€¦ · The Other Data Problem: Usability •Data in Hierarchical Data Format (HDF) –Also with an HDF-EOS layer on top •Required

Thank you