nsf expeditions in computing understanding climate change:   a data driven approach

17
© Vipin Kumar UCC –Aug 15, 2011 NSF Expeditions in Computing Understanding Climate Change: A Data Driven Approach Vipin Kumar University of Minnesota [email protected] www.cs.umn.edu/~kumar

Upload: dustin-boyle

Post on 03-Jan-2016

25 views

Category:

Documents


1 download

DESCRIPTION

NSF Expeditions in Computing Understanding Climate Change:   A Data Driven Approach. Vipin Kumar University of Minnesota [email protected] www.cs.umn.edu/~kumar. Climate Change: The defining issue of our era. The planet is warming Multiple lines of evidence Credible link to human GHG - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: NSF Expeditions in Computing Understanding Climate Change:   A Data Driven Approach

© Vipin Kumar UCC –Aug 15, 2011

NSF Expeditions in Computing

Understanding Climate Change:  A Data Driven Approach

Vipin KumarUniversity of Minnesota

[email protected]/~kumar

Page 2: NSF Expeditions in Computing Understanding Climate Change:   A Data Driven Approach

© Vipin Kumar UCC - Aug 15 2011 2

Climate Change: The defining issue of our era

• The planet is warming• Multiple lines of evidence

• Credible link to human GHG (green house gas)

emissions

• Consequences can be dire• Extreme weather events,

regional climate and ecosystem shifts, abrupt climate change, stress on key resources and critical infrastructures

• There is an urgency to act• Adaptation: “Manage the

unavoidable”

• Mitigation: “Avoid the unmanageable”

• The societal cost of both action and inaction is largeKey outstanding science challenge: Actionable predictive insights to credibly inform policy

Russia Burns, Moscow ChokesNATIONAL GEOGRAPHIC, 2010

The Vanishing of the Arctic Ice capecology.com, 2008

Page 3: NSF Expeditions in Computing Understanding Climate Change:   A Data Driven Approach

© Vipin Kumar UCC - Aug 15 2011 3

Understanding Climate Change - Physics based Approach

General Circulation Models: Mathematical models with physical equations based on fluid dynamics

Figure Courtesy: ORNL

An

om

alie

s fr

om

188

0-19

19 (

K)

Parameterization and non-linearity of differential equations are sources for uncertainty!

Cell

Clouds

LandOcean

Projection of temperature increase under different Special Report on Emissions Scenarios (SRES) by 24 different GCM configurations from 16 research centers used in the Intergovernmental Panel on Climate Change (IPCC) 4th Assessment Report.

A1B: “integrated world” balance of fuelsA2: “divided world” local fuelsB1: “integrated world” environmentally conscious

IPCC (2007)

Page 4: NSF Expeditions in Computing Understanding Climate Change:   A Data Driven Approach

© Vipin Kumar UCC - Aug 15 2011 4

Understanding Climate Change - Physics based Approach

General Circulation Models: Mathematical models with physical equations based on fluid dynamics

Figure Courtesy: ORNL

An

om

alie

s fr

om

188

0-19

19 (

K)

Parameterization and non-linearity of differential equations are sources for uncertainty!

Cell

Clouds

LandOcean

“The sad truth of climate science is that the most crucial information is the least reliable” (Nature, 2010)

Physics-based models are essential but not adequate– Relatively reliable predictions at global scale for ancillary variables such as

temperature– Least reliable predictions for variables that are crucial for impact assessment

such as regional precipitation

Regional hydrology exhibits large variations among major IPCC model projections

Disagreement between IPCC models

Page 5: NSF Expeditions in Computing Understanding Climate Change:   A Data Driven Approach

© Vipin Kumar UCC - Aug 15 2011 5

Data-Driven Knowledge Discovery in Climate Science

From data-poor to data-rich transformation– Sensor Observations: Remote sensors like satellites and

weather radars as well as in-situ sensors and sensor networks like weather station and radiation measurements

– Model Simulations: IPCC climate or earth system models as well as regional models of climate and hydrology, along with observed data based model reconstructions

"The world of science has changed ... data-intensive science [is] so different that it is worth distinguishing [it] … as a new, fourth paradigm for scientific exploration." - Jim Gray

Data guided processes can complement hypothesis guided data analysis to develop predictive insights for use by climate scientists, policy makers and community at large.

Page 6: NSF Expeditions in Computing Understanding Climate Change:   A Data Driven Approach

© Vipin Kumar UCC - Aug 15 2011 6

NSF Expedition: Understanding Climate Change - A Data-Driven

Approach

Predictive Modeling Enable predictive modeling of typical and extreme behavior from multivariate spatio-temporal data

Relationship MiningEnable discovery of complex dependence structures: non-linear associations or long range spatial dependencies

Complex Networks Enable studying of collective behavior of interacting eco-climate systems

High Performance Computing Enable efficient large-scale spatio-temporal analytics on exascale HPC platforms with complex memory hierarchies

Transformative Computer Science Research•Science Contributions

– Data-guided uncertainty reduction by blending physics models and data analytics

– A new understanding of the complex nature of the Earth system and mechanisms contributing to adverse consequences of climate change

•Success Metric– Inclusion of data-driven analysis as a standard part of climate

projections and impact assessment (e.g., for IPCC)

“... data-intensive science [is] …a new, fourth paradigm for scientific exploration." - Jim Gray

Project aim:A new and transformative data-driven approach that complements physics-based models and improves prediction of the potential impacts of climate change

Page 7: NSF Expeditions in Computing Understanding Climate Change:   A Data Driven Approach

© Vipin Kumar UCC - Aug 15 2011 7

Transformative Computer Science Research

High Performance ComputingEnable efficient large-scale spatio-temporal analytics on

future generation exascale HPC platforms with complex memory hierarchies

Large scale, spatio-temporal, unstructured, dynamic

kernels , features, dependencies

Relationship MiningEnable discovery of complex dependence structures such as non linear associations or long range spatial dependenciesNonlinear, spatio-temporal, multi-variate, persistence, long memory

Predictive Modeling

Enable predictive modeling of typical and extreme behavior from multivariate spatio-temporal dataNonlinear, spatio-temporal, multivariate

Complex NetworksEnable studying of collective behavior of interacting eco-climate systemsNonlinear, space-time lag, geographical, multi-scale

relationships Community structure- function-dynamics

Enabling large-scale data-driven science for complex, multivariate, spatio-temporal, non-linear, and dynamic systems: • Fusion

plasma • Combustion • Astrophysics • ….

The Expedition is an end-to-end demonstration of this major paradigm for future knowledge discovery process.

Page 8: NSF Expeditions in Computing Understanding Climate Change:   A Data Driven Approach

© Vipin Kumar UCC - Aug 15 2011 8

Relationship Mining

Objective

• To develop methods for discovery of complex dependence structures (e.g., nonlinear associations, long-range spatial dependencies)

Computer Science Innovations

• Define a notion of “transactions” for association analysis in spatio-temporal data

• Adapt association analysis to continuous data

• Reduce the number of redundant patterns and assess statistical significance of the identified patterns

• Define a new approach to association analysis that trades off completeness for a smaller, simpler set of patterns and far smaller computational requirements

Impact

• Facilitate discovery of complex dependence structures in dynamic systems

• Understand relationships between fire size and frequency to precipitation, land cover, ocean temperature, etc

• Characterize impacts of sea surface temperature on hurricane frequency and intensity

• Understand feedback effects, the key uncertainty in climate science

Dynamic behavior of the high and low pressure fields corresponding to NOA climate index (Portis et al, 2001)

Page 9: NSF Expeditions in Computing Understanding Climate Change:   A Data Driven Approach

© Vipin Kumar UCC - Aug 15 2011 9

Complex Networks

Computer Science Innovations

• Construct multivariate networks to capture nonlinear spatio-temporal interactions

• Characterize network dynamics at multiple spatial scales over different time periods

• Scale graph mining algorithms to the required size and number for comparative analysis of multiple real-world networks

Impact

• Apply network structure-function-dynamics methods to complex systems

• Understand intricate interplay between topology and dynamics of the climate system over many spatial scales

• Explain the 20th century great climate shifts from the collective behavior of interacting subsystems

Node degree in a correlation-based climate network

Objectives

• To develop algorithms for characterization of network structure, function, and dynamics

• To enable large-scale comparative analysis of climate networks to increase prediction confidence

Page 10: NSF Expeditions in Computing Understanding Climate Change:   A Data Driven Approach

© Vipin Kumar UCC - Aug 15 2011 10

• Take into account nonlinear spatial, temporal, and multivariate dependencies

• Minimize assumptions on temporal evolution, e.g., no`Markov’ assumptions

• Deal with curse of dimensionality in nonlinear extreme value analysis (recency, frequency, duration) and quantiles

• Scale algorithms yet ensure uncertainty quantification

Objectives

• To advance nonparametric multivariate spatio-temporal probabilistic regression models to incorporate nonlinear dependencies

• To develop latent variable models to characterize spatio-temporal climate states

• To advance multivariate extreme value theory through geometric and probabilistic generalizations of quantiles

Alok Choudhary, NWUNagiza Samatova, NCSU

Vipin Kumar, UMNPredictive Modeling

Correlated Gaussian Processes for multivariate spatiotemporal regression w/ nonlinear dependencies

Computer Science Innovations

• Uncertainty quantified predictions of climate variables, e.g., precipitation, over space-time

• Abrupt climate change detection, typical climate characterization

• Modeling regional climate change and extreme climate events, e.g., hurricanes

• Applications beyond climate data, e.g., finance, healthcare, bioinformatics, networks

Impact

= +

GP1 GP2

Multivariate quantiles for extreme value analysis

Page 11: NSF Expeditions in Computing Understanding Climate Change:   A Data Driven Approach

© Vipin Kumar UCC - Aug 15 2011 11

High Performance Computing

Objectives

• To investigate a “co-design” approach to scalable analysis of complex dependence structures

• To develop parallel and scalable statistical and data mining functions and kernels

• To develop sophisticated parallel I/O techniques that fully exploit complex memory hierarchy (disks, SSDs, DRAMs, accelerators)

Impact

• Analytical kernels that will scale many data mining algorithms

• Analytics algorithms designed for effective exploration of complex spatio-temporal dependence structures

Computer Science Innovations

• The first“co-design” approach that will take into account exploration of nonlinear associations, long-range spatial dependencies, I/O requirements and optimizations, presence of accelerators and multiple suitable programming models for HPC systems

• Novel optimizations to deal with data-and-compute intensive computations on exascale platforms with complex memory hierarchies

An architecture with accelerators such as GPUs. Some/all nodes have accelerators and many

cores.

Page 12: NSF Expeditions in Computing Understanding Climate Change:   A Data Driven Approach

© Vipin Kumar UCC - Aug 15 2011 12

CS and Climate Science Synergies

• Physics based models inform data mining

• More credible variables (e.g., SST) more crucial variables (e.g., precipitations/hurricanes)

• Better modeled processes (e.g., atmospheric physics) more crucial processes (e.g., land surface hydrology)

• Global and century scale regional and decadal scale

• Data mining informs physics based models

• Better understanding of climate processes

• Improved insights for parametrization schemes

Page 13: NSF Expeditions in Computing Understanding Climate Change:   A Data Driven Approach

© Vipin Kumar UCC - Aug 15 2011 13

Does the proposed research enable significantly better data analysis than before or enable new kinds of analysis?

Measures of Success in CS Research

Predictive Modeling– Improved capabilities for multivariate

spatio-temporal regression that can simultaneously capture the dependencies between spatial-temporal objects (e.g., temperature, pressure) and help find novel climate patterns not found by standard approaches

– New capabilities for detecting extreme events using quantiles and quantile regression for multivariate data

Complex Networks– Detection of known climate patterns and

tracking of the evolution of climate patterns in time using complex networks that capture non-linear relationships in multivariate and multiscale data

Relationship Mining– Creation of entirely new capabilities in

association mining from approaches that extend / modify traditional approaches to handle spatio-temporal data

– Creation of a completely novel approach for association analysis that reduces the number of patterns and the time required to find them

– Improvement in the performance of complex networks and predictive models by using nonlinear relationships that more faithfully represent reality

High Performance Computing– Scalable analytics code for spatio-temporal data– Enabling of large scale data driven science that

serves as a demonstration of the value of the data driven paradigm

Page 14: NSF Expeditions in Computing Understanding Climate Change:   A Data Driven Approach

© Vipin Kumar UCC - Aug 15 2011 14

Ultimate success is to have data-driven analysis included as a standard part of climate projections and impact assessment (e.g., for IPCC).

Achieving this will require measuring and demonstrating success in two key areas

– Filling critical gaps in climate science (Nature, 2010) Reduction of uncertainty in climate predictions at regional and decadal scales Improved predictive insights for key precipitation processes Providing credible assessments of extreme hydro-meteorological events New knowledge about climate processes (e.g., teleconnection patterns)

– Improve climate change impact assessments and inform policy Distinguishing between natural and anthropogenic causes Improved assessments of climate change risks across multiple sectors

Specific use cases will provide the context for these evaluations

Climate-based Measures of Success

Page 15: NSF Expeditions in Computing Understanding Climate Change:   A Data Driven Approach

© Vipin Kumar UCC - Aug 15 2011 15

Science Impact View: Evaluation Methodology

• Can we improve the projection (e.g., hurricane frequency, intensity)?• Can the data-driven model identify:

• which regional climate variables are informative/causal (e.g., SST, wind speed)?• what is the relationship (+/- feedback) between these variables?

• To what extent does the data-driven model inform climate scientists?

1950 2010 (now) 21001870

Train Model

Forecast & Compare w/ Observation

Data

• Reanalysis data• Observed data • Model projection data

Hindcast Prediction

Forecast & Multi-model Comparison

• Observed data

Illustrative Example for Hurricane Frequency Use Case

Page 16: NSF Expeditions in Computing Understanding Climate Change:   A Data Driven Approach

© Vipin Kumar UCC - Aug 15 2011 16

Conceptual View: Driving Use-Cases

Data Challenges

Characterize impacts of sea surface temperature on hurricane frequency and intensity

Project climate extremes at regional and decadal scales

CS Technologies

Yr1

Yr2

Yr3

Yr4

Yr5

Yr5+

Non-stationary non-i.i.d

Multivariate

Non-linear Long-memory, long-range, multiscale

Networks for multivariate nonlinear space-time interactions

Understand relationships between fire size and frequency to precipitation, land cover, ocean temperature, etc.

Identify drought and pluvial event triggers to improve prediction Understand and quantify

the uncertainty of feedback effects in climateMultivariate extreme

value theory based on geom./prob. quantiles

Co-design-based scalable analytics of complex dependence structures

Nonparametric multivariate spatio-temporal probabilistic regression models

Association analysis-based interplay between complex dependence structures

Heterogeneous

Characterize feedbacks contributing to abrupt climate regime changes

Page 17: NSF Expeditions in Computing Understanding Climate Change:   A Data Driven Approach

© Vipin Kumar UCC - Aug 15 2011 17

Example Driving Use Cases

Impact of Global Warming on Hurricane Frequency

Regime Shift in Sahel

1930s Dust Bowl Discovering Climate Teleconnections

Find non-linear

relationships

Validate w/ hindcasts

Build hurricane models

Sahel zone

Regime shift can occur without any advanced warning and may be triggered by isolated events such as storms, drought

Onset of major 30-year drought over the Sahel region in 1969

Affected almost two-thirds of the U.S. Centered over the agriculturally productive Great Plains

Drought initiated by anomalous tropical SSTs (Teleconnections)

-0.8

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

0.8

longitude

latitu

de

Correlation Between ANOM 1+2 and Land Temp (>0.2)

-180 -150 -120 -90 -60 -30 0 30 60 90 120 150 180

90

60

30

0

-30

-60

-90

El Nino Events

Nino 1+2 Index