Meteorology and Space Weather Data Mining Portal. Mikhail ZHIZHIN, Geophysical Center RAS; Dmitry MISHIN, Institute of Physics of the Earth, RAS; Alexei POYDA, Moscow State University


Post on 01-Jan-2016



TRANSCRIPT

Page 1: Meteorology and Space Weather Data Mining Portal

Meteorology and Space Weather Data Mining Portal

Mikhail ZHIZHIN, Geophysical Center RAS

Dmitry MISHIN, Institute of Physics of the Earth, RAS

Alexei POYDA, Moscow State University

Page 2: Meteorology and Space Weather Data Mining Portal

Environmental Scenario Search Engine (ESSE)

• Portal for interactive searching for events over a Grid of environmental data services hosted by OGSA-DAI

• The web services are Grid proxies for the database clusters with terabytes of high-resolution meteorological and space weather reanalysis data over the past 20-50 years

• The data mining is based on fuzzy logic to search for events in natural language terms, such as “very cold day”

• Parallel data mining across disciplines for correlated events in space, atmosphere and ocean

• In cooperation with the National Geophysical Data Center, NOAA, and supported by a grant from Microsoft Research Ltd.

Page 3: Meteorology and Space Weather Data Mining Portal

Environmental Data Sources

Avalanche in the amount of available data:

• Monitoring (ground observatories, satellites, etc.)

• Reanalysis data (models that build regular grids of specific parameters based on available irregular data)

Examples:

• SPIDR (Space Physics Interactive Data Resource)
– Since 1930
– ~120 numerical parameters
– ~0.5 TB

• NCEP/NCAR Weather Reanalysis Project
– Since 1950
– Weather parameters on a regular grid (time resolution 6 hrs, spatial resolution 2.5 deg)
– ~1 TB

• CLASS (Comprehensive Large Array-data Stewardship System)
– Since 1992
– Satellite images from ~100 spectral channels
– ~1.2 PB, growing ~0.5 PB per year

Page 4: Meteorology and Space Weather Data Mining Portal

Environmental Data Models

The basic data element is a time series, i.e. an array of values of a parameter at different times at a specific grid point, at an observatory location, or along a specific satellite trajectory

These arrays typically have on the order of 10^6 elements. The basic operations are not joins, but “extracting a subrange” or “resampling”
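The two operations named above can be sketched on a small in-memory time series. This is only an illustration under stated assumptions: the function names `extract_subrange` and `resample_mean`, the choice of averaging as the resampling rule, and the sample temperatures are all hypothetical and are not taken from ESSE.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def extract_subrange(times, values, start, end):
    """Return the samples with start <= time < end (times must be sorted)."""
    lo = bisect_left(times, start)
    hi = bisect_left(times, end)
    return times[lo:hi], values[lo:hi]


def resample_mean(times, values, step):
    """Average samples into consecutive bins of width `step`, starting at times[0]."""
    if not times:
        return [], []
    out_t, out_v = [], []
    bin_start, acc, n = times[0], 0.0, 0
    for t, v in zip(times, values):
        # Close bins until the current sample falls inside the open one.
        while t >= bin_start + step:
            if n:
                out_t.append(bin_start)
                out_v.append(acc / n)
            bin_start += step
            acc, n = 0.0, 0
        acc += v
        n += 1
    if n:
        out_t.append(bin_start)
        out_v.append(acc / n)
    return out_t, out_v


# Hypothetical 6-hourly temperature series (degrees C) covering two days.
t0 = datetime(2005, 7, 1)
times = [t0 + timedelta(hours=6 * i) for i in range(8)]
temps = [14.0, 18.0, 24.0, 20.0, 15.0, 19.0, 25.0, 21.0]

day_t, day_v = resample_mean(times, temps, timedelta(days=1))
sub_t, sub_v = extract_subrange(
    times, temps, datetime(2005, 7, 1, 12), datetime(2005, 7, 2, 6)
)
print(day_v)  # daily means
print(sub_v)  # samples from 12:00 on day 1 up to, but excluding, 06:00 on day 2
```

Binary search keeps subrange extraction cheap even for arrays with around 10^6 elements, because only the two boundary positions have to be located.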

Page 5: Meteorology and Space Weather Data Mining Portal

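Environmental Data Service: OGSA-DAI plugin

[Diagram. Components: NCEP database, NWS database, SPIDR databases, DMSP database; OGSA-DAI (DAI) running in Tomcat; NetCDF file serialisation and data export; clients (IDEAS portal, MS Excel, any client); user. Request and response pairs: getMetadata returns metadata XML; getProperty: sources returns the sources list; getXMLData returns data XML; getNetCDFData returns a URL to a NetCDF file, which the user then downloads.]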
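The request and response pairs on this slide can be illustrated with an in-memory stand-in. The sketch below is not the OGSA-DAI API: the class `MockDataService`, its method signatures, the XML element names and the sample values are all assumptions made for illustration, and the NetCDF path (getNetCDFData) is left out because it needs a file server.

```python
import xml.etree.ElementTree as ET


class MockDataService:
    """In-memory stand-in for a data service; hypothetical, not the real OGSA-DAI API."""

    def __init__(self, sources):
        # sources: {source name: {parameter name: [(iso_time, value), ...]}}
        self._sources = sources

    def get_property(self, name):
        """Mimics 'getProperty: sources' by returning the list of source names."""
        if name != "sources":
            raise KeyError(name)
        return sorted(self._sources)

    def get_metadata(self, source):
        """Mimics 'getMetadata' by returning a small metadata XML document."""
        root = ET.Element("metadata", source=source)
        for param in sorted(self._sources[source]):
            ET.SubElement(root, "parameter", name=param)
        return ET.tostring(root, encoding="unicode")

    def get_xml_data(self, source, param, start, end):
        """Mimics 'getXMLData' by returning the samples with start <= time < end."""
        root = ET.Element("data", source=source, parameter=param)
        for t, v in self._sources[source][param]:
            if start <= t < end:  # ISO 8601 strings compare chronologically
                ET.SubElement(root, "sample", time=t, value=str(v))
        return ET.tostring(root, encoding="unicode")


# Hypothetical contents; the numbers are made up.
service = MockDataService({
    "NCEP": {"temperature": [
        ("2005-07-01T00:00", 14.0),
        ("2005-07-01T06:00", 18.0),
        ("2005-07-01T12:00", 24.0),
    ]},
    "SPIDR": {"kp_index": [("2005-07-01T00:00", 3.0)]},
})

source_names = service.get_property("sources")
metadata_xml = service.get_metadata("NCEP")
data_xml = service.get_xml_data(
    "NCEP", "temperature", "2005-07-01T06:00", "2005-07-02T00:00"
)
samples = ET.fromstring(data_xml).findall("sample")
print(source_names, len(samples))
```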

Page 6: Meteorology and Space Weather Data Mining Portal

Environmental Data Mining

Currently available environmental data mining portals (GCMD, ESG) search metadata and subset the data:

• How to find appropriate databases?

In addition, ESSE searches for events inside the data:

• How to interpret a question of a scientist?

• How to build a set of database queries that can answer the question?

• How to synthesize and present the results of a distributed query?

Typical ESSE questions:

• How often do typical Florida spring storms occur? Has the frequency been increasing in the last 10 years?

• Find day-time DMSP satellite images above Florida with spring storms

Page 7: Meteorology and Space Weather Data Mining Portal

How to interpret a question of a scientist?

1. Introduce the notion of an Environmental Scenario (ES) as a basic building block of a scientific question

2. Interpret the ES as a fuzzy query expression
a. Each basic condition in an ES translates into the membership function of a fuzzy set, which becomes a term in the resulting expression
b. The expression is built using traditional fuzzy logic operations plus a “time shift” operator

3. Query terms are evaluated at individual data sources

4. The ESSE engine collects the data and performs the fuzzy query operation.

The ESSE engine is being built as a Web Service. This enables cascading queries, but raises new research challenges, e.g. optimization of query execution.
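Steps 2 to 4 can be sketched as follows. This is a minimal illustration, assuming linear ramp membership functions and the common min/max/complement definitions of fuzzy AND, OR and NOT; the slides do not state which membership shapes or operator definitions ESSE uses, and all names and numbers here are hypothetical.

```python
def ramp_up(lo, hi):
    """Membership function rising linearly from 0 at `lo` to 1 at `hi`."""
    def mu(x):
        return min(1.0, max(0.0, (x - lo) / (hi - lo)))
    return mu


def evaluate_term(series, membership):
    """Step 3: a data source turns its raw series into membership degrees."""
    return [membership(v) for v in series]


# Step 4: the engine combines the per-source results sample by sample.
def fuzzy_and(a, b):
    return [min(x, y) for x, y in zip(a, b)]


def fuzzy_or(a, b):
    return [max(x, y) for x, y in zip(a, b)]


def fuzzy_not(a):
    return [1.0 - x for x in a]


# Hypothetical series held by two different data sources, on the same time axis.
temperature_c = [12.0, 20.0, 28.0, 31.0]
humidity_pct = [40.0, 55.0, 85.0, 90.0]

high_temp = evaluate_term(temperature_c, ramp_up(20.0, 30.0))
high_hum = evaluate_term(humidity_pct, ramp_up(50.0, 90.0))

score = fuzzy_and(high_temp, high_hum)
print(score)
```

Evaluating each term where its data lives means only membership degrees, not raw series, have to travel to the engine.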

Page 8: Meteorology and Space Weather Data Mining Portal

Defining fuzzy search criteria

Set the fuzzy constraints on the parameters for the event state, for example:

(VERY HIGH TEMPERATURE) and (VERY HIGH HUMIDITY)
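One way to read the example constraint is sketched below. It assumes the textbook convention that the hedge VERY squares a membership degree and that AND takes the minimum; whether ESSE defines them this way is not stated in the slides, and the thresholds are hypothetical.

```python
def rising(lo, hi):
    """Membership function rising linearly from 0 at `lo` to 1 at `hi`."""
    def mu(x):
        return min(1.0, max(0.0, (x - lo) / (hi - lo)))
    return mu


def very(mu):
    """Linguistic hedge: sharpen a fuzzy set by squaring its membership degree."""
    return lambda x: mu(x) ** 2


# Hypothetical definitions of the base terms.
high_temperature = rising(20.0, 35.0)  # degrees C
high_humidity = rising(50.0, 100.0)    # percent


def event_score(temp_c, humidity_pct):
    """(VERY HIGH TEMPERATURE) and (VERY HIGH HUMIDITY), with AND taken as min."""
    return min(very(high_temperature)(temp_c), very(high_humidity)(humidity_pct))


hot_humid = event_score(35.0, 90.0)
mild = event_score(27.5, 90.0)
print(hot_humid, mild)
```

Squaring leaves full membership at 1 but pushes partial membership down, so a state that is only "somewhat high" scores much lower under VERY.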

Page 9: Meteorology and Space Weather Data Mining Portal

Working with Environmental Scenarios

The user may search for a desired scenario by describing several subsequent events. Scenario example: (HEAVY RAIN) followed by (VERY LOW TEMPERATURE)
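The "followed by" construct can be sketched with a time shift on membership series. The functions `time_shift` and `followed_by`, the zero padding at the end of the series, the one-step lag and the membership values are assumptions for illustration only.

```python
def time_shift(memberships, steps):
    """Return a series whose element i is the input value `steps` samples later.

    Positions that would read past either end of the series are filled with 0.
    """
    n = len(memberships)
    return [
        memberships[i + steps] if 0 <= i + steps < n else 0.0
        for i in range(n)
    ]


def followed_by(first, second, steps):
    """Degree to which `first` holds now and `second` holds `steps` samples later."""
    shifted = time_shift(second, steps)
    return [min(a, b) for a, b in zip(first, shifted)]


# Hypothetical membership degrees on a common daily time axis.
heavy_rain = [0.1, 0.9, 0.2, 0.0, 0.7]
very_low_temp = [0.0, 0.1, 0.8, 0.3, 0.0]

scenario = followed_by(heavy_rain, very_low_temp, steps=1)
best_day = max(range(len(scenario)), key=scenario.__getitem__)
print(scenario, best_day)
```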

Page 10: Meteorology and Space Weather Data Mining Portal

How to synthesize and present results of a distributed query?

• Environmental Scenario search result is a scored list of candidate events. “Score” represents the “likeliness” of each event in a numerical form

• The result page provides links to visualization and data export pages

• Each event can be viewed as
– time series
– dynamic 5D volume
– satellite images animation

• Data subset for each event can be exported in XML and NetCDF formats
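A scored list of candidate events could be produced along the following lines. This is a sketch under assumptions: grouping consecutive samples above a threshold into one event, scoring an event by its peak membership degree, and the threshold of 0.5 are illustrative choices, not the documented ESSE ranking.

```python
def find_events(times, scores, threshold=0.5):
    """Group consecutive samples with score >= threshold into events, best first."""
    events, current = [], None
    for t, s in zip(times, scores):
        if s >= threshold:
            if current is None:
                current = {"start": t, "end": t, "score": s}
            else:
                current["end"] = t
                current["score"] = max(current["score"], s)
        elif current is not None:
            events.append(current)
            current = None
    if current is not None:
        events.append(current)
    # Highest-scoring (most likely) candidates come first.
    return sorted(events, key=lambda e: e["score"], reverse=True)


# Hypothetical daily scenario scores.
times = ["2005-06-01", "2005-06-02", "2005-06-03", "2005-06-04",
         "2005-06-05", "2005-06-06", "2005-06-07"]
scores = [0.1, 0.6, 0.7, 0.2, 0.95, 0.9, 0.3]

ranked = find_events(times, scores)
for event in ranked:
    print(event["start"], event["end"], event["score"])
```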

Page 11: Meteorology and Space Weather Data Mining Portal

Scenario search results: scored event list

• “Score” represents the “likeliness” of each event in a numerical form.

• The results page provides links to visualization and data export pages.

Page 12: Meteorology and Space Weather Data Mining Portal

Viewing the event in time and space

Vis5D time-space-parameter animation

Page 13: Meteorology and Space Weather Data Mining Portal

Viewing the event from satellites

Page 14: Meteorology and Space Weather Data Mining Portal

Where do we use Grid infrastructure?

[Diagram. Components: User, ESSE Portal, Metadata XML, OGSA-DAI, Data, EGEE.
Portal steps: Discover Sources; Select (parameters, stations, probes, date interval, ...); Visualise Data; Fuzzy search scenario; Download Data.
Flow labels: Metadata Search, Update Metadata, Data Request, Return Data, Fuzzy Search, Workflow Control, Event Data Subset, Data export (GridFTP).]

Page 15: Meteorology and Space Weather Data Mining Portal

Online demo scenario

1. User logs in to the ESSE portal

2. Search for a database with the “cloud cover” parameter and coverage around Moscow

3. Select the database “NCEP Reanalysis”, the location “Moscow”, and the parameter “Cloud cover”

4. Compose the event scenario “Low cloud cover”

5. Search for day events in the summer of 2005

6. Show the most likely event found, with time series and satellite images