esml, subsetting, mining tools - nasa...esml, subsetting, mining tools sara graves rahul...

28
ESML, Subsetting, Mining Tools Sara Graves Sara Graves Rahul Ramachandran Rahul Ramachandran Information Technology and Systems Center (ITSC) Information Technology and Systems Center (ITSC) University of Alabama in Huntsville (UAH) University of Alabama in Huntsville (UAH) www.itsc.uah.edu MODIS Science Team Meeting July 24, 2002

Upload: others

Post on 11-Apr-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

ESML, Subsetting, Mining Tools

Sara GravesSara Graves

Rahul RamachandranRahul Ramachandran

Information Technology and Systems Center (ITSC)Information Technology and Systems Center (ITSC)

University of Alabama in Huntsville (UAH)University of Alabama in Huntsville (UAH)

www.itsc.uah.edu

MODIS Science Team Meeting July 24, 2002

Page 2: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

Tools Encompassing All Phasesof Scientific Analysis

• Science Data Usability– Data/Application Interoperability

• Earth Science Markup Language (ESML)

• Science Data Preprocessing– Subsetting

• Various Subsetting Tools such as HEW

• Science Data Analysis– Data Mining

• Algorithm Development and Mining (ADaM) System

• Mission/Project/Field Campaign Coordination– Electronic Collaboration

Page 3: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

Science Data Usability

http://esml.itsc.uah.edu

Page 4: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

Earth Science Data Characteristics

• Different formats,types and structures(18 and counting forAtmospheric Sciencealone!)

• Different states ofprocessing ( raw,calibrated, derived,modeled orinterpreted )

• Enormous volumes

Ø Heterogeneity leadsto Data usabilityproblem

HDF HDF-EOS

netCDFASCII

Binary GRIB

Page 5: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

Data Usability Problem

DATA FORMAT 1

DATA FORMAT 1

DATA FORMAT 2

DATA FORMAT 2

DATA FORMAT 3

DATA FORMAT 3

APPLICATION

READER 1 READER 2

FORMATCONVERTER

• Requires specialized code for every format• Difficult to assimilate new data types• Makes applications tightly coupled to data

• One possible solution - enforce a Standard Data Format• Not practical, especially for legacy datasets

Page 6: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

ESML Solution

• ESML (external metadata) files containing the structuraldescription of the data format

• Applications utilize these descriptions to figure out how toread the data files resulting in data interoperability forapplications

ESML LIBRARY

APPLICATION

ESMLFILE

ESMLFILE

ESMLFILE

DATA FORMAT 1

DATA FORMAT 1

DATA FORMAT 2

DATA FORMAT 2

DATA FORMAT 3

DATA FORMAT 3

Page 7: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

What is ESML?

• It is a specialized markup language for Earth Sciencemetadata based on XML

• It is a machine-readable and -interpretable representationof the structure and content of any data file, regardless ofdata format

• ESML description files contain external metadata that canbe generated by either data producer or data consumer (atcollection, data set, and/or granule level)

• ESML provides the benefits of a standard, self-describingdata format (like HDF, HDF-EOS, netCDF, geoTIFF, …)without the cost of data conversion

• ESML is an Interchange Technology that allowsdata/application interoperability

Page 8: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

ESML Tools/Products Availablehttp://esml.itsc.uah.edu

Ability to browse data files using the ESMLdescription files

ESML Data Browser

Ability to write and validate ESML descriptionsESML Editor

•WINDOWS and LINUX

•URL access

•Handles ASCII, Binary (McIDAS), HDF-EOS,GRIB files

•Handles preprocessing, wild card and symbols

ESML Library

(C++/Java JNI )

•Defines ASCII, Binary (McIDAS), HDF-EOS,GRIB formats (more formats to follow)

•Provides preprocessing, wild card, symbols andsemantic tagging capabilities

ESML Schema

FeaturesTools/Products

Page 9: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

MODIS/CERES Collocation Application

ESMLfile

ESMLfile

ESMLfile

ESML Library

Collocation Algorithm

255

256

257

258

259

260

261

262

263

264

265

200 210 220 230 240 250 260 270 280 290 300

Sea Surface Temperature (TMI) Degree Kelvin

Ch

n 5

Tem

per

atu

re (

AM

SU

) D

egre

e K

elvi

n

MODIS CERES MISR/Others

Network

Analysis

Scientists can:• Select remote files across

the network• Select fields by modifying

semantic tags in the ESMLfile

Purpose:• To study the relationship

between shortwave flux andcloud/aerosol properties

• Important for climate changestudies

Page 10: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

Science Data Preprocessing

http://subset.org

Page 11: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

Currently Available/PlannedSubsetting Applications

• HEW Subsetting– Complete System (available)– Subsetting Engine Only (available)– Subsetting Center (available)– SPOT - Subsettability Checker (available)– HEW Integration with ECS (in work)– Remote Subsetting Service (planned)– Subsetting as a Web Service (planned)

• Customized Subsetting– MODIS tools (available)– Coarse-grain SSM/I Subsetter (available)

• General Purpose Customizable Subsetting– Based on ADaM Data Mining Engine (available)– Subsetting Tool using ESML (in work)

Page 12: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

• MODIS – Land, Quality Assessment

•modland – subsetter for MODIS gridded data

• stitcher – pieces together 2 or 4 contiguous MODIS tiles

• MODIS – Atmosphere

•modair - specialized subsetter for MODIS swaths

Tools developed for MODISScientists

Page 13: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville
Page 14: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

HEW integration with ECS

ECS EDG System

EDG ECS

Subsetter Input data

Output data

Enduser Order

submission (HTML)

Data order and reply

Subset ODL and reply

Subsetting System

Output data (Reingested)

1 2

34

5

6

7

• UAH/ITSC-writtensubsetting and interfacesoftware

• Ongoing testing with ECS6a.05 and EDG 3.4 atNSIDC, LP DAAC,GDAAC

• Enhancements for DAACsmay be made

Page 15: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

ESML enabled generic Subsetter

ESMLfile

ESMLfile

ESMLfile

ESML Library

Subsetting Algorithm

HDF-EOS Binary/ASCII

OtherFormats

Network

Subsetted Data

For HDF-EOS data notformatted for subsettingwith the HDF-EOS library:ESML file can be used tocorrect the semantic tagrequired to subset HDF-EOS data without the needto recreate the data file

Page 16: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

Science Data Analysis

http://datamining.itsc.uah.edu

Page 17: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

Data Mining

• Data Mining is the task of discoveringinteresting patterns/anomalies andextracting novel information from largeamounts of data

• Data Mining is an interdisciplinary fielddrawing from areas such as statistics,machine learning, pattern recognition andothers

Page 18: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

Iterative Nature of the DataMining Process

DATA

PREPROCESSING CLEANING

AndINTEGRATION

MINING SELECTION

AndTRANSFORMATION

DISCOVERY

KNOWLEDGEEVALUATIONAnd

PRESENTATION

Page 19: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

ADaM Engine Architecture

PreprocessedData

PreprocessedData

DataDataTranslated

Data

Patterns/ModelsPatterns/Models

ResultsResults

OutputGIF ImagesHDF-EOSHDF Raster ImagesHDF SDSPolygons (ASCII, DXF)SSM/I MSFC

Brightness TempTIFF ImagesOthers...

Preprocessing AnalysisClustering K Means Isodata MaximumPattern Recognition Bayes Classifier Min. Dist. ClassifierImage Analysis Boundary Detection Cooccurrence Matrix Dilation and Erosion HistogramOperations PolygonCircumscript Spatial Filtering Texture OperationsGenetic AlgorithmsNeural NetworksOthers...

Selection and Sampling Subsetting Subsampling Select by Value Coincidence SearchGrid Manipulation Grid Creation Bin Aggregate Bin Select Grid Aggregate Grid Select Find HolesImage Processing Cropping Inversion ThresholdingOthers...

Processing

InputHDFHDF-EOSGIF PIP-2SSM/I PathfinderSSM/I TDRSSM/I NESDIS Lvl 1BSSM/I MSFC

Brightness TempUS RainLandsatASCII GrassVectors (ASCII Text)

Intergraph RasterOthers...

Page 20: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

Reasons for Building a Data MiningEnvironment

Reasons for Building a Data MiningEnvironment

• Provide scientists with the capabilities to iterate

• Allow the flexibility of creative scientific analysis

• Provide data mining benefits of

• Automation of the analysis process

• Reduction of data volume

• Provide a framework to allow a well defined structure forthe entire analysis process

• Provide a suite of mining algorithms for creative analysis

• Provide capabilities to add “science algorithms” to theframework

Page 21: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

ADaM : Mining Environment forScientific Data

• The system provides knowledge discovery, feature detection andcontent-based searching for data values, as well as for metadata.

•contains over 120 different operations•Operations vary from specialized science data-set specificalgorithms to various digital image processing techniques,processing modules for automatic pattern recognition, machineperception, neural networks, genetic algorithms and others

Page 22: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

Extensibility of ADaM

Visualization/AnalysisPackages

TrainingData

InputData

ADaM Mining EngineAnalysisModules

InputModules

OutputModules

General PurposeAlgorithms

TrainableClassifiers

User DefinedAlgorithms

Page 23: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

• Provide scientists with the capabilities to iterate

• Allow the flexibility of creative scientific analysis

• Is a powerful tool for research and analysis given the volume ofscience data

• Extremely useful when manual examination of data is impossible

• Allows scientists to add problem specific algorithms to theADaM toolkit

• Minimizes scientists’ data handling to allow them to maximizeresearch time

• Reduces “reinventing the wheel”

Reasons for using ADaM forScientific Data Analysis

Page 24: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

Mission/Project/Field CampaignCoordination

Electronic Collaboration

Page 25: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

Strategic and Tactical Coordination

• Data acquisitionand integrationfrom multipleplatforms,instruments andagencies for quickexploitation

• Intra-projectcommunicationsbefore, during,and after CAMEXcampaigns

Technologies to coordinate complex projects

Page 26: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

CAMEX-4Coordination:

pre-flight

NASA Aircraft

NOAA Aircraft

USAF Aircraft

Radars

RDBMS

CoordinationClearinghouse

Experiment PI: Coordinates with allparticipants, posts plan of the day

Aircraft Crew: Perform aircraftmaintenance and report status.

Forecaster: Contacts local weather, forecastcenters, weather support web sites to prepareand post daily morning weather briefing

NASA managers may review status ofaircraft, instruments, flight plans atvarious times throughout the mission

Preflight mission briefing andflight planning

Scalable, reliable data management

Web-based interface with customizedinformation access for different user groups;rapid development, scalability and portability

Page 27: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

CAMEX-4Coordination:

in flight

NASA Aircraft

NOAA Aircraft

USAF Aircraft

RDBMS

CoordinationClearinghouse

Download latest satelliteimagery to web

Radars

Instrument scientists:Collect, process, and storedata on board aircraft

Forecaster: Continue monitoring satelliteimagery, radar data, and landing forecasts

CommunicationsSatellite

Transmit selected data toNational Hurricane Centerfor inclusion in computerforecast models

Experiment PI: Modify flight planas needed in response to changingweather events

Page 28: ESML, Subsetting, Mining Tools - NASA...ESML, Subsetting, Mining Tools Sara Graves Rahul Ramachandran Information Technology and Systems Center (ITSC) University of Alabama in Huntsville

CAMEX-4Coordination:

post-flight

NASA Aircraft

NOAA Aircraft

USAF Aircraft

Radars

RDBMS

CoordinationClearinghouse

Mission and Instrumentscientists: Post sortie andinstrument reports andquicklook data

Forecaster: Prepare post-flight weather briefing

Aircraft Crew: Prepare aircraftand instruments for next flight andupdate status.