nist big data | nist

14
N A S A From Data Curation to Data A l ti A Bi Dt Science Mission Directorate N A S A Analytics: A Big Data Experiment in NASA Earth Science Tsengdar J. Lee, Ph.D. Weather Data Analysis Program Manager High-End Computing Manager

Upload: others

Post on 24-Apr-2022

39 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: NIST Big Data | NIST

NASAFrom Data Curation to Data A l ti A Bi D t

Science Mission

Directorate

NASAAnalytics: A Big Data Experiment in NASA Earth Science

Tsengdar J. Lee, Ph.D.Weather Data Analysis Program ManagerHigh-End Computing Manager

Page 2: NIST Big Data | NIST

Turning Observations into Knowledge Products

2

Page 3: NIST Big Data | NIST

Data Acquisition to Data Access

Data Acquisition

Tracking &

Distribution, Access, Interoperability &

Reuse

Flight Operations, Data Capture, Initial

Processing & Backup Archive

Data Transport to DAACs

Science Data Processing, Data

Mgmt., Data Archive & Distribution

SpacecraftData Relay

Satellite (TDRS) Research

Education

Value-AddedProviders

Interagency

NASA Integrated Services

Data Processing & Mission

Control

EOSDIS ScienceData Systems

(DAACs)

GroundStations

g yData Centers

InternationalPartners

Network (NISN)

Mission Services

Control WWWIP

InternetROSES NRA

Polar Gro nd Stations

Use in EarthSystem Models

BenchmarkingDSS

ScienceTeams Measurement

T

3

Polar Ground Stations

TECHNOLOGY

Teams

Page 4: NIST Big Data | NIST

Where Can We Research/Play with Big Data Problems?

• Ian Foster talked about data processing workflow all the way to data distribution but why stop there?

• Michael Stonebraker talked about moving compute to d t b t th i t d h ll (I ’tdata but there is a tremendous challenge. (I can’t even do it at one NASA center).

Page 5: NIST Big Data | NIST

Funding ($M)

TOPS1999

NEX Precursors and Development HistoryTOPS

IDU (AERO)(Pl i d S h d li SOA)

1999

2000-02

0.3

1.5(Planning and Scheduling, SOA)

REASoN(Automated data acquisition, first external user = NPS, NEED: Web Interface, Data summarization) 2003-07

1.5

4.5

Inv

R & A(NEED: Flexible analysis

capabilities)

Applied Sci(NOAA, EPA, 20+ partners)

ACCESS(OGC, Web-based maps)2008-09 7.1

vestment

AIST(Anomaly Detection and Analysis Framework)

ARRA/vTOPS/Water (Social Network, Big Data, Technology

prototyping)2009-11 15.5

ts

NASA Earth ExchangeNEX

(Community, Supercomputing,Collaboration)

NASA Earth ExchangeNEX

(Community, Supercomputing,Collaboration)

AIST (1)(Workflow Capabilities)

ACCESS (1)

R & A (11)CMS, NCA, LCLUC,CC,TE, IDS

A li d S i (9)

2012-152.6 (FY12)

))ACCESS (1)(Data and tool management)

Applied Sci. (9)(ECOFOR, WATER)HECC (2)

(NEX Infrastructure)

Page 6: NIST Big Data | NIST

Access to community/knowledge (240 members) Access to ready-to-use data (24 products, 350TB)

NEX: the Big Data experiment and work completed to date

Access to models/analysis toolsClimate, weather, carbon, hydrology

Access to workflows to build upon

(VisTrails)

Ready for sophisticated users, needs proper integration for the rest…..

Page 7: NIST Big Data | NIST

NEX SCIENCE USERSEARTH SCIENCECOMMUNITY USERS NEX HPC USERS• Access to Portal

• Access to compute resources• Access to NAS

supercomputing resources

Portal Sandbox NAS HPC

supercomputing resources

• Workflow Management

Point of entry to NEX collaborative environment

• Project Information• Collaboration and Social

Networking

Virtualized NEXcompute environment

Environment forComputing at scale

• Execution of Jobs migrated from SandboxSt f lt i NEX D t

Domain Platform

Infrastructure Platform

g• Provenance• Rich semantic search• Data/Model/Tool access (API)

Networking • Document Publication• Resource Requests• Data Discovery

• Storage of results in NEX Data Management environment

Infrastructure Platform• Virtualization Support• Model and Analytic Tool

Execution

Runs on NAS web servers

Runs on NAS supercomputers and storage

C t

Data Management

• Data acquisition and pre-processing

• Data storageComponent Architecture

• Data storage

Runs on dedicated NEX servers and storage

Page 8: NIST Big Data | NIST

NEX Portal

Search capabilities

Who is doing what whereScience network through

abstracts and papersWho's the expertWorkflowsArchived seminars

Reporting capabilities

Annual reportsHi hli htHighlightsPublicationsSpatial distribution of funding

Page 9: NIST Big Data | NIST

Virtual Institute

• Following the lead from Astrobiology and NASA Lunar Science Institutes to create a virtual institute, and ffoffer• Summer short courses• Seminars• Conferences• Presentations

Have each funded project• Have each funded project do one or two seminars that can be archived

Page 10: NIST Big Data | NIST

Benefits

Science 2007 GRL 2010

Would have taken less than a week in NEX Instead of oneweek in NEX, Instead of one year it took to repeat the analysis

Efficient use of resourcesLowering barriers to entryInterdisciplinary workTransparency, repeatability, extensibilityextensibilityCost reduction

TravelHardwareIT personnelData acquisitionNetwork costs

Page 11: NIST Big Data | NIST

30 years of change analysis (CC)Global Land Survey from Landsat (LCLUC)

Benefits (Unprecedented Opportunities)

Global monthly Landsat (WELD, MEASURES)Biophysical products from Landsat (CMS)

AGB, t/ha

Crop water management with LandsatWorldView-2, 50cm, 8 bands (NGA), , ( )

11

Page 12: NIST Big Data | NIST

Big Data Metric?

Space-time-resolution metric(higher spatial resolution, larger extent,longer time-series studies)

Page 13: NIST Big Data | NIST

NEX Experiments Demonstrate Value of Collaborative Environment

• In a first application of NEX, a research team from around the U.S. used the

i t t dj i denvironment to adjoin and atmospherically correct a mosaic of 9,000 LandsatThematic Mapper scenes and retrieve global gvegetation density at a 30-meter resolution.

• The entire processing of the nearly 340 billion pixels in the composite took just athe composite took just a few hours on the Pleiades supercomputer, allowing the team to experiment with new algorithms and

• NEX’s collaboration and knowledge-sharing platform for the Earth science community combines supercomputing, Earth system modeling, workflow management and NASA remote sensing data feeds toproducts within just a few

days.

POC: Petr Votava petr votava@nasa gov (650) 604-4675;

management, and NASA remote sensing data feeds to deliver a complete work environment for users to explore/analyze large datasets, run modeling codes, collaborate, and share results.

POC: Petr Votava, [email protected], (650) 604-4675; Ramakrishna Nemani, [email protected], (650) 604-6185, NASA Ames Research Center

High End Computing Capability Project

Page 14: NIST Big Data | NIST

Future Work

• Data ManagementHow to load and offload data in a multioHow to load and offload data in a multi-tier storage environment?

oDistributed Multi-site Analytics?• Workflow Reuse

oWhy create your own if a similaranalytics was done before?analytics was done before?

oUse workflow to capture the data provenance?

• Workflow DiscoveryoWhy not mine the literature to uncover

the workflow?the workflow?