
Page 1: Power Point (9.7MBy) - Advanced Computing for Science

Computing and Data Grids for Science and Engineering

www.ipg.nasa.gov
doesciencegrid.org

William E. Johnston
http://www-itg.lbl.gov/~wej/

Computational Research Division, DOE Lawrence Berkeley National Laboratory

and

NASA Advanced Supercomputing (NAS) Division, NASA Ames Research Center

6/24/03

Page 2: Power Point (9.7MBy) - Advanced Computing for Science

2

The Process of Large-Scale Science is Changing

• Large-scale science and engineering problems require collaborative use of many compute, data, and instrument resources, all of which must be integrated with application components and data sets that are
– developed by independent teams of researchers
– or are obtained from multiple instruments
– at different geographic locations

The evolution to this circumstance is what has driven my interest in high-speed distributed computing – now Grids – for 15 years.

E.g., see [1], and below.

Page 3: Power Point (9.7MBy) - Advanced Computing for Science

Complex Infrastructure is Needed for Supernova Cosmology

Page 4: Power Point (9.7MBy) - Advanced Computing for Science

[Figure: Terrestrial Biogeoscience diagram – interacting processes spanning minutes-to-hours, days-to-weeks, and years-to-centuries timescales: Climate (temperature, precipitation, radiation, humidity, wind), Chemistry (CO2, CH4, N2O, ozone, aerosols), Carbon Assimilation (CO2, CH4, N2O, VOCs, dust, heat, moisture, momentum), Microclimate and Canopy Physiology, Ecosystems (species composition, ecosystem structure, nutrient availability, water), Disturbance (fires, hurricanes, ice storms, windthrows), the Hydrologic Cycle (evaporation, transpiration, snow melt, infiltration, runoff), Watersheds (surface water, subsurface water, geomorphology), Biogeophysics (energy, water, aerodynamics), Biogeochemistry (mineralization, decomposition, gross primary production, plant respiration, microbial respiration, nutrient availability), Hydrology (soil water, snow, intercepted water), Phenology (bud break, leaf senescence), and Vegetation Dynamics]

The Complexity of a “Complete” Approach to Climate Modeling – Terrestrial Biogeoscience Involves Many Interacting Processes and Data

(Courtesy Gordon Bonan, NCAR: Ecological Climatology: Concepts and Applications. Cambridge University Press, Cambridge, 2002.)

Page 5: Power Point (9.7MBy) - Advanced Computing for Science

5

Cyberinfrastructure for Science

• Such complex and data-intensive scenarios require sophisticated, integrated, and high-performance infrastructure to provide the
– resource sharing and distributed data management,
– collaboration, and
– application frameworks
that are needed to successfully manage and carry out the many operations needed to accomplish the science

• This infrastructure involves
– high-speed networks and services
– very high-speed computers and large-scale storage
– highly capable middleware, including support for distributed data management and collaboration

Page 6: Power Point (9.7MBy) - Advanced Computing for Science

6

The Potential Impact of Grids

• A set of high-impact science applications in the areas of
– high energy physics
– climate
– chemical sciences
– magnetic fusion energy
have been analyzed [2] to characterize their visions for the future process of science – how science must be done in the future in order to make significant progress

Page 7: Power Point (9.7MBy) - Advanced Computing for Science

7

The Potential Impact of Grids

• These case studies indicate that there is a great deal of commonality in the infrastructure that is required in every case to support those visions – including a common set of Grid middleware

• Further, Grids are maturing to the point where they are providing useful infrastructure for solving the computing, collaboration, and data problems of science application communities (e.g. as illustrated by the case studies, below)

Page 8: Power Point (9.7MBy) - Advanced Computing for Science

8

Grids: Highly Capable Middleware

• Core Grid services / Open Grid Services Infrastructure
– Provide the consistent, secure, and uniform foundation for managing dynamic and administratively heterogeneous pools of compute, data, and instrument resources

• Higher level services / Open Grid Services Architecture
– Provide value-added, complex, and aggregated services to users and application frameworks
• E.g. information management – Grid Data services that will provide a consistent and versatile view of data – real and virtual – of all descriptions

Page 9: Power Point (9.7MBy) - Advanced Computing for Science

[Figure: DOE Science Grid – Grid-managed compute, storage, and instrument resources (NERSC supercomputing and large-scale storage, PNNL, LBNL, ANL, ORNL, a supernova observatory, a synchrotron light source, and other scientific instruments) connected by ESnet to Europe and Asia-Pacific, with an ESnet X.509 Certificate Authority.]

Applications (Simulations, Data Analysis, etc.)

User Interfaces

Application Frameworks (e.g. XCAT, SciRun) and Portal Toolkits (e.g. XPortlets)

Higher-level Services / OGSA (Data Grid Services, Workflow management, Visualization, Data Publication/Subscription, Brokering, Job Mg’mt, Fault Mg’mt, Grid System Admin., etc.)

Core Grid Services / OGSI: Uniform access to distributed resources – systems management and access, communication services, authentication, authorization, security services, Grid Information Service, uniform computing access, Unix and OGSI hosting, global event services, auditing, monitoring, co-scheduling, and uniform data access

Funded by the U.S. Dept. of Energy, Office of Science, Office of Advanced Scientific Computing Research, Mathematical, Information, and Computational Sciences Division

Page 10: Power Point (9.7MBy) - Advanced Computing for Science

10

Grids: Highly Capable Middleware

Also …

• Knowledge management
– Services for unifying, classifying, and “reasoning about” services, data, and information in the context of a human-centric problem solving environment – the Semantic Grid
– Critical for building problem solving environments that
• let users ask “what if” questions
• ease the construction of multidisciplinary systems
by providing capabilities so that the user does not have to be an expert in all of the disciplines to build a multidisciplinary system

Page 11: Power Point (9.7MBy) - Advanced Computing for Science

11

Grid Middleware

• Grids are also
– A worldwide collection of researchers and developers
– Several hundred people from the US, Europe, and SE Asia working on best practice and standards at the Global Grid Forum (www.gridforum.org)
– A major industry effort to combine Grid Services and Web Services (IBM, HP, Microsoft) (e.g. see [3])
– Vendor support from dozens of IT companies

Page 12: Power Point (9.7MBy) - Advanced Computing for Science

12

Web Services and Grids

• Web services provide for
– Describing services (programs) with sufficient information that they can be discovered and combined to make new applications (reusable components)
– Assembling groups of discovered services into useful problem solving systems
– Easy integration with scientific databases that use XML-based metadata
(a minimal sketch of invoking such a service follows this list)
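
To make the reusable-component idea concrete, here is a minimal, hypothetical sketch of how a client might invoke a SOAP-described service and parse its XML reply using only the Python standard library. The endpoint, namespace, and operation name ("getSurveyImage") are invented for illustration and are not from the talk.

```python
# Hypothetical sketch: calling a WSDL/SOAP-style service over HTTP and
# reading its XML reply with standard-library tools only.
import urllib.request
import xml.etree.ElementTree as ET

ENDPOINT = "http://example.org/grid/ImageService"  # illustrative service URL

soap_request = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getSurveyImage xmlns="http://example.org/grid/types">
      <ra>180.0</ra><dec>2.5</dec><size>0.25</size>
    </getSurveyImage>
  </soap:Body>
</soap:Envelope>"""

req = urllib.request.Request(
    ENDPOINT,
    data=soap_request.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8",
             "SOAPAction": "getSurveyImage"},
)
with urllib.request.urlopen(req) as resp:
    body = resp.read()

# The XML reply is parsed with the same standard tools, which is what makes
# integration with XML-based scientific metadata straightforward.
tree = ET.fromstring(body)
print(tree.findall(".//{http://example.org/grid/types}imageURL"))
```

Because both the request and the reply are plain XML, a service described this way can be discovered, combined with others, and reused without custom client libraries.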

Page 13: Power Point (9.7MBy) - Advanced Computing for Science

13

Web Services and Grids

• So …
– Web Services provide for defining, accessing, and managing services
while
– Grids provide for accessing and managing dynamically constructed, distributed compute and data systems, and provide support for collaborations / Virtual Organizations

Page 14: Power Point (9.7MBy) - Advanced Computing for Science

14

Combining Web Services and Grids

• Combining Grid and Web services will provide a dynamic and powerful computing and data system that is rich in descriptions, services, data, and computing capabilities

• This infrastructure will give us the basic tools to deal with complex, multi-disciplinary, data-rich science models by providing
– a way to define the interfaces and data in a standard way
– the infrastructure to interconnect those interfaces in a distributed computing environment

Page 15: Power Point (9.7MBy) - Advanced Computing for Science

15

Combining Web Services and Grids

• This ability to utilize distributed services is important in science because highly specialized code and data is maintained by specialized research groups in their own environments, and it is neither practical nor desirable to bring all of these together on a single system

• The Terrestrial Biogeoscience climate system is an example where all of the components will probably never run on the same system – there will be many sub-models and associated data that are built and maintained in specialized environments

Page 16: Power Point (9.7MBy) - Advanced Computing for Science

[Figure (repeated from Page 4): Terrestrial Biogeoscience diagram – interacting processes spanning minutes-to-hours, days-to-weeks, and years-to-centuries timescales, including Climate, Chemistry, Carbon Assimilation, Microclimate and Canopy Physiology, Ecosystems, Disturbance, the Hydrologic Cycle, Watersheds, Biogeophysics, Biogeochemistry, Hydrology, Phenology, and Vegetation Dynamics]

Terrestrial Biogeoscience – A “Complete” Approach to Climate Modeling – Involves Many Complex, Interacting Processes and Data

(Courtesy Gordon Bonan, NCAR: Ecological Climatology: Concepts and Applications. Cambridge University Press, Cambridge, 2002.)

Page 17: Power Point (9.7MBy) - Advanced Computing for Science

17

Combining Web Services and Grids

• The complexity of the modeling done in Terrestrial Biogeoscience is a touchstone for this stage of evolution of Grids and Web Services – this is one of the problems to solve in order to provide a significant increase in capabilities for science

• Integrating Grids and Web Services is a major thrust at GGF – e.g. in the OGSI and Open Grid Services Architecture Working Groups. Also see http://www.globus.org/ogsa/

Page 18: Power Point (9.7MBy) - Advanced Computing for Science

18

The State of Grids

• Persistent infrastructure is being built – this is happening, e.g., in
– DOE Science Grid
– NASA’s IPG
– International Earth Observing Satellite Committee (CEOS)
– EU Data Grid
– UK eScience Grid
– NSF TeraGrid
– NEESGrid (National Earthquake Engineering Simulation Grid)

all of which are focused on large-scale science and engineering

Page 19: Power Point (9.7MBy) - Advanced Computing for Science

19

The State of Grids – Some Case Studies

• Further, Grids are becoming a critical element of many projects – e.g.
– The High Energy Physics problem of managing and analyzing petabytes of data per year has driven the development of Grid Data Services
– The National Earthquake Engineering Simulation Grid has developed a highly application-oriented approach to using Grids
– The Astronomy data federation problem has promoted work in Web Services based interfaces

Page 20: Power Point (9.7MBy) - Advanced Computing for Science

20

High Energy Physics Data Management

• Petabytes of data per year must be distributed to hundreds of sites around the world for analysis

• This involves
– Reliable, wide-area, high-volume data management
– Global naming, replication, and caching of datasets (a replica-resolution sketch follows this list)
– Easily accessible pools of computing resources

• Grids have been adopted as the infrastructure for this HEP data problem
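
As a purely illustrative example of the global-naming and replication idea, here is a minimal sketch of resolving a logical dataset name to a physical replica. The catalogue contents, site names, and file names are invented; real HEP replica catalogues are far richer.

```python
# Hypothetical replica catalogue: a logical file name (LFN) maps to several
# physical copies (PFNs); the client picks a replica by site preference.
REPLICA_CATALOGUE = {
    "lfn://cms/run2003/events-000123.root": [
        ("cern.ch",  "gsiftp://castor.cern.ch/cms/run2003/events-000123.root"),
        ("fnal.gov", "gsiftp://dcache.fnal.gov/cms/run2003/events-000123.root"),
        ("in2p3.fr", "gsiftp://hpss.in2p3.fr/cms/run2003/events-000123.root"),
    ],
}

def resolve(lfn, preferred_sites):
    """Return the physical replica at the most-preferred site holding a copy."""
    replicas = dict(REPLICA_CATALOGUE.get(lfn, []))
    for site in preferred_sites:
        if site in replicas:
            return replicas[site]
    raise LookupError(f"no replica of {lfn} at preferred sites")

# A US analysis job would prefer its regional centre, then fall back to CERN:
print(resolve("lfn://cms/run2003/events-000123.root", ["fnal.gov", "cern.ch"]))
```

Caching and wide-area transfer then operate on the physical replicas, while analyses refer only to the stable logical names.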

Page 21: Power Point (9.7MBy) - Advanced Computing for Science

[Figure: CERN/LHC CMS tiered data distribution – the Online System (~PByte/sec from the detector) feeds event reconstruction at ~100 MBytes/sec into Tier 0+1 at CERN; Tier 1 regional centers connect at ~0.6-2.5 Gbps; Tier 2 centers connect at ~0.6-2.5 Gbps; Tier 3 institutes with physics data caches (~0.25 TIPS) connect at 100-1000 Mbits/sec; Tier 4 workstations serve individual physicists.]

CERN/CMS data goes to 6-8 Tier 1 regional centers, and from each of these to 6-10 Tier 2 centers.

Physicists work on analysis “channels” at 135 institutes. Each institute has ~10 physicists working on one or more channels.

2000 physicists in 31 countries are involved in this 20-year experiment in which DOE is a major player.

CERN LHC CMS detector

15m X 15m X 22m, 12,500 tons, $700M.

[Additional figure labels: analysis, event simulation, Italian Center, FermiLab USA Regional Center, ~2.5 Gbits/sec links, human = 2 m scale]

High Energy Physics Data Management – CERN / LHC Data: One of Science’s most challenging data management problems

(Courtesy Harvey Newman, CalTech)

Page 22: Power Point (9.7MBy) - Advanced Computing for Science

22

High Energy Physics Data Management

• Virtual data catalogues and on-demand data generation have turned out to be an essential aspect
– Some types of analysis are pre-defined and catalogued prior to generation – the data products are then generated on demand when the virtual data catalogue is accessed (see the sketch below)
– Sometimes regenerating derived data is faster and easier than trying to store and/or retrieve that data from remote repositories
– For similar reasons this is also of great interest to the EOS (Earth Observing Satellite) community
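
A minimal, hypothetical sketch of the virtual-data idea follows. The catalogue, transformation, and dataset names are invented; real systems of the period, such as the GriPhyN virtual data tools mentioned on the next slide, are far more elaborate.

```python
# Hypothetical virtual data catalogue: a derived dataset is registered with the
# transformation that produces it; data are materialized only when requested.
import os

def simulate_events(n_events, output_path):
    # Stand-in for a real event-simulation transformation.
    with open(output_path, "w") as f:
        f.write(f"{n_events} simulated events\n")

VIRTUAL_DATA = {
    # derived dataset name -> (transformation, arguments)
    "cms-sim-1.5M.dat": (simulate_events, (1_500_000,)),
}

def get_dataset(name, cache_dir="/tmp/virtual-data"):
    """Return a path to the dataset, regenerating it on demand if not cached."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, name)
    if not os.path.exists(path):                 # not materialized yet
        transform, args = VIRTUAL_DATA[name]
        transform(*args, output_path=path)       # regenerate on demand
    return path

print(get_dataset("cms-sim-1.5M.dat"))
```

The key property is that the catalogue records how to produce a dataset, so whether to store it or regenerate it becomes a policy decision rather than a fixed choice.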

Page 23: Power Point (9.7MBy) - Advanced Computing for Science

US-CMS/LHC Grid Data Services Testbed: International Virtual Data Grid Laboratory

[Figure: virtual data architecture – interactive user tools issue data generation requests to virtual data tools and to planning & scheduling tools; execution & management tools apply transforms to raw data sources and to distributed resources (code, storage, CPUs, networks); metadata catalogues and virtual data catalogues hold the metadata description of analyzed data; the whole is built on resource management, security and policy, and other Core Grid Services.]

Page 24: Power Point (9.7MBy) - Advanced Computing for Science

CMS Event Simulation Production using GriPhyN Data Grid Services

• Production Run on the Integration Testbed (400 CPUs at 5 sites)
– Simulate 1.5 million full CMS events for physics studies
– 2 months continuous running across 5 testbed sites
– Managed by a single person at the US-CMS Tier 1 site

• Nearly 30 CPU-years delivered 1.5 million events to CMS physicists

Page 25: Power Point (9.7MBy) - Advanced Computing for Science

25

Partnerships with the Japanese Science Community

• Comments of Paul Avery [email protected], director, iVDGL
– iVDGL is specifically interested in partnering with the Japanese HEP community, and hopefully the National Research Grid Initiative will open doors for collaboration
– Science drivers are critical – existing international HEP collaborations in Japan provide natural drivers
– Different Japanese groups could participate in existing or developing Grid application-oriented testbeds, such as the ones developed in iVDGL for the different HEP experiments
• These testbeds have been very important for debugging Grid software while serving as training grounds for existing participants and new groups, both at universities and national labs.
– Participation in and development of ultra-high-speed networking projects provides collaborative opportunities in a crucial related area. There are a number of new initiatives that are relevant.

• Contact Harvey B Newman <[email protected]> for a fuller description and resource materials.

Page 26: Power Point (9.7MBy) - Advanced Computing for Science

26

National Earthquake Engineering Simulation Grid

• NEESgrid will link earthquake researchers across the U.S. with leading-edge computing resources and research equipment, allowing collaborative teams (including remote participants) to plan, perform, and publish their experiments

• Through the NEESgrid, researchers will
– perform tele-observation and tele-operation of experiments – shake tables, reaction walls, etc.;
– publish to, and make use of, a curated data repository using standardized markup;
– access computational resources and open-source analytical tools;
– access collaborative tools for experiment planning, execution, analysis, and publication

Page 27: Power Point (9.7MBy) - Advanced Computing for Science

27

NEES Sites

• Shake Table Research Equipment– University at Buffalo, State

University of New York – University of Nevada, Reno – *University of California, San

Diego

• Centrifuge Research Equipment– *University of California, Davis – Rensselaer Polytechnic

Institute

• Tsunami Wave Basin– *Oregon State University,

Corvallis, Oregon

• Large-Scale Lifeline Testing– Cornell University

• Large-Scale Laboratory Experimentation Systems– University at Buffalo, State

University of New York – *University of California at

Berkeley– *University of Colorado,

Boulder – University of Minnesota-Twin

Cities – Lehigh University – University of Illinois, Urbana-

Champaign

• Field Experimentation and Monitoring Installations– *University of California, Los

Angeles– *University of Texas at Austin – Brigham Young University

Page 28: Power Point (9.7MBy) - Advanced Computing for Science

Field Equipment

Laboratory Equipment

Remote Users

Remote Users: (K-12 Faculty and Students)

High-Performance Network(s)

Instrumented Structures and Sites

Large-scale Computation

Curated Data Repository

Laboratory Equipment

Global Connections

Simulation Tools Repository

NEESgrid Earthquake Engineering Collaboratory

Page 29: Power Point (9.7MBy) - Advanced Computing for Science

29

NEESgrid Approach

• Package a set of application level services and the supporting Grid software in a single“point of presence” (POP)

• Deploy the POP to a select set of earthquake engineering sites to provide the applications, data archiving, and Grid services

Assist in developing common metadata so that the various instruments and simulations can work together

• Provide the required computing and data storage infrastructure

Page 30: Power Point (9.7MBy) - Advanced Computing for Science

30

NEESgrid Multi-Site Online Simulation (MOST)

• A partnership between the NEESgrid team, UIUC and Colorado Equipment Sites to showcase NEESgrid capabilities

• A large-scale experiment conducted in multiple geographical locations that combines physical experiments with numerical simulation in an interchangeable manner (a minimal coordination sketch follows this slide)

• The first integration of NEESgrid services with application software developed by Earthquake Engineers (UIUC, Colorado and USC) to support a real EE experiment

• See http://www.neesgrid.org/most/
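
To illustrate the interchangeability of physical and numerical substructures described above, here is a highly simplified, hypothetical sketch of the kind of step-by-step coordination loop such a multi-site test implies. The class names and the single-degree-of-freedom arithmetic are invented for illustration; the real MOST coordinator is far more involved.

```python
# Hypothetical sketch: a coordinator drives physical and simulated substructures
# through the same interface, so they are interchangeable at each time step.
class NumericalSubstructure:
    def __init__(self, stiffness):
        self.stiffness = stiffness
    def restoring_force(self, displacement):
        # linear-elastic stand-in for a finite-element model
        return -self.stiffness * displacement

class PhysicalSubstructure:
    """Stand-in for a lab rig reached over the network (e.g. via a NEESpop)."""
    def __init__(self, stiffness):
        self.stiffness = stiffness
    def restoring_force(self, displacement):
        # in reality: command the actuator, wait, read back the load cell
        return -self.stiffness * displacement

def run_test(substructures, mass=1000.0, dt=0.01, steps=5, d0=0.01):
    d, v = d0, 0.0
    for _ in range(steps):
        f = sum(s.restoring_force(d) for s in substructures)  # gather forces
        a = f / mass                                           # integrate motion
        v += a * dt
        d += v * dt
        print(f"d={d:.5f} m, total force={f:.1f} N")

run_test([PhysicalSubstructure(2.0e5), NumericalSubstructure(1.5e5)])
```

The point of the sketch is the shared interface: the coordinator never needs to know whether a restoring force came from a laboratory specimen or from a simulation.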

Page 31: Power Point (9.7MBy) - Advanced Computing for Science

31

NEESgrid Multi-Site Online Simulation (MOST)

[Figure: U. Colorado experimental setup and UIUC experimental setup]

Page 32: Power Point (9.7MBy) - Advanced Computing for Science

32

Multi-Site, On-Line Simulation Test (MOST)

[Figure: MOST configuration – a Colorado experimental model and a UIUC experimental model, each driven through a NEESpop, coupled to an NCSA computational model by the Simulation Coordinator; ground excitation and member forces and displacements (e.g. f1, f2, F1, F2, m1, θ1, x1) are exchanged among the three substructures.]

UIUC MOST-SIM: Dan Abrams, Amr Elnashai, Dan Kuchma, Bill Spencer, and others
Colorado FHT: Benson Shing and others

Page 33: Power Point (9.7MBy) - Advanced Computing for Science

1994 Northridge Earthquake Simulation Requires a Complex Mix of Data and Models

[Figure: structural model showing Piers #5, #6, #7, and #8 – Amr Elnashai, UIUC]

NEESgrid provides the common data formats, uniform data archive interfaces, and computational services needed to support this multidisciplinary simulation

Page 34: Power Point (9.7MBy) - Advanced Computing for Science

NEESgrid Architecture

[Figure: NEES distributed resources – laboratory equipment, instrumented structures and sites, large-scale computation, the curated data repository, and the simulation tools repository – connect through a NEESpop that packages the data acquisition system, large-scale storage, video services, e-notebook services, GridFTP, metadata services, the CompreHensive collaborativE Framework (CHEF), the NEESgrid streaming data system, the Simulation Coordinator, accounts & MyProxy, and NEESgrid monitoring, all layered on Grid Services; users reach the system through Java applet and Web browser interfaces; NEES Operations supports experiments, multidisciplinary simulations, and collaborations.]

Page 35: Power Point (9.7MBy) - Advanced Computing for Science

35

Partnerships with the Japanese Science Community

• Comments of Daniel Abrams <[email protected]>, Professor of Civil Engineering, University of Illinois, and NEESGrid project manager
– The Japanese earthquake research community has expressed interest in NEESgrid
– I am aware of some developmental efforts between one professor and another to explore the feasibility of on-line pseudodynamic testing – Professor M. Watanabe at the University of Kyoto is running a test in his lab which is linked with another test running at KAIST (in Korea) with Professor Choi. They are relying on the internet for transmission of signals between their labs.
– International collaboration with the new shaking table at Miki is being encouraged, and thus they are interested in plugging in to an international network. There is interest in NEESgrid in installing a NEESpop there so that its utility could be evaluated and connections made with the NEESGrid sites.
– We already have some connection to the Japanese earthquake center known as the Earthquake Disaster Mitigation Center. We have an MOU with EDM and the Mid-America Earthquake Center in place. I am working with their director, Hiro Kameda, and looking into establishing a NEESGrid relationship.

Page 36: Power Point (9.7MBy) - Advanced Computing for Science

36

The Changing Face of Observational Astronomy

• Large digital sky surveys are becoming the dominant source of data in astronomy: > 100 TB, growing rapidly
– Current examples: SDSS, 2MASS, DPOSS, GSC, FIRST, NVSS, RASS, IRAS; CMBR experiments; microlensing experiments; NEAT, LONEOS, and other searches for Solar system objects …
– Digital libraries: ADS, astro-ph, NED, CDS, NSSDC
– Observatory archives: HST, CXO, space and ground-based
– Future: QUEST2, LSST, and other synoptic surveys; GALEX, SIRTF, astrometric missions, GW detectors

• Data sets orders of magnitude larger, more complex, and more homogeneous than in the past

Page 37: Power Point (9.7MBy) - Advanced Computing for Science

37

The Changing Face of Observational Astronomy

• Virtual Observatory: federation of N archives
– Possibilities for new discoveries grow as O(N²) (each pair of archives can be cross-correlated, so the number of pairwise combinations grows as N(N−1)/2)

• Current sky surveys have proven this
– Very early discoveries from Sloan (SDSS), 2-micron (2MASS), and Digital Palomar (DPOSS)

• see http://www.us-vo.org

• see http://www.us-vo.org

Page 38: Power Point (9.7MBy) - Advanced Computing for Science

38

Sky Survey Federation

Page 39: Power Point (9.7MBy) - Advanced Computing for Science

Mining Data from Dozens of Instruments / Surveys is Frequently a Critical Aspect of Doing Science

• The ability to federate survey data is enormously important

• Studying the Cosmic Microwave Background – a key tool in studying the cosmology of the universe – requires combined observations from many instruments in order to isolate the extremely weak signals of the CMB

• The datasets that represent the material “between” us and the CMB are collected from different instruments and are stored and curated at many different institutions

• This is immensely difficult without approaches like the National Virtual Observatory, which provides a uniform interface for all of the different data formats and locations

(Julian Borrill, NERSC, LBNL)

Page 40: Power Point (9.7MBy) - Advanced Computing for Science

40

NVO Approach

• Focus is on adapting emerging information technologies to meet the astronomy research challenges
– Metadata, standards, protocols (XML, http)
– Interoperability
– Database federation
– Web Services (SOAP, WSDL, UDDI)
– Grid-based computing (OGSA)

• Federating databases is difficult, but very valuable
– VOTable, an XML-based mark-up for astronomical tables and catalogs
– Developed a metadata management framework
– Formed international “registry”, “dm” (data models), “semantics”, and “dal” (data access layer) discussion groups

• As with NEESgrid, Grids are helping to unify the community

Page 41: Power Point (9.7MBy) - Advanced Computing for Science

41

NVO Image Mosaicking

• Specify box by position and size
• SIAP server returns relevant images, each with
– Footprint
– Logical Name
– URL

• Can choose a standard URL (http://.......) or an SRB URL (srb://nvo.npaci.edu/…..)

(a minimal sketch of such a query follows)
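
The following is a minimal, hypothetical sketch of what a Simple Image Access (SIAP-style) query could look like from a client's point of view. The service URL is invented and the VOTable parsing is deliberately naive (it assumes the access URL is the last column); only the Python standard library is used. A real client would follow the published SIAP specification and the full VOTable schema.

```python
# Hypothetical sketch: ask a SIAP-style service for images covering a sky box,
# then pull image URLs out of the VOTable (XML) response.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

SIAP_SERVICE = "http://example.org/nvo/siap"   # illustrative endpoint

def query_images(ra, dec, size_deg):
    params = urllib.parse.urlencode({"POS": f"{ra},{dec}", "SIZE": size_deg})
    with urllib.request.urlopen(f"{SIAP_SERVICE}?{params}") as resp:
        votable = ET.fromstring(resp.read())
    # Each table row (<TR>) describes one image: footprint, logical name, URL.
    urls = []
    for row in votable.iter("TR"):
        cells = [td.text for td in row.iter("TD")]
        urls.append(cells[-1])   # assume the access URL is the last column
    return urls

for url in query_images(ra=180.0, dec=2.5, size_deg=0.25):
    print(url)   # could be a standard http:// URL or an srb:// URL
```

Because the reply is a VOTable, the same parsing works regardless of which archive served the images – which is the point of the federation approach.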

Page 42: Power Point (9.7MBy) - Advanced Computing for Science

42

Atlasmaker Virtual Data System

[Figure: a user request goes to a request manager, which consults metadata repositories federated by OAI; if the mosaicked data is on file it is returned directly; if not (2a), the raw data is fetched from NVO resources in data repositories federated by SRB (2b), the mosaic is computed on TeraGrid/IPG compute resources (2c), and the result is stored and returned (2d); the system is layered on higher-level Grid services and Core Grid Services.]

Page 43: Power Point (9.7MBy) - Advanced Computing for Science

43

Background Correction

[Figure: uncorrected vs. corrected image]

Page 44: Power Point (9.7MBy) - Advanced Computing for Science

44

NVO Components

[Figure: component diagram – Simple Image Access services and Cone Search services in front of data archives; a cross-correlation engine and visualization on computing resources; resource/service registries; Web Services and Grid Services layers; with VOTable (including streaming) and UCDs as the exchanged formats.]

Page 45: Power Point (9.7MBy) - Advanced Computing for Science

45

International Virtual Observatory Collaborations

• Astrophysical Virtual Observatory (European Commission)

• AstroGrid, UK e-science program

• Canada

• VO India

• VO Japan (leading the work on VO query language)

• VO China

• German AVO

• Russian VO

• e-Astronomy Australia

• IVOA (International Virtual Observatory Alliance)

US contacts: Alex Szalay [email protected], Roy Williams [email protected], Bob Hanisch <[email protected]>

Page 46: Power Point (9.7MBy) - Advanced Computing for Science

Where to in the Future? The Potential of a Semantic Grid / Knowledge Grid: Combining Semantic Web Services and Grid Services

• Even when we have well-integrated Web+Grid services, we still do not provide enough structured information and tools to let us ask “what if” questions and then have the underlying system assemble the required components in a consistent way to answer such a question.

Page 47: Power Point (9.7MBy) - Advanced Computing for Science

47

Beyond Web Services and Grids

• A commercial example “what if” question:
– What does my itinerary look like if I wish to go from SFO to Paris CDG, and then to Bucharest?
– In Bucharest I want a 3- or 4-star hotel that is within 3 km of the Palace of the Parliament, and the hotel cost may not exceed the U.S. Dept. of State Foreign Per Diem Rates.

Page 48: Power Point (9.7MBy) - Advanced Computing for Science

48

Beyond Web Services and Grids

• To answer such a question – a relatively easy, but tedious, task for a human – the system must “understand” the relationships between maps and locations, and between per diem charts and published hotel rates, and it must be able to apply constraints (< 3 km, 3 or 4 stars, cost within the per diem rate, etc.); a toy version of the constraint-filtering step is sketched below

• This is the realm of “Semantic Grids”
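
Purely to make the constraint idea concrete, here is a toy sketch of the filtering a Semantic Grid service would have to automate, after it had already worked out from ontologies that hotel coordinates relate to map locations and that published rates relate to per diem tables. The hotel data, the per diem figure, and the distance approximation are all invented.

```python
# Toy sketch: apply the example constraints (distance < 3 km, 3-4 stars,
# price within per diem) to a hand-made list of candidate hotels.
import math

PALACE = (44.4275, 26.0875)       # approx. lat/lon of the Palace of the Parliament
PER_DIEM_LODGING = 150.0          # invented per diem figure (USD/night)

hotels = [                         # invented candidate data
    {"name": "Hotel A", "stars": 4, "rate": 140.0, "loc": (44.4300, 26.0900)},
    {"name": "Hotel B", "stars": 5, "rate": 210.0, "loc": (44.4310, 26.0860)},
    {"name": "Hotel C", "stars": 3, "rate": 95.0,  "loc": (44.4600, 26.1500)},
]

def distance_km(a, b):
    # rough equirectangular approximation; adequate over a few kilometres
    dlat = math.radians(b[0] - a[0])
    dlon = math.radians(b[1] - a[1]) * math.cos(math.radians(a[0]))
    return 6371.0 * math.hypot(dlat, dlon)

acceptable = [
    h for h in hotels
    if h["stars"] in (3, 4)
    and h["rate"] <= PER_DIEM_LODGING
    and distance_km(h["loc"], PALACE) < 3.0
]
print([h["name"] for h in acceptable])   # -> ['Hotel A']
```

The hard part, of course, is not this filter but discovering and relating the data sources automatically – which is exactly what the semantic technologies on the following slides aim to support.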

Page 49: Power Point (9.7MBy) - Advanced Computing for Science

49

Semantic Grids / Knowledge Grids

• Work is being adapted from the Artificial Intelligence community to provide [4]

– “Ontology languages” to extend metadata to represent relationships

– Language constructs to express rule-based / constraint relationships among, and generalizations of, the extended terms

Page 50: Power Point (9.7MBy) - Advanced Computing for Science

50

Future Cyberinfrastructure

Technology: Resource Description Framework (RDF) [7]
Expresses relationships among “resources” (URI(L)s) in the form of object-attribute-value (property). Values can themselves be other resources, so arbitrary relationships between multiple resources can be described. RDF uses XML for its syntax.
Impact: Can ask questions like “What are a particular property’s permitted values, which types of resources can it describe, and what is its relationship to other properties?”

Technology: Resource Description Framework Schema (RDFS) [7]
An extensible, object-oriented type system that effectively represents and defines classes. Class definitions can be derived from multiple superclasses, and property definitions can specify domain and range constraints.
Impact: Can now represent tree-structured information (e.g. taxonomies).

(a small, hypothetical RDF/RDFS example follows)
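
As a small illustration of what the RDF/RDFS machinery buys, here is a hypothetical sketch using rdflib, a present-day Python RDF library that is not mentioned in the talk; the instrument vocabulary is invented.

```python
# Hypothetical sketch: describe experimental facilities with RDF/RDFS triples,
# then ask a simple "which resources are of this type?" question.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/facilities#")   # invented vocabulary
g = Graph()

# Class hierarchy: a ShakeTable is a kind of ExperimentalFacility.
g.add((EX.ExperimentalFacility, RDF.type, RDFS.Class))
g.add((EX.ShakeTable, RDF.type, RDFS.Class))
g.add((EX.ShakeTable, RDFS.subClassOf, EX.ExperimentalFacility))

# Instance data: one particular facility and a human-readable label.
g.add((EX.uiucTable, RDF.type, EX.ShakeTable))
g.add((EX.uiucTable, RDFS.label, Literal("UIUC shake table")))

# "Which resources are shake tables?" – the kind of property/type question
# that RDF-style metadata makes answerable in a uniform way.
for subject in g.subjects(RDF.type, EX.ShakeTable):
    print(subject, g.value(subject, RDFS.label))
```

The triples themselves carry the relationships; the class hierarchy (RDFS) is what lets a tool later infer that the UIUC table is also an ExperimentalFacility.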

Page 51: Power Point (9.7MBy) - Advanced Computing for Science

51

Future Cyberinfrastructure

Technology: Ontology Inference Layer (OIL) [8]
OIL inherits all of RDFS and adds the ability to express class relationships using combinations of intersection (AND), union (OR), and complement (NOT). Supports concrete data types (integers, strings, etc.).
Impact: OIL can state conditions for a class that are both sufficient and necessary. This makes it possible to perform automatic classification: given a specific object, OIL can automatically decide to which classes the object belongs. This is functionality that should make it possible to ask the sort of constraint- and relationship-based questions illustrated above.

Technology: OWL (DAML+OIL) + …… [9]
Impact: Knowledge representation and manipulation that have well-defined semantics and representation of constraints and rules for reasoning.

Page 52: Power Point (9.7MBy) - Advanced Computing for Science

52

Semantic Grid Capabilities

• Based on these technologies, the emerging Semantic Grid [6] / Knowledge Grid [5] services will provide several important capabilities

1) The ability to answer “what if” questions by providing constraint languages that operate on ontologies that describe content and relationships of scientific data and operations, thus “automatically” structuring data and simulation / analysis components into Grid workflows whose composite actions produce the desired information

Page 53: Power Point (9.7MBy) - Advanced Computing for Science

53

Semantic Grid Capabilities

2) Tools, content description, and structural relationships so that when trying to assemble multi-disciplinary simulations, an expert in one area can correctly organize the other components of the simulation without having to involve experts in all of the ancillary sub-models (components)

Page 54: Power Point (9.7MBy) - Advanced Computing for Science

54

Future Cyberinfrastructure

• Much work remains to make this vision a reality

• The Global Grid Forum has recently established a Semantic Grid Research Group [10] to investigate and report on the path forward for combining Grids and Semantic Web technology.

Page 55: Power Point (9.7MBy) - Advanced Computing for Science

55

Thanks to Colleagues who Contributed Material to this Talk

• Dan Reed, Principal Investigator, NSF NEESgrid; Director, NCSA and the Alliance; Chief Architect, NSF ETF TeraGrid; Professor, University of Illinois - [email protected]

• Ian Foster, Argonne National Laboratory and University of Chicago, http://www.mcs.anl.gov/~foster

• Dr. Robert Hanisch, Space Telescope Science Institute, Baltimore, Maryland

• Roy Williams, Cal Tech; Dan Abrams, UIUC; Paul Avery, Univ. of Florida; Alex Szalay, Johns Hopkins U.; Tom Prudhomme, NCSA

Page 56: Power Point (9.7MBy) - Advanced Computing for Science

Grid Services: secure and uniform access and management for distributed resources

Science Portals: collaboration and problem solving

Web Services

Supercomputing and Large-Scale Storage

High Speed Networks

[Figure: science facilities and communities served – Spallation Neutron Source, High Energy Physics, Advanced Photon Source, Macromolecular Crystallography, Advanced Engine Design, Computing and Storage of Scientific Groups, Supernova Observatory, Advanced Chemistry]

Page 57: Power Point (9.7MBy) - Advanced Computing for Science

57

Notes

[1] “The Computing and Data Grid Approach: Infrastructure for Distributed Science Applications,” William E. Johnston. http://www.itg.lbl.gov/~johnston/Grids/homepage.html#CI2002

[2] “DOE Office of Science, High Performance Network Planning Workshop.” August 13-15, 2002, Reston, Virginia, USA. http://doecollaboratory.pnl.gov/meetings/hpnpw

[3] “Developing Grid Computing Applications,” in IBM developerWorks: Web services articles. http://www-106.ibm.com/developerworks/library/ws-grid2/?n-ws-1252

[4] See “The Semantic Web and its Languages,” an edited collection of articles in IEEE Intelligent Systems, Nov./Dec. 2000. D. Fensel, editor.

[5] For an introduction to the ideas of Knowledge Grids I am indebted to Mario Cannataro, Domenico Talia, and Paolo Trunfio (CNR, Italy). See www.isi.cs.cnr.it/kgrid/

[6] For an introduction to the ideas of Semantic Grids I am indebted to Dave De Roure (U. Southampton), Carole Goble (U. Manchester), and Geoff Fox (U. Indiana). See www.semanticgrid.org

[7] “The Resource Description Framework,” O. Lassila. Ibid.

[8] “FAQs on OIL: Ontology Inference Layer,” van Harmelen and Horrocks. Ibid.; and “OIL: An Ontology Infrastructure for the Semantic Web.” Ibid.

[9] “Semantic Web Services,” McIlraith, Son, and Zeng. Ibid.; and “Agents and the Semantic Web,” Hendler. Ibid.

[10] See http://www.semanticgrid.org/GGF. This GGF Research Group is co-chaired by David De Roure <[email protected]>, Carole Goble <[email protected]>, and Geoffrey Fox <[email protected]>