27 June 2005, National Virtual Observatory

The National Virtual Observatory: Publishing Astronomy Data

Robert J. Hanisch
US National Virtual Observatory
Space Telescope Science Institute
Baltimore, MD USA

Reagan Moore
San Diego Supercomputer Center

Topics

• Virtual Observatory (VO) description
• Discovery Services
• Data Management Services
• Interactions with the GGF
– Astrophysics Research Group

The Virtual Observatory

“The Virtual Observatory will provide a ‘virtual sky’ based on the enormous data sets being created now and the even larger ones proposed for the future. It will enable a new mode of research for professional astronomers and will provide to the public an unparalleled opportunity for education and discovery.”

Source: Astronomy and Astrophysics in the New Millennium

Astronomy is Facing a Data Avalanche

Multi-Terabyte (soon: multi-Petabyte) sky surveys and archives over a broad range of wavelengths

Billions of detected sources, hundreds of measured attributes per source

1 microSky (DPOSS)

1 nanoSky (HDF-S)

Composition of Results from Multiple Collections

…reveals a more complete physical picture

The resulting complexity of data translates into increased demands for data analysis, visualization, and understanding

Storage Resource Broker Collections at SDSC (6/18/2005)

Collection                                                               GB stored  Number of files  Users with ACLs

Data Grid
NSF/ITR - National Virtual Observatory                                      53,862        9,536,751              100
NSF - National Partnership for Advanced Computational Infrastructure        35,021        7,263,936              380
Static collections - Hayden planetarium                                      8,013          161,352              227
Pzone - public collections                                                   8,322        4,080,025               67
NSF/NPACI - Biology and Environmental collections                           36,295           71,932               67
NSF/NPACI - Joint Center for Structural Genomics                            15,405        1,510,697               55
NSF - TeraGrid, ENZO Cosmology simulations                                 168,660        1,467,591            3,267
NIH - Biomedical Informatics Research Network                               10,371        6,855,388              292

Digital Library
NSF/NPACI - Long Term Ecological Reserve                                       256            9,033               36
NSF/NPACI - Grid Portal                                                      2,620           53,048              460
NIH - Alliance for Cell Signaling microarray data                              712           83,062               21
NSF - National Science Digital Library SIO Explorer collection               2,715        1,081,042               27
NSF/ITR - Southern California Earthquake Center                            114,000        2,666,969               70

Persistent Archive
NHPRC Persistent Archive Testbed (Kentucky, Ohio, Michigan, Minnesota)          97          380,448               28
UCSD Libraries archive                                                       4,147          408,050               29
NARA - Research Prototype Persistent Archive                                 1,478          893,434               58
NSF - National Science Digital Library persistent archive                    3,593       27,034,043              136

TOTAL                                                                       465 TB       63 million            5,320
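
The TOTAL row can be cross-checked against the per-collection figures. The short Python sketch below simply re-adds the three columns as transcribed from the slide; the variable names are illustrative, and the conversion of 1 TB = 1,000 GB is an assumption about how the slide rounded its total.

```python
# (GB stored, number of files, users with ACLs), transcribed from the slide.
collections = {
    "NSF/ITR - National Virtual Observatory": (53_862, 9_536_751, 100),
    "NSF - NPACI": (35_021, 7_263_936, 380),
    "Static collections - Hayden planetarium": (8_013, 161_352, 227),
    "Pzone - public collections": (8_322, 4_080_025, 67),
    "NSF/NPACI - Biology and Environmental collections": (36_295, 71_932, 67),
    "NSF/NPACI - Joint Center for Structural Genomics": (15_405, 1_510_697, 55),
    "NSF - TeraGrid, ENZO Cosmology simulations": (168_660, 1_467_591, 3_267),
    "NIH - Biomedical Informatics Research Network": (10_371, 6_855_388, 292),
    "NSF/NPACI - Long Term Ecological Reserve": (256, 9_033, 36),
    "NSF/NPACI - Grid Portal": (2_620, 53_048, 460),
    "NIH - Alliance for Cell Signaling microarray data": (712, 83_062, 21),
    "NSF - NSDL SIO Explorer collection": (2_715, 1_081_042, 27),
    "NSF/ITR - Southern California Earthquake Center": (114_000, 2_666_969, 70),
    "NHPRC Persistent Archive Testbed": (97, 380_448, 28),
    "UCSD Libraries archive": (4_147, 408_050, 29),
    "NARA - Research Prototype Persistent Archive": (1_478, 893_434, 58),
    "NSF - NSDL persistent archive": (3_593, 27_034_043, 136),
}

# Sum each column independently.
total_gb = sum(gb for gb, _, _ in collections.values())
total_files = sum(files for _, files, _ in collections.values())
total_users = sum(users for _, _, users in collections.values())

# Assumes 1 TB = 1,000 GB when comparing with the slide's "465 TB".
print(f"{total_gb / 1000:.1f} TB, {total_files / 1e6:.1f} million files, {total_users:,} users")
```

The sums come to 465,567 GB, 63,556,801 files, and 5,320 users, consistent with the rounded totals on the slide.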

Large Synoptic Survey Telescope (LSST)

• LSST will take pictures of the entire observable sky every 3 days
– Compare images to detect changes

• Asteroids - sizes down to 250 meters

• Micro-lensing events - structure of dark matter

• Supernovae

– Expect to generate 100 PBs of data
– Expect to sustain over 50 TeraFlops computation

• Distributed architecture
– Processing at telescope (14,000 feet, perhaps Chile)
– Processing at base station (perhaps Chile)
– Processing in the US

An overview of the Large Synoptic Survey Telescope
Jim Brase, LLNL

• 8.4 meter aperture telescope surveying the full sky every 3-4 nights to visual magnitude 23-24

• Primary missions are to study dark energy and dark matter, the transient universe, the outer solar system, and near-Earth objects (NEO)

• > 13 TB / night

• > 100 PB over its 10 year mission

• Event detections on the Web in < 1 minute

• Pioneering new way of doing science – mining petabyte image databases

• First light January 2012

Publication of Results

• What does it mean to publish large scientific collections?

• Requirements include:
– Authenticity and integrity, the characterization of the source of the material and an assurance that the data is uncorrupted
– Discovery mechanisms to identify sets of appropriate data
– Access mechanisms to support expected usage patterns and analyses
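
One common way to back the integrity requirement is to record a checksum with each published item and recompute it on retrieval. The following is a minimal sketch only: the choice of SHA-256, the `publish` and `is_uncorrupted` helpers, and the record layout are assumptions made for illustration, not something the slides specify.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def publish(data: bytes, source: str) -> dict:
    """Build a hypothetical publication record: who produced the data
    (authenticity) and a digest of its content (integrity)."""
    return {"source": source, "sha256": sha256_of(data)}


def is_uncorrupted(data: bytes, record: dict) -> bool:
    """Recompute the digest and compare it with the recorded one."""
    return sha256_of(data) == record["sha256"]


original = b"pixel data for one survey image"
record = publish(original, source="example survey pipeline")

# A single changed byte is enough to make the check fail.
corrupted = b"pixel data for one survey imagE"

print(is_uncorrupted(original, record))   # unchanged copy
print(is_uncorrupted(corrupted, record))  # altered copy
```

A digest only shows that the bytes are unchanged since the record was made; establishing who made the record would need a signature or a trusted registry on top of it.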

Research Problems that Drive Publication Requirements

• Statistical astronomy done right
– Precision cosmology, Galactic structure, stellar astrophysics …
– Discovery of significant patterns and multivariate correlations
– Access to observations from multiple collections

• Systematic exploration of the observable parameter spaces
– Searches for rare or unknown types of objects and phenomena
– Low surface brightness universe, the time domain
– Confronting massive numerical simulations with massive data sets
– Access to large portions of a collection

Comparison of Images within Large Collections

Megaflares on normal main sequence stars (DPOSS)

Scientific Data Publication

• Standard vocabulary
– Uniform content descriptors for all physical variables registered in astronomy catalogs

• Standard data format
– FITS encoding format for astronomy images

• Standard services for accessing collections
– Simple image access service
– Cone search for catalog access
– Sky query node for distributed search across catalogs

• Enable large-scale applications
– Support access to tens of terabytes of data and millions of catalog entries

Data Publishing Roles (who is using the system?)

Roles        Traditional                     Emerging
Authors      Scientists                      Collaborations
Publishers   Journals                        Project www site
Curators     Libraries                       Massive Archives
Consumers    Scientists (read -> analyze)    Scientists & public (query -> analyze)

Interactions with Publishers

• Provide validation of tabular digital data submitted to astronomy journals
– Validate semantics - Uniform Content Descriptors for each table column
– Validate coordinates for each named object
– Check consistency of coordinates across objects
– Aggregate data into a common catalog for future queries - CDS
– Provide an archive of tabular data

• Current size is about 5 billion records
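
The first two checks in the list above can be sketched in a few lines of Python. This is an illustrative sketch, not the validation service itself: the embedded table is a simplified, namespace-free VOTable fragment made up for the example, the UCD strings follow the UCD1+ style, and `validate_table` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up VOTable fragment (real documents also declare a
# namespace and schema). The third column deliberately lacks a UCD and
# the second row carries an impossible declination.
VOTABLE = """<VOTABLE>
  <RESOURCE>
    <TABLE name="example">
      <FIELD name="RA" datatype="double" unit="deg" ucd="pos.eq.ra;meta.main"/>
      <FIELD name="DEC" datatype="double" unit="deg" ucd="pos.eq.dec;meta.main"/>
      <FIELD name="Vmag" datatype="float" unit="mag"/>
      <DATA><TABLEDATA>
        <TR><TD>10.68</TD><TD>41.27</TD><TD>3.4</TD></TR>
        <TR><TD>83.82</TD><TD>-95.0</TD><TD>4.0</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def column_with_ucd(fields, prefix):
    """Index of the first column whose UCD starts with `prefix`, else None."""
    for i, field in enumerate(fields):
        if (field.get("ucd") or "").startswith(prefix):
            return i
    return None


def validate_table(votable_xml):
    """Return a list of human-readable problems found in the table."""
    problems = []
    table = ET.fromstring(votable_xml).find("RESOURCE/TABLE")
    fields = table.findall("FIELD")

    # Semantics: every column should carry a Uniform Content Descriptor.
    for field in fields:
        if not field.get("ucd"):
            problems.append(f"column {field.get('name')!r} has no UCD")

    # Coordinates: RA in [0, 360) and Dec in [-90, 90], both in degrees.
    ra_i = column_with_ucd(fields, "pos.eq.ra")
    dec_i = column_with_ucd(fields, "pos.eq.dec")
    if ra_i is None or dec_i is None:
        problems.append("no RA/Dec columns identified by UCD")
        return problems
    for n, row in enumerate(table.findall("DATA/TABLEDATA/TR"), start=1):
        cells = [td.text for td in row.findall("TD")]
        ra, dec = float(cells[ra_i]), float(cells[dec_i])
        if not 0.0 <= ra < 360.0:
            problems.append(f"row {n}: RA {ra} outside [0, 360) degrees")
        if not -90.0 <= dec <= 90.0:
            problems.append(f"row {n}: Dec {dec} outside [-90, 90] degrees")
    return problems


for problem in validate_table(VOTABLE):
    print(problem)
```

Locating the coordinate columns by UCD rather than by column name is the point of the standard vocabulary: the check works regardless of what each journal table calls its columns.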

Interactions with Publishers

• Validate image data submitted to astronomy journals
– Validate encoding format - FITS
– Check semantic terms in the FITS header
• Naming conventions for coordinates, resolution, wavelength
– Check consistency of header variables
– Support archiving of the original image

• Build consistent collection of all images published
• Cross correlate to other images of the same object
• Current aggregate survey size is about 50 Terabytes (50,000 Gbytes)
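
As a rough illustration of the format and header checks, the sketch below parses a FITS primary header using only the Python standard library. It relies on the basic layout of the FITS standard (80-character cards, headers padded to 2,880-byte blocks, mandatory SIMPLE, BITPIX, and NAXIS keywords); the helper names are hypothetical, and the comment handling is simplified, so a production validator would more likely use an existing FITS library.

```python
CARD = 80      # bytes per header card
BLOCK = 2880   # headers are padded with spaces to a multiple of this


def make_card(keyword, value):
    """Fixed-format card: keyword in columns 1-8, '= ' in 9-10,
    value right-justified to column 30, padded to 80 characters."""
    return f"{keyword:<8}= {value:>20}".ljust(CARD)


def build_header(cards):
    """Assemble an in-memory header for the example (illustrative helper)."""
    raw = "".join(make_card(k, v) for k, v in cards) + "END".ljust(CARD)
    padded_len = -(-len(raw) // BLOCK) * BLOCK  # round up to a full block
    return raw.ljust(padded_len).encode("ascii")


def parse_header(raw):
    """Read keyword/value pairs up to the END card.
    Simplification: everything after '/' is treated as a comment, which
    would mis-parse quoted string values that contain a slash."""
    cards = {}
    for start in range(0, len(raw), CARD):
        card = raw[start:start + CARD].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] == "= ":
            cards[keyword] = card[10:].split("/", 1)[0].strip()
    return cards


def check_header(cards):
    """Check the mandatory primary-header keywords for consistency."""
    problems = []
    if cards.get("SIMPLE") != "T":
        problems.append("SIMPLE is not T")
    if cards.get("BITPIX") not in {"8", "16", "32", "64", "-32", "-64"}:
        problems.append("BITPIX missing or not a standard value")
    naxis = cards.get("NAXIS")
    if naxis is None or not naxis.isdigit():
        problems.append("NAXIS missing or not a non-negative integer")
    else:
        for axis in range(1, int(naxis) + 1):
            if f"NAXIS{axis}" not in cards:
                problems.append(f"NAXIS{axis} is missing")
    return problems


good = build_header([("SIMPLE", "T"), ("BITPIX", "16"), ("NAXIS", "2"),
                     ("NAXIS1", "100"), ("NAXIS2", "100")])
print(check_header(parse_header(good)))
```

The same pattern extends to the semantic checks on the slide, for example confirming that coordinate keywords are present and mutually consistent before an image is added to the collection.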

Virtual Observatory Publication Services

• A suite of international standards for the discovery, exchange, intercomparison, and analysis of network-accessible astronomical data

• A data access and analysis environment that exploits the emerging computation/software/data Grid

• A framework for data processing that enables and encourages the re-use of algorithms

• A tool for astronomy research

• A catalyst for world-wide access to astronomical archives

• A vehicle for education and public outreach

Types of Grid Services

• VOTable - standard table structure for data from catalogs

• Conesearch - retrieve entries from an object catalog that are spatially located within a circle mapped on the sky

• Simple Image Access Protocol - retrieve an image from an image archive, cropped to the desired size

• Simple Spectrum Access Protocol - retrieve a spectrum from a catalog

• Skyquery - distribute queries across multiple object catalogs, join results

• Mosaic service - create composite of multiple images
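
The selection a cone search performs can be sketched locally: keep the catalog entries whose angular distance from a given position is within a search radius. The parameter names `ra`, `dec`, and `sr` mirror the RA, DEC, and SR inputs of the IVOA cone search interface; the catalog rows and function names below are made up for illustration, and a real service would run this selection server-side and return a VOTable.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions
    (haversine formula; all inputs in decimal degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the valid range.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(catalog, ra, dec, sr):
    """Return the rows lying within `sr` degrees of (ra, dec)."""
    return [row for row in catalog
            if angular_separation_deg(ra, dec, row["ra"], row["dec"]) <= sr]


# Made-up catalog entries, positions in decimal degrees.
catalog = [
    {"id": "A", "ra": 10.00, "dec": 20.00},
    {"id": "B", "ra": 10.05, "dec": 20.02},
    {"id": "C", "ra": 12.00, "dec": 20.00},
    {"id": "D", "ra": 359.98, "dec": 0.00},
]

print([row["id"] for row in cone_search(catalog, ra=10.0, dec=20.0, sr=0.1)])
print([row["id"] for row in cone_search(catalog, ra=0.01, dec=0.0, sr=0.05)])
```

Using the great-circle distance rather than a simple box in RA and Dec keeps the selection correct near the poles and across the RA = 0/360 boundary, which the second query exercises.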

Data Management Services

• VOStore - interface for simple get, put of files from an image archive

• VOSpace - data management interface for assembling uniform name spaces across multiple image archives

• Uniform Content Descriptors - standard naming conventions for all physical quantities in catalogs

• VO Ontology - relationships between the UCDs, also a time-space coordinate ontology for astronomy
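
The idea of a uniform name space across archives can be pictured as a mapping from logical names to the physical locations that hold the data. The sketch below is purely illustrative: the `LogicalNamespace` class, the archive names, and the paths are hypothetical and do not reflect the actual VOStore or VOSpace interfaces.

```python
class LogicalNamespace:
    """Toy catalog mapping one logical path to its physical copies."""

    def __init__(self):
        self._entries = {}

    def register(self, logical_path, archive, physical_path):
        """Record that `archive` holds a copy of `logical_path`."""
        self._entries.setdefault(logical_path, []).append((archive, physical_path))

    def resolve(self, logical_path):
        """Return every known (archive, physical_path) for a logical path."""
        return list(self._entries.get(logical_path, []))

    def list(self, prefix):
        """List logical paths under a prefix, regardless of where they live."""
        return sorted(p for p in self._entries if p.startswith(prefix))


ns = LogicalNamespace()
# Hypothetical archives and file layouts.
ns.register("/nvo/images/field001.fits", "archive-a", "/data/2005/f001.fits")
ns.register("/nvo/images/field001.fits", "archive-b", "/mirror/img/000001.fits")
ns.register("/nvo/images/field002.fits", "archive-b", "/mirror/img/000002.fits")

print(ns.list("/nvo/images/"))
print(ns.resolve("/nvo/images/field001.fits"))
```

Users work only with the logical names; where the files physically live, and how many replicas exist, can change without breaking references to them.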

International VO Alliance

• The IVOA brings together the astronomers, developers, and managers of the VO initiatives world-wide
– Agreements on standards for data access (VOTable, catalog queries, image retrieval, resource descriptions, etc.)
– Coordination of development activities
– Sharing of software and experience
– International policies on data sharing and publication

• 13 participating organizations: Astrogrid, AVO, US-NVO, VO-Australia, VO-Canada, VO-China, VO-France, VO-Germany (GAVO), VO-India, VO-Italy (DRACO), VO-Japan, VO-Korea, VO-Russia

• http://www.ivoa.net

Data Management Approaches in Scientific Disciplines

• Data Grids
– Focus on shared collections that may be distributed across multiple sites

• Digital Libraries
– Provide discovery and display services for scientific collections

• Persistent Archives
– Assert authenticity and integrity of collection while underlying systems evolve

NVO Digital Library Interactions

• Dublin Core metadata standard
– Describe provenance of all objects

• Open Archives Initiative - Protocol for Metadata Harvesting
– Used to populate service registry

• Carnivore v 1.0 service registry
– Register all of NVO services
– http://mercury.cacr.caltech.edu:8080/carnivore

• DSpace - digital library
– Port on top of data grids for distributed data management

• Fedora - digital library
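
Harvesting with OAI-PMH is done through plain HTTP requests whose query string names a verb and its arguments. The sketch below builds a `ListRecords` request for Dublin Core (`oai_dc`) metadata; the verb and argument names come from the OAI-PMH specification, while the base URL and the `list_records_url` helper are hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.
    `from_date` (YYYY-MM-DD) limits the harvest to records changed since
    that date; `set_spec` restricts it to one set of the repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "http://archive.example.org/oai"

url = list_records_url(BASE_URL, from_date="2005-06-01")
print(url)
```

A registry would issue such requests periodically, using the `from` argument so that each harvest only picks up records added or changed since the previous one.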

Characteristics

• Standard vocabularies, data formats, services

• Collection management
– Descriptive, administrative metadata
– Access controls on creation of data, metadata, annotations
– Audit trails, versions, locking, pinning, containers

• Distributed data
– Data created at multiple sites
– Data used at multiple sites
– Replicas at multiple sites

• Persistence
– All systems must manage technology evolution

• Federation
– Sharing of data between independent collections

Questions

Reagan W. Moore

[email protected]

http://www.sdsc.edu/srb/