User requirements for astronomical data management
The Large Synoptic Survey Telescope Project
Bob Mann
Wide-Field Astronomy Unit
University of Edinburgh
Requirements from astronomy
Several large astro projects represented here
e.g. LSST, SKA, Euclid, LOFAR
All have significant computing challenges
Some similarities & some differences between them
Expert data centres crucial to the success of all of them
But what about requirements from users?
How will astronomers use the data from these projects?
Contrasting particle physics & astronomy
PARTICLE PHYSICISTS*
Mainly work within one experiment
Experiments are multi-institutional and multi-national (CERN)
Strong central control: top-down structure
Experiment uses its own data
Take code to the data now
Have long-term preservation of data in usable form, ensured by CERN
ASTRONOMERS
Work within several consortia
Consortia are multi-institutional and multi-national
No central control: any structure is built bottom-up
Consortia use data from many sources
Will take code to data soon
Have raw and pipeline-reduced data in telescope archives
No guarantee that the complete multi-wavelength dataset will be preserved after a consortium dissolves
* as I understand it, at least
Example: XMM Cluster Survey Consortium
4 UK groups, 2 US, Germany, Portugal, South Africa
People in other groups involved for some parts of the project
Data
XMM: ESA (proprietary to CoIs, then public)
Sloan Digital Sky Survey (proprietary to SDSS, then public)
ESO Public Surveys (proprietary to ESO, then public)
Our own NOAO imaging survey (proprietary to us, then public)
Gemini spectroscopy (proprietary to us, then public)
Dark Energy Survey (proprietary to DES, shared via agreement)
AAO spectroscopy (proprietary to CoIs, shared via agreement)
…
Data management
Undertaken part-time by science postdocs on a best-efforts basis:
ftp directories, file links from wikis and webpages, etc.
Requirements suggested by XCS
1. Ready access to many distributed data sources, through standard protocols and respecting proprietary restrictions where they exist
2. Sharable storage space (files and databases): physically distributed, but logically unitary
3. Analysis tools accessing the distributed data and writing results to the shared storage space
But these are the same requirements that motivated the Virtual Observatory in 2001!
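One concrete illustration of requirement 1 (not from the original talk): "standard protocols" in practice means IVOA protocols such as cone search, exercised through a client library like pyvo. A minimal sketch; the service URL is a placeholder, and real consortium use would also need authentication for proprietary data.

    import pyvo
    from astropy.coordinates import SkyCoord
    import astropy.units as u

    # Any IVOA Simple Cone Search service is queried the same way;
    # this URL is a placeholder, not a specific XCS data source.
    service = pyvo.dal.SCSService("https://example.org/scs")

    # Catalogue rows within 0.1 deg (radius in decimal degrees) of a position.
    pos = SkyCoord(ra=150.0 * u.deg, dec=2.2 * u.deg)
    results = service.search(pos=pos, radius=0.1)

    # Results arrive as a VOTable; convert to an Astropy table for analysis.
    table = results.to_table()
    print(table.colnames, len(table))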
VO Architecture – Jan 2007 (diagram, reproduced from a talk to MSc students; legend distinguishes components provided by AstroGrid and interfaces that are IVOA standards vs AstroGrid standards)
Service side: Images (SIAP), Catalogues (Cone, CEC, DSA), Spectra (SSAP), Community, JES, Server apps (CEA Server, CEC), Registry (Harvest/Query), MySpace
Client side, reached over the Internet via the AstroRuntime (RMI, XML-RPC, HTTP): Workbench, client applications, and scripting environment, linked by PLASTIC
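(Aside, not in the original slide: the Images/SIAP and Catalogues/Cone boxes above are the same "simple access" protocols still in service today, so a present-day image query against any SIAP endpoint looks roughly like the sketch below; the URL and position are placeholders.)

    import pyvo
    from astropy.coordinates import SkyCoord
    import astropy.units as u

    # Placeholder URL for any IVOA Simple Image Access (SIAP) service.
    sia = pyvo.dal.SIAService("https://example.org/sia")

    # Ask for images overlapping a 0.2 deg region around a position.
    pos = SkyCoord(ra=52.5 * u.deg, dec=-27.8 * u.deg)
    images = sia.search(pos=pos, size=0.2)

    # Each record carries metadata plus a URL from which the image can be fetched.
    for rec in images:
        print(rec.getdataurl())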
Contrasting particle physics & astronomy
PARTICLE PHYSICS
Founded 2001
Developed innovative solutions meeting needs of user community
Funding continues today
ASTRONOMY
Founded 2001
Developed innovative solutions meeting needs of user community
Funding stopped in 2010
VO development has continued
Internationally:
International Virtual Observatory Alliance: 21 members; activity varies
Fairly complete set of standard protocols now
▪ Many standardising what AstroGrid prototyped years earlier
Within the UK:
Modest amounts of EU funding
Modest amounts of funding within data centres
Example: WFAU’s “Firethorn” project
Initial goals
Include our DBs in distributed queries through the VO
▪ Using the IVOA Table Access Protocol (TAP); see the query sketch after this slide
Allow publication of user-owned tables in our DBs
Allow TAP services to be composed
Soon realised this could support research consortia
Give them working space with proprietary restrictions
Publish to (potentially) long-term storage afterwards
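To illustrate what those goals buy a consortium user (a sketch, not Firethorn's documented interface): with TAP plus user-table upload, a member can cross-match their own table against archive catalogues in a single server-side ADQL query. The endpoint URL, schema, and column names below are all hypothetical.

    import pyvo
    from astropy.table import Table

    # Placeholder endpoint: any IVOA TAP service, e.g. one exposed by Firethorn.
    tap = pyvo.dal.TAPService("https://example.org/tap")

    # A small user-owned table of cluster candidates (hypothetical columns).
    my_clusters = Table({"id": [1, 2], "ra": [150.1, 151.3], "dec": [2.2, 2.4]})

    # Cross-match the uploaded table against an archive catalogue server-side;
    # TAP_UPLOAD.clusters is how ADQL refers to the uploaded table.
    query = """
        SELECT u.id, c.objID, c.rmag
        FROM TAP_UPLOAD.clusters AS u
        JOIN archive.photometry AS c
          ON 1 = CONTAINS(POINT('ICRS', c.ra, c.dec),
                          CIRCLE('ICRS', u.ra, u.dec, 0.05))
    """
    results = tap.run_sync(query, uploads={"clusters": my_clusters})
    print(results.to_table())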
Firethorn objective
What’s missing
• User data not published through TAP yet
• Need group management for sharing
• Experimenting with Docker for data analysis (see the container sketch below)
• Not exposing endpoints for composed TAP services yet
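A sketch of what "Docker for data analysis" can mean here (assumptions, not Firethorn's actual setup): user code runs in a container launched next to the data, with shared storage mounted read-only. The image name, paths, and script are hypothetical, and the Docker SDK for Python is just one way to drive it.

    import docker

    client = docker.from_env()

    # Run a (hypothetical) analysis image next to the data, mounting the
    # consortium's shared storage read-only and a scratch area read-write.
    logs = client.containers.run(
        image="astro-analysis:latest",           # hypothetical image
        command="python /scripts/stack_images.py",
        volumes={
            "/data/xcs": {"bind": "/data", "mode": "ro"},
            "/scratch/user42": {"bind": "/output", "mode": "rw"},
        },
        remove=True,                             # clean up the container afterwards
    )
    print(logs.decode())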
How could UK-T0 help?
1. Authentication: the IVOA has a protocol for credential delegation, but we need trusted, simple-to-use credentials first
2. Data transport under the hood: the VOSpace (file) storage protocol has been implemented on iRODS; other possibilities? (see the sketch after this list)
3. Hardware: some needs to be at/near the main astro data centres, but not all, if #2 works well; PPE experience in coupling compute and data
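To make point 2 concrete (an assumption-laden sketch, not a statement of current practice): one existing VOSpace client is CADC's vos package, so moving a consortium's files into shared VOSpace storage, whether backed by iRODS or something else, could look roughly like this. The vos: node path is hypothetical, and in practice the client must first authenticate (certificate or token), which is exactly the point-1 problem.

    # pip install vos  (CADC VOSpace client, used here purely as an example)
    from vos import Client

    client = Client()

    # Push a pipeline output file into a (hypothetical) shared VOSpace node,
    # then list the shared area to confirm it arrived.
    client.copy("cluster_catalogue.fits", "vos:xcs/shared/cluster_catalogue.fits")
    print(client.listdir("vos:xcs/shared"))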
…and could any of this be re-used elsewhere?
Summary
Good data centres are vital for large projects
Many challenges for LSST, Euclid, SKA, etc.
But they are just part of the story for most users
Multi-wavelength astronomy in multi-national consortia is the norm for most people now
Need computing systems to support how they work and to preserve their multi-wavelength datasets
▪ The Virtual Observatory gives (most of) the standards
▪ We lack (some of) the production software and the hardware
Some of this must be generic across the PPAN area
Some of it may have been solved for the LHC
But astronomy is different in some ways from PPE
Firethorn now