Experience Building the World Wide Telescope, aka the Virtual Observatory
Post on 07-Jan-2016
Experience Building The World Wide Telescope aka: The Virtual Observatory
  Jim Gray
  Alex Szalay
The Evolution of Science
  Observational Science
    Scientist gathers data by direct observation
    Scientist analyzes data
  Analytical Science
    Scientist builds analytical model
    Makes predictions
  Computational Science
    Simulate analytical model
    Validate model and make predictions
  Data Exploration Science
    Data captured by instruments or generated by simulator
    Processed by software
    Placed in a database / files
    Scientist analyzes database / files
Information Avalanche
  In science, industry, and government, better observational instruments and better simulations are producing a data avalanche
  Examples
    BaBar: grows 1 TB/day (2/3 simulation information, 1/3 observational information)
    CERN: LHC will generate 1 GB/s, ~10 PB/y
    VLBA (NRAO) generates 1 GB/s today
    Pixar: 100 TB/movie
  New emphasis on informatics: capturing, organizing, summarizing, analyzing, visualizing
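The quoted rates can be sanity-checked with a little arithmetic. The sketch below assumes decimal units (1 PB = 1,000,000 GB) and a 365-day year. The "implied duty cycle" is only an inference from the two LHC figures on the slide, not something the slide states.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, assuming a 365-day year
GB_PER_PB = 1_000_000               # decimal units (assumption)


def petabytes_per_year(gb_per_second: float, duty_cycle: float = 1.0) -> float:
    """Yearly volume in PB for a sustained rate and a fraction of time running."""
    return gb_per_second * SECONDS_PER_YEAR * duty_cycle / GB_PER_PB


def implied_duty_cycle(gb_per_second: float, pb_per_year: float) -> float:
    """Fraction of the year an instrument must run to reach pb_per_year."""
    return pb_per_year * GB_PER_PB / (gb_per_second * SECONDS_PER_YEAR)


# 1 GB/s sustained for a whole year would be about 31.5 PB.
continuous_pb = petabytes_per_year(1.0)

# So ~10 PB/y at 1 GB/s corresponds to running roughly a third of the time.
lhc_duty_cycle = implied_duty_cycle(1.0, 10.0)

# BaBar at 1 TB/day accumulates 365 TB in a year.
babar_tb_per_year = 1 * 365
```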
Image courtesy C. Meneveau & A. Szalay @ JHU
BaBar, Stanford
Space Telescope
P&E Gene Sequencer, from http://www.genome.uci.edu/
World Wide Telescope
Virtual Observatory
  http://www.ivoa.net/
  Premise: most data is (or could be) online
  The Internet is the world's best telescope:
    It has data on every part of the sky
    In every measured spectral band: optical, x-ray, radio, ...
    As deep as the best instruments (2 years ago)
    It is up when you are up
    The "seeing" is always great (no working at night, no clouds, no moons, no ...)
    It's a smart telescope: links objects and data to the literature on them
The WWT Components
  Data Sources
    Literature
    Archives
  Unified Definitions
    Units, Semantics/Concepts/Metrics, Representations, Provenance
  Object model
    Classes and methods
  Portals
Data Sources
  Literature online and cross-indexed
    Simbad, ADS, NED
    http://simbad.u-strasbg.fr/Simbad, http://adswww.harvard.edu/, http://nedwww.ipac.caltech.edu/
  Many curated archives online
    FIRST, DPOSS, 2MASS, USNO, IRAS, SDSS, VizieR, ...
    Typically files with English metadata and some programs
  Groups, researchers, and amateurs publish
    Datasets online in various formats
    Documentation varies
    Publications are ephemeral
    Unknown provenance
Unified Definitions
  Unified Content Descriptors (UCDs) http://vizier.u-strasbg.fr/doc/UCD.htx
    Collated all table heads from all the literature
    100,000 terms reduced to ~1,500
    Rough consensus that this is the right thing
    Refinement in progress as people use UCDs
  Defines
    Units: gram, radian, second, ...
    Semantic concepts / metrics: std error, Chi2 fit, magnitude, flux @ passband, velocity, ...
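The idea of collapsing many table heads into a small shared vocabulary can be sketched as a lookup table. Everything here is illustrative: the header spellings are invented, the descriptor strings are written in the style of UCD names and should be checked against the UCD list linked above, and `describe` and `common_descriptors` are hypothetical helpers.

```python
# Hypothetical mapping from archive-specific column heads to shared descriptors.
# The descriptor strings follow the style of UCD names; treat them as examples.
COLUMN_TO_UCD = {
    "ra": "POS_EQ_RA_MAIN",
    "raj2000": "POS_EQ_RA_MAIN",
    "alpha": "POS_EQ_RA_MAIN",
    "dec": "POS_EQ_DEC_MAIN",
    "dej2000": "POS_EQ_DEC_MAIN",
    "delta": "POS_EQ_DEC_MAIN",
}


def normalize(header: str) -> str:
    """Fold case and drop underscores so 'RA_J2000' and 'raj2000' compare equal."""
    return header.strip().lower().replace("_", "")


def describe(headers):
    """Map each original header to its descriptor, or None if it is unknown."""
    return {h: COLUMN_TO_UCD.get(normalize(h)) for h in headers}


def common_descriptors(headers_a, headers_b):
    """Descriptors that two tables share, regardless of how columns are named."""
    known_a = {u for u in describe(headers_a).values() if u is not None}
    known_b = {u for u in describe(headers_b).values() if u is not None}
    return known_a & known_b


table_a = ["RA_J2000", "DE_J2000", "Vmag"]
table_b = ["alpha", "delta", "flux"]
shared = common_descriptors(table_a, table_b)
```

Two tables that name their position columns differently still turn out to share both position descriptors, which is what lets a portal join them without per-archive code.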
Provenance
  Most data will be derived
  To do science, need to trace derived data back to its source
    So programs and inputs must be registered
    Must be able to re-run them
  Example: Space Telescope calibrated data
    Run on demand
    Can specify software version (to get old answers)
  Scientific data provenance and curation are largely unsolved problems (some ideas but no science)
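A minimal sketch of "register programs and inputs, then re-run them", assuming an in-memory registry. The class `ProvenanceRegistry`, the `calibrate` functions, and their version labels are all hypothetical; this is one way to picture the requirement, not a description of how the Space Telescope pipeline works.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class ProvenanceRegistry:
    programs: Dict[Tuple[str, str], Callable] = field(default_factory=dict)
    records: Dict[str, dict] = field(default_factory=dict)

    def register_program(self, name: str, version: str, func: Callable) -> None:
        """Keep every version callable so old answers stay reproducible."""
        self.programs[(name, version)] = func

    def run(self, name: str, version: str, inputs: list):
        """Run a registered program and record what produced the result."""
        result = self.programs[(name, version)](inputs)
        record = {"program": name, "version": version, "inputs": inputs}
        # The key is a hash of the recipe, so the same recipe gets the same key.
        key = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
        self.records[key] = record
        return key, result

    def rerun(self, key: str):
        """Regenerate derived data from its recorded program, version, and inputs."""
        rec = self.records[key]
        return self.programs[(rec["program"], rec["version"])](rec["inputs"])


registry = ProvenanceRegistry()
registry.register_program("calibrate", "1.0", lambda xs: [x * 2.0 for x in xs])
registry.register_program("calibrate", "2.0", lambda xs: [x * 2.0 + 0.5 for x in xs])

raw = [1.0, 2.0, 3.0]
old_key, old_result = registry.run("calibrate", "1.0", raw)
new_key, new_result = registry.run("calibrate", "2.0", raw)
old_again = registry.rerun(old_key)  # the "old answer", on demand
```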
Object Model
  General acceptance of XML
  Recent acceptance of XML Schema (XSD over DTD)
  Wait-and-see about SOAP/WSDL/...
    "Web Services are just Corba with angle brackets."
    "FTP is good enough for me."
  Personal opinion:
    Web Services are much more than Corba + angle brackets
    Huge focus on interop
    Huge focus on integrated tools
  But the community says "Show me!"
    Many technologists sold, but not the astronomers
Classes and Methods
  First class: VOTable http://www.us-vo.org/VOTable/VOTable-1-0.htm
    Represents an answer set in XML
    Defined by an XML Schema (XSD)
    Metadata (in terms of UCDs)
    Data representation (numbers and text)
  First method: Cone Search
    Get objects in this cone
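A cone search and its answer set can be sketched in a few lines. This assumes a small in-memory catalog with invented objects, uses the haversine formula for angular separation, and emits a simplified VOTable-style document that is not validated against the VOTable schema. The `ucd` strings are illustrative, and `cone_search` and `to_votable_like` are hypothetical names.

```python
import math
import xml.etree.ElementTree as ET


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(catalog, ra, dec, radius_deg):
    """Return the rows whose position lies within radius_deg of (ra, dec)."""
    return [row for row in catalog
            if angular_separation_deg(ra, dec, row["ra"], row["dec"]) <= radius_deg]


# Column metadata: name plus FIELD attributes (illustrative values).
FIELDS = [
    ("id", {"datatype": "char", "arraysize": "*", "ucd": "ID_MAIN"}),
    ("ra", {"datatype": "double", "unit": "deg", "ucd": "POS_EQ_RA_MAIN"}),
    ("dec", {"datatype": "double", "unit": "deg", "ucd": "POS_EQ_DEC_MAIN"}),
]


def to_votable_like(rows):
    """Serialize rows as metadata (FIELD) followed by data (TR/TD)."""
    root = ET.Element("VOTABLE")
    table = ET.SubElement(ET.SubElement(root, "RESOURCE"), "TABLE")
    for name, attrs in FIELDS:
        ET.SubElement(table, "FIELD", name=name, **attrs)
    data = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(data, "TR")
        for name, _ in FIELDS:
            ET.SubElement(tr, "TD").text = str(row[name])
    return ET.tostring(root, encoding="unicode")


# Invented catalog entries, for illustration only.
catalog = [
    {"id": "A", "ra": 180.00, "dec": 0.00},
    {"id": "B", "ra": 180.05, "dec": 0.05},
    {"id": "C", "ra": 185.00, "dec": 2.00},
]
hits = cone_search(catalog, ra=180.0, dec=0.0, radius_deg=0.1)
answer_xml = to_votable_like(hits)
```

Keeping the column metadata separate from the rows mirrors the split the slide describes: metadata in terms of UCDs, then the data representation.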
Other Classes
  Space-Time class http://hea-www.harvard.edu/~arots/nvometa/STCdoc.pdf
  Image class (returns pixels)
    SdssCutout
    Simple Image Access Protocol http://bill.cacr.caltech.edu/cfdocs/usvo-pubs/files/ACF8DE.pdf
    HyperAtlas http://bill.cacr.caltech.edu/usvo-pubs/files/hyperatlas.pdf
  Spectral
    Simple Spectral Access Protocol
    500K spectra available at http://voservices.net/wave
  Query Services
    ADQL and SkyNode http://skyservice.pha.jhu.edu/develop/vo/adql/
  Registry: see below
The Registry
  UDDI seemed inappropriate
    Complex
    Irrelevant questions
    Relevant questions missing
  Evolved Dublin Core
    Represent datasets, services, portals
    Needs to be machine readable
  Federation (DNS model)
  Push & pull: register, then harvest
  http://www.ivoa.net/twiki/bin/view/IVOA/IvoaResReg
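The "register, then harvest" pattern can be pictured with two in-memory registries: publishers push records into a local registry, and another registry pulls whatever changed since its last visit. This is a sketch under stated assumptions. The `Registry` class, its logical clock, and the identifiers are hypothetical and do not reflect the actual IVOA registry interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Registry:
    name: str
    records: Dict[str, dict] = field(default_factory=dict)
    clock: int = 0  # logical clock; a real registry would use timestamps

    def register(self, identifier: str, resource_type: str, title: str) -> None:
        """Push: a publisher adds or updates a machine-readable record."""
        self.clock += 1
        self.records[identifier] = {
            "identifier": identifier,
            "type": resource_type,  # e.g. Dataset, Service, Portal
            "title": title,
            "updated": self.clock,
        }

    def changed_since(self, marker: int) -> List[dict]:
        """Records added or updated after the given clock value."""
        return [r for r in self.records.values() if r["updated"] > marker]

    def harvest(self, other: "Registry", since: int = 0) -> int:
        """Pull: copy records changed in `other`; return the new high-water mark."""
        newest = since
        for rec in other.changed_since(since):
            self.clock += 1
            self.records[rec["identifier"]] = {**rec, "updated": self.clock}
            newest = max(newest, rec["updated"])
        return newest


local = Registry("local")
local.register("example/sdss-cone", "Service", "SDSS cone search")

full = Registry("full")
mark = full.harvest(local)                 # first pull copies everything

local.register("example/first-catalog", "Dataset", "FIRST catalog")
mark = full.harvest(local, since=mark)     # later pulls copy only the changes
```

Harvesting incrementally is what lets registries federate in the DNS-like fashion the slide mentions: each one only needs to remember how far it has read from its neighbors.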
SkyQuery
  A prototype WWT
  Started with SDSS data and schema
  Imported about 9 other datasets into that spine schema
  Unified them with a portal
  Implicit spatial join among the datasets
  All built on Web Services
    Pure XML
    Pure SOAP
    Used .NET toolkit
Demo
  SkyServer:
    Navigator: showing cutout web service
    List: showing many calls and variant use
  SkyQuery:
    Show integration of various archives
    Explain spatial join xMatch operator
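To give a feel for what a spatial join across archives does, here is a deliberately simple stand-in: a brute-force nearest-neighbor match within an angular tolerance. It is not the xMatch operator as implemented in SkyQuery. The function name `xmatch`, the catalogs, and the coordinates are all invented for illustration.

```python
import math


def separation_arcsec(a, b):
    """Great-circle separation between two catalog rows, in arcseconds."""
    ra1, dec1, ra2, dec2 = map(math.radians, (a["ra"], a["dec"], b["ra"], b["dec"]))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def xmatch(cat_a, cat_b, tolerance_arcsec):
    """For each row of cat_a, keep the closest row of cat_b within the tolerance.

    Brute force, O(len(cat_a) * len(cat_b)); a real archive would rely on a
    spatial index rather than comparing every pair.
    """
    matches = []
    for a in cat_a:
        best = None
        for b in cat_b:
            sep = separation_arcsec(a, b)
            if sep <= tolerance_arcsec and (best is None or sep < best[1]):
                best = (b, sep)
        if best is not None:
            matches.append((a["id"], best[0]["id"], best[1]))
    return matches


# Invented positions in degrees.
optical = [
    {"id": "o1", "ra": 10.0, "dec": 20.0},
    {"id": "o2", "ra": 10.5, "dec": 20.5},
]
radio = [
    {"id": "r1", "ra": 10.0002, "dec": 20.0001},
    {"id": "r2", "ra": 50.0, "dec": -5.0},
]
pairs = xmatch(optical, radio, tolerance_arcsec=2.0)
```

The join condition is purely positional, which is why the portal can apply it implicitly across datasets that share nothing but sky coordinates.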
MyDB
  Portal allows federation of data, but
    Intermediate results may be large
    Intermediate results feed into the next analysis step
    Sending them back and forth to the client is costly and sometimes infeasible
  Solution: create a working DB for the client at the portal: MyDB
MyDB
  Anyone can create a personal DB at the SkyServer portal
    It is about 100 MB
    It is private
  Simple queries are done immediately
  Complex queries are done by a batch scheduler
  All queries can create/read/write MyDB tables
  Very popular with serious users
  MyDB will be sharable by a group
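The workflow MyDB enables can be imitated with an in-memory SQLite database standing in for the portal. This is only a sketch: the table names, columns, and values are invented, and SkyServer itself is not SQLite. The point is that the intermediate table stays on the server side and only a small summary travels back to the client.

```python
import sqlite3

# Stand-in for the portal-side database (assumption: SQLite in memory).
portal = sqlite3.connect(":memory:")
portal.executescript("""
    CREATE TABLE archive_objects (
        obj_id INTEGER PRIMARY KEY, ra REAL, dec REAL, mag REAL
    );
    INSERT INTO archive_objects VALUES
        (1, 180.0,  0.0, 17.2),
        (2, 180.1,  0.1, 21.5),
        (3, 181.0, -0.2, 16.8),
        (4, 182.5,  1.0, 19.9);
""")

# Step 1: the query writes its (possibly large) intermediate result into a
# personal table at the portal instead of shipping rows to the client.
portal.execute("""
    CREATE TABLE mydb_bright AS
    SELECT obj_id, ra, dec, mag FROM archive_objects WHERE mag < 20.0
""")

# Step 2: the next analysis step reads that table in place; only a two-number
# summary is returned to the client.
bright_count, mean_mag = portal.execute(
    "SELECT COUNT(*), AVG(mag) FROM mydb_bright"
).fetchone()
```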
Open SkyQuery
  SkyQuery is being adopted by AstroGrid as a reference implementation for OGSA-DAI (Open Grid Services Architecture, Data Access and Integration)
  SkyNode: basic archive object http://www.ivoa.net/twiki/bin/view/IVOA/SkyNode
  SkyQuery Language (VoQL) is evolving http://www.ivoa.net/twiki/bin/view/IVOA/IvoaVOQL
The WWT Components
Outline
  Data Sources
    Literature
    Archives
  Unified Definitions
    Units, Semantics/Concepts/Metrics, Representations, Provenance
  Object model
    Classes and methods
  Portals
  WWT is a poster child for the Data Grid.
What we learned
  Astro is a community of 10,000
    Homogeneous & cooperative
    If you can't do it for Astro, do not bother with 3M bio-info
  Agreement
    Takes time
    Takes endless meetings
  Big problems are non-technical
    Legacy is a big problem
  Plumbing and tools are there, but
    What is the object model?
    What do you want to save?
    How to document provenance?