The VAO is operated by the VAO, LLC.
Ashish Mahabal ([email protected])
Ciro Donalek, Matthew Graham
Ray Plante, George Djorgovski
Data 2 Knowledge study project
VAO-LSST Meeting, NOAO, 24 March 2011
March 23, 2011, Ashish Mahabal
Goals
• Feasibility study
• What is out there
• What is needed
• Milestones
• What can be done
Exploration of observable parameter spaces and searches for rare or new types of objects
Djorgovski
Overview – many connections
• Astroinformatics (next meeting in Sep. 2011)
• VOStat and other R/Statistics tools
• Data challenges
• Various sky surveys
Related issues
• Semantics
• Classification/characterization
• Distributed data
• GPUs
Focus on time domain
Focus on time-domain
• Expertise, and it encompasses all aspects of data mining (save one)
• Plus, real-time forces us to be fast.
• Portfolio building – growing columns of tables
• Bayesian networks utilizing auxiliary information
• Lightcurve techniques for characterizing objects
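As a rough illustration of how lightcurve characterization can feed the growing columns of a portfolio table, here is a minimal Python sketch. The function name `lightcurve_features` and the particular statistics are assumptions chosen for illustration; the slides do not say which features the project uses.

```python
from statistics import median, pstdev


def lightcurve_features(times, mags):
    """Return a few simple summary statistics for one light curve.

    Hypothetical helper: the feature set is illustrative only.
    """
    if len(times) != len(mags) or len(mags) < 2:
        raise ValueError("need at least two (time, magnitude) pairs")
    # Sort by time so that consecutive differences are meaningful.
    pairs = sorted(zip(times, mags))
    # Absolute rate of change between consecutive observations.
    slopes = [
        abs(m2 - m1) / (t2 - t1)
        for (t1, m1), (t2, m2) in zip(pairs, pairs[1:])
        if t2 > t1
    ]
    return {
        "n_points": len(pairs),
        "median_mag": median(mags),
        "amplitude": max(mags) - min(mags),
        "std_mag": pstdev(mags),
        "max_abs_slope": max(slopes) if slopes else 0.0,
    }


# Toy input, not real survey data.
features = lightcurve_features([0.0, 1.0, 2.0, 4.0], [18.0, 18.5, 17.5, 18.0])
```

Each key in the returned dictionary could become one more column for that object, which is one way to read "growing columns of tables".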
Missing stat and CS tools
• Bootstrap aggregating
• Mixture of experts
• Boosting
• Simulated annealing
• Semi-supervised learning
• …
From IVOA KDD User guide for Data Mining (Nick Ball)
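To make the first item in the list concrete, here is a minimal sketch of bootstrap aggregating (bagging) in plain Python. The base learner, a one-feature threshold "stump", and all function names are assumptions for illustration. They are not taken from the IVOA guide or from any of the tools discussed.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-feature threshold classifier: predict 1 if x > threshold."""
    best_threshold, best_errors = None, None
    for threshold in sorted(set(xs)):
        errors = sum((x > threshold) != bool(y) for x, y in zip(xs, ys))
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagged_stumps(xs, ys, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    for _ in range(n_models):
        # Bootstrap: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds


def predict(thresholds, x):
    """Aggregate by majority vote over the individual stumps."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Toy, linearly separable data (illustrative only).
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
models = bagged_stumps(xs, ys)
```

The same resample-then-vote pattern applies to any base classifier; the stump is used here only to keep the sketch short.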
Science goal: to solve the growing gap between the huge generation of data and our understanding of it
Data Gathering (e.g., new generation instruments …)
Data Farming:
• Storage/Archiving
• Indexing, Searchability
• Data Fusion, Interoperability, ontologies, etc.
Data Mining (or Knowledge Discovery in Databases):
• Pattern or correlation search
• Clustering analysis, automated classification
• Outlier / anomaly searches
• Hyperdimensional visualization
Data visualization and understanding:
• Computer aided understanding
• KDD
• Etc.
New Knowledge
Data storage: Pbytes
Data access: >10^3 access
Scalability: Petaflops, Exaflops
Computing power (multicore)
Algorithm: parallelism
Visualization: N-dimensional
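One of the data mining tasks listed above is the outlier / anomaly search. A minimal one-dimensional sketch follows, assuming a modified z-score based on the median absolute deviation (MAD) with the commonly used cutoff of 3.5. The function name and the choice of method are illustrative assumptions, not something the slides prescribe.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values are identical; the score is undefined.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard
    # z-score when the data are normally distributed.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Toy magnitudes with one obvious outlier (illustrative only).
flagged = mad_outliers([10.0, 10.1, 9.9, 10.2, 9.8, 15.0])
```

Median-based scores are less distorted by the outliers themselves than mean and standard deviation, which is why they are a common first pass.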
Currently on the plate
• DAME
• Knime (Konstanz Information Miner)
• Orange (Visual/python)
• Weka (ML/Java)
• Rapidminer (standalone)
Comparison matrix for DM/Viz tools
Accuracy, Scalability, Interpretability, Usability, Robustness, Versatility, Speed, Popularity
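One way such a matrix could be turned into a ranking is a weighted average over the criteria. The sketch below is an assumption about how the comparison might be scored, not the project's method. The tool names `tool_a` and `tool_b` and all scores are placeholders, not assessments of any real package.

```python
CRITERIA = [
    "accuracy", "scalability", "interpretability", "usability",
    "robustness", "versatility", "speed", "popularity",
]


def rank_tools(scores, weights=None):
    """Rank tools by the weighted mean of their per-criterion scores."""
    if weights is None:
        weights = {c: 1.0 for c in CRITERIA}  # equal weights by default
    total_weight = sum(weights[c] for c in CRITERIA)
    totals = {
        tool: sum(weights[c] * row[c] for c in CRITERIA) / total_weight
        for tool, row in scores.items()
    }
    # Highest weighted score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Placeholder scores on an arbitrary 1-5 scale (hypothetical tools).
placeholder = {
    "tool_a": dict.fromkeys(CRITERIA, 3),
    "tool_b": {**dict.fromkeys(CRITERIA, 2), "speed": 5},
}
equal_ranking = rank_tools(placeholder)

# A real-time use case might weight speed much more heavily.
speed_weights = {**{c: 1.0 for c in CRITERIA}, "speed": 10.0}
speed_ranking = rank_tools(placeholder, speed_weights)
```

With equal weights `tool_a` ranks first; with speed weighted heavily `tool_b` does. The ranking depends on the weights as much as on the scores.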
Related activities
• Skyalert integration (Graham) – adding data and methods
• Solicitation of examples from community
  • WD, Blazars’ example
• Making R more astronomy friendly
• Various datasets
  • Differing number of rows, columns
  • For supervised/unsupervised classification
• TA on GPUs – incorporate in pipeline
Slide from Budavari
CUDA zone, PyCUDA, …
VAO People working on this
• Ashish Mahabal, Ciro Donalek, Matthew Graham, George Djorgovski (Caltech)
• Ray Plante (NCSA)
• We are also in touch with many others in astro/CS/stats and rely on many groups, including the LSST transients and informatics working groups