Data Mining and Virtual Observatory

Yanxia Zhang, National Astronomical Observatories, CAS, Dec. 2, 2004


Page 1: Data Mining and Virtual Observatory

Data Mining and Virtual Observatory

Yanxia Zhang

National Astronomical Observatories,CAS

Dec. 2, 2004

Page 2: Data Mining and Virtual Observatory

Outline

Why

What

How

Page 3: Data Mining and Virtual Observatory

Astronomy is Facing a Major “Data Avalanche”:

Multi-Terabyte Sky Surveys and Archives (Soon: Multi-Petabyte), Billions of Detected Sources, Hundreds of Measured Attributes per Source …

Page 4: Data Mining and Virtual Observatory

Understanding of Complex Astrophysical Phenomena Requires Complex and Information-Rich Data Sets, and the Tools to Explore Them …

… This Will Lead to a Change in the Nature of the Astronomical Discovery Process …

… Which Requires a New Research Environment for Astronomy: the Virtual Observatory (VO)

Necessity Is the Mother of Invention

Page 5: Data Mining and Virtual Observatory

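DM: Confluence of Multiple Disciplines

[Diagram: data mining (DM) at the centre, surrounded by the disciplines it draws on]

Database systems, data warehouses, OLAP
Statistics
Machine learning (ML) and AI
Visualization
Information science
Other disciplines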

Page 6: Data Mining and Virtual Observatory

What is DM?

The search for interesting patterns,

in large databases,

that were collected for other applications,

using machine learning algorithms,

high-performance computers

and other methods

for science and society!

Page 7: Data Mining and Virtual Observatory

Data Mining: A KDD Process

Data mining: the core of the knowledge discovery process.

[Diagram: the KDD process as a flow]

Databases
→ Data Cleaning and Data Integration
→ Data Warehouse
→ Selection
→ Task-relevant Data
→ Data Mining
→ Pattern Evaluation
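The stages named on this slide can be sketched as a chain of small functions. This is only an illustrative sketch, not the system the talk describes: the two toy catalogues, the attribute names (`mag`, `flux`), the brightness cut at 18 mag and the `min_support` threshold are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy catalogues standing in for two source databases.
optical = [
    {"id": 1, "mag": 17.2},
    {"id": 2, "mag": None},   # missing measurement, removed by cleaning
    {"id": 3, "mag": 21.5},
    {"id": 4, "mag": 16.8},
]
infrared = [
    {"id": 1, "flux": 3.1},
    {"id": 3, "flux": 0.4},
    {"id": 4, "flux": 2.7},
]

def clean(rows):
    """Data cleaning: drop records with missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def integrate(left, right, key="id"):
    """Data integration: join two cleaned sources on a shared key."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

def select(rows, columns):
    """Selection: keep only the task-relevant attributes."""
    return [{c: r[c] for c in columns} for r in rows]

def mine(rows):
    """Data mining: a deliberately trivial pattern, counts of bright vs faint sources."""
    return Counter("bright" if r["mag"] < 18.0 else "faint" for r in rows)

def evaluate(patterns, min_support=2):
    """Pattern evaluation: keep patterns seen at least min_support times."""
    return {p: n for p, n in patterns.items() if n >= min_support}

warehouse = integrate(clean(optical), clean(infrared))   # the "data warehouse"
task_data = select(warehouse, ["mag", "flux"])           # task-relevant data
patterns = evaluate(mine(task_data))
print(patterns)
```

Each function maps onto one box of the flow, so a real cleaning or mining step could be swapped in without touching the others.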

Page 8: Data Mining and Virtual Observatory

Data Mining

[Diagram: a pyramid of layers; the potential to support decisions increases toward the top]

Layers, top to bottom:

Knowledge Discovery
Data Presentation: Visualization Techniques
Data Mining: Information Discovery
Data Exploration: OLAP, MDA, Statistical Analysis, Querying and Reporting
Data Warehouses / Data Marts
Data Sources (Paper, Files, Information Providers, Database Systems, OLTP)

Roles, top to bottom: End User, Scientist Analyst, Data Analyst, DBA

Page 9: Data Mining and Virtual Observatory

Architecture: Typical Data Mining System

[Diagram: layered architecture, listed from bottom to top]

Databases and Data Warehouse
Data cleaning & data integration; filtering
Database or data warehouse server
Data mining engine
Pattern evaluation
Graphical user interface

Knowledge base

Page 10: Data Mining and Virtual Observatory

The Ratio of Every DM Step

[Bar chart: ratio of each step, vertical axis from 0 to 60; bars for Decide target, Data preparing, Data mining, Evaluation]

Page 11: Data Mining and Virtual Observatory

DM: On What Kind of Data?

Relational databases
Data warehouses
Transactional databases
Advanced DB systems and information repositories:
  Object-oriented and object-relational databases
  Spatial databases
  Time-series data and temporal data
  Text databases and multimedia databases
  Heterogeneous and legacy databases
  WWW

Page 12: Data Mining and Virtual Observatory

Data Mining Functionality

Concept description

Association

Classification and Prediction

Clustering

Time-series analysis

Other pattern-directed or statistical analysis

Page 13: Data Mining and Virtual Observatory

Taking a Broader View: The Observable Parameter Space

[Diagram: axes of the observable parameter space: RA, Dec, Wavelength, Time, Flux, Proper motion, Polarization, Morphology / Surface brightness, Non-EM …]

Along each axis the measurements are characterized by the position, extent, sampling and resolution. All astronomical measurements span some volume in this parameter space.

What is the coverage?
Where are the gaps?
Where do we go next?
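One way to make the coverage and gap questions concrete is to record, per axis, the four quantities the slide names and compare surveys axis by axis. The sketch below is an assumption-laden illustration: the class name `AxisCoverage`, the helper functions, the two surveys and their numbers are all hypothetical, and "volume" is taken simply as the product of extents.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AxisCoverage:
    """Coverage of one axis: position (centre), extent (full width),
    sampling (step between measurements) and resolution."""
    position: float
    extent: float
    sampling: float
    resolution: float

    @property
    def bounds(self):
        half = self.extent / 2.0
        return (self.position - half, self.position + half)

def overlap(a, b):
    """Length of the interval shared by two coverages on the same axis."""
    (a_lo, a_hi), (b_lo, b_hi) = a.bounds, b.bounds
    return max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))

def gap(a, b):
    """Width of the uncovered interval between two coverages (0 if they touch or overlap)."""
    (a_lo, a_hi), (b_lo, b_hi) = a.bounds, b.bounds
    return max(0.0, max(a_lo, b_lo) - min(a_hi, b_hi))

def volume(axes):
    """Parameter-space volume of a measurement: product of the extents of its axes."""
    result = 1.0
    for coverage in axes.values():
        result *= coverage.extent
    return result

# Hypothetical surveys, described on two axes only (units are illustrative).
survey_a = {
    "wavelength_um": AxisCoverage(position=0.6, extent=0.4, sampling=0.1, resolution=0.1),
    "time_yr": AxisCoverage(position=1990.0, extent=10.0, sampling=1.0, resolution=0.01),
}
survey_b = {
    "wavelength_um": AxisCoverage(position=2.0, extent=1.0, sampling=0.5, resolution=0.3),
    "time_yr": AxisCoverage(position=1996.0, extent=4.0, sampling=0.5, resolution=0.01),
}

for axis in survey_a:
    print(axis,
          "overlap:", overlap(survey_a[axis], survey_b[axis]),
          "gap:", gap(survey_a[axis], survey_b[axis]))
```

In this toy case the two surveys overlap in time but leave a stretch of the wavelength axis uncovered, which is the kind of gap the slide asks about.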

Page 14: Data Mining and Virtual Observatory

How and Where are Discoveries Made?

Conceptual Discoveries: e.g., Relativity, QM, Brane World, Inflation …
Theoretical, may be inspired by observations

Phenomenological Discoveries: e.g., Dark Matter, QSOs, GRBs, CMBR, Extrasolar Planets, Obscured Universe …
Empirical, inspire theories, can be motivated by them

[Diagram: New Technical Capabilities, Observational Discoveries, Theory; IT/VO (VO)]

Phenomenological Discoveries:

Pushing along some parameter space axis → VO useful
Making new connections (e.g., multi-wavelength) → VO critical!

Understanding of complex astrophysical phenomena requires complex, information-rich data (and simulations?)

Page 15: Data Mining and Virtual Observatory

Exploration of observable parameter spaces and searches for rare or new types of objects

Page 16: Data Mining and Virtual Observatory

But Sometimes You Find a Surprise…

Page 17: Data Mining and Virtual Observatory

Precision Cosmology and LSS
Better matching of theory and observations

[Images: DPOSS Clusters (Gal et al.); LSS Numerical Simulation (VIRGO)]

Clustering on a clustered background
Clustering with a nontrivial topology

Page 18: Data Mining and Virtual Observatory

Exploration of the Time Domain: Optical Transients

A Possible Example of an “Orphan Afterglow” (GRB?) discovered in DPOSS: an 18th mag transient associated with a 24.5 mag galaxy. At an estimated z ~ 1, the observed brightness is ~ 100 times that of a SN at the peak.

Or, is it something else, new?

[Images: DPOSS; Keck]
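The brightness figures quoted on this slide can be related through the standard magnitude-flux relation. The worked lines below are a sketch added for orientation, not part of the original slide; the 23 mag value for the supernova is only what the quoted factor of 100 implies, not a measured number.

```latex
\[
m_1 - m_2 = -2.5 \log_{10}\!\left(\frac{F_1}{F_2}\right)
\quad\Longleftrightarrow\quad
\frac{F_1}{F_2} = 10^{-0.4\,(m_1 - m_2)}
\]
\[
\frac{F_\mathrm{transient}}{F_\mathrm{SN}} \approx 100
\;\Rightarrow\;
m_\mathrm{SN} - m_\mathrm{transient} = 2.5 \log_{10} 100 = 5
\;\Rightarrow\;
m_\mathrm{SN} \approx 18 + 5 = 23
\]
\[
\frac{F_\mathrm{transient}}{F_\mathrm{galaxy}} = 10^{0.4\,(24.5 - 18)} = 10^{2.6} \approx 400
\]
```

So a factor of 100 in flux corresponds to 5 magnitudes, and the transient outshines its 24.5 mag host galaxy by a factor of roughly 400.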

Page 19: Data Mining and Virtual Observatory

Exploration of the Time Domain: Faint, Fast Transients (Tyson et al.)

Page 20: Data Mining and Virtual Observatory

Exploring the Low Surface Brightness (Low Contrast) Universe

Comparison between HI, Hα, and 100 μm Diffuse Emission

[Images: IRAS 100 Micron Image; DPOSS red image]

Brunner et al.

Page 21: Data Mining and Virtual Observatory

Background Enhancement Technique demonstrated on two known M31 dwarf spheroidals

(Brunner et al.)

Page 22: Data Mining and Virtual Observatory

Data Mining in the Image Domain: Can We Discover New Types of Phenomena Using Automated Pattern Recognition? (Every object detection algorithm has its biases and limitations)

Page 23: Data Mining and Virtual Observatory

An OLAM Architecture

[Diagram: four-layer OLAM architecture, listed from top to bottom]

Layer 4, User Interface: User GUI API; mining query, mining result
Layer 3, OLAP/OLAM: OLAM Engine, OLAP Engine
Data Cube API
Layer 2, MDDB: MDDB, Meta Data
Database API; Filtering & Integration
Layer 1, Data Repository: Databases, Data Warehouse; data cleaning, data integration, filtering

Page 24: Data Mining and Virtual Observatory

View of Warehouses and Hierarchies

Importing data
Table browsing
Dimension creation
Dimension browsing
Cube building
Cube browsing

Page 25: Data Mining and Virtual Observatory

Selecting a Data Mining Task

Major data mining functions:

Summary (Characterization)
Association
Classification
Prediction
Clustering
Time-Series Analysis

Page 26: Data Mining and Virtual Observatory

Mining Characteristic Rules

Characterization: data generalization/summarization at high abstraction levels.

An example query: find a characteristic rule for Cities from the database 'CITYDATA' in relevance to location, capita_income, and the distribution of count% and amount%.
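A characteristic rule of this kind amounts to generalizing each tuple to a higher-level concept and then summarizing the groups. The sketch below illustrates that idea only; it is not the query language or system used in the talk. The rows, the income threshold, and the reading of `amount` as population are assumptions, since the slide does not define them.

```python
from collections import defaultdict

# Hypothetical rows standing in for the Cities relation of 'CITYDATA'.
# The meaning of "amount" is not given on the slide; population is an assumption.
cities = [
    {"name": "A", "location": "East", "capita_income": 31000, "amount": 900_000},
    {"name": "B", "location": "East", "capita_income": 18000, "amount": 300_000},
    {"name": "C", "location": "West", "capita_income": 33000, "amount": 500_000},
    {"name": "D", "location": "East", "capita_income": 35000, "amount": 300_000},
]

def income_level(value, threshold=25000):
    """Generalize a numeric income to a higher-level concept."""
    return "high" if value >= threshold else "low"

def characterize(rows):
    """Group the generalized tuples and report count% and amount% per group."""
    count = defaultdict(int)
    amount = defaultdict(int)
    for row in rows:
        key = (row["location"], income_level(row["capita_income"]))
        count[key] += 1
        amount[key] += row["amount"]
    total_count = sum(count.values())
    total_amount = sum(amount.values())
    return {
        key: {
            "count%": 100.0 * count[key] / total_count,
            "amount%": 100.0 * amount[key] / total_amount,
        }
        for key in count
    }

rule = characterize(cities)
for key, stats in sorted(rule.items()):
    print(key, stats)
```

Each output row reads as one generalized tuple of the rule: a (location, income level) pair with its share of the cities and its share of the total amount.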

Page 27: Data Mining and Virtual Observatory

Browsing a Data Cube

Powerful visualization
OLAP capabilities
Interactive manipulation

Page 28: Data Mining and Virtual Observatory

Visualization of Data Dispersion: Boxplot Analysis

Page 29: Data Mining and Virtual Observatory

Mining Association Rules (Table Form)

Page 30: Data Mining and Virtual Observatory

Association Rule in Plane Form

Page 31: Data Mining and Virtual Observatory

Association Rule Graph

Page 32: Data Mining and Virtual Observatory

Mining Classification Rules

Page 33: Data Mining and Virtual Observatory

Prediction: Numerical Data

Page 34: Data Mining and Virtual Observatory

Prediction: Categorical Data

Page 35: Data Mining and Virtual Observatory

DMiner: Architecture

[Diagram: layered architecture, listed from top to bottom]

Graphic User Interface
Modules: Characterizer, Comparator, Classifier, Associator, Cluster Analyzer, Future Modules
Database and Cube Server
Databases: Optical DB, Infrared DB, Radio DB, … DB

Page 36: Data Mining and Virtual Observatory

A System Prototype for MultiMedia Data Mining

Simon Fraser University

[Diagram labels]

WWW
Image features
Keywords
WordNet
Keyword Hierarchy
Metadata
Internet Domain Hierarchy
Pre-built Concept Hierarchies for colour, texture, format, etc.
Pre-processing
Data Cubes and Numeric Hierarchies
Real-time Interaction
Pattern discoveries

Page 37: Data Mining and Virtual Observatory

[Diagram labels: WWW, Media Descriptors, Data Cube Dimensions, Mining Engine, Discoveries, Database]

Simon Fraser University

Page 38: Data Mining and Virtual Observatory

WebLogMiner Architecture

Web log is filtered to generate a relational database
A data cube is generated from the database
OLAP is used to drill down and roll up in the cube
OLAM is used for mining interesting knowledge

[Diagram: Web log → (1) Data Cleaning → Database → (2) Data Cube Creation → Data Cube → (3) OLAP → Sliced and diced cube → (4) Data Mining → Knowledge]
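The four numbered steps can be sketched end to end on a toy log. This is a minimal illustration, not WebLogMiner's actual implementation: the log format, the choice of cube dimensions, the image filter and the "most requested resource" mining step are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical, simplified log lines: "host date hour resource status".
RAW_LOG = """\
h1 2004-12-01 09 /index.html 200
h2 2004-12-01 09 /logo.gif 200
h1 2004-12-01 10 /data/catalog.html 200
h3 2004-12-02 10 /missing.html 404
malformed line
h2 2004-12-02 11 /index.html 200
"""

DIMENSIONS = ("date", "hour", "resource", "status")

def clean(raw):
    """Step 1: filter the log into relational tuples, dropping bad lines and image hits."""
    rows = []
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue
        host, date, hour, resource, status = parts
        if resource.endswith((".gif", ".jpg", ".png")):
            continue
        rows.append((host, date, int(hour), resource, int(status)))
    return rows

def build_cube(rows):
    """Step 2: count hits per (date, hour, resource, status) cell."""
    return Counter((date, hour, resource, status)
                   for _, date, hour, resource, status in rows)

def roll_up(cube, keep):
    """Step 3a: aggregate the cube onto a subset of its dimensions."""
    idx = [DIMENSIONS.index(d) for d in keep]
    out = Counter()
    for cell, hits in cube.items():
        out[tuple(cell[i] for i in idx)] += hits
    return out

def slice_cube(cube, dimension, value):
    """Step 3b: keep only the cells where one dimension has a fixed value."""
    i = DIMENSIONS.index(dimension)
    return Counter({cell: hits for cell, hits in cube.items() if cell[i] == value})

def mine(cube):
    """Step 4: a trivial mining step, the most requested resource among successful hits."""
    ok = slice_cube(cube, "status", 200)
    return roll_up(ok, ["resource"]).most_common(1)[0]

rows = clean(RAW_LOG)
cube = build_cube(rows)
print(roll_up(cube, ["date"]))   # hits per day
print(mine(cube))
```

Drilling down is the reverse of `roll_up`: it means returning to a cube that keeps more dimensions, for example from hits per day to hits per day and hour.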

Page 39: Data Mining and Virtual Observatory

VO: Conceptual Architecture

[Diagram labels: User, Gateway, Data Archives, Analysis tools, Discovery tools]

Page 40: Data Mining and Virtual Observatory

Conclusion

◆ Development and application of DM in astronomy;

◆ Automated DM, visualized DM and audio DM;

◆ Integrate VO and DM.

The next golden age of discovery in astronomy will come earlier!

Page 41: Data Mining and Virtual Observatory

Q&A?

Thank you!!!