Grid-based Data Stream Processing in e-Science

Post on 07-Jan-2016

Lehrstuhl Informatik III: Datenbanksysteme

Grid-based Data Stream Processing in e-Science

Richard Kuntschke¹, Tobias Scholl¹, Sebastian Huber¹, Alfons Kemper¹, Angelika Reiser¹, Hans-Martin Adorf², Gerard Lemson³, and Wolfgang Voges³

¹ Lehrstuhl Informatik III: Datenbanksysteme, Fakultät für Informatik, Technische Universität München

² Max-Planck-Institut für Astrophysik

³ Max-Planck-Institut für extraterrestrische Physik

Important Challenges in e-Science

In general:

Large and exponentially growing amounts of data

Distributed data archives

No unique identifiers

Uncertainty

In astrophysics: Spectral Energy Distributions (SEDs)

Used to classify celestial objects (active galactic nuclei, brown dwarfs, neutron stars, ...)

Generation requires spatial (astrometric) matching

Spatial (Astrometric) Matching

Current solutions …

… load all data into main memory

Uses a lot of memory

Infeasible if memory size is insufficient

… process all data at once and deliver the complete result at the end

Inefficient

No results until all processing has completed
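The difference between delivering the complete result at the end and delivering results as they are found can be sketched in a few lines. This is an illustrative Python sketch, not StarGlobe code: the `Source` tuple, the function names, and the toy predicate `near_origin` are all assumptions made for the example.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Source = Tuple[float, float]  # hypothetical (ra, dec) position in degrees


def match_batch(sources: Iterable[Source],
                is_match: Callable[[Source], bool]) -> List[Source]:
    """Batch style: materialize the whole input, return everything at the end."""
    all_sources = list(sources)  # the entire input is held in memory
    return [s for s in all_sources if is_match(s)]


def match_streaming(sources: Iterable[Source],
                    is_match: Callable[[Source], bool]) -> Iterator[Source]:
    """Pipelined style: yield each match as soon as it is found."""
    for s in sources:  # only the current element is held
        if is_match(s):
            yield s


def near_origin(s: Source) -> bool:
    # Toy predicate (assumption): keep sources within 1 degree of 0 on each axis.
    return abs(s[0]) < 1.0 and abs(s[1]) < 1.0


consumed: List[Source] = []


def catalog() -> Iterator[Source]:
    """Toy input stream that records how many elements have been read so far."""
    for s in [(0.1, 0.2), (5.0, 5.0), (0.3, -0.4), (9.0, 1.0)]:
        consumed.append(s)
        yield s


stream = match_streaming(catalog(), near_origin)
first = next(stream)        # a first result after reading a single element
read_so_far = len(consumed)
```

In the streaming variant the first match is available while most of the input is still unread, whereas `match_batch` cannot return anything until the whole input has been consumed.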

Our Contributions

StarGlobe

Grid-based P2P Data Stream Management System implemented on top of Globus

In-network processing

Early filtering

Parallelization

Pipelining

Load balancing

Mobile user-defined operators

Astrophysical example workflow

Astrometric matching

Performance evaluation

The StarGlobe Architecture

[Figure: super-peer backbone. Stream 0 and Stream 1 are published; Query 1 and Query 2 subscribe. Filter and transform operators run inside the network, and mobile operators are loaded from a function provider (Fct-Provider).]

Traditional Approach: Bring Data to Code

[Figure: data providers A, B, C, and D each hold a table T. The tables are combined by a single union operator followed by a single NN_10 operator.]

New Approach: Bring Code to Data

[Figure: at each of the data providers A, B, C, and D, a scan of the local table T feeds a local NN_10 operator. The partial results are combined by a union followed by a final NN_10. The NN_10 operator is supplied by the function provider (Fct-Provider).]
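The two plans can be compared with a small sketch. It assumes that NN_10 denotes a 10-nearest-neighbor operator, which the slides do not spell out, and it uses planar Euclidean distance on made-up points instead of real catalog data. All function names are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # hypothetical planar coordinates


def nn_k(points: List[Point], query: Point, k: int) -> List[Point]:
    """Return the k points closest to `query`, nearest first."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


def bring_data_to_code(providers: Dict[str, List[Point]], query: Point, k: int):
    """Traditional plan: ship every tuple, then union and NN_k at one site."""
    shipped = [p for table in providers.values() for p in table]
    return nn_k(shipped, query, k), len(shipped)


def bring_code_to_data(providers: Dict[str, List[Point]], query: Point, k: int):
    """New plan: run NN_k at each provider and ship only its k candidates."""
    partial = [p for table in providers.values() for p in nn_k(table, query, k)]
    return nn_k(partial, query, k), len(partial)


# Four toy providers A-D with 25 points each, interleaved along the x axis.
providers = {
    name: [(float(i), 0.0) for i in range(offset, 100, 4)]
    for offset, name in enumerate("ABCD")
}

central, shipped_central = bring_data_to_code(providers, (0.0, 0.0), 10)
pushed, shipped_pushed = bring_code_to_data(providers, (0.0, 0.0), 10)
```

Both plans return the same neighbors, because every point in the global top k is also in the local top k of the provider that holds it. In this toy setup the second plan ships 40 tuples instead of 100.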

Mobile User-Defined Operators

Load user-defined operators from function provider servers in the network

Common interface for integrating external operators

Push-based iterator

Flexibility

StreamIterator Interface

open(Config, StreamWriter)

Configuration parameters

Writer for result stream

next(StreamIteratorEvent)

Next element in input stream

Writing output to result stream using StreamWriter.write()

close()
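A push-based operator with these three calls might look as follows. The sketch is written in Python for brevity and is not the StarGlobe API: only the method names `open`, `next`, `close`, and `write` come from the slide, while the `item` field, the dictionary used as configuration, and the `ThresholdFilter` operator are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class StreamWriter:
    """Collects result items; stands in for the writer of the result stream."""

    def __init__(self) -> None:
        self.items: List[Any] = []

    def write(self, item: Any) -> None:
        self.items.append(item)


class StreamIteratorEvent:
    """Wraps the next element of the input stream (field name is assumed)."""

    def __init__(self, item: Any) -> None:
        self.item = item


class StreamIterator(ABC):
    """Push-based operator interface with the three calls named on the slide."""

    @abstractmethod
    def open(self, config: Dict[str, Any], writer: StreamWriter) -> None: ...

    @abstractmethod
    def next(self, event: StreamIteratorEvent) -> None: ...

    @abstractmethod
    def close(self) -> None: ...


class ThresholdFilter(StreamIterator):
    """Hypothetical user-defined operator: forwards values >= a threshold."""

    def open(self, config: Dict[str, Any], writer: StreamWriter) -> None:
        self.threshold = config["threshold"]
        self.writer = writer

    def next(self, event: StreamIteratorEvent) -> None:
        if event.item >= self.threshold:
            self.writer.write(event.item)

    def close(self) -> None:
        self.writer = None  # release the writer; nothing else to clean up


writer = StreamWriter()
op = ThresholdFilter()
op.open({"threshold": 10}, writer)
for value in [3, 12, 7, 25]:
    op.next(StreamIteratorEvent(value))  # the engine pushes each element
op.close()
```

Unlike a pull-based iterator, `next` here receives an element instead of returning one, so the operator decides for each input whether to write zero, one, or several outputs.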

Communication between StreamProcessor and StreamIterator

[Figure: the StreamProcessor reads XML InputStream 1 to n through StreamHandler 1 to n and passes Item 1 to Item n to the StreamIterator. The StreamIterator emits each Result Item through the StreamWriter to the XML OutputStream.]
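The interplay of these components could be sketched roughly as below. The class names follow the labels in the figure, but their methods, the `<stream>`, `<item>`, and `<results>` element names, and the `UpperCaseIterator` operator are assumptions. For simplicity the sketch parses each input document in full and drains the inputs one after another; a real stream processor would parse incrementally and interleave its inputs.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, List


class StreamWriter:
    """Appends result items to one XML output document."""

    def __init__(self) -> None:
        self.root = ET.Element("results")

    def write(self, item: ET.Element) -> None:
        self.root.append(item)

    def to_xml(self) -> str:
        return ET.tostring(self.root, encoding="unicode")


class StreamHandler:
    """Parses one XML input stream and yields its <item> elements."""

    def __init__(self, xml_text: str) -> None:
        self.xml_text = xml_text

    def items(self) -> Iterator[ET.Element]:
        yield from ET.fromstring(self.xml_text).iter("item")


class UpperCaseIterator:
    """Hypothetical operator with open/next/close; upper-cases item text."""

    def open(self, config: dict, writer: StreamWriter) -> None:
        self.writer = writer

    def next(self, item: ET.Element) -> None:
        result = ET.Element("item")
        result.text = (item.text or "").upper()
        self.writer.write(result)

    def close(self) -> None:
        pass


class StreamProcessor:
    """Feeds the items of every input stream to one iterator."""

    def __init__(self, handlers: List[StreamHandler],
                 iterator: UpperCaseIterator, writer: StreamWriter) -> None:
        self.handlers = handlers
        self.iterator = iterator
        self.writer = writer

    def run(self) -> str:
        self.iterator.open({}, self.writer)
        for handler in self.handlers:      # simplification: no interleaving
            for item in handler.items():
                self.iterator.next(item)
        self.iterator.close()
        return self.writer.to_xml()


inputs = ["<stream><item>a</item><item>b</item></stream>",
          "<stream><item>c</item></stream>"]
processor = StreamProcessor([StreamHandler(x) for x in inputs],
                            UpperCaseIterator(), StreamWriter())
output = processor.run()
```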

Astrophysical Example Workflow

[Figure: workflow across peer-0 to peer-10. Labels: Input List, RASS-BSC, 2MASS, FIRST, USNOB1, NVSS, GSC-2, SED assembly.]

Distributed Query Evaluation Plan

plan-10 at peer-10: χ² filter-4, join-4, display

plan-9 at peer-9: χ² filter-3, join-3

plan-8 at peer-8: χ² filter-2, join-2

plan-7 at peer-7: χ² filter-1, join-1

plan-6 at peer-6: χ² filter-0, join-0

plan-5 at peer-5: enrich σ-5, transform-5, stream-5

plan-4 at peer-4: enrich σ-4, transform-4, stream-4

plan-3 at peer-3: enrich σ-3, transform-3, stream-3

plan-2 at peer-2: enrich σ-2, transform-2, stream-2

plan-1 at peer-1: enrich σ-1, transform-1, stream-1

plan-0 at peer-0: enrich σ-0, transform-0, stream-0

Distributed Query Evaluation Plan

plan-6 at peer-6: χ² filter-0, join-0

plan-1 at peer-1: enrich σ-1, transform-1, stream-1

plan-0 at peer-0: enrich σ-0, transform-0, stream-0
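The slides do not define the join or the χ² filter. One plausible reading, assumed here, is that the join pairs sources from two streams that lie within a coarse search window, and the χ² filter then keeps only pairs whose separation is consistent with their positional uncertainties σ. The sketch below follows that reading. The `Source` fields, the window size, the threshold of 9.0, and the χ² formula (squared small-angle separation divided by the summed variances) are all assumptions, and right-ascension wraparound is ignored.

```python
import math
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Source(NamedTuple):
    name: str
    ra: float     # right ascension in degrees
    dec: float    # declination in degrees
    sigma: float  # positional uncertainty in degrees (assumed meaning of σ)


Pair = Tuple[Source, Source]


def join(left: Iterable[Source], right: Iterable[Source],
         radius: float) -> Iterator[Pair]:
    """Nested-loop join on a coarse window of `radius` degrees per axis."""
    right_list = list(right)
    for a in left:
        for b in right_list:
            if abs(a.ra - b.ra) <= radius and abs(a.dec - b.dec) <= radius:
                yield a, b


def chi2(a: Source, b: Source) -> float:
    """Squared separation (small-angle approximation) over summed variances."""
    d_ra = (a.ra - b.ra) * math.cos(math.radians((a.dec + b.dec) / 2))
    d_dec = a.dec - b.dec
    return (d_ra ** 2 + d_dec ** 2) / (a.sigma ** 2 + b.sigma ** 2)


def chi2_filter(pairs: Iterable[Pair], threshold: float) -> Iterator[Pair]:
    """Keep only pairs whose chi-squared value does not exceed the threshold."""
    for a, b in pairs:
        if chi2(a, b) <= threshold:
            yield a, b


stream_0 = [Source("x1", 10.0, 20.0, 0.001)]
stream_1 = [Source("o1", 10.0005, 20.0005, 0.001),  # close counterpart
            Source("o2", 10.01, 20.0, 0.001),       # inside window, too far
            Source("o3", 50.0, 50.0, 0.001)]        # outside window

candidates: List[Pair] = list(join(stream_0, stream_1, radius=0.02))
matches = [(a.name, b.name) for a, b in chi2_filter(candidates, threshold=9.0)]
```

Because both functions are generators, a pair can pass the filter and move on to the next peer as soon as it has been joined.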

Distributed Query Evaluation Plan

plan-10 at peer-10: χ² filter-4, join-4, display

Evaluation of Early Filtering

Conclusion

Synergies between research in computer science and other scientific disciplines, e.g., astrophysics

StarGlobe

Handling large data volumes efficiently: early filtering, parallelization, pipelining

Returning first results early on: pipelining

Flexible support of domain-specific application logic: mobile user-defined operators

Results also applicable to other domains
