Implementation of a streaming database management system on a Blue Gene architecture for measurement data processing. Erik Zeitler Uppsala data base lab www.it.uu.se/research/group/udbl



Looking out into space: Use large radio telescopes!

Problem: Size matters

We have hit the limit

Use many large radio telescopes?

Augment the measurements using signal processing
They act together as a HUGE telescope

• Look in one direction only
• Expensive…

Solution: Use a huge number of small antennas

This enables new scientific applications (and challenges)

• Broad band

• Multi direction receivers

Scientific applications

• Re-ionization epoch
  • the 1st 10^5 years – hydrogen forming

• Deep Extragalactic Surveys
  • To boldly go…

• Transient Sources
  • All-sky surveys of
    – gamma bursts
    – flare stars
    – supernovae

• Ultra High Energy Cosmic Rays
• Pulsars

Antennas, antennas, antennas…

• Broad band radio receiver
  • 80…300 MHz, 3 dimensions

• Produces 0.9 Gbps raw data

• Central site + 20 outstations located within a circular area, diameter 350 km

13103 antennas

System overview

• Antennas
  • Basic beam forming
  • FPGAs

• Network
  • GbE, 10GbE

• Central Processing facility
  • Linux clusters, IBM Blue Gene/L

• Off line analysis
  • PCs, workstations, Blue Gene


Central processing tasks

• FFT

• Signal correlation

• Calibration
  • RFI mitigation (noise from human activities)

• Stratosphere plasma

• Subtracting known objects

• Transient analysis
  • Peak detection
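
As an illustration of the "signal correlation" task listed above, here is a minimal sketch that estimates the sample delay between two antenna signals by direct time-domain cross-correlation. It is not the project's implementation: the function names and the test pulse are hypothetical, and a real correlator would most likely work on FFT output rather than on raw samples.

```python
def cross_correlate(a, b, max_lag):
    """Return {lag: sum over n of a[n] * b[n + lag]} for lags in [-max_lag, max_lag]."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for n, sample in enumerate(a):
            m = n + lag
            if 0 <= m < len(b):  # ignore samples that fall outside b
                total += sample * b[m]
        result[lag] = total
    return result


def estimate_delay(a, b, max_lag):
    """Return the lag (in samples) at which the correlation is largest."""
    corr = cross_correlate(a, b, max_lag)
    return max(corr, key=corr.get)


# Hypothetical data: the same short pulse arrives 3 samples later at antenna B.
pulse = [0.0, 1.0, 2.0, 1.0, 0.0]
antenna_a = pulse + [0.0] * 8
antenna_b = [0.0] * 3 + pulse + [0.0] * 5

delay = estimate_delay(antenna_a, antenna_b, max_lag=6)
print("estimated delay in samples:", delay)
```

The direct sum costs time proportional to the signal length times the number of lags, which is fine for a sketch but would be replaced by an FFT-based method at the data rates discussed here.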

Computing challenges

• Multiple incoming data streams
  • 20 Tbps

• Multiple experiments
• Complex computations

• Demand for rapid reconfiguration of computing systems
  • Use case: On-line transient analysis

Central processing facilities

• On line processing
  • Linux cluster (buffering)

• Light weight BG/L (beam)
  • 6 racks, 6144 compute nodes + 96 I/O nodes

• Off-line processing
  • Linux clusters, SAN, GRID, …

Blue Gene: Dataflow supercomputer

• LLNL installation: 64 racks (65536 CPUs)

70 TFLOPS in the footprint of a tennis court

BG/L architecture

• I/O node:
  • 2x PPC440@700MHz
  • Linux
  • Each I/O node coordinates 64 compute nodes
  • 512 MB RAM

• Compute node:
  • 2x PPC440@700MHz
  • Single threaded light weight OS
  • Typically:
    – 1 CPU for computation
    – 1 CPU for communication
  • 512 MB RAM
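
The node counts quoted for the 6-rack installation can be cross-checked against the per-node figures on this slide. The short calculation below uses only numbers from the slides; the variable names are illustrative.

```python
# Figures taken from the slides.
RACKS = 6
COMPUTE_NODES = 6144
IO_NODES = 96
CPUS_PER_NODE = 2        # 2x PPC440 per node
RAM_PER_NODE_MB = 512

compute_per_rack = COMPUTE_NODES // RACKS                      # compute nodes in one rack
compute_per_io = COMPUTE_NODES // IO_NODES                     # compute nodes served by one I/O node
total_compute_cpus = COMPUTE_NODES * CPUS_PER_NODE             # CPUs on compute nodes only
total_compute_ram_gb = COMPUTE_NODES * RAM_PER_NODE_MB / 1024  # aggregate compute-node RAM

print(compute_per_rack, compute_per_io, total_compute_cpus, total_compute_ram_gb)
```

The ratio of 64 compute nodes per I/O node matches the "Each I/O node coordinates 64 compute nodes" bullet. With one CPU per node typically reserved for communication, about half of the compute-node CPUs are available for computation.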

[Figure: dataflow diagram. Labels: (Scientist) user, User agent, Continuous query, Query result stream, BG/L dataflow computer, Incoming measurement data streams.]

UDBL project

• Implement a very high performance stream database manager
  • based on AmosII DB kernel (http://user.it.uu.se/~udbl/amos/)

• Utilize the BG/L computing environment for
  • scalable data stream queries
  • involving user-defined computations

• Implement specialized query optimization:
  • Planning BG/L node configuration for given stream queries
  • Re-configuration when interesting phenomena occur

Thus far (after 4 months)

• Implementing primitives for data:
  • Computation
  • Aggregation
  • Communication
  • Fusion

• Proof of concept cases
  • Signal processing
  • Peak detection
  • Stream join

• Benchmark
  • Based on real LOFAR/LOIS data
  • Performance analysis for stream databases

A simple example

• gnuplot(peakdetect(vector_elements(winagg(vector_elements(readlofarvectorfile("temp.DAT")),256,256))));
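
To make the nesting of the query above easier to follow, here is a rough Python analogue built from generators. It is a sketch under several assumptions and does not show how the system implements these functions: winagg(s, 256, 256) is assumed to mean windows of 256 elements advanced by a stride of 256, peakdetect is assumed to report strict local maxima, and the file reader and gnuplot sink are replaced by an in-memory list and list(). The example uses a window size of 4 to keep the data small.

```python
def vector_elements(vectors):
    """Flatten a stream of vectors into a stream of scalar elements."""
    for vec in vectors:
        yield from vec


def winagg(stream, size, stride):
    """Yield windows of `size` elements, advancing `stride` elements each time.

    Assumes 0 < stride <= size; an incomplete trailing window is dropped.
    """
    if not 0 < stride <= size:
        raise ValueError("this sketch requires 0 < stride <= size")
    window = []
    for x in stream:
        window.append(x)
        if len(window) == size:
            yield list(window)
            window = window[stride:]


def peakdetect(stream):
    """Yield (index, value) for samples strictly greater than both neighbours."""
    prev2 = prev1 = None
    for i, x in enumerate(stream):
        if prev2 is not None and prev1 > prev2 and prev1 > x:
            yield (i - 1, prev1)
        prev2, prev1 = prev1, x


# Hypothetical stand-in for readlofarvectorfile("temp.DAT"): a stream of vectors.
vectors = [[0, 1, 0, 2], [5, 2, 0, 3], [1, 0]]

peaks = list(
    peakdetect(vector_elements(winagg(vector_elements(vectors), 4, 4)))
)
print(peaks)
```

Because every stage is a generator, elements flow through the pipeline one at a time and no stage needs the whole stream in memory, which is the property a continuous query relies on.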

Other application areas

• Other space physics research areas
  • projects at IRFU

• Network traffic analysis

• Financial (stock market) information

• Content analysis of streaming media

Questions?