Scientific Data Management (SDM) Center For Enabling Technologies (CET)
Lead Institution: LBNL
Coordinating PI: Arie Shoshani

Post on 28-Dec-2015


Page 1

Scientific Data Management (SDM)
Center For Enabling Technologies (CET)

Lead Institution: LBNL
Coordinating PI: Arie Shoshani

Page 2

Scientific Data Management Center

Center PI: Arie Shoshani (LBNL)

DOE Laboratories co-PIs:
• Bill Gropp, Rob Ross* (ANL)
• Arie Shoshani, Doron Rotem (LBNL)
• Terence Critchlow*, Chandrika Kamath (LLNL)
• Nagiza Samatova* (ORNL)

Universities co-PIs:
• Mladen Vouk (North Carolina State)
• Alok Choudhary (Northwestern)
• Bertram Ludaescher, Ilkay Altintas (UC Davis + SDSC)
• Steve Parker (U of Utah)

* Area Leaders

Participating Institutions

Page 3

A Typical SDM Scenario

[Diagram: a layered view of a typical scenario, with a Flow Tier and a Work Tier]
• Control Flow Layer: Task A: Generate Time-Steps; Task B: Move TS; Task C: Analyze TS; Task D: Visualize TS
• Applications & Software Tools Layer: Simulation Program; DataMover; Parallel R; Post Processing; Terascale Browser
• I/O System Layer: Parallel NetCDF; PVFS; Sabul; HDF5 Libraries; SRM
• Storage & Network Resources Layer

Page 4

Technology Details by Layer

[Diagram: technologies grouped by layer]
• Scientific Process Automation (SPA) Layer: Workflow Management Tools; Web Wrapping Tools
• Data Mining & Analysis (DMA) Layer: Efficient Parallel Visualization (pVTK); Efficient Indexing (Bitmap Index); Data Analysis Tools (PCA, ICA); Parallel R Statistical Analysis; ASPECT: Integration Framework
• Storage Efficient Access (SEA) Layer: Parallel NetCDF Software Layer; Parallel Virtual File System; Storage Resource Manager (to HPSS); ROMIO MPI-IO System
• Hardware, OS, and MSS (HPSS)

Page 5

Example Data Flow in TSI

[Diagram: data flow, with these labeled elements]
• Input Data
• Highly Parallel Compute
• Output: ~500x500 files
• Aggregate to ~500 files (< 2 to 10+ GB each)
• Archive
• Data Depot
• Logistical Network (L-Bone)
• Local Mass Storage (14+ TB)
• Aggregate to one file (1+ TB each)
• Local 44-processor data cluster (data sits on local nodes for weeks)
• Viz Software, Viz Client, Viz Wall

Courtesy: John Blondin

Page 6

Using the Scientific Workflow Tool (Kepler), Emphasizing Dataflow (SDSC, NCSU, LLNL)

Automate data generation, transfer and visualization of a large-scale simulation at ORNL
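The dataflow idea can be illustrated with a minimal Python sketch. This is not Kepler code (Kepler is a separate graphical workflow tool); it only shows stages connected so that each consumes the output of the previous one. The stage names follow Tasks A to D from the scenario slide, while the record fields, the `analysis-site` label, and the `run_pipeline` helper are assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, List


def generate_time_steps(n: int) -> Iterator[Dict]:
    """Task A stand-in: emit one small record per simulated time-step."""
    for step in range(n):
        yield {"step": step, "values": [step * 0.5, step * 1.5, step * 2.5]}


def move(records: Iterable[Dict]) -> Iterator[Dict]:
    """Task B stand-in: pretend to transfer each record to another site."""
    for rec in records:
        yield {**rec, "location": "analysis-site"}  # hypothetical site label


def analyze(records: Iterable[Dict]) -> Iterator[Dict]:
    """Task C stand-in: attach the mean of the values to each record."""
    for rec in records:
        yield {**rec, "mean": sum(rec["values"]) / len(rec["values"])}


def visualize(records: Iterable[Dict]) -> Iterator[str]:
    """Task D stand-in: render each record as one line of text."""
    for rec in records:
        yield f"step {rec['step']}: mean={rec['mean']:.2f} at {rec['location']}"


def run_pipeline(n: int) -> List[str]:
    """Chain the four stages; each time-step flows through as it is produced."""
    return list(visualize(analyze(move(generate_time_steps(n)))))


for line in run_pipeline(3):
    print(line)
```

Because the stages are generators, a time-step can be moved, analyzed, and rendered before the next one is generated, which is the property a dataflow-oriented workflow emphasizes.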

Page 7

FastBit

A compressed bitmap indexing technology for efficient searching of read-only data

http://sdm.lbl.gov/fastbit

Page 8

FastBit Overview

• FastBit is designed to search multi-dimensional data
  • Conceptually in table format
    • rows: objects
    • columns: attributes
• FastBit uses vertical (column-oriented) organization for the data
  • Efficient for analysis of read-only data
• FastBit uses compressed bitmap indices to speed up searches
  • Proven in analysis to be optimal for single-attribute queries
  • Superior to other optimal indices because they are also efficient for multi-attribute queries
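To make the vertical (column-oriented) organization concrete, here is a small Python sketch. It is not FastBit's storage format; the attribute names (`energy`, `pressure`), the sample values, and the helpers `to_columns` and `select` are assumptions chosen for illustration.

```python
# Row-oriented view: one record per object, all attributes together.
rows = [
    {"energy": 1.5, "pressure": 10.0},
    {"energy": 2.5, "pressure": 11.0},
    {"energy": 3.5, "pressure": 12.0},
    {"energy": 0.5, "pressure": 13.0},
]


def to_columns(records):
    """Reorganize records vertically: one list of values per attribute."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def select(columns, attribute, predicate):
    """Return the row numbers whose attribute value satisfies the predicate.

    Only the one column named in the query is read; the others are untouched.
    """
    return [i for i, value in enumerate(columns[attribute]) if predicate(value)]


columns = to_columns(rows)
hits = select(columns, "energy", lambda v: v > 2.0)
print(hits)
```

A query on one attribute scans a single contiguous list rather than every full record, which is why this layout suits analysis of data that is written once and read many times.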

Page 9

Basic Bitmap Index

• Compact: one bit per distinct value per object
• Easy to build: faster than common B-trees
• Efficient to query: only bitwise logical operations
  • A < 2: b0 OR b1
  • 2 < A < 5: b3 OR b4
• Efficient for multi-dimensional queries
  • Use bitwise operations to combine the partial results

Data values and their bitmaps (one row per object, one bitmap per distinct value):

Data value   b0 (=0)   b1 (=1)   b2 (=2)   b3 (=3)   b4 (=4)   b5 (=5)
0            1         0         0         0         0         0
1            0         1         0         0         0         0
5            0         0         0         0         0         1
3            0         0         0         1         0         0
1            0         1         0         0         0         0
2            0         0         1         0         0         0
0            1         0         0         0         0         0
4            0         0         0         0         1         0
1            0         1         0         0         0         0
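The example on this slide can be reproduced with a short Python sketch. It uses plain, uncompressed integers as bitmaps, so it shows only the basic index and not FastBit's compression. The data values for attribute A come from the slide; the second attribute B and the helper names (`build_bitmap_index`, `or_bitmaps`, `rows_of`, `as_bit_string`) are assumptions added for illustration.

```python
def build_bitmap_index(values):
    """Return {value: bitmap}; bit i of a bitmap is set when values[i] == value."""
    index = {}
    for row, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def or_bitmaps(index, wanted):
    """OR together the bitmaps of every distinct value accepted by wanted()."""
    result = 0
    for value, bitmap in index.items():
        if wanted(value):
            result |= bitmap
    return result


def rows_of(bitmap):
    """List the row numbers whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


def as_bit_string(bitmap, n_rows):
    """Print a bitmap in the slide's order: first row on the left."""
    return "".join("1" if (bitmap >> i) & 1 else "0" for i in range(n_rows))


a_values = [0, 1, 5, 3, 1, 2, 0, 4, 1]   # attribute A, from the slide
b_values = [9, 8, 7, 6, 5, 4, 3, 2, 1]   # attribute B, hypothetical

index_a = build_bitmap_index(a_values)
index_b = build_bitmap_index(b_values)

a_lt_2 = or_bitmaps(index_a, lambda v: v < 2)           # b0 OR b1
a_between = or_bitmaps(index_a, lambda v: 2 < v < 5)    # b3 OR b4
b_ge_5 = or_bitmaps(index_b, lambda v: v >= 5)

# Multi-attribute query: combine the partial results with a bitwise AND.
both = a_lt_2 & b_ge_5

print(as_bit_string(index_a[0], len(a_values)))
print(rows_of(a_lt_2), rows_of(a_between), rows_of(both))
```

Each single-attribute condition is answered by OR-ing a few bitmaps, and conditions on different attributes are combined with one more bitwise operation, without revisiting the data values.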

Page 10

Grid Collector Features

Key features of the Grid Collector:
• Providing transparent object access
• Selecting objects based on their attribute values
• Improving analysis system's throughput
• Enabling interactive distributed data analysis

Page 11

Grid Collector Speeds up Analyses

[Two plots of speedup versus selectivity for Sample 1, Sample 2, and Sample 3: one with linear axes (selectivity 0 to 1, speedup 0 to 5) and one with logarithmic axes (selectivity 0.00001 to 1, speedup 1 to 1000)]

• Test machine: 2.8 GHz Xeon, 27 MB/s read speed
• When searching for rare events, say, selecting one event out of 1000, using GC is 20 to 50 times faster
• Using GC to read 1/2 of events, speedup > 1.5; to read 1/10 of events, speedup > 2.

Page 12

FastBit-Based Multi-Attribute Region Finding is Theoretically Optimal

On 3D data with over 110 million points, region finding takes less than 2 seconds

[Plot: region growing time (sec, 0 to 1.8) versus number of line segments (10000 to 410000). Caption: Time required to identify regions in 3D Supernova simulation (LBNL)]

[Image: Flame Front discovery (range conditions for multiple measures) in a combustion simulation (Sandia)]
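As a rough illustration of multi-attribute region finding, the Python sketch below first selects the cells that satisfy range conditions on several measures and then grows connected regions from them. It is a plain breadth-first search over individual cells, not FastBit's algorithm (the plot above is in terms of line segments, which this sketch does not model). The measure names `temperature` and `fuel`, the sample cells, and the helper names are assumptions.

```python
from collections import deque

# The six face neighbors of a cell in a 3D grid.
NEIGHBOR_OFFSETS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def select_cells(cells, conditions):
    """Keep the cells whose measures all fall inside the given (low, high) ranges."""
    return [
        point
        for point, measures in cells.items()
        if all(low <= measures[name] <= high for name, (low, high) in conditions.items())
    ]


def find_regions(selected):
    """Group selected (x, y, z) cells into face-connected regions."""
    remaining = set(selected)
    regions = []
    while remaining:
        seed = remaining.pop()
        region = {seed}
        queue = deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBOR_OFFSETS:
                neighbor = (x + dx, y + dy, z + dz)
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    region.add(neighbor)
                    queue.append(neighbor)
        regions.append(region)
    return regions


# Hypothetical cells: grid position -> measured values.
cells = {
    (0, 0, 0): {"temperature": 1500.0, "fuel": 0.30},
    (1, 0, 0): {"temperature": 1600.0, "fuel": 0.40},
    (2, 0, 0): {"temperature": 300.0, "fuel": 0.40},   # fails the temperature range
    (3, 0, 0): {"temperature": 1700.0, "fuel": 0.35},
    (3, 1, 0): {"temperature": 1550.0, "fuel": 0.90},  # fails the fuel range
    (3, 0, 1): {"temperature": 1800.0, "fuel": 0.30},
}
conditions = {"temperature": (1000.0, 2000.0), "fuel": (0.2, 0.5)}

regions = find_regions(select_cells(cells, conditions))
print(sorted(sorted(region) for region in regions))
```

In this sketch the selection step scans every cell; with a bitmap index, that step would instead be answered by bitwise operations on the per-measure bitmaps.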

Page 13

Objects On-Demand: from Files to Object Management

A Scientific Application Partnership (SAP)

Lead Institution: BNL
Coordinating PI: Jerome Lauret

Page 14

Participating Institutions

• Participating Institutions
  • BNL: Jerome Lauret
  • LBNL: John Wu
  • SLAC: Andy Hanushevsky
• Technologies
  • FastBit
  • SRM (DRM, HRM)
  • xrootd

Page 15

Xrootd: Single Level Switch

[Diagram: a Client, a Redirector (Head Node), and a Cluster of Data Servers A, B, and C, with these messages]
• Client to Redirector: open file X
• Redirector to Data Servers: Who has file X?
• Data Server C to Redirector: I have
• Redirector to Client: go to C
• Client to Data Server C: open file X
• 2nd open X: go to C (Redirectors cache file location)

Client sees all servers as xrootd data servers
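The message sequence in this diagram can be mimicked with a toy Python model. It is not the xrootd protocol or API: the classes `DataServer` and `Redirector`, the `locate` method, and the `queries_sent` counter are assumptions made for illustration, and the servers are asked one at a time only to keep the sketch short.

```python
class DataServer:
    """Toy data server that knows which files it holds."""

    def __init__(self, name, files):
        self.name = name
        self.files = set(files)

    def has(self, path):
        """Answer the redirector's question: do I have this file?"""
        return path in self.files


class Redirector:
    """Toy head node: finds which server holds a file and remembers the answer."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.location_cache = {}   # file path -> server name
        self.queries_sent = 0      # how many "who has file X?" questions were asked

    def locate(self, path):
        """Return the name of the server the client should go to."""
        if path in self.location_cache:
            return self.location_cache[path]      # 2nd open: no servers are asked
        for server in self.servers:
            self.queries_sent += 1
            if server.has(path):
                self.location_cache[path] = server.name
                return server.name
        raise FileNotFoundError(path)


servers = [DataServer("A", []), DataServer("B", []), DataServer("C", ["X"])]
redirector = Redirector(servers)

first = redirector.locate("X")     # servers are queried, C answers "I have"
queries_after_first = redirector.queries_sent
second = redirector.locate("X")    # answered from the cached location
print(first, second, queries_after_first, redirector.queries_sent)
```

The client then opens the file directly on the returned server, so the redirector handles only the lookup and not the data transfer.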

Page 16

Xrootd: Single Level Switch

[Diagram: the same Client, Redirector (Head Node), Cluster of Data Servers A, B, and C, and message sequence as on Page 15, with these added components: DRM (three instances), HRM, and MSS archive]

Client sees all servers as xrootd data servers

Page 17

Objects on-demand

[Diagram label: xrootd]

Page 18

Storage Resource Management (SRM)
Center For Enabling Technologies (CET)

Lead Institution: LBNL
Coordinating PI: Alex Sim

Page 19

Participating Institutions

• BNL: Jerome Lauret
• FNAL: Don Petravick, Timur Perelmutov
• TJNAF: Andy Kowalski
• LBNL: Alex Sim, Arie Shoshani
• UCSD: Abhishek Singh Rana
• U. of Wisc: Miron Livny

Page 20

Proposed work

• Development of new functional features as part of the SRM collaboration (coordinated by LBNL)
  • Authorization
  • Monitoring
  • Performance estimation
• Development of new versions of SRMs by participating institutions
  • Disk systems and HPSS (LBNL)
  • dCache (FNAL)
  • Jasmine (TJNAF)

Page 21

New Aspects

• Development of monitoring components for bandwidth and networking availability (U. Wisc, FNAL)
  • Better control of SRM behavior
  • Performance estimation
• Integration of Lambda station interface into the SRM middleware (FNAL)
• Development of an authorization framework (UCSD)
  • To enforce access privileges
  • Used by SRMs for policy declaration by VO and Sites

Page 22

SRM Collaboration

• Continued support of SRMs in experiments and projects
  • ATLAS (BNL, FNAL)
  • CLAS (TJNAF)
  • CMS (FNAL)
  • CPES (LBNL)
  • ESG (LBNL)
  • Lattice QCD (TJNAF, FNAL)
  • Phenix (BNL)
  • STAR (BNL, LBNL)
• Coordination with other centers and institutes (including LCG, RAL, EGEE).
• Goal: joint specification of SRM through regular meetings, joint documents, and GGF participation