LCG /AA/ROOT 1 Proposal for Improvements Rene Brun 14 January 2004 LCG/AA/ROOT Relationship Some slides of this talk were presented at the Architects Forum 30 October 2003


LCG /AA/ROOT 1

Proposal for Improvements

Rene Brun

14 January 2004

LCG/AA/ROOT Relationship

Some slides of this talk were presented at the Architects Forum 30 October 2003

LCG /AA/ROOT relationship 2

Applications Area Organisation

[Organisation chart. Boxes: Applications manager; Architects forum; Applications area meeting; Simulation project; PI project; SEAL project; POOL project; SPI project; ROOT (user - provider). Link labels: decisions, strategy, consultation.]

The ‘user/provider’ relationship is working

Good ROOT/POOL cooperation: POOL gets needed modifications, ROOT gets debugging/development input

ROOT will be the principal analysis tool; full access to its capability is a blueprint requirement

ALICE directly using ROOT as their framework

Torre

LCG /AA/ROOT relationship 3

User/Provider relationship

• It works in the sense that teams did not show unwillingness to cooperate.

• The cooperation is ambiguous. The wheel is reinvented many times.

• The duplication of efforts will give problems in the near future (dictionaries, plug-in managers, collections and many more (coming slides))

• Manpower would be better used in improving critical areas.

• Alice has not joined the train.

LCG /AA/ROOT relationship 4

User/Provider relationship

• The current orientation is OK if the idea is to use ROOT as a back-end in a few places and alternative solutions are seriously considered with clear deliverables.

• If ROOT is the choice for the two essential areas, event storage and interactive data analysis, this has important implications.
• In this case the user/provider relationship is not appropriate:
  • ROOT must be better integrated in the LCG. This has implications for the LCG/AA plans and also for the ROOT planning.

LCG /AA/ROOT relationship 5

Motivation for this presentation

• We have two options in front of us:
  • Continue the current process assuming that everything is OK in the best of the worlds. ROOT is happy, LCG/AA is happy.
  • Take advantage of the useful internal review to rethink the general orientation.

• We have a unique opportunity now, with enough experience with all the projects, to take the necessary actions to decrease the entropy in the interest of the LHC and also non-LHC users.

• We must capitalize on one year of useful experience in AA to set up a convergent and coherent process.

LCG /AA/ROOT relationship 6

MAIN Motivation

Make it simpler for our users

Current system is too complex

Far too many layers

LCG /AA/ROOT relationship 7

Plan of talk

• In the following slides, I review the main projects (POOL, SEAL, SIMU, PI and ARDA) with a proposal for a better integration with ROOT.

• I start with a few slides indicating where we are with ROOT. Our current developments are relevant to the LCG work.

SEAL: single dictionary, plug-in manager, mathlibs

POOL: collections, performance, goals

SIMU: VMC, geometry and geometries interfaces

PI: what next?

ARDA: Distributed Analysis and ROOT/PROOF

SPI: using/moving to the infrastructure

LCG /AA/ROOT 1

ROOT status

• Version 3.05/07 released in July 2003

• Version 3.10/02 released in December

• Working on major new version 4.0

LCG /AA/ROOT relationship 9

ROOT version 4 Highlights (1)

• Support for automatic schema evolution for foreign classes without using a class version number.
• Support for large files (> 2 GBytes).
• New data type Double32_t (double in memory, saved as float on the file).
• Native support for STL: automatic Streamers, no code generation anymore.
• Tree split mode with STL vector/list.
• Plug-in Manager upgrade (rlibmap) with automatic library/class discovery/load.
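
The Double32_t idea (a double in memory, written as a float) can be illustrated without ROOT. The sketch below is not ROOT code: it uses Python's standard `struct` module to mimic the round trip through 4 bytes, and the helper name `roundtrip_as_float32` is made up for this illustration.

```python
import struct

def roundtrip_as_float32(value: float) -> float:
    """Write a double as a 4-byte IEEE float and read it back as a double."""
    packed = struct.pack("<f", value)        # 4 bytes, as they would sit in the file
    return struct.unpack("<f", packed)[0]    # widened back to a double in memory

in_memory = 0.1                              # 64-bit double in memory
restored = roundtrip_as_float32(in_memory)

# Bytes saved per value: 8-byte double versus 4-byte float.
bytes_saved = struct.calcsize("<d") - struct.calcsize("<f")
# Price paid: a relative error of the order of single-precision accuracy.
relative_error = abs(restored - in_memory) / in_memory

print(bytes_saved, restored, relative_error)
```

The trade-off is half the space per value against roughly seven significant decimal digits on read-back; values exactly representable as a float (such as 0.5) come back unchanged.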

LCG /AA/ROOT relationship 10

ROOT version 4 Highlights (2)

• PROOF/Alien in production
• Xrootd (collaboration with BaBar)
• New Linear Algebra package
• Geometry interface to OpenGL/Coin3D
• Support for Qt (alternative option to X11)
• GUI builder with GUI code generation
• New GUI Histogram editor
• Interface with Ruby

First development release just before the ROOT workshop (25 February SLAC)

Final PRO release scheduled for June.

LCG /AA/ROOT relationship 11

LCG /AA/ROOT 1

SPI

LCG /AA/ROOT relationship 13

ROOT and SPI

• If the model evolves from a “user-provider” relationship to a real and effective integration of ROOT in the LCG plans, it will become obvious that ROOT should use the same infrastructure (SPI).

• The current work from Torre is an essential ingredient to simplify the development and build procedures, a prerequisite for convergence.

• It is too early to take a practical decision as it depends on the acceptance of this plan and on real achievements.

LCG /AA/ROOT 1

SEAL

LCG /AA/ROOT relationship 15

SEAL: Duplications

• Due to well known historical reasons, SEAL is duplicating systems already provided by ROOT, e.g.:
  • Object dictionary
  • Plug-in manager
  • Regular Expressions
  • Compression algorithms

• In the following, I will discuss only the dictionary and the plug-in manager.

LCG /AA/ROOT relationship 16

Seal libraries size and dependencies

Library              Size (MB)
SealBase             6.60
SealUtil             0.85
SealServices         1.58
IOTools              1.29
SealZIP              2.15
SealKernel           1.62
ReflectionBuilder    1.02
Reflection           2.40
PluginManager        1.28
SealCLHEPdict        4.09
CLHEP                1.50
SealSTLdict          5.13
GMinuit              2.45

LCG /AA/ROOT relationship 17

SEAL Dictionary: Reminder

[Figure: dictionary generation and data I/O. Boxes: .h; ROOTCINT; CINT dictionary code; CINT dictionary; GCC-XML; .xml; Code Generator; LCG dictionary code; LCG dictionary; Gateway; Reflection; I/O; Other Clients. Labels: Dictionary Generation; Data I/O; Technology dependent.]

Hum! All boxes are technology dependent!

LCG /AA/ROOT relationship 18

SEAL: The dictionary saga

• There were 4 reasons to develop an alternative dictionary:
  • Make it independent of ROOT/CINT.
  • Make it available with other languages.
  • Remove parsing limitations of rootcint.
  • Necessary for POOL alternative backend.

• The alternative language is a false problem. All collaborations are heavily investing in C++, and anyhow the SEAL dictionary is not appropriate for languages coming with introspection/reflection capabilities.

• The other 3 reasons must be seen from a different angle if ROOT is the choice for storage manager and analysis engine.

• Everybody agrees that having 2 dictionaries is a nightmare, a source of more and more conflicts and new problems.

LCG /AA/ROOT relationship 19

LCG Dictionary size Atlas (Nov version)

In November, we investigated the size of the LCG dictionary in case of Atlas, CMS and ROOT itself. LHCb were not in a position to estimate the size because they did not have the code generator yet.

As a positive effect of this exercise, the SEAL team has been able meanwhile to gain a factor 3 in the size of the dictionary on disk, but no estimation of the gain (if any) in memory.

Library                 Classes.o   LCGdict.o   LCGdict/class   CINTdict.o
SimpleTrack             10.7k       144k        13.45
EventHeader             12.7k       89k         7.00
FourMom                 49k         13k         0.26
GenerateObject(HepMC)   388k        326k        0.84
LArSimEvent             26k         88k         3.38
EventInfo               33k         120k        3.63            65k

ATLAS (27 classes): 4.7 +- 4.4
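
The ratio column of the ATLAS table can be recomputed from the two size columns. The sketch below does that and also computes the mean and population standard deviation of the ratios; reading the slide's "4.7 +- 4.4" as exactly that mean and spread is an assumption, although the numbers agree.

```python
from statistics import mean, pstdev

# (library, Classes.o size in kB, LCGdict.o size in kB), as listed on the slide
sizes_kb = [
    ("SimpleTrack", 10.7, 144.0),
    ("EventHeader", 12.7, 89.0),
    ("FourMom", 49.0, 13.0),
    ("GenerateObject(HepMC)", 388.0, 326.0),
    ("LArSimEvent", 26.0, 88.0),
    ("EventInfo", 33.0, 120.0),
]

# LCGdict/class column: dictionary object size divided by class object size.
ratios = {name: dict_kb / classes_kb for name, classes_kb, dict_kb in sizes_kb}

values = list(ratios.values())
average = mean(values)
spread = pstdev(values)   # population standard deviation (assumed convention)

for name, ratio in ratios.items():
    print(f"{name:24s} {ratio:6.2f}")
print(f"mean = {average:.2f}, spread = {spread:.2f}")
```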

LCG /AA/ROOT relationship 20

LCG Dictionary size CMS (Nov version)

• Bill compared the sizes of the same CMS dictionary object files (*.o) (COBRA+ORCA) on disk produced by lcgdict versus that for rootcint produced dictionaries.

• Total number of dictionaries = 30
• Total number of classes = 359
• Average data members per class = 435/359 = 1.2
• Average functions per class = 1868/359 = 5.2

• All were compiled with gcc_3.2.3 with the -O2 option, and all the symbols were stripped (with strip) for the purpose of this comparison.

• The size ratios are quite consistent across dictionaries, so we give the total sizes.

• ROOT: 3.45 MB
• POOL: 5.37 MB

So the lcg dictionary files are approx. 50% larger. Note that the CMS classes above are only the base classes of the framework.

It would have been interesting to have more statistics based on concrete application classes with more data members and functions.

LCG /AA/ROOT relationship 21

LCG Dictionary size ROOT (Nov version)

• It was easy to generate the dictionaries for about one half of all ROOT classes (320/600)

• In order to evaluate the impact in memory of the LCG dictionary, I linked the dictionaries with the ROOT executable module.
  • Full ROOT Process Memory Size = 12.30 Mbytes
  • Same + lcg dictionary = 28.30 Mbytes

Remark1: The lcg dictionary for 1/2 of the ROOT classes is 1.3 times bigger than ROOT itself.

Remark2: The LCG dictionary does not contain all the information available in the CINT dictionary.
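
The figure in Remark1 follows directly from the two process sizes quoted above. A minimal check, using only the numbers on the slide:

```python
root_process_mb = 12.30     # full ROOT process memory size (slide)
with_lcg_dict_mb = 28.30    # same process with the LCG dictionaries linked in (slide)

# Memory attributable to the LCG dictionaries for about half of the ROOT classes.
lcg_dict_mb = with_lcg_dict_mb - root_process_mb
ratio_to_root = lcg_dict_mb / root_process_mb

print(f"LCG dictionary: {lcg_dict_mb:.2f} MB = {ratio_to_root:.2f} x the ROOT process")
```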

LCG /AA/ROOT relationship 22

ROOT Dictionary size

If all classes have a dictionary, the size of the dictionary may become a large fraction of the executable module!

LCG /AA/ROOT relationship 23

The CINT dictionary

[Figure: the CINT components. Boxes: data structures (data members, functions); GClassInfo API; C++ parser(s), rootcint; byte code generator; byte code interpreter.]

The CINT library is small: 1.5 MByte.

CINT is more than just a parser and API to the dictionary.

LCG /AA/ROOT relationship 24

The CINT dictionary evolution

• Data Members
  • Supports already all C++ features (no missing important features like typedef or enum)
  • Future is to look into XTI in case there is progress with the C++ committee
• Parser/Code generator
  • The number of failing cases has considerably dropped. We consider parsing failures with high priority. They are in general fixed in the “next week” CINT release.

LCG /AA/ROOT relationship 25

Dictionary: How to make progress

• Review asap functionality provided by LCGdict and CINT

• Collect info from CMS/Atlas, others on the size of dictionaries.
• Investigate how many classes (*.h) can be parsed by gccxml and not by rootcint.
• Compare the two APIs and data structures.
• Investigate feasibility of supporting two parsers with one single dictionary in memory.
• Investigate portability of gccxml on all ROOT supported platforms.

LCG /AA/ROOT relationship 26

Dictionary: which options?

• Start from LCG dict
  • Requires lcgdict to be available on all platforms where CINT runs today
  • Requires deep changes in the byte code and in the interpreter.
• Start from CINT dictionary
  • Improving the API
  • Keeping/Improving rootcint
  • Adapting gccxml as an alternative parser
• Both options

Following discussions in Nov/Dec, a proposal for a common C++ API to the CINT dictionary is in preparation. Because the user must see only C++ objects, this requires also a mini C++ data structure (must be small compared to CINT)

LCG /AA/ROOT relationship 27

Dictionaries : root only

[Diagram. Elements: X.h; rootcint; XDictcint.cxx; CINT DS; CINT API; ROOT; Root meta C++; CINT.]

LCG /AA/ROOT relationship 28

Dictionaries : situation today

[Diagram. Elements: X.h; gccxml; X.xml; lcgdict; XDictlcg.cxx; LCGDICT DS; LCG API; POOL; rootcint; XDictcint.cxx; CINT DS; CINT API; ROOT; Root meta C++; CINT.]

LCG /AA/ROOT relationship 29

Dictionaries : step 1 gain space

[Diagram. Elements: X.h; gccxml; X.xml; lcgdict; XDictlcg.cxx; LCGDICT DS C++; LCG2 API; POOL; rootcint; XDictcint.cxx; CINT DS; CINT API; ROOT; Root meta C++; CINT.]

LCG /AA/ROOT relationship 30

Dictionaries : step 2 simplification

[Diagram. Elements: X.h; rootcint; XDictcint.cxx; meta DS C++; CINT DS; CINT API; LCG ROOT API; POOL; ROOT; CINT.]

LCG /AA/ROOT relationship 31

Dictionaries : step 3 coherency

[Diagram. Elements: X.h; gccxml; rootcint; XDict.cxx; meta DS C++; CINT DS; CINT API; LCG ROOT API; POOL; ROOT; CINT.]

LCG /AA/ROOT relationship 32

Plug-in Manager(s)

• A Plug-in manager is an essential tool helping in making a system more modular
• It simplifies dynamic linking and unlinking.
• It would be nice to converge on one single manager to minimize side-effects.
• The ROOT plug-in manager is very powerful and simple to use (see slide).
• It does not require an object factory machinery. The interpreter is already doing it for free.
• It is being extended to automate/simplify several operations, such as automatic discovery of the shared lib containing a class.

LCG /AA/ROOT relationship 33

Definition of plug-ins in ROOT

Plugin.TFile: ^rfio: TRFIOFile RFIO "TRFIOFile(const char*,Option_t*,const char*,Int_t)"
+Plugin.TFile: ^castor: TCastorFile RFIO "TCastorFile(const char*,Option_t*,const char*,Int_t,Int_t)"
+Plugin.TFile: ^dcache: TDCacheFile DCache "TDCacheFile(const char*,Option_t*,const char*,Int_t)"
+Plugin.TFile: ^chirp: TChirpFile Chirp "TChirpFile(const char*,Option_t*,const char*,Int_t)"
Plugin.TSystem: ^rfio: TRFIOSystem RFIO "TRFIOSystem()"
Plugin.TSQLServer: ^mysql: TMySQLServer MySQL "TMySQLServer(const char*,const char*,const char*)"
+Plugin.TSQLServer: ^pgsql: TPgSQLServer PgSQL "TPgSQLServer(const char*,const char*,const char*)"
+Plugin.TSQLServer: ^sapdb: TSapDBServer SapDB "TSapDBServer(const char*,const char*,const char*)"
+Plugin.TSQLServer: ^oracle: TOracleServer Oracle "TOracleServer(const char*,const char*,const char*)"
Plugin.TGrid: ^alien TAlien RAliEn "TAlien(const char*,const char*,const char*,const char*)"
Plugin.TVirtualPad: * TPad Gpad "TPad()"
Plugin.TVirtualHistPainter: * THistPainter HistPainter "THistPainter()"
Plugin.TVirtualTreePlayer: * TTreePlayer TreePlayer "TTreePlayer()"
Plugin.TVirtualTreeViewer: * TTreeViewer TreeViewer "TTreeViewer(const TTree*)"
Plugin.TVirtualGeoPainter: * TGeoPainter GeomPainter "TGeoPainter()"
Plugin.TVirtualUtil3D: * TUtil3D Graf3d "TUtil3D()"
Plugin.TVirtualUtilHist: * TUtilHist Hist "TUtilHist()"
Plugin.TVirtualUtilPad: * TUtilPad Gpad "TUtilPad()"
Plugin.TVirtualFitter: Minuit TFitter Minuit "TFitter(Int_t)"
+Plugin.TVirtualFitter: Fumili TFumili Fumili "TFumili(Int_t)"
Plugin.TVirtualPS: ps TPostScript Postscript "TPostScript()"
+Plugin.TVirtualPS: svg TSVG Postscript "TSVG()"
Plugin.TViewerX3D: x11 TViewerX3D X3d "TViewerX3D(TVirtualPad*,Option_t*)"
+Plugin.TViewerX3D: qt TQtViewerX3D QtX3d "TQtViewerX3D(TVirtualPad*,Option_t*)"

Fields of each entry, after the Plugin.<base class> key: name, class, shared lib, how to call.
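
To make the layout of these entries concrete, here is a small sketch that splits an entry into its fields and picks the handler whose name pattern matches a request. It is a simplified illustration, not ROOT's plug-in manager: `PluginEntry`, `parse_entry` and `find_handler` are hypothetical names, the matching uses Python regular expressions, and the request string is invented.

```python
import re
import shlex
from typing import NamedTuple, Optional

class PluginEntry(NamedTuple):
    base: str   # base class named in the "Plugin.<base>:" key
    name: str   # regular expression (or "*") matched against the request
    cls: str    # class implementing the plug-in
    lib: str    # shared library that contains the class
    ctor: str   # constructor prototype, i.e. how to call it

def parse_entry(line: str) -> PluginEntry:
    """Split one resource line into its five fields."""
    key, name, cls, lib, ctor = shlex.split(line.lstrip("+"))
    base = key[len("Plugin."):].rstrip(":")
    return PluginEntry(base, name, cls, lib, ctor)

def find_handler(entries, base: str, request: str) -> Optional[PluginEntry]:
    """Return the first entry for `base` whose name pattern matches `request`."""
    for entry in entries:
        if entry.base != base:
            continue
        if entry.name == "*" or re.search(entry.name, request):
            return entry
    return None

lines = [
    'Plugin.TFile: ^rfio: TRFIOFile RFIO "TRFIOFile(const char*,Option_t*,const char*,Int_t)"',
    '+Plugin.TFile: ^dcache: TDCacheFile DCache "TDCacheFile(const char*,Option_t*,const char*,Int_t)"',
    'Plugin.TVirtualPad: * TPad Gpad "TPad()"',
]
table = [parse_entry(line) for line in lines]

# Hypothetical request: a file name whose prefix selects the dCache plug-in.
handler = find_handler(table, "TFile", "dcache:/pnfs/example/run1.root")
print(handler.cls, handler.lib, handler.ctor)
```

Once a handler is found, the manager only needs to load the named shared library and call the constructor through the interpreter, which is why no separate object factory is required.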

LCG /AA/ROOT relationship 34

MathLibs

• It is important for HEP to have one well identified Math library (source, libs), with
  • Full control of the source
  • That we can port on as many platforms as possible
  • A good test suite and documentation

• This does not mean that we have to develop new algorithms/classes/functions.

• In Nov/Dec we had a few meetings to discuss a proposal for a Mathlib in C++, an alternative to a proposal by SEAL.

LCG /AA/ROOT relationship 35

Mathlibs (2)

[Diagram: sources feeding the new Mathlib. Boxes: Kernlib, Mathlib; CLHEP; ROOT TMath; TMatrix; TCL; GSL subset; New Mathlib (Open Source, not HEP/LCG restricted).]

Annotations on the diagram:
• Convert only on demand what is not already converted by TCL
• Give to GSL our mods as C/GSL functions
• Take small subset and freeze
• From GSL, import functions not found elsewhere. Wrap C functions in classes like in TMath
• ROOT Linear algebra is being extended and improved

LCG /AA/ROOT relationship 36

Mathlibs proposals

A: SEAL proposal: Install GSL, collaborate with the GSL team.

B: Rene/Eddy proposal: copies available

LCG /AA/ROOT relationship 37

Why a Mathlib in C++

1. We want to interact with real objects (data and algorithms), not just algorithms.

2. We want to provide higher level interfaces hiding the implementation details (algorithms). A true Object-Oriented API should remain stable if internal storage or algorithms change. One can imagine the Mathlib classes being improved over time, or adapted to standard algorithms that could come with the new C++ versions.

3. Many classes require a good graphics interface. A large subset of CERNLIB or GSL has to do with functions. Visualizing a function requires knowing some of its properties, e.g. singularities or asymptotic behaviors. This does not mean that the function classes must have built-in graphics. But they must be able to call graphics service classes able to exploit the algorithms in the functions library.

4. Many objects need operators (matrices, vectors, physics vectors, etc).

5. We want to embed these objects in a data model. Users start to request that the math library takes care of memory management and/or persistency of the object. See for instance the LHC-feedback [5], where persistency of the CLHEP was requested. The user would like to save and restore random-generator seeds etc.

6. We want to have an interactive interface from our interpreters, hence a dictionary.

LCG /AA/ROOT relationship 38

C/Fortran/GSL versus C++

Object-Oriented API vs Procedural API

gsl style:
    double gsl_sf_gamma(double x)
    int gsl_sf_gamma_e(double x, gsl_sf_result* result)

root style:
    TF1 gamma(TMath::Gamma,0,1)
    gamma.Eval(x)
    gamma.Derivative(x)
    gamma.Integral(from,to)
    gamma.GetRandom()
    gamma.Draw()
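
The contrast between the two styles can be sketched in a few lines of plain Python. This is neither GSL nor ROOT: `Function1D` is a hypothetical class that plays the role of a TF1-like function object, `math.gamma` stands in for the procedural call, and the derivative and integral use simple textbook methods chosen only for illustration.

```python
import math
from typing import Callable

class Function1D:
    """A function object bundling evaluation with common numerical services."""

    def __init__(self, func: Callable[[float], float]):
        self.func = func

    def eval(self, x: float) -> float:
        return self.func(x)

    def derivative(self, x: float, h: float = 1e-5) -> float:
        # Central finite difference.
        return (self.func(x + h) - self.func(x - h)) / (2.0 * h)

    def integral(self, a: float, b: float, n: int = 1000) -> float:
        # Composite Simpson rule on n (made even) sub-intervals.
        if n % 2:
            n += 1
        step = (b - a) / n
        total = self.func(a) + self.func(b)
        for i in range(1, n):
            total += (4 if i % 2 else 2) * self.func(a + i * step)
        return total * step / 3.0

# Procedural style: one free function per operation.
value_procedural = math.gamma(5.0)

# Object style: one object, several services behind a stable interface.
gamma = Function1D(math.gamma)
value_object = gamma.eval(5.0)
slope = gamma.derivative(2.0)
area = gamma.integral(1.0, 2.0)
print(value_object, slope, area)
```

The point of the object style is that the algorithms behind `derivative` or `integral` can be replaced later without changing user code.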

LCG /AA/ROOT relationship 39

Mathlib Proposal picture

[Diagram: layers of the proposed Mathlib.]
• libGSL++.so: contains full standard GSL + CINT dictionary
• TMath or/and TMath-like C++ static functions: contains the most used math functions
• High Level C++ classes: Functions (a la TF1), Physics Vectors, Linear Algebra, Random Numbers, Minimisation
• Callable from interpreter(s)
• Persistency
• ROOT libraries

LCG /AA/ROOT relationship 40

Summary of proposal B

• Install standard gsl: libGSL.so
• Provide a CINT front-end (say libGSL++.so)
  • Nearly done, thanks Lorenzo
• Extend TMath with more static functions from CERNLIB, GSL,..
• New Linear Algebra from Eddy (see later)
• Extend functions classes TF1 and like with more algorithms.

2/3 of the estimated total work already done. Main work is the development of a test/benchmark suite.

LCG /AA/ROOT relationship 41

Linear Algebra benchmarks (lxplus/gcc3.2)

LCG /AA/ROOT relationship 42

Linear Algebra benchmarks (mac/gcc3.3)

LCG /AA/ROOT relationship 43

CLHEP linear algebra problems

• CLHEP inversion:
  • sizes <= 6: limited precision Cramer algorithm
  • sizes > 6: unscaled LU factorization (Cernlib DFACT)
• Suppose Hilbert matrix A(i,j) = 1/(i+j+1), i,j=0,..,4 and calculate E = A * A^-1
  • Cramer: for i!=j, E(i,j) < 10e-7, while
  • scaled LU: for i!=j, E(i,j) < 10e-13
• Of course the inaccuracy is worse for larger matrices. Scaling the matrix with a large or small number will make Cramer under/overflow. Unscaled LU factorization can under/overflow.
  • Example: for a Hilbert matrix of size > 12, the routine will return an error.

• CLHEP not thread-safe
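
The Hilbert-matrix test described above is easy to reproduce in outline. The sketch below builds A(i,j) = 1/(i+j+1) for i,j = 0..4, inverts it and looks at the largest off-diagonal element of E = A * A^-1. It does not reproduce the CLHEP Cramer code or Cernlib DFACT: the inversion is a generic Gauss-Jordan elimination with partial pivoting, run once in double precision and once in exact rational arithmetic as a reference. All function names are hypothetical.

```python
from fractions import Fraction

def hilbert(n, one):
    """Hilbert matrix A(i,j) = 1/(i+j+1) for i,j = 0..n-1, in the number type of `one`."""
    return [[one / (i + j + 1) for j in range(n)] for i in range(n)]

def invert(a, one):
    """Gauss-Jordan inversion with partial pivoting; works for float and Fraction."""
    n = len(a)
    zero = one - one
    # Augmented matrix [A | I].
    aug = [list(row) + [one if i == j else zero for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry of this column up.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def max_off_diagonal(e):
    n = len(e)
    return max(abs(e[i][j]) for i in range(n) for j in range(n) if i != j)

def inversion_error(n, one):
    """Largest |E(i,j)|, i != j, for E = A * A^-1 with A the n x n Hilbert matrix."""
    a = hilbert(n, one)
    return max_off_diagonal(matmul(a, invert(a, one)))

double_error = inversion_error(5, 1.0)          # IEEE double precision
exact_error = inversion_error(5, Fraction(1))   # exact rational arithmetic
print(double_error, exact_error)
```

In exact arithmetic the off-diagonal elements vanish, so whatever remains in floating point measures the accuracy of the inversion algorithm, which is the quantity the slide compares between Cramer and scaled LU.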

LCG /AA/ROOT relationship 44

Features found only in ROOT4.0

• In-place matrix multiplication
• Passing of lazy matrix (recipe without instantiation)
• Eigen-vector/value analysis for symmetric and non-symmetric matrix
• Condition number for arbitrary matrix (Hager algorithm)
• Many decomposition classes: LU, Chol, QRH, SVD, each allowing repeated solutions without decomposing again
• Thread safe
• Persistency

LCG /AA/ROOT relationship 45

More tests and benchmarks

• Like for the linear Algebra classes, similar test suites and benchmarks should be implemented for:
  • Basic algorithms (TMath like)
  • Statistical Analysis and probabilities
  • Functions: integrals, derivatives, root-finding
  • Interpolations, approximations.
  • Random numbers: basic, functions, histograms
  • Physics vectors
  • Minimization algorithms

LCG /AA/ROOT 1

POOL

LCG /AA/ROOT relationship 47

POOL Objectives (Dirk’s slide)

• To allow the multi-PB of experiment data and associated meta data to be stored in a distributed and Grid enabled fashion
  • various types of data of different volumes (event data, physics and detector simulation, detector data and bookkeeping data)
• Hybrid technology approach, combining
  • C++ object streaming technology, such as Root I/O, for the bulk data
  • transactional safe Relational Database (RDBMS) services, such as MySQL, for catalogs, collections and meta data
• In particular, it provides
  • Persistency for C++ transient objects
  • Transparent navigation from one object across file and technology boundaries
    - Integrated with an external File Catalog to keep track of the file physical location, allowing files to be moved or replicated

Callouts on the slide:
Source of problems and misunderstanding
Two catalogs?

LCG /AA/ROOT relationship 48

POOL Objectives

• Hybrid technology approach, combining
  • C++ object streaming technology, such as Root I/O, for the bulk data
  • transactional safe Relational Database (RDBMS) services, such as MySQL, for catalogs, collections and meta data

If an alternative solution is in mind, it must be a complete solution. In particular, an automatic schema evolution algorithm has to be part of POOL itself.
An alternative solution prevents exploiting more features of the current back-end.
Concentrating on one back-end will eliminate unnecessary overheads and duplicated code.

It is urgent to come back to the blueprint objective: combining ROOT as an event store with a RDBMS-based catalog.

LCG /AA/ROOT relationship 49

POOL Objectives

• Hybrid technology approach, combining
  • C++ object streaming technology, such as Root I/O, for the bulk data
  • transactional safe Relational Database (RDBMS) services, such as MySQL, for catalogs, collections and meta data

ROOT I/O is much more than a simple object streaming technology.
- It supports automatic schema evolution (a large fraction of the code)
- It supports collections (directories of keys, Trees with containers appropriate for queries in interactive analysis).
- It supports “object-streaming” with sockets, shared-memory.
- It supports access to remote files and is GRID-aware
- Collections are designed to work in a parallel/GRID setup with PROOF

LCG /AA/ROOT relationship 50

POOL libraries size and dependencies

Library              Size (MB)
SealBase             6.60
SealKernel           1.62
ReflectionBuilder    1.02
Reflection           2.40
PluginManager        1.28
AttributeList        0.15
FileCatalog          0.13
EDGCatalog           3.46
DataSvc              0.21
Collection           0.96
RootStorageSvc       2.22
RootCollection       0.18
PersistencySvc       0.29
PoolCore             0.43
StorageSvc           1.97
libCore              6.40
libCint              1.40
libTree              1.24

Group   Size (MB)
Seal    12.92
Pool    6.54
Root    9.04
Tot     28.72

LCG /AA/ROOT relationship 51

POOL/ROOT performance problems

             Test               Time     POOL vs ROOT
25    1      pool tree write    1.91     +89%
25    1      pool tree read     1.19     +176%
25    1      root tree write    1.01
25    1      root tree read     0.43
25    1      pool key write     2.04     +63%
25    1      pool key read      1.72     +212%
25    1      root key write     1.25
25    1      root key read      0.55

25    10     pool tree write    1.93     +89%
25    10     pool tree read     1.10     +175%
25    10     root tree write    1.02
25    10     root tree read     0.40
25    10     pool key write     1.67     +33%
25    10     pool key read      1.6      +180%
25    10     root key write     1.25
25    10     root key read      0.57

25    200    pool tree write    1.56     +56%
25    200    pool tree read     1.07     +154%
25    200    root tree write    1.00
25    200    root tree read     0.42
25    200    pool key write     1.61     +27%
25    200    pool key read      1.6      +180%
25    200    root key write     1.26
25    200    root key read      0.57

500   50     pool tree write    22.38    +19%
500   50     pool tree read     5.23     +27%
500   50     root tree write    18.79
500   50     root tree read     4.09
500   50     pool key write     19.72    +3.5%
500   50     pool key read      5.77     +29%
500   50     root key write     19.04
500   50     root key read      4.46

The current POOL/ROOT mapping has performance problems that must be understood (not just a few per cent!)

Numbers from Ioannis, Markus
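
The percentages in the table are the POOL time relative to the native ROOT time for the same test. A short sketch of that arithmetic, using a few rows from the slide (the dictionary layout and the function name `overhead_percent` are choices made for this illustration):

```python
import math

# (configuration, container, operation) -> (POOL time, ROOT time), from the slide
timings = {
    ("25 1", "tree", "write"): (1.91, 1.01),
    ("25 1", "tree", "read"): (1.19, 0.43),
    ("25 1", "key", "write"): (2.04, 1.25),
    ("25 1", "key", "read"): (1.72, 0.55),
    ("500 50", "tree", "write"): (22.38, 18.79),
    ("500 50", "tree", "read"): (5.23, 4.09),
}

def overhead_percent(pool_time: float, root_time: float) -> float:
    """Extra time spent by POOL relative to native ROOT, in per cent."""
    return (pool_time / root_time - 1.0) * 100.0

overheads = {key: overhead_percent(*pair) for key, pair in timings.items()}

for key, value in overheads.items():
    # The slide quotes the percentages rounded down.
    print(key, f"+{math.floor(value)}%")
```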

LCG /AA/ROOT relationship 52

ROOT foreign classes & POOL

A checksum algorithm is implemented in ROOT4. It provides automatic schema evolution without having to specify a class version number. It must be tested in POOL.
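
The principle of a layout checksum can be shown in a few lines. The sketch below is not ROOT's algorithm: it simply hashes the class name and the ordered list of data members with CRC32, so that any change of layout yields a different identifier without a hand-maintained version number. The class `Track` and its members are invented for the example.

```python
import zlib

def layout_checksum(class_name, members):
    """CRC32 over the class name and its (type, name) data members, in order."""
    text = class_name + ";" + ";".join(f"{mtype} {mname}" for mtype, mname in members)
    return zlib.crc32(text.encode("utf-8"))

# Hypothetical class layout as it was when the file was written ...
track_v1 = [("double", "fPx"), ("double", "fPy"), ("double", "fPz")]
# ... and as it is in the code reading the file (one data member added).
track_v2 = track_v1 + [("int", "fCharge")]

written_with = layout_checksum("Track", track_v1)
read_with = layout_checksum("Track", track_v2)

# A mismatch tells the I/O layer that the schema evolved and that the
# stored layout description must be used to read the old objects.
layout_changed = written_with != read_with
print(written_with, read_with, layout_changed)
```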

Must look at possible performance problems due to the fact that POOL does not use ClassDef (important function IsA missing for POOL). This could explain the very poor POOL performance for LHCb when using vector<T*>.

It is very strange that this performance problem has not yet been seen by CMS and Atlas!

LCG /AA/ROOT relationship 53

POOL: ref<T> and collections

• If ref<T> and collections are not understood by ROOT, it will be a source of constant troubles and misunderstanding.

• The development of these two entities should have been done in collaboration to optimize the implementation. Remember the early discussions about TRef, TLongRef, TUUID and ref<T>.

• The existing POOL collections are mapped on ROOT Trees (any bonus compared to native Trees?).

• If new collections are required (to be seen!), they must be designed with data analysis in mind, including parallelism.

• Progress in this area requires a close cooperation with the experiments with prototyping of a few implementations using the different solutions.

• We already have interfaces of ROOT collections with many RDBMS systems, including queries. (MySQL, Oracle, SapDB, PostGres)

LCG /AA/ROOT relationship 54

POOL: caching

• There is a confusion between “commit” that guarantees data base integrity and “buffering” to improve performance.

• The cache with “I take ownership” is intrusive and has consequences for the user framework.

• The solution with “no ownership” is not optimal. It implies multiple copies and duplicates the efficient buffering implemented in ROOT.

• A review of the POOL/ROOT communication will have to address these problems by removing unnecessary layers.

LCG /AA/ROOT relationship 55

POOL: Access to the catalog

• A coherent system will require a good interface between ROOT and the POOL catalog (both C++ and CINT).

• ROOT has already an abstract interface TGrid with an implementation with Alien.

• POOL will not be the only catalog around. It is important to consider the generality and variety of interfaces.

LCG /AA/ROOT 1

SIMU

• I will discuss only the VMC/SIMU relationship and the Geometry

LCG /AA/ROOT relationship 57

ROOT and the Simulation project

• The VMC has 2 goals:
  • Experiments define their geometry once only
  • The comparison between physics packages is facilitated.
• The VMC proposes 3 standard interfaces:
  • Standardize the interface to the generators and the particle stack.
  • Standardize the interface to the step manager (hits scoring)
  • Standardize the interface to the geometry
    - Definition and Validation (checker)
    - Navigator in detector simulation (fast and slow)
    - Queries from a reconstruction algorithm
    - Graphics

LCG /AA/ROOT relationship 58

Geometry and Geometries

[Diagram. Boxes: Geometry in memory; XML files (e.g. GDML); ROOT file; Geant3 rz file; C++ classes; G3; G4; Fluka; Recons; geometry.]

LCG /AA/ROOT relationship 59

Geometries: not the same goal !

XML files (e.g. GDML)
• External description only
• Used as input to a real geometry (G4, ROOT)
• Checker, Viewer may be implemented
• Requires some data structure in memory
This has very limited functionality. Interesting (maybe?) for input.
Too much emphasis on this solution.

Geometry in memory (G3, G4, ROOT)
• Simulation/Reconstruction oriented
• C++ API for the construction
• Input can be via first solution
• Checker, Viewer must be (are) implemented
• Provide interface to navigators
THIS IS THE MAIN HORSE TO BE OPTIMIZED

LCG /AA/ROOT 1

PI

LCG /AA/ROOT relationship 61

ROOT and PI (1)

• The PI-AIDA interface to ROOT exists. This still requires some consolidation, but it should not be expanded to other areas.

• A generic PyRoot interface (in SEAL) must be optimized and automated. Examples of PyRoot illustrating the complementarity to CINT (instead of an alternative) should be written and used in tutorials.

• The PyRoot interface belongs logically to the Root source.

• Ruby-Root (superior to PyRoot?)

LCG /AA/ROOT relationship 62

The original AIDA model

[Diagram. Boxes: USER; AIDA; ROOT; JAS.]

LCG /AA/ROOT relationship 63

The ROOT-PI-AIDA model

[Diagram. Boxes: USER; AIDA; PI-AIDA; PI-ROOT; ROOT; JAS.]

LCG /AA/ROOT relationship 64

PI libraries size and dependencies

Library               Size (MB)
SealBase              6.60
PluginManager         1.28
AIDA_Plugin           0.038
AIDA_Proxy            0.74
PluginTreeROOT        0.40
AIDA_CPP              0.08
AIDA_ROOT             0.49
PluginHistogramROOT   0.30
libCore               6.40
libCint               1.40
libTree               1.24
libHist               2.77
AIDAUtilities         xx
CLHEP                 1.50

Group   Size (MB)
Seal    7.88
PI      2.05
Root    11.81
Tot     21.74
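
The group totals on this slide can be cross-checked against the individual library sizes. The assignment of libraries to groups below is an assumption inferred from the names and from the sums (SealBase and PluginManager for Seal, the AIDA and Plugin*ROOT libraries for PI, the lib* libraries for Root); AIDAUtilities and CLHEP do not enter the quoted totals.

```python
sizes_mb = {
    "Seal": {"SealBase": 6.60, "PluginManager": 1.28},
    "PI": {
        "AIDA_Plugin": 0.038, "AIDA_Proxy": 0.74, "PluginTreeROOT": 0.40,
        "AIDA_CPP": 0.08, "AIDA_ROOT": 0.49, "PluginHistogramROOT": 0.30,
    },
    "Root": {"libCore": 6.40, "libCint": 1.40, "libTree": 1.24, "libHist": 2.77},
}

# Sum the libraries of each group, then the groups.
subtotals = {group: sum(libs.values()) for group, libs in sizes_mb.items()}
total = sum(subtotals.values())

for group, size in subtotals.items():
    print(f"{group:5s} {size:6.2f} MB")
print(f"Tot   {total:6.2f} MB")
```

With this grouping, more than half of the footprint comes from the ROOT libraries themselves and most of the rest from SealBase.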

LCG /AA/ROOT relationship 65

ROOT and PI (3) ARDA

• The project scope should be different.
• I would like to see more cooperation on the development of ROOT/PROOF with a close participation with the experiments.
• ROOT/PROOF requires also a close collaboration with the file catalog providers, POOL, Alien, others.
• Data challenges are the best opportunity to develop a coherent model.

Instead of abstract interfaces common to all data analysis packages, it is more efficient and less bureaucratic to develop file exchange formats.

LCG /AA/ROOT 1

ARDA

LCG /AA/ROOT relationship 67

ROOT & ARDA

• Our program of work has clearly a very large overlap with the proposed ARDA project:
  • Distributed computing (move process to data): PROOF
  • Distributed data access: xrootd/PROOF
• However, Data Analysis is not just an extension of Distributed Production.
• Data Analysis will be Batch AND Interactive with more and more Interactivity

LCG /AA/ROOT relationship 68

Data Volume & Organisation

[Figure: data volume scale 100MB, 1GB, 10GB, 100GB, 1TB, 10TB, 100TB, 1PB, with a second scale 1, 1, 5, 50, 500, 5000, 50000 and ranges labelled TTree and TChain.]

A TChain is a collection of TTrees or/and TChains

A TFile typically contains 1 TTree (or a few)

A TChain is typically the result of a query to the file catalogue
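
The statement that a chain is a collection of trees or other chains describes a recursive (composite) structure. The sketch below models it in plain Python; `Tree` and `Chain` are simplified stand-ins, not the ROOT classes, and the file names and entry counts are invented.

```python
class Tree:
    """Stand-in for one tree stored in one file."""

    def __init__(self, name, entries):
        self.name = name
        self.entries = entries

    def total_entries(self):
        return self.entries

    def trees(self):
        yield self


class Chain:
    """A collection of Trees and/or other Chains, seen as one long tree."""

    def __init__(self, name, elements=()):
        self.name = name
        self.elements = list(elements)

    def add(self, element):
        self.elements.append(element)

    def total_entries(self):
        return sum(e.total_entries() for e in self.elements)

    def trees(self):
        # Flatten nested chains into the underlying trees, in order.
        for e in self.elements:
            yield from e.trees()

    def locate(self, entry):
        """Map a global entry number to (tree name, local entry number)."""
        if entry < 0:
            raise IndexError(entry)
        for tree in self.trees():
            if entry < tree.entries:
                return tree.name, entry
            entry -= tree.entries
        raise IndexError("entry beyond the end of the chain")


# E.g. the result of a (hypothetical) catalogue query returning three files.
run1 = Chain("run1", [Tree("run1_a.root", 1000), Tree("run1_b.root", 500)])
all_runs = Chain("all", [run1, Tree("run2_a.root", 2000)])
print(all_runs.total_entries(), all_runs.locate(1200))
```

Because a chain only needs the list of its elements, it can be built directly from the result of a file catalogue query and split across processors file by file.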

LCG /AA/ROOT relationship 69

Data Volume & Processing Time

Using technology available in 2004

[Chart: time for one query using 10 per cent of the data versus data volume (100MB, 1GB, 10GB, 100GB, 1TB, 10TB, 100TB, 1PB), with an interactive and a batch region. Curves: ROOT, 1 processor (P IV 2.4GHz, 2004); PROOF, 10 processors; PROOF, 100 processors; PROOF/ALIEN, 1000 processors.]

Time scales as they appear on the slide:
1” 10” 1’ 10’ 1h 10h 1day 1month
1” 1” 10” 1’ 10’ 1h 10h 1day 10days
1” 1” 1” 10” 1’ 10’ 1h 10h 1day
1’ 10’ 1h 10h

LCG /AA/ROOT relationship 70

Data Volume & Processing Time

Using technology available in 2010

[Chart: time for one query using 10 per cent of the data versus data volume (100MB, 1GB, 10GB, 100GB, 1TB, 10TB, 100TB, 1PB), with an interactive and a batch region. Curves: ROOT, 1 processor (XXXXX, 2010); PROOF, 10 processors; PROOF, 100 processors; PROOF/ALIEN, 1000 processors.]

Time scales as they appear on the slide:
1” 1” 1” 10” 1’ 10’ 1h 10h 1day
1’ 10’ 1h
1” 1” 10” 1’ 10’ 1h 10h 1day 10days
1” 1” 1” 1” 10” 1’ 10’ 1h 10h

LCG /AA/ROOT relationship 71

GRID: Interactive Analysis, Case 1

• Data transfer to user’s laptop
• Optional Run/File catalog
• Optional GRID software

[Diagram. Boxes: optional run/file catalog; remote file server (e.g. rootd); Trees.]

Analysis scripts are interpreted or compiled on the local machine

LCG /AA/ROOT relationship 72

GRID: Interactive Analysis, Case 2

• Remote data processing
• Optional Run/File catalog
• Optional GRID software

[Diagram. Boxes: optional run/file catalog; remote data analyzer (e.g. proofd); Trees. Arrows: commands, scripts; histograms.]

Analysis scripts are interpreted or compiled on the remote machine

LCG /AA/ROOT relationship 73

GRID: Interactive Analysis, Case 3

• Remote data processing
• Run/File catalog
• Full GRID software

[Diagram. Boxes: run/file catalog; remote data analyzer (e.g. proofd); several slaves, each with its Trees. Arrows: commands, scripts; histograms, trees.]

Analysis scripts are interpreted or compiled on the remote master(s)

LCG /AA/ROOT relationship 74

Proof Alien (Interactive and batch)

ALIEN PROOF ?

by Nina C.Fulford

This is a story about fear! Fear of the Government! Fear of your company boss! Fear of ones career! Fear for ones life maybe? Whatever way you look at it, it is based on fear. How did we allow ourselves to be ruled by fear in this day and age of so-called knowledge, progress and science? You tell me, because I want to know! For years every so called expert on this planet has claimed to have two goals: find (1) proof of alien presence on this world, either in the form of an artifact beyond our present science or in the form of an alien life form. (2) Find the missing links to our past. In order to do the second one they use millions of dollars in public money for their research. Did you think they paid their own way? No! You support this with your taxes! If they can spend thousands wandering around Africa looking for bones and getting all excited when some animals skull turns up they want to claim is an ancestor of man, then why aren't they interested in an equally old skull that is out of this world figuratively and physically?

LCG /AA/ROOT relationship 75

Summary-1: technicalities

Several important changes are proposed to optimize the relationship between ROOT and LCG/AA.

The proposed changes will reduce considerably the complexity of the system and will improve drastically the performance. This is good for developers and end-users.

If the idea is accepted, a concrete plan for implementation, starting with the most urgent tasks (dictionary, POOL, mathlib), could be set up very soon. Very positive meetings so far.

LCG /AA/ROOT relationship 76

Summary-2: organisation

The current development model
• Experts design/implement/release
• Experiments validate the product

should be changed to:
• Experts discuss with experiments to understand their event models, possibly influencing their design
• They prototype the different models
• They integrate and release (fast iterative process subject to fewer surprises)

This will improve the feedback mechanism.
It will reduce risks.
Simplified structures should be put in place.