

A Comparison of Data Analysis Packages
Irwin Gaines, Jeff Kallenbach, Fermilab

Outline
Introduction: a little history
Build vs. Buy: general considerations
User Requirements
Basic Features
Advanced features
Conclusions

Introduction
Previous-generation HEP experiments used a ubiquitous homemade product: PAW
Why? Commercial systems offered neither the functionality nor, more importantly, the performance
Use of a universal product allows:
data sharing (ntuple files)
CHEP2000 9-Feb 2000
Build vs. Buy
Old days (1970s-80s): in-house development effort was considered "free"; any software purchase was expensive
More recently (1990s): attractive licensing terms; development costs should be amortized over as large a user base as possible. But what about support?
Now: consider full product lifetime costs, including development, licensing and support. Does the product need to be customized or enhanced to meet HEP needs?
User Requirements
Preparing statistical distributions of mathematical functions of data in the selected events
Linking in high-level language programs to process event data prior to plotting
Modifying selection criteria and plotted functions interactively
Fitting the distributions
Comparing and performing calculations on different distributions
Preserving selection criteria and functions for later use or to pass to others
Saving samples of events in a variety of specialized formats for later analysis
Accessing these specially formatted event samples to make plots, fits, statistical outputs, etc.
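One requirement above is preserving selection criteria for later use or to pass to others. A minimal sketch of one way to do that, assuming cuts are kept as plain data (variable, comparison, threshold) rather than code, so they can be written to text and reloaded elsewhere. The cut format, helper names and variable names are all hypothetical; neither PAW, ROOT nor IDL is being modelled here.

```python
import json
import operator
from typing import Dict, List, Sequence, Tuple

Event = Dict[str, float]
Cut = Tuple[str, str, float]  # hypothetical format: (variable, comparison, threshold)

# Only simple comparisons are allowed, so a saved cut never needs eval().
_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def passes(event: Event, cuts: Sequence[Cut]) -> bool:
    """True if the event satisfies every cut."""
    return all(_OPS[op](event[var], thr) for var, op, thr in cuts)


def save_cuts(cuts: Sequence[Cut]) -> str:
    """Serialize cuts to JSON text that can be stored or sent to a colleague."""
    return json.dumps([list(c) for c in cuts])


def load_cuts(text: str) -> List[Cut]:
    """Rebuild the cut list from its JSON form."""
    return [(var, op, float(thr)) for var, op, thr in json.loads(text)]


def select(events: Sequence[Event], cuts: Sequence[Cut]) -> List[Event]:
    return [e for e in events if passes(e, cuts)]


events = [
    {"lsig": 6.0, "iso1": 0.01, "mass": 91.0},
    {"lsig": 4.0, "iso1": 0.01, "mass": 88.0},
    {"lsig": 7.0, "iso1": 0.20, "mass": 90.0},
]
cuts = [("lsig", ">", 5.0), ("iso1", "<", 0.05)]
restored = load_cuts(save_cuts(cuts))            # round trip through text
masses = [e["mass"] for e in select(events, restored)]
```

Keeping cuts as data rather than as compiled functions is what makes them easy to save and share; the price is a limited vocabulary of comparisons.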
Parallel processing (using distinct data streams)
Debugging and profiling
Modularity (user code)
Modularity (system code)
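Parallel processing over distinct data streams works for histogramming because partial histograms merge by bin-wise addition. A minimal sketch under that assumption follows. Threads are used only to keep the example short; in CPython, CPU-bound work would need processes to run truly in parallel. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def fill(values: Sequence[float], nbins: int, lo: float, hi: float) -> List[int]:
    """Histogram one stream into nbins equal bins on [lo, hi); out-of-range values are dropped."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if lo <= v < hi:
            # min() guards against floating-point rounding just below hi
            counts[min(int((v - lo) / width), nbins - 1)] += 1
    return counts


def fill_streams(streams: Sequence[Sequence[float]], nbins: int, lo: float, hi: float,
                 workers: int = 4) -> List[int]:
    """Histogram each stream independently, then merge by adding bin contents."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda s: fill(s, nbins, lo, hi), streams))
    merged = [0] * nbins
    for part in partials:
        for i, c in enumerate(part):
            merged[i] += c
    return merged


streams = [[0.5, 1.5], [1.2, 3.9], [2.0]]
total = fill_streams(streams, nbins=4, lo=0.0, hi=4.0)
```

Because the merge is a simple sum, the result does not depend on how events were split across streams.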
Commercial package: IDL (other commercial packages offer similar features; IDL appeared to offer the most aggressive licensing terms)
ROOT: from the browser, from the tree viewer, from the command line
All plots are active: they can be manipulated, saved for later use, and printed in a variety of formats
IDL: command-line examples on the following slides
plots can be either static or active, displayed or printed
Displaying a Histogram
Browse the file, step by step:
1. Select the Open menu item from the File menu
2. Select "Example.root" and click Open
3. Double-click on myTree
4. Double-click on xs1
To fit the histogram:
- Select the horizontal part of the histogram
- Right-click to get the context menu
- Select the Fit Panel
- Select Fit
To change the fill attributes:
- Select the horizontal part of the histogram
- Right-click to get the context menu
- Select "SetFillAttributes"
- Select Apply
To zoom the X axis:
- Left-click on the left axis point (the cursor changes to a hand)
- Drag the mouse to the right axis point
- Release
To unzoom:
- Right-click on the X axis. This brings up the context menu for the X axis.
- Select unzoom from the context menu.
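The Fit Panel steps hide the actual computation. As a rough sketch of what a least-squares (chi-square) fit to histogram contents involves, the Python below fits a straight line to bin contents, assuming Poisson errors of sqrt(N) per bin and skipping empty bins. This is not ROOT's fitting code; the straight-line model and the function names are assumptions chosen for brevity.

```python
from typing import List, Sequence, Tuple


def bin_centers(nbins: int, lo: float, hi: float) -> List[float]:
    width = (hi - lo) / nbins
    return [lo + (i + 0.5) * width for i in range(nbins)]


def fit_line_chi2(x: Sequence[float], counts: Sequence[float]) -> Tuple[float, float, float]:
    """Weighted least-squares fit of counts = a + b*x, with assumed Poisson variance = count.

    Empty bins are skipped because their assumed error would be zero.
    Returns (a, b, chi2).
    """
    pts = [(xi, yi) for xi, yi in zip(x, counts) if yi > 0]
    s = sx = sy = sxx = sxy = 0.0
    for xi, yi in pts:
        w = 1.0 / yi  # weight = 1 / sigma^2, with sigma^2 = N
        s += w
        sx += w * xi
        sy += w * yi
        sxx += w * xi * xi
        sxy += w * xi * yi
    delta = s * sxx - sx * sx
    if len(pts) < 2 or delta == 0.0:
        raise ValueError("need at least two non-empty bins at distinct x")
    # Closed-form solution of the weighted normal equations.
    a = (sxx * sy - sx * sxy) / delta
    b = (s * sxy - sx * sy) / delta
    chi2 = sum((yi - (a + b * xi)) ** 2 / yi for xi, yi in pts)
    return a, b, chi2


x = bin_centers(4, 0.0, 4.0)
a, b, chi2 = fit_line_chi2(x, [10, 12, 14, 16])
```

The weights make bins with more entries count more, which is the usual chi-square convention for binned data with Poisson errors.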
The Tree Viewer
To create a Tree Viewer:
In the Browser, right-click on the tree. The context menu for the tree appears.
Select StartViewer
Tree Viewer components (buttons):
Slider: selects the event range (min/max). While the event loop is executing, a red box inside the slider bar indicates progress through the loop.
Break: interrupts the event loop
IList: activates a TEventList as an input selection (List In)
X: selects the X variable
Y: selects the Y variable
Z: selects the Z variable
Cpad: specifies the output pad (default = c1)
Hist: redefines the default histogram (default = htemp)
OList: creates a new TEventList using the current selection (List Out)
Gopt: specifies graphics options
Draw: displays the variable(s) placed in the X, Y, Z buttons
Scan: same as Draw, but with TTree::Scan output style
W: selects the weight/selection expression
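The OList/IList buttons amount to storing the entry numbers that pass a selection and reusing them as the input of later passes, so a second cut only visits the surviving entries. A sketch of that idea in plain Python follows. It is not ROOT's TEventList API; the function names and the list-of-dicts "tree" are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence

Event = Dict[str, float]


def make_event_list(tree: Sequence[Event],
                    selection: Callable[[Event], bool],
                    input_list: Optional[Sequence[int]] = None) -> List[int]:
    """Entry numbers passing `selection` (the "List Out" idea).

    If input_list is given, only those entries are visited (the "List In" idea).
    """
    entries = range(len(tree)) if input_list is None else input_list
    return [i for i in entries if selection(tree[i])]


def draw_values(tree: Sequence[Event],
                expr: Callable[[Event], float],
                input_list: Optional[Sequence[int]] = None) -> List[float]:
    """Values of expr for the chosen entries, ready to be histogrammed."""
    entries = range(len(tree)) if input_list is None else input_list
    return [expr(tree[i]) for i in entries]


tree = [{"x": 1.0, "y": 5.0}, {"x": -2.0, "y": 1.0}, {"x": 3.0, "y": -1.0}]
positive_x = make_event_list(tree, lambda e: e["x"] > 0)
both = make_event_list(tree, lambda e: e["y"] > 0, input_list=positive_x)
xs = draw_values(tree, lambda e: e["x"], input_list=positive_x)
```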
ROOT: commands are methods of ROOT classes
full access to compiled code (in any language)
IDL: commands are part of the scripting syntax
full access to compiled code (in any language)
IDL command language
cut4=where(lsig gt 5 and iso1 lt .05 and clsec gt .05 and iso2 lt .03)
read in a variable
plot histogram
dist = histogram(mass(cut4),binsize=mybin)
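For readers unfamiliar with IDL's array style, the two IDL lines above select indices with WHERE and then histogram the subscripted array. A rough Python equivalent is sketched below. It assumes the histogram's first bin starts at the data minimum and that the bin count is floor((max - min) / binsize) + 1, which is our reading of IDL's defaults rather than a verified reimplementation. The data values are invented.

```python
import math
from typing import List, Sequence


def where(mask: Sequence[bool]) -> List[int]:
    """Indices of true elements. (IDL's WHERE returns -1 when nothing matches; here: [].)"""
    return [i for i, m in enumerate(mask) if m]


def histogram(data: Sequence[float], binsize: float) -> List[int]:
    """Counts in bins of width binsize starting at min(data) (assumed IDL-like default)."""
    if not data:
        return []
    lo = min(data)
    nbins = int(math.floor((max(data) - lo) / binsize)) + 1
    counts = [0] * nbins
    for v in data:
        counts[int(math.floor((v - lo) / binsize))] += 1
    return counts


# Invented per-event quantities, named after the IDL example.
lsig = [6.0, 4.0, 7.5]
iso1 = [0.01, 0.02, 0.04]
clsec = [0.10, 0.20, 0.06]
iso2 = [0.02, 0.01, 0.02]
mass = [91.0, 80.0, 89.5]
mybin = 1.0

cut4 = where([l > 5 and i1 < .05 and c > .05 and i2 < .03
              for l, i1, c, i2 in zip(lsig, iso1, clsec, iso2)])
dist = histogram([mass[i] for i in cut4], binsize=mybin)
```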
ht2IDL - An Interface between HEP Data Files and IDL
As part of our investigation of the Interactive Data Language (IDL) for use in our environment, we have assembled a prototype of what we call ht2IDL (for "hepTuple to IDL"). This is a small package of C++ code and IDL procedure files that enables the user to access HEP data stores, such as HBOOK files, from the IDL session. It uses the HepTuple package from PAT.
How the package works
Like most modern tools, IDL can interface with external functions written by the user. This is done by writing code against a C-based interface, compiling it, and linking it into a shared-object file. After creating some simple helper files for IDL and starting IDL from the directory where the new interface code lies, the user has access to all of the new functionality provided by the written code and the IDL "External Interface". In our prototype, this was all done on an SGI/IRIX system. To achieve maximum compatibility with the Run II environment, we decided to use KCC; in principle there is no reason it should not work with CC or g++. Following the IDL External Developers' Guide, we wrote code that uses the HepTuple library to read HBOOK files, load the data into data structures compatible with IDL, and return them to the IDL session. Our prototype provides the interface to HBOOK files (using HepTuple), makefiles and some documentation on how to use them, and sample IDL scripts (called "procedure" files) that invoke the ht2IDL functions and display and manipulate the results.
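The ht2IDL code itself (C++, HepTuple and IDL's C-based External Interface) is not reproduced here. The sketch below only illustrates the reshaping step the paragraph describes: ntuple rows turned into one contiguous typed array per variable, the form an array language such as IDL operates on. The function name and the row layout are hypothetical.

```python
from array import array
from typing import Dict, Iterable, Sequence


def rows_to_columns(names: Sequence[str],
                    rows: Iterable[Sequence[float]]) -> Dict[str, array]:
    """Convert row-wise ntuple entries into one double-precision array per variable."""
    columns = {name: array("d") for name in names}
    for row in rows:
        if len(row) != len(names):
            raise ValueError(f"row has {len(row)} values, expected {len(names)}")
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


cols = rows_to_columns(["mass", "pt"], [(91.0, 20.0), (80.4, 35.5)])
```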
Commercial products:
you pay for it
hard to customize, usually don't get source
Homemade products:
moving to a free software support model (support by the community)
can modify source to enhance or customize
relatively easy to use others' code
both require a local support organization
Using native user objects
Two separate issues:
data in memory vs. data on disk (efficient disk access necessary for large data files)
can't improve on disk speed unless objects that are read together are next to each other on disk (column-wise n-tuples and generalizations)
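The column-wise idea can be shown in a few lines: if each variable's values are written contiguously, reading one variable for all events is a single seek and a single contiguous read. A minimal sketch, assuming little-endian doubles and an in-memory index of (offset, count) per column. The file layout and function names are hypothetical, not the CWN or ROOT format.

```python
import io
import struct
from typing import BinaryIO, Dict, List, Sequence, Tuple


def write_columnwise(f: BinaryIO,
                     columns: Dict[str, Sequence[float]]) -> Dict[str, Tuple[int, int]]:
    """Write each column as one contiguous block; return name -> (offset, count)."""
    index = {}
    for name, values in columns.items():
        index[name] = (f.tell(), len(values))
        f.write(struct.pack(f"<{len(values)}d", *values))
    return index


def read_column(f: BinaryIO, index: Dict[str, Tuple[int, int]], name: str) -> List[float]:
    """Read one variable for all events with a single seek and a single read."""
    offset, count = index[name]
    f.seek(offset)
    return list(struct.unpack(f"<{count}d", f.read(8 * count)))


buf = io.BytesIO()  # stands in for a file on disk
index = write_columnwise(buf, {"mass": [91.0, 80.5], "pt": [20.0, 35.0]})
pt = read_column(buf, index, "pt")
```

In a row-wise layout the same read would have to skip over every other variable of every event.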
IDL: basically memory based
Associated I/O allows mapping an IDL array or structure variable onto a file:
I/O occurs automatically when the associated variable is subscripted, accessing only the desired object
data set size limited by file size rather than memory size
direct access to each element in the file, including convenient event selection by indexing
files can have multiple associated structures (full events, tracks, hits, etc.)
performance still limited by record structure
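IDL's associated I/O (the ASSOC function) is the mechanism described above. The Python class below is only an analogy under stated assumptions: fixed-size binary records, with I/O happening when the object is subscripted, so only the requested record is read. The record layout and class name are hypothetical.

```python
import io
import struct
from typing import BinaryIO, Tuple


class AssociatedRecords:
    """Read-on-subscript view of fixed-size binary records (loosely modelled on IDL ASSOC)."""

    def __init__(self, f: BinaryIO, fmt: str, offset: int = 0):
        self._f = f
        self._rec = struct.Struct(fmt)
        self._offset = offset

    def __len__(self) -> int:
        # Data set size is bounded by the file, not by memory.
        self._f.seek(0, io.SEEK_END)
        return (self._f.tell() - self._offset) // self._rec.size

    def __getitem__(self, i: int) -> Tuple:
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Direct access: seek straight to record i and read only that record.
        self._f.seek(self._offset + i * self._rec.size)
        return self._rec.unpack(self._f.read(self._rec.size))


FMT = "<i3f"  # hypothetical record: event number, px, py, pz
events = [(1, 0.5, 1.0, 2.0), (2, -1.0, 0.0, 3.5), (3, 2.0, 2.0, 0.0)]
buf = io.BytesIO(b"".join(struct.pack(FMT, *e) for e in events))
assoc = AssociatedRecords(buf, FMT)
picked = [assoc[i] for i in (0, 2)]  # event selection by indexing
```

As the slide notes, performance is still bounded by the record structure: reading one field of every event still reads every whole record.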
Access to user objects
ROOT: the script language is C++; user classes can be used by the interpreter if their header files are run through rootcint to create a dictionary
IDL: supports structures, which are collections of scalars, arrays and other structures. An external structure definition file is needed to use them in commands; there is no automatic way to create these from class headers
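The contrast above is between deriving a data layout automatically from a class definition (as rootcint does from C++ headers) and writing that layout by hand (as IDL requires). As an analogy only, using Python's dataclass reflection rather than rootcint or IDL, the sketch below derives a binary record format from a class's field types. The type mapping (int as 32-bit, float as double) and all names are assumptions.

```python
import struct
from dataclasses import astuple, dataclass, fields
from typing import Type, TypeVar

T = TypeVar("T")

# Assumed mapping from field type to struct code; other types are not handled.
_CODES = {int: "i", float: "d"}


def struct_format(cls: type) -> str:
    """Derive a little-endian struct format from the class's field declarations."""
    return "<" + "".join(_CODES[f.type] for f in fields(cls))


def pack(obj: object) -> bytes:
    return struct.pack(struct_format(type(obj)), *astuple(obj))


def unpack(cls: Type[T], data: bytes) -> T:
    return cls(*struct.unpack(struct_format(cls), data))


@dataclass
class Track:  # hypothetical user class
    charge: int
    px: float
    py: float
    pz: float


t = Track(-1, 1.5, 0.25, -3.0)
blob = pack(t)
back = unpack(Track, blob)
```

Adding a field to Track changes the record format automatically, which is the convenience a generated dictionary provides over a separately maintained definition file.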
IDL GUI Builder
Available in IDL 5.3, the IDL GUIBuilder enables you to build intuitive GUIs with drag-and-drop ease. A convenient control palette with icons such as radio buttons, checkboxes, and horizontal and vertical sliders lets you quickly construct interfaces that users understand. Widget properties are easily editable. Pre-made bitmaps give you graphical cues for customizing buttons according to their function. Widgets are arranged in row and column geometry for on-screen consistency. At the code level, built-in comments help you understand what each widget and event will accomplish.
What Is ION?
An easy method for users to leverage the graphics and analysis power of IDL in web based applets and applications
Allows users to share IDL applications with non-IDL users
Easy set-up, use and management
This diagram gives a high-level picture of how the ION Client and Server interact and how network connections are made when the ION Client is running in an applet or an application.
The key things to note when showing this slide are:
ION is a client/server application
ION can exist as an applet or an application, both of which can use the same server
ION Applet Clients connect to the server independently (more or less) of the browser they are running in, and no CGI is involved
IDL commands and graphics primitives are communicated directly, in a binary format, between the client and server: no ASCII equivalents or reading/writing of images
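The last point says graphics primitives travel in binary rather than as ASCII text or images. ION's actual wire format is not described here, so the sketch below uses an entirely hypothetical message layout: a 4-byte length prefix, a 1-byte opcode, a point count, and raw 32-bit floats for a polyline. It only shows why such framing is compact and needs no text parsing.

```python
import struct
from typing import List, Sequence, Tuple

OP_POLYLINE = 1  # hypothetical opcode, not ION's real protocol


def encode_polyline(points: Sequence[Tuple[float, float]]) -> bytes:
    """Length-prefixed binary message: opcode, point count, then x/y pairs as float32."""
    body = struct.pack("<BI", OP_POLYLINE, len(points))
    body += b"".join(struct.pack("<ff", x, y) for x, y in points)
    return struct.pack("<I", len(body)) + body


def decode_polyline(msg: bytes) -> List[Tuple[float, float]]:
    (length,) = struct.unpack_from("<I", msg, 0)
    if length != len(msg) - 4:
        raise ValueError("truncated or oversized message")
    opcode, n = struct.unpack_from("<BI", msg, 4)
    if opcode != OP_POLYLINE:
        raise ValueError(f"unexpected opcode {opcode}")
    # Points start after the 4-byte length and the 5-byte opcode/count header.
    return [struct.unpack_from("<ff", msg, 9 + 8 * i) for i in range(n)]


pts = [(0.0, 0.0), (1.5, 2.0), (3.0, -1.0)]
msg = encode_polyline(pts)
```

Each point costs a fixed 8 bytes regardless of how many digits it would need as text, and the receiver can unpack it directly.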
Applications based on ION
Workgroups can develop and easily deploy data processing and visualization apps with ION
Thin clients download fast and can be updated easily
Applications can exist in any Java enabled machine and still access the power of IDL
The main point of this slide is to encourage application (or at least powerful applet) development. The application model endorsed here is the same as the client/server promise of Java - thin clients that run anywhere and are easy to maintain. By using a creative combination of IDL and Java, users can easily make very powerful visualization and data processing applications.
Another point worth mentioning is that applications are not limited by the constraints placed on applets. This allows local file system access, arbitrary network connections and broader system access.
Conclusions
Commercial products offer all basic functionality and many attractive advanced features
Homemade products still better optimized for specific HEP use
Support models evolving (open source model)
