Experiment archive, analysis, and visualization at the National Ignition Facility

Fusion Engineering and Design 87 (2012) 2087–2091
journal homepage: www.elsevier.com/locate/fusengdes

Matthew S. Hutton*, Stephen Azevedo, Richard Beeler, Rita Bettenhausen, Essex Bond, Allan Casey, Judith Liebman, Amber Marsh, Thomas Pannell, Abbie Warrick

Lawrence Livermore National Laboratory, Livermore, CA, United States

* Corresponding author. E-mail address: [email protected] (M.S. Hutton).

Highlights

- We show the computing architecture to manage scientific data from NIF experiments.
- NIF laser "shots" generate GBs of data for sub-microsecond events separated by hours.
- Results are archived, analyzed and displayed with parallel and scalable code.
- Data quality and pedigree, based on calibration of each part, are tracked.
- Web-based visualization tools present data across shots and diagnostics.

Article history: Available online 25 July 2012

Keywords: NIF; National Ignition Facility; Scientific archive; Web-based visualization; Automated analysis; Dashboards

Abstract

The National Ignition Facility (NIF) at the Lawrence Livermore National Laboratory is the world's most energetic laser, providing a scientific research center to study inertial confinement fusion and matter at extreme energy densities and pressures. A target shot involves over 30 specialized diagnostics measuring critical x-ray, optical and nuclear phenomena to quantify ignition results for comparison with computational models. The Shot Analysis and Visualization System (SAVI) acquires and analyzes target diagnostic data for display within a time budget of 30 min. Laser and target diagnostic data are automatically loaded into the NIF archive database through clustered software data collection agents. The SAVI Analysis Engine distributes signal and image processing tasks to a Linux cluster where computation is performed. Intermediate results are archived at each step of the analysis pipeline. Data are archived with metadata and pedigree. Experiment results are visualized through a web-based user interface in interactive dashboards tailored to single or multiple shot perspectives. The SAVI system integrates open-source software, commercial workflow tools, relational database and messaging technologies into a service-oriented and distributed software architecture that is highly parallel, scalable, and flexible. The architecture and functionality of the SAVI system are presented along with examples.

Published by Elsevier B.V. 0920-3796/$ – see front matter. http://dx.doi.org/10.1016/j.fusengdes.2012.07.009

1. Introduction

The National Ignition Facility is the world's largest and most energetic laser, providing a research center to study inertial confinement fusion ignition and explore matter at extreme energy densities and pressures. Integrated fusion ignition experiments on NIF began in 2010 [1,2].

A target shot on the NIF involves over 30 specialized diagnostics measuring critical x-ray, optical and nuclear phenomena [3]. NIF is designed to perform several target experiments per day. This rate requires that raw data be quickly analyzed to inform scientists of changes to experimental parameters between shots. In this paper, we describe the Shot Data Analysis and Visualization (SAVI) system. The system was designed to archive, automatically analyze, and provide visualization of experiment results.

2. NIF Archive

The NIF Archive is the permanent repository for experimental results produced by NIF shots. Any data potentially used in the analysis of the shot are collected and stored in the archive. As such, the archive not only contains raw diagnostic and analyzed data, but is the authoritative source for supporting information such as diagnostic calibration, target metrology, and instrument configuration. The archive provides a single, consolidated resource for studying experimental results at NIF.

A NIF shot can produce hundreds of gigabytes (GB) of data. Data types include scalars and non-scalar data such as waveforms and images. Archived data is stored in a relational Oracle Real Application Cluster (RAC) database.



The software interface layer is a set of frameworks and services written on top of a low-level content management library. Software interfaces include SOAP-based web services, Java APIs, WebDAV [6], and SQL. Additional custom HTTP services provide export and dynamic transforms to various formats.

An archive object is the primary unit of stored data in the archive. An archive object is typically hierarchical and can comprise a mixture of data types including scalars, images and waveforms. The archive stores data relationally with the ability to interact with the data as objects. Object-relational features are enabled by extensive class and attribute metadata provided by the content management framework.

The content management framework performs transparent object-relational mapping. Archived objects share a common abstract base class and relational base table, making it possible to query heterogeneous data across the archive with object-relational queries. This data architecture also supports standard relational queries using SQL.

The object-relational feature has other benefits. For example, any scalar data can be quickly trended across shots. Alternatively, data structures can be rendered into popular data formats such as Hierarchical Data Format (HDF), Common Data Format (CDF), or Extensible Markup Language (XML) using generic transformation codes [7].
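Because class and attribute metadata describe every archived structure, a single generic walker can render any object without type-specific code. The following toy sketch, over plain objects rather than the archive's actual metadata model, illustrates the idea for the XML case:

```javascript
// Generic transformation sketch: recursively render any nested object
// to XML. This is an illustration of the "generic transformation code"
// concept, not the archive's actual implementation.
function toXml(name, value) {
  // Leaf values (scalars) become simple elements.
  if (value === null || typeof value !== "object") {
    return `<${name}>${value}</${name}>`;
  }
  // Composite values recurse over their attributes.
  const children = Object.entries(value)
    .map(([key, child]) => toXml(key, child))
    .join("");
  return `<${name}>${children}</${name}>`;
}
```

The same walk could just as easily emit HDF or CDF structures; only the serialization step changes.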

Binary content such as images and waveforms, and unstructured documents such as spreadsheets and PDF files, are stored in the archive database as Binary Large Objects (BLOBs). The database and archive software provide the means to transparently relocate the physical location of BLOBs while retaining the availability of the data to users. Information Lifecycle Management (ILM) technology migrates BLOBs as documents to near-line storage devices such as tape or low-cost disk. This reduces backup times and preserves high-cost storage for more active data.

Once an archive object is stored, its content becomes immutable. The archive maintains a strict version history such that any change to an archive object results in a new version of the object. Every archive object or versioned instance is assigned a unique resource name (URN). A name resolver service is used to resolve a URN and retrieve a particular archived object. The URN acts as a permanent identifier for data that scientists may reference in publications.

Archive objects contain metadata that can further qualify, enrich, or describe the contents. Such metadata includes data quality, analysis pedigree, user comments, display images, and data taxonomy.

NIF uses the data taxon as a natural key for classifying data within the overall data taxonomy of the archive. Archive objects within the same version family share a common data taxon. A data taxonomy provides a simple yet powerful mechanism to organize data in multiple dimensions. The taxon comprises six parts that identify the origin and type of data.
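To make the taxon idea concrete, here is a minimal sketch of a six-part key used to name and group data. The part names and shot identifiers are invented for illustration; the paper does not enumerate NIF's actual six parts.

```javascript
// Hypothetical six-part taxon; the real NIF part names are not
// specified in the paper.
const TAXON_PARTS = ["facility", "shot", "diagnostic", "instrument", "dataKind", "name"];

// Build a taxon string from its six parts, refusing incomplete keys.
function makeTaxon(parts) {
  for (const part of TAXON_PARTS) {
    if (!parts[part]) throw new Error(`missing taxon part: ${part}`);
  }
  return TAXON_PARTS.map((part) => parts[part]).join("/");
}

// All versions of the same logical datum share one taxon, so grouping
// archive objects by taxon recovers version families.
function groupByTaxon(objects) {
  const families = new Map();
  for (const obj of objects) {
    if (!families.has(obj.taxon)) families.set(obj.taxon, []);
    families.get(obj.taxon).push(obj);
  }
  return families;
}
```

A key like this also supports multi-dimensional queries: fixing the diagnostic part trends one quantity across shots, while fixing the shot part lists everything recorded for one experiment.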

2.1. Extract, transform, and load

NIF produces approximately 20,000 binary data artifacts per target shot [4]. Laser and shot configuration data is initially staged by the control system software into an Oracle database. Raw target diagnostic data are acquired from devices by Front End Processors (FEPs) and written as HDF5 (Hierarchical Data Format) files to a separate staging area on a network file system. NIF devised a canonical HDF schema for storing common array data such as waveforms and images. Such standards simplify the interchange and use of data.

Raw data are collected in parallel and stored into the NIF Archive by a cluster of Java-based software agents that extract, transform, and load (ETL) data into the archive. This data collection process completes within 10 min of a shot.

The function of ETL is threefold:

(1) Collect (extract) data from source systems such as databases and file systems.
(2) Convert (transform) data into structures required by the archive process.
(3) Archive (load) the data in the NIF Archive using archive software interfaces.

The ETL framework is written in Java, although the "transform" aspect of ETL is executed by dynamically loaded JavaScript scripts. These scripts run inside an embedded scripting engine within the ETL agent software. This technical approach allows the team to "hot deploy" transformation code without recompiling the framework. Hot-patching has proven useful in response to unexpected anomalies with source data, where transform script changes may be required to ensure data is loaded. Also, with this process, new diagnostics can be deployed outside of the framework release cycle.
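As a hedged sketch of what a hot-deployable transform script might look like, consider the following. The record fields and output structure are invented for illustration; NIF's actual staging schemas differ.

```javascript
// Hypothetical transform script, re-evaluated by the embedded engine
// whenever the file changes, so fixes take effect without recompiling
// the Java framework. All field names here are illustrative.
function transform(rawRecord) {
  return {
    // Derive a classification key from the source identifiers.
    taxon: `${rawRecord.diagnostic}/${rawRecord.channel}`,
    shot: rawRecord.shotId,
    payload: {
      kind: "waveform",
      samples: rawRecord.samples.map(Number), // coerce strings to numbers
      units: rawRecord.units || "V",          // default when units are absent
    },
  };
}
```

A script of this shape is where a data anomaly (say, samples arriving as strings, as above) can be patched between shots without touching framework code.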

The ETL agents are lightweight Java processes that run inside a JADE micro-container within a Java Virtual Machine (JVM). JADE is an open-source agent development framework that affords us remote management [5]. This architecture provides the ability to quickly scale up data acquisition or re-balance processing activities. The remote management features simplify startup, shutdown, and monitoring of the cluster of agents on over a dozen Linux machines.

2.2. Data provenance (data pedigree)

One of the basic principles of the scientific method is the reproducibility of experimental results. The archive was designed to carefully maintain and validate the chain-of-custody that starts with raw data and continues through each step of the analysis [3]. We refer to this traceability as pedigree. Pedigree is tracked as metadata and is recorded whenever data are refined by analysis.

The archive proactively validates the pedigree chain of archive objects in near real-time. Many different scenarios can result in a "bad" pedigree chain. For example, an improved or corrected calibration can be applied retroactively, a configuration error (wrong serial number) may have been recorded at the time of the shot, or a prior calculation step was re-executed where a result relied on an earlier version. Any change to the quality of these data inputs impacts the entire pedigree chain from that point forward. When pedigree is impacted by a data change, the archive automatically detects the impact and flags the appropriate chain of data with a bad pedigree quality indicator. The analysis chain can then be re-analyzed and a new version of the results created.
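The forward propagation described above amounts to a graph walk: from the changed object, follow derivation edges downstream and flag everything reached. A minimal sketch, with invented data shapes (the archive's actual pedigree representation is not described at this level):

```javascript
// Sketch of pedigree impact detection. Each object records the ids of
// the inputs it was derived from; when one object changes, every
// transitive dependent is flagged for re-analysis.
function flagImpacted(objects, changedId) {
  // Build a reverse index: input id -> ids of results derived from it.
  const dependents = new Map();
  for (const obj of objects.values()) {
    for (const input of obj.inputs) {
      if (!dependents.has(input)) dependents.set(input, []);
      dependents.get(input).push(obj.id);
    }
  }
  // Breadth-first walk forward from the changed object.
  const impacted = new Set();
  const queue = [changedId];
  while (queue.length > 0) {
    const id = queue.shift();
    for (const depId of dependents.get(id) || []) {
      if (!impacted.has(depId)) {
        impacted.add(depId);
        objects.get(depId).pedigree = "bad"; // quality indicator
        queue.push(depId);
      }
    }
  }
  return impacted;
}
```

Re-running the flagged analyses then produces new versions whose pedigree chains are valid again.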

2.3. Lifecycle events

The NIF Archive framework executes a set of configurable steps during the initial archive stage. The archive software posts lifecycle events as messages to multi-subscriber queues, which then inform client software processes when data are ready for further processing. This publish/subscribe messaging pattern provides loose coupling between the archive and the various software components that act on these events.
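The loose coupling can be seen in a toy in-process sketch of the pattern. The production system uses Oracle Advanced Queues, not an in-memory bus; this only illustrates why the archive never needs to know which components consume its events.

```javascript
// Minimal publish/subscribe sketch of archive lifecycle events.
// Illustrative only: the real multi-subscriber queues live in the
// Oracle database (Advanced Queues), with guaranteed delivery.
class LifecycleBus {
  constructor() {
    this.subscribers = new Map(); // event name -> list of handlers
  }
  subscribe(event, handler) {
    if (!this.subscribers.has(event)) this.subscribers.set(event, []);
    this.subscribers.get(event).push(handler);
  }
  // Every subscriber sees every message, so new consumers (analysis,
  // enrichment, monitoring) can be added without changing the archive.
  publish(event, message) {
    for (const handler of this.subscribers.get(event) || []) {
      handler(message);
    }
  }
}
```
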

The messaging queues related to event processing utilize Oracle Advanced Queues (A/Q) technology that resides in the Oracle database. The persistent store provides the benefits of guaranteed delivery and execution within the same database transaction, avoiding complications such as transaction coordination and multi-phased commits.

The automated analysis framework described in the next section relies on individual content-related events to trigger analysis and release data preconditions.


2.4. Enrichment

The archive uses lifecycle events internally to provide asynchronous data enrichment. Enrichment performs data-enriching functions such as generating automatic JPEG display images for HDF waveforms and images. This pre-processing improves visualization performance and reduces load on the system.

2.5. Data access control

The content management framework underlying the NIF Archive provides the ability to assign an Access Control List (ACL) to an archive object. An ACL contains the roles, permissions, and user groups assigned to those roles. The archive's software interfaces ensure that a user query is performed under that user, and the ACL is tested to ensure the individual has appropriate access to the data.
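The ACL test itself reduces to a membership check: does the querying user belong to a group whose role grants the requested permission? A minimal sketch with invented structures (the content management framework's real ACL model is not detailed in the paper):

```javascript
// Illustrative ACL check. Each ACL entry pairs a set of permissions
// with the user groups granted those permissions; data shapes are
// hypothetical.
function hasAccess(acl, user, permission) {
  return acl.entries.some(
    (entry) =>
      entry.permissions.includes(permission) &&
      entry.groups.some((group) => user.groups.includes(group))
  );
}
```

A query interface would run this check per archive object before returning results, so a user only ever sees data their groups are entitled to.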

3. Analysis automation engine

The SAVI system automatically analyzes and archives target diagnostic data immediately following the shot. Instrument and diagnostic-level analysis steps are completed within 30 min of a shot.

SAVI analysis is performed by a highly parallel, distributed set of software components. Asynchronous activities are enabled by the message queues described previously. These queues guarantee message delivery and allow for the natural allocation of tasks. The SAVI analysis system is composed of three major components:

(1) The Director determines the analysis programs to execute and when execution can occur.
(2) The Workflow engine coordinates the analysis pipeline and provides the data integration required to assemble the analysis inputs.
(3) The Analysis module performs the data analysis.
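The division of labor among these components can be sketched in a few lines. The component names come from the paper; everything inside them is invented for illustration and greatly simplified (the real Workflow engine is a commercial workflow tool, not a function call):

```javascript
// Toy sketch of the three-component split: the Director gates
// execution on data availability; the Workflow engine assembles
// inputs and runs the step. Internals are hypothetical.
const director = {
  // Return the analyses whose input data have all arrived.
  ready(availableTaxa, analyses) {
    return analyses.filter((a) => a.requires.every((t) => availableTaxa.has(t)));
  },
};

const workflow = {
  // Gather the inputs from the archive, invoke the analysis module,
  // and package the result for archiving (with pedigree, in reality).
  run(analysis, archive) {
    const inputs = analysis.requires.map((taxon) => archive.get(taxon));
    return { taxon: analysis.produces, value: analysis.fn(inputs) };
  },
};
```

In the real system the "availableTaxa" signal is driven by the archive's lifecycle events, which is how analysis triggers within minutes of data landing.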

Refer to [3] for a detailed explanation of the SAVI automated analysis framework.

4. Data visualization

Data visualization provides several functions: (1) a quick look at experiment results, (2) cross-shot analysis, and (3) a collaboration point for scientific research.

Using the web-based Archive Viewer, data across the archive can be searched, browsed, or visualized graphically. The data taxonomy described earlier in this paper is rendered as a hierarchical virtual directory of data available for a shot. Collections of data can be viewed in standard tabular or grid formats. Hierarchical data can be traversed via hyperlinks along data relationships, including between dependent results using pedigree.

Scientists may download data and perform off-line desktop analysis. A suitcase feature in the Archive Viewer allows scientists to multi-select data and bulk export it as a single .zip file. A WebDAV interface provides alternative file system-based access to the archive [6]. An upload feature allows scientists to archive desktop analysis results that become part of the official experiment record.

The key to a useful and relevant scientific archive is to employ strong visualization tools that keep scientists engaged with data on-line. Our goal is to encourage scientists to contribute post-analysis to the archive and to maintain the quality of analyzed data over time. Ongoing peer review and scrutiny enhance the quality of data and the scientific process. The SAVI archive and visualization tools enable physics working groups and diagnostic teams to interact with data on-line in real time.

The object-relational archive, web services, and data taxonomy have proven useful in providing flexible web-based visualization. For example, the Archive Viewer is able to introspect a data class and render it without any advance knowledge of the structure. A data taxonomy allows data to be self-organizing. Entirely new classes of diagnostic data are generally viewable, searchable, and downloadable without any changes to the visualization software.

4.1. Dashboards and interactive plots

A major visualization strategy change occurred in mid-2010. Inspired by portals such as the Google and Yahoo home pages, we developed a similar approach to data visualization using interactive dashboards. This web-based framework enables the rapid assembly of data into readily deployable units called widgets. The framework allows an analyst to quickly build and customize "dashboards" by assembling widgets on a web page.

A widget is an HTML container for displaying data such as tabular data, plots, or images. Each widget loads independently in the browser using Asynchronous JavaScript and XML (AJAX) calls to data services hosted by visualization servers. This approach yields a highly responsive user experience and has proven superior to rendering pages in massive bulk requests to the server.
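The per-widget loading pattern can be sketched as follows. This is an illustrative sketch only, using the modern `fetch` API rather than the jQuery-era AJAX calls the paper describes; URLs and data shapes are invented.

```javascript
// Sketch of independent widget loading: each widget fetches its own
// data and renders when it arrives, so one slow data service does not
// block the rest of the dashboard. The fetch function is injectable
// for testing; widget.dataUrl and render() are hypothetical.
async function loadWidget(widget, fetchFn = fetch) {
  const response = await fetchFn(widget.dataUrl); // per-widget AJAX call
  widget.render(await response.json());
}

function loadDashboard(widgets, fetchFn = fetch) {
  // Fire all requests concurrently instead of one bulk page render.
  return Promise.all(widgets.map((w) => loadWidget(w, fetchFn)));
}
```

Each widget thus paints as soon as its own data service replies, which is the source of the responsiveness the dashboards are credited with.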

Widgets in the dashboard act as windows that can be moved, minimized, or maximized. Widgets host a context-based menu for data export or visualization actions such as enabling trendlines or annotations. Using this technique, any combination of data from the archive can be very quickly assembled into a topic-based dashboard.

Many of our dashboard widgets include powerful interactive data plots: category plots, line charts, bar charts and scatter plots with error bars. We utilize browser-centric plotting packages that provide zooming, re-scaling, contextual hover, toggling of data series, etc. These capabilities encourage users to remain in the web-based visualization environment and minimize the need to revert to off-line tools where data can become stale.

4.2. Multi-shot analysis

Multi-shot analysis is also possible from the dashboard. Scientists need to analyze data across multiple shots, but require the flexibility to vary the grouping of shots used. As a result, we introduced a feature called tags. A scientist can group any combination of shots under a tag. Tags can be private to the user or shared and administered by privileged users. Tags can be copied and modified privately to perform on-the-spot, ad hoc analysis. Once a tag is defined, the tool is able to automatically display and plot data in over a dozen multi-shot dashboards.
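The tag semantics described above (named groupings, private copies for what-if analysis) can be sketched briefly. Shot identifiers and field names here are illustrative, not real records:

```javascript
// Sketch of shot tags: named, optionally shared groupings of shots
// that multi-shot dashboards can plot. Shapes are hypothetical.
function createTag(name, shots, owner, shared = false) {
  return { name, shots: [...shots], owner, shared };
}

// Copying a tag privately supports ad hoc groupings without
// disturbing the shared original.
function copyTag(tag, newOwner) {
  return { ...tag, shots: [...tag.shots], owner: newOwner, shared: false };
}
```

The important property is that the copy's shot list is independent of the original's, so a scientist can add or drop shots experimentally while the shared tag stays authoritative.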

4.3. Authorized values

Scientific working groups have formed to authorize a single value measured by multiple, independent target diagnostics. For example, neutron yields (Yn) are measured by a half-dozen diagnostics oriented around the target chamber. Specialized dashboards provide the working group views for visualizing these data together. The dashboard automatically calculates a weighted mean value with error bars using a chi-squared statistical algorithm (Fig. 1). The working group leader collaborates with a team of diagnostic scientists to exclude outlying measurements, and then authorizes the weighted value. The authorized value is archived with a pedigree that includes the participating instrument measurements.
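For concreteness, the textbook way to combine independent measurements with error bars is the inverse-variance weighted mean. The paper only says a "chi-squared statistical algorithm" is used, so treat this as an assumption about the flavor of calculation, not NIF's exact code:

```javascript
// Inverse-variance weighted mean across diagnostics, with the
// 1-sigma uncertainty on the combined value. Illustrative sketch;
// the dashboard's actual chi-squared algorithm may differ.
function weightedMean(measurements) {
  // Each measurement: { value, sigma } with sigma its 1-sigma error bar.
  let sumWeights = 0;
  let sumWeightedValues = 0;
  for (const m of measurements) {
    const weight = 1 / (m.sigma * m.sigma); // tighter error bar -> more weight
    sumWeights += weight;
    sumWeightedValues += weight * m.value;
  }
  return {
    value: sumWeightedValues / sumWeights,
    sigma: Math.sqrt(1 / sumWeights),
  };
}
```

Excluding an outlying diagnostic, as the working group leader does, simply means dropping its entry from the input list and recomputing.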
Fig. 1. An example of the Neutronics Working Group dashboard. This dashboard provides the working group with a consolidated view of multiple measuring diagnostics. From this dashboard, the working group can review and authorize the weighted mean nuclear results.
4.4. Comparison to simulation data

Computational models such as LASNEX and HYDRA run on LLNL supercomputers. These hydrodynamics codes attempt to simulate NIF experiments based on the current understanding of physics [2]. Using the archive's data taxonomy, simulation metrics can be easily correlated with experimental values (Fig. 2). This capability has fueled new visualization requirements that the interactive dashboards have been able to accommodate.

4.5. Visualization technology

The Archive Viewer is based on Java and Struts 2, a Model-View-Controller web framework. The client dashboard technology was developed using the open-source jQuery JavaScript library. Interactive plots are based on open-source Highcharts and a commercial package called PowerCharts. These tools have allowed us to maintain compatibility across all modern browsers and desktop platforms.

Fig. 2. An example of a multi-shot dashboard containing simulation vs. experimental data. An interactive plot allows data toggling and displays information with mouse hover. These features allow plotting of multiple variations of simulation.

The dashboard is now the predominant technique for visualizing data and provides a collaborative environment for scientific working groups focusing on various areas of physics. Dashboards provide a live and current view of the data not possible with off-line tools.

5. Future efforts

The SAVI system is an ongoing development effort. Where automated analysis is not yet possible, the team is providing data in "ready-to-execute" packages and improving methods for uploading off-line analysis results.

Data visualization continues to be a major focus of development. One area of development includes enhanced collaboration features in the dashboard. Examples include publish-subscribe and notification of events of interest. Scientists would like on-line annotation of experimental results and key observations. Other efforts will involve integration between the internal NIF wiki for documents and research papers and content from the archive.

References

[1] E. Moses, The National Ignition Facility: path to ignition in the laboratory, in: Fourth International Conference on Fusion Sciences and Applications, Biarritz, France, September 2005.
[2] The National Ignition Facility Web site, https://lasers.llnl.gov.
[3] S. Azevedo, Automated experimental data analysis at the National Ignition Facility, in: ICALEPCS 2009 Conference Proceedings, Kobe, Japan, 2009.
[4] R.W. Carey, P.A. Adams, S.G. Azevedo, R.G. Beeler, C.B. Foxworthy, T.M. Frazier, et al., The National Ignition Facility data repository, in: ICALEPCS 2009 Conference Proceedings, Kobe, Japan, 2009.
[5] Java Agent Development Framework Web site, http://jade.tilab.com/.
[6] The WebDAV Resources Web site, http://www.webdav.org/.
[7] Wikipedia, List of file formats, http://en.wikipedia.org/wiki/List_of_file_formats.