xml for scientific computing


Upload: sierra

Post on 09-Jan-2016


Page 1: XML for Scientific Computing

XML for Scientific Computing

Several case studies for XML data in scientific computing

Page 2: XML for Scientific Computing

Overview

We will present case studies of the following systems:
• XSIL: Extensible Scientific Interchange Language
• XDMF: Extensible Data Model and Format
• Discipline-specific XML: ChemicalML
• Gateway Application Descriptors (plus Castor)

XML by itself is just markup, like HTML without a browser. Each of the above uses a related set of software to manipulate the XML data.

We present several examples of XML to give you an overview.

We conclude with some remarks about standards for science applications.

Page 3: XML for Scientific Computing

Overview of Case Studies

XSIL and XDMF are examples of representing (meta)data for scientific computing.
• They concentrate on data structures and data I/O.
• The meaning of the data is not described.

ChemicalML marks up domain-specific data.
• It meaningfully describes data content.

Gateway application data describes the science codes themselves.

All possess a data object model.
• Object-oriented data descriptions guide the markup tag definitions.

Page 4: XML for Scientific Computing

XSIL

XML tags for generic scientific data markup, with related Java software.

Page 5: XML for Scientific Computing

XSIL

Developed in support of several projects led by CACR.
• Examples: LIGO, Digital Sky
• Roy Williams, Caltech.

See http://www.cacr.caltech.edu/SDA/xsil/ for more information and free software.

XSIL was developed for the astronomical and gravitational wave communities, but it provides general-purpose tags.

It also comes with software for building Java applications that manipulate and display XSIL documents.

Page 6: XML for Scientific Computing

XSIL Tags

XSIL defines a small number of tags:
• XSIL: base container for the object model.
• Comment
• Param: an arbitrary name/value pair
• Time: describes time, plus format
• Table: data in columns and rows
• Array: table data with specific size
• URL
• Streams: for handling data

We'll now go over some of these in detail.

Page 7: XML for Scientific Computing

The XSIL Tag I

XSIL documents map to a document object model with associated handling code.

The root tag for XSIL is <XSIL>:

    <XSIL Name="Example" Type="Examples.MyExample">
    …
    </XSIL>

Type points to the Java code that should process this file.
• Here it is a class defined in MyExample.java in the package Examples.

Page 8: XML for Scientific Computing

The XSIL Tag II

XSIL tags can be nested if different parts of the XSIL document need to be handled by different codes.

    <XSIL Name="Example" Type="Examples.MyExample">
      …
      <XSIL Name="Subsection" Type="Examples.Subsection">
        …
      </XSIL>
    </XSIL>

XSIL tags thus are the base container in a generic object hierarchy.
• The MyExample object "has a" Subsection object.
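The container hierarchy described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: it does not use the XSIL library, it relies on the JDK's built-in DOM parser, and the class name XsilContainers and method handlerTypes are hypothetical names chosen for this example. It walks nested <XSIL> elements and lists the handler named by each Type attribute, indented by nesting depth.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the XSIL distribution.
class XsilContainers {

    // Returns one line per <XSIL> element: "Name -> Type", indented by depth.
    static List<String> handlerTypes(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<String> out = new ArrayList<>();
            collect(root, "", out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XSIL document", e);
        }
    }

    // Depth-first walk; only <XSIL> elements add a line and a nesting level.
    private static void collect(Element e, String indent, List<String> out) {
        String childIndent = indent;
        if (e.getTagName().equals("XSIL")) {
            out.add(indent + e.getAttribute("Name") + " -> " + e.getAttribute("Type"));
            childIndent = indent + "  ";
        }
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                collect((Element) n, childIndent, out);
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<XSIL Name=\"Example\" Type=\"Examples.MyExample\">"
                + "<XSIL Name=\"Subsection\" Type=\"Examples.Subsection\"/>"
                + "</XSIL>";
        handlerTypes(xml).forEach(System.out::println);
    }
}
```

A real XSIL application would hand each container to the class named by Type rather than just printing it; the sketch only shows how the "has a" relationship falls out of the nesting.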

Page 9: XML for Scientific Computing

More On Object Containers

Consider an electromagnetics example:
• A target is represented as a grid for finite difference integration of Maxwell's equations.
• The base input file contains one or more materials.
• Each material has specific EM properties.

If translated to XSIL, this could look like:

    <XSIL Name="EMRoot" Type="CEA.Root">
      <!-- Some general parameters -->
      <XSIL Name="EMMaterial" Type="CEA.Material">
        <!-- Some info describing the material. -->
      </XSIL>
    </XSIL>

Page 10: XML for Scientific Computing

Parameters

Each XSIL tag can contain one or more parameters.

Params are arbitrary name/value pairs.

Params optionally have units.

    <XSIL …>
      <Param Name="Color">Red</Param>
      <Param Name="Weight" Unit="kg">3.14</Param>
    </XSIL>
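As a rough illustration of how such name/value pairs might be read, the following sketch parses Param elements with the JDK's DOM parser and attaches the unit when a Unit attribute is present. It is not the XSIL library's own API; the class ParamReader and the method readParams are hypothetical names used only here.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the XSIL distribution.
class ParamReader {

    // Maps each Param name to "value", or "value unit" when Unit is given.
    // Note: getElementsByTagName also finds Params in nested XSIL containers.
    static Map<String, String> readParams(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Map<String, String> params = new LinkedHashMap<>();
            NodeList nodes = root.getElementsByTagName("Param");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String value = p.getTextContent().trim();
                String unit = p.getAttribute("Unit"); // empty string when absent
                params.put(p.getAttribute("Name"),
                        unit.isEmpty() ? value : value + " " + unit);
            }
            return params;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XSIL document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<XSIL>"
                + "<Param Name=\"Color\">Red</Param>"
                + "<Param Name=\"Weight\" Unit=\"kg\">3.14</Param>"
                + "</XSIL>";
        readParams(xml).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```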

Page 11: XML for Scientific Computing

Tables

Params associate one value per name.

Tables support multiple values.
• A Table row can have any number of values.

Each table contains column definitions followed by an arbitrary number of entries.

Tables get data from streams (discussed later).

Page 12: XML for Scientific Computing

Example Table

    <XSIL …>
      …
      <Table>
        <Column Name="Color" Type="string"/>
        <Column Name="Weight" Type="float" Unit="kg"/>
        <Column Name="Length" Type="float" Unit="meter"/>
        <Stream Type="Local" Delimiter=",">
          "Red",100.2,0.2
          "Green",21.7,1.2
        </Stream>
      </Table>
    </XSIL>
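To make the relationship between column definitions and stream rows concrete, here is a minimal sketch that converts one delimited row into typed values according to the column types. It is an assumption-laden illustration, not the XSIL library's parser: the class TableStream and its methods are hypothetical, only the string, int, and float types are handled, and the naive split does not cope with a delimiter inside a quoted string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the XSIL distribution.
class TableStream {

    // Splits a row on the delimiter and converts each field by column type.
    static List<Object> parseRow(String row, String delimiter, String[] columnTypes) {
        String[] fields = row.trim().split(Pattern.quote(delimiter));
        if (fields.length != columnTypes.length) {
            throw new IllegalArgumentException(
                    "expected " + columnTypes.length + " fields, got " + fields.length);
        }
        List<Object> values = new ArrayList<>();
        for (int i = 0; i < fields.length; i++) {
            String field = fields[i].trim();
            switch (columnTypes[i]) {
                case "string":
                    values.add(stripQuotes(field));
                    break;
                case "int":
                    values.add(Integer.parseInt(field));
                    break;
                case "float":
                    values.add(Float.parseFloat(field));
                    break;
                default:
                    throw new IllegalArgumentException("unsupported type: " + columnTypes[i]);
            }
        }
        return values;
    }

    // Removes one pair of surrounding double quotes, if present.
    static String stripQuotes(String s) {
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        String[] types = {"string", "float", "float"}; // Color, Weight, Length
        String[] rows = {"\"Red\",100.2,0.2", "\"Green\",21.7,1.2"};
        for (String row : rows) {
            System.out.println(parseRow(row, ",", types));
        }
    }
}
```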

Page 13: XML for Scientific Computing

XSIL Arrays

XSIL arrays are similar to Fortran and C arrays.

For mixed-type data, use Tables.

If all data is of the same type (integers, floats), use Arrays.

    <Array Type="int">
      <Dim Name="x-dim">2</Dim>
      <Dim Name="y-dim">2</Dim>
      <Stream Type="Local" Delimiter=",">
        137,42
        8,13
      </Stream>
    </Array>

Page 14: XML for Scientific Computing

XSIL Streams

XSIL Streams can be used to load data.

Data sources can be
• In the file itself (as shown in previous examples).
• From files on disk.
• From URLs (http://, ftp://, and file:// supported).

Loading data from disk:

    <Stream Type="Remote" Encoding="Littleendian">
      /home/user1/data/datafile.dat
    </Stream>

Loading data from URLs:

    <Stream Type="Remote">
      http://my.server.edu/XSILdata/datafile.dat
    </Stream>

Page 15: XML for Scientific Computing

Ex: Use XSIL to describe input data

    <XSIL Name="InputData" Type="Examples.InDataHandler">
      <XSIL Name="Target 1" Type="Examples.Target">
        <Param Name="Target">Scud</Param>
        <Param Name="dx">0.1</Param>
        <Array>
          <Dim Name="X-Dimension">100</Dim>
          <Dim Name="Y-Dimension">100</Dim>
          <Stream Type="Remote">
            /home/mpierce/data/mydata.dat
          </Stream>
        </Array>
      </XSIL>
      <XSIL Name="Target 2" Type="Examples.Target">
        <!-- Another target -->
      </XSIL>
    </XSIL>

Page 16: XML for Scientific Computing

Table and Array Types

Table and Array data can have the following types (sizes in bits):
• boolean (1)
• byte (8)
• short (16)
• int (32)
• long (64)
• float (32)
• double (64)
• floatComplex (64)
• doubleComplex (128)
• string (arbitrary length)

Page 17: XML for Scientific Computing

Using XSIL

The previous example just marks up data.

XSIL also comes with Java bindings that
• Read the file and parse it.
• Extract parameter values, units, etc.
• Read in and manipulate tables and arrays.

Central ideas:
• Each XSIL tag corresponds to a Java class.
• XSIL's Type points to your custom driver code that uses the XSIL classes.

Page 18: XML for Scientific Computing

XSIL Coding Example

Consider the following small XSIL example:

    <XSIL Type="Examples.MyExample">
      <Param Name="x0">12.0</Param>
      <Param Name="dx">0.1</Param>
    </XSIL>

Page 19: XML for Scientific Computing

XSIL Java Code Example

    package extensions.Examples;

    // Wildcard import assumed here: the code below uses both XSIL and Param.
    import org.escience.XSIL.*;

    public class MyExample {
        String x0, dx;
        XSIL root;

        public MyExample(String xsilFileName) {
            // The XSIL constructor reads and parses the named file.
            root = new XSIL(xsilFileName);
        }

        public void construct() {
            // Walk the children of the root and pick out the two parameters.
            for (int i = 0; i < root.getChildCount(); i++) {
                XSIL x = root.getChild(i);
                if (x instanceof Param) {
                    Param p = (Param) x;
                    if (p.getName().equals("x0")) x0 = p.getText();
                    if (p.getName().equals("dx")) dx = p.getText();
                }
            }
        }
    }

Page 20: XML for Scientific Computing

Code Notes

All classes (Param, Table, etc.) extend the XSIL class.

Pass the path of the XSIL file to the root XSIL object through its constructor.
• XSIL handles all parsing.

The XSIL class defines the getChildCount() and getChild() methods.

The Param class defines the getName() and getText() methods.

Page 21: XML for Scientific Computing

XSIL Summary

Defines a small set of general-purpose tags for scientific data.

Data itself is not directly marked up.
• It is read in through streams.

XSIL software maps Java classes to XSIL tags.
• Convenient for working with XSIL documents.
• DOM classes are much more cumbersome to use.

Page 22: XML for Scientific Computing

XDMF

A data model geared toward finite element codes, with associated software in C++, Java, and Tcl

Page 23: XML for Scientific Computing

ICE XDMF

ICE (Interdisciplinary Computing Environment) is a comprehensive project at ARL MSRC that attempts to provide a common software platform for DoD scientific codes.
• Jerry Clarke, lead developer

XDMF (Extensible Data Model and Format) provides a common data format for several different codes.
• Primary focus: finite element codes for fluid dynamics and structural mechanics.
• XDMF and related software provide the backbone for loosely coupling applications and visualization.

Page 24: XML for Scientific Computing

XDMF Design

XDMF divides data into "light" and "heavy" types.

Light data, or metadata, is formatted in XML and will be described in more depth.

Heavy data is in HDF5 and is not presented here.

Page 25: XML for Scientific Computing

XDMF Basic Concepts

The basic XDMF tags are <DataStructure> and <DataTransform>.

<DataStructure> defines the actual data.

<DataTransform> defines the area of interest (AOI) in the data.
• The AOI is defined by coordinates, a function, or a hyperslab.

<DataTransform> contains one or more <DataStructure> elements.
• The transform defines how the data structure will be filtered.

Page 26: XML for Scientific Computing

Simple Data Structure

The example below is for 655 XYZ values in the indicated HDF5 file.

    <DataStructure Name="Some XYZ Data"
        Type="Float"
        Dimensions="655 3">
      MyData.h5:/MyXYZdata
    </DataStructure>

Simple character data can also be included directly in the XML document.

Page 27: XML for Scientific Computing

Data Structure for Mesh Connections and Pressures

    <DataStructure
        Name="Connections"
        Type="Int"
        Precision="8"
        Dimensions="100 8">
      MyData.h5:/MyConns
    </DataStructure>

    <DataStructure
        Name="Pressure"
        Type="Float"
        Precision="8"
        Dimensions="100">
      MyData.h5:/MyPressure
    </DataStructure>

Page 28: XML for Scientific Computing

Data Structure Attribute Summary

    <DataStructure
        Name="Any name"                        Some meaningful name to the owner
        Rank="NumberOfDimensions"              Redundant information
        Dimensions="Kdim Jdim Idim"            The slowest varying dimension is listed first
        Type="Char | Float | Int | Compound"   Default is Float
        Precision="BytesPerElement"            Default is 4
        Format="XML | HDF"                     Default is XML
    >
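The Dimensions and Precision attributes are enough to work out how much heavy data a DataStructure refers to and where a given element sits. The sketch below shows that arithmetic. It is not part of the XDMF API: the class XdmfLayout and its methods are hypothetical, and flatOffset assumes that "slowest varying dimension first" means C-style (row-major) ordering, where the last dimension varies fastest.

```java
// Hypothetical helper, not part of the XDMF or ICE software.
class XdmfLayout {

    // Parses a Dimensions attribute such as "655 3" into {655, 3}.
    static int[] parseDimensions(String dimensions) {
        String[] parts = dimensions.trim().split("\\s+");
        int[] dims = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            dims[i] = Integer.parseInt(parts[i]);
        }
        return dims;
    }

    // Total number of elements: the product of all dimensions.
    static long elementCount(int[] dims) {
        long n = 1;
        for (int d : dims) {
            n *= d;
        }
        return n;
    }

    // Size in bytes, with Precision given as bytes per element.
    static long byteSize(int[] dims, int precision) {
        return elementCount(dims) * precision;
    }

    // Position of an element in the flattened array, assuming the first
    // dimension varies slowest and the last varies fastest (row-major).
    static long flatOffset(int[] dims, int[] index) {
        long offset = 0;
        for (int d = 0; d < dims.length; d++) {
            offset = offset * dims[d] + index[d];
        }
        return offset;
    }

    public static void main(String[] args) {
        int[] conns = parseDimensions("100 8"); // Connections, Precision="8"
        int[] xyz = parseDimensions("655 3");   // XYZ data, default Precision of 4
        System.out.println("Connections: " + byteSize(conns, 8) + " bytes");
        System.out.println("XYZ data:    " + byteSize(xyz, 4) + " bytes");
        System.out.println("Offset of point 10, component 2: "
                + flatOffset(xyz, new int[] {10, 2}));
    }
}
```

With the values from the earlier slides, the Connections array works out to 100 x 8 x 8 = 6400 bytes and the XYZ array to 655 x 3 x 4 = 7860 bytes.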

Page 29: XML for Scientific Computing

XDMF Array Types

XDMF array entries can have these types:
• Integer
• Float
• Char

All are 4 bytes by default and can be increased to 8 bytes.

Page 30: XML for Scientific Computing

DataTransform

DataTransform defines a way for the raw data to be filtered.
• It gives a certain Area of Interest in the data set.

Possible transforms:
• Coordinate: Select a particular area.
• Function: Define a simple algorithm for selecting the area.
• Hyperslab: Define start, stride, and count for each dimension of an array.

Page 31: XML for Scientific Computing

Hyperslab Transform Example

The following markup instructs the processing code to apply a hyperslab transform to a 4-D array.

The first data structure defines the hyperslab:
• 0 0 0 0 are the starting points for each dimension.
• 2 2 2 1 are the strides for each dimension.
• 25 50 75 3 are the counts for each dimension.

The second data structure gives the raw data, a 100x200x300x3 array in the noted HDF5 file.

The transform will produce a 25x50x75x3 region that takes every other plane along the first three dimensions, covering indices [0,0,0,0] through [48,98,148,2] of the original data.
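The start, stride, and count values described above determine exactly which indices are selected along each dimension. The following sketch works through that arithmetic for the 100x200x300x3 example. It is a plain illustration of hyperslab selection, not code from the XDMF or HDF5 libraries; the class Hyperslab and its methods are hypothetical names.

```java
// Hypothetical helper illustrating hyperslab index arithmetic.
class Hyperslab {

    // Index of the last selected element along one dimension.
    static int lastIndex(int start, int stride, int count) {
        return start + (count - 1) * stride;
    }

    // All selected indices along one dimension of the given size.
    static int[] selectedIndices(int start, int stride, int count, int dimSize) {
        if (start < 0 || stride < 1 || count < 1
                || lastIndex(start, stride, count) >= dimSize) {
            throw new IllegalArgumentException(
                    "hyperslab does not fit in a dimension of size " + dimSize);
        }
        int[] indices = new int[count];
        for (int i = 0; i < count; i++) {
            indices[i] = start + i * stride;
        }
        return indices;
    }

    public static void main(String[] args) {
        int[] starts = {0, 0, 0, 0};
        int[] strides = {2, 2, 2, 1};
        int[] counts = {25, 50, 75, 3};
        int[] dims = {100, 200, 300, 3};

        long total = 1;
        for (int d = 0; d < dims.length; d++) {
            int[] idx = selectedIndices(starts[d], strides[d], counts[d], dims[d]);
            System.out.println("dim " + d + ": first=" + idx[0]
                    + " last=" + idx[idx.length - 1] + " count=" + idx.length);
            total *= counts[d];
        }
        System.out.println("elements selected: " + total);
    }
}
```

For these values the last selected indices are 48, 98, 148, and 2, and the selection holds 25 x 50 x 75 x 3 = 281250 elements.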

Page 32: XML for Scientific Computing

Hyperslab Transform Example

    <DataTransform
        Dimensions="25 50 75 3"
        Type="HyperSlab">
      <DataStructure
          Dimensions="3 4"
          Format="XML">
        0 0 0 0
        2 2 2 1
        25 50 75 3
      </DataStructure>
      <DataStructure
          Name="Points"
          Dimensions="100 200 300 3"
          Format="HDF">
        MyData.h5:/XYZ
      </DataStructure>
    </DataTransform>

Page 33: XML for Scientific Computing

Data Organization

DataStructures and DataTransforms constitute XDMF's data representation.

XDMF Domain tags are used as arbitrary containers.

Domains contain grids; grids contain topologies, geometries, and attributes, as well as data structures.

Attributes include scalars, vectors, and tensors.

Page 34: XML for Scientific Computing

An XDMF Example

    <Domain Name="Example #1">
      <Grid Name="My Hex Grid with Pressure">
        <Topology Type="Hexahedron"
            Dimensions="100"
            Order="7 6 5 4 3 2 1 0">
          <DataStructure
              Name="Connections"
              Type="Int"
              Precision="8"
              Dimensions="100 8">
            MyData.h5:/MyConns
          </DataStructure>
        </Topology>
        <Geometry Type="XYZ">
          <DataStructure Name="XYZ Data"
              Type="Float"
              Dimensions="655 3">
            MyData.h5:/MyXYZdata
          </DataStructure>
        </Geometry>
        <Attribute Type="Scalar" Center="Cell">
          <DataStructure
              Name="Pressure"
              Type="Float"
              Precision="8"
              Dimensions="100">
            MyData.h5:/MyPressure
          </DataStructure>
        </Attribute>
      </Grid>
    </Domain>

Page 35: XML for Scientific Computing

Review of Example

Recall that XDMF is primarily for structured and unstructured finite element grids.
• Input data includes grid connectivity info, grid geometry, and pressure values.

The Domain contains a Grid.

The Grid is defined by Topology, Geometry, and Attributes.

Topology, Attributes, and Geometry contain data sources and structure info.

Page 36: XML for Scientific Computing

XDMF API

Like XSIL, XDMF treats the XML markup as a set of instructions to be processed by actual programs.

XDMF defines an API of document processing engines.
• The core is in C++.
• ICE also provides Java and Tcl APIs through wrappers around the core.

See http://www.arl.hpc.mil/ice/Examples/CodeIntegration/DemoIceRt.cxx for a code example.

Page 37: XML for Scientific Computing

XDMF Summary

Provides a few general-purpose tags.

Again, data is not directly marked up.
• It is stored in HDF5.

XDMF is handled programmatically with APIs in C++, Java, and Tcl.

More information:
• http://www.arl.hpc.mil/ice/

Page 38: XML for Scientific Computing

Comparison of XSIL and XDMF

XSIL
• Larger tag set
• Java API
• Can read data that is in the document, on disk, or from a URL
• Questionable performance and memory efficiency for very large data sets
• Free and open source

XDMF
• Uses HDF5 for large data sets
• C++, Java, and Tcl APIs
• Defines both data structures and transform instructions
• Supports arrays, but not mixed data types (such as XSIL Tables)
• Integrated with ICE

Page 39: XML for Scientific Computing

Chemical Markup Language

A domain-specific XML markup language.

Page 40: XML for Scientific Computing

CML Introduction

XSIL and XDMF use XML to describe code input files and give simple processing instructions.

Their tags describe data structure, not content.

We now examine a domain-specific example, the Chemical Markup Language (CML).

Other domain markup languages:
• Mathematical Markup Language (MathML)
• Geography Markup Language (GML)

Page 41: XML for Scientific Computing

XML for Chemistry

Goal: provide a common chemical data format that is an open, universal standard.
• Data representation is platform independent.
• Support structured searches of data banks.
• Provide a common format for software (particularly visualization).
• Support multidisciplinary data formats (biology, math) through XML namespaces.
• Provide a data object hierarchy suitable for object-oriented programming.

Page 42: XML for Scientific Computing

CML Structure

Chemistry lends itself to an object container structure.
• Atoms have protons, neutrons, and electrons.
• Molecules have atoms.
• Complex molecules and compounds are composed of molecules and molecular pieces (benzene rings, for example).

CML defines these as data objects with property fields.

Page 43: XML for Scientific Computing

A Simple Example: Glycine

    <molecule convention="MDLMol" id="glycine" title="GLYCINE">
      <date day="22" month="11" year="1995">
      </date>
      <atomArray>
        <atom id="a1">
          <string builtin="elementType">C</string>
          <float builtin="x2">0.6424</float>
          <float builtin="y2">0.4781</float>
        </atom>
        …
      </atomArray>
      <bondArray>
        <bond id="b1">
          <string builtin="atomRef">a1</string>
          <string builtin="atomRef">a2</string>
          <string builtin="order">1</string>
        </bond>
        …
      </bondArray>
    </molecule>
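Because the markup names atoms and bonds explicitly, a program can pull out chemical content without knowing anything about file layout. The sketch below summarizes atoms and bonds from a fragment written in the style shown above. It uses only the JDK's DOM parser, not any CML software; the class CmlSummary is hypothetical, and the second atom (a2, element N) in the sample input is a placeholder invented for the demonstration, since the slide elides everything after the first atom.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any CML toolkit.
class CmlSummary {

    // Text of every direct child whose builtin attribute equals the given name.
    static List<String> builtins(Element parent, String name) {
        List<String> values = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(((Element) n).getAttribute("builtin"))) {
                values.add(n.getTextContent().trim());
            }
        }
        return values;
    }

    // One line per atom ("atom a1: C") and per bond ("bond b1: a1-a2 order 1").
    static List<String> summarize(String xml) {
        try {
            Element molecule = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<String> lines = new ArrayList<>();
            NodeList atoms = molecule.getElementsByTagName("atom");
            for (int i = 0; i < atoms.getLength(); i++) {
                Element atom = (Element) atoms.item(i);
                lines.add("atom " + atom.getAttribute("id") + ": "
                        + String.join(",", builtins(atom, "elementType")));
            }
            NodeList bonds = molecule.getElementsByTagName("bond");
            for (int i = 0; i < bonds.getLength(); i++) {
                Element bond = (Element) bonds.item(i);
                lines.add("bond " + bond.getAttribute("id") + ": "
                        + String.join("-", builtins(bond, "atomRef"))
                        + " order " + String.join(",", builtins(bond, "order")));
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse CML document", e);
        }
    }

    public static void main(String[] args) {
        // Atom a2 is a made-up placeholder; the original slide elides it.
        String xml = "<molecule id=\"glycine\"><atomArray>"
                + "<atom id=\"a1\"><string builtin=\"elementType\">C</string></atom>"
                + "<atom id=\"a2\"><string builtin=\"elementType\">N</string></atom>"
                + "</atomArray><bondArray><bond id=\"b1\">"
                + "<string builtin=\"atomRef\">a1</string>"
                + "<string builtin=\"atomRef\">a2</string>"
                + "<string builtin=\"order\">1</string>"
                + "</bond></bondArray></molecule>";
        summarize(xml).forEach(System.out::println);
    }
}
```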

Page 44: XML for Scientific Computing

CML Example Software

Page 45: XML for Scientific Computing

Previous Slide

Browser tool, Jumbo-3.0
• User can display dozens of CML'd molecules.
• Molecules can be rotated in the display.
• Display is rendered in SVG (Adobe plugin).
• The molecule displayed is cholesterol. They also have glycine in the database, but it is not as exciting to look at.

Page 46: XML for Scientific Computing

Gateway Application Descriptors

Describing scientific applications themselves with XML and mapping to Java with Castor.

Page 47: XML for Scientific Computing

Gateway Application Descriptors

Gateway is a computational web portal for securely submitting and monitoring jobs, transferring files, and archiving information.

Gateway describes scientific applications and host computers with XML metadata.

This is used to provide general-purpose tools that can be used to build portals for specific applications.

Page 48: XML for Scientific Computing

Application Descriptors

Gateway describes scientific applications and host machines in XML.

This is used to generate the HTML forms that collect the information needed to create batch queuing scripts and submit jobs.

The general object container scheme is
• Portals contain applications.
• Applications contain hosts.
• Each also has a set of descriptive parameters.

Page 49: XML for Scientific Computing

Example: ANSYS on Grids

    <Application>
      <ApplicationName>ANSYS</ApplicationName>
      <Version>5.0</Version>
      <Parameter Name="IOStyle">
        <Value>StandardIO</Value>
      </Parameter>
      <Parameter Name="NumberOfInFiles">
        <Value>1</Value>
      </Parameter>
      <Host>
        <HostName>grids.ucs.indiana.edu</HostName>
        <HostIP>156.56.103.5</HostIP>
        <RemoteCopy>rcp</RemoteCopy>
        <RemoteExec>rsh</RemoteExec>
        <WorkDir>/tmp</WorkDir>
        <QueueType>CSH</QueueType>
        <QsubPath>/usr/bin/csh</QsubPath>
        <ExecPath>echo</ExecPath>
      </Host>
    </Application>
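The descriptor above could drive form generation roughly as follows. This is only a sketch: it uses Python's standard library for illustration (Gateway's own bindings were Java classes), and the function name `descriptor_to_form` and the form layout are assumptions, not Gateway's actual output.

```python
import xml.etree.ElementTree as ET
from html import escape

# Abbreviated copy of the ANSYS descriptor from the slide.
DESCRIPTOR = """
<Application>
  <ApplicationName>ANSYS</ApplicationName>
  <Version>5.0</Version>
  <Parameter Name="IOStyle"><Value>StandardIO</Value></Parameter>
  <Parameter Name="NumberOfInFiles"><Value>1</Value></Parameter>
  <Host>
    <HostName>grids.ucs.indiana.edu</HostName>
    <WorkDir>/tmp</WorkDir>
    <QueueType>CSH</QueueType>
  </Host>
</Application>
"""

def descriptor_to_form(xml_text):
    """Build a simple HTML form from an application descriptor (hypothetical layout)."""
    app = ET.fromstring(xml_text)
    name = app.findtext("ApplicationName", default="").strip()
    lines = [f"<form><h2>{escape(name)}</h2>"]
    # One text input per descriptive parameter, pre-filled with its value.
    for param in app.findall("Parameter"):
        pname = escape(param.get("Name", ""))
        value = escape(param.findtext("Value", default="").strip())
        lines.append(f'<label>{pname} <input name="{pname}" value="{value}"></label>')
    # One drop-down entry per host the application can run on.
    lines.append('<select name="host">')
    for host in app.findall("Host"):
        hostname = escape(host.findtext("HostName", default="").strip())
        lines.append(f"<option>{hostname}</option>")
    lines.append("</select></form>")
    return "\n".join(lines)

form_html = descriptor_to_form(DESCRIPTOR)
print(form_html)
```

The point of the design is that adding a new application or host means editing XML, not form code.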

Page 50: XML for Scientific Computing

Java Data Object Bindings

As with the other examples, the descriptor does not do anything by itself.

We must provide language bindings to make it useful in programs.

We used Castor (http://castor.exolab.org) to generate classes for us.

Page 51: XML for Scientific Computing

Castor for Data Object Creation

There is a direct mapping between the Application tag and a Java object, for example.

Each object has the necessary getter and setter methods for manipulating data.

After generating classes from the XML schema (once), load an XML file into the program to create particular data object instances (unmarshalling).

When the program is done, modified data objects can be marshalled back into XML file format.

We still have to write the Java code for specific uses, utility classes, and so on.
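The unmarshal, modify, marshal cycle described above can be sketched as follows. This is not Castor's API: it is a hand-written Python stand-in that plays the role of the generated classes, covering only a few descriptor fields. The class names, the helper functions `unmarshal` and `marshal`, and the `/scratch` directory are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class Host:
    host_name: str = ""
    work_dir: str = ""

@dataclass
class Application:
    application_name: str = ""
    version: str = ""
    hosts: list = field(default_factory=list)

def unmarshal(xml_text):
    """Create data objects from an XML descriptor."""
    root = ET.fromstring(xml_text)
    app = Application(
        application_name=root.findtext("ApplicationName", default="").strip(),
        version=root.findtext("Version", default="").strip(),
    )
    for h in root.findall("Host"):
        app.hosts.append(Host(
            host_name=h.findtext("HostName", default="").strip(),
            work_dir=h.findtext("WorkDir", default="").strip(),
        ))
    return app

def marshal(app):
    """Write data objects back out as XML text."""
    root = ET.Element("Application")
    ET.SubElement(root, "ApplicationName").text = app.application_name
    ET.SubElement(root, "Version").text = app.version
    for host in app.hosts:
        h = ET.SubElement(root, "Host")
        ET.SubElement(h, "HostName").text = host.host_name
        ET.SubElement(h, "WorkDir").text = host.work_dir
    return ET.tostring(root, encoding="unicode")

source = ("<Application><ApplicationName>ANSYS</ApplicationName>"
          "<Version>5.0</Version>"
          "<Host><HostName>grids.ucs.indiana.edu</HostName>"
          "<WorkDir>/tmp</WorkDir></Host></Application>")

app = unmarshal(source)            # XML -> objects
app.hosts[0].work_dir = "/scratch" # modify through ordinary attributes
result = marshal(app)              # objects -> XML
print(result)
```

A binding generator such as Castor removes the need to write the two mapping functions by hand; the application-specific logic still has to be written separately.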

Page 52: XML for Scientific Computing

Other markup languages and some comparison

Various shortcomings of programming and markup languages

Page 53: XML for Scientific Computing

XML Schema

XML Schema defines many built-in types
• binary (hexBinary, base64Binary), boolean, byte, decimal, double, float, int, long, short, string
• And many more

Does not define standards for
• Arrays
• Complex (real+imaginary) numbers

Page 54: XML for Scientific Computing

SOAP

Known as an XML Remote Procedure Call protocol.
• RPC is only one part of SOAP

Also defines encoding rules for data exchange.

SOAP inherits all XML Schema built-in types (see previous slide).

Defines additional compound types
• Struct: arbitrary collection of types (say, strings and floats), similar to an XSIL table entry.
• Array: can contain primitive and compound types. An array can be built out of arrays.
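As a rough illustration of the array encoding, the sketch below builds a SOAP 1.1 style encoded array of integers with Python's standard library. The helper name `soap_int_array`, the child element name `item`, and the choice of the 2001 XML Schema namespace for the `xsd` prefix are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"
XSD = "http://www.w3.org/2001/XMLSchema"  # assumed schema namespace version
ET.register_namespace("SOAP-ENC", SOAP_ENC)

def soap_int_array(values):
    """Return an Array element whose arrayType records item type and length."""
    arr = ET.Element(f"{{{SOAP_ENC}}}Array")
    # The xsd prefix appears only inside an attribute value, so it has to be
    # declared by hand; ElementTree cannot infer it.
    arr.set("xmlns:xsd", XSD)
    arr.set(f"{{{SOAP_ENC}}}arrayType", f"xsd:int[{len(values)}]")
    for v in values:
        # Members are identified by position, so one hypothetical name is reused.
        ET.SubElement(arr, "item").text = str(v)
    return arr

array_xml = ET.tostring(soap_int_array([3, 4, 5]), encoding="unicode")
print(array_xml)

# Reading it back: the size and element type travel with the data.
parsed = ET.fromstring(array_xml)
array_type = parsed.get(f"{{{SOAP_ENC}}}arrayType")
items = [int(e.text) for e in parsed]
```

The `arrayType` attribute is what a plain XML Schema built-in type lacks: an explicit statement of element type and array length.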

Page 55: XML for Scientific Computing

HDF5 and XML

Types include
• Integers: 2-64 bit, signed or unsigned, big or little endian
• Floats (32, 64 bit, BE or LE)
• Strings
• Arrays

Arbitrary compound types

See http://hdf.ncsa.uiuc.edu/HDF5/XML/
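Why a format needs to record byte order and signedness can be seen in a few lines. This is a generic sketch using Python's standard `struct` module, not HDF5 code; the sample values are arbitrary.

```python
import struct

value = 1025  # 0x00000401

# The same 32-bit signed integer, serialized in both byte orders.
big = struct.pack(">i", value)
little = struct.pack("<i", value)
print(big.hex(), little.hex())

# A reader that assumes the wrong byte order silently gets a different number.
misread = struct.unpack("<i", big)[0]
print(misread)

# Signedness matters in the same way: identical bytes, different values.
raw = struct.pack(">i", -1)
as_signed = struct.unpack(">i", raw)[0]
as_unsigned = struct.unpack(">I", raw)[0]
print(as_signed, as_unsigned)
```

Nothing in the raw bytes says which reading is correct, which is why the type description has to carry size, sign, and byte order explicitly.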

Page 56: XML for Scientific Computing

Compatibility and Missing Features

No standard XML definitions for arrays and “compound types” like XSIL tables.
• We have several definitions: SOAP, XSIL, XDMF, XML-HDF5

Lack of built-in support for complex (real + imaginary) types
• XML, XML-HDF5, and XDMF can easily define complex types, but not in a standard way.
• Java does not have a built-in complex type, either.
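To show how easy, and how arbitrary, a home-grown complex encoding is, here is one possible round trip. The element and attribute names (`ComplexArray`, `Complex`, `re`, `im`, `length`) are invented for this sketch and belong to no standard; another project could just as reasonably pick a different layout, which is exactly the compatibility problem.

```python
import xml.etree.ElementTree as ET

def complex_array_to_xml(values):
    """Encode complex numbers as <Complex re=".." im=".."/> children (ad hoc layout)."""
    arr = ET.Element("ComplexArray", {"length": str(len(values))})
    for z in values:
        # repr() of a float round-trips exactly through float().
        ET.SubElement(arr, "Complex", {"re": repr(z.real), "im": repr(z.imag)})
    return ET.tostring(arr, encoding="unicode")

def xml_to_complex_array(text):
    """Decode the ad hoc layout back into Python complex numbers."""
    arr = ET.fromstring(text)
    return [complex(float(c.get("re")), float(c.get("im")))
            for c in arr.findall("Complex")]

data = [1 + 2j, 0.5 - 1.5j, 3.25 + 0j]
encoded = complex_array_to_xml(data)
decoded = xml_to_complex_array(encoded)
print(encoded)
```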

Page 57: XML for Scientific Computing

More Missing Features

Varying support for integers and floats with different sizes.
• C/C++ does not guarantee consistent bit size.

Binary data must specify Big Endian/Little Endian encoding for cross platform compatibility.
• XML-HDF5, XSIL, XDMF all do this
• XML does not

XSIL does not have signed/unsigned