Post on 28-Mar-2015

Introduction to the BinX Library

eDIKT project team

Ted Wen tedwen@nesc.ac.uk

Robert Carroll robertc@nesc.ac.uk

Agenda

About the BinX project
A brief introduction to the BinX language
Introduction to the BinX library
Advanced API to the BinX library
Use cases and requirements
  Dr Bob Mann
  Dr Chris Maynard
Discussion

About the BinX project

The problem

XML is useful to represent metadata
Scientific datasets can be too large in XML
Most scientific data are in binary files
Binary data files are not all standardized
Binary data files are platform-dependent

BinX: a solution

Initially designed for the Grid environment
Annotate data schema for any binary file
Data elements are marked up in XML
Describe three levels of features in a binary file
  Underlying physical representation (byte order)
  Primitive data types (integer, float)
  Structure of the dataset (array, table)
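To make the three levels concrete, here is a small Python sketch. It is not BinX itself and is not part of the original slides; it only shows the same four bytes read under different assumptions. The byte values are invented for illustration.

```python
import struct

raw = bytes([0x00, 0x00, 0x01, 0x00])  # four bytes as they might sit in a file

# Level 1: physical representation (byte order).
as_big = struct.unpack(">i", raw)[0]     # read as a big-endian 32-bit integer
as_little = struct.unpack("<i", raw)[0]  # read as a little-endian 32-bit integer

# Level 2: primitive data type. The same bytes read as 16-bit integers.
# Level 3: structure. Here they form an array of two elements.
as_shorts = struct.unpack(">2h", raw)

print(as_big, as_little, as_shorts)
```

Without a description of all three levels, a reader cannot tell which of these interpretations the file's author intended.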

The BinX project at eDIKT

Implementing a software library for BinX
Develop a series of tools based on the library
Choose C++ for performance
Write portable code for different platforms
Robust and easy to use

Development status

Requirement gathering from July 2002
Development started in October 2002
Prototype finished in December 2002
Alpha version complete in April 2003
Beta version to be released in June 2003

The deliverables

The BinX library
  Compiled code on different platforms
  Source code with Open Source license
Documentation
  User's guide
  Developer's guide
Utilities and examples

The BinX Language

What is BinX?

The Binary XML Description Language
A language for annotating binary data files
It describes data types, data structures and attributes such as byte order
A BinX document is an XML file with metadata of a binary data file

A BinX document

<dataset byteOrder="bigEndian">
  <definitions>
    <defineType typeName="myTyp">
      <arrayFixed>
        <character-8/>
        <dim indexTo="9"/>
      </arrayFixed>
    </defineType>
  </definitions>
  <file src="myfile.bin">
    <useType typeName="myTyp"/>
    <integer-32 varName="X" />
  </file>
</dataset>

Callouts on the slide:
  Root element (<dataset>)
  Data class section (<definitions>)
  Data instance section (<file>)
  Abstract data type (<defineType>)
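A rough sketch of how a reader might interpret the example document above, using only Python's standard library rather than the BinX library (whose API is not shown in these slides). It assumes that array indices run from 0 to indexTo, so indexTo="9" means ten characters. The sample bytes stand in for myfile.bin and are invented.

```python
import struct
import xml.etree.ElementTree as ET

SCHEMA = """
<dataset byteOrder="bigEndian">
  <definitions>
    <defineType typeName="myTyp">
      <arrayFixed>
        <character-8/>
        <dim indexTo="9"/>
      </arrayFixed>
    </defineType>
  </definitions>
  <file src="myfile.bin">
    <useType typeName="myTyp"/>
    <integer-32 varName="X"/>
  </file>
</dataset>
"""

root = ET.fromstring(SCHEMA)
order = ">" if root.get("byteOrder") == "bigEndian" else "<"

# Data class section: look up the named type and its single dimension.
my_typ = root.find("definitions/defineType[@typeName='myTyp']")
dim = my_typ.find("arrayFixed/dim")
count = int(dim.get("indexTo")) + 1  # assumption: indices run 0..indexTo

# Data instance section: one myTyp (count characters), then a 32-bit integer.
fmt = f"{order}{count}si"

# Invented stand-in for the contents of myfile.bin.
sample = b"HelloBinX!" + struct.pack(">i", 42)
name, x = struct.unpack(fmt, sample)
print(name, x)
```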

Data elements

Primitive data elements
  Byte, character, integer, real
Complex data elements
  Arrays, struct, union
User-defined data elements

Primitive data types

Bit
  <bit-1>
Character
  <character-8>
  <unicodeCharacter-16>
  <unicodeCharacter-32>
Integer
  <byte-8>
  <short-16>, <unsignedShort-16>
  <integer-32>, <unsignedInteger-32>
  <longInteger-64>, <unsignedLongInteger-64>
Real
  <ieeeFloat-32>
  <ieeeDouble-64>
  <ieeeQuadruple-128>
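As an illustration of how the type names above encode their width, the following Python sketch maps some of them onto format codes of the standard `struct` module. The mapping is an assumption made for this example, not something BinX defines: in particular it treats byte-8 as unsigned, and it leaves out bit-1, the Unicode characters and ieeeQuadruple-128 because `struct` has no direct equivalent.

```python
import struct

# Hypothetical mapping from BinX type names to struct format codes.
BINX_TO_STRUCT = {
    "character-8": "c",
    "byte-8": "B",  # assumption: unsigned
    "short-16": "h",
    "unsignedShort-16": "H",
    "integer-32": "i",
    "unsignedInteger-32": "I",
    "longInteger-64": "q",
    "unsignedLongInteger-64": "Q",
    "ieeeFloat-32": "f",
    "ieeeDouble-64": "d",
}


def size_in_bytes(binx_name, order=">"):
    """Size of one element, using struct's standard (unpadded) sizes."""
    return struct.calcsize(order + BINX_TO_STRUCT[binx_name])


def declared_bits(binx_name):
    """The number after the hyphen in a BinX type name."""
    return int(binx_name.rsplit("-", 1)[1])


for type_name in BINX_TO_STRUCT:
    print(type_name, size_in_bytes(type_name), "bytes")
```

An explicit byte-order prefix (">" or "<") makes `struct` use fixed standard sizes, so the result does not depend on the platform running the script.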

Complex data types

Arrays
  Repetitive collection of any data element
  Multidimensional
  Three types of arrays
    Fixed-length array
    Variable-length array
    Streamed array
Struct
  A sequence of data elements
Union
  One of a group of possible data elements, conditional on the discriminant

Arrays

Fixed-length array

<arrayFixed>
  <ieeeDouble-64/>
  <dim indexTo="3" name="X" />
  <dim indexTo="4" name="Y" />
  <dim indexTo="5" name="Z" />
</arrayFixed>

Variable-length array

<arrayVariable sizeRef="byte-8">
  <ieeeFloat-32 />
  <dim indexTo="7"/>
  <dimVariable/>
</arrayVariable>

Streamed array

<arrayStreamed>
  <byte-8/>
  <dimStreamed/>
</arrayStreamed>
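The three array kinds differ in how a reader learns the element count. The Python sketch below illustrates this under simplifying assumptions that are not taken from the slides: indices run from 0 to indexTo, the variable-length case is reduced to a single dimension whose length is stored in a leading byte, and a streamed array simply runs to the end of the input. The function names are hypothetical.

```python
import io
import math
import struct


def fixed_count(index_to_values):
    """Element count of a fixed array, assuming indices run 0..indexTo."""
    return math.prod(i + 1 for i in index_to_values)


def read_counted_floats(stream, order=">"):
    """Variable length: read a one-byte count, then that many 32-bit floats."""
    (count,) = struct.unpack("B", stream.read(1))
    return list(struct.unpack(f"{order}{count}f", stream.read(4 * count)))


def read_streamed_bytes(stream):
    """Streamed: keep reading byte elements until the input is exhausted."""
    return list(stream.read())


# Invented sample: a count of 3, three floats, then three trailing bytes.
data = bytes([3]) + struct.pack(">3f", 1.0, 2.5, -4.0) + b"\x01\x02\x03"
stream = io.BytesIO(data)

floats = read_counted_floats(stream)
tail = read_streamed_bytes(stream)
print(fixed_count([3, 4, 5]), floats, tail)
```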

Struct

<struct>
  <short-16 varName="ID" />
  <integer-32 varName="Count" />
  <ieeeDouble-64 varName="Var" />
</struct>
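For comparison, the struct above corresponds roughly to the following Python `struct` layout. This is a sketch that assumes big-endian data with no padding between fields; the field values are invented.

```python
import struct
from collections import namedtuple

# Field names taken from the varName attributes in the BinX struct.
Record = namedtuple("Record", ["ID", "Count", "Var"])

# short-16, integer-32, ieeeDouble-64; ">" means big-endian, no padding.
FMT = ">hid"

raw = struct.pack(FMT, 7, 1000, 3.5)          # invented sample record
rec = Record._make(struct.unpack(FMT, raw))   # decode into named fields

print(struct.calcsize(FMT), rec)
```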

Union

<union>
  <discriminant>
    <byte-8/>
  </discriminant>
  <case discriminantValue="32">
    <ieeeFloat-32 />
  </case>
  <case discriminantValue="64">
    <ieeeDouble-64 />
  </case>
  <case discriminantValue="0">
    <void-0 />
  </case>
</union>
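Reading a union means reading the discriminant first and then choosing the case. A minimal Python sketch of that logic for the union above, assuming big-endian data and an unsigned one-byte discriminant (both assumptions, as is the function name):

```python
import io
import struct

# discriminantValue -> struct format of the payload (None for void-0).
CASES = {32: ">f", 64: ">d", 0: None}


def read_union(stream):
    """Read one discriminated value; returns (discriminant, value)."""
    (tag,) = struct.unpack("B", stream.read(1))
    fmt = CASES[tag]  # raises KeyError for an undeclared discriminant
    if fmt is None:
        return tag, None  # void-0 carries no data
    (value,) = struct.unpack(fmt, stream.read(struct.calcsize(fmt)))
    return tag, value


# Invented sample: a double, a float, then a void case.
data = (bytes([64]) + struct.pack(">d", 2.25)
        + bytes([32]) + struct.pack(">f", 0.5)
        + bytes([0]))
stream = io.BytesIO(data)
results = [read_union(stream) for _ in range(3)]
print(results)
```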

User-defined data type

<defineType typeName="HeaderStruct">
  <struct>
    <character-8 varName="A"/>
    <character-8 varName="B" />
    <integer-32 varName="Length" />
  </struct>
</defineType>

Data elements as instances

<file src="myfile.bin">
  <short-16 varName="id"/>
  <arrayFixed varName="name">
    <character-8 />
    <dim indexTo="7" />
  </arrayFixed>
  <struct varName="record">
    <short-16 />
    <ieeeFloat-32 />
  </struct>
</file>

Reference defined elements

<definitions>
  <defineType typeName="A">
    <struct>
      <short-16/>
      <integer-32/>
    </struct>
  </defineType>
</definitions>

<file src="myfile.bin">
  <useType typeName="A" varName="FirstUse"/>
  <useType typeName="A" varName="SecondUse"/>
</file>

The BinX Library

Alpha version

Fundamental requirements

Access to data elements in binary files via BinX
  Parse the BinX document
  Build in-memory data structures
  Read data values from the binary file
Automatic conversion
  Byte ordering
  Padding
Producing BinX document and binary data
  Generate BinX document for data structures
  Save assigned data values into binary files
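Byte ordering and padding are the two layout details that most often differ between platforms. The Python sketch below shows both effects with the standard `struct` module; it illustrates the problem, not how the BinX library handles it. The field layout (a 16-bit, a 32-bit and a 64-bit value) is an arbitrary example.

```python
import struct

# Byte ordering: the same value 100 as a 16-bit integer.
big = struct.pack(">h", 100)     # most significant byte first
little = struct.pack("<h", 100)  # least significant byte first

# Padding: short, int, double laid out with and without alignment.
packed_size = struct.calcsize("=hid")  # standard sizes, no padding
native_size = struct.calcsize("@hid")  # this platform's compiler alignment

print(big.hex(), little.hex(), packed_size, native_size)
```

The native size depends on the platform running the script, which is exactly why a file written by dumping an in-memory struct is not portable without a description of its layout.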

General use cases

Data conversion (byte order)
Data extraction (sub-dataset)
Data combination (two arrays to one)
Data presentation (browse, pure XML)
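The first three use cases can be pictured on a plain array of 32-bit integers. This Python sketch is only an illustration with invented data and a hypothetical helper name; the BinX utilities themselves are not shown in these slides.

```python
import struct


def convert_int32(data, src=">", dst="<"):
    """Data conversion: re-encode 32-bit integers in another byte order."""
    n = len(data) // 4
    values = struct.unpack(f"{src}{n}i", data)
    return struct.pack(f"{dst}{n}i", *values)


big = struct.pack(">4i", 1, 2, 3, 4)  # invented big-endian source array

# Conversion: big-endian to little-endian.
little = convert_int32(big)

# Extraction: elements 1 and 2 are bytes 4..11 of the converted array.
subset = struct.unpack("<2i", little[4:12])

# Combination: append a second big-endian array to the first.
combined = big + struct.pack(">2i", 5, 6)

print(little.hex(), subset, struct.unpack(">6i", combined))
```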

BinX Components

The library has core functionality to support generic utilities and applications

[Figure: three layers, with Applications on top of Utilities on top of the BinX library core]
  BinX library core: BinX core functionality (parse BinX document, read binary data)
  Utilities: generic tools (data conversion, extraction, packing/unpacking)
  Applications: domain-specific

The BinX library core

Input: SchemaBinX, binary data file
Output: DataBinX, in-memory dataset

[Figure: a SchemaBinX document (<dataset>… …</dataset>) and a binary file (0101010101) feed the BinX library, which produces an in-memory data structure (values loaded on demand) and DataBinX such as <short-16>100</short-16>]

The BinX Utilities

DataBinX generator
DataBinX splitter
SchemaBinX creator
Binary file indexer

DataBinX generator

Put binary data inside XML
  For browsing, web service return, query result set

[Figure: SchemaBinX (<dataset>… …</dataset>) and binary data (0101010101) go into the BinX library, which outputs DataBinX (<short-16>100</short-16>)]
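The idea of the generator, turning raw bytes into XML with the values written out as text, can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, it handles a single big-endian short-16, and the real DataBinX format may carry more attributes than shown here.

```python
import struct
import xml.etree.ElementTree as ET


def to_databinx(raw, order=">"):
    """Wrap one 16-bit integer from `raw` in a DataBinX-like document."""
    (value,) = struct.unpack(order + "h", raw)
    dataset = ET.Element("dataset")
    ET.SubElement(dataset, "short-16").text = str(value)
    return ET.tostring(dataset, encoding="unicode")


xml_text = to_databinx(b"\x00\x64")  # two bytes holding the value 100
print(xml_text)
```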

DataBinX splitter

The reverse of DataBinX generator
Generate binary file for testing, transportation
Cross-platform (byte order)

[Figure: DataBinX (<short-16>100</short-16>) goes into the BinX library, which outputs SchemaBinX (<dataset>… …</dataset>) and binary data (0101010101)]
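Going the other way, values written as XML text can be packed back into binary in whichever byte order the target platform needs. A minimal Python sketch, with a hypothetical function name and support for short-16 elements only:

```python
import struct
import xml.etree.ElementTree as ET


def from_databinx(xml_text, order=">"):
    """Pack every <short-16> value in the document into binary."""
    root = ET.fromstring(xml_text)
    return b"".join(
        struct.pack(order + "h", int(elem.text))
        for elem in root.iter("short-16")
    )


doc = "<dataset><short-16>100</short-16><short-16>-2</short-16></dataset>"
big_endian = from_databinx(doc, ">")
little_endian = from_databinx(doc, "<")
print(big_endian.hex(), little_endian.hex())
```

Because the XML holds values rather than bytes, the same document can produce a file for either byte order.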

SchemaBinX creator

GUI and Web-based utilities
Build BinX document interactively
Create a BinX document based on another

Binary file indexer

Generating indices for binary data files
Such indices can be used for fast data access

[Figure: SchemaBinX (<dataset>… …</dataset>) and binary data (0101010101) go into the BinX library, which outputs an index (labelled X, Y with the value 00000004)]
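One plausible reading of the index is a table from variable name to byte offset, so that a reader can jump straight to a value. The Python sketch below builds such a table for two invented fields, X and Y; the field types, the function name and the assumption of unpadded big-endian data are all hypothetical.

```python
import struct


def build_index(fields, order=">"):
    """Map each field name to its byte offset in an unpadded record."""
    index = {}
    offset = 0
    for name, code in fields:
        index[name] = offset
        offset += struct.calcsize(order + code)
    return index


# Invented layout: X is a 32-bit integer, Y is a 64-bit float.
fields = [("X", "i"), ("Y", "d")]
index = build_index(fields)

raw = struct.pack(">id", 9, 1.5)                  # invented sample record
(y,) = struct.unpack_from(">d", raw, index["Y"])  # jump straight to Y

print(index, y)
```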

Applications for astronomy

FITS and VOTable conversion

[Figure: a FITS file (SIMPLE = T… …END, followed by binary data 01010101) and a VOTable document (<?xml version=. <VOTABLE>… …</VOTABLE>) are linked through the DataBinX utility, which is built on the BinX library core]

FITS → DataBinX → VOTable

FITS to VOTable conversion

[Figure: conversion pipeline. Labels: FITS, Preprocessor, SchemaBinX, DataBinX utility, DataBinX, XSLT, XSLT transformer, VOTable]

VOTable → DataBinX → FITS

VOTable to FITS conversion

[Figure: conversion pipeline. Labels: VOTable, XSLT, XSLT transformer, Preprocessor, DataBinX, SchemaBinX, DataBinX utility, Binary data, FITS header, Postprocessor, FITS]

FITS-VOTable experiment

Sample FITS file
  A data table of 82 rows × 20 fields
  File size: 37KB
DataBinX generated by the DataBinX utility
  Time spent: 268 ms
  DataBinX document size: 1.2MB
VOTable transformed by MSXML
  Time spent: about 1 second
  VOTable document size: 51KB

Possible future releases

DataBinX parsing
Utilities (GUI BinX editor)
XPath-based data query
DFDL support
Preserving special tags
  For comments, application-specific tags
Text file support

Features or issues to consider

Converting floating point numbers
  80-bit, 96-bit, 128-bit floating point
Array manipulation (slice, section)
SAX-based XML document parsing
  Use cases in place of DOM parsing
  Built in the library or as add-on component?
Database support
  Annotating database tables?
  Query database tables through BinX?
Java version of the library
  Keeping exactly the same features with the C++ version?
Supporting XQuery
  Query binary data files with XQuery on BinX

Support

For problems of usage:
  http://www.edikt.org/binx (coming soon)
  support@edikt.org
For requirements and suggestions:
  tedwen@edikt.org
  robertc@edikt.org
