![Page 1: A Domain-Specific Modeling Language for Scientific Data Composition and Interoperability](https://reader035.vdocuments.net/reader035/viewer/2022070404/56813b21550346895da3d7ae/html5/thumbnails/1.jpg)
A Domain-Specific Modeling Language for Scientific Data Composition and Interoperability
Hyun Cho University of Alabama at Birmingham
Jeff Gray University of Alabama
File Formats: Image Files
Organize and store digital images that are composed of either pixel or vector (geometric) data
Bitmap-based: created by scanners and digital cameras (TIF, JPG, BMP)
Vector-based: geometric description plus bitmap; resolution-independent and infinitely scalable (Font, DRW, CGM)
File Formats: Music and Audio Files
Store audio data produced by analog-to-digital converters
Key parameters: sample rate, resolution, number of channels
Uncompressed formats: WAV, AIFF, AU
Lossless compression formats: FLAC, Lossless Windows Media Audio (WMA)
Lossy compression formats: MP3, Lossy Windows Media Audio (WMA)
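The three key parameters fully determine how much raw sample data an uncompressed file must hold. The sketch below (our own illustration, not from the slides; `PcmParams`, `bitsPerSecond`, and `payloadBytes` are hypothetical names) assumes linear PCM samples of whole bytes and ignores container headers.

```cpp
#include <cassert>
#include <cstdint>

// The three key parameters of uncompressed PCM audio.
struct PcmParams {
    std::uint32_t sampleRateHz;   // samples per second, per channel
    std::uint32_t bitsPerSample;  // resolution (bit depth)
    std::uint32_t channels;       // 1 = mono, 2 = stereo, ...
};

// Data rate in bits per second: sample rate x resolution x channels.
std::uint64_t bitsPerSecond(const PcmParams& p) {
    return static_cast<std::uint64_t>(p.sampleRateHz) * p.bitsPerSample * p.channels;
}

// Size of the raw sample data for a clip, excluding any container header.
std::uint64_t payloadBytes(const PcmParams& p, std::uint64_t seconds) {
    assert(p.bitsPerSample % 8 == 0);  // the sketch only handles whole-byte samples
    return bitsPerSecond(p) / 8 * seconds;
}
```

For CD-quality stereo (44,100 Hz, 16-bit, 2 channels) this gives 1,411,200 bit/s, so one minute of sample data is 10,584,000 bytes (about 10.6 MB). Lossless and lossy codecs exist to shrink exactly this payload.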
File Formats: Text Files
File formats that are structured as plain text, representing a sequence of lines
ASCII, TXT
File Formats: Compound File Formats
Used to structure the contents of a document within a single file
Contain a number of independent data streams organized in a hierarchy, much like a file system: a stream corresponds to a file and a storage to a sub-directory
Examples: MS Office, OpenOffice
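The stream/storage hierarchy can be pictured as a small in-memory tree. The following is a minimal sketch under our own assumptions: `Storage` and `findStream` are hypothetical names, and this models only the directory-like lookup, not any real compound-file on-disk layout.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// A storage behaves like a directory; a stream behaves like a file of bytes.
struct Storage {
    std::map<std::string, std::vector<unsigned char>> streams;
    std::map<std::string, std::unique_ptr<Storage>> storages;
};

// Resolve a slash-separated path such as "Parent/Child/StreamName":
// every component but the last names a storage, the last names a stream.
// Returns nullptr if any component is missing.
const std::vector<unsigned char>* findStream(const Storage& root, const std::string& path) {
    assert(!path.empty());
    const Storage* current = &root;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type slash = path.find('/', start);
        if (slash == std::string::npos) {
            auto it = current->streams.find(path.substr(start));
            return it == current->streams.end() ? nullptr : &it->second;
        }
        auto child = current->storages.find(path.substr(start, slash - start));
        if (child == current->storages.end()) return nullptr;
        current = child->second.get();
        start = slash + 1;
    }
}
```

Because each stream is addressed by path, an application can read one stream without parsing the others, which is what keeps the streams independent.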
Characteristics of Generic File Formats
Can handle only one or two data types: numeric or alphanumeric data
May limit file size: most are limited to a maximum file size of 2 GB
File I/O time may increase linearly as the file size grows
[Chart: Elapsed Time (s), 0 to 800, versus File Size (MB), 0 to 100, for file I/O in C and in Java]
Source: An In-Depth Examination of Java I/O Performance and Possible Tuning Strategies, http://pages.cs.wisc.edu/~remzi/Classes/736/Fall2000/Project-Writeups/KaiHongfei.html
These generic file formats are not appropriate for storing and retrieving scientific data because they were not designed to hold high volumes of complex scientific data, such as high-resolution images, massive numerical data sets, and graphs.
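One common reason for the 2 GB ceiling on generic formats is that the file size or offset is stored in a signed 32-bit integer. The sketch below only illustrates that arithmetic; `kMaxSigned32` and `fitsSigned32Size` are hypothetical names, and whether a particular format or API has this limit depends on its own offset type.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest value a signed 32-bit size or offset can hold: 2^31 - 1 bytes,
// one byte short of 2 GiB.
constexpr std::int64_t kMaxSigned32 = std::numeric_limits<std::int32_t>::max();

// True if a file of `sizeBytes` can still be described by a signed 32-bit
// size/offset type (e.g. a 32-bit off_t, or a long returned by ftell on a
// platform where long is 32 bits).
bool fitsSigned32Size(std::int64_t sizeBytes) {
    assert(sizeBytes >= 0);
    return sizeBytes <= kMaxSigned32;
}
```

A 2,147,483,648-byte file is already one byte too large, which is why such limits are usually quoted as "2 GB".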
Scientific Data Format: NetCDF3
Network Common Data Format: a machine-independent file format
Supports a wide variety of platforms, including Linux, MacOS, and Windows
Represents multi-dimensional arrays with ancillary data
[Figure: a multi-dimensional array drawn as grids of cells over the X and Y axes, stacked along a Time axis from Time = 1 to Time = n]
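A multi-dimensional variable such as (time, y, x) is presented to programs as a flat array in row-major (C) order, where the last dimension varies fastest; this is the order netCDF's C API uses when reading or writing whole arrays. The sketch below is our own in-memory illustration (`Shape3`, `flatIndex`, and `figureData` are hypothetical names) and reads each cell label in the figure as its 1-based X, Y, and Time indices, e.g. 241 = X 2, Y 4, Time 1. That reading of the figure is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Shape of a 3-D variable declared with dimensions (time, y, x).
struct Shape3 {
    std::size_t nTime, nY, nX;
};

// Row-major (C order) offset: x varies fastest, time slowest.
std::size_t flatIndex(const Shape3& s, std::size_t t, std::size_t y, std::size_t x) {
    assert(t < s.nTime && y < s.nY && x < s.nX);
    return (t * s.nY + y) * s.nX + x;
}

// Fill the array with the figure's cell labels: digits X, Y, Time (1-based).
std::vector<int> figureData(const Shape3& s) {
    std::vector<int> data(s.nTime * s.nY * s.nX);
    for (std::size_t t = 0; t < s.nTime; ++t)
        for (std::size_t y = 0; y < s.nY; ++y)
            for (std::size_t x = 0; x < s.nX; ++x)
                data[flatIndex(s, t, y, x)] =
                    static_cast<int>(100 * (x + 1) + 10 * (y + 1) + (t + 1));
    return data;
}
```

With a 4 × 4 grid per time step, stepping one cell in x moves 1 element, one row in y moves 4 elements, and one time step moves 16 elements.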
Scientific Data Format: HDF5
Hierarchical Data Format: a file format for managing any kind of data
Supports high-volume and/or complex data
Platform-independent
Flexible, efficient storage and I/O
Characteristics of the Scientific Data File Formats
Self-descriptive: contain metadata describing the contained data types and their organization
Directly accessible: arbitrary data can be accessed through APIs
Concurrently accessible: multiple threads or processes can access data simultaneously, enabling high-performance computing and faster access
Archivable: have their own archiving mechanisms to back up and restore high volumes of data
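Direct access usually means reading a rectangular sub-block (a "hyperslab") by giving a start index and a count per dimension, the convention used by netCDF's `nc_get_vara_*` functions and by HDF5 hyperslab selections. The sketch below applies that start/count idea to an in-memory 2-D row-major array; it does not call either library, and `readHyperslab2D` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy a rectangular sub-block out of a row-major 2-D array of shape
// rows x cols, using start/count semantics per dimension.
std::vector<double> readHyperslab2D(const std::vector<double>& data,
                                    std::size_t rows, std::size_t cols,
                                    std::size_t startRow, std::size_t startCol,
                                    std::size_t countRows, std::size_t countCols) {
    assert(data.size() == rows * cols);
    assert(startRow + countRows <= rows && startCol + countCols <= cols);
    std::vector<double> out;
    out.reserve(countRows * countCols);
    for (std::size_t r = startRow; r < startRow + countRows; ++r) {
        // Within one row the requested columns are contiguous in memory.
        const double* rowBegin = data.data() + r * cols + startCol;
        out.insert(out.end(), rowBegin, rowBegin + countCols);
    }
    return out;
}
```

Only the requested elements are touched, which is what makes arbitrary access cheap compared with scanning a whole generic file.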
Challenges in Using the Scientific Data File Formats
Use different representations to organize the file structure: each file format needs its own data visualization and composition
It is difficult to exchange data between two or more scientific data formats
Manage the evolution of APIs: it is challenging to verify that APIs evolve in accordance with the evolution of the file specification
Maintain the stability of existing applications under API evolution: user applications are subject to API changes
Limited support for data integration among heterogeneous scientific data formats
Framework for Scientific Data File Management
[Figure: layered framework architecture]
DSML Layer: Metamodels (CDF, HDF, NetCDF); File Content Manager (Content Composer, Content Verifier, Content Mapper); Communication model
API Layer: API Abstraction Layer over the CDF API, HDF API, NetCDF API, HW API, ...
Physical Layer: CDF Lib, HDF Lib, NetCDF Lib, Device Driver, ...; CDF Data, HDF Data, NetCDF Data, Devices
Model-Driven Engineering (MDE) and Domain-Specific Modeling (DSM)
MDE: specifies and generates software systems based on high-level models
Domain-Specific Modeling (DSM): a paradigm of MDE that uses notations and rules from an application domain
Metamodel: defines a domain-specific modeling language (DSML) by specifying the entities of an application domain and their relationships
Model: an instance of the metamodel
Model Transformation: a process that converts one or more models to various levels of software artifacts (e.g., other models, source code)
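To make the three terms concrete, here is a toy sketch of our own, not the authors' DSML: three structs play the role of a metamodel for a netCDF-like file, a `FileModel` value is a model, and `toCdl` is a model-to-text transformation that emits netCDF's CDL text notation (which the real `ncgen` utility can turn into a binary netCDF file). All type and function names are hypothetical, and only dimensions and typed variables are covered.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Metamodel: the entity types of a tiny netCDF-like DSML and their relations.
struct Dimension { std::string name; std::size_t length; };  // length 0 = UNLIMITED
struct Variable  { std::string type; std::string name; std::vector<std::string> dims; };
struct FileModel { std::string name; std::vector<Dimension> dims; std::vector<Variable> vars; };

// Model-to-text transformation: render a model (an instance of the metamodel) as CDL.
std::string toCdl(const FileModel& m) {
    assert(!m.name.empty());
    std::ostringstream out;
    out << "netcdf " << m.name << " {\n";
    out << "dimensions:\n";
    for (const auto& d : m.dims) {
        out << "\t" << d.name << " = ";
        if (d.length == 0) out << "UNLIMITED"; else out << d.length;
        out << " ;\n";
    }
    out << "variables:\n";
    for (const auto& v : m.vars) {
        out << "\t" << v.type << " " << v.name;
        if (!v.dims.empty()) {  // scalar variables take no parentheses in CDL
            out << "(";
            for (std::size_t i = 0; i < v.dims.size(); ++i) out << (i ? ", " : "") << v.dims[i];
            out << ")";
        }
        out << " ;\n";
    }
    out << "}\n";
    return out.str();
}
```

A model with dimensions time (unlimited), y = 4, x = 4 and a variable `float temp(time, y, x)` renders to the corresponding CDL declarations. The same model could feed other transformations (for example, to HDF5 creation code) without changing the metamodel.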
Unifying the representation of file structure organization
Adopt a DSML to build a tool for visualizing and composing scientific file formats in a unified way
Analyze the data model of each scientific file format, producing a Feature Model
Define the DSML from the Feature Model: Common Data Model, Variable Data Model, Grammar & Syntax
Implement the DSML, producing the DSML Tool
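One way to read the step from Feature Model to Common and Variable Data Models is as set arithmetic over per-format features: what every format supports is common, and what a format adds beyond that is variable. The sketch below is our interpretation, not the authors' procedure; `commonFeatures` and `variableFeatures` are hypothetical names, and any feature labels fed to it are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>

using FeatureSet = std::set<std::string>;

// Common data model: the features that every analyzed format supports.
FeatureSet commonFeatures(const std::map<std::string, FeatureSet>& formats) {
    assert(!formats.empty());
    FeatureSet common = formats.begin()->second;
    for (const auto& entry : formats) {
        FeatureSet next;
        std::set_intersection(common.begin(), common.end(),
                              entry.second.begin(), entry.second.end(),
                              std::inserter(next, next.begin()));
        common = std::move(next);
    }
    return common;
}

// Variable data model for one format: its features outside the common core.
FeatureSet variableFeatures(const FeatureSet& format, const FeatureSet& common) {
    FeatureSet extra;
    std::set_difference(format.begin(), format.end(), common.begin(), common.end(),
                        std::inserter(extra, extra.begin()));
    return extra;
}
```

A real feature model also records optional, alternative, and mandatory relationships between features, which flat sets like these do not capture.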
Unifying the representation of file structure organization
Feature Model for Scientific File Format
Unifying the representation of file structure organization
Content Composer: a DSML modeling tool for scientific data files, implemented using GEMS
API Abstraction Layer
Helps protect user applications from the evolution of APIs
NetCDF: int nc_create(const char* path, int cmode, int *ncidp)
HDF5: H5File(const char *name, unsigned int flags)
Abstraction: createFile(const char *path, FileCreationProperty fileCreationProperty)
[Figure: the API Abstraction Layer sits on top of the HDF API and HDF Lib and of the NetCDF API and NetCDF Lib]
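The abstraction layer is essentially the adapter pattern: applications call one format-neutral `createFile`, and per-format adapters translate it into the native call. The sketch below is an assumption-laden illustration: `FileCreationProperty` is reduced to a single hypothetical `overwriteExisting` flag, and instead of linking netCDF or HDF5 each adapter returns a string describing the native call it would make (`nc_create` with `NC_CLOBBER`/`NC_NOCLOBBER`, or `H5::H5File` with `H5F_ACC_TRUNC`/`H5F_ACC_EXCL`).

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical format-neutral creation options.
struct FileCreationProperty {
    bool overwriteExisting = false;
};

// Format-neutral interface that user applications program against.
class ScientificFileApi {
public:
    virtual ~ScientificFileApi() = default;
    // Sketch only: returns a description of the native call instead of a handle.
    virtual std::string createFile(const char* path, FileCreationProperty prop) = 0;
};

// A real adapter would call nc_create(path, NC_CLOBBER or NC_NOCLOBBER, &ncid).
class NetCdfAdapter : public ScientificFileApi {
public:
    std::string createFile(const char* path, FileCreationProperty prop) override {
        assert(path != nullptr);
        return std::string("nc_create(") + path + ", " +
               (prop.overwriteExisting ? "NC_CLOBBER" : "NC_NOCLOBBER") + ")";
    }
};

// A real adapter would construct H5::H5File(path, H5F_ACC_TRUNC or H5F_ACC_EXCL).
class Hdf5Adapter : public ScientificFileApi {
public:
    std::string createFile(const char* path, FileCreationProperty prop) override {
        assert(path != nullptr);
        return std::string("H5File(") + path + ", " +
               (prop.overwriteExisting ? "H5F_ACC_TRUNC" : "H5F_ACC_EXCL") + ")";
    }
};

// The only place that names concrete adapters; applications see the interface.
std::unique_ptr<ScientificFileApi> makeFileApi(const std::string& format) {
    if (format == "netcdf") return std::make_unique<NetCdfAdapter>();
    if (format == "hdf5") return std::make_unique<Hdf5Adapter>();
    return nullptr;  // unsupported format
}
```

If a library changes its creation call, only the matching adapter changes; application code written against `ScientificFileApi` stays the same.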
Integrating data among heterogeneous data formats
Content Mapper: defines rules for mapping data from one scientific data format to another
Content Verifier: verifies the correctness of the file composition and of the mapping rules
[Figure: the DSML Layer, containing Metamodels (CDF, HDF, NetCDF), the File Content Manager (Content Composer, Content Verifier, Content Mapper), and the Communication model]
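A minimal reading of the Mapper/Verifier split: mapping rules translate element kinds from a source format to a target format, and verification checks that the rules cover every element before mapping runs. This is our sketch, not the authors' implementation; `Element`, `MappingRules`, `uncoveredKinds`, and `mapElements` are hypothetical, and a rule set such as variable to dataset and dimension to dimension scale is only loosely modeled on how netCDF-4 stores its data in HDF5.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One element of a source-format model: its kind and its name.
struct Element {
    std::string kind;
    std::string name;
};

// Mapping rules: source element kind -> target element kind.
using MappingRules = std::map<std::string, std::string>;

// Content Verifier (sketch): list each source kind that no rule covers.
std::vector<std::string> uncoveredKinds(const std::vector<Element>& source,
                                        const MappingRules& rules) {
    std::vector<std::string> missing;
    for (const auto& e : source) {
        const bool covered = rules.count(e.kind) != 0;
        const bool listed = std::find(missing.begin(), missing.end(), e.kind) != missing.end();
        if (!covered && !listed) missing.push_back(e.kind);
    }
    return missing;
}

// Content Mapper (sketch): translate every element; rules must be complete.
std::vector<Element> mapElements(const std::vector<Element>& source,
                                 const MappingRules& rules) {
    assert(uncoveredKinds(source, rules).empty());
    std::vector<Element> target;
    target.reserve(source.size());
    for (const auto& e : source) target.push_back({rules.at(e.kind), e.name});
    return target;
}
```

Running the verifier first turns an incomplete rule set into an explicit report rather than a silent data loss during conversion.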
DSML Layer
Summary
From the prototype of the framework:
A DSML can help build a graphical tool to compose scientific file structures and support interoperability across them
Adopting a layered architecture in the framework can help maintain the independence of each layer
Both the API abstraction layer and the layered architecture are essential to developing and maintaining user applications
Future work:
Create metamodels that include the full specification of each scientific file format
Categorize APIs according to their intended use for the API abstraction layer
Develop metamodels for managing API evolution
Thank you!
Example of Scientific Data Format: OPeNDAP
Client-server protocol for scientific data access
Targeted at oceanographic data management
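In OPeNDAP's DAP2 protocol, a client asks the server for a subset of a remote dataset by appending a response-type suffix and a constraint expression to the dataset URL, with one `[start:stride:stop]` hyperslab (stop inclusive) per dimension. The sketch below only builds such a URL string; `DapRange` and `dapRequestUrl` are hypothetical names, any server URL used with it is a placeholder, and real clients may need to percent-encode the brackets.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One dimension of a DAP2 hyperslab: [start:stride:stop], with stop inclusive.
struct DapRange {
    std::size_t start, stride, stop;
};

// Build "<dataset>.<suffix>?<variable>[start:stride:stop]..." for one variable.
std::string dapRequestUrl(const std::string& datasetUrl, const std::string& suffix,
                          const std::string& variable, const std::vector<DapRange>& ranges) {
    std::string url = datasetUrl + "." + suffix + "?" + variable;
    for (const auto& r : ranges) {
        assert(r.stride > 0 && r.start <= r.stop);
        url += "[" + std::to_string(r.start) + ":" + std::to_string(r.stride) + ":" +
               std::to_string(r.stop) + "]";
    }
    return url;
}
```

Because subsetting happens on the server, only the selected slab crosses the network, which matters for large remote data sets.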