Towards an information model for I2S2
Brian Matthews, Leader, Scientific Applications Group, E-Science Centre, STFC Rutherford Appleton Laboratory
Facilities Process
Proposal
Approval
Scheduling
Experiment
Data storage
Record Publication
Scientist submits
application for beamtime
Facility committee approves
application
Facility registers, trains, and schedules
scientist’s visit
Scientist visits, facility
runs experiment
Subsequent publication
registered with facility
Raw data filtered,
cleansed and stored
Data analysis
Tools for processing
made available
Characteristics:
- formal application
- set processes
- central infrastructure
- standard tools
- hierarchical control
- dedicated staff
• user office
• instrument scientists
• library and IT support
Requirements
• Secure access to user’s data
• Flexible data searching
• Scalable architecture
• Extensible architecture
• Integration with analysis tools
• Access to high-performance resources
• Linking to other scientific outputs
• Data policy aware
Principles
Online Proposal System
User Office System:
User Database
Scheduling
Health and Safety
Proposal Management
Metadata Catalogue
Data Acquisition
System
Storage Management
System
Data Access Portal
Single Sign On Account Creation and Management
ICAT Software Suite, providing the crucial integration of key functions.
The ICAT software suite
• Catalogues all experiment related information
• Metadata gathered via integration with existing IT systems
– proposal systems
– data acquisition
• Provides a well defined API for easy embedding into any applications.
Access data anywhere via the web
Annotate and search for data
Share data with colleagues
Access data via user’s own programs
Utilise integrated e-Science resources
Link to data from your publications
Component architecture
RDBMS
Web Services API
ICAT API
Command Line Tools
Glassfish / JBOSS
Java, C++, Fortran
Data Storage/ Delivery System
Single Sign On
User Database System
Proposal System
Publication System
e-Science Services
Software Repository
ICAT Deployment
Data Portal
TopCat
Towards an Information Model
Methodology
The Singapore Framework for Dublin Core Application Profiles. Mikael Nilsson, Tom Baker, Pete Johnston. http://dublincore.org/documents/singapore-framework/
Functional requirements
A Metadata Model for Facilities Science
A common general format/standard for metadata on scientific studies and data holdings did not exist.
By proposing a model:
– A specification for the types of metadata needed to capture scientific studies
– Cataloguing data holdings: provide access for the data owner
– Ease citation, sharing, collaboration, and integration
– Allow easy federation of distributed heterogeneous metadata systems into a homogeneous (virtual) platform
Therefore: the Core Scientific Metadata Model (CSMD) was developed.
A Domain Model
Modelling Scientific Activity
Investigation
Publication, Keyword, Topic
Sample
Sample Parameter
Dataset
Dataset Parameter
Datafile
Datafile Parameter
Investigator, Reference / Proposal Id, Previous Reference, Facility, Instrument, Title, Abstract, etc.
Name
Name/Units/Value etc., Searchable, Is Sample Parameter, Is Dataset Parameter, Is Datafile Parameter, Verified
Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
Full Reference, URL
Repository
Name, Parent Id
Topic Level
User Id, Role
Name, Chemical Formula, Safety Information
Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
Name, Sample Id
Description
Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
Name, Description
Version, Location
Format, Format Version
Create Time, Modify Time
Size, Checksum
Related Datafile
Parameter
Authorisation
Source Datafile Id, Destination Datafile Id
Relation, S/W Application
S/W Version
User Id, Role, e.g. Admin, Deleter, Updater, Reader, Creator, Downloader etc.
Element Type, Element Id
Damian Flannery
Core Scientific Metadata Model
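The entity hierarchy on this slide (an Investigation holding Samples and Datasets, Datasets holding Datafiles, and Parameters attached at each level) can be sketched in a few Python dataclasses. This is only an illustrative reading of the diagram, not the ICAT schema or API: the class layout, field defaults, and the `build_example` helper with its sample values are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Parameter:
    """A named value attached to a sample, dataset, or datafile."""
    name: str
    units: str = ""
    string_value: Optional[str] = None
    numeric_value: Optional[float] = None
    range_top: Optional[float] = None
    range_bottom: Optional[float] = None
    error: Optional[float] = None


@dataclass
class Datafile:
    name: str
    location: str
    size: int = 0
    checksum: str = ""
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    description: str = ""
    datafiles: List[Datafile] = field(default_factory=list)
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Sample:
    name: str
    chemical_formula: str = ""
    safety_information: str = ""
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    facility: str
    instrument: str
    abstract: str = ""
    samples: List[Sample] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)

    def all_datafiles(self) -> List[Datafile]:
        """Flatten the dataset/datafile hierarchy into one list."""
        return [f for ds in self.datasets for f in ds.datafiles]


def build_example() -> Investigation:
    """Hypothetical investigation; every value here is made up."""
    temperature = Parameter(name="temperature", units="K", numeric_value=290.0)
    raw = Dataset(
        name="raw",
        datafiles=[
            Datafile(name="run001.raw", location="/archive/run001.raw"),
            Datafile(name="run002.raw", location="/archive/run002.raw"),
        ],
        parameters=[temperature],
    )
    sample = Sample(name="water", chemical_formula="H2O")
    return Investigation(
        title="Example study",
        facility="ISIS",
        instrument="example-instrument",
        samples=[sample],
        datasets=[raw],
    )


if __name__ == "__main__":
    inv = build_example()
    for datafile in inv.all_datafiles():
        print(datafile.name, datafile.location)
```

Keeping Parameter as one shared class mirrors the slide, where the same Name/Units/Value attributes recur for samples, datasets, and datafiles.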
Description set profile
Metadata Granule
Topic: Keywords providing an index on what the study is about.
Study Description: Provenance about what the study is, who did it and when.
Access Conditions: Conditions of use providing information on who can access the data and how.
Data Description: Detailed description of the organisation of the data into datasets and files.
Data Location: Locations providing a navigational aid to where the data on the study can be found.
Related Material: References into the literature and community providing context about the study.
Legal Note: Copyright, patents and conditions of use etc. relating to the study and the data in the study.
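As a rough illustration of the metadata granule described above, the seven sections can be treated as keys of a record, with a small check for which ones are still empty. The key names, the dictionary representation, and the `missing_sections` helper are assumptions for this sketch; the slides do not prescribe a concrete format here.

```python
from typing import Any, Dict, List

# Section names follow the slide; the snake_case spelling is an assumption.
GRANULE_SECTIONS = (
    "topic",
    "study_description",
    "access_conditions",
    "data_description",
    "data_location",
    "related_material",
    "legal_note",
)


def missing_sections(granule: Dict[str, Any]) -> List[str]:
    """Return the granule sections that are absent or empty, in slide order."""
    return [name for name in GRANULE_SECTIONS if not granule.get(name)]


if __name__ == "__main__":
    # Hypothetical, partially filled granule.
    granule = {
        "topic": ["crystallography"],
        "data_location": ["/archive/study-1"],
    }
    print(missing_sections(granule))
```

A check like this is one simple way to see how complete a study record is before it is catalogued.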
ICAT 3.3 Schema – Study (2)
Syntax and metadata formats
ICAT API and XML format
ICAT 3.3 Database Schema
CSMD History
Model first pilot developed in 2001!
• Now in ICAT 3.3
• Serving data from STFC Facilities (ISIS, DLS)
• Model proven robust – simple yet expressive
– http://code.google.com/p/icatproject/
I2S2 - Infrastructure for Integration in Structural Sciences
Bridging the gap between raw and derived data
“Lone” researcher scenario
• Data sharing with colleagues via email
• Little or no infrastructure
• Little management of raw or derived data
EPSRC National Crystallography Service
• Service provision function
• Operates across institutions
• Moderate infrastructure
Diamond & ISIS
• Operates on behalf of multiple institutions
• Processes for experiments
• Large infrastructure engineered to manage raw data
• Derived data taken off site on laptops / removable drives
Interactions between research process
Grant Proposal
Facilities Proposal
Facilities Experiment
Data cleansing
Record Publication
Data analysis
Local experiments
Simulation
Sample Preparation
Literature Review
Publication
Proposal
Approval
Scheduling
Facilities Experiment
Data storage
Record Publication
Analysis Tools
CSMD
Cover the scientist’s research lifecycle as well as the facilities.
Extend to:
- laboratory based science
- secondary analysis data
- preservation information
- publication data
- domain specific vocabularies
By being:
- standardised
- modular
- extensible
Methodology
The Singapore Framework for Dublin Core Application Profiles. Mikael Nilsson, Tom Baker, Pete Johnston. http://dublincore.org/documents/singapore-framework/
Issues
• Metadata model
• Framework for developing metadata model
• Modularisation mechanisms and extensions
• Formats
• Model supporting laboratory tools
– How does the model fit?
– Flexibility to handle local processes
• Ad hoc, partial, unordered
– What needs changing in the model?
– What needs changing in tools?
• Data input and maintenance?
• Simple ways of inputting the data
• Lab books?
Extension areas:
• Secondary analysis data
• Preservation data
• Publication data
• Topic data
– chemistry
• Controlled lists (ontologies) for
– Instruments
– Facilities
– Methods
• Access control
• Safety data
• Blogs and notebooks
ISIS - ICAT
Part of ISIS study
Gudrun
Control file, Correction data, Sample data, Calibration data
Scattering function data
User inputs
Derived Data
Generalised model
Managing the links between data
Inputs of data sets
Associated with a software item with a set of parameters
Managing this?
- lab-books?
- simple tools?
- VRE?
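One way to picture the generalised model is as a list of derivation steps, each linking an output data item to its input data sets, the software item used, and that software's parameters. The sketch below records such steps and walks the links back to the original inputs. The `Derivation` class, the `trace_sources` function, the second processing step, and all parameter values are hypothetical; only the Gudrun inputs and its scattering function output come from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Derivation:
    """One processing step: inputs + software + parameters -> output."""
    output: str
    inputs: List[str]
    software: str
    software_version: str = ""
    parameters: Dict[str, str] = field(default_factory=dict)


def trace_sources(name: str, derivations: List[Derivation]) -> Set[str]:
    """Follow derivation links back to items that no step produced."""
    by_output = {d.output: d for d in derivations}
    sources: Set[str] = set()
    seen: Set[str] = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against repeated or cyclic links
        seen.add(current)
        step = by_output.get(current)
        if step is None:
            sources.add(current)  # nothing produced it: treat as a source
        else:
            stack.extend(step.inputs)
    return sources


def example_steps() -> List[Derivation]:
    """Gudrun step from the slide plus one invented follow-on analysis."""
    return [
        Derivation(
            output="scattering_function",
            inputs=["control_file", "correction_data",
                    "sample_data", "calibration_data"],
            software="Gudrun",
            parameters={"user_input": "hypothetical value"},
        ),
        Derivation(
            output="fitted_model",
            inputs=["scattering_function"],
            software="hypothetical-fitting-tool",
        ),
    ]


if __name__ == "__main__":
    print(sorted(trace_sources("fitted_model", example_steps())))
```

Whether such records are captured through lab books, simple tools, or a VRE is the open question the slide raises; the structure itself stays the same in each case.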