The e-Science Vision
Enabling New Science through Innovative Integrated Technology Solutions
The Mission
To spearhead the exploitation of e-Science technologies throughout STFC programmes, the research communities they support and the national science and engineering base.
To “e-enable” the STFC facilities.
The Vision
• An increasingly sophisticated infrastructure supporting innovative exploitation of data from the full range of STFC facilities.
– integrated into National and International activities.
• Improved use of computation and data management in areas with little historic engagement but growing needs.
• Exploit emerging technologies to further enhance UK capabilities.
• Better science...
– accelerate the research process
– improve traceability and reproducibility
– meet the challenges posed by increasing data volumes
– improve cost effectiveness and quality
– encourage collaboration and knowledge exchange
– enable researchers to tackle more of the world’s grand challenges
– improve the long-term exploitation of research outputs
– bridge facilities and users
Ken Peach
UK e-Infrastructure
LHC
ISIS TS2
HPCx + HECtoR
Users get common access, tools, information, Nationally supported services, through NGS
Integrated internationally
VRE, VLE, IE
Regional and Campus grids
Community Grids
JET
The UK e-Infrastructure for e-Science
ESRF
The Road to Net-centricity from Applications Perspective
• WEB Enabled
– An application that requests, and is given access to, services and/or resources via an HTTP request
– Application may have been created before there was a WEB
– Leverage prior investment to quickly make data or application available
– Can use simple HTML WEB Interface or full WEB service interface
– Limited by the Data / Functions exposed in the original design
• WEB Service
– Typically built from the ground up to run over the WEB
–Uses industry standards to provide means of interoperating between different software applications; runs on a variety of platforms and/or frameworks
–Can be combined in a loosely coupled way in order to achieve complex operations
–Simple services can interact with each other to deliver sophisticated value-added services
–Quality of Service and value added capabilities can be documented as Service Level Agreements (SLAs)
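The "WEB Enabled" case above can be sketched in a few lines: an existing routine is left untouched and a thin HTTP-facing wrapper is placed in front of it. This is only an illustrative sketch, not anything taken from the slides. The names `legacy_lookup`, `web_enabled_app`, `call_app` and the sample records are all hypothetical, and WSGI is used simply because it ships with the Python standard library.

```python
from urllib.parse import parse_qs

# Hypothetical data held by an application written before there was a web.
_LEGACY_RECORDS = {"run-001": "temperature=291K", "run-002": "temperature=305K"}


def legacy_lookup(run_id):
    """Stand-in for the pre-web application; it knows nothing about HTTP."""
    return _LEGACY_RECORDS.get(run_id)


def web_enabled_app(environ, start_response):
    """WSGI wrapper that exposes legacy_lookup via an HTTP request.

    Only what the original function exposes can be reached, which mirrors
    the 'limited by the data / functions exposed' point in the list above.
    """
    query = parse_qs(environ.get("QUERY_STRING", ""))
    run_id = query.get("run", [""])[0]
    result = legacy_lookup(run_id)
    if result is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown run"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [result.encode("utf-8")]


def call_app(query_string):
    """Invoke the WSGI app in-process and return (status, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = web_enabled_app({"QUERY_STRING": query_string}, start_response)
    return captured["status"], b"".join(chunks).decode("utf-8")


if __name__ == "__main__":
    print(call_app("run=run-001"))
```

The same `web_enabled_app` callable could be served with `wsgiref.simple_server.make_server` from the standard library; the in-process helper is used here only so the sketch runs without opening a port.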
WEB-enabled Make Data Available
Deconstruct & Reconstruct
WEB-Services / Compose
XML/HTTP
"Reach" "Volume" "Efficient & Flexible" "Agility & Speed"
WSDL
QoS & SLAs
Data Transfer
• Non-Web Era
– System typically designed as closed, standalone
– Tightly coupled and engineered interface
– Data transfer via FTP / file transfer
– Data is system application specific
SOAP / UDDI / WSDL
Wrappers
Strategy
• Expertise in systems, applications and information management
• Develop and support the integrated e-infrastructure required by researchers
– Focused around exploiting the full lifecycle for scientific data
– Developed through Science led projects
– User focused, standards based, acknowledging constraints from National and International collaborations and Government priorities.
• Direct contributions to projects and activities
– e.g. LHC, ISIS, DLS, CLF…
– Competitive and technology push
• R&D to inform and support future programmes
– Grid infrastructures for the UK and Europe
– Information management in a distributed heterogeneous environment
– Long term data curation
– Advanced analysis and visualisation
• Leveraging investment through provision of services to partner organisations
• Engage Nationally and Internationally.
• Take expert advice. The e-Science Advisory Board
e-Science Advisory Board
External
Dr. Daron Green - BT
Prof. David Ingram - UCL
Dr David Williams - CERN
Dr Jerzy Graff - BMT
Dr. Graham Cameron - EBI
Prof. Malcolm Atkinson - NeSC
Prof. Alex Gray - Cardiff
Prof. Andy Lawrence - ROE
Prof. Carole Goble - Manchester
Prof. Paul Jeffreys - Oxford
Internal
Neil Geddes
John Gordon
Prof. Keith Jeffery
Prof. Paul Durham
e-Science in 2001
• CCLRC e-Science Centre
– ~ 8 people
– 10 Projects covering astronomy, particle physics and computing
– £1M p.a.
e-Science Industry day February 2001
e-Science in 2007
• Over 100 staff in e-Science Centre
• £11M income in 2006/07
• Projects in HEP, astronomy, biomedical simulation, environmental science, nano-technology, materials science
•UK Leadership in grid infrastructure
•European leadership in data curation
Some e-science facts and figures
eScience Income 06/07, £10.8m (pie chart)
Sources: CCLRC, CCLRC Library, PPARC, Other RC, JISC, EU, Other Government, Other
Shares shown: 9%, 7%, 19%, 18%, 11%, 3%, 3%, 30%
FTE e-Science Staff by year (bar chart, 2001/02 to 2007/08, vertical axis 0 to 120)
113 staff, 28 female (8 in Library), 23 fixed term
Collaborative tools
Department Overview
STFC e-Science Centre is:
– using leading edge IT to deliver new science
• Management and exploitation of large scale scientific data.
• High-quality scientific computing services
• Support for collaborative working
• Collaborative R&D
– Sharing expertise - technology transfer
– Based on core skills:
Data analysis and Computation
Data storage
Data management
Conclusion
• Strong personal belief in opportunities from ICT
• Specific opportunities for STFC:
– Exploit experience in grand challenges like LHC and IPCC
– Encourage collaboration across STFC facilities
– Build on our unique position to lead developments internationally
– Leverage the infrastructure deployed for wider UK benefit
– Meet the ICT expectations of modern researchers
– Use the above to stimulate innovation and support science research
• Achieving these requires
– Living close to the technology edge
– Providing technological expertise and vision
– Managing technology push and user pull
– Active research expertise
“innovate or die” –anon.
GridPP, LCG and EGEE
CCLRC e-science centre
- LHC Tier-1
- Regional Operations Centre (UK+I)
- Coordinator of National Grid Service
- Partner in other grid deployments
Tier-1
Facilities e-Infrastructure
Diamond synchrotron
ISIS neutron and muon facility
Vulcan laser facility
Physical facilities provide data for the information Infrastructure
• Record data
• Store data
• Search data
• Share data
Integrated system for DLS
demonstrated February 2007
ISIS 20 year back catalogue
ISIS available online
Multi-disciplinary environmental science programmes
– Molecular studies of pollutants and radiation damage
– Data integration resources
CCLRC provides technological support
– Data management infrastructure
– Grid computing
– Data and information standardisation
• CML, CSML
Environmental Science
British Atmospheric Data Centre
http://badc.nerc.ac.uk
http://ndg.nerc.ac.uk
British Atmospheric Data Centre
British Oceanographic Data Centre
Simulations
Assimilation
NERC Data Grid : Googling for secure data
Bio-Medical Sciences
Data management in post-genomic biology
– Integrated Systems Biology Centre
– High throughput experiments
– Preparations for biomedical use of DLS/ISIS ...
Biomedical simulation and integrated systems biology
– Integrative Biology
• Data sharing infrastructure
• Data integration and visualisation
Overview of Protein Crystallography (diagram stages)
Protein Production
Crystallisation
Data Collection
Phasing
Protein Structure
Deposition
Structure analysis
Target Selection
The Ontogenesis Network
Materials and Nanotechnology
Characterisation of Materials structure and properties
– e-Science technology for real time analysis for experiments
– Ability to run, manage and integrate the results of hundreds of distinct calculations
– Advanced visualisation for better result analysis
– Long lasting archives of scientific results with easy access for scientists
– Ability to share results easily when required
Acid Sites in Zeolites
International
?Who
Encourage and influence development of infrastructure
Synchrotron and Neutron Data Infrastructure
European Data Infrastructure
Support UK developments, drive standard access Europe wide
Develop position as a good host + develop access for UK researchers
Access to Scientific Data:
Grid Infrastructure: ESFRI/e-IRG
E-infrastructure??
Summary of STFC implementation of IB Grid services and applications for Integrative Biology
•A prototype IB grid with server side visualization to handle extremely large datasets (100MB per small experiment) generated on HPCx and other NGS clusters.
• Interfaces to the grid job management and SRB built on CoolGraphics, Meshalyser & Matlab and also a standalone C++ GUI for IB services.
• Control panels of specific application packages deployed on desktop while the functional core executes on NGS for data encoding & decoding
• Results sent to desktop as well as display walls
Summary of STFC implementation of IB Grid services and applications for Integrative Biology
•Implementation of soft tissue cancer models on the grid (parallelisation included), with embedded computational steering
•Implementation of 3D image reconstruction in real time using the visualisation cluster
• MRI & histopathology images of heart data
• in-vivo cancer image data (for statistics on histopathology)
• Arterial stent tomography data from ESRF
Schematic of stent in artery
Stent image to geometry reconstruction
Processed image with tumour cells and blood vessels highlighted
Result from edge detection
Screenshot of real-time 3D image reconstruction, halfway through. STFC visualization cluster is used and image sent to remote desktop
SKOS Phase 2 (2005-06)
•W3C Semantic Web Best Practices and Deployment (SWBPD) Working Group
– HP, IBM, Boeing, Adobe, Universities of Maryland, Stanford, Manchester, Amsterdam
•Task force to further develop SKOS – Alistair Miles (STFC) lead
Digital Curation research activities
David Giaretta
Director of CASPAR Project
and
Associate Director UK Digital Curation Centre
Outline
Background
OAIS
CASPAR and DCC
Future research and projects
Summary
Digital Preservation…
Easy to do… …as long as you can provide money forever
Easy to test claims about tools… …as long as you live a long time
OAIS (ISO 14721)
Open Archival Information System
Reference Model
– referenced in just about any serious work on digital preservation
– Development hosted by CCSDS Panel 2
5 year ISO review underway
– minor corrections and updates
– No major changes
Revised version due early 2008
Chaired by DG
OAIS Functional Entities
SIP = Submission Information Package
AIP = Archival Information Package
DIP = Dissemination Information Package
[Diagram: OAIS functional entities. Ingest, Data Management, Archival Storage, Access, Administration and Preservation Planning sit between PRODUCER, CONSUMER and MANAGEMENT. Flows shown: SIP, Descriptive Info., AIP, DIP, queries, result sets, orders.]
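The package flow named in the diagram labels can be sketched as a toy model: a producer submits a SIP, Ingest turns it into an AIP plus descriptive information, and Access answers queries and orders with result sets and DIPs. This is a minimal sketch only. The class `ToyArchive`, its method names and the identifier scheme are hypothetical and are not defined by the OAIS reference model, which describes roles and information flows rather than an API.

```python
from dataclasses import dataclass


@dataclass
class SIP:
    """Submission Information Package, sent by a producer."""
    content: bytes
    producer: str


@dataclass
class AIP:
    """Archival Information Package, held in archival storage."""
    identifier: str
    content: bytes
    descriptive_info: dict


@dataclass
class DIP:
    """Dissemination Information Package, delivered to a consumer."""
    content: bytes
    descriptive_info: dict


class ToyArchive:
    """Hypothetical in-memory archive illustrating the SIP -> AIP -> DIP flow."""

    def __init__(self):
        self._storage = {}    # Archival Storage: identifier -> AIP
        self._catalogue = {}  # Data Management: identifier -> descriptive info

    def ingest(self, sip, title):
        """Ingest: accept a SIP, store an AIP, register descriptive info."""
        identifier = f"aip-{len(self._storage) + 1}"
        info = {"title": title, "producer": sip.producer}
        self._storage[identifier] = AIP(identifier, sip.content, info)
        self._catalogue[identifier] = info
        return identifier

    def query(self, text):
        """Access: answer a consumer query with a result set of identifiers."""
        needle = text.lower()
        return [i for i, info in self._catalogue.items()
                if needle in info["title"].lower()]

    def order(self, identifier):
        """Access: fulfil an order by building a DIP from the stored AIP."""
        aip = self._storage[identifier]
        return DIP(aip.content, dict(aip.descriptive_info))


if __name__ == "__main__":
    archive = ToyArchive()
    aip_id = archive.ingest(SIP(b"raw data", "ISIS"), "Neutron run")
    print(archive.query("neutron"), archive.order(aip_id))
```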
[Diagram: OAIS Preservation Planning functions: Monitor Designated Community, Monitor Technology, Develop Preservation Strategies and Standards, Develop Packaging Designs & Migration Plans; linked to Administration, PRODUCER and CONSUMER. Flow labels: Surveys; Service requirements; Product technologies; Reports; Requirement alerts; Emerging standards; Technology alerts; External data standards; Prototype requests; Prototype results; Proposals; Recommendations; Approved standards; Migration goals; AIP/SIP templates; AIP/SIP review; Migration packages; Customization advice; Inventory reports; Performance info; Consumer comments; Preservation requirements; Advice; Issues.]
CASPAR Project
http://www.casparpreserves.eu
EU FP6 Integrated Project
Total spend approx. 16MEuro (8.8 MEuro from EU)
Started April 2006, for 42 months
David Giaretta is Co-ordinator
CASPAR Aims
Produce tools and techniques to support digital preservation and make it easier to share the cost
– must be relatively easy to use
– must have a low “buy-in” in terms of effort required for adoption
– must avoid requiring wholesale change of everyone else’s systems
– must be decentralised and reproducible so that it can live on after the formal end of the CASPAR project
– must be “preservable”
– must be open: open source, open standards
Cannot do everything
– Working closely with other projects
CASPAR information flow architecture
• Rep Info
Virtualisation
How do we capture the Representation Information?
Overview
Environmental drivers
Technology drivers
Revolution
e-Science Centre’s role
Environment
Technology
activities: now, future
e-Science Centre role
Environment
– Co-located at STFC with BADC, NEODC
– IPCC Data Distribution Centre
– NERC DataGrid
– Background in environmental science
Technology
– Standards (ISO, OGC)
– Architecture
– Expertise in ‘Grid’ technologies
– Information modelling
Activities – current
MOTIIVE (EU FP7, http://www.motiive.net)
– ISO 19109: General Feature Model
• cf. object metamodel: feature types, attributes, operations, associations
– ISO 19110: Feature cataloguing
– Feature Catalogue ≡ ‘semantics repository’
• powerful operational component in SDI
• inheritance: semantic re-use
• behaviour: service binding
– Developing candidate implementation
• ebRIM 19110 mapping
Activities – current
INSPIRE (http://www.ec-gis.org/inspire)
– selected by EC to co-develop statutory Implementing Rules on data specifications
• D2.3: Scoping of themes
• D2.5: Generic Conceptual Model
• D2.6: Draft Methodology
• D2.7: Encoding
– ocean/atmosphere/met themes
• CSML leading candidate
– liaising with DEFRA on UK transposition/implementation
Activities – current
Standards
– ISO
• member of BSI IST/36
• ISO 19111-2: Parametric coordinates
• represent NERC interests (SLA)
– OGC
• ‘Observations and Measurements’ model
• GML
• KML
• OGC documents: 06-160r1, 07-112, 07-083
CCLRC Data Portal
Local data
Local metadata
Facility N
Wrapper
Local data
Local metadata
DLS
Local data
Local metadata
JAERI
Local data
Local metadata
ISIS
Wrapper Wrapper Wrapper
Facility Section
Core Data Portal Section
At the time CCLRC had:
1 World Data Centre
5 National Data Centres
10 Minor Community based Data Centres
The Portal would make all of them accessible
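The wrapper arrangement in the portal diagram (each facility keeps its local data and metadata, and a wrapper presents it to the core portal) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the actual Data Portal code: the classes `FacilityWrapper` and `CoreDataPortal`, the field names and the sample records are all hypothetical.

```python
class FacilityWrapper:
    """Adapts one facility's local metadata to a common record shape."""

    def __init__(self, facility, local_metadata, id_key, title_key):
        self.facility = facility
        self._local = local_metadata  # list of dicts in the facility's own layout
        self._id_key = id_key         # name of the local identifier field
        self._title_key = title_key   # name of the local title field

    def search(self, keyword):
        """Return matching records translated into the common shape."""
        needle = keyword.lower()
        hits = []
        for row in self._local:
            title = row[self._title_key]
            if needle in title.lower():
                hits.append({
                    "facility": self.facility,
                    "id": str(row[self._id_key]),
                    "title": title,
                })
        return hits


class CoreDataPortal:
    """Fans a query out to every wrapper and merges the answers."""

    def __init__(self, wrappers):
        self._wrappers = list(wrappers)

    def search(self, keyword):
        results = []
        for wrapper in self._wrappers:
            results.extend(wrapper.search(keyword))
        return results


if __name__ == "__main__":
    # Two facilities with deliberately different local metadata layouts.
    isis = FacilityWrapper("ISIS", [{"RB": 1234, "name": "Zeolite neutron study"}],
                           id_key="RB", title_key="name")
    dls = FacilityWrapper("DLS", [{"visit": "v7", "desc": "Zeolite diffraction"}],
                          id_key="visit", title_key="desc")
    print(CoreDataPortal([isis, dls]).search("zeolite"))
```

The design choice illustrated is that the core portal never sees a facility's local layout; adding "Facility N" means adding one more wrapper.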
CCLRC Data Portal
CCLRC (now Core) Scientific Metadata Model
Metadata Object
Topic: Keywords providing an index on what the study is about.
Study Description: Provenance about what the study is, who did it and when.
Access Conditions: Conditions of use providing information on who can access the data and how.
Data Description: Detailed description of the organisation of the data into datasets and files.
Data Location: Locations providing navigation to where the data on the study can be found.
Related Material: References into the literature and community providing context about the study.
Today used by other e-Science Projects (e.g. MyGrid), Facilities (e.g. ISIS, DLS, CLF, Lab-in-a-Cell) and Internationally (e.g. SNS, CLS, Australia)
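The six parts of the metadata object listed on this slide map naturally onto a small record type. The sketch below is only an illustration of that structure. The class and field names (`Study`, `Dataset`, `DataFile`), the helper methods and the example values are hypothetical and are not the schema of the Core Scientific Metadata Model itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    name: str
    location: str  # Data Location: where the file can be found


@dataclass
class Dataset:
    name: str
    files: List[DataFile] = field(default_factory=list)


@dataclass
class Study:
    topic: List[str]        # Topic: keywords indexing what the study is about
    study_description: str  # Study Description: what, who and when
    access_conditions: str  # Access Conditions: who can access the data and how
    data_description: List[Dataset] = field(default_factory=list)  # datasets and files
    related_material: List[str] = field(default_factory=list)      # literature references

    def has_keyword(self, keyword):
        """Case-insensitive match against the topic keywords."""
        return keyword.lower() in (k.lower() for k in self.topic)

    def data_locations(self):
        """Flatten datasets and files into a list of locations."""
        return [f.location for ds in self.data_description for f in ds.files]


if __name__ == "__main__":
    study = Study(
        topic=["zeolite", "acid sites"],
        study_description="Illustrative study record (all values hypothetical)",
        access_conditions="investigators only",
        data_description=[Dataset("run-1", [DataFile("run1.raw", "srb://example/run1.raw")])],
    )
    print(study.has_keyword("Zeolite"), study.data_locations())
```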
Storage Resource Broker: Virtualising the User's Data
First SRB installation outside SDSC, Distribution Version and Installation Guidelines, Making SRB ‘Grid aware’ through Grid Security, Licensing
ISIS 20 Year Back Catalogue
The catalogue holds 93 000 studies and 1.87 million data files, with 870 000 distinct keywords categorising the data.
What we aim to provide with the e-Infrastructure
Enabling users to get rapid access to their current and past data, related experiments, publications etc., leading to improved analysis through more complete information.
Creating a powerful, long lasting scientific knowledge resource.
Protecting our valuable assets - Data Curation
2 PhD and 1 MSc studentships with the Universities of Reading and Manchester on:
Long Term Metadata Management and Quality Assurance – Arif Shaon
The Usage of semantic technologies for long-term preservation – Kaixuan Wang
Future work
Dr. Robert McGreevy, ISIS
Integrating data from disparate sources into topic centres
– Challenges: Data Presentation and Integration, Trust, Encouraging usage of data from unfamiliar sources.