CEMS: Building a Cloud-Based Infrastructure to Support Climate and Environmental Data Services

Philip Kershaw [[email protected]] (1), Mark Curtis (2), Ed Pechorro (3)
(1) STFC Rutherford Appleton Laboratory, NCEO / Centre for Environmental Data Archival, Didcot, United Kingdom; (2) Astrium GEO-Information Services Ltd., Farnborough, United Kingdom; (3) Logica Ltd., Leatherhead, United Kingdom

The Facility for Climate and Environmental Monitoring from Space

CEMS, the facility for Climate and Environmental Monitoring from Space, is a new joint collaboration between academia and industry, bringing together their collective expertise to support research into climate change and to provide a catalyst for growth in related Earth Observation (EO) technologies and services in the commercial sector. A £3m investment by the UK Space Agency (UKSA) has made possible the development of a joint facility spanning ISIC, the International Space Innovation Centre, and the STFC Rutherford Appleton Laboratory at Harwell in the UK.

Key Goals

The vision has evolved and developed through collaborative work between the partners and is set out in a requirements analysis report [1] and a business case submitted to the UKSA. Three main areas can be identified:
1) The provision of access to large-volume EO and climate model datasets co-located with high-performance computing facilities;
2) A flexible infrastructure to support the needs of research projects in the academic community and new business opportunities for commercial companies;
3) Expertise and tools for scientific data quality and integrity, giving users confidence and transparency in its data, services and products.

Phase 1 Project

A five-month project started in December 2011 with an initial consortium comprising Astrium-GEO, Logica, NCEO Reading and RAL Space. It is tasked with:
• Purchase and deployment of the physical infrastructure;
• Design and preliminary development of the supporting software system;
• A research study on data integrity, conducted by VEGA Space;
• The development of software demonstrators to explore science and business use cases.

[Figure: the ISIC Building on the Harwell Campus]


Use Cases

Science applications of CEMS have been identified in the following areas:
1) Operational or near-operational processing of satellite data;
2) Bringing multi-scale, multi-instrument data together for intercomparison and validation;
3) EO system design: observing system simulation experiments (OSSEs).

The ESA Climate Change Initiative is an important driver, with many research institutes requiring processing capability for the generation of ECV (Essential Climate Variable) products. The benefit of CEMS in all of these use cases is the availability of significant processing capability close to the source data, removing the need to hold local archives at user sites (duplication of storage) and reducing data transfer overheads. These advantages will become increasingly important in the future, given the expected roughly twenty-fold increase in data volumes from the Sentinel instruments compared with their Envisat equivalents.
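To make the data transfer overhead concrete, here is a back-of-envelope calculation (a minimal sketch: the archive size, link speed and efficiency figures are illustrative assumptions, not figures from the poster):

```python
# Time to copy a 1 PB satellite archive to a user site over a
# sustained 10 Gbit/s wide-area link (all figures assumed).
volume_bytes = 1e15      # 1 PB archive
link_bps = 10e9          # 10 Gbit/s link
efficiency = 0.7         # assumed protocol and contention overhead

seconds = volume_bytes * 8 / (link_bps * efficiency)
print(f"{seconds / 86400:.1f} days")  # ~13.2 days just to move the data
```

At this scale, moving the processing to the data, rather than the data to the processing, is clearly the practical option.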

Business Applications

CEMS aims to provide an environment to stimulate innovation and growth in downstream applications. Examples include:
• Insurance sector: offshore assets and urban development in floodplains are susceptible to extreme weather events, and accurate scenario modelling is required on which to base premiums.
• Carbon trading: EO data can be exploited for large-scale analysis of carbon stock and flux.

Layered Architecture

The CEMS architecture is organised into layers. At the base, the underlying hardware is abstracted by a virtualisation layer above it, which enables consumption of resources by third parties via a cloud. The virtualisation layer also hosts the core system, which manages the cloud services alongside the data management and curation functions of a data centre. At the top, an application and service provider layer provides interfaces to external user communities and partner organisations.

[Figure: ATSR Land Surface Temperature plot for the UK (John Remedios and Darren Ghent, University of Leicester). Generation of LST products is one use case being explored for CEMS.]

JASMIN [2] is closely coupled to the CEMS infrastructure. Where CEMS focuses on the EO community, JASMIN is being deployed on behalf of NCAS (the National Centre for Atmospheric Science) to support the data analysis requirements of the UK and European climate and Earth system modelling communities.

[Figure: CEMS layered architecture diagram. Hardware at the commercial (ISIC Electron Building) and academic (STFC R89 Building) sites on the Harwell campus, joined by a 10 Gbit dedicated link, is abstracted by a virtualisation layer integrating the two sites and enabling data sharing between them. The virtualisation layer hosts the core system: access control, metrics and logging, IaaS/PaaS management, data and process discovery, data access, data visualisation, archive and housekeeping, internal support services, access policy administration, and metadata catalogues over the datasets (file system, online and offline storage). Above this, application and service provider interfaces serve client applications for user communities and external service providers.]

CEMS Cloud

Central to the development of this infrastructure is the use of cloud-based technology: multi-tenancy and the dynamic provisioning of resources are key characteristics to exploit in order to support the range of organisations using the facility and their varied use cases.

These characteristics will allow different research groups to set up and tear down portions of the storage, network and compute infrastructure tailored to their needs, without the upfront capital cost of hardware purchase.

It is expected that CEMS will support a spectrum of cloud models, from IaaS (Infrastructure as a Service), giving users the ability to configure a complete custom environment, to a SaaS (Software as a Service) model exposing a library of useful processing algorithms through a WPS (Web Processing Service) interface. In addition, PaaS (Platform as a Service) interfaces are envisaged to enable users to experiment, build applications, develop new processing algorithms, and interact more effectively with the data.
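As an illustration of the SaaS model, the snippet below sketches how a client might discover and invoke a hosted processing algorithm through a standard OGC WPS interface using the OWSLib library. This is a minimal sketch: the endpoint URL, process identifier and input names are hypothetical, not actual CEMS services.

```python
from owslib.wps import WebProcessingService, monitorExecution

# Hypothetical CEMS WPS endpoint (illustrative only)
wps = WebProcessingService('http://cems.example.ac.uk/wps')

# Discover the hosted processing algorithms
for process in wps.processes:
    print(process.identifier, '-', process.title)

# Invoke a (hypothetical) land surface temperature retrieval process
execution = wps.execute('lst_retrieval',
                        inputs=[('scene_id', 'ATS_TOA_1P_example')])
monitorExecution(execution)  # poll until the asynchronous job completes
print(execution.status)
```

Because WPS is a standard interface, the same client code would work against any compliant service, decoupling users from the details of where and how the algorithms run.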

VMware vCloud has been selected to provide the virtualisation software, following a study conducted by Logica. Open-source alternatives may be investigated in the future to augment this.
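For a flavour of how tenants might drive such a cloud programmatically, the sketch below logs in to a vCloud Director REST API (following the pattern of the 1.5-era API) and lists the organisations visible to the caller. The endpoint, organisation and credentials are hypothetical; this is an assumption-laden illustration, not the CEMS deployment itself.

```python
import requests

VCD = 'https://vcloud.cems.example'  # hypothetical vCloud Director endpoint
headers = {'Accept': 'application/*+xml;version=1.5'}

# Log in: vCloud Director issues a session token in the
# x-vcloud-authorization response header.
session = requests.post(f'{VCD}/api/sessions',
                        auth=('researcher@ExampleOrg', 'secret'),  # user@organisation
                        headers=headers)
session.raise_for_status()
headers['x-vcloud-authorization'] = session.headers['x-vcloud-authorization']

# List the organisations (tenancies) visible to this user; the returned XML
# links on to virtual data centres, catalogues and vApps, through which a
# research group can stand up or tear down its own virtual infrastructure.
orgs = requests.get(f'{VCD}/api/org', headers=headers)
print(orgs.text)
```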

Hardware Infrastructure

[Figure: Panasas storage for JASMIN-CEMS, R89 Building, STFC]

• The CEMS infrastructure is deployed across commercial and academic sites on the Harwell campus, joined by a high-speed fibre link.
• The Panasas parallel file system was chosen for its resilience, scalability and high performance.
• It is connected to the compute nodes via high-speed links to eliminate I/O bottlenecks, as the calculation below illustrates.
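Using the figures from the hardware diagram, a rough estimate of the aggregate bandwidth between the storage and the low-latency network (a sketch only; it ignores real-world protocol overheads and contention):

```python
# Aggregate storage fabric bandwidth: 103 connections at 10 Gbit/s each
# (connection count and link speed taken from the poster).
links = 103
per_link_bps = 10e9                   # 10 Gbit/s per connection

aggregate_bps = links * per_link_bps  # ~1.03 Tbit/s
print(f"{aggregate_bps / 8 / 1e9:.0f} GB/s aggregate")  # ~129 GB/s
```

Bandwidth on this scale is only achievable because the compute sits directly on the same low-latency network as the storage.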

[Figure: CEMS hardware infrastructure diagram. At the academic site (R89 Building, RAL STFC): JASMIN-CEMS storage of 3.5 PB for JASMIN and 1.1 PB for CEMS (1,100 blades) on a 10 Gbit low-latency Gnodal network; a 36 TB VM image store; 7 x Dell R610 compute nodes (12 x 3 GHz, 96 GB) plus 2 vCloud management nodes for CEMS, and 10 x R610 (12 x 3 GHz, 96 GB) plus 2 vCloud management nodes for the shared JASMIN-CEMS data-processing cluster; links to the STFC SCARF HPC cluster (2,000 CPUs) and the ATLAS tape store (3.5 PB); and 2 x 10 Gbit iSCSI links via the RAL-A and JC edge routers. At the commercial site (Electron Building, ISIC): 0.7 PB of CEMS commercial storage, a 36 TB VM image store, and 12 x R610 (12 x 3 GHz, 96 GB) plus 8 x R610 (12 x 3.5 GHz, 48 GB) compute nodes behind the ISIC router, with JANET connectivity. The two sites are joined by a 10 Gbit ISIC-STFC link, with application and service provider interfaces above.]

References

[1] A User Requirements Analysis on a Facility for Climate and Environmental Monitoring from Space (CEMS), proposal to the Technology Strategy Board, Issue v1.0, 17 August 2011, Logica, NCEO, Astrium-GEO, RAL Space.
[2] B.N. Lawrence, V. Bennett, J. Churchill, M. Juckes, P. Kershaw, P. Oliver, M. Pritchard and A. Stephens, "The JASMIN Super-data-cluster", Supercomputing 2012 (submitted). Author affiliations: Department of Meteorology, University of Reading; Centre for Environmental Data Archival, STFC Rutherford Appleton Laboratory, Didcot; E-Science Department, STFC Rutherford Appleton Laboratory; National Centre for Atmospheric Science; National Centre for Earth Observation.