g-cloud programme vision uk - technical architectureworkstrand-report t8

Data Centre Strategy, G-Cloud and Applications Store Programme Phase 2

DATA CENTRE MIGRATION, G-CLOUD AND

APPLICATIONS STORE PROGRAMME

PHASE 2

Technical Architecture Workstrand Report

5th May 2010

Version 1.5

G-Cloud Business Sponsor: Mark Ferrar

Director of Technology Strategy

Department of Health Informatics Directorate

Work Strand Lead: Miles Gray

Hardware Platform Architect

Department of Health Informatics Directorate

Industry Co-Lead: Kate Craig-Wood

Managing Director

Memset Ltd


08 FINAL _G-CLOUD_ Technical Architecture Workstrand Report _version v1.5_.doc UNCLASSIFIED

Contents

1. Introduction
1.1. Related Documents
1.2. Key Assumptions
1.3. Scope
1.4. Objective
1.5. Key stakeholders

2. Definitions from a technical perspective
2.1. Contextual definitions
2.2. Authoritative definitions within the G-Cloud programme
2.3. Service layers: Infrastructure, Platform and Application
2.4. US National Institute of Standards and Technology (NIST) definitions
2.5. Application Workloads

3. Architectural Principles
3.1. G-Cloud Technical Architecture
3.2. Applications developed for the G-Cloud

4. High Level Logical Architecture
4.1. Context
4.2. Component descriptions
4.2.1. Applications Store for Government (ASG)
4.2.2. G-Cloud Services Interchange (GC-SI)
4.2.3. Certified Components Repository (CCR)
4.2.4. Data services
4.2.5. Monitoring Services
4.2.6. Public Sector Network Service Information Monitor (PSN SIM)

5. Technologies & market review
5.1. Availability and usability of existing Cloud services

6. Proposed supplier and service certifications
6.1. Service monitoring
6.2. Infrastructure services certification
6.3. Platform services certification
6.4. Software developer services certification
6.5. Service integration / aggregation / management certification
6.6. Information assurance
6.7. Supplier Certification and Impact Levels matrix

7. Utility Computing
7.1. Units of Utility Computing (IaaS) resource specification
7.2. Elasticity and “burstability” of resources
7.2.1. Background
7.2.2. Definitions for the purposes of this document
7.2.3. Discussion and existing examples
7.2.4. Illustrative example of different capacity types in the G-Cloud
7.3. Utility / IaaS open specifications / standards recommendation
7.4. Batch vs. Real-time Workloads

8. Interoperability between suppliers
8.1. Workload interoperability and migration
8.2. Workload scalability
8.3. Data abstraction, sharing and interoperability

9. Data Centre Migration
9.1. Data Centre Efficiency
9.2. EU Code of Conduct for Data Centre Operations
9.3. Consolidation and migration of existing services



1. Introduction

This technical architecture provides a common foundation of software and hardware

infrastructure principles for multiple business applications. The technical architecture provides

the framework for interfaces, protocols, standards and products to be used in defining a

platform that supports applications across UK public sector organisations.

This document provides a high-level overview of the proposed technical architecture for
Government Cloud (G-Cloud), Data Centre Consolidation (DCC) and the Application Store for
Government (ASG).

1.1. Related Documents

Government ICT Strategy

Open Source, Open Standards and Re-use: Government Action Plan

Work Strand reports from:

Information Assurance.

Commercial.

Service Management.

Greening Government ICT CIO and CTO Workbook

European Union Code of Conduct for Data Centre Operations

1.2. Key Assumptions

Adequate solutions can be architected within the necessary Information Governance

and Security framework.

Suppliers are able to provide interoperable products and services that function within

necessary Information Governance requirements at a cost that is attractive to the public

sector.

Technology exists to meet all public sector Software, Platform and Infrastructure needs

that can be configured as a suitable and acceptable service to the public sector.

Appropriate commercial models can be defined, agreed and set in place within the UK

and EU procurement legislation to allow services to be bought by public sector

organisations.

Cloud interoperability will increase.

1.3. Scope

In Scope:

The hardware and software components required to implement and support the G-Cloud,

Data Centre Consolidation and the Applications Store.

Out of Scope:

Network connectivity between data centres. Responsibility hands over to the Public

Sector Network (PSN) at (and including) the Customer Premises Equipment (CPE)

router(s) that terminate a PSN network connection.

Data centre to user network connectivity (PSN's scope).



The desktop strategy and client-side aspects of applications and services (Desktop

Strategy scope).

Detailed technical specifications of individual system elements.

1.4. Objective

The objective of this Technical Strategy is:

To define a technical architecture for Government Cloud, Data Centre consolidation
and the Government Application Store that reflects current best practice in the
industry, reasonably foreseeable future developments and the UK public sector's
unique blend of requirements.

1.5. Key stakeholders

Stakeholder support for the strategy and approach is a critical success factor. Our
stakeholders include:

The UK public sector CIO Council.

The UK public sector CTO Council.

The Public Sector Networks (PSN) programme.

All public sector organisations in the UK, including, but not limited to, Central

Government Departments, Local Government Authorities, Non-Departmental Public

Bodies (NDPB) and any other organisation within the definition of Contracting Authority

within the UK.

The IT supply-side industry of product and service providers.

The Programme Team.

2. Definitions from a technical perspective

2.1. Contextual definitions

The following definitions are taken from the Government ICT Strategy. The concepts are

explored in more detail in subsequent sections of this document.

Government Cloud or G-Cloud is an internet-based ICT infrastructure that enables public

bodies to host, select and use ICT systems from a secure, resilient and cost-effective service

environment.

Data Centre Rationalisation is the reduction in the number of data centres owned or used to
host government application services from current (2009) levels to something very much
smaller, provisionally 10 or 12 highly resilient, secure data centres. It includes significantly
increasing utilisation of assets within the data centres and reducing environmental impact
whilst not compromising the service or data integrity in the process.

The Application Store for Government (ASG) is the gateway to sharing and reuse of online

business applications, services and components between public sector organisations.



2.2. Authoritative definitions within the G-Cloud programme

The following definitions have been used in the creation of this document:

Key words for use in RFCs to Indicate Requirement Levels – As defined in RFC2119

(http://tools.ietf.org/html/rfc2119).

The following definitions are for the purposes of this document and phase 2 of the G-Cloud

programme:

Utility Computing is the packaging of Computing Resources, such as computation and

storage, as a metered service similar to a traditional service utility (such as electricity, water,

natural gas, or telephone network).

Public Cloud means Utility Computing that is available to individuals, public and private sector

organisations. Public Cloud is often non-geographically specific and can be accessed wherever

there is an Internet connection.

Private Cloud means a Utility Computing infrastructure exclusively for the use of one

organisation or community.

Hybrid Cloud means a combination of Public and Private Clouds, both remaining separate

entities, but with Workload able to migrate between them.

Computing Resource refers to computer or server infrastructure resources which includes

Processing, Storage and Network, described in more detail in section 7 of this document.

Workload refers to any service or software application which makes use of Computing

Resources.

Burst Computing Resources automatically expand and contract in response to changes in

application workload (see elasticity and burstability section).

Elastic resources must be requested by the user, operator or application (see Elasticity and

Burstability section). “Elastic” differs from burst in that the application or user must request the

additional resources, for example via an Application Programming Interface (API).
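The difference between the Burst and Elastic definitions can be sketched in code. The following is a minimal illustration only: the class names, the abstract "units" of capacity and the demand signal are assumptions made for this example and do not describe any G-Cloud or supplier API.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical pool of Computing Resource units (illustrative only)."""
    baseline: int       # units allocated under normal conditions
    ceiling: int        # most units the supplier will make available
    allocated: int = 0  # units currently allocated

    def __post_init__(self) -> None:
        self.allocated = self.baseline

    def _clamp(self, units: int) -> int:
        # Never drop below the baseline or exceed the ceiling.
        return max(self.baseline, min(units, self.ceiling))


class ElasticPool(Pool):
    """Elastic: capacity changes only when explicitly requested."""

    def observe_demand(self, demand: int) -> None:
        # A change in workload alone does not alter the allocation.
        pass

    def request(self, units: int) -> int:
        # The user, operator or application asks for capacity, e.g. via an API.
        self.allocated = self._clamp(units)
        return self.allocated


class BurstPool(Pool):
    """Burst: capacity expands and contracts automatically with workload."""

    def observe_demand(self, demand: int) -> None:
        self.allocated = self._clamp(demand)
```

With a baseline of 4 units and a ceiling of 16, observing a demand of 10 units leaves an ElasticPool at 4 until request(10) is called, whereas a BurstPool moves to 10 by itself and returns to 4 when demand falls away.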




2.3. Service layers: Infrastructure, Platform and Application

The following diagram illustrates the components of the technology stack that are referred to in the

remainder of this document.

Figure 1: Service layers description

Figure 1 Note:

* Assumed to incorporate subordinate layers.

The following sections are presented under the assumption that there shall exist a set of

services known collectively as the G-Cloud and that they comprise Utility Computing services

available to the UK public sector.

It is assumed that "as a service" means all services within the definition are fully integrated up

to and including the respective level, thus incorporating any sub-levels. Therefore, Software as

a Service (SaaS) providers could either sub-contract to a Platform as a Service (PaaS)

provider, or would incorporate the PaaS themselves and provide it as part of the SaaS "stack".

In turn the Infrastructure as a Service (IaaS) could be sub-contracted or incorporated. The

customer would see an integrated service.

A better name for SaaS might be "Application as a Service", with "Application" plus "Platform"
being synonymous with "Workload". However, the existing terms are now recognised and widely
used across the IT industry, so their use shall continue here.



2.4. US National Institute of Standards and Technology (NIST) definitions

The American NIST recently published a definition of Cloud Computing which has rapidly

gained wide acceptance among the consultancy community and is being regarded by some as

the standard for reference. The G-Cloud definitions of Cloud Computing for the most part do

not conflict with this definition, and the IaaS, PaaS and SaaS definitions fit very well.

The NIST definition of Cloud Computing describes five essential characteristics of Cloud, three

service models and four deployment models. This can be diagrammatically represented as a

cube, as shown in figure 2.

Figure 2: NIST Cloud Computing definition (diagrammatic representation)

NIST defines Hybrid Cloud as “a composition of two or more clouds

(private, community, or public) that remain unique entities but are bound together by

standardized or proprietary technology that enables data and application portability (e.g., cloud

bursting for load-balancing between clouds)”. The G-Cloud can therefore be considered as

being most closely aligned with the NIST definition for a Hybrid Cloud.

2.5. Application Workloads

A Workload would typically be a software application or service which requires use of the

Computing Resources provided from one or more Certified Infrastructure and/or Platform

Suppliers (see section 6, “Supplier Certifications”). Workloads would be available via the App Store

marketplace, and would normally require a mixture of Computing Resources (Processing,

Storage and Network).

3. Architectural Principles

The technical architecture shall be built using the following principles:

3.1. G-Cloud Technical Architecture

The G-Cloud technical architecture shall:

Not exclude foreseeable future technologies.

Assume implementation through a process of gradual, iterative change.

Be provided by suppliers certified for service types to comply with relevant Government

Information Governance architecture and standards.

Allow for both long- and short-term contracts.

Enable rapid provisioning and de-provisioning.

Enable the automatic ability to meet peaks in demand.

Enable service metering and “pay-as-you-go” pricing.

Enable self-service ordering of services.

Allow sharing of existing hardware infrastructure where practical.

Access the benefits of data centre operations automation.

Have an overarching service management framework.

Enable and encourage software sharing through improved awareness.

Reduce total volume of software licensing through consolidation.

Enable better utilisation of hardware resources.

Abstract hardware from software (within compatible systems).

Be suitable for small, large, static or dynamic Workloads simultaneously.

Facilitate improved business value delivery from the underlying technologies.

Be approved to protect information and services with an impact-of-compromise profile
expressed as Confidentiality, Integrity and Availability (CIA) matrices (e.g. 224/444).

Allow for complete transparency of pricing, infrastructure and services.

Allow for a wide range of client devices, including mobiles, laptops, kiosks and desktop

personal computers.

Evolve over time as new technologies become available.

3.2. Applications developed for the G-Cloud

New applications developed for the G-Cloud shall:

Re-use pre-existing components where and whenever possible – in line with the

Government Action Plan on Open Source and Open Standards

http://www.cabinetoffice.gov.uk/cio/ict.aspx.

Be scalable to any reasonable combination of workloads that may be predicted across

the UK public sector.

Be created by Certified software developers.

Be individually certified to defined information assurance requirements.


4. High Level Logical Architecture

4.1. Context

Figure 3 shows the physical and logical components of the Government Cloud Computing

environment, the Government Application Store and their interaction with each other, the PSN

and client devices. Information Assurance and Service Management are not shown as they are

considered to be ubiquitous.

The intent of figure 3 is:

To describe and map out the physical and logical technical components of the G-Cloud

system, as opposed to a map of the conceptual elements.

To show how those elements interact with each other and with the Public Sector

Network (PSN).

To show the boundaries of responsibility of the data centre strategy, PSN strategy and

desktop strategy.

Additional notes:

Central services such as authentication and mail relay are considered “just another

hosted application”.

Brand names are used for example purposes only and are not an indication of

preference or an intent to purchase under any procurement.

Information Assurance and Service Management are not shown since they are

ubiquitous; however, data flows are shown.

4.2. Component descriptions

4.2.1. Applications Store for Government (ASG)

The Government Application Store is defined in the Government ICT Strategy as a gateway to

sharing and reuse of online business applications, services and components between public

sector organisations. In other words the ASG can be thought of as a portal where

commissioners or purchasers of services can browse a catalogue of available services. The

ASG will also provide detailed information relating to costs, capacity, service levels and lead

times required to get a particular service live. The ASG will use the G-Cloud Services

Interchange, described below, as the data store of available services.

As a key piece of the G-Cloud infrastructure, the ASG must be fully resilient and fault tolerant.

In line with the principles outlined in this document, the ASG must not act as either a bottleneck

or a single point of failure for users wishing to purchase or provision services from the G-Cloud.

4.2.2. G-Cloud Services Interchange (GC-SI)

Certified Suppliers will publish services to the G-Cloud Services Interchange (GC-SI). The ASG

will be the portal to purchase and provision services published on the GC-SI. Provisioning a

service via the ASG will act as the trigger for the monitoring service data flows.



4.2.3. Certified Components Repository (CCR)

Another element tightly linked (via an API, or possibly part of the same system) to the ASG will

be the repository of software, database schemas and virtual machine images certified for use in

the G-Cloud. These components would be made available for selection via the ASG where they

can be deployed onto the G-Cloud. This process could be automated: for example, the user or

service integrator would request the desired infrastructure or platform via the ASG, which

would then interface with the CCR in order to supply it with the information necessary, for

example an IP address or an SSH key, to deploy the component onto the newly instantiated

infrastructure or platform.
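The automated flow described above can be sketched as follows. This is an illustration under stated assumptions, not a specification: the class and function names are hypothetical, the supplier and the CCR are in-memory stand-ins, and the IP address and key are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Connection details for newly instantiated infrastructure (hypothetical)."""
    ip_address: str
    ssh_public_key: str


class InfrastructureSupplier:
    """Stand-in for a certified supplier; real provisioning is out of scope."""

    def provision(self) -> Instance:
        # 192.0.2.10 lies in a range reserved for documentation (RFC 5737).
        return Instance("192.0.2.10", "ssh-ed25519 EXAMPLE-KEY-NOT-REAL")


class ComponentsRepository:
    """Stand-in for the CCR: deploys only components it holds as certified."""

    def __init__(self, certified: set[str]) -> None:
        self.certified = certified
        self.deployments: list[tuple[str, str]] = []

    def deploy(self, component: str, target: Instance) -> None:
        if component not in self.certified:
            raise ValueError(f"{component!r} is not a certified component")
        # A real CCR would connect to the target here; this sketch only records it.
        self.deployments.append((component, target.ip_address))


def order_via_asg(component: str,
                  supplier: InfrastructureSupplier,
                  ccr: ComponentsRepository) -> Instance:
    """ASG step: instantiate infrastructure, then hand its details to the CCR."""
    instance = supplier.provision()
    ccr.deploy(component, instance)
    return instance
```

The point of the sketch is the ordering: the infrastructure or platform is instantiated first, and only then does the ASG pass the resulting connection details to the CCR so that the certified component can be deployed.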

4.2.4. Data services

Section 8.3 discusses the opportunities provided by abstracting data and applying common

data models.

4.2.5. Monitoring Services

In order to facilitate monitoring, maintenance and fault management of G-Cloud services,

monitoring services will be necessary. It is not proposed that there be one central monitoring

service. Instead, G-Cloud suppliers will be required to have their services monitored by a

means to be decided. For example, a requirement on an IaaS provider might be that they

supply an IP address for their data centres' core routers and allow ICMP pings from other

select suppliers and G-Cloud users, and that they also allow for the installation of probe

equipment in their data centres.
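As an illustration of the ICMP ping example, the sketch below checks a set of published core router addresses. It is a minimal sketch under assumptions: the function names are hypothetical, the addresses are documentation placeholders, and it assumes a Unix-like `ping` command that accepts `-c` for the packet count. The monitoring means actually adopted remains to be decided.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes a command line and returns its exit code (0 means success).
Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command quietly; treat a timeout or a missing binary as failure."""
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=5,
        )
        return completed.returncode
    except (subprocess.TimeoutExpired, OSError):
        return 1


def probe_routers(routers: dict[str, str],
                  runner: Runner = run_command) -> dict[str, bool]:
    """Send one ICMP echo request to each router and report reachability."""
    return {
        name: runner(["ping", "-c", "1", address]) == 0
        for name, address in routers.items()
    }
```

The runner is passed in as a parameter so that the logic can be exercised without network access, for example with a function that returns 0 for one address and 1 for another.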

4.2.6. Public Sector Network Service Information Monitor (PSN SIM)

The PSN SIM is outside the scope of this document; however, a simplified description may be

useful here for context. The PSN SIM will be a centralised record of the PSN services,

dependencies and users. Network suppliers will report the status of the portions of the network

which they manage to the SIM to facilitate fault tracking. A Dependency Map will be used by

the SIM to provide alerts and notifications for incidents relating to dependent services.

When provisioning services via the ASG, the Dependency Map, described by the Service

Management work strand, will need to be updated so as to facilitate end to end service

monitoring. Ideally the PSN SIM would be programmatically updated by the GC-SI.

Automatically deployed services could then be monitored via the same system, although such

functionality is not currently present in the SIM specification.
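The role of the Dependency Map in raising alerts can be illustrated with a small sketch. The representation used here (a mapping from each service to the components it depends on) and all names are assumptions for the example; the actual map is described by the Service Management work strand.

```python
from collections import deque


def affected_services(depends_on: dict[str, set[str]], failed: str) -> set[str]:
    """Return every service that depends, directly or indirectly, on `failed`."""
    # Invert the map: for each component, which services rely on it directly?
    dependants: dict[str, set[str]] = {}
    for service, components in depends_on.items():
        for component in components:
            dependants.setdefault(component, set()).add(service)

    # Breadth-first walk outwards from the failed component.
    affected: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependants.get(current, ()):
            if service not in affected:
                affected.add(service)
                queue.append(service)
    return affected
```

For example, if a hypothetical "payroll-app" depends on "db-platform", which in turn depends on "iaas-supplier-a", an incident reported against "iaas-supplier-a" would produce notifications for both "db-platform" and "payroll-app".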


Figure 3: High Level Logical Architecture


5. Technologies & market review

5.1. Availability and usability of existing Cloud services

There are a number of established public cloud services, though the majority at this time

appear to have insufficiently flexible service level agreements and geo-location specificity to be

usable for most G-Cloud purposes.

However, some organisations can offer solely UK-based services and are willing to adapt their

operational practices to suit Government requirements, for example by partitioning off an area of

their existing data centre facilities and restricting utilisation of that hardware by non-government

customers. That example would be an instance of “private cloud” services.

It is anticipated that there will be an overlap where some public cloud services are suitable for

some Government purposes, as illustrated in figure 4:

Figure 4: Public Cloud services suitability for Government consumption

6. Proposed supplier and service certifications

This section should be read in conjunction with the Information Assurance Strategy section on
‘Assurance Methodologies for Services’.

Four types of certification / accreditation are proposed (to be viewed in conjunction with figure

1). Certification processes shall include an audit element (akin to ISO accreditations) and must

place the pass mark sufficiently high to ensure good quality, but not so high as to restrict or

disable access to innovative Small and Medium-size Enterprises (SMEs).



IA assurance methodologies that will cover shared services in a cloud environment have yet to

be defined. The principles covering the IA conditions and standards that will form part of the

certification process will need to be established at the start of the next phase of the programme.

A starting point for this work will be the Public Sector Network (PSN) Codes of Connection

scheme (including the Technical Standards and the IA Conditions), but applied to the other

service layers (infrastructure, platform and application software). This would compel suppliers

to demonstrate their expertise in each of the key areas. The certifications would stipulate not

only best practices, but also requirements such as the ability to interface systems and services

with the centralised management system.

For Information Assurance purposes, developers will be certified to produce services at various

IL-Ts conforming to a combination of best practices and standards. These standards will be

defined during Phase 3 of the programme and will use existing standards, policies and practices

as a starting point. The standards will cover how the organisation produces good quality

services using an information assurance regime commensurate with the applicable Impact and

Threat levels.

A key expectation is that certification of the supplier and the service they provide will allow the

supplier's service(s) to be listed in the Applications Store as "available". The information in the

Applications Store will specify the Impact Level (IL) and Threat (IL-T) combinations for which

the service can be used, and the mandatory interfaces and components that are required for technical

architecture and information assurance compatibility. Completion of the certification process

does not automatically mean that services and applications can be purchased and provisioned

from the ASG. Further technical and information assurance processes may well be required.

There is also an expectation of a rating system through which existing service consumers can

indicate their satisfaction with a particular service. This will provide some quality differentiation

above and beyond the minimal requirements set by the certifications.

6.1. Service monitoring

All service suppliers should be required to allow selected 3rd party monitoring of the services

provided, for example by allowing firewall access for standard monitoring protocols to end-point

servers and core network equipment.

6.2. Infrastructure services certification

The following proposal will need to be confirmed with the relevant risk management roles

during Phase 3:

The provider shall not have logical access to machines as root or administrator (i.e. excludes

systems administration duties), but would have physical access (for hardware maintenance

etc.). Providers would also manage the local network and switching fabric (e.g. service

providers like Amazon EC2 or Rackspace). The provider must conform to security standards

applicable to the IL-T rating (see below) and must also conform to standards relating to

efficiency and best practice, such as the EU Code of Conduct for data centres. It is envisaged

that this certification would work like a "Code of Connection" for IaaS providers to make their

services available through the G-Cloud.



6.3. Platform services certification

The following proposal will need to be confirmed with the relevant risk management roles

during Phase 3:

The provider will have root / administrator access and will perform basic systems

administration, probably in partnership with one or more application providers. Such services

may include backup, service monitoring and help desk.

6.4. Software developer services certification

This certification is for software developers and suppliers providing applications (or software) as

a service or application support. For software the accreditation will mandate standards related

to acceptable development practices.

This will ensure software developers are able to offer applications via the Apps Store for trial

and interest generation (a prototype). These applications are likely to be offered through a

separate development and test community cloud. The use of case studies and pilots in this

area will assist in drawing up criteria that can be used by the developers and the governance

groups for technical architecture and information assurance. Application prototyping in the first

stage will allow for feedback and hence partially enable “crowd sourcing” of the application's

development.

Production applications and stand-alone components will then need to be individually
scrutinised and certified before being made available, and shall not be altered by
prospective customers in order to generate feedback and buy-in.

6.5. Service integration / aggregation / management certification

There is also a need for accreditation for systems and services integration, management and

aggregation services, which could also be referred to as “Operate as a Service” (OaaS). The

technical elements of such a certification would be covered by the certifications already

described above.

6.6. Information assurance

The IA Strategy section on ‘Assurance Methodologies for Services’ covers the proposals for

gaining assurance of services to be used in the G-Cloud environment.

The current proposal is that each certification will cover a range of security levels, following the

IL-T ratings. Examples of mandated practices will include:

Infrastructure: Physical security of data centres, security screening of staff.

Platform: Patching regime (e.g. months vs. days) and virtualisation layer management

(e.g. disallowing unknown kernels)

Software: Logging, peer reviewed code, security releases.

Integration: Staff screening.



6.7. Supplier Certification and Impact Levels matrix

During Phase 3 of the programme an extensive piece of work will be required to model the

Skills, Impact and Threat levels against the IaaS, PaaS, SaaS and OaaS types of service to

identify the standards, policies and procedures that will be required.

7. Utility Computing

7.1. Units of Utility Computing (IaaS) resource specification

Certified infrastructure and platform suppliers (computing utilities) will be able to offer

Computing Resource into the G-Cloud marketplace, for use in delivering Workloads.

Computing Resource can be any combination of the following, with each element defined in the

marketplace in a granular way:

Processing
  - Capacity (processor cores, clock speed and RAM)
  - Type (e.g. instruction sets, integer vs. floating-point optimisation)

Storage
  - Capacity (bytes)
  - Redundancy (probability of failure per unit time, e.g. RAID level)
  - Access latency (milliseconds)
  - Data Input/Output (I/O) rate (i.e. bits per second)

Network
  - Bandwidth capacity (bits per second)
  - Simultaneous connections capacity
  - Redundancy (number of physically separate / divergent connections to site)

All Computing Resources share the common characteristic of required availability. This is

normally expressed as a percentage. For example, 99.9% availability equates to about 8 hours

45 minutes of down time in a year, whereas 99.99% availability equates to less than one hour of

down time in a year. Such measures allow buyers to make an informed decision regarding cost

and realistic availability requirements.
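The arithmetic behind these figures is simple enough to sketch. The function names below are illustrative, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, assuming a non-leap year


def annual_downtime_hours(availability_percent: float) -> float:
    """Permitted down time per year, in hours, for a given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


def annual_downtime_h_m(availability_percent: float) -> tuple[int, int]:
    """The same figure as whole hours and minutes, rounded to the nearest minute."""
    total_minutes = round(annual_downtime_hours(availability_percent) * 60)
    return divmod(total_minutes, 60)
```

On this basis 99.9% availability allows 8.76 hours of down time in a year (8 hours 46 minutes to the nearest minute), while 99.99% allows roughly 53 minutes. Each additional "nine" reduces the permitted down time by a factor of ten, which is why buyers need to weigh cost against realistic availability requirements.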

While it will be possible to purchase any type of physical or virtualised infrastructure resources,

ensuring standards do not exclude any particular type of hardware platform, it is important to

standardise how resources are advertised in order to be able to treat (compare) them as a

utility. Processing capacity, for example, may need to be expressed in terms of a common

benchmark or baseline comparison. Each of the three Computing Resources could have all

variables defined, but practically, this should be limited to just a few key variables.

In order to allow for like-for-like comparison and better interoperability, as well as making

provisioning simpler for suppliers, it is proposed that the Processing resource be a combination

of instruction set type (e.g. x86 vs. RISC vs. GPU) and a small set of fixed ratios of CPU-to-

RAM, which are also constrained by the existing architectures. For example, Processing

resource might be offered as either standard, high memory or high CPU.
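
One way such a Processing offer might be represented is sketched below. The profile names follow the example above, but the RAM-per-core ratios and all identifiers are assumptions made purely for illustration; the report does not fix any values.

```python
from dataclasses import dataclass

# Hypothetical ratios of RAM (GB) per processor core for each profile.
RAM_GB_PER_CORE = {"standard": 2.0, "high-memory": 8.0, "high-cpu": 1.0}


@dataclass(frozen=True)
class ProcessingOffer:
    instruction_set: str  # e.g. "x86", "RISC" or "GPU"
    profile: str          # one of the keys of RAM_GB_PER_CORE
    cores: int

    def __post_init__(self):
        if self.profile not in RAM_GB_PER_CORE:
            raise ValueError(f"unknown profile: {self.profile}")
        if self.cores < 1:
            raise ValueError("cores must be at least 1")

    @property
    def ram_gb(self) -> float:
        # RAM follows from the fixed ratio, so suppliers advertise cores only.
        return self.cores * RAM_GB_PER_CORE[self.profile]


def comparable(a: ProcessingOffer, b: ProcessingOffer) -> bool:
    """Offers are like-for-like when instruction set and profile both match."""
    return (a.instruction_set, a.profile) == (b.instruction_set, b.profile)
```

Because RAM is derived from the profile rather than stated freely, two offers with the same instruction set and profile differ only in size, which is what makes them comparable as a utility.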



7.2. Elasticity and “burstability” of resources

7.2.1. Background

There are two modes of delivering additional Computing Resources on a short time frame, and

there are also two terms broadly used to refer to them; at present, however, the terms are poorly

defined. Therefore, for the purposes of this document they shall be given clear meanings.

It is worth noting some of the ways in which the terms elasticity and burstability are applied.

Elastic, in the literal sense, describes something that can expand and will automatically contract

to its previous state without intervention. Plastic means something which, if expanded, stays in

its new shape.

Amazon's "Elastic Compute Cloud" is named for marketing purposes. Technically it would be

more accurately described as a "plastic compute utility" since it is a) reconfigurable only in

response to API requests, and b) a single large utility computing service. However, we shall

use “elastic” to refer to how their service operates since the term has become widely utilised in

that sense.

“Burstability” is also a commonly used term, most frequently to refer to the ability of a network

connection's bandwidth to spike above the normal levels for a short time. Burstability has also

been applied as a term by the long-standing virtual machine provider community, now

described as utility computing providers, to describe RAM or CPU resources that are available

for utilisation in brief load spikes. In both cases the mechanism is normally being used for peak

load curtailment and to contend resources between users.

7.2.2. Definitions for the purposes of this document

In the following description "burst" means that the resources are automatically made available

without the user or application having to do anything. It is a property of the underlying platform

in response to changes in workload. An example would be CPU utilisation - it is consumed

automatically on demand and reverts to an idle state when not required.

"Elastic" means resources that must be requested by the user, operator or application, for

example by the application requesting additional virtual machine instances via an Application

Programming Interface (API) from an "overflow" pool in response to changes in workload. It is

likely that the majority of existing applications would need modification in order to take

advantage of elastic capacity.

All Computing Resources share some or all of the following characteristics:

Dedicated capacity: Resource entirely dedicated to the application workload which is

never re-allocated nor shut down. This is how most capacity within the public sector is

currently provisioned.

Guaranteed burst capacity: Capacity which is available for use at all times, but which

may be automatically re-allocated or shut down when not used without direction from

the operator or hosted application. This capacity would be constrained to the limits of

the hosting physical hardware or network connection, and would normally be restricted

to one physical location.



Non-guaranteed burst capacity: As above, but not guaranteed. Common examples

of this are consumer broadband, where a user can consume up to 8 Mbps but only when

others are not using the bandwidth, and existing virtual machine suppliers, who

allow CPU and RAM usage to exceed the minimum guaranteed allocation.

Guaranteed elastic: Capacity, generally in the form of additional virtual machines,

which can be requested by the operators (e.g. ahead of an expected load-spike) or on-

demand programmatically by the application. Elastic capacity would normally have to

be explicitly decommissioned (contrary to the implication in its name), and could be

regarded simply as a form of very rapid provisioning.

Non-guaranteed elastic: As above, but not guaranteed to be available. An example of

this sort of capacity is Amazon's Elastic Compute Cloud (EC2) platform.
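
The five capacity types listed above differ along two axes: whether the capacity is guaranteed, and whether it must be explicitly requested. A minimal sketch of that classification follows; the class and attribute names are illustrative assumptions, not part of any G-Cloud specification.

```python
from enum import Enum


class CapacityType(Enum):
    # value = (mode, guaranteed, must be requested by operator or application)
    DEDICATED = ("dedicated", True, False)
    GUARANTEED_BURST = ("burst", True, False)
    NON_GUARANTEED_BURST = ("burst", False, False)
    GUARANTEED_ELASTIC = ("elastic", True, True)
    NON_GUARANTEED_ELASTIC = ("elastic", False, True)

    def __init__(self, mode, guaranteed, on_request):
        self.mode = mode
        self.guaranteed = guaranteed
        self.on_request = on_request


for capacity in CapacityType:
    print(f"{capacity.name}: guaranteed={capacity.guaranteed}, "
          f"explicit request needed={capacity.on_request}")
```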

Requirements may be specified as a timeline, with both capacity (quantity and type) and other

characteristics such as uptime requirements and response-time Service Level Agreements

(SLAs) able to change over time. That will enable peak load curtailment, for example HMRC's

peak and the DVLA's peak being at different times of the year, thus reducing overall hardware

requirement.
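
The saving from peak load curtailment can be illustrated with a short sketch. The monthly demand figures below are invented for the example and do not represent any real organisation's workload.

```python
# Illustrative monthly demand in arbitrary server units (not real figures).
workload_a = [40, 90, 40, 30, 30, 30, 30, 30, 30, 30, 30, 30]  # peaks early in the year
workload_b = [20, 20, 20, 20, 20, 20, 20, 70, 70, 20, 20, 20]  # peaks later in the year


def separate_provision(*profiles):
    """Hardware needed when each workload is provisioned for its own peak."""
    return sum(max(profile) for profile in profiles)


def shared_provision(*profiles):
    """Hardware needed when workloads share one pool sized for the combined peak."""
    return max(sum(month) for month in zip(*profiles))


print("Provisioned separately:", separate_provision(workload_a, workload_b))
print("Provisioned as a shared pool:", shared_provision(workload_a, workload_b))
```

With these assumed figures the shared pool needs 110 units rather than 160, because the two peaks never coincide.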

The size of the time units for dedicated (and other) capacities should not be heavily restricted

(initially some suppliers may only be able to offer months). The minimum time unit initially will

probably be one hour (as with Amazon EC2), though in time government should expect and

request finer granularity of charging, perhaps down to “per second” billing.
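
The effect of billing granularity can be sketched as below, assuming that usage is rounded up to a whole number of billing units; the function name and the rounding rule are assumptions for illustration.

```python
def billable_seconds(used_seconds: int, unit_seconds: int) -> int:
    """Round usage up to a whole number of billing units."""
    if used_seconds < 0 or unit_seconds <= 0:
        raise ValueError("usage must be non-negative and the unit positive")
    units = -(-used_seconds // unit_seconds)  # integer ceiling division
    return units * unit_seconds


ninety_minutes = 90 * 60
print("Hourly billing:", billable_seconds(ninety_minutes, 3600), "seconds charged")
print("Per-second billing:", billable_seconds(ninety_minutes, 1), "seconds charged")
```

A Workload that runs for 90 minutes is charged for two hours under hourly billing but for exactly 90 minutes under per-second billing, so finer granularity matters most for short-lived elastic capacity.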

7.2.3. Discussion and existing examples

A key differentiator between elasticity and burstability is that burstability has pre-defined limits,

whereas elasticity need not have a limit. However, in practice it is unlikely that government

would rely on non-guaranteed elastic capacity for mission-critical real time Workloads. It is also

unlikely that suppliers will provide guaranteed elastic capacity without charge. However, the

likelihood of non-guaranteed elastic capacity being unavailable may be so low as to not be a

concern to the end-user. Elastic capacity will probably be billed for by the hour, with a

surcharge for guaranteeing its availability.

Non-guaranteed burst capacity would normally be provided alongside a guaranteed

component, for example a virtual machine on a shared host with a minimum guaranteed (within

the SLA) CPU time share or an ADSL connection (in both cases an additional charge is

not levied for the use of the non-guaranteed element). Guaranteed burst capacity may be

chargeable, perhaps in-part, in-advance. For example, it might be purchased like traditional

dedicated capacity, but with a rebate if it is not used.

As with elasticity, it is likely that guaranteed burstability will be more expensive than

non-guaranteed (analogous to a contended ADSL line vs. a dedicated network connection).

Non-guaranteed burst capacity is essentially a method of utilising the spare resources when the guaranteed

capacity is being underutilised (e.g. at night), and may be useful for processing regular batch

workloads, such as data analysis, cost effectively.



7.2.4. Illustrative example of different capacity types in the G-Cloud

Figure 6 below illustrates an example scenario of how the different types of capacity might be

utilised in the G-Cloud, and also shows the importance of the differentiations:

Figure 6: Elasticity and burstability

At night, the batch workload is able to burst into spare capacity. During daytime, the batch application

workload (BAW) is squeezed back down partially by real-time (human-driven) application

workload #1 (RTAW1), and all the way down to its guaranteed capacity on the VMHs it shares with

RTAW2.

Further, in this example RTAW1 has needed to use

more than its capacity in facility 1, which it has done by elastically spawning new virtual machine instances in

facility 2.

However, only half of RTAW2's elastic capacity in facility 2 is guaranteed, and although there

was plenty of non-guaranteed capacity at night there is little during the day; thus

RTAW2 has no further capacity to consume.



7.3. Utility / IaaS open specifications / standards recommendation

One of the requested outputs from the technical architecture work strand in phase two is a

recommendation of which open standard(s) to utilise in the first instance of the G-Cloud for

accessing IaaS / utility computing resources.

Cloud management systems being investigated include:

Eucalyptus - Open Source Cloud toolkit presented like Amazon EC2.

Haizea - An Open Source VM-based Lease Manager.

OpenNebula - Open Source Cloud management toolkit allowing hybrid model (burst

into Public).

Microsoft System Center Operations Manager (part of BPOS and BPOS-D).

Novell Intelligent Workload Management.

Open Cloud standards reviewed:

Amazon EC2 and Simple Storage Service (S3) APIs (being utilised by other companies

than Amazon).

Open Grid Forum's (OGF) Open Cloud Computing Interface (OCCi).

DMTF's Open Virtualization Format (OVF) (sponsored by Cisco and VMware).

At present Amazon's EC2 and S3 APIs have become something of a de facto standard, on which

considerable development effort is being expended. The EC2 API allows for maintenance of

virtual machine (VM) images and for VMs to be provisioned and de-provisioned, whilst the S3

APIs (one based on REST, the other on SOAP) allow for simple data object storage, retrieval

and management. Other functioning examples are Rackspace's Mosso API and Sun's Cloud

API.

These APIs are very simple and do not include all the functionality that might be envisaged or

covered in this document (e.g. specifying the full range of VM instance characteristics), but

could still be used as a common starting point. However, the vendor-backed APIs are not

free of patent encumbrance, and so should probably not be used as a long-term solution.

The recommended long-term solution is the OCCi; however, at the time of writing the interface

specification only loosely defines a protocol and does not contain a full API. Having spoken to

the steering group of the OCCi it is clear that their intent is to create a fully open, patent

unencumbered API for general use, which would be ideal for G-Cloud purposes. Further, there

is an opportunity for the G-Cloud programme to influence the direction of development of the

OCCi, which should be exploited.

7.4. Batch vs. Real-time Workloads

With elastic and burst resource capacities it will be possible to mix Workload types within one

pool of servers thus maximising utilisation, which is the most efficient way to use a server from

both a cost and carbon perspective. Even simply turning under-used servers off is inefficient

when the embedded energy (i.e. the carbon cost of manufacture) and money (i.e. the capital

expenditure) are taken into account.



Many applications must respond to real-time, human-generated demand and thus need guaranteed

elastic or burst capacity. However, the majority of such applications need little resource at

night time. In the commercial world this issue is being addressed by "following the moon";

balancing Workloads across multiple time zones. However, this may not be an option for some

G-Cloud services given the constraints outlined in the Asset Valuation and Aggregation section of

the Information Assurance Strategy.

The resources can still be fully utilised, however, without relying on

non-guaranteed burst or elastic capacity for real-time critical applications, by instead mixing real-time

Workloads with batch processing Workloads within the same compute grid. There are many

batch-type Workloads where the compute task will take a long time (longer than a human

wishes to wait) and where whether the task is completed in an hour or in a couple of days does

not greatly matter to the user, such as large data set analysis.
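
The sharing of one host between a real-time Workload and a batch Workload might be sketched as follows. The allocation rule (real-time demand is served first, subject to the batch Workload keeping its guaranteed share) and all names and figures are assumptions for illustration only.

```python
def share_host(capacity, realtime_demand, batch_guaranteed, batch_demand):
    """Return (realtime_alloc, batch_alloc) for one scheduling interval."""
    if batch_guaranteed > capacity:
        raise ValueError("guaranteed share cannot exceed host capacity")
    # The batch Workload always keeps its guaranteed share, if it wants it.
    batch_floor = min(batch_guaranteed, batch_demand)
    # Real-time demand is served next, from whatever remains.
    realtime_alloc = min(realtime_demand, capacity - batch_floor)
    # Batch then bursts into anything the real-time Workload leaves unused.
    batch_alloc = min(batch_demand, capacity - realtime_alloc)
    return realtime_alloc, batch_alloc


print("Night:", share_host(100, realtime_demand=10, batch_guaranteed=20, batch_demand=100))
print("Day:  ", share_host(100, realtime_demand=95, batch_guaranteed=20, batch_demand=100))
```

At night the batch Workload bursts into 90 of the 100 units; during the day it is squeezed back to its guaranteed 20 units, and the host remains fully utilised in both cases.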

8. Interoperability between suppliers

8.1. Workload interoperability and migration

In the most flexible vision of the G-Cloud it will be possible to transition Workloads between

Certified Providers. In reality, at this time, this will only be practical for multi-node server-type

Workloads that operate in a stateless manner across a cluster of servers which probably share

common processor architecture. Interoperability will not work with Utility Computing platforms

(PaaS) which take application code directly (e.g. Google Web Apps or Microsoft Azure), unless

the receiving platform is capable of hosting the same software.

In order to allow for Workload distribution across multiple providers and the transition of

Workloads from one infrastructure or platform provider to another, the control of that

application's cluster will probably need to be centrally managed, maybe with a service like the

PSN's SIM acting as the hub. One alternative to centralised cluster management would be

DNS-based randomised load balancing.
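
The DNS-based alternative can be sketched as below, assuming that the name server returns the service's address records in a random order and that each client connects to the first address it receives. The addresses are hypothetical and are taken from ranges reserved for documentation.

```python
import random

# Hypothetical address records for one service hosted at three providers.
RECORDS = ["192.0.2.10", "198.51.100.20", "203.0.113.30"]


def resolve(records, rng=random):
    """Return the records in random order, as a randomising name server might."""
    answer = list(records)  # copy, so the configured records are not reordered
    rng.shuffle(answer)
    return answer


def pick_endpoint(records, rng=random):
    """A client typically connects to the first address in the answer."""
    return resolve(records, rng)[0]


print(pick_endpoint(RECORDS))
```

Over many requests the load spreads roughly evenly across providers without any central controller, although this approach takes no account of each provider's current load or health.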

Non-real time, manual Workload transitions will be simpler; the process would be similar to

rolling out a new version of an application, with new server nodes being provisioned, tested,

and then scaled up in size and/or scaled out in number at the new compute utility.

8.2. Workload scalability

It is important to note that simply virtualising an application does not automatically make it

scalable, and that most pre-existing applications would need to be extensively enhanced or re-

written in order to take advantage of elasticity. However, if a pre-existing application Workload

is already designed to be hosted on a cluster of server nodes, and if it can be easily migrated

from its existing environment onto a virtualised one, then it should be possible to host that

Workload in an elastic environment, such that its resource utilisation can automatically expand

and contract in response to demand.
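
For a Workload that is already clustered, the expand-and-contract behaviour could be driven by a rule as simple as the one sketched below. The per-node capacity and the minimum and maximum cluster sizes are assumed values for illustration.

```python
def desired_nodes(demand, per_node_capacity, minimum=1, maximum=10):
    """Return the number of server nodes needed to carry the current demand."""
    if per_node_capacity <= 0:
        raise ValueError("per-node capacity must be positive")
    needed = -(-demand // per_node_capacity)  # integer ceiling division
    # Never shrink below the minimum nor grow beyond the agreed maximum.
    return max(minimum, min(maximum, needed))


for requests_per_second in (0, 250, 950, 2000):
    nodes = desired_nodes(requests_per_second, per_node_capacity=100)
    print(f"{requests_per_second} requests/s -> {nodes} node(s)")
```

The rule only works if nodes can be added and removed without affecting the application, which is why a Workload must already be designed for a cluster before elasticity is of any benefit.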

8.3. Data abstraction, sharing and interoperability

There are cases where the data should be considered in isolation from the application, and

where it is highly desirable for the data to be abstracted from the application. For example,



record sets containing citizen data are likely to have common components, and it is

recommended that wherever possible common data models (data schemas) are developed and

used. This would have three advantages:

1. Abstraction: Allow data to be divorced from applications, thus preventing software

vendor or SaaS provider lock-in.

2. Interoperability: Allow for data sets to be operated on and utilised by more than one

application, enabling greater innovation and flexibility.

3. Sharing: Facilitate the exposure of existing isolated data sets to the G-Cloud where

they could be aggregated and re-distributed via a data distribution service, as

suggested in figure 3.

For example, some individual police forces have their own data repositories which are not

inherently interoperable. By developing a standard data schema and common data model for

police records, those data sets could be pooled and accessed via a standardised data

distribution service. This would result in greatly enhanced public services by allowing better

sharing of data between police forces, and potentially other government organisations, without

having to create one large central data store.
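
The idea of a common data model can be sketched as a mapping from each repository's local field names onto one shared schema. The force identifiers and every field name below are hypothetical and chosen only to illustrate the technique.

```python
# Fields of the (hypothetical) common schema.
COMMON_FIELDS = ("surname", "given_name", "date_of_birth")

# Mapping from each repository's local field names to the common schema.
FIELD_MAPS = {
    "force_a": {"Surname": "surname", "Forename": "given_name", "DOB": "date_of_birth"},
    "force_b": {"last_name": "surname", "first_name": "given_name", "birth_date": "date_of_birth"},
}


def to_common(source, record):
    """Translate one local record into the common schema."""
    mapping = FIELD_MAPS[source]
    common = {mapping[key]: value for key, value in record.items() if key in mapping}
    missing = [field for field in COMMON_FIELDS if field not in common]
    if missing:
        raise ValueError(f"record from {source} lacks: {missing}")
    return common


a = to_common("force_a", {"Surname": "Smith", "Forename": "Jo", "DOB": "1980-01-31"})
b = to_common("force_b", {"last_name": "Smith", "first_name": "Jo", "birth_date": "1980-01-31"})
print(a == b)
```

Each repository keeps its own storage and only the mapping is shared, so records can be pooled through a distribution service without building a central data store.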

There are a number of existing examples of common data models being successfully deployed,

resulting in reduced costs and enhanced service delivery:

Telecommunications sector and the Shared Information/Data (SID) model.

NATO and JC3.

Banking system and SWIFT.

Insurance sector and ACORD.

9. Data Centre Migration

The stated aim of the Data Centre migration activity is to reduce the overall number of data

centres used by the UK public sector to approximately 10 to 12 secure, resilient

facilities, with a corresponding reduction in cooling and power consumption.

The rationalisation and standardisation of common applications, such as email, provides an

excellent opportunity to increase the utilisation and reduce the amount of Computing Resource

required. Section 7.4 of this document discusses how virtualised environments make the best

use of the underlying hardware resources.

9.1. Data Centre Efficiency

The Power Usage Effectiveness (PUE) or Data Centre Infrastructure Efficiency (DCiE) metrics

provide a measure of the efficiency of a facility in terms of the electricity usage of the facility

compared to the electricity usage of the computing resources. They do not measure the

utilisation rates of the computing resources themselves. For example, a facility which has a

high DCiE figure (or low PUE figure) may well be running servers with low utilisation rates.
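
The relationship between the two metrics can be shown with a short calculation. PUE is the total facility energy divided by the energy used by the computing equipment, and DCiE is the reciprocal expressed as a percentage; the energy figures below are invented for the example.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0 or total_facility_kwh < it_equipment_kwh:
        raise ValueError("IT energy must be positive and no more than the total")
    return total_facility_kwh / it_equipment_kwh


def dcie_percent(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Data Centre Infrastructure Efficiency: the reciprocal of PUE, as a percentage."""
    return 100.0 / pue(total_facility_kwh, it_equipment_kwh)


# Illustrative figures: 1,400 kWh drawn by the facility, 1,000 kWh by the servers.
print(f"PUE  = {pue(1400, 1000):.2f}")
print(f"DCiE = {dcie_percent(1400, 1000):.1f}%")
```

Neither figure changes if the servers are idle, which is why these metrics need to be read alongside utilisation rates.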

Modern data centres can achieve low PUE rates, often less than 1.4, by a combination of

efficient power infrastructure design and managing the air flow in the facility. Power equipment

in a data centre often comprises power conditioning equipment, UPS, distribution units, cabling



and backup generators. The selection of this equipment should be based on a combination of

the requirements for resilience and the actual power draw of the computing resources. The

rating of a server's power supply is often several times greater than the power draw of the

server, even when highly utilised.

The implementation of hot and cold aisle containment can all but eliminate the need for the

mechanical cooling of the cold air supply. The application of current best practices for the

design of modern data centres, such as those listed in the EU Code of Conduct for Data Centre

Operations, will contribute significantly to achieving such efficiencies.

9.2. EU Code of Conduct for Data Centre Operations

The EU Code of Conduct for data centre operations (EU CoC) provides a number of best

practices which can be applied to data centres regardless of whether they are already in use,

undergoing a retrofit process or still being planned. The best practices in the EU CoC are now

listed in the Greening Government ICT CIO and CTO workbook. This paper recommends the

application of those best practices to facilities providing Computing Resources for the G-Cloud.

9.3. Consolidation and migration of existing services

Performing an audit of applications in use across the UK public sector would be an enormous

undertaking. The Service Specification and Business Transition process will identify services

that are suitable for inclusion in the initial G-Cloud.

Before initiating a data centre consolidation exercise it will be necessary to identify a range of

applications and services that are in common usage across the UK public sector. For example

the NHS mail system is a common platform that is in use by hundreds of thousands of users

spanning many NHS trusts. Extending the use of this platform beyond the NHS would enable

other public sector organisations to benefit from the low cost, high security and high availability

of the NHS mail system. Other examples of common services and applications may include

human resources, enterprise resource planning or finance applications.

Services in the G-Cloud shall be designed and built in accordance with the architectural

principles outlined in this document so as to facilitate a gradual migration of services. As

organisations migrate existing services into the G-Cloud, the services will scale in a way which

guarantees efficient and high utilisation of the underlying Computing Resources. Furthermore,

the ASG will allow organisations to understand the costs and service levels offered by G-Cloud

services prior to purchasing them. The ease of procurement, fast provisioning and known

service levels will be compelling reasons to migrate existing systems and services into the G-

Cloud.

http://www.connectingforhealth.nhs.uk/systemsandservices/nhsmail/about