Data Centre Strategy, G-Cloud and Applications Store Programme Phase 2
DATA CENTRE MIGRATION, G-CLOUD AND
APPLICATIONS STORE PROGRAMME
PHASE 2
Technical Architecture Workstrand Report
5th May 2010
Version 1.5
G-Cloud Business Sponsor: Mark Ferrar
Director of Technology Strategy
Department of Health Informatics Directorate
Work Strand Lead: Miles Gray
Hardware Platform Architect
Department of Health Informatics Directorate
Industry Co-Lead: Kate Craig-Wood
Managing Director
Memset Ltd
08 FINAL _G-CLOUD_ Technical Architecture Workstrand Report _version v1.5_.docUNCLASSIFIED 2
Contents
1. Introduction
1.1. Related Documents
1.2. Key Assumptions
1.3. Scope
1.4. Objective
1.5. Key stakeholders
2. Definitions from a technical perspective
2.1. Contextual definitions
2.2. Authoritative definitions within the G-Cloud programme
2.3. Service layers: Infrastructure, Platform and Application
2.4. US National Institute of Standards and Technology (NIST) definitions
2.5. Application Workloads
3. Architectural Principles
3.1. G-Cloud Technical Architecture
3.2. Applications developed for the G-Cloud
4. High Level Logical Architecture
4.1. Context
4.2. Component descriptions
4.2.1. Applications Store for Government (ASG)
4.2.2. G-Cloud Services Interchange (GC-SI)
4.2.3. Certified Components Repository (CCR)
4.2.4. Data services
4.2.5. Monitoring Services
4.2.6. Public Sector Network Service Information Monitor (PSN SIM)
5. Technologies & market review
5.1. Availability and usability of existing Cloud services
6. Proposed supplier and service certifications
6.1. Service monitoring
6.2. Infrastructure services certification
6.3. Platform services certification
6.4. Software developer services certification
6.5. Service integration / aggregation / management certification
6.6. Information assurance
6.7. Supplier Certification and Impact Levels matrix
7. Utility Computing
7.1. Units of Utility Computing (IaaS) resource specification
7.2. Elasticity and “burstability” of resources
7.2.1. Background
7.2.2. Definitions for the purposes of this document
7.2.3. Discussion and existing examples
7.2.4. Illustrative example of different capacity types in the G-Cloud
7.3. Utility / IaaS open specifications / standards recommendation
7.4. Batch vs. Real-time Workloads
8. Interoperability between suppliers
8.1. Workload interoperability and migration
8.2. Workload scalability
8.3. Data abstraction, sharing and interoperability
9. Data Centre Migration
9.1. Data Centre Efficiency
9.2. EU Code of Conduct for Data Centre Operations
9.3. Consolidation and migration of existing services
1. Introduction
This technical architecture provides a common foundation of software and hardware
infrastructure principles for multiple business applications. The technical architecture provides
the framework for interfaces, protocols, standards and products to be used in defining a
platform that supports applications across UK public sector organisations.
This document provides a high-level overview of the proposed technical architecture for
Government Cloud (G-Cloud), Data Centre Consolidation (DCC) and the Application Store for
Government (ASG).
1.1. Related Documents
Government ICT Strategy
Open Source, Open Standards and Re-use: Government Action Plan
Work Strand reports from:
Information Assurance.
Commercial.
Service Management.
Greening Government ICT CIO and CTO Workbook
European Union Code of Conduct for Data Centre Operations
1.2. Key Assumptions
Adequate solutions can be architected within the necessary Information Governance
and Security framework.
Suppliers are able to provide interoperable products and services that function within
necessary Information Governance requirements at a cost that is attractive to the public
sector.
Technology exists to meet all public sector Software, Platform and Infrastructure needs
that can be configured as a suitable and acceptable service to the public sector.
Appropriate commercial models can be defined, agreed and set in place within the UK
and EU procurement legislation to allow services to be bought by public sector
organisations.
Cloud interoperability will increase.
1.3. Scope
In Scope:
The hardware and software components required to implement and support the G-
Cloud, Data Centre Consolidation and the Applications Store.
Out of Scope:
Network connectivity between data centres. Responsibility hands over to the Public
Sector Network (PSN) at (and including) the Customer Premises Equipment (CPE)
router(s) that terminate a PSN network connection.
Data centre to user network connectivity (PSN's scope).
The desktop strategy and client-side aspects of applications and services (Desktop
Strategy scope).
Detailed technical specifications of individual system elements.
1.4. Objective
The objective of this Technical Strategy is:
To define a technical architecture for Government Cloud, Data Centre consolidation
and the Government Application Store that reflects current best practice in the
industry, reasonably foreseeable future developments and the UK public sector's
unique blend of requirements.
1.5. Key stakeholders
Stakeholder support for the strategy and approach is a critical success factor. Our
stakeholders include:
The UK public sector CIO Council.
The UK public sector CTO Council.
The Public Sector Networks (PSN) programme.
All public sector organisations in the UK, including, but not limited to Central
Government Departments, Local Government Authorities, Non-Departmental Public
Bodies (NDPB) and any other organisation within the definition of Contracting Authority
within the UK.
The IT supply-side industry of product and service providers.
The Programme Team.
2. Definitions from a technical perspective
2.1. Contextual definitions
The following definitions are taken from the Government ICT Strategy. The concepts are
explored in more detail in subsequent sections of this document.
Government Cloud, or G-Cloud, is an internet-based ICT infrastructure that enables public
bodies to host, select and use ICT systems from a secure, resilient and cost-effective service
environment.
Data Centre Rationalisation is the reduction in the number of data centres owned or used to
host government application services from current (2009) levels to provisionally 10 or 12 highly
resilient, secure data centres. It includes significantly increasing the utilisation of assets within
the data centres and reducing environmental impact, whilst not compromising service or data
integrity in the process.
The Application Store for Government (ASG) is the gateway to sharing and reuse of online
business applications, services and components between public sector organisations.
2.2. Authoritative definitions within the G-Cloud programme
The following definitions have been used in the creation of this document:
Key words for use in RFCs to Indicate Requirement Levels – As defined in RFC2119
(http://tools.ietf.org/html/rfc2119).
The following definitions are for the purposes of this document and phase 2 of the G-Cloud
programme:
Utility Computing is the packaging of Computing Resources, such as computation and
storage, as a metered service similar to a traditional service utility (such as electricity, water,
natural gas, or telephone network).
Public Cloud means Utility Computing that is available to individuals, public and private sector
organisations. Public Cloud is often non-geographically specific and can be accessed wherever
there is an Internet connection.
Private Cloud means a Utility Computing infrastructure exclusively for the use of one
organisation or community.
Hybrid Cloud means a combination of Public and Private Clouds, both remaining separate
entities, but with Workload able to migrate between them.
Computing Resource refers to computer or server infrastructure resources which includes
Processing, Storage and Network, described in more detail in section 7 of this document.
Workload refers to any service or software application which makes use of Computing
Resources.
Burst Computing Resources automatically expand and contract in response to changes in
application workload (see the Elasticity and Burstability section).
Elastic resources must be requested by the user, operator or application (see the Elasticity and
Burstability section). “Elastic” differs from “Burst” in that the application or user must request the
additional resources, for example via an Application Programming Interface (API).
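The difference between the two definitions can be illustrated with a short sketch. It is not a G-Cloud interface: the Allocation class, the function names, the core counts and the ceiling are all hypothetical and serve only to contrast an explicit request with an automatic adjustment.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """Hypothetical record of the processor cores given to one Workload."""
    cores: int   # cores currently allocated
    limit: int   # assumed ceiling on what may be allocated


def elastic_resize(alloc: Allocation, requested: int) -> int:
    """Elastic: the user, operator or application asks explicitly (e.g. via an API)."""
    alloc.cores = max(1, min(requested, alloc.limit))
    return alloc.cores


def burst_step(alloc: Allocation, observed_demand: int) -> int:
    """Burst: the platform follows observed demand without any request being made."""
    alloc.cores = max(1, min(observed_demand, alloc.limit))
    return alloc.cores


def simulate(demand, start_cores, limit, burst):
    """Return the allocation at each step of a demand series (in cores)."""
    alloc = Allocation(start_cores, limit)
    history = []
    for needed in demand:
        if burst:
            burst_step(alloc, needed)
        # Without burst the allocation only changes if elastic_resize is called.
        history.append(alloc.cores)
    return history
```

With a demand series of 2, 6 and 3 cores and a ceiling of 4, the burst allocation follows demand up to the ceiling, while the elastic allocation stays where it is until something calls elastic_resize.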
2.3. Service layers: Infrastructure, Platform and Application
The following diagram illustrates the components of the technology stack that are referred to in the
remainder of this document.
Figure 1: Service layers description
Figure 1 Note:
* Assumed to incorporate subordinate layers.
The following sections are presented under the assumption that there shall exist a set of
services known collectively as the G-Cloud and that they comprise Utility Computing services
available to the UK public sector.
It is assumed that "as a service" means all services within the definition are fully integrated up
to and including the respective level, thus incorporating any sub-levels. Therefore, Software as
a Service (SaaS) providers could either sub-contract to a Platform as a Service (PaaS)
provider, or would incorporate the PaaS themselves and provide it as part of the SaaS "stack".
In turn the Infrastructure as a Service (IaaS) could be sub-contracted or incorporated. The
customer would see an integrated service.
A better name for SaaS might be "Application as a Service", with "Application" plus "Platform"
being synonymous with "Workload". However, the existing terms are now recognised and widely
used across the IT industry, so their use shall continue here.
2.4. US National Institute of Standards and Technology (NIST) definitions
The American NIST recently published a definition of Cloud Computing which has rapidly
gained wide acceptance among the consultancy community and is being regarded by some as
the standard for reference. The G-Cloud definitions of Cloud Computing for the most part do
not conflict with this definition, and the IaaS, PaaS and SaaS definitions fit very well.
The NIST definition of Cloud Computing describes five essential characteristics of Cloud, three
service models and four deployment models. This can be diagrammatically represented as a
cube, as shown in figure 2.
Figure 2: NIST Cloud Computing definition (diagrammatic representation)
NIST defines Hybrid Cloud as “a composition of two or more clouds
(private, community, or public) that remain unique entities but are bound together by
standardized or proprietary technology that enables data and application portability (e.g., cloud
bursting for load-balancing between clouds)”. The G-Cloud can therefore be considered as
being most closely aligned with the NIST definition for a Hybrid Cloud.
2.5. Application Workloads
A Workload would typically be a software application or service which requires use of the
Computing Resources provided from one or more Certified Infrastructure and/or Platform
Suppliers (see section 6, “Supplier Certifications”). Workloads would be available via the App
Store marketplace, and would normally require a mixture of Computing Resources (Processing,
Storage and Network).
3. Architectural Principles
The technical architecture shall be built using the following principles:
3.1. G-Cloud Technical Architecture
The G-Cloud technical architecture shall:
Not exclude foreseeable future technologies
Assume implementation through a process of gradual, iterative change.
Be provided by suppliers certified for service types to comply with relevant Government
Information Governance architecture and standards.
Allow for both long- and short-term contracts.
Enable rapid provisioning and de-provisioning.
Enable peaks in demand to be met automatically.
Enable service metering and “pay-as-you-go” pricing.
Enable self-service ordering of services.
Allow sharing of existing hardware infrastructure where practical.
Access the benefits of data centre operations automation.
Have an overarching service management framework.
Enable and encourage software sharing through improved awareness.
Reduce total volume of software licensing through consolidation.
Enable better utilisation of hardware resources.
Abstract hardware from software (within compatible systems).
Be suitable for small, large, static or dynamic Workloads simultaneously.
Facilitate improved business value delivery from the underlying technologies.
Be approved to protect information and services whose impact-of-compromise profile is
expressed as Confidentiality, Integrity and Availability (CIA) levels (e.g. 224/444).
Allow for complete transparency of pricing, infrastructure and services.
Allow for a wide range of client devices, including mobiles, laptops, kiosks and desktop
personal computers.
Evolve over time as new technologies become available.
3.2. Applications developed for the G-Cloud
New applications developed for the G-Cloud shall:
Re-use pre-existing components where and whenever possible – in line with the
Government Action Plan on Open Source and Open Standards
http://www.cabinetoffice.gov.uk/cio/ict.aspx.
Be scalable to any reasonable combination of workloads that may be predicted across
UK public sector.
Be created by Certified software developers.
Be individually certified to defined information assurance requirements.
4. High Level Logical Architecture
4.1. Context
Figure 3 shows the physical and logical components of the Government Cloud Computing
environment, the Government Application Store and their interaction with each other, the PSN
and client devices. Information Assurance and Service Management are not shown as they are
considered to be ubiquitous.
The intent of figure 3 is:
To describe and map out the physical and logical technical components of the G-Cloud
system, as opposed to a map of the conceptual elements.
To show how those elements interact with each other and with the Public Sector
Network (PSN).
To show the boundaries of responsibility of the data centre strategy, PSN strategy and
desktop strategy.
Additional notes:
Central services such as authentication and mail relay are considered “just another
hosted application”.
Brand names are used for example purposes only and are not an indication of
preference or an intent to purchase under any procurement.
Information Assurance and Service Management are not shown since they are
ubiquitous; however, data flows are shown.
4.2. Component descriptions
4.2.1. Applications Store for Government (ASG)
The Government Application Store is defined in the Government ICT Strategy as a gateway to
sharing and reuse of online business applications, services and components between public
sector organisations. In other words the ASG can be thought of as a portal where
commissioners or purchasers of services can browse a catalogue of available services. The
ASG will also provide detailed information relating to costs, capacity, service levels and lead
times required to get a particular service live. The ASG will use the G-Cloud Services
Interchange, described below, as the data store of available services.
As a key piece of the G-Cloud infrastructure, the ASG must be fully resilient and fault tolerant.
In line with the principles outlined in this document, the ASG must not act as either a bottleneck
or a single point of failure for users wishing to purchase or provision services from the G-Cloud.
4.2.2. G-Cloud Services Interchange (GC-SI)
Certified Suppliers will publish services to the G-Cloud Services Interchange (GC-SI). The ASG
will be the portal to purchase and provision services published on the GC-SI. Provisioning a
service via the ASG will act as the trigger for the monitoring service data flows.
Data Centre Strategy, G-Cloud and Applications Store Programme Phase 2
08 FINAL _G-CLOUD_ Technical Architecture Workstrand Report _version v1.5_.docUNCLASSIFIED 10
4.2.3. Certified Components Repository (CCR)
Another element tightly linked (via an API, or possibly part of the same system) to the ASG will
be the repository of software, database schemas and virtual machine images certified for use in
the G-Cloud. These components would be made available for selection via the ASG where they
can be deployed onto the G-Cloud. This process could be automated. For example, the user or
service integrator would request the desired infrastructure or platform via the ASG, which
would then interface with the CCR and supply it with the information necessary (for
example an IP address or an SSH key) to deploy the component onto the newly instantiated
infrastructure or platform.
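The automated flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a specification of the ASG or CCR: every class, method and component name is hypothetical, the IP address is taken from a range reserved for documentation, and the SSH key is a placeholder string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Connection details for newly instantiated infrastructure (hypothetical)."""
    ip_address: str
    ssh_public_key: str


class InfrastructureSupplier:
    """Stand-in for a certified supplier that instantiates infrastructure."""

    def provision(self, spec: str) -> Target:
        # A real supplier would create the resource; fixed values are returned here.
        return Target(ip_address="192.0.2.10", ssh_public_key="ssh-ed25519 EXAMPLE-KEY")


class ComponentsRepository:
    """Stand-in for the CCR: deploys only components it holds as certified."""

    def __init__(self, certified):
        self._certified = set(certified)
        self.deployments = []

    def deploy(self, component: str, target: Target) -> str:
        if component not in self._certified:
            raise ValueError(f"{component} is not a certified component")
        self.deployments.append((component, target.ip_address))
        return f"{component} deployed to {target.ip_address}"


class ApplicationsStore:
    """Stand-in for the ASG: requests infrastructure, then hands details to the CCR."""

    def __init__(self, supplier: InfrastructureSupplier, repository: ComponentsRepository):
        self._supplier = supplier
        self._repository = repository

    def order(self, spec: str, component: str) -> str:
        target = self._supplier.provision(spec)             # step 1: instantiate
        return self._repository.deploy(component, target)   # step 2: deploy component
```

The design choice shown is that the store never deploys anything itself; it only passes the connection details from the supplier to the repository, which keeps the certification check in one place.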
4.2.4. Data services
Section 8.3 discusses the opportunities provided by abstracting data and applying common
data models.
4.2.5. Monitoring Services
In order to facilitate monitoring, maintenance and fault management of G-Cloud services,
monitoring services will be necessary. It is not proposed that there be one central monitoring
service. Instead, G-Cloud suppliers will be required to have their services monitored by a
means to be decided. For example, a requirement on an IaaS provider might be that they
supply an IP address for their data centres' core routers and allow ICMP pings from other
selected suppliers and G-Cloud users, and that they also allow for the installation of probe
equipment in their data centres.
4.2.6. Public Sector Network Service Information Monitor (PSN SIM)
The PSN SIM is outside the scope of this document; however, a simplified description may be
useful here for context. The PSN SIM will be a centralised record of the PSN services,
dependencies and users. Network suppliers will report the status of the portions of the network
which they manage to the SIM to facilitate fault tracking. A Dependency Map will be used by
the SIM to provide alerts and notifications for incidents relating to dependent services.
When provisioning services via the ASG, the Dependency Map, described by the Service
Management work strand, will need to be updated so as to facilitate end to end service
monitoring. Ideally the PSN SIM would be programmatically updated by the GC-SI.
Automatically deployed services could then be monitored via the same system, although such
functionality is not currently present in the SIM specification.
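The role of a Dependency Map in raising alerts for dependent services can be illustrated with a short sketch. It assumes, purely for illustration, that the map is held as a dictionary from each service to the services it depends on; the function name and the service names are hypothetical and are not drawn from the SIM specification.

```python
from collections import deque


def affected_services(dependency_map, failed):
    """Return every service that depends, directly or indirectly, on `failed`.

    `dependency_map` maps a service name to the names it depends on.
    """
    # Invert the map so each service points at the services that rely on it.
    dependants = {}
    for service, dependencies in dependency_map.items():
        for dependency in dependencies:
            dependants.setdefault(dependency, set()).add(service)

    # Breadth-first walk outwards from the failed element.
    affected = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependants.get(current, ()):
            if service not in affected:
                affected.add(service)
                queue.append(service)
    return affected


# Hypothetical map, as it might look after two services have been provisioned.
example_map = {
    "payroll-app": ["platform-a"],
    "platform-a": ["iaas-supplier-1", "psn-link-north"],
    "mail-relay": ["iaas-supplier-2"],
}
```

In this example an incident on "psn-link-north" would produce notifications for "platform-a" and, through it, "payroll-app", while "mail-relay" is unaffected.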
Figure 3: High Level Logical Architecture
5. Technologies & market review
5.1. Availability and usability of existing Cloud services
There are a number of established public cloud services, though the majority at this time
appear to have insufficiently flexible service level agreements and geo-location specificity to be
usable for most G-Cloud purposes.
However, some organisations can offer solely UK-based services and are willing to adapt their
operational practices to suit Government requirements, for example by partitioning off an area of
their existing data centre facilities and restricting utilisation of that hardware by non-government
customers. That example would be an instance of “private cloud” services.
It is anticipated that there will be an overlap where some public cloud services are suitable for
some Government purposes, as illustrated in figure 4:
Figure 4: Public Cloud services suitability for Government consumption
6. Proposed supplier and service certifications
This section should be read in conjunction with the Information Assurance Strategy section on
'Assurance Methodologies for Services'.
Four types of certification / accreditation are proposed (to be viewed in conjunction with figure
1). Certification processes shall include an audit element (akin to ISO accreditations) and must
place the pass mark sufficiently high to ensure good quality, but not so high as to restrict or
disable access to innovative Small and Medium-size Enterprises (SMEs).
IA assurance methodologies that will cover shared services in a cloud environment have yet to
be defined. The principles covering the IA conditions and standards that will form part of the
certification process will need to be established at the start of the next phase of the programme.
A starting point for this work will be the Public Sector Network (PSN) Codes of Connection
scheme (including the Technical Standards and the IA Conditions), but applied to the other
service layers (infrastructure, platform and application software). This would compel suppliers
to demonstrate their expertise in each of the key areas. The certifications would stipulate not
only best practices, but also requirements such as the ability to interface systems and services
with the centralised management system.
For Information Assurance purposes, developers will be certified to produce services at various
IL-Ts conforming to a combination of best practices and standards. These standards will be
defined during Phase 3 of the programme and will use existing standards, policies and practices
as a starting point. The standards will cover how the organisation produces good quality
services using an information assurance regime commensurate with the applicable Impact and
Threat levels.
A key expectation is that certification of the supplier and the service they provide will allow the
supplier's service(s) to be listed in the Applications Store as "available". The information in the
Applications Store will specify the Impact Level (IL) and Threat (IL-T) combinations for which
the service can be used, and the mandatory interfaces and components that are required for technical
architecture and information assurance compatibility. Completion of the certification process
does not automatically mean that services and applications can be purchased and provisioned
from the ASG. Further technical and information assurance processes may well be required.
There is also an expectation of a rating system through which existing service consumers can
indicate their satisfaction with a particular service. This will provide some quality differentiation
above and beyond the minimal requirements set by the certifications.
6.1. Service monitoring
All service suppliers should be required to allow selected third-party monitoring of the services
provided, for example by allowing firewall access for standard monitoring protocols to end-point
servers and core network equipment.
6.2. Infrastructure services certification
The following proposal will need to be confirmed with the relevant risk management roles
during Phase 3:
The provider shall not have logical access to machines as root or administrator (i.e. excludes
systems administration duties), but would have physical access (for hardware maintenance
etc.). Providers would also manage the local network and switching fabric (e.g. service
providers like Amazon EC2 or Rackspace). The provider must conform to security standards
applicable to the IL-T rating (see below) and must also conform to standards relating to
efficiency and best practice, such as the EU Code of Conduct for data centres. It is envisaged
that this certification would work like a "Code of Connection" for IaaS providers to make their
services available through the G-Cloud.
6.3. Platform services certification
The following proposal will need to be confirmed with the relevant risk management roles
during Phase 3:
The provider will have root / administrator access and will perform basic systems
administration, probably in partnership with one or more application providers. Such services
may include backup, service monitoring and help desk.
6.4. Software developer services certification
This certification is for software developers and suppliers providing applications (or software) as
a service or application support. For software the accreditation will mandate standards related
to acceptable development practices.
This will ensure software developers are able to offer applications via the Apps Store for trial
and interest generation (a prototype). These applications are likely to be offered through a
separate development and test community cloud. The use of case studies and pilots in this
area will assist in drawing up criteria that can be used by the developers and the governance
groups for technical architecture and information assurance. Application prototyping in the first
stage will allow for feedback and hence partially enable “crowd sourcing” of the application's
development.
Production applications and stand-alone components will then need to be individually
scrutinised and certified before being made available, and shall not be altered by
prospective customers in order to generate feedback and buy-in.
6.5. Service integration / aggregation / management certification
There is also a need for accreditation for systems and services integration, management and
aggregation services, which could also be referred to as “Operate as a Service” (OaaS). The
technical elements of such a certification would be covered by the certifications already
described above.
6.6. Information assurance
The IA Strategy section on 'Assurance Methodologies for Services' covers the proposals for
gaining assurance of services to be used in the G-Cloud environment.
The current proposal is that each certification will cover a range of security levels, following the
IL-T ratings. Examples of mandated practices will include:
Infrastructure: Physical security of data centres, security screening of staff.
Platform: Patching regime (e.g. months vs. days) and virtualisation layer management
(e.g. disallowing unknown kernels)
Software: Logging, peer reviewed code, security releases.
Integration: Staff screening.
6.7. Supplier Certification and Impact Levels matrix
During Phase 3 of the programme an extensive piece of work will be required to model the
Skills, Impact and Threat levels against the IaaS, PaaS, SaaS and OaaS types of service to
identify the standards, policies and procedures that will be required.
7. Utility Computing
7.1. Units of Utility Computing (IaaS) resource specification
Certified infrastructure and platform suppliers (computing utilities) will be able to offer
Computing Resource into the G-Cloud marketplace, for use in delivering Workloads.
Computing Resource can be any combination of the following, with each element defined in the
market place in a granular way.
Processing
o Capacity (processor cores, clock speed and RAM)
o Type (e.g. instruction sets, integer vs. floating-point optimisation)
Storage
o Capacity (bytes)
o Redundancy (probability of failure per unit time, e.g. RAID level)
o Access latency (milliseconds)
o Data Input/Output (I/O) rate (i.e. bits per second)
Network
o Bandwidth capacity (bits per second)
o Simultaneous connections capacity
o Redundancy (number of physically separate / divergent connections to site)
All Computing Resources share the common characteristic of required availability. This is
normally expressed as a percentage. For example, 99.9% availability equates to roughly 8 hours
45 minutes of downtime in a year, whereas 99.99% availability equates to less than one hour of
downtime in a year. Such measures allow buyers to make an informed decision regarding cost
and realistic availability requirements.
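The arithmetic behind these availability figures can be sketched as follows. This is an illustration only: the function name and the assumption of a 365-day year are ours and are not part of any G-Cloud specification.

```python
def annual_downtime_hours(availability_percent: float,
                          hours_per_year: float = 365 * 24) -> float:
    """Return the downtime per year, in hours, implied by an availability figure."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    # The unavailable fraction of the year is (100% - availability).
    return hours_per_year * (1 - availability_percent / 100)


if __name__ == "__main__":
    for availability in (99.9, 99.99):
        hours = annual_downtime_hours(availability)
        print(f"{availability}% availability: {hours:.2f} hours of downtime per year")
```

For 99.9% availability this gives about 8.76 hours (roughly 8 hours 45 minutes), and for 99.99% about 0.88 hours (roughly 53 minutes), consistent with the figures quoted above.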
While it will be possible to purchase any type of physical or virtualised infrastructure resources,
ensuring standards do not exclude any particular type of hardware platform, it is important to
standardise how resources are advertised in order to be able to treat (compare) them as a
utility. Processing capacity, for example, may need to be expressed in terms of a common
benchmark or baseline comparison. Each of the three Computing Resources could have all
variables defined, but practically, this should be limited to just a few key variables.
In order to allow for like-for-like comparison and better interoperability, as well as making
provisioning simpler for suppliers, it is proposed that the Processing resource be a combination
of instruction set type (e.g. x86 vs. RISC vs. GPU) and a small set of fixed ratios of CPU-to-
RAM, which are also constrained by the existing architectures. For example, Processing
resource might be offered as either standard, high memory or high CPU.
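One way such a constrained Processing offer could be represented is sketched below. The class and field names, and in particular the RAM-per-core ratios, are hypothetical placeholders chosen for illustration; the actual ratios would need to be agreed within the programme.

```python
from dataclasses import dataclass
from enum import Enum


class InstructionSet(Enum):
    X86 = "x86"
    RISC = "RISC"
    GPU = "GPU"


# Hypothetical fixed ratios (GiB of RAM per processor core).
# These numbers are illustrative assumptions, not proposed values.
RAM_GIB_PER_CORE = {"standard": 2.0, "high_memory": 8.0, "high_cpu": 0.5}


@dataclass(frozen=True)
class ProcessingOffer:
    instruction_set: InstructionSet
    profile: str  # one of the keys of RAM_GIB_PER_CORE
    cores: int

    def __post_init__(self):
        if self.profile not in RAM_GIB_PER_CORE:
            raise ValueError(f"unknown profile: {self.profile}")
        if self.cores < 1:
            raise ValueError("an offer must include at least one core")

    @property
    def ram_gib(self) -> float:
        # RAM is derived from the core count, never specified freely.
        return self.cores * RAM_GIB_PER_CORE[self.profile]


def like_for_like(a: ProcessingOffer, b: ProcessingOffer) -> bool:
    """Two offers are directly comparable if they share instruction set and profile."""
    return a.instruction_set is b.instruction_set and a.profile == b.profile
```

Because RAM is derived from the profile rather than chosen freely, two suppliers advertising the same instruction set and profile can be compared on price and core count alone.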
Data Centre Strategy, G-Cloud and Applications Store Programme Phase 2
08 FINAL _G-CLOUD_ Technical Architecture Workstrand Report _version v1.5_.docUNCLASSIFIED 18
7.2. Elasticity and “burstability” of resources
7.2.1. Background
There are two modes of delivering additional Computing Resources at short notice, and two
terms are broadly used to refer to them. At present, however, those terms are poorly defined,
so for the purposes of this document they shall be given clear meanings.
It is worth noting some of the ways in which the terms elasticity and burstability are applied.
In the literal sense, elastic describes something that can expand and will automatically contract
to its previous state without intervention. Plastic describes something which, if expanded, stays
in its new shape.
Amazon's "Elastic Compute Cloud" is named for marketing purposes. Technically it would be
more accurately described as a "plastic compute utility" since it is a) reconfigurable only in
response to API requests, and b) a single large utility computing service. However, we shall
use “elastic” to refer to how their service operates since the term has become widely utilised in
that sense.
“Burstability” is also a commonly used term, most frequently to refer to the ability of a network
connection’s bandwidth to spike above the normal levels for a short time. Burstability has also
been applied as a term by the long-standing virtual machine provider community, now
described as utility computing providers, to describe RAM or CPU resources that are available
for utilisation in brief load spikes. In both cases the mechanism is normally being used for peak
load curtailment and to contend resources between users.
7.2.2. Definitions for the purposes of this document
In the following description "burst" means that the resources are automatically made available
without the user or application having to do anything. It is a property of the underlying platform
in response to changes in workload. An example would be CPU utilisation - it is consumed
automatically on demand and reverts to an idle state when not required.
"Elastic" means resources that must be requested by the user, operator or application, for
example by the application requesting additional virtual machine instances via an Application
Programming Interface (API) from an "overflow" pool in response to changes in workload. It is
likely that the majority of existing applications would need modification in order to take
advantage of elastic capacity.
All Computing Resources share some or all of the following characteristics:
Dedicated capacity: Resource entirely dedicated to the application workload which is
never re-allocated nor shut down. This is how most capacity within the public sector is
currently provisioned.
Guaranteed burst capacity: Capacity which is available for use at all times, but which
may be automatically re-allocated or shut down when not used without direction from
the operator or hosted application. This capacity would be constrained to the limits of
the hosting physical hardware or network connection, and would normally be restricted
to one physical location.
Non-guaranteed burst capacity: As above, but not guaranteed. Common examples
of this are consumer broadband, where a user can consume up to 8 Mbps but only when
others are not using the bandwidth, and existing virtual machine suppliers, who
allow CPU and RAM usage to exceed the minimum guaranteed limits.
Guaranteed elastic: Capacity, generally in the form of additional virtual machines,
which can be requested by the operators (e.g. ahead of an expected load-spike) or on-
demand programmatically by the application. Elastic capacity would normally have to
be explicitly decommissioned (contrary to the implication in its name), and could be
regarded simply as a form of very rapid provisioning.
Non-guaranteed elastic: As above, but not guaranteed to be available. An example of
this sort of capacity is Amazon's Elastic Compute Cloud (EC2) platform.
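The five capacity types above differ along three axes: the mode of delivery, whether availability is guaranteed, and whether the capacity must be explicitly requested. A minimal sketch of that classification follows; the names are our own shorthand for the definitions in this section, not an agreed vocabulary.

```python
from enum import Enum


class CapacityType(Enum):
    # Each value is (mode, guaranteed, must_be_requested).
    DEDICATED = ("dedicated", True, False)
    GUARANTEED_BURST = ("burst", True, False)
    NON_GUARANTEED_BURST = ("burst", False, False)
    GUARANTEED_ELASTIC = ("elastic", True, True)
    NON_GUARANTEED_ELASTIC = ("elastic", False, True)

    @property
    def mode(self) -> str:
        return self.value[0]

    @property
    def guaranteed(self) -> bool:
        return self.value[1]

    @property
    def must_be_requested(self) -> bool:
        # Elastic capacity is requested by a user, operator or application;
        # dedicated and burst capacity are consumed without any request.
        return self.value[2]


def automatic_types() -> list:
    """Capacity types an unmodified application can consume without an API call."""
    return [c for c in CapacityType if not c.must_be_requested]
```

Separating the guaranteed flag from the mode makes explicit that "guaranteed" and "elastic" are independent properties, which is the distinction the following discussion relies on.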
Requirements may be specified as a timeline, with both capacity (quantity and type) and other
characteristics such as uptime requirements and response-time Service Level Agreements
(SLAs) able to change over time. That will enable peak load curtailment, for example HMRC's
peak and the DVLA's peak being at different times of the year, thus reducing overall hardware
requirement.
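The saving from non-coincident peaks can be illustrated with a small calculation. The demand figures below are invented for the example and do not represent real HMRC or DVLA workloads; the function names are likewise our own.

```python
def sum_of_peaks(timelines):
    """Capacity needed if every workload is provisioned separately for its own peak."""
    return sum(max(timeline) for timeline in timelines)


def combined_peak(timelines):
    """Capacity needed if the workloads share one pool, period by period."""
    lengths = {len(timeline) for timeline in timelines}
    if len(lengths) != 1:
        raise ValueError("all timelines must cover the same number of periods")
    return max(sum(values) for values in zip(*timelines))


if __name__ == "__main__":
    # Hypothetical demand (in arbitrary capacity units) over four periods.
    workload_a = [10, 10, 40, 10]
    workload_b = [10, 40, 10, 10]
    print("separate provisioning:", sum_of_peaks([workload_a, workload_b]))
    print("shared pool:", combined_peak([workload_a, workload_b]))
```

With these invented figures, separate provisioning needs 80 units while a shared pool needs 50, because the two peaks fall in different periods.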
The size of the time units for dedicated (and other) capacities should not be heavily restricted
(initially some suppliers may only be able to offer monthly units). The minimum time unit will
probably be one hour at first (like Amazon EC2), though in time government should expect and
request finer granularity of charging, perhaps down to “per second” billing.
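The effect of charging granularity on cost can be sketched as below. The assumption that usage is rounded up to whole billing units, and the rate used in the example, are illustrative only.

```python
import math


def billed_cost(usage_seconds: float, rate_per_hour: float, unit_seconds: int) -> float:
    """Cost of a period of usage when billing rounds up to whole units.

    unit_seconds is the billing granularity: 3600 for hourly, 1 for per-second.
    """
    if unit_seconds < 1:
        raise ValueError("billing unit must be at least one second")
    units = math.ceil(usage_seconds / unit_seconds)
    return units * unit_seconds * rate_per_hour / 3600


if __name__ == "__main__":
    usage = 90 * 60  # a 90-minute load spike
    print("hourly billing:", billed_cost(usage, 1.0, 3600))
    print("per-second billing:", billed_cost(usage, 1.0, 1))
```

For a 90-minute spike at a nominal rate of 1.0 per hour, hourly billing charges for 2 hours whereas per-second billing charges for 1.5 hours.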
7.2.3. Discussion and existing examples
A key differentiator between elasticity and burstability is that burstability has pre-defined limits,
whereas elasticity need not have a limit. However, in practice it is unlikely that government
would rely on non-guaranteed elastic capacity for mission-critical real time Workloads. It is also
unlikely that suppliers will provide guaranteed elastic capacity without charge. However, the
likelihood of non-guaranteed elastic capacity being unavailable may be so low as to not be a
concern to the end-user. Elastic capacity will probably be billed for by the hour, with a
surcharge for guaranteeing its availability.
Non-guaranteed burst capacity would normally be provided alongside a guaranteed
component, for example a virtual machine on a shared host with a minimum guaranteed (within
the SLA) CPU time share, or an ADSL connection; in both cases no additional charge is
levied for the use of the non-guaranteed element. Guaranteed burst capacity may be
chargeable, perhaps in part, in advance. For example, it might be purchased like traditional
dedicated capacity, but with a rebate if it is not used.
As with elasticity, it is likely that guaranteed burstability will be more expensive than
non-guaranteed (analogous to a dedicated network connection vs. a contended ADSL line).
Non-guaranteed burst capacity is essentially a method of utilising the spare resources when the
guaranteed capacity is being underutilised (e.g. at night), and may be useful for processing
regular batch workloads, such as data analysis, cost effectively.
7.2.4. Illustrative example of different capacity types in the G-Cloud
Figure 6 below illustrates an example scenario of how the different types of capacity might be
utilised in the G-Cloud, and also shows the importance of the differentiations:
Figure 6: Elasticity and burstability
At night, the batch application workload (BAW) is able to burst into spare capacity. During the
daytime, BAW is squeezed back down partially by real-time (human-driven) application
workload #1 (RTAW1), and all the way down to its guaranteed capacity on the VMHs it shares
with RTAW2.
Further, in this example RTAW1, the real-time (human-driven) workload, has needed to use
more than its capacity in facility 1 by elastically spawning new virtual machine instances in
facility 2.
However, only half of RTAW2's elastic capacity in facility 2 is guaranteed, and although there
was plenty of non-guaranteed capacity at night there is little during the day; thus RTAW2 has
no further capacity to consume.
7.3. Utility / IaaS open specifications / standards recommendation
One of the requested outputs from the technical architecture work strand in phase two is a
recommendation of which open standard(s) to utilise in the first instance of the G-Cloud for
accessing IaaS / utility computing resources.
Cloud management systems being investigated include:
Eucalyptus - Open Source Cloud toolkit presented like Amazon EC2.
Haizea - An Open Source VM-based Lease Manager.
OpenNebula - Open Source Cloud management toolkit allowing hybrid model (burst
into Public).
Microsoft System Centre Operations Manager (part of BPOS and BPOS-D).
Novell Intelligent Workload Management.
Open Cloud standards reviewed:
Amazon EC2 and Simple Storage Service (S3) APIs (being utilised by other companies
than Amazon).
Open Grid Forum's (OGF) Open Cloud Computing interface (OCCi).
DMTF's OVF (sponsored by Cisco and VMware).
At present Amazon's EC2 and S3 APIs have become something of a de-facto standard, around
which considerable development effort is being expended. The EC2 API allows for maintenance
of virtual machine (VM) images and for VMs to be provisioned and de-provisioned, whilst the S3
APIs (one based on REST, the other on SOAP) allow for simple data object storage, retrieval
and management. Other functioning examples are Rackspace’s Mosso API and Sun’s Cloud
API.
These APIs are very simple and do not include all the functionality that might be envisaged or
covered in this document (e.g. specifying the full range of VM instance characteristics), but
could still be used as a common starting point. However, the vendor-backed APIs are not
free of patent encumbrance, and so should probably not be used as a long-term solution.
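To illustrate how small the common starting point could be, the sketch below defines a minimal supplier-neutral interface covering only the operations described above: provisioning and de-provisioning VMs, and storing, retrieving and deleting data objects. The class and method names are hypothetical and do not reproduce the EC2, S3 or OCCi APIs; the in-memory implementations exist only to make the example runnable.

```python
from abc import ABC, abstractmethod
import itertools


class ComputeUtility(ABC):
    """Hypothetical minimal interface to an IaaS supplier."""

    @abstractmethod
    def provision(self, image_id: str) -> str:
        """Start a VM from a stored image and return its instance identifier."""

    @abstractmethod
    def deprovision(self, instance_id: str) -> None:
        """Stop and release a VM."""


class InMemoryComputeUtility(ComputeUtility):
    """Stand-in implementation used only to exercise the interface."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.running = {}  # instance_id -> image_id

    def provision(self, image_id: str) -> str:
        instance_id = f"vm-{next(self._ids)}"
        self.running[instance_id] = image_id
        return instance_id

    def deprovision(self, instance_id: str) -> None:
        del self.running[instance_id]


class InMemoryObjectStore:
    """Stand-in for simple data object storage, retrieval and management."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def delete(self, key: str) -> None:
        del self._objects[key]
```

An application written against an interface of this kind, rather than against one supplier's API, could in principle be pointed at any supplier for which an adapter exists.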
The recommended long-term solution is the OCCi; however, at the time of writing the interface
specification only loosely defines a protocol and does not contain a full API. Discussions with
the steering group of the OCCi made it clear that their intent is to create a fully open, patent
unencumbered API for general use, which would be ideal for G-Cloud purposes. Further, there
is an opportunity for the G-Cloud programme to influence the direction of development of the
OCCi, which should be exploited.
7.4. Batch vs. Real-time Workloads
With elastic and burst resource capacities it will be possible to mix Workload types within one
pool of servers thus maximising utilisation, which is the most efficient way to use a server from
both a cost and carbon perspective. Even simply turning under-used servers off is inefficient
when the embedded energy (i.e. the carbon cost of manufacture) and money (i.e. the capital
expenditure) are taken into account.
Many applications must respond to real-time, human-generated demand and thus need
guaranteed elastic or burst capacity. However, the majority of such applications need little
resource at night time. In the commercial world this issue is being addressed by "following the
moon": balancing Workloads across multiple time zones. However, this may not be an option for
some G-Cloud services given the constraints outlined in the Asset Valuation and Aggregation
section of the Information Assurance Strategy.
The resources can still be fully utilised, however, without relying on non-guaranteed burst or
elastic capacity for real-time critical applications, by instead mixing real-time Workloads with
batch processing Workloads within the same compute grid. There are many batch-type
Workloads where the compute task will take a long time (longer than a human wishes to wait)
and where whether the task is completed in an hour or in a couple of days does not greatly
matter to the user, such as large data set analysis.
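The idea of filling the capacity left over by real-time demand with batch work can be sketched as a simple allocation loop. This is a toy model under stated assumptions (a single pool, demand known per time slot, batch work freely divisible); the function name and figures are illustrative.

```python
def schedule_batch(total_capacity, realtime_demand, batch_work):
    """Allocate spare capacity in each time slot to a backlog of batch work.

    Returns (per-slot batch allocation, batch work still outstanding).
    Real-time demand is always served first; batch only ever uses the remainder.
    """
    allocations = []
    remaining = batch_work
    for demand in realtime_demand:
        if demand > total_capacity:
            raise ValueError("real-time demand exceeds the capacity of the pool")
        spare = total_capacity - demand
        used = min(spare, remaining)
        allocations.append(used)
        remaining -= used
    return allocations, remaining


if __name__ == "__main__":
    # Hypothetical pool of 10 units; real-time demand falls away in later slots.
    print(schedule_batch(10, [9, 8, 2, 1], 12))
```

In this example the batch backlog of 12 units is completed entirely from spare capacity, mostly in the quiet slots, without reducing what is available to the real-time Workload.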
8. Interoperability between suppliers
8.1. Workload interoperability and migration
In the most flexible vision of the G-Cloud it will be possible to transition Workloads between
Certified Providers. In reality, at this time, this will only be practical for multi-node server-type
Workloads that operate in a stateless manner across a cluster of servers which probably share
common processor architecture. Interoperability will not work with Utility Computing platforms
(PaaS) which take application code directly (e.g. Google Web Apps or Microsoft Azure), unless
the receiving platform is capable of hosting the same software.
In order to allow for Workload distribution across multiple providers and the transition of
Workloads from one infrastructure or platform provider to another, the control of that
application's cluster will probably need to be centrally managed, maybe with a service like the
PSN's SIM acting as the hub. One alternative to centralised cluster management would be
DNS-based randomised load balancing.
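The effect of DNS-based randomised load balancing can be approximated in a few lines: each client request is directed to one of the providers' endpoints chosen at random. The sketch below simulates only the selection step, not a DNS server, and the addresses are placeholders from the ranges reserved for documentation.

```python
import random
from collections import Counter

# Placeholder endpoints, one per hypothetical provider.
ENDPOINTS = ["192.0.2.10", "198.51.100.10", "203.0.113.10"]


def pick_endpoint(endpoints, rng=random):
    """Choose one endpoint uniformly at random, as a randomised DNS reply would."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    return rng.choice(endpoints)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so that repeated runs give the same spread
    spread = Counter(pick_endpoint(ENDPOINTS, rng) for _ in range(3000))
    for endpoint, count in sorted(spread.items()):
        print(endpoint, count)
```

Over many requests the load spreads roughly evenly across providers, but no account is taken of each provider's current capacity or health, which is the trade-off against centralised cluster management.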
Non-real time, manual Workload transitions will be simpler; the process would be similar to
rolling out a new version of an application, with new server nodes being provisioned, tested,
and then scaled up in size and/or scaled out in number at the new compute utility.
8.2. Workload scalability
It is important to note that simply virtualising an application does not automatically make it
scalable, and that most pre-existing applications would need to be extensively enhanced or re-
written in order to take advantage of elasticity. However, if a pre-existing application Workload
is already designed to be hosted on a cluster of server nodes, and if it can be easily migrated
from its existing environment onto a virtualised one, then it should be possible to host that
Workload in an elastic environment, such that its resource utilisation can automatically expand
and contract in response to demand.
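For a Workload that is already clustered, the expand-and-contract decision reduces to computing how many nodes the current demand requires. A minimal sketch follows; the function name, parameters and capacity figures are assumptions made for illustration.

```python
import math


def desired_nodes(demand, capacity_per_node, min_nodes=1, max_nodes=None):
    """Number of cluster nodes needed to serve the current demand.

    min_nodes keeps a floor of dedicated capacity; max_nodes, if given,
    caps the elastic capacity that may be requested.
    """
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    needed = max(min_nodes, math.ceil(demand / capacity_per_node))
    if max_nodes is not None:
        needed = min(needed, max_nodes)
    return needed
```

A controller would call this periodically and request or release virtual machine instances to close the gap between the desired and actual node counts. An application that cannot run on more than one node gains nothing from this, which is why virtualisation alone does not confer scalability.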
8.3. Data abstraction, sharing and interoperability
There are cases where the data should be considered in isolation from the application, and
where it is highly desirable for the data to be abstracted from the application. For example,
record sets containing citizen data are likely to have common components, and it is
recommended that wherever possible common data models (data schemas) are developed and
used. This would have three advantages:
1. Abstraction: Allow data to be divorced from applications, thus preventing software
vendor or SaaS provider lock-in.
2. Interoperability: Allow for data sets to be operated on and utilised by more than one
application, enabling greater innovation and flexibility.
3. Sharing: Facilitate the exposure of existing isolated data sets to the G-Cloud, where
they could be aggregated and re-distributed via a data distribution service, as
suggested in figure 3.
For example, some individual police forces have their own data repositories which are not
inherently interoperable. By developing a standard data schema and common data model for
police records, those data sets could be pooled and accessed via a standardised data
distribution service. This would result in greatly enhanced public services by allowing better
sharing of data between police forces, and potentially other government organisations, without
having to create one large central data store.
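The following sketch shows the general shape of such an approach: each repository keeps its own format, and a small adapter maps its records onto a shared model. Every field name and record layout here is invented for the example and does not describe any real police system or agreed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonRecord:
    """Hypothetical common data model shared by all participating repositories."""
    record_id: str
    surname: str
    date_of_birth: str  # ISO 8601, YYYY-MM-DD


def from_force_a(row: dict) -> CommonRecord:
    # Invented source layout: {"id", "last_name", "dob"} with ISO dates.
    return CommonRecord(
        record_id=f"A-{row['id']}",
        surname=row["last_name"].strip().title(),
        date_of_birth=row["dob"],
    )


def from_force_b(row: dict) -> CommonRecord:
    # Invented source layout: {"Ref", "Surname", "BirthDate"} with DD/MM/YYYY dates.
    day, month, year = row["BirthDate"].split("/")
    return CommonRecord(
        record_id=f"B-{row['Ref']}",
        surname=row["Surname"].strip().title(),
        date_of_birth=f"{year}-{month}-{day}",
    )


def pooled_view(force_a_rows, force_b_rows):
    """Present both repositories through the common model without merging the stores."""
    return [from_force_a(r) for r in force_a_rows] + [from_force_b(r) for r in force_b_rows]
```

The source repositories remain where they are; only the adapters and the common model need to be agreed, which is what allows pooling without a single central data store.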
There are a number of existing examples of common data models being successfully deployed,
resulting in reduced costs and enhanced service delivery:
Telecommunications sector and the Shared Information/Data Model (SID).
NATO and JC3.
Banking system and SWIFT.
Insurance sector and ACORD.
9. Data Centre Migration
The stated aim of the Data Centre migration activity is to reduce the overall number of data
centres used by the UK public sector to approximately 10 to 12 secure, resilient
facilities, with a corresponding reduction in cooling and power consumption.
The rationalisation and standardisation of common applications, such as email, provides an
excellent opportunity to increase the utilisation and reduce the amount of Computing Resource
required. Section 7.4 of this document discusses how virtualised environments make the best
use of the underlying hardware resources.
9.1. Data Centre Efficiency
The Power Usage Effectiveness (PUE) or Data Centre Infrastructure Efficiency (DCiE) metrics
provide a measure of the efficiency of a facility in terms of the electricity usage of the facility
compared to the electricity usage of the computing resources. They do not measure the
utilisation rates of the computing resources themselves. For example, a facility which has a
high DCiE figure (or low PUE figure) may well be running servers with low utilisation rates.
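The two metrics are reciprocals of one another: PUE is total facility power divided by the power drawn by the computing equipment, and DCiE is the inverse expressed as a percentage. A short sketch, with invented power figures, makes the relationship concrete.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Total facility power divided by IT power; lower is better, 1.0 is the ideal."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def dcie_percent(total_facility_kw: float, it_equipment_kw: float) -> float:
    """IT equipment power as a percentage of total facility power; higher is better."""
    if total_facility_kw <= 0:
        raise ValueError("total facility power must be positive")
    return 100 * it_equipment_kw / total_facility_kw


if __name__ == "__main__":
    # Hypothetical facility drawing 140 kW in total, of which 100 kW reaches IT equipment.
    print(f"PUE  = {pue(140, 100):.2f}")
    print(f"DCiE = {dcie_percent(140, 100):.1f}%")
```

Neither function takes server utilisation as an input, which reflects the point made above: a facility can score well on both metrics while its servers sit largely idle.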
Modern data centres can achieve low PUE values, often less than 1.4, by a combination of
efficient power infrastructure design and managing the air flow in the facility. Power equipment
in a data centre often comprises power conditioning equipment, UPS, distribution units, cabling
and backup generators. The selection of this equipment should be based on a combination of
the requirements for resilience and the actual power draw of the computing resources. The
rating of a server’s power supply is often several times greater than the power draw of the
server, even when highly utilised.
The implementation of hot and cold aisle containment can all but eliminate the need for the
mechanical cooling of the cold air supply. The application of current best practices for the
design of modern data centres, such as those listed in the EU Code of Conduct for Data Centre
Operations, will contribute significantly to achieving such efficiencies.
9.2. EU Code of Conduct for Data Centre Operations
The EU Code of Conduct for data centre operations (EU CoC) provides a number of best
practices which can be applied to data centres regardless of whether they are already in use,
undergoing a retrofit process or still being planned. The best practices in the EU CoC are now
listed in the Greening Government ICT CIO and CTO workbook. This paper recommends the
application of those best practices to facilities providing Computing Resources for the G-Cloud.
9.3. Consolidation and migration of existing services
Performing an audit of applications in use across the UK public sector would be an enormous
undertaking. The Service Specification and Business Transition process will identify services
that are suitable for inclusion in the initial G-Cloud.
Before initiating a data centre consolidation exercise it will be necessary to identify a range of
applications and services that are in common usage across the UK public sector. For example
the NHS mail system is a common platform that is in use by hundreds of thousands of users
spanning many NHS trusts. Extending the use of this platform beyond the NHS would enable
other public sector organisations to benefit from the low cost, high security and high availability
of the NHS mail system. Other examples of common services and applications may include
human resources, enterprise resource planning or finance applications.
Services in the G-Cloud shall be designed and built in accordance with the architectural
principles outlined in this document so as to facilitate a gradual migration of services. As
organisations migrate existing services into the G-Cloud, the services will scale in a way which
guarantees efficient and high utilisation of the underlying Computing Resources. Furthermore,
the ASG will allow organisations to understand the costs and service levels offered by G-Cloud
services prior to purchasing them. The ease of procurement, fast provisioning and known
service levels will be compelling reasons to migrate existing systems and services into the G-
Cloud.