Climate model errors, feedbacks and forcings: a comparison of perturbed physics and multi-model ensembles
Matthew Collins • Ben B. B. Booth •
B. Bhaskaran • Glen R. Harris • James M. Murphy •
David M. H. Sexton • Mark J. Webb
Received: 23 September 2009 / Accepted: 27 March 2010 / Published online: 7 May 2010
© Crown Copyright 2010
Abstract Ensembles of climate model simulations are required for input into probabilistic assessments of the risk of future climate change in which uncertainties are quantified. Here we document and compare aspects of climate model ensembles from the multi-model archive and from perturbed physics ensembles generated using the third version of the Hadley Centre climate model (HadCM3). Model-error characteristics derived from time-averaged two-dimensional fields of observed climate variables indicate that the perturbed physics approach is capable of sampling a relatively wide range of different mean climate states, consistent with simple estimates of observational uncertainty and comparable to the range of mean states sampled by the multi-model ensemble. The perturbed physics approach is also capable of sampling a relatively wide range of climate forcings and climate feedbacks under enhanced levels of greenhouse gases, again comparable with the multi-model ensemble. By examining correlations between global time-averaged measures of model error and global measures of climate change feedback strengths, we conclude that there are no simple emergent relationships between climate model errors and the magnitude of future global temperature change. Algorithms for quantifying uncertainty require the use of complex multivariate metrics for constraining projections.
Keywords Ensembles · Uncertainty · Model errors · Climate feedbacks · Observational constraints
1 Introduction
Quantitative predictions of future climate change on time scales of decades to centuries are required to inform society in its endeavours to both adapt to the consequences of climate change and to put in place mitigation efforts to control it. The complexity of interacting processes in the climate system means that we must use three-dimensional numerical models that represent all those processes and feedbacks in order to make predictions that directly feed into decision making. Complex models are required to provide regional detail, details of changes in extremes and for the assessment of non-linear, rapid or abrupt climate change. Uncertainties or errors¹ in numerical models limit the utility of projections from any individual model. Ensemble approaches have been applied in other prediction problems to increase utility by producing estimates of uncertainties in short-term predictions (e.g. Molteni et al. 2006). By first measuring the prediction uncertainties, and then tracing those uncertainties to model biases and errors, we should be better able to target research to improve models and ultimately produce better, less uncertain, climate projections. In parallel, there is a need to use information from the current generation of models to inform policy and planning now; hence there is a need to develop techniques to extract robust information from models and make credible projections.

A component of any projection system should be an ensemble of models which sample natural variability, forcing uncertainty and the uncertainties in the underlying
M. Collins (✉) · B. B. B. Booth · B. Bhaskaran · G. R. Harris · J. M. Murphy · D. M. H. Sexton · M. J. Webb
Met Office Hadley Centre, FitzRoy Road, Exeter EX1 3PU, UK
e-mail: [email protected]
¹ Here we use the terms "error" and "model error" to mean differences between models and the real world, as is common in numerical weather and climate modelling, rather than, e.g., coding errors or bugs that might be easily corrected.
Clim Dyn (2011) 36:1737–1766
DOI 10.1007/s00382-010-0808-0
physical (and increasingly chemical and biological) processes which drive regional and global climate change. Two approaches have been adopted in recent years. The first we term the "multi-model ensemble", sometimes called the ensemble-of-opportunity, meaning the collection of the output from the world's climate models. Recent efforts to collect such information (Meehl et al. 2007b) have produced an unprecedented array of studies that fed directly into the most recent IPCC assessment. The second ensemble technique we term the "perturbed-physics ensemble" (e.g. Murphy et al. 2004), whereby a single model structure is used and perturbations are made to uncertain physical parameters within that structure, including the potential to switch existing sections of code in and out in some cases.

One strength of the multi-model approach is the ability to sample a wide range of structural choices which may impact model errors, climate change feedbacks and climate forcings: widely different dynamical cores and widely different techniques for parameterising physical processes. There is a potentially large "gene pool" of possible models. Extensive coordination is required to ensure that modelling groups produce compatible experiments (the list of which is growing: e.g., Hibbard et al. 2007) and increasingly, as models become more complex, including earth-system processes and data-assimilation schemes for example, modelling groups share components, potentially limiting the gene pool. Despite great efforts world-wide, the number of ensemble members produced is, at most, of the order of tens of members. Knutti et al. (2010) discuss a wide range of issues relating to multi-model ensembles.
The key strength of the perturbed physics approach is the ability to produce a large number of ensemble members in a relatively easy way. It is possible to control the experimentation and systematically explore uncertainties in processes and feedbacks. For example, it is possible to produce a set of ensemble experiments where the input forcing data (e.g. in a twentieth-century simulation) are the same in each experiment, but the parameters which control, say, the climate sensitivity of the model are varied. Thus, the different sources of uncertainty can be isolated. It is also possible to explore a wide range of feedback processes in the model by "de-tuning" it, potentially revealing the impact of previously compensating errors. Such de-tuning also ameliorates the potential for double-counting when constraining models with observations (e.g. Allen et al. 2002); that is, the assigning of a relative likelihood to different model versions based on observed data that have already been used in their development.
The main motivation for this paper is to document the design and characteristics of a number of perturbed physics ensembles that have been produced as part of an extensive programme of research at the Met Office Hadley Centre to produce regional climate projections (e.g. Murphy et al. 2007, 2009) and to contrast aspects of those perturbed physics ensembles with corresponding multi-model ensembles. Such basic comparisons are important when we consider the number of approaches which use either or both types of ensembles to produce societally relevant information about climate change (see Murphy et al. 2009 and the special edition of the Philosophical Transactions of the Royal Society A—Collins 2007). In documenting studies which produce projections in terms of probability distribution functions (PDFs), it is not always possible to devote space to basic model diagnostics. This paper is intended to address this issue.

While we clearly cannot investigate all possible aspects of the many stored terabytes of model output we have access to in one paper, a number of questions and issues have driven the analysis herein:
1. What are the relative model-error characteristics of the two approaches? We might naively assume that the multi-model ensemble contains members with a wide range of different error characteristics, whereas the perturbed-physics approach produces members with very similar baseline climates and thus very similar errors. Is it possible to identify systematic and random components of model error? What is the relative partitioning of systematic and random errors between the two types of ensembles? Why, in the multi-model case, is the ensemble mean so often the "best model"?
2. We know that the perturbed physics approach is capable of producing model variants with a wide range of different feedback strengths under climate change (e.g. Webb et al. 2006; Sanderson et al. 2008). Are the ranges comparable with those found in the CMIP3 models for both equilibrium and transient climate change? What are the main drivers of uncertainties in global climate change feedbacks in the two types of ensemble?
3. The total uncertainty in global mean change under, e.g., historical forcing and future SRES scenarios is a combination of uncertainties in feedbacks and uncertainties in radiative forcings. To the extent that the latter can be estimated, what are the differences between radiative forcings in the two ensemble approaches?
4. Finally, are there clear relationships between measures of model error and the magnitudes of climate change feedbacks?
Question 4 is highly relevant when we use ensembles of climate model projections to generate predictions of climate change expressed in terms of PDFs which provide a measure of the uncertainty (or credibility) in that prediction.
We cannot simply form histograms from, or fit statistical distributions to, the output from model simulations of future change. A key stage in forming PDFs is to assign a relative likelihood to each member of the ensemble by comparing simulations of past climate and climate change with observations (e.g. Rougier 2007). If we can clearly deduce that, for example, a model with a very high climate sensitivity performs less well than a model with a lower climate sensitivity when examining a wide range of observational tests, then we have less belief in that higher-sensitivity model. Formally, that model should receive a lower weight when forming a PDF from the ensemble, and for this to be the case, i.e. to be able to distinguish between different models, there should be some relationship between the predictand, say climate sensitivity, and the particular metric. This we call an observational constraint. A particular metric, or more generally a particular collection of metrics, is useful in assessing model fidelity if, and only if, there is some relationship (perhaps indirect) between that set of metrics and the prediction variable of interest.
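The weighting step described above can be sketched in a few lines. The example below is purely illustrative and is not the method used in this paper: it assumes a synthetic ensemble of projected warming values, a hypothetical scalar observational mismatch for each member, and a simple Gaussian error model for converting that mismatch into a relative likelihood.

```python
import numpy as np

# Hypothetical ensemble: a projected warming (K) and an observational
# mismatch score for each member (lower = better fit to observations).
rng = np.random.default_rng(0)
warming = rng.normal(3.0, 1.0, size=100)        # prediction variable
mismatch = rng.normal(1.0, 0.3, size=100) ** 2  # e.g. a normalised RMSE

# Assign each member a relative likelihood from its mismatch
# (a Gaussian error model is one common, simple choice).
weights = np.exp(-0.5 * mismatch)
weights /= weights.sum()

# Weighted mean and 5-95% range of the likelihood-weighted distribution.
mean = np.sum(weights * warming)
order = np.argsort(warming)
cdf = np.cumsum(weights[order])
p05 = warming[order][np.searchsorted(cdf, 0.05)]
p95 = warming[order][np.searchsorted(cdf, 0.95)]
print(f"weighted mean: {mean:.2f} K, 5-95% range: [{p05:.2f}, {p95:.2f}] K")
```

In practice the mismatch would come from multivariate comparisons with observations and the error model would need careful justification; the sketch only shows how relative likelihoods reshape the raw ensemble distribution.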
Furthermore, we may seek predictions of joint PDFs of variables, e.g. future temperature and precipitation change in a particular region. A metric optimised to constrain the PDF of future regional temperature change may not be optimal in constraining the PDF of future precipitation change in that region. Likewise, an observational constraint on climate variables in one region may not provide a constraint on the variable in another remote region.
Murphy et al. (2007, 2009) outline a particular method to produce joint PDFs of future climate change using perturbed physics ensembles and observational constraints. The perturbed physics ensembles described here, together with others documented elsewhere, are combined with a statistical emulator of the model parameter space (see e.g., Rougier et al. 2009 for an example) and a "time-scaling" technique (Harris et al. 2006) which maps equilibrium to transient responses, taking into account any errors that may arise because of a mismatch between the patterns of transient and equilibrium response. Using these tools it is possible to mimic the behaviour of HadCM3 at any choice of parameter values and allow the effective sampling of many more ensemble members than those described here. The prior predictive distributions obtained from the emulated ensemble are then constrained with observations of the time-averaged fields projected onto a truncated multivariate EOF space, and constrained with trends in various simple surface air temperature indices, to produce likelihood-weighted posterior predictive distributions. Murphy et al. (2007, 2009) go further and estimate the impact of structural uncertainty in a term called the discrepancy, which is estimated from the multi-model ensemble, to produce joint PDFs of future changes.
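As an illustration of the emulator idea (and only that: the emulators of Rougier et al. (2009) are much richer statistical models), a minimal sketch might fit a linear map from sampled parameter settings to a scalar model output, such as climate sensitivity, and then evaluate it at many unsampled parameter combinations. All parameter counts and coefficients below are invented for the example.

```python
import numpy as np

# Toy emulator: learn a linear relationship between model parameters
# and an output, then "run" it at parameter settings never simulated.
rng = np.random.default_rng(1)
n_runs, n_params = 50, 5
X = rng.uniform(0, 1, size=(n_runs, n_params))       # sampled parameter settings
true_coef = np.array([2.0, -1.0, 0.5, 0.0, 1.5])     # hypothetical sensitivities
y = 3.0 + X @ true_coef + rng.normal(0, 0.05, n_runs)  # ensemble output + noise

# Least-squares fit using a design matrix with an intercept column.
A = np.column_stack([np.ones(n_runs), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Emulate the output at many new parameter combinations, far more
# cheaply than running the climate model itself.
X_new = rng.uniform(0, 1, size=(10000, n_params))
y_emulated = coef[0] + X_new @ coef[1:]
print(f"emulated ensemble mean: {y_emulated.mean():.2f}")
```

A linear fit is the crudest possible emulator; the principle of replacing expensive model runs with a cheap statistical surrogate, then drawing very large samples from it, is the same as in the more sophisticated approach referenced above.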
Until now, the principal driver for such work has been the quantification of uncertainty and the production of probabilistic projections. We might also use the concept of observational constraints and relative likelihoods of different models to improve models in a more targeted way (see e.g., Jackson et al. 2008). At present we test models during their development phase using a wide variety of different metrics and diagnostics, using different observations and different experiments. If we find a model to be deficient in a particular way (e.g., if surface temperatures are too warm in summer) we devote resources to improving that particular aspect of the model. We rely on our previous experience or beliefs about, first, which variables are the most important and, secondly, how well those variables need to be simulated in order to produce the most accurate predictions. There is a danger in this approach that we might devote significant resources to improving a model in an area which is largely irrelevant for our particular prediction problem of interest. Alternatively, we may neglect a variable which is highly influential in the prediction problem. By systematically relating the errors in the model simulation of present-day and historical climate to uncertainties (errors) in our prediction variable of interest, it should be possible to produce a better priority list for which variables are most important.

The above issues are touched upon in Sect. 5 of the
manuscript, but a more complete analysis, including the use of observations to produce PDFs, will be presented in future publications and is also part of an ongoing programme of research. In the recently released UK Climate Projections (Murphy et al. 2009) the rather complex statistical technique alluded to above is employed to relate model errors to future predictions. As we shall see in Sect. 5, there is no simple metric or diagnostic that provides a clear constraint on predictions of global-mean climate change. That is to say, there is no single field that a model needs to simulate perfectly in order for us to have complete confidence in a prediction from that model: a fact that has been known intuitively by modellers for some time. The list of metrics for testing models is multivariate. It is likely to be incomplete, as there are, in general, more climate variables in a model than are observed. The list is also likely to contain redundant information, in the sense that there are covariances between errors in different fields which mean that not all metrics are independent of each other. The extraction of useful information about climate change from imperfect climate models is likely to be a complex endeavour, on a par with the complexity of climate models themselves or the data-assimilation schemes used in initial-value prediction.
Section 2 of the paper describes the ensemble experiments examined, with a particular focus on the perturbed physics ensemble experiments. Section 3 presents an
analysis and comparison of model errors. In Sect. 4, feedbacks and radiative forcings are contrasted. Section 5 presents a simple analysis of the relationships between model errors and feedback strengths. Finally, Sect. 6 summarises the results of the analysis.
2 Climate model ensembles and variables
2.1 Perturbed physics ensembles
The perturbed physics approach was developed in response to the call for better quantification of uncertainties in climate projections (see e.g., Chapter 14 of the IPCC Third Assessment Report—Moore et al. 2001). The approach involves perturbing the values of uncertain parameters within a single model structure, with the choice and range for the perturbed parameters determined in discussion with colleagues involved in parameterisation development, or by surveys of the modelling literature. In some cases, different variants of physical schemes may also be switched in and out, as well as parameters in those alternative schemes being varied. Any number of experiments that are routinely performed with single models can then be produced in "ensemble mode", subject to constraints on computer time. A significant amount of perturbed physics experimentation has been done with HadCM3 and variants, starting with the work of Murphy et al. (2004) and Stainforth et al. (2005) and continuing with Piani et al. (2005), Barnett et al. (2006), Webb et al. (2006), Knutti et al. (2006), Collins et al. (2006), Harris et al. (2006), Collins et al. (2007), Sanderson et al. (2007, 2008) and Rougier et al. (2009). Other modelling centres are also investigating the approach using GCMs (e.g. Annan et al. 2005; Niehorster et al. 2006) and more simplified models (e.g. Schneider von Deimling et al. 2006), with a view to both understanding the behaviour of their models and quantifying uncertainties in predictions. Sokolov et al. (2009) use a version of the perturbed physics approach to make a comprehensive assessment of future global-scale change, sampling uncertainties in physical, biogeochemical and economic factors.
Here we make use of perturbed physics ensembles produced using in-house supercomputer resources at the Met Office Hadley Centre. Analyses of a much larger set of perturbed experiments performed as part of the climateprediction.net project are presented in other publications (e.g. Piani et al. 2005; Knutti et al. 2006; Sanderson and Piani 2007; Sanderson et al. 2008; Frame 2009). A comparison between the smaller in-house and larger public-resource ensembles performed with the mixed-layer version of HadCM3 is presented in Rougier et al. (2009) in the context of model emulation (see below).
2.1.1 Considerations in the design of perturbed physics ensembles
Given that one of the key strengths of the perturbed-physics approach is the ability to control the design of the ensemble, a design must be produced. However, there are a number of competing factors that might influence that ensemble design:

1. To aid understanding of the results, it may be useful to perturb one model parameter at a time. However, this limits the potential for interactions between uncertainties in different processes, such as clouds and radiation for example, which we might expect to be important.
2. To reduce the risk of over-confidence in predictions, it is necessary to produce model versions with a wide range of baseline climates and climate change feedbacks. This may mean relaxing a small number of the usual strict criteria for producing models, such as the near-balance of the top-of-atmosphere energy fluxes, and may reveal errors in model variables that have been previously compensated for by the adjustment of a number of different parameters and/or the introduction of different representations of processes.
3. In contrast, given limited and expensive computer resources, it may be best to attempt to produce model versions which are somehow "good", perhaps by trying to predict and minimise a collection of simple model metrics such as the root mean squared error characteristics for time-mean climate fields. At the least, we would not want to produce a large number of model versions that we would consider, by normal standards, to be a complete waste of computer resource. The potential issue in producing such "tuned" ensembles is the possibility of double counting model errors when the ensemble is weighted to produce PDFs of climate change. Double counting may lead to over-constrained predictions and the potential for underestimating uncertainty.
4. To facilitate the building of the best emulator (e.g. Rougier et al. 2009), a statistical model which relates model parameters to outputs, it may be necessary to explore a wide range of model parameters and interactions between parameters in ways which aid the building of that emulator. Techniques such as "Latin hypercubes" (e.g. McKay 1979) may be employed, for example. While this may result in model versions which may be considered unacceptable when compared to observational data, they would be down-weighted in any posterior PDF calculation. Their job is to minimise the amount of extrapolation by the emulator outside the sampled parameter space.
5. For more complex versions of the model (e.g. using a dynamical ocean component rather than a mixed-layer, q-flux or slab component) fewer ensemble members are possible, because of the extra resources required to spin up model versions and run scenario experiments.
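The Latin hypercube designs mentioned in point 4 can be generated in a few lines. This is a generic sketch of the standard technique (McKay 1979), not the specific sampling code used for any of the ensembles here: each parameter's range is divided into n equal strata, each stratum is sampled exactly once, and the strata are then shuffled independently across parameters so that one-dimensional coverage is guaranteed for every parameter.

```python
import numpy as np

def latin_hypercube(n_samples, n_params, rng):
    """One random Latin hypercube design on the unit cube: each
    parameter's range is split into n_samples equal strata and each
    stratum is sampled exactly once per parameter."""
    # One random point inside each stratum, for each parameter.
    u = rng.uniform(0, 1, size=(n_samples, n_params))
    strata = (np.arange(n_samples)[:, None] + u) / n_samples
    # Shuffle the strata independently for each parameter, breaking
    # the artificial correlation along the diagonal.
    for j in range(n_params):
        strata[:, j] = rng.permutation(strata[:, j])
    return strata

rng = np.random.default_rng(42)
design = latin_hypercube(16, 3, rng)  # e.g. 16 members, 3 perturbed parameters
```

Each column of `design`, multiplied by the number of members and truncated to an integer, visits every stratum exactly once; the unit-cube values would then be rescaled to each parameter's expert-elicited range.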
No one experimental design is capable of fulfilling all the above design criteria, yet they have all, at some time, guided our work on quantifying uncertainty in the presence of limited computer resources. For this reason we choose to separate our archive of perturbed physics versions of HadCM3 into the different sub-ensembles described below. We call the model HadSM3 when referring to the version of the model with a simplified mixed-layer, q-flux or slab ocean, and use the letter S to prefix the ensemble name. In the case of the version coupled to a dynamical ocean, HadCM3, we use the prefix AO.
2.1.2 Description of perturbed physics ensembles
2.1.2.1 S-PPE-S The ensemble described in Murphy et al. (2004), in which 31 parameters and switches in the atmosphere component of the atmosphere-slab version, HadSM3, are perturbed. Perturbations are made to a single parameter at a time (as denoted by the suffix S in S-PPE-S), either to the minimum or to the maximum of the range specified in consultation with modelling experts, or on/off in the case of a switch. This results in 53 different model versions, including the standard parameter setting as defined in the standard published version of the model (Gordon et al. 2000; Pope et al. 2000), rather than the median or best-guess parameter values. In this design, if a perturbation in one physical scheme has an impact on a process or model variable that is also related to another scheme, there can be no compensation achieved by perturbing a related parameter, as might be done in the model development process. In that sense, the single-perturbation approach might be thought of as the simplest form of model "de-tuning" (Stocker 2004), in that no attempt is made to a priori maximise the model performance when compared to observations (it should be stressed that no systematic tuning of model performance was done to produce the standard parameter settings). The initial purpose of this ensemble was to provide a simple, understandable assessment of the parameter uncertainty in HadSM3. Details of all the parameters perturbed are presented in the appendix to Murphy et al. (2004) and also in Barnett et al. (2006) and Rougier et al. (2009).
2.1.2.2 S-PPE-M This ensemble also utilises the mixed-layer ocean version, HadSM3, but in this case simultaneous "multiple" (suffix M) perturbations are made to the parameters, i.e. all 31 parameters and switches from the S-PPE-S case are perturbed simultaneously. Here, there can be compensation between perturbations to physical processes. In the design of the ensemble, an attempt was made to minimise the average of the root mean squared error of a number of time-averaged model fields while sampling a wide range of surface and atmospheric feedbacks under climate change. This "tuned" design of the ensemble was guided by deriving a linear predictor (based on the S-PPE-S ensemble) relating the 31 parameters of HadSM3 to the climate sensitivity and the Murphy et al. (2004) "Climate Prediction Index" or CPI. Further details of the experimental design are given in Webb et al. (2006), who also examine cloud-feedback processes under climate change in some detail and compare with a multi-model ensemble. In contrast to the S-PPE-S ensemble, the interactive sulphur cycle (Jones et al. 2001) is activated in all ensemble members, although no changes to sulphate emissions are employed. The ensemble contains 129 members, which includes a version with the standard parameter settings but with the interactive sulphur cycle activated.

A particular feature of models with mixed-layer oceans is a cooling instability that can appear during the 1×CO2 and/or 2×CO2 phase (a description of the mechanism for the instability is presented in the supplementary information in Stainforth et al. (2005)). This happens in one of the 129 members, leaving 128 members analysed here.
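The kind of aggregate skill measure that guides such tuning can be illustrated schematically. The sketch below is not the actual Climate Prediction Index of Murphy et al. (2004); it only shows the generic pattern of an area-weighted RMS error per field, normalised by a reference error and averaged across fields, with all fields, grids and reference values invented for the example.

```python
import numpy as np

def rms_error(model, obs, lat):
    """Area-weighted RMS difference between a model field and an
    observed field on a regular latitude-longitude grid."""
    w = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(model)
    return np.sqrt(np.sum(w * (model - obs) ** 2) / np.sum(w))

rng = np.random.default_rng(3)
lat = np.linspace(-87.5, 87.5, 36)

# Hypothetical time-mean fields (e.g. temperature, precipitation,
# pressure) for one model version, each with synthetic "observations".
scores = {}
for name in ["tas", "pr", "psl"]:
    obs = rng.normal(0, 1, size=(36, 72))
    model = obs + rng.normal(0, 0.2, size=(36, 72))
    ref = 0.5  # hypothetical reference error used for normalisation
    scores[name] = rms_error(model, obs, lat) / ref

index = np.mean(list(scores.values()))  # CPI-like multi-field average
print(f"CPI-like index: {index:.3f}")
```

A single scalar of this kind is convenient for guiding ensemble design, but, as discussed in Sect. 5, no such simple aggregate provides a clear constraint on global-mean climate change by itself.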
2.1.2.3 S-PPE-E An additional 103 HadSM3 experiments are grouped into this ensemble, using the same parameters perturbed in S-PPE-S and S-PPE-M. A small number of experiments were performed to make initial estimates of the non-linearity of parameter combinations in Murphy et al. (2004) (see the appendix of that paper), but the majority of the members were produced to explore parts of parameter space not covered by the other HadSM3 ensembles, for use in the building of an emulator of the parameter space of the atmosphere component of the model (further details can be found in Rougier et al. (2009) and Murphy et al. (2009)). The generic function of an emulator is to map the parameters of the model onto variables of interest and, as a consequence, there is a requirement to explore parameter space without recourse to potential model validity. Thus, in contrast to the "tuned" S-PPE-M ensemble, no attempt is made to minimise root-mean-squared (RMS) errors, for example; the exploration of parameter space is the main motivation for the large majority of the members of this ensemble. The 103 are a subset of a larger ensemble in which 13 parameter combinations suffer the cooling instability described above, and so are not analysed.
For each member of the mixed-layer model version
ensembles, a calibration phase (from 10 to 25 years
depending on decadal drift and variability) is performed, and the heat convergence from within the mixed-layer component is averaged into monthly values and kept fixed in both the 1×CO2 and 2×CO2 experiments. Unfortunately, a coding error was subsequently discovered in the S-PPE-M experiments, and also in some members of the S-PPE-E ensemble, such that the heat convergence field was specified from only 1 year of the calibration phase, rather than being averaged over many years. This has the potential impact of introducing noise into the heat convergence field, which may drive the SSTs in the 1×CO2 phase away from the seasonally varying climatology. As we shall see in later analysis, the impact is on average rather small, in particular when one contrasts errors in the SST fields in the perturbed mixed-layer experiments with those in non-flux-adjusted coupled model runs. Repeat experiments, in which heat convergence fields averaged over 10–20 years are applied to the members with the largest SST noise, show no significant differences in global-scale features such as RMS errors for non-SST-related variables, nor in the components of the atmospheric and surface feedbacks at 2×CO2. The model versions are therefore suitable for quantifying uncertainty and examining feedbacks.
2.1.2.4 AO-PPE-A This ensemble uses the fully coupled version of HadCM3, but with perturbations only to parameters in the atmosphere component (an updated version of the ensemble described in Collins et al. 2006). The standard settings and 16 combinations of parameter settings selected from the S-PPE-M ensemble are used, in order to sample a range of surface and atmosphere feedbacks under transient climate change. Members are selected based on an approximately uniform sampling of the climate sensitivity of the larger S-PPE-M ensemble, while ensuring that a wide range of different parameter settings is sampled. The choice was made by examining the table of sensitivities and parameters in S-PPE-M, rather than using any numerical algorithms. In addition, the interactive sulphur cycle is activated, as it is in the S-PPE-M ensemble, but in contrast, sulphate emissions are varied in some simulations (see Sect. 2.3). Murphy et al. (2009) also describe an ensemble with perturbations to parameters within the HadCM3 sulphur cycle. The results from this ensemble will be described elsewhere. Flux adjustments are employed in these coupled model simulations to: (1) prevent model drift that would result from perturbations to the parameters that lead to top-of-atmosphere net flux imbalances, and (2) improve the credibility of the simulations in simulating regional climate change and feedbacks. The limitations of coupled modelling in the presence of flux adjustments have been discussed widely, e.g. Dijkstra and Neelin (1999). Here, the similarity of baseline surface-climate states facilitates the combination of the HadSM3 and HadCM3 ensembles to produce "time-scaled" responses for a larger number of combinations of model parameters (Harris et al. 2006).

The spin-up technique is similar to that described in Collins et al. (2006), except that a less vigorous salinity relaxation is employed during the Haney-forced phase (the relaxation coefficients are those used by Tziperman et al. (1994): 30 and 120 days for temperature and salinity, respectively), which significantly alleviates the problem of SST and sea-ice biases found in the Collins et al. (2006) ensemble (Fig. 1). The 16 perturbed sets of parameter combinations are selected from the 128-member S-PPE-M, although the combinations are not the same as those shown in Table 1 of Collins et al. (2006). For historical reasons, the sea-ice scheme in HadCM3 is contained in the atmosphere component of the model, and parameters in the scheme are perturbed in line with the equivalent S-PPE-M ensemble.
2.1.2.5 AO-PPE-O The fully coupled HadCM3 is used with the standard atmosphere settings (with interactive sulphur cycle) but with perturbations to parameters and schemes in the ocean component. The ensemble extends the work of Collins et al. (2007) and Brierley et al. (2009, 2010), who provide details of the physical schemes in HadCM3 that were surveyed for parameters and switches to perturb. Briefly, parameters in the schemes which control horizontal mixing of heat and momentum, the vertical diffusivity of heat, isopycnal mixing, mixed-layer processes and water type are varied. A Latin hypercube design is employed, which is efficient in permitting interactions between perturbations to parameters. The same spin-up technique used in the AO-PPE-A ensemble is employed to generate flux-adjustment terms. This is in contrast to the experiments described in Collins et al. (2007), where no flux adjustments were employed. In that study it was found that model drift can introduce biases in surface climate which lead to differences in atmosphere/surface feedbacks under climate change. Such biases were considered undesirable here, as we wish to isolate the impact of ocean parameter perturbations. The use of flux adjustments also facilitates comparison with the ensembles which employ a slab ocean and with the AO-PPE-A ensemble.
2.2 Multi-model ensembles
Much has been written about the CMIP3 archive of model output, and the reader is referred to Meehl et al. (2007b) for a history and to the PCMDI web site for a constantly evolving list of papers based on the archive. Here we also augment the analysis by using archived output from the CFMIP project (e.g. Webb et al. 2006) in the case of model
versions which use mixed-layer ocean formulations. We denote the multi-model ensembles used as follows, to be consistent with the notation adopted above.
2.2.1 S-MME
Different atmosphere models coupled to simple mixed-layer oceans. Model output is extracted from the CFMIP (see e.g., Webb et al. 2006) and WCRP CMIP3 databases at PCMDI (Meehl et al. 2007b). 20-year averages from the 1×CO2 and 2×CO2 experiments are used. The models used are shown in Table 1 and the ensemble consists of 16 members.
2.2.2 AO-MME
We use coupled model output from the 23 models in the WCRP CMIP3 database. Again, the models used are shown in Table 1. There is a significant overlap in model versions between the S-MME and AO-MME ensembles.
Fig. 1 Annual mean SST biases in fixed pre-industrial CO2 simulations with HadCM3 with standard parameter settings. a The non-flux-adjusted version of the model submitted to CMIP3. b The version of the model with interactive sulphur cycle and flux adjustments reported in Collins et al. (2006). c The standard version of the model with interactive sulphur cycle and adjusted Haney relaxation coefficients used in this paper in generating ensembles AO-PPE-A and AO-PPE-O. Adjusting the Haney coefficients leads to a reduction in SST biases in all coupled-model simulations
Table 1 Models used in the multi-model ensembles in this study

Model name                    Atmos-slab   Atmos-ocean
BCC-CM1                                    ×
BCCR-BCM2.0                                ×
CCSM3                         ×            ×
CGCM3.1(T47)                  ×            ×
CGCM3.1(T63)                  ×            ×
CNRM-CM3                                   ×
CSIRO-Mk3.0                   ×            ×
ECHAM5/MPI-OM                 ×            ×
ECHO-G                                     ×
FGOALS-g1.0                                ×
GFDL-CM2.0                    ×            ×
GFDL-CM2.1                                 ×
GISS-EH                                    ×
GISS-ER                       ×            ×
INGV-SXG                                   ×
INM-CM3.0                     ×            ×
IPSL-CM4                      ×            ×
MIROC3.2 (hires)              ×            ×
MIROC3.2 (medres)             ×            ×
MIROC3.2 (high sensitivity)   ×
MRI-CGCM2.3.2                 ×            ×
PCM                                        ×
UKMO-HadCM3                                ×
UKMO-HadGEM1                  ×            ×
UIUC                          ×
HadCM4                        ×

The slab-ocean version of UKMO-HadCM3 is not selected as a member of the multi-model ensemble as that is included as a member of the perturbed-physics ensembles. However, the coupled version is included, as this version of HadCM3 is run without flux adjustments and hence may be considered to be different from the flux-adjusted perturbed-physics coupled-model standard member
There are a number of data limitations in this archive and analysis is only performed on the subset of multi-models for which the data exists and is suitable.
2.3 Experiments and variables
The sheer volume of data prevents us from examining all variables from all experiments run using the model ensembles described above. Hence we focus on the following set of experiments and variables because (1) there exists a common set of core experiments that can be easily and fairly compared and (2) they allow us to examine the main feedbacks and forcings under commonly used scenarios for climate change. The experiments examined are:
1. The 1× and 2×CO2 equilibrium runs in the case of all models with mixed-layer/q-flux/slab oceans. For some model experiments 1×CO2 is taken to mean pre-industrial levels, while in others it is taken to mean present day, or some other level (year 1900 in the case of the MIROC models, for example). We make no practical distinction here, as the differences between feedbacks dominate the response at 2×CO2 and because the applied forcing due to doubling does not depend significantly on the chosen 1×CO2 baseline value.
2. Pre-industrial (and in the case of some AO-MME members, present day) control experiments with no external forcing, and experiments with a 1% per year compounded increase in CO2. We use 80 years of output from control experiments for both MME and PPE members and 80 years of 1% per year experiments which, for most MME members, are taken from the experiment in which CO2 continues to increase after year 70 (the ''1%to4×'' experiments). For a handful of MME members, this experiment was not available and the run in which CO2 is stabilised past the 70-year mark is employed (''1%to2×'' experiments). In practice, this makes little difference to the calculation of the transient climate response, effective climate feedback parameter, and other quantities of interest.
3. Experiments forced with historical changes in radiatively important factors. For the PPE ensemble experiments, historical changes in CO2, methane and some minor greenhouse gases are used, together with changes in sulphate-aerosol emissions and variations in solar irradiance and volcanic optical depth. The origin of the anthropogenic and natural forcing is the same as that in experiments using a subsequent version of the Met Office Hadley Centre climate model (HadGEM1), and is described in Stott et al. (2006). For some of the multi-model members, both anthropogenic and natural factors are included, but for others only anthropogenic factors are used for the ''20c3m'' simulation (see e.g. Forster and Taylor 2006 and Sect. 4.4 later).
4. Experiments forced with future changes in anthropogenic greenhouse gases and aerosols under the SRES A1B scenario. For the AO-PPE members, the solar variability is prescribed by repeating the solar cycle in the period 1993–2003 for the years 2004–2100 of the scenario. The future volcanic forcing is set constant by holding the volcanic optical depth to the year-2000 values (close to that in the AO-PPE-A control simulations). A range of options appears to be used in the AO-MME. See Forster and Taylor (2006) for more information on both historical and A1B forcings in the WCRP CMIP3 ensemble.
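The transient climate response mentioned in (2) is conventionally diagnosed as the 20-year mean global temperature anomaly centred on the time of CO2 doubling (year 70) in the 1% per year experiment. A minimal sketch of that diagnostic, using a synthetic temperature series in place of real model output:

```python
# Illustrative sketch of the conventional transient climate response (TCR)
# diagnostic: the 20-year mean global temperature anomaly centred on the
# time of CO2 doubling (year 70) of a 1%-per-year CO2 increase run.
# The linear series below is a synthetic stand-in for model output.
import numpy as np

years = np.arange(1, 141)            # 140 years of a 1%-to-4x style run
t_anom = 0.02 * years                # placeholder warming in K (synthetic)

def transient_climate_response(annual_anomaly, doubling_year=70, window=20):
    """20-year mean anomaly centred on the doubling year (years 61-80)."""
    start = doubling_year - window // 2          # year 61 -> index 60
    return float(np.mean(annual_anomaly[start:start + window]))

tcr = transient_climate_response(t_anom)
print(round(tcr, 3))  # mean of 0.02 * (61..80) = 0.02 * 70.5 = 1.41
```

With real output, `annual_anomaly` would be the global, area-weighted annual-mean temperature difference between the 1%/year run and the corresponding control run.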
We also make use of a very long multi-century simulation of the standard non-flux-adjusted coupled version of HadCM3 with fixed concentrations of greenhouse gases. This is in order to estimate the natural variability of model error, climate change feedback parameters and radiative forcing. While multi-century fixed-forcing experiments with other models may yield slightly different estimates of such variability, as we see below, it is common for inter-model or inter-model-version differences to dominate, so the use of output from just one multi-century model experiment is valid.
The list of variables examined is: surface air temperature (SAT), sea surface temperature (SST), average precipitation rate, net top-of-atmosphere (TOA) energy fluxes and the shortwave (SW) and longwave (LW) components, TOA cloud radiative forcing (CRF) SW and LW components, mean sea level pressure (MSLP), cloud amount, surface sensible and latent heat fluxes, surface SW and LW fluxes, and zonal mean relative humidity. The use of TOA cloud radiative forcing, rather than simply examining the clear-sky fluxes, is preferable as in regions of sea ice and land ice/snow small differences in the position of the edge of the ice can dominate the calculation of fields such as root-mean-squared errors. By differencing the all- and clear-sky fluxes, the relative difference in model performance in terms of the radiative effects of clouds is better captured. We use only time-averaged seasonal and annual fields so that atmosphere-slab and fully coupled simulations may be compared. This list thus represents a combination of impact-relevant variables and variables that have been shown to be linked to climate change feedback processes. They are also the list of variables used by Murphy et al. (2009) in constraining PDFs of future change using the ensemble output described here.

Observational data is taken from a number of sources
indicated in Table 2. Only one data set is used to calculate model error terms, but the other fields are used to produce an order-of-magnitude estimate of observational error in the calculation, as described below. In some cases gridded observational data are derived from the same raw point information and simply use different statistical techniques to produce the gridded product. The treatment of uncertainties in observations remains a limitation of this study, as comprehensive estimates of uncertainty simply do not exist for most variables. Nevertheless, this does represent an advance on previous studies (e.g. Gleckler et al. 2008).
3 Model ‘‘Errors’’
The purpose of this section is to make comparisons between the modelled and observed mean climate of the members of the different ensembles in order to contrast the perturbed physics and multi-model approaches. There are a number of simple and widely used metrics which may be used to quantitatively compare models with observed climate variables (e.g. Taylor 2001). It is not possible here to make a completely comprehensive comparison of all variables, with all possible observational sources, using all possible metrics. We seek rather to perform an analysis and inter-comparison of some of the main features of observed climate between the two different approaches. The analysis uses the climate variables outlined above, which are chosen (based on previous experience) on the basis of their user-relevance and because of their key role in physical feedbacks under climate change.
For each of the observed climate variables considered, we interpolate both the observations and the multi-model output onto the spatial grid of the perturbed physics ensemble. This results in the minimum number of interpolation steps because of the large number of perturbed physics members. The global mean bias in a climate variable is defined as the area-weighted globally averaged sum of the grid-box differences between the 20-year or 80-year time-averaged 1×CO2, pre-industrial or present-day control climates (for slab and coupled models, respectively) and the observed climate variable. The sum is calculated only on grid points at which the observed time-averaged field exists. The root mean squared (RMS) error, e, is calculated similarly but with the global mean bias removed before the calculation (sometimes called the centred RMS error; Taylor 2001). The same step is performed when calculating the correlation between the observed and modelled fields. These types of metrics are in the spirit of the ''Taylor Diagram'' cited above. The use of either the pre-industrial or present-day control run is related to that chosen by the different modelling groups for the initial state of the 1%/year CO2 increase experiment. As stated above, while there are detectable differences in metrics computed from the two differently specified control runs for a single model (e.g. Reichler and Kim 2008), the generic model error tends to dominate, so there is little sensitivity in the final model comparison. We calculate the bias, RMS error and correlation for both seasonal and annual-mean fields but present only the annual-mean
values for reasons of space and because they are representative of generic errors in different models. We note, though, that in terms of constraining model predictions using observations, information from the annual cycle may be of some use (e.g. Knutti et al. 2006).

Table 2 Observational data employed in this study to assess model errors

Variable                            Observational field                                        Reference
Land surface air temperature        HadCRUT3                                                   Brohan et al. (2006)
                                                                                               Legates and Willmot (1990)
Sea surface temperature             HadISST1.1 1871-1900 (used to calibrate flux adjustment)   Rayner et al. (2003)
                                    NCDC SST                                                   Smith and Reynolds (2004)
                                    GISS SST                                                   Hansen et al. (1996)
Precipitation                       CMAP                                                       Xie and Arkin (1997)
                                    GPCP                                                       Adler et al. (2003)
Top-of-atmosphere radiative fluxes  ERBE                                                       Harrison et al. (1990)
                                    CERES                                                      Wielicki et al. (1996)
                                    ISCCP FD                                                   Rossow and Lacis (1990)
Mean sea level pressure             HadSLP2                                                    Allan and Ansell (2006)
                                    ERA40                                                      Uppala et al. (2005)
Cloud amount                        ISCCP D2                                                   Rossow et al. (1996)
                                    HIRS                                                       Wylie et al. (1994)
Surface fluxes                      SOC                                                        Grist and Josey (2003)
                                    DaSilva                                                    Da Silva et al. (1994)
Zonal mean relative humidity        ERA40                                                      Uppala et al. (2005)
                                    AIRS version 5                                             Aumann et al. (2003)

The principal fields used are indicated in bold. Other observed fields are used to estimate observational errors
In order to get an order-of-magnitude estimate of the observational error, we compute biases, RMS differences and correlations between all pairs of the observational fields listed in Table 2. The maximum bias and RMSE and the minimum correlation are then used as a crude estimate of the likely magnitude of the error in the observations. In the absence of numerical estimates of both systematic and random errors in the majority of the observational fields, this is the simplest approach. A conclusion of this study is that more comprehensive estimates of errors in observational data sets are required in order to quantify uncertainty in model projections of future climate change.

In the case of the AO-PPE-O ensemble, all bias, RMS error and correlation fields are indistinguishable from those of the standard version of the model, presumably because of the use of identical atmosphere parameters and flux adjustments, so that ensemble is not discussed extensively in what follows. The values of the error metrics for the AO-PPE-O ensemble are included in the figures for completeness.
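The pairwise observational-spread estimate can be sketched as follows (illustrative only; the synthetic arrays stand in for the gridded products of Table 2, and here only the minimum pattern correlation is shown as the worst-case measure):

```python
# Illustrative sketch of the crude observational-uncertainty estimate:
# compute a metric between every pair of observational data sets and take
# the worst case (here, the minimum centred pattern correlation).
import itertools
import numpy as np

rng = np.random.default_rng(1)
base = rng.normal(size=(10, 20))
# Three synthetic "data sets": the same underlying field plus noise.
datasets = [base + rng.normal(scale=s, size=base.shape) for s in (0.1, 0.2, 0.5)]

def pattern_correlation(a, b):
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2)))

pairwise = [pattern_correlation(x, y)
            for x, y in itertools.combinations(datasets, 2)]
obs_error_bound = min(pairwise)   # worst agreement among observational pairs
print(f"{obs_error_bound:.3f} from {len(pairwise)} pairs")
```

The analogous worst-case bounds for bias and RMSE would use `max` over the same set of pairs.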
3.1 Errors in two-dimensional time-averaged fields
Examining land surface air temperature errors in the perturbed-physics model versions with slab-ocean components first, we see biases and RMS errors of the order of a few degrees globally (Fig. 2). In the case of the ''de-tuned'' S-PPE-S ensemble with only single parameters perturbed, land surface temperature biases are exclusively negative when compared to the HadCRUT3 observational data set, with the standard model versions placed towards the end of the distribution which is closest to observations. In the case of the ''tuned'' S-PPE-M ensemble, there is a wider spread of biases than in the model versions with only one single parameter perturbed, and positive values are evident. A similar range of RMS errors is evident in the two ensembles, reflecting the optimisation of RMS errors in the ensemble design (see above and Webb et al. 2006). Bigger RMS errors are seen in the S-PPE-E ensemble, which explores more regions of parameter space.
In the slab-ocean multi-model ensemble, S-MME, we see a similar range of land SAT biases as in the case of the perturbed physics ensembles, but a somewhat wider range of RMS errors. It is possible that the specification of different surface boundary conditions, which may impact surface air temperature in the multi-model ensemble, promotes a wider range of spatial patterns of surface air temperature. Fields such as orographic height, vegetation and soil properties are identical in each of the members of the perturbed physics ensembles, although some surface-related processes such as the roughness length are perturbed; see the appendix to Murphy et al. (2004). We also note at this point that correlation scores are of little use when comparing land surface air temperatures in models, as they are close to unity for all model versions, being dominated by the pole-to-equator temperature gradient.
SSTs in models with mixed-layer or slab oceans are tied more closely to observations because of the calibration phase in which the implied ocean heat transports are calculated. The exception is some members of the S-PPE-M and S-PPE-E ensembles where, while part of the spread in biases and RMS errors is due to the multiple-parameter perturbations, part may also be attributed to the aforementioned error that was inadvertently introduced into the calculation of the implied heat transports. Despite this, both SST biases and RMS errors are of a similar magnitude in the slab-ocean perturbed physics and multi-model ensembles and are in many cases smaller than those errors seen in the non-flux-adjusted CMIP3 coupled models (AO-MME). As mentioned above, we have re-run a number of experiments where noise in the calculation of the slab-model heat flux convergence fields was present and found that this has a relatively small impact on global error characteristics and feedbacks.

Turning to the coupled model ensemble experiments,
the range of biases in SST is generally smaller in both the atmosphere-parameter-perturbed (AO-PPE-A) and ocean-parameter-perturbed (AO-PPE-O) ensembles in comparison with the coupled multi-model ensemble (AO-MME). Similarly, RMS errors are smaller. This is because of the exclusive use of flux adjustments in the former, which tend to limit (but not eliminate) the formation of SST errors. Perhaps surprisingly, however, the range of land SAT biases is also smaller in the flux-adjusted coupled PPE simulations than in the multi-model case and, correspondingly,
Fig. 2 Bias, centred root mean squared errors (RMSE) and correlations between two-dimensional time-mean modelled and observed fields. From top to bottom: land surface air temperatures (SAT), sea surface temperatures (SST), precipitation, net top-of-atmosphere (TOA) fluxes (positive incoming), outgoing SW fluxes, outgoing LW fluxes, outgoing SW cloud forcing, outgoing LW cloud forcing, mean sea level pressure (MSLP), cloud amount, surface sensible heat flux, surface latent heat flux, surface SW fluxes, surface LW fluxes and zonal mean relative humidity. Different ensembles (S-PPE-E, etc., see Sect. 2.1) are indicated and one dot is plotted for each ensemble member. The light blue dots show the bias, RMSE and correlation for the ensemble mean of all the models in the ensemble. The red dot indicates the experiment with standard HadCM3 parameter settings, flux adjustments and interactive sulphur cycle. The light grey shading represents an estimate of the uncertainty in observational fields (see text). The dark grey shading indicates the mean and ±2SD of 20- or 80-year means of the value calculated from a multi-century integration of the non-flux-adjusted version of HadCM3 and hence gives an order-of-magnitude estimate of natural variability in the calculated errors
the land surface air temperature RMS errors are generally smaller than those seen in many of the non-flux-adjusted coupled multi-model members. There are reasonably large top-of-atmosphere net flux imbalances in some of the AO-PPE-A members which might be expected to lead to large land SAT errors, but it seems that having better ocean SSTs can influence the land surface temperatures in some way, as perhaps indicated by studies of land–sea contrast which find a seemingly strong relationship under climate change (e.g. Lambert and Chiang 2007; Sutton et al. 2007; Joshi et al. 2008). It seems that flux adjustment of SSTs can also lead to better simulation of land temperatures, at least by these gross measures.
Global mean biases in precipitation in the slab-model ensembles follow a similar pattern to those in global land surface air temperature and SST in the different ensembles, except that the S-MME has a relatively wide range of biases compared with any of the slab-ocean perturbed physics ensembles. The range of global precipitation biases in AO-PPE-A is smaller than the range seen in AO-MME, with the former being more consistent with biases seen in the slab-ocean ensembles. Under climate change scenarios, model differences in changes in global mean precipitation tend to be positively correlated with differences in changes in global mean temperature across ensembles via their correlation with lower-tropospheric water vapour (e.g. Held and Soden 2006). Here there is no simple relationship between global mean SST or global mean surface air temperature and global mean precipitation across present day/pre-industrial simulations in either the PPE or MME equilibrium experiments, suggesting that other factors are at play. Different representations of the effects of aerosol particles are one potential candidate for explaining the lack of correlation between global mean biases in precipitation and temperature.
Looking at errors in other surface fields, we note some relatively large negative biases in mean sea level pressure (MSLP) in the coupled perturbed physics ensembles (AO-PPE-A and AO-PPE-O). This is due to a numerical drift in atmospheric mass during the spin-up phase of those ensemble members, which was subsequently corrected during the running of the control and scenario experiments analysed here. Such a global mean bias does not impact the spatial pattern of MSLP, and thus the RMS errors in MSLP in those ensembles are small in comparison to those seen in some of the multi-model ensemble members. The spatial pattern of MSLP is a leading-order indicator of the horizontal circulation in the different models and model versions, so the absolute value can be justifiably corrected a posteriori in, for example, impacts studies, if required.

For the surface sensible heat flux, the range of both biases and RMS errors is generally smaller in the perturbed physics ensembles in comparison with the multi-model ensembles. In the case of surface latent heat fluxes, the ranges are more comparable and are generally larger than the sensible heat flux errors.
As indicated above, relatively large ranges of TOA flux biases, compared to, for example, the TOA forcing from a doubling of CO2, are evident in the models with slab-ocean components; more so in the case of the perturbed physics ensembles (by design), but also in the case of the multi-model slab-ocean ensemble. When coupling to a slab-ocean model, a non-zero ocean heat convergence term is generally permitted and counters the effect of a non-zero TOA flux imbalance. We permit the existence of relatively large TOA imbalances in the perturbed physics ensembles in order to explore more fully the model parameter space, and also note that imbalances might also arise because of missing or structurally deficient processes (a more complete justification and discussion is presented in Collins et al. (2006)). The largest TOA imbalances are found in the S-PPE-E ensemble, in which parts of parameter space not explored by the other ensembles are sampled. It is interesting to note that the algorithm used to pick out these additional experiments (see Rougier et al. 2009) tends to favour models with negative incoming net TOA biases. These additional experiments were largely designed to inform the building of an emulator of the parameter space of the model and should not necessarily be viewed as being intended to have realistic climates in comparison to, say, the model versions in the S-PPE-M ensemble, which were designed to have small RMS errors. As we shall see later, however, it is possible to span a wide range of global climate change feedbacks with models which are close to radiative balance at the TOA (Fig. 10, Sect. 5).
The TOA-error situation for the coupled AO ensembles is rather different. We speculate that the members of the AO-MME ensemble have been developed to produce a net TOA flux close to zero, to avoid climate drifts and the use of flux adjustments; hence the existence of relatively small global-mean biases in that ensemble. There is a strong anti-correlation (coefficient of −0.8) between global mean biases in SW TOA fluxes and global mean biases in LW TOA fluxes which seems to limit biases in the net TOA flux in the coupled multi-model ensemble. With the exception of the INM-CM3.0 model, this strong anti-correlation is also seen in the equivalent slab-ocean versions of the multi-model ensemble. No such anti-correlation is evident in the perturbed physics AO-PPE-A, and none would have been expected because of the design of the ensemble.
The picture for the TOA SW fluxes is similar to that of the net TOA, with a larger range of biases in the models coupled to slab oceans, and in particular some large biases in some of the model versions in the S-PPE-E ensemble. A smaller range of biases is seen in the coupled AO models, with the AO-MME showing the smallest range. The LW situation is slightly different, however. The S-MME, S-PPE-S and S-PPE-M ensembles show a more similar range of smaller biases than in the SW case and, in particular, the AO-MME and AO-PPE-A ensembles have a similar and smaller range (note the difference in scale on the x-axis in the panels of Fig. 2). It seems that the SW biases dominate the total spread in TOA biases in the slab-ocean perturbed physics sub-ensembles; absolute correlation coefficients are greater than 0.9 (i.e. less than −0.9) between net TOA and SW biases for all slab-model perturbed physics ensembles. Isolating the components of the fluxes associated with the cloud radiative forcing (labelled SW CF in Fig. 2), we find that this is the main driver of the spread in total TOA SW biases in the PPE ensembles. Indeed, it appears that cloud forcing biases and RMS errors are of a similar magnitude to the biases and RMS errors in the total fluxes, indicating a major role for clouds in determining model errors in energy fluxes. We see global mean biases and RMS errors of the order of 10% and greater in the fractional cloud amount in both perturbed physics and multi-model ensembles and we return to this point later (Sect. 5, Fig. 10). We also note here that the picture for SW and LW TOA flux errors is reflected at the surface, with very similar patterns of biases and RMS errors for the different ensembles.

Finally, we examine errors in the zonal mean relative
humidity fields in the ensemble experiments. Here we see a relatively small range of biases in all perturbed physics ensembles in comparison to the relatively wide range seen in the multi-model slab-ocean and coupled ensembles. The ranges of RMS errors between the PPE and MME ensembles are more comparable, however.
A motivation for the quantification and comparison of model errors in this way is to use the information to assign relative levels of credibility to different members of the different ensembles, with the ultimate aim of producing probabilistic projections. It is clear from Fig. 2 that, for the majority of variables, the biases and RMS errors in the models are bigger than the crude estimate of uncertainty that can be attached to the observational data sets, and much bigger than that which can be attributed to natural variability. Nevertheless, differences between observational data sets are large enough for some variables to make it difficult to distinguish between different models or different model versions, which would, in turn, make it difficult to assign a relative likelihood. For example, in the case of total outgoing SW flux biases, it is clear that a model with a bias of 40 W m−2 is inconsistent with the observations, but what can we say about two models, one of which has a bias of −5 W m−2 and the other of which has a bias of +5 W m−2? Both are, in some sense, consistent with the uncertainty in outgoing SW flux measures. Such uncertainty presents a considerable challenge when both developing models and constraining ensembles of models with observations. We discuss this issue further in Sect. 5.
It is clear from Fig. 2 that, using the perturbed physics approach, it is possible to sample model versions in which there are biases and root mean squared errors in mean climate fields which are comparable to those found in the multi-model ensembles. The key point is that, in the case of the perturbed physics ensembles, it is possible to have some control over the error characteristics of the ensemble members one produces. In the next section we discuss some further characteristics of model errors in the two types of ensemble.
3.2 Similarity of model errors
The relative similarity of model errors is of interest as, in many multi-model exercises, it is often observed that ensemble-mean forecasts or ensemble-mean climatologies have greater skill/fidelity than those forecasts or climatologies produced with any individual model (e.g. Lambert and Boer 2001; Hagedorn et al. 2005; Reichler and Kim 2008; Gleckler et al. 2008). It can be seen in Fig. 2 (blue dots) that, for the multi-model ensembles examined here, the ensemble-mean RMS error is in many cases smaller than the smallest RMS error of any individual model of the ensemble. For the perturbed physics ensembles, this is not always the case. We might suspect that the multi-model approach would be characterised by a wide spread of spatial distributions of model errors, imprinted by the different structural approaches, whereas the perturbed-physics approach would be characterised by very similar spatial distributions of errors related to the single model structure.

To investigate this we can make use of a simple spatial
correlation measure to look at the differences between spatial patterns of errors in different models (e.g. Jun et al. 2008). Spatial difference fields (i.e., model two-dimensional mean fields minus observations) do not tend to be dominated by large latitudinal gradients, which result in the near-unity correlation scores evident in Fig. 2 for many fields; hence they are of use here. Figure 3 shows frequency distributions of intra-ensemble error correlations for the ensembles and variables considered previously. In each case, an n by n matrix of the correlations between the spatial errors of all pairs of ensemble members is computed for the n-member ensemble in question. The histograms show the relative occurrence of values of those correlation coefficients in different bins of width 0.1, computed over the lower triangle of the matrix and excluding the unit values on the diagonal. The mean and spread of the histogram provide information on the similarity of spatial error patterns within the ensemble.
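The Fig. 3 diagnostic can be sketched as follows (an illustrative example with synthetic error fields standing in for real model-minus-observation maps):

```python
# Illustrative sketch: correlate the spatial error patterns of every pair of
# ensemble members and histogram the lower-triangle correlation coefficients
# in bins of width 0.1, excluding the unit values on the diagonal.
import numpy as np

rng = np.random.default_rng(2)
n_members, shape = 5, (10, 20)
common = rng.normal(size=shape)                     # shared error structure
errors = [common + 0.5 * rng.normal(size=shape) for _ in range(n_members)]

flat = np.array([(e - e.mean()).ravel() for e in errors])
corr = np.corrcoef(flat)                            # n x n correlation matrix
lower = corr[np.tril_indices(n_members, k=-1)]      # exclude the diagonal
counts, edges = np.histogram(lower, bins=np.arange(-1.0, 1.01, 0.1))
print(len(lower), counts.sum())                     # n(n-1)/2 = 10 pairs
```

Because the synthetic members share a common error component, the pairwise correlations cluster well above zero, mimicking the behaviour described in the text.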
We can produce a similar diagnostic for non-overlapping sections of the multi-millennial control run of HadCM3 to test for the effects of natural variability. For all the variables considered in Fig. 3, correlations between spatial errors in different sections of that long run always lie in the bin 0.9–1.0 (this is the case for both 80-year and 20-year averages); hence the structure of the intra-ensemble error similarity in Fig. 3 may be interpreted as real differences in spatial error structures between ensemble members.
For the multi-model ensembles with slab-ocean components and with dynamic oceans (S-MME and AO-MME), there are relatively wide distributions of spatial patterns of model error, with error correlations distributed around an average correlation of approximately 0.5 for most variables. There is little evidence of very low or negative correlations, which suggests that, on a global scale, models share some commonality of error patterns, although regionally errors can be of a different sign. Although observational data sets can suffer from global and regional biases and random errors, repeating Fig. 3 by selecting a random observational data set (as described in the above section) for each element of the correlation matrix produces no qualitative change to the figure. This leads us to conclude that common errors in models do not arise because of differences in observational data sets, unless it is the case that all the observational data sets considered share the same regional biases. We reiterate, though, that the treatment of observational error by the simple sampling of available data sets is rather crude.
In the case of the perturbed physics ensembles, there is a tendency for the distributions of spatial correlations to be skewed more towards unity, i.e. more similar spatial patterns of error across the ensemble than in the case of the multi-model ensembles. However, it is not universally the case that the perturbed physics approach results in a distribution of near-identical patterns of model errors using this measure. In the case of the S-PPE-E, and to a certain extent the S-PPE-M, the distributions of spatial correlations
Fig. 3 Distributions of intra-ensemble spatial correlations of annual mean model errors for the climate variables and ensembles considered. The name of each ensemble is indicated above the individual panels and the variables are indicated on the abscissa. The bin size is 0.1 of correlation coefficient in each case
are more like those computed from the multi-model ensembles, with a larger spread of correlations and average correlations much less than unity. This suggests that the perturbed physics approach can be used to sample a relatively wide range of different baseline climates. Nevertheless, there is clearly an "imprint" of the baseline model on the spatial patterns of errors.
A further diagnostic of spatial error characteristics can be derived from the root mean squared error statistics of the two-dimensional fields. If we let e^2 be the total mean squared error (MSE) in a single climate field in an ensemble member (with the global mean bias removed as above), we may decompose this total error into a systematic component, e_s^2, a random component, e_r^2, and a component due to natural variability, e_n^2:

e^2 = e_s^2 + e_r^2 + e_n^2.
The systematic component, e_s^2, is defined as the mean squared error of the ensemble average of that particular two-dimensional variable. In this case, the term "systematic" refers to the error which is common to all models in the ensemble. The component due to natural variability, e_n^2, is that due to taking different 20-year or 80-year averages from a long unforced control integration: by examining the long HadCM3 control experiment, we find this component to be small and hence it is possible to neglect it. The random component of mean squared error, e_r^2, is that associated with drawing a particular model from the underlying distribution of all models in that particular ensemble type. The concept of an underlying distribution of models is simpler to imagine in the case of the perturbed physics ensembles; it is the space of all possible parameter settings of HadCM3. In the case of the multi-model ensembles it is perhaps harder to define, but we persist with the analogy in order to interpret the error characteristics of the ensembles.
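A minimal sketch of this decomposition (our own variable names, not the authors' code; the e_n^2 term is neglected, as in the text, and the area weights are assumed to sum to one):

```python
import numpy as np

def mse_decomposition(fields, obs, w):
    """Partition the ensemble-mean total MSE into systematic and random parts.

    fields : array (n_members, nlat, nlon) of simulated climatologies
    obs    : array (nlat, nlon), the observed climatology
    w      : area weights, same shape as obs, summing to 1
    Returns the fractions of total error that are systematic and random.
    The global-mean bias is removed from each error field first, and the
    natural-variability component e_n^2 is neglected (small in the text).
    """
    errors = fields - obs
    # remove each member's weighted global-mean bias
    errors -= np.sum(w * errors, axis=(1, 2))[:, None, None]
    total = np.mean(np.sum(w * errors**2, axis=(1, 2)))   # ensemble mean of e^2
    systematic = np.sum(w * errors.mean(axis=0)**2)       # e_s^2: MSE of ensemble mean
    random_part = total - systematic                      # e_r^2 (e_n^2 ~ 0)
    return systematic / total, random_part / total
```

By construction the two fractions sum to one, matching the bar partitioning shown in Fig. 4.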
Figure 4 shows the average of the relative contributions of systematic and random errors as a fraction of the total error computed from the multi-model and perturbed physics ensembles. For the multi-model ensembles, approximately half or less of the total error is explained by the systematic component. For these ensembles, the mean squared error of the ensemble mean is smaller than the smallest mean squared error of the ensemble members, and the ensemble mean is the "best" model. The general situation for the perturbed physics ensembles is that more of the total error is contributed by the systematic component than the random component. The spatial patterns of the errors in each member are more similar, as is seen in Fig. 3. However, for the S-PPE-E ensemble, the partitioning has much more the character of the multi-model partitioning of systematic and random components of error. That is, there is a greater sampling of random model versions in which patterns of error do not resemble each other closely. For this sub-ensemble, the distribution of error correlations is more like that seen in the multi-model ensembles and the ensemble mean root mean squared error for a number of different climate fields is close to the minimum of the error found in any individual member (Fig. 2).
Again, it is perhaps possible that Fig. 4 may be altered substantially by the existence of regional errors in observational data sets. While small differences are evident on choosing different data sets for computing the figure, the qualitative picture is not altered. Neither is it altered if we average the ensemble mean squared errors over all the different observational data sets considered and sample a random observational data set to compute the random component of error.
Some of the members of the S-PPE-E ensemble have rather large TOA flux imbalances. Nevertheless, this partitioning of more equal contributions from systematic and random components is even the case if the S-PPE-E ensemble is restricted to ensemble members which are within 5 W m-2 of TOA balance (indicated as S-PPE-Bal on Fig. 4). Indeed, the partitioning of systematic and random errors for this collection of 43 experiments is much closer to that seen in the multi-model case. This suggests that it is possible, in some sense, to mimic the behaviour of the multi-model ensemble, i.e. having a greater proportion of random as opposed to systematic errors and having the ensemble mean be the "best" model, if this were thought to be an important aspect of the ensemble design.
4 Feedbacks and forcings
4.1 Surface-atmosphere feedbacks
A number of frameworks exist for computing climate change feedbacks and their components. For example, Soden and Held (2006) analyse the components of feedbacks in the CMIP3 models using a technique which allows the separation into water vapour, lapse rate, cloud and albedo components. While computationally more tractable than the full radiative perturbation method (e.g. Colman 2003), such methods require a significant amount of processing of three-dimensional fields of model output and the use of a specific off-line radiation code.

Given the large number of ensemble members examined here, and the desire to examine feedbacks in the maximum number of multi-model ensemble members, we adopt the simplest, linear, approach to feedback analysis (e.g. Cess et al. 1990; Boer and Yu 2003). This approach decomposes the total feedback parameter (in W m-2 K-1) into components from the clear-sky and non-clear-sky regions (hereafter
cloud feedback or cloud radiative forcing (CRF) feedback) and further into SW and LW components respectively. The advantage of using this method is that only a handful of surface and TOA fields are required for each ensemble member and there is no dependence on, for example, the choice of off-line radiation scheme. Furthermore, the method is easily applicable both to equilibrium atmosphere-slab-ocean 2×CO2 model experiments and to the 1% per year CO2 increase transient model experiments run with the coupled atmosphere–ocean models; in the latter case the "effective" feedback parameters (Murphy 1995) may be calculated. Nevertheless, there are potential issues to consider when using the simple approach that are well documented (Zhang et al. 1994; Colman 2003; Soden et al. 2004). These problems are alleviated in a related publication (Yokohata et al. 2010) by adopting the Taylor et al. (2007) approach in comparing the equilibrium feedbacks in these HadCM3 ensembles with those in a similar perturbed physics ensemble performed with the MIROC3.2 model.
Gregory et al. (2004) and Forster and Taylor (2006) adopt a time-regression technique to calculate the feedback parameter and its components in the case of the transient experiments. In contrast, Raper et al. (2002) use 20-year average model fields centred at the time of CO2 doubling in the 1% per year scenario. Here both techniques were tested and no discernible differences were found between the approaches, with high correlations (>0.9) across ensembles between feedback components calculated in both ways. We adopt the latter (20-year average) approach, which also retains consistency with the analysis of feedbacks in the equilibrium experiments, which likewise employs 20-year averages. Thus, the natural variability of the estimates in the two ensembles should be similar (notwithstanding the slightly larger natural variability that is likely in models with a full dynamical ocean).

For the estimate of the radiative forcing of doubled CO2
we use the values tabulated in Table 10.2 of Meehl et al.
(2007a) for the multi-model ensemble members. In cases
Fig. 4 The relative contribution of systematic (black bar) and random (white bar) mean squared errors to the total error averaged over the ensembles indicated on the figure for the different climate variables considered. S-PPE-Bal indicates an ensemble of all those perturbed-physics slab models for which the net TOA flux is within 5 W m-2. See Sect. 3.2 for more details
where there is incomplete information on the radiative forcing for a doubling of CO2, including all the perturbed physics members, we adopt the same estimate of 3.85 W m-2 in the LW and -0.15 W m-2 in the SW (Myhre et al. 1998). For the perturbed physics ensembles, the same radiation code is employed in each member and no perturbations were made to parameters which might directly affect the radiative forcing from CO2 (although, as we see later, the perturbations do seem to affect the forcing from other agents). Hence we use the standard value of the HadCM3 doubled-CO2 forcing in the perturbed physics case, both for simplicity and for the practical reason of not being in possession of the radiative forcing data for the ensemble members.
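The effective feedback calculation described above reduces to a pair of one-line formulas. A hedged sketch, using the standard forcing values quoted in the text (the function names and the illustrative numbers in the comment are our own):

```python
# Standard HadCM3 2xCO2 forcing components used in the text (Myhre et al. 1998)
F2X_LW, F2X_SW = 3.85, -0.15   # W m-2
F2X = F2X_LW + F2X_SW

def effective_feedback(delta_t, toa_imbalance, forcing=F2X):
    """Effective feedback parameter (W m-2 K-1) in the Murphy (1995) sense.

    delta_t       : 20-year mean global warming at CO2 doubling (K)
    toa_imbalance : 20-year mean global net TOA flux anomaly N (W m-2)
    From the budget N = F - lambda * dT, lambda = (F - N) / dT.
    """
    return (forcing - toa_imbalance) / delta_t

def effective_sensitivity(lam, forcing=F2X):
    """Effective climate sensitivity (K) implied by a feedback parameter."""
    return forcing / lam

# Illustrative numbers (ours): dT = 2.0 K with N = 1.0 W m-2 gives
# lambda = 1.35 W m-2 K-1 and an effective sensitivity near 2.7 K.
```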
While some studies (e.g. Senior and Mitchell 2000) have shown that the climate feedback parameter and its components may have a time-dependency in the transient climate change case, we find, in agreement with the work of Forster and Taylor (2006), no significant variations within a single member over the course of the 80-year 1% per year CO2 increase transient experiments examined. That is not to say, however, that there may not be time-dependence in either multi-model or perturbed physics ensemble members at higher levels of radiative forcing or after significant further climate change.

Figure 5 shows the analysis of the climate feedback
parameter and its components for all the ensembles considered. The range of total feedback parameter in the slab-ocean multi-model ensemble corresponds to a range of 2.0–6.3 K in equilibrium climate sensitivity, with the upper bound being a version of MIROC3.2 included in the CFMIP ensemble because of its known high sensitivity and hence its usefulness in examining a range of feedback processes. The range of effective climate sensitivity in the AO-MME ensemble is slightly wider at 1.6–7.0 K, the lowest-sensitivity model being the BCCR-BCM2.0 (with no slab-ocean version available) and the highest being MIROC3.2hires, which is not the same version as the highest-sensitivity model included in the CFMIP ensemble. The behaviour of the latter is documented in Yokohata et al. (2008).
As shown in other studies (e.g. Stainforth et al. 2005; Piani et al. 2005; Webb et al. 2006), the perturbed physics approach is capable of exploring a range of global climate feedbacks of a similar order of magnitude to those found in the multi-model case. In the case of the S-PPE-E ensemble, the range of feedbacks is somewhat wider than either of the multi-model ranges, spanning climate sensitivities from 1.6 to 7.9 K. Other perturbed-physics studies with HadCM3 (Stainforth et al. 2005; Piani et al. 2005) see inferred climate sensitivities ranging from approximately 2 K up to greater than 10 K. These sensitivities (determined from an exponential fit to a 15-year slab-model experiment rather than an experiment integrated to equilibrium) arise because of differences in the parameter values chosen for the ensemble design in those studies; notably, and as pointed out in Stainforth et al. (2005), because of the use of the lowest value of the entrainment rate parameter in many members (see also Sanderson et al. 2008). For the S-PPE-M ensemble examined here, there are two high-climate-sensitivity members (6.7 and 7.1 K) for which the entrainment coefficient is not set close to its minimum value but closer to the standard value, indicating that it is not essential to set that particular parameter low to produce a high-sensitivity version of HadCM3. The range of total feedback parameter values in the AO-PPE-A ensemble is similar to that in the S-PPE-M ensemble, a feature of the ensemble design (see above and Webb et al. 2006). As in the case of the comparison of bias and RMS errors, the feedbacks in the AO-PPE-O ensemble are all very similar; there is little impact of perturbing ocean parameters on global surface and atmospheric feedbacks.

Splitting first the total feedback parameter into components from clear-sky and CRF areas, it is clear that the range of the latter is larger than that of the former. The clear-sky feedback is exclusively negative, being dominated (we may assume) by the negative black-body feedback, offset partially by positive water vapour/lapse rate feedbacks (clear-sky LW) and by some positive feedback (clear-sky SW) from sea-ice and snow albedo processes. Cloud feedbacks (using this method) can take either sign and are furthermore composed of SW and LW components of either sign.
4.2 Drivers of uncertainties in feedbacks
Correlations between feedback components and the total feedback parameter provide a simple way of determining the leading-order driver of temperature-response uncertainties in the ensembles (Fig. 6). It is evident that CRF feedbacks are, as reported elsewhere (e.g. Webb et al. 2006), the major drivers of spread in the total feedback parameters. The correlation between the total and cloud feedback parameter is 0.8 in the case of the AO-MME ensemble; 0.9 in the case of the S-MME ensemble; 0.8 in the case of the S-PPE-S and greater than 0.9 for the other perturbed physics ensembles. In terms of the SW and LW components, in the case of both multi-model ensembles it is the SW component of the CRF feedback which is most strongly correlated with the total; correlation coefficients of 0.7 and 0.8 for the coupled-model and slab-model ensembles respectively. In the perturbed physics cases, correlations between the SW and total feedback parameters are positive, but more modest. Stronger correlations are found between the LW component of the cloud feedback and the total feedback parameter in the PPEs (correlation
coefficients of 0.8 in each separate ensemble). This is in contrast to the MMEs, where there is almost zero correlation between the LW components and the total feedback parameter.
These results are confirmed by Yokohata et al. (2010), in which the variance of the feedback parameter in the HadCM3 PPEs is found to be explained by variations in both SW and LW cloud feedbacks. Because the multi-model ensembles have only a few members, sampling issues might affect this conclusion. By sub-sampling the perturbed physics ensembles it is possible to find small sub-ensembles that behave like the multi-model ensemble; that is, having a high correlation (>0.8) of the SW cloud feedback parameter with the total feedback parameter, while having a low correlation (<0.1) of the LW cloud feedback parameter with the total. The frequency of occurrence of
Fig. 5 Global atmospheric and surface climate feedback parameters in W m-2 K-1 and (effective) climate sensitivity computed at the time of CO2 doubling, or at 2×CO2 equilibrium, for the ensembles as indicated on the panels. A circle is plotted for each member and the width of the grey shading is an estimate of uncertainty due to natural variability in the calculation, as estimated from the long HadCM3 control experiment. Top panel: the total feedback parameter and (effective) climate sensitivity; next panels: the decomposition of the total into clear-sky and CRF components; next panel: the decomposition of the clear-sky feedbacks into SW and LW components; next panel: the decomposition of the cloudy-sky feedbacks into SW and LW components; and bottom panel: the decomposition of the total into SW and LW. Each panel is drawn on the same scale for comparison
Fig. 6 Scatter plots of total feedback parameter (x-axis) against components of the total feedback parameter (y-axis) used to investigate the drivers of uncertainty in total feedbacks. The name of the ensemble is indicated in the title of each plot and the correlation coefficient is also quoted. The ordinate variable in each row is different and is indicated by the title on the y-axis
these sub-ensembles is very small, however, and we assess the chance of randomly generating MME-like behaviour given only a small perturbed physics ensemble to be less than 1%. Yokohata et al. (2010) explain the apparent importance of LW cloud feedbacks by splitting the feedbacks into the classes defined in Webb et al. (2006). They find that the classes with substantial LW cloud feedback (associated with high cloud) contribute little to the total feedback because of opposing SW cloud feedbacks. This leads to the conclusion that, as in the multi-model ensemble, it is the SW component of the cloud feedback (associated with low cloud regions) which is the principal driver of uncertainty in the case of the perturbed physics ensembles examined here.
4.3 Ocean feedbacks
In the case of the transient experiments, ocean feedbacks are also important in determining the rate and magnitude of climate change. There are various ways of measuring the efficiency of the ocean in taking up heat and we compare four related measures here. The κ parameter, or ocean heat uptake efficiency (e.g. Raper et al. 2002), has the same units as the atmosphere-surface feedback parameters discussed above, W m-2 K-1, and can be thought of as the equivalent "ocean feedback parameter" which measures the rate at which heat is removed per unit degree of warming. κ generally has a time-dependence in, for example, a 1% per year CO2 increase experiment, but by measuring it at the same time point in each ensemble member, a comparison is possible. Alternatively, we also examine the effective heat capacity of the ocean in J K-1 m-2, which may be translated into an effective ocean depth. A further measure may be obtained by fitting the output of each ensemble member to a simplified upwelling-diffusion energy balance model (Huntingford and Cox 2000) and determining the ocean thermal diffusivity that best matches the member. This is similar to the approach of Forest et al. (2006) although we note that, because the simplified model used here (cited above) is different to that used by Forest et al. (2006), the estimates of the diffusivity are not numerically comparable.
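The first two of these measures can be sketched directly from global-mean diagnostics. This is an illustration with our own function names and nominal seawater constants, not the authors' code:

```python
RHO_SEA = 1025.0   # kg m-3, nominal seawater density (our assumption)
CP_SEA = 3990.0    # J kg-1 K-1, nominal seawater specific heat (our assumption)

def heat_uptake_efficiency(net_flux, delta_t):
    """kappa (W m-2 K-1): global-mean ocean heat uptake per K of warming,
    evaluated at a common time point (e.g. 20-year means at CO2 doubling)."""
    return net_flux / delta_t

def effective_heat_capacity(heat_content_change, delta_t):
    """Effective heat capacity (J K-1 m-2) and equivalent ocean depth (m).

    heat_content_change : time-integrated global-mean heat uptake (J m-2)
    delta_t             : global-mean warming over the same period (K)
    """
    c_eff = heat_content_change / delta_t
    depth = c_eff / (RHO_SEA * CP_SEA)
    return c_eff, depth
```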
These measures are contrasted in Fig. 7 for the three coupled model ensembles considered.

Here we do see a significant difference between the
behaviour of the perturbed physics and multi-model ensembles. In the AO-PPE-A ensemble with perturbations to atmosphere parameters, we see little ensemble spread in these measures of ocean heat uptake in comparison with the spread seen in the AO-MME case. This might have been expected, as each member of the AO-PPE-A ensemble employs an identical ocean component. However, in the case of the AO-PPE-O ensemble, with identical HadCM3 atmosphere components but perturbations to parameters in the ocean model, there is a similarly small spread. Collins et al. (2007) performed a smaller number of HadCM3 experiments with perturbations to parameters controlling three vertical heat transport processes. They found only small variations in the rate of transient warming in these un-flux-adjusted experiments. Only marginally significant variations were found, associated with both changes in ocean heat uptake efficiency and atmosphere and surface feedbacks associated with climate drifts that arise because of the lack of flux adjustment in those experiments. In the AO-PPE-A and AO-PPE-O ensembles, flux adjustments are employed to limit such drifts. It appears that the multiple ocean-component parameter perturbations made here [perturbing more parameters than was done in Collins et al. (2007)] do not affect the rate of ocean heat uptake significantly. Nor do they affect the mean surface climate fields, as can be seen from Fig. 2.
Brierley et al. (2009) examine the Collins et al. (2007) un-flux-adjusted experiments in more detail. They found firstly that there is only a small impact of those limited perturbations on the total heat uptake, the variations being an order of magnitude smaller than the ensemble-average heat uptake. Furthermore, they found an interesting form of compensation in those experiments, such that when a single ocean process is perturbed, direct changes in the heat uptake associated with that perturbation are often balanced by an indirect change in heat uptake from another process (see e.g. Fig. 4 of Brierley et al. 2010). We may
Fig. 7 Measures of the rate of ocean heat uptake in the three ensembles indicated. Top left panel: the κ parameter or ocean heat uptake efficiency (W m-2 K-1); top right panel: the effective ocean heat capacity (J m-2 K-1); bottom left panel: the effective ocean depth (m) computed from that heat capacity; bottom right: the heat diffusivity (W m-1 K-1) computed by fitting a simplified energy balance model. A filled circle is plotted for each ensemble member. Forest et al. (2006) present their fitted heat diffusivities in units of the square root of cm2 s-1: 300 W m-1 K-1 corresponds to 0.86 (cm2 s-1)^1/2, 600 W m-1 K-1 to 1.2 (cm2 s-1)^1/2, 900 W m-1 K-1 to 1.5 (cm2 s-1)^1/2 and 1,200 W m-1 K-1 to 1.7 (cm2 s-1)^1/2 for comparison with their study. This makes the estimates here somewhat lower than the estimates presented in Forest et al. (2006) and Stott and Forest (2007)
conjecture that a similar compensation is happening here. Furthermore, for the scenario experiments (1% per year CO2 increase and SRES A1B) the linear nature of the radiative forcing increase means that the ocean plays only a relatively minor role in determining the rate of climate change. Contrast, for example, the relative magnitude of the ocean heat uptake efficiency, κ, in Fig. 7 and the total feedback parameter in Fig. 5. In scenarios where forcing is stabilised, it may be possible to see more of an impact of the perturbations. Limited experiments have been performed with the AO-PPE-O ensemble, and the standard deviation of 20-year averaged global mean temperature anomalies after 60 years of 2×CO2 stabilisation is 0.14 K, in comparison with 0.06 K for the standard deviation of the 20-year averaged TCR computed at the time of CO2 doubling. This represents only a modest increase in spread. It is either the case that we have not perturbed the most appropriate parameters in the model, despite an extensive effort to consult our ocean-modelling colleagues, or that the heat uptake in this particular ocean component is rather robust to changes in parameters under the forcing scenarios examined. Perhaps more "structural" changes are required. We leave these questions to further research.
4.4 Radiative forcing
Having calculated the feedback parameter and its SW and LW components for each of the coupled atmosphere–ocean models from the 1% per year CO2 experiments, it is possible to estimate the radiative forcing in the historical and SRES A1B scenarios using the simple linear method of Forster and Taylor (2006) (see their Sect. 2). The forcing is calculated as the sum of the global feedback parameter multiplied by the global surface air temperature response and the global TOA flux diagnosed from the model.
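The diagnosis just described amounts to a one-line rearrangement of the global energy budget. A minimal sketch (the function name is our own):

```python
import numpy as np

def diagnose_forcing(delta_t, toa_net, lam):
    """Diagnosed radiative forcing time series (Forster and Taylor 2006).

    delta_t : 1-D array of global-mean surface air temperature anomalies (K)
    toa_net : 1-D array of global-mean net TOA flux anomalies N (W m-2)
    lam     : scalar feedback parameter (W m-2 K-1) from the 1%/yr run
    Rearranging N = F - lambda * dT gives F = N + lambda * dT; applied
    separately to the SW and LW fluxes it yields the component forcings.
    """
    return np.asarray(toa_net) + lam * np.asarray(delta_t)
```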
Differences between modelled climate responses in complex forcing scenarios involving aerosols and natural factors will be partly a consequence of differences in climate feedbacks but also partly because of differences in the radiative forcing. These may arise because of different specifications of forcing agents (e.g. volcanic optical depths, solar input, aerosol emissions) but also because of different treatments of those forcing agents by the different models, e.g. the conversion of aerosol emissions into concentrations, or even because of different radiation codes (e.g. Collins 2006). As models get more complex, the radiative forcing is less a well-known function of the input data and more a result of interactions between complex modelled processes; the aerosol indirect effects being prime examples.
The first feature to note is the clustering of the AO-MME historical simulations into two groups containing those which apply both anthropogenic and natural forcing, and those in which only components of the anthropogenic forcing are applied (Fig. 8). This is obvious from the "negative spikes" in the SW forcing time series in the historical phase of the experiments. Coincident negative spikes are also seen in the AO-PPE-A historical phase, in which the volcanic forcing is specified from an updated Sato et al. (1993) series. The negative SW spikes are accompanied by smaller positive volcanic LW forcing spikes that result from an enhanced greenhouse effect that is particularly strong in the polar-night regions where the SW forcing is absent. An estimate of the average volcanic radiative forcing is calculated in Fig. 9 by differencing the radiative forcing in the years following the three late-twentieth-century eruptions (1964, 1983 and 1992) with the average value of the radiative forcing in the 5 years prior to each eruption, and then taking the average over the three events. The corresponding ranges of estimated volcanic forcing in the AO-MME and AO-PPE-A are quite similar in the SW, LW and in the total. While this averaging reduces contamination from both natural variability and uncertainties in other forcing agents, some uncertainties remain, as can be seen from the grey shading in Fig. 9. There is inevitably a significant amount of contamination by natural variability when estimating the volcanic radiative forcing in this way. Despite the fact that the volcanic forcing time series of stratospheric optical depth is precisely the same in each member of the perturbed physics ensemble, the spread in total negative volcanic radiative forcing is comparable with the spread in the multi-model case, in which different input forcing data are used.
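The eruption-differencing estimate described above can be sketched as follows. The two-year post-eruption window is our assumption (the text specifies only "the years following" each eruption); otherwise the eruption years and the 5-year pre-eruption baseline follow the text:

```python
import numpy as np

ERUPTION_YEARS = (1964, 1983, 1992)   # as used in the text

def volcanic_forcing_estimate(years, forcing, eruptions=ERUPTION_YEARS,
                              n_post=2, n_pre=5):
    """Average volcanic forcing from an annual-mean diagnosed forcing series.

    years, forcing : 1-D arrays of calendar years and diagnosed forcing (W m-2)
    For each eruption, the mean forcing over the n_post following years is
    differenced against the mean over the n_pre preceding years; the
    resulting differences are then averaged over the eruptions.
    """
    years = np.asarray(years)
    forcing = np.asarray(forcing)
    diffs = []
    for ey in eruptions:
        post = forcing[(years >= ey) & (years < ey + n_post)].mean()
        pre = forcing[(years >= ey - n_pre) & (years < ey)].mean()
        diffs.append(post - pre)
    return np.mean(diffs)
```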
In order to compare the century-scale historical radiative forcing across all members of the ensembles, it is more convenient to average the radiative forcing in the decade 1995–2004 (Fig. 9), hence avoiding large volcanic eruptions. Here we do see differences between ensemble members which are greater than would be expected from natural variability. The range of SW forcing in the AO-MME is similar to that in the AO-PPE-A, but with the latter having a mean which is slightly more negative (an ensemble mean of -0.9 W m-2 compared to -0.6 W m-2). Uncertainty in SW forcing could be a consequence of differences in the forcing from sulphate and other aerosol particles that arise because of different ways of specifying the direct forcing and due to the way that the model calculates the indirect forcing. The HadCM3 aerosol scheme translates fields of emissions into concentrations by dynamical processes and represents only the first aerosol indirect effect (Jones et al. 2001). Despite no perturbations to the parameters in that sulphate aerosol scheme in the perturbed-physics ensemble, there does appear to be some significant spread in the SW forcing. In addition, rapid cloud adjustments to changing levels of greenhouse gases (Gregory and Webb 2008) can appear as an effective
forcing when the calculations are performed in this way. Hence differences in the attributes of the physical climate system which arise through variations in model parameters appear, in this case, to be sufficient to lead to differences in the SW forcing which are on a par with those seen in the multi-model ensemble. Uncertainties in aerosol forcing in equilibrium perturbed physics ensembles are further examined in Ackerley et al. (2009), while the transient
Fig. 8 Time series of global mean surface air temperature change and estimated radiative forcing (SW, LW and total) in coupled model ensemble simulations of the historical period and the future under the SRES A1B scenario. The top row shows the forcing time series from the multi-model members which include anthropogenic forcing only in the historical period. In the second row, the multi-model members include both anthropogenic and natural forcings. The third row is for the historical experiments using the perturbed physics AO-PPE-A ensemble with perturbed atmosphere parameters. The bottom two rows are the future experiments with multi-model and perturbed physics ensembles respectively
experiments here will be examined in further detail in forthcoming publications.
LW forcing in 1995–2004 is centred around 2.4 W m-2 in both the multi-model and perturbed-physics ensembles, with a range of 1.5–3.1 W m-2 in the AO-MME case and a smaller range of 2.1–2.7 W m-2 in the AO-PPE-A case (in both cases the range is greater than would be expected from natural variability). These values may be approximately compared with those presented in Fig. 2.1 of Forster et al. (2007). The number of minor greenhouse gases prescribed in the AO-MME may vary across the ensemble, whereas changes in major and minor gases in the AO-PPE-A are the same in each member, which could be a partial explanation for the smaller range in the perturbed physics case. Another factor which may contribute to the slightly greater range in the case of the AO-MME is differences in radiation codes (Collins 2006). Nevertheless, there does appear to be some influence of variations in atmospheric model parameters on the LW forcing in the AO-PPE-A ensemble when calculated in this way. This is confirmed when we look at the LW forcing in that ensemble in the future.
The spread of total historical forcing in the multi-model ensemble (0.9–3.0 W m-2 with a mean of 1.7 W m-2) is slightly bigger than the spread in the perturbed physics historical forcing (0.7–1.9 W m-2 with a mean of 1.5 W m-2). The perturbed physics approach does, however, produce some significant spread in forcing, despite the specification of identical forcing time series (greenhouse gases, aerosol emissions and natural factors) in each member. The spread in total forcing in both ensembles can be compared to that stated in Forster et al. (2007): 0.6–2.4 W m-2 with a mean of 1.6 W m-2 for the year 2005. It is likely that further spread would arise if a greater number of ensemble members were performed with a wider sample of atmosphere and sulphur-cycle parameter space and input forcing fields.
Turning to the future forcing estimates, more interesting differences are evident between the AO-MME and AO-PPE-A as measured by the average in the decade 2090–2099. A larger range of SW forcing is diagnosed from the multi-model ensemble than in the perturbed physics ensemble (Figs. 8, 9), with even some positive radiative forcing in the SW. Forster and Taylor (2006) discuss this in more detail. In addition there are a few "outliers" in the calculation of the LW forcing in the multi-model case. Even excluding these outliers, the range of total forcing in 2090–2099 appears to be larger in the multi-model case than in the perturbed physics case. In order to generate future uncertainty in radiative forcing in the case of perturbed physics ensembles, it is probably necessary to sample uncertainties in the input files which specify the less certain radiative agents such as aerosols, ozone, etc.
In summary, the ranges of uncertainty in radiative forcing in the multi-model and perturbed physics cases are comparable in terms of the forcing due to volcanic eruptions and the mean forcing over the historical period. In terms of the future forcing, there is a wider spread in the multi-model than in the perturbed physics ensemble. The consistency between the volcanic and historical forcing is partly a consequence of contamination by natural variability, which reduces the signal-to-noise ratio but plays a relatively less important role as the forcing rises in the future.

Sampling uncertainties in historical forcing is desirable
as those will impact, for example, the component of the
committed warming in any prediction scheme and the useof historical trends in providing observational constraints.
Sampling future forcing scenarios in some way which may
be suitable for producing PDFs is a more difficult problembecause of the lack of a large body of literature on prob-
abilistic forcing scenarios; hence the conditioning of PDFs
on the SRES scenarios in Murphy et al. (2009). Nevertheless, some efforts to quantify uncertainties in the economic drivers of future forcing are underway (Webster et al. 2002; Sokolov et al. 2009).

Fig. 9 Time-averaged SW, LW and total radiative forcing from different coupled model ensembles and for different time periods, as indicated on the y-axis of each panel. Vol indicates the annual-mean forcing averaged over the years following the major twentieth-century volcanic eruptions (1964, 1983 and 1992). A circle is plotted for each member of the ensemble and the grey shading represents an estimate of the ±2 SD uncertainty in the estimate due to natural variability (computed from the long HadCM3 control experiment). Left SW forcing; middle LW forcing; right total forcing

Fig. 10 Scatter plots of total climate change feedback parameter (Fig. 5) versus time-averaged model errors (biases and root-mean-squared errors, Fig. 2). Black crosses indicate perturbed physics model experiments with HadSM3 and HadCM3, red squares indicate multi-model experiments with slab ocean components and red triangles indicate multi-model experiments with dynamical oceans. The grey vertical bars are an estimate of the uncertainty from observations in the calculation (from Fig. 2), centred on the mean bias or RMSE across all ensemble types
5 Relating model errors to feedbacks
Having examined model errors and climate change feedbacks in the multi-model and perturbed physics ensembles, we now examine the relationships between them. As highlighted in the introduction, there are a number of reasons why we may wish to do this. Firstly, in order to make predictions of climate change in which uncertainties in modelling processes are quantified, part of the algorithm requires the assignment of relative likelihoods to different models or different model versions. This, together with all the other ingredients in the Bayesian approach (see introduction and Murphy et al. 2007, 2009), is used to produce weighted probability distribution functions of future change. Secondly, to improve models we need to know how to target research; that is, by quantifying the relationship between error and climate feedback, we may learn which improvements to different aspects of the model simulations will lead to the most progress in reducing uncertainty in predictions.
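The assignment of relative likelihoods can be illustrated with a toy calculation (a deliberately simplified stand-in, not the Murphy et al. 2009 methodology): weight each model version by a Gaussian likelihood of its error metric, then form a weighted estimate of projected change.

```python
import math

def gaussian_weights(errors, sigma):
    """Relative likelihoods exp(-e^2 / (2 sigma^2)), normalised to sum to 1."""
    w = [math.exp(-(e * e) / (2.0 * sigma * sigma)) for e in errors]
    total = sum(w)
    return [wi / total for wi in w]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights))

# Hypothetical members: (error metric vs observations, projected warming in K)
errors   = [0.5, 1.0, 2.0, 3.0]
warmings = [2.8, 3.2, 4.0, 4.5]

weights = gaussian_weights(errors, sigma=1.0)
best_estimate = weighted_mean(warmings, weights)
# Members with smaller errors dominate the weighted projection
```

In practice the weighting acts on a full posterior distribution rather than a point estimate, but the principle that model fidelity modulates each member's contribution is the same.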
Scatter plots of biases and root-mean-squared errors in individual variables against the total strength of the feedbacks under climate change (Fig. 10) are a first step in examining such relationships. In doing this we might hope to uncover simple leading-order correlations between model errors and feedback strengths. Unfortunately, it is clear from Fig. 10 that no such simple relationships exist over all model versions in the multi-model and perturbed physics experiments examined here. The best linear correlations are found between the feedback parameter and variables such as biases in net TOA fluxes, total outgoing SW and SW cloud radiative forcing, with correlation coefficients around 0.7 in the case of all the perturbed physics experiments, but there are no similar correlations for the multi-model members. The only variable for which there is a reasonably high correlation between errors and feedbacks in both perturbed physics and multi-model ensembles is the bias in global mean cloud amount (coefficients around 0.6–0.7; see also Yokohata et al. 2010). Nevertheless, for the perturbed physics ensembles there are weak to moderately strong correlations for a number of variables, suggesting that the combination of those (and other) variables into a single metric would be a way of constraining the climate feedback parameter. In order to do this, we must take into account both errors in observational fields and covariances between errors in different variables. Reducing the degrees of freedom in such a calculation is important, and projection onto a multivariate EOF space, as done in Piani et al. (2005) and Murphy et al. (2009), is one way of doing this. For the multi-model ensembles, there are far fewer weak-to-moderate correlations. One possible reason for this is that relationships are weakened in the model development process when models are modified, for example, to achieve net TOA flux balance.

This lack of strong relationships for single variables is
obvious in retrospect: if there were such a clear coupling between errors in the present-day simulation of a single variable and climate sensitivity, it would probably have been discovered already through simple physical arguments and/or mechanistic studies. As has already been pointed out in a number of studies (Min et al. 2007; Sanderson et al. 2008), it is not possible to strongly constrain predictions of even global mean climate change using constraints provided by single observed fields and simple metrics of time-averaged fields (e.g. Knutti et al. 2006). Providing constraints on regional change may be even more challenging. Multivariate techniques are required, in which the constraint is extracted from the model and observed data using potentially rather complex statistical techniques. The unfortunate upshot of this is that it becomes difficult to understand how a multivariate constraint operates using simple physical arguments.
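The two routes discussed here, single-variable screening and multivariate reduction, can be sketched in a few lines. With per-member errors stacked into a matrix, one computes per-variable correlations with the feedback parameter, then projects centred errors onto their leading EOFs (principal components) before building a combined metric. This is a hedged toy example with synthetic data, not the Piani et al. (2005) or Murphy et al. (2009) procedure:

```python
import numpy as np

rng = np.random.default_rng(1)
n_members, n_vars = 20, 6

# Synthetic per-member errors; variable 0 is built to co-vary with the feedback
errors = rng.standard_normal((n_members, n_vars))
feedback = 1.0 + 0.7 * errors[:, 0] + 0.2 * rng.standard_normal(n_members)

# Single-variable screening: correlation of each error with the feedback
single_r = np.array([np.corrcoef(errors[:, j], feedback)[0, 1]
                     for j in range(n_vars)])

# Multivariate route: project centred errors onto leading EOFs via SVD
centred = errors - errors.mean(axis=0)
u, s, vt = np.linalg.svd(centred, full_matrices=False)
pcs = u[:, :2] * s[:2]          # scores on the two leading EOFs

# A combined metric: distance from the ensemble mean in truncated EOF space
combined_metric = np.sqrt((pcs ** 2).sum(axis=1))
```

The truncation to a few EOFs is what controls the degrees of freedom; observational error and inter-variable covariance would enter through the metric used in that reduced space.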
We leave the development of complex constraints on climate predictions using the ensemble experiments described here to other papers, which use statistical tools such as emulators (e.g. Rougier et al. 2009) and other aspects of the Bayesian approach (see introduction and Murphy et al. 2009) which require considerable explanation. However, we should note that such endeavours are unlikely to be simple to understand, simple to describe or simple to implement (see also Knutti et al. 2010). The combination of data from model simulations with observations to produce predictions of climate change in which uncertainties are quantified is likely to involve a level of complexity on a par with the development of numerical climate models themselves, or with data assimilation in initial-value weather and climate prediction.
6 Discussion and conclusions
We have performed a comparison of various characteristics of perturbed physics ensembles generated with the third version of the Hadley Centre climate model with those of the ensembles collected as part of the CMIP3 and CFMIP projects. We find the following:

1. The perturbed physics approach can sample a wide range of different model "errors" in two-dimensional time-averaged climate fields for a number of different variables; for many variables these errors are comparable with uncertainties in the observations and comparable with
the errors in the members of the multi-model archive. The degree of sampling of errors in climatological fields in the perturbed physics ensembles studied here is dependent on the algorithm used for selecting the values of the perturbed parameters.
2. The general situation for the perturbed physics ensembles is that more of the total error is contributed by the systematic component than by the random component; that is, the ratio of the errors which are common to all model versions to the errors which are unique to particular model versions is greater than unity. However, depending on the experimental design of the perturbed physics ensemble, the ratio of systematic to random components of the distributions of model errors can be controlled in order to mimic the behaviour of the multi-model case, in which the random component tends to be of the same order of magnitude as, or larger than, the systematic component. Thus, it is possible to produce quite different baseline climates with the perturbed physics approach, such that the ensemble mean appears as the "best" model in comparison with any individual ensemble member.
3. The perturbed physics approach can sample a wide range of global-mean feedbacks under climate change. In the experiments examined here, both the SW and LW components of cloud feedbacks are, at first inspection, responsible for the major component of the feedback uncertainty in the perturbed physics case, while it is the SW component only which is dominant in the multi-model case. However, it is likely that there is a regional cancellation between LW and SW feedbacks and that it is the SW feedbacks associated with low clouds that are the dominant driver of uncertainty in both types of ensemble (see also Yokohata et al. 2010). Perturbing ocean parameters, however, results in very little spread in measures of the rate of ocean heat uptake in the forcing scenarios examined.
4. Using a simple method to compute radiative forcing under past and future SRES A1B conditions, perturbing the parameters in the physical component of the model is sufficient to generate some spread in the radiative forcing. For the case of volcanic forcing, and for the combined natural and anthropogenic historical forcing, this spread is of the same order of magnitude as that seen in the multi-model case, despite the use of a common set of forcing input fields in the perturbed physics case. For the future forcing, where signal-to-noise ratios are higher, there is more spread in the CMIP3 multi-model ensemble, presumably because of the use of different input forcing fields in that ensemble.
5. There are no simple emergent relationships between the gross measures of model error used here and the global climate-change feedbacks which could simply be employed to constrain predictions of future climate change. Techniques to construct "climate constraints" are inevitably complex and multivariate.
Note that the above conclusions relate to integrated global measures of errors, forcings and feedbacks. For regional measures, and for variables not examined here such as variability or extremes, there may be differences between perturbed physics and multi-model ensembles which do not fit these general conclusions. It remains a challenge to produce regional projections of climate change, and this we leave to future research.

What are the desirable characteristics of an ensemble of models used to quantify uncertainties in predictions of climate change? Firstly, we should seek to minimise the systematic component of model error (here simply defined as the ensemble mean) by using a model structure which is, as we might call it, well specified: that is, a model structure which satisfies the rigorous standards of climate modelling in terms of conservation and even coding practices, and in which we have a good chance of achieving a low systematic error. Using that structure, we should then generate ensemble members which are both consistent with the relatively large uncertainty in the observed fields we use in our multivariate definitions of metrics of fidelity and which exhibit a wide range of feedbacks and spatial patterns of climate change. Should the distribution of model errors measured against observed climatologies, variability and trends in such an ensemble exhibit the "ensemble mean is the best" characteristic found in so many other modelling and forecast applications? Annan and Hargreaves (2010) discuss this issue. For very practical reasons, we might also wish to design ensembles in a way which aids the fitting of model emulators (e.g. Rougier et al. 2009) to the ensemble in order to produce probabilistic estimates of climate change for policy makers.
While the above definition sounds sensible, there are some aspects of ensemble design which are difficult to achieve and measure. What is a "wide range of feedbacks", for example? It is tempting always to compare the perturbed physics ensembles with the multi-model ensembles, with the latter the implied benchmark against which the former are to be measured. Yet, of course, the multi-model ensemble has in no way been systematically designed to be an adequate sample of all possible models one could formulate; moreover, it might be that the process of "tuning" to replicate certain basic aspects of historical climate (notably planetary radiation balance) results in an unrealistically narrow spread of future climate change responses which does not fully reflect the implications of uncertainties in the many detailed individual processes included in the models. Perturbed physics ensembles produced with different model structures may shed further light on these issues (e.g. Yokohata et al. 2010). Ongoing model development should result in better specified models which may also produce different behaviour when used to produce perturbed physics ensembles. In our companion work on producing probabilistic climate change projections, we combine perturbed physics and multi-model ensemble information together with observations and estimates of uncertainty in observations to produce projections based on as much information about the climate system as possible (Murphy et al. 2007, 2009).
Acknowledgments This work was supported by the Joint DECC and Defra Integrated Climate Programme—DECC/Defra (GA01101) and by the European Community ENSEMBLES project (GOCE-CT-2003-505539). Hugo Lambert made useful comments on an earlier version of the manuscript and we thank three anonymous reviewers for their comments.
References
Ackerley D, Highwood EJ, Frame D, Booth BBB (2009) Changes in the global sulfate burden due to perturbations in global CO2 concentrations. J Clim 20:5421–5432
Adler RF et al (2003) The version 2 Global Precipitation Climatology Project (GPCP) monthly precipitation analysis (1979–present). J Hydrometeorol 4:1147–1167
Allan RJ, Ansell TJ (2006) A new globally complete monthly historical mean sea level pressure data set (HadSLP2): 1850–2004. J Clim 19:5816–5842
Allen MR, Kettleborough J, Stainforth DA (2002) Model error in weather and climate forecasting. In: Proceedings of the ECMWF seminar series. http://www.ecmwf.int
Annan JD, Hargreaves JC (2010) Reliability of the CMIP3 ensemble. Geophys Res Lett 37:L02703. doi:10.1029/2009GL041994
Annan JD, Hargreaves JC, Ohgaito R, Abe-Ouchi A, Emori S (2005) Efficiently constraining climate sensitivity with ensembles of paleoclimate simulations. Sci On-line Lett Atmos 1:181–184
Aumann HH et al (2003) AIRS/AMSU/HSB on the Aqua mission: design, science objectives, data products, and processing systems. IEEE Trans Geosci Remote Sens 41:253–264
Barnett DN, Brown SJ, Murphy JM, Sexton DMH, Webb MJ (2006) Quantifying uncertainty in changes in extreme event frequency in response to doubled CO2 using a large ensemble of GCM simulations. Clim Dyn 26:489–511
Boer G, Yu B (2003) Climate sensitivity and response. Clim Dyn 20:415–429
Brierley CM, Thorpe AJ, Collins M (2009) An example of the dependence of the transient climate response on the temperature of the modelled climate state. Atmos Sci Lett 10:23–28
Brierley CM, Collins M, Thorpe AJ (2010) The impact of perturbations to ocean-model parameters on climate and climate change in a coupled model. Clim Dyn 34:325–343
Brohan P, Kennedy JJ, Harris I, Tett SFB, Jones PD (2006) Uncertainty estimates in regional and global observed temperature changes: a new dataset from 1850. J Geophys Res 111:D12106. doi:10.1029/2005JD006548
Cess RD et al (1990) Intercomparison and interpretation of climate feedback processes in 19 atmospheric general circulation models. J Geophys Res 95:16601–16615
Collins WV (2006) Radiative forcing by well-mixed greenhouse gases: estimates from climate models in the IPCC AR4. J Geophys Res 111:D14317. doi:10.1029/2005JD006713
Collins M (2007) Ensembles and probabilities: a new era in the prediction of climate change. Philos Trans R Soc Lond A 365:1957–1970
Collins M, Booth BBB, Harris GR, Murphy JM, Sexton DMH, Webb MJ (2006) Towards quantifying uncertainty in transient climate change. Clim Dyn 27:127–147
Collins M, Brierley CM, MacVean M, Booth BBB, Harris GR (2007) The sensitivity of the rate of transient climate change to ocean physics perturbations. J Clim 20:2315–2320
Colman RA (2003) A comparison of climate feedbacks in general circulation models. Clim Dyn 20:865–873
Da Silva A, Young C, Levitus S (1994) Atlas of surface marine data 1994, volume 1: algorithms and procedures. NOAA Atlas NESDIS 6. US Department of Commerce, Washington
Dijkstra HA, Neelin JD (1999) Imperfections of the thermohaline circulation: multiple equilibria and flux correction. J Clim 12:1382–1392
Forest CE, Stone PH, Sokolov AP (2006) Estimated PDFs of climate system properties including natural and anthropogenic forcings. Geophys Res Lett 33:L01705
Forster PMdeF, Taylor KE (2006) Climate forcings and climate sensitivities diagnosed from coupled climate model integrations. J Clim 19:6181–6194
Forster PMdeF et al (2007) Changes in atmospheric constituents and in radiative forcing. In: Solomon S, Qin D, Manning M, Chen Z, Marquis M, Averyt KB, Tignor M, Miller HL (eds) Climate change 2007: the physical science basis. Contribution of Working Group I to the fourth assessment report of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA
Frame DJ et al (2009) The climateprediction.net BBC climate change experiment part 1: design of the coupled model ensemble. Philos Trans R Soc Lond A 367:855–870
Gleckler PJ, Taylor KE, Doutriaux C (2008) Performance metrics for climate models. J Geophys Res 113:D06104. doi:10.1029/2007JD008972
Gordon CC et al (2000) The simulation of SST, sea ice extents and ocean heat transport in a version of the Hadley Centre coupled model without flux adjustments. Clim Dyn 16:147–168
Gregory JM, Webb MJ (2008) Tropospheric adjustment induces a cloud component in CO2 forcing. J Clim 21:58–71
Gregory JM et al (2004) A new method for diagnosing radiative forcing and climate sensitivity. Geophys Res Lett 31:L03205
Grist JP, Josey SA (2003) Inverse analysis adjustment of the SOC air–sea flux climatology using ocean heat transport constraints. J Clim 20:3274–3295
Hagedorn R, Doblas-Reyes FJ, Palmer TN (2005) The rationale behind the success of multimodel ensembles in seasonal forecasting. Part I: basic concept. Tellus 57:219–233
Hansen J, Ruedy R, Sato M, Reynolds R (1996) Global surface air temperature in 1995: return to pre-Pinatubo level. Geophys Res Lett 23:1665–1668
Harris GR, Sexton DMH, Booth BBB, Collins M, Murphy JM, Webb MJ (2006) Frequency distributions of transient regional climate change from perturbed physics ensembles of general circulation model simulations. Clim Dyn 27:357–375
Harrison EF, Minnis P, Barkstrom BR, Ramanathan V, Cess R, Gibson CG (1990) Seasonal variation of cloud radiative forcing derived from the Earth Radiation Budget Experiment. J Geophys Res 95:687–703
Held IM, Soden BJ (2006) Robust responses of the hydrological cycle to global warming. J Clim 19:5686–5699
Hibbard KA, Meehl GA, Cox PM, Friedlingstein P (2007) A strategy for climate change stabilization experiments. EOS 88:20. doi:10.1029/2007EO200002
Huntingford C, Cox PM (2000) An analogue model to derive additional climate change scenarios from existing GCM simulations. Clim Dyn 16:575–586
Jackson CS, Sen MK, Huerta G, Deng Y, Bowman KP (2008) Error reduction and convergence in climate prediction. J Clim 21:6698–6709
Jones A, Roberts DL, Woodage MJ, Johnson CE (2001) Indirect sulphate aerosol forcing in a climate model with an interactive sulphur cycle. J Geophys Res 106:20293–20310
Joshi MM, Gregory JM, Webb MJ, Sexton DMH, Johns TC (2008) Mechanisms for the land/sea warming contrast exhibited by simulations of climate change. Clim Dyn 30:455–465
Jun M, Knutti R, Nychka DW (2008) Spatial analysis to quantify numerical model bias and dependence: how many climate models are there? J Am Stat Assoc Appl Case Stud 103:934–947
Knutti R, Meehl GA, Allen MR, Stainforth DA (2006) Constraining climate sensitivity from the seasonal cycle in surface temperature. J Clim 19:4224–4233
Knutti R, Furrer R, Tebaldi C, Cermak J, Meehl GA (2010) Challenges in combining projections from multiple climate models. J Clim (in press)
Lambert SJ, Boer GJ (2001) CMIP1 evaluation and intercomparison of coupled climate models. Clim Dyn 17:83–106
Lambert FH, Chiang JCH (2007) Control of land–ocean temperature contrast by ocean heat uptake. Geophys Res Lett 34:L13704
Legates DR, Willmott CJ (1990) Mean seasonal and spatial variability in global surface air temperature. Theor Appl Climatol 41:11–21
McKay MD, Conover WJ, Beckman RJ (1979) A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21:239–245
Meehl GA, Stocker TF et al (2007a) Global climate projections. In: Solomon S, Qin D, Manning M, Chen Z, Marquis M, Averyt KB, Tignor M, Miller HL (eds) Climate change 2007: the physical science basis. Contribution of Working Group I to the fourth assessment report of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA
Meehl GA et al (2007b) The WCRP CMIP3 multimodel dataset: a new era in climate change research. Bull Am Meteorol Soc 88:1383–1394
Min SK, Simonis D, Hense A (2007) Probabilistic climate change predictions applying Bayesian model averaging. Philos Trans R Soc Lond A 365:2103–2116
Molteni F, Buizza R, Palmer TN, Petroliagis T (1996) The ECMWF ensemble prediction system: methodology and validation. Quart J Roy Meteorol Soc 122:73–119
Moore B, Gates WL, Mata LJ, Underdal A (2001) Advancing our understanding. In: Houghton JT, Ding Y, Griggs DJ, Noguer M, van der Linden PJ, Dai X, Maskell K, Johnson CA (eds) Climate change 2001: the scientific basis. Contribution of Working Group I to the third assessment report of the Intergovernmental Panel on Climate Change. Cambridge University Press
Murphy JM (1995) Transient response of the Hadley Centre coupled ocean–atmosphere model to increasing carbon dioxide. Part III: analysis of global mean response using simple models. J Clim 8:496–514
Murphy JM, Sexton DMH, Barnett DN, Jones GS, Webb MJ, Collins M, Stainforth DA (2004) Quantification of modelling uncertainties in a large ensemble of climate change simulations. Nature 430:768–772
Murphy JM, Booth BBB, Collins M, Harris GR, Sexton D, Webb MJ (2007) A methodology for probabilistic predictions of regional climate change from perturbed physics ensembles. Philos Trans R Soc Lond A 365:1993–2028
Murphy JM, Sexton DMH, Jenkins G, Boorman P, Booth BBB, Brown K, Clark R, Collins M, Harris GR, Kendon E (2009) Climate change projections. ISBN 978-1-906360-02-3
Myhre G, Highwood EJ, Shine KP, Stordal F (1998) New estimates of radiative forcing due to well mixed greenhouse gases. Geophys Res Lett 25(14):2715–2718. doi:10.1029/98GL01908
Niehorster F, Spangehl T, Fast I, Cubasch U (2006) Quantification of model uncertainties: parameter sensitivities of the coupled model ECHO-G with middle atmosphere. Geophys Res Abs 8, EGU06-A-08526
Piani C, Frame DJ, Stainforth DA, Allen MR (2005) Constraints on climate change from a multi-thousand member ensemble of simulations. Geophys Res Lett 32:L23825. doi:10.1029/2005GL024452
Pope VD, Gallani ML, Rowntree PR, Stratton RA (2000) The impact of new physical parametrizations in the Hadley Centre climate model: HadAM3. Clim Dyn 16:123–146
Raper SCB, Gregory JM, Stouffer RJ (2002) The role of climate sensitivity and ocean heat uptake on AOGCM transient temperature response. J Clim 15:124–130
Rayner NA et al (2003) Global analyses of sea surface temperature, sea ice, and night marine air temperature since the late nineteenth century. J Geophys Res 108(D14):4407. doi:10.1029/2002JD002670
Reichler T, Kim J (2008) How well do climate models simulate today's climate? Bull Am Meteorol Soc 89:303–311
Rossow WB, Walker AW, Beuschel DE, Roiter MD (1996) International Satellite Cloud Climatology Project (ISCCP) documentation of new cloud datasets. World Meteorological Organisation WMO/TD 737, 115 pp
Rougier JC (2007) Probabilistic inference for future climate using an ensemble of climate model evaluations. Clim Change 81:247–264
Rougier JC, Sexton DMH, Murphy JM, Stainforth DA (2009) Analysing the climate sensitivity of the HadSM3 climate model using ensembles from different but related experiments. J Clim 22:3540–3557
Sanderson BM, Piani C (2007) Towards constraining climate sensitivity by linear analysis of feedback patterns in thousands of perturbed-physics GCM simulations. Clim Dyn 30:175–190
Sanderson BM et al (2008) Constraints on model response to greenhouse gas forcing and the role of subgrid-scale processes. J Clim 21:2384–2400
Sato M, Hansen JE, McCormick MP, Pollack JB (1993) Stratospheric aerosol optical depths (1850–1990). J Geophys Res 98:22987–22994
Schneider von Deimling T, Held H, Ganopolski A, Rahmstorf S (2006) Climate sensitivity estimated from ensemble simulations of glacial climates. Clim Dyn 27:149–163
Senior CA, Mitchell JFB (2000) The time dependence of climate sensitivity. Geophys Res Lett 27:2685–2688
Smith TM, Reynolds RW (2004) Improved extended reconstruction of SST (1854–1997). J Clim 17:2466–2477
Soden BJ, Held IM (2006) An assessment of climate feedbacks in coupled ocean–atmosphere models. J Clim 19:3354–3360
Soden BJ, Broccoli AJ, Hemler RS (2004) On the use of cloud forcing to estimate cloud feedback. J Clim 17(19):3661–3665
Sokolov AP et al (2009) Probabilistic forecast for 21st century climate based on uncertainties in emissions (without policy) and climate parameters. J Clim 22:5175–5204
Stainforth DA et al (2005) Uncertainty in predictions of the climate response to rising levels of greenhouse gases. Nature 433:403–406
Stocker TF (2004) Climate change: models change their tune. Nature 430:737–738
Stott PA, Forest CE (2007) Ensemble climate predictions using climate models and observational constraints. Philos Trans R Soc Lond A 365:2029–2052
Sutton RT, Dong B-W, Gregory JM (2007) Land/sea warming ratio in response to climate change: IPCC AR4 model results and comparison with observations. Geophys Res Lett 34:L02701
Taylor KE (2001) Summarizing multiple aspects of model performance in a single diagram. J Geophys Res 106:7183–7192
Taylor KE, Crucifix M, Doutriaux C, Broccoli AJ, Mitchell JFB, Webb MJ (2007) Estimating shortwave radiative forcing and response in climate models. J Clim 20:2530–2543
Tziperman E, Toggweiler JR, Feliks Y, Bryan K (1994) Instability of the thermohaline circulation with respect to mixed boundary conditions: is it really a problem for realistic models? J Phys Oceanogr 24:217–232
Uppala SM et al (2005) The ERA-40 re-analysis. Quart J Roy Meteorol Soc 131:2961–3012
Webb MJ et al (2006) On the contribution of local feedback mechanisms to the range of climate sensitivity in two GCM ensembles. Clim Dyn 27:17–38
Webster MD et al (2002) Uncertainty in emissions projections for climate models. Atmos Environ 36:3659–3670
Wielicki BA, Barkstrom BR, Harrison EF, Lee RB III, Louis Smith G, Cooper JE (1996) Clouds and the Earth's Radiant Energy System (CERES): an Earth observing system experiment. Bull Am Meteorol Soc 77:853–868
Wylie DP, Menzel WP, Woolf HM, Strabala KI (1994) Four years of global cirrus cloud statistics using HIRS. J Clim 7:1972–1986
Xie P, Arkin PA (1997) Global precipitation: a 17-year monthly analysis based on gauge observations, satellite estimates, and numerical model outputs. Bull Am Meteorol Soc 78:2539–2558
Yokohata T et al (2008) Comparison of equilibrium and transient responses to CO2 increase in eight state-of-the-art climate models. Tellus 60:946–961
Yokohata T, Webb MJ, Collins M, Williams KD, Yoshimori M, Hargreaves JC, Annan JD (2010) Structural similarities and differences in climate responses to CO2 increase between two perturbed physics ensembles. J Clim 23(6):1392–1410
Zhang MH, Cess RD, Hack JJ, Kiehl JT (1994) Diagnostic study of climate feedback processes in atmospheric GCMs. J Geophys Res 99:5525–5537