Good practice for the usage of climate model simulation results - a discussion paper

This paper presents guidelines and examples of good practice for the usage of climate model simulation results and some rationale for their application. These guidelines are relevant to climate modellers as well as to climate impact modellers and users of direct climate model output, e.g., for decision support. The topics covered here encompass general information on climate model data as well as recommendations for their use, interpretation, and presentation. This includes subjects such as the definition of ‘climate projection’ versus ‘climate forecast’, recommendations for the application of scenarios, temporal and spatial resolution, reference periods, treatment of model biases and significance, treatment of different model generations, and optimal use of colour selection and scaling. Special attention is given to results from multiple simulations (ensembles), as evidence is mounting that there is a need to take ensemble results into account for decision making. The paper represents the view of an ongoing discussion of German federal and state environmental agencies in a semi-annual meeting series and aims at framing a set of minimum requirements and prerequisites for climate impact projects and decision support. Thus, the recommendations we give are under constant further development and we do not claim completeness. However, since we are frequently asked to share our discussion results with other user groups, we herewith provide some well-discussed topics and hope to improve the communication between the climate modellers and the users of climate model results.


Background
There is a major challenge in the communication between 'producers' a and 'users' b of climate information: How can complex physical information be presented to users with little time and little physical background? This challenge is heightened (i) by the users' expectations with respect to the accuracy of the information delivered and (ii) by the producers' tendency to use a community-immanent 'lingo'. Moreover, the producers of climate information often use many mathematical expressions (see, e.g., Kundzewicz et al. (2008)). Some recent publications provide an update on semantic and communication issues. Knutti et al. (2010), e.g., present a framework of good practices with respect to the usage of multi-model ensembles as well as the interpretation of ensemble results. Concepts and the verbalization of uncertainties are the focus of Mastrandrea et al. (2010), and the issue of a changing climate variability, which is particularly difficult to communicate, is treated in Hawkins (2011). The need for agreeing on good practices has been acknowledged c for a number of reasons, such as:
• They are crucial for effective decision support for policy makers, economy and society.
• Planning of adaptation measures and strategies to avoid climate impacts strongly relies on credible input.
• The potential for misleading conclusions which might be disseminated in follow-up publications needs to be minimized.
*Correspondence: frank.kreienkamp@cec-potsdam.de. 1 Climate Environment Consulting Potsdam GmbH, Potsdam, Germany. Full list of author information is available at the end of the article.
Consequently it is assumed that a number of groups will benefit from the implementation of good practices. Policy makers and decision makers may be able to better assess uncertainties based on input that was processed according to good practice standards. Funding sources may use it as a source for a catalogue of minimum requirements. Those who apply climate model data, e.g., in impact models, may find advice on characteristics of the data they are using as well as guidance with respect to the graphical aspects of communicating their results.
http://www.environmentalsystemsresearch.com/content/1/1/9
The producer/user interface is crucial, e.g., for devising strategies on climate change mitigation and adaptation. The involvement of users in the formulation of the 2-degree target to constrain climate change (cf. Hare and Meinshausen (2004); Meinshausen et al. (2009) or Randalls (2010)) is a milestone since it originates in users' needs and subsequently defines the bounding conditions.
With respect to devising and communicating good practices it is the perception of the authors that there are only a few international fora in which the players on the regional scale, such as relevant agencies, and 'hands-on researchers' can advance this communication. One such forum is the series of Communication and Education Sessions at the Annual Meetings of the European Meteorological Society (EMS). There we presented the German discussion and were subsequently asked for a paper that would be helpful for different user groups. We follow the request with this discussion paper.

Uncertainties and decision making
In order to improve the usability of whatever input is produced for decision makers, the information needs to be processed in an analytic as well as 'experiential' way (Marx et al., 2007), i.e., taking into account personal experience of the user. Moreover, all processes of decision making have their own dynamics. Therefore the information has to be arranged and presented using terminology without ambiguity (cf. Section Terminology) as well as clear, intuitive graphics (cf. Section Recommendation on the presentation of climate model data).
The complex process of decision making is complicated further by the kind of input information. Scientific information, in the context of climate change projections, is not definitive but contains uncertainty d. This adds to the decision makers' own uncertainty with respect to assessing the input's usability. For decision support, additional contextual information for framing uncertainties is required in order to put decisions on a robust base. The complexity of the decision making processes is further increased because, ultimately, all decisions are local whereas their impact is global (Breakwell, 2010).

Model vs. Data?
According to Edwards (2010) (Preface, page xiii) without models there are no data, meaning that whatever data there are, such as observations or climate projections, they exist because of processing through "data models". Direct climate observations are only available at certain points, i.e., measurement stations, and to derive continuous data in space and time these data are processed using models. Additionally, climate models are "data laden", i.e., they are sensitive to calibration procedures. These depend on the quality of the data (measured, then subjected to a data model) to which the calibration is performed (Parker, 2011). The shifting composition of the input is one source of uncertainty.
The process of turning the mathematical algorithms necessary to describe the atmosphere system into numerical algorithms fit to run on computers adds further uncertainties, since there are issues of consistency, matchability, discretization and particularities of numerical methods. Extensive overviews can be found in Müller (2010) and Gramelsberger (2011). This must be borne in mind when addressing the achievable quality of model output and devising recommendations on usage, interpretation and presentation of model results.

Devising recommendations for good practices - an ongoing process
The quest for good practices is an ongoing process of which the IPCC Assessment Reports, e.g., IPCC (2001a;2001b) or IPCC (2007), have been exhibiting a growing degree of awareness. Carter et al. (2007) comprehensively deal with overall quality and good practice issues with respect to vulnerability and adaptation assessments. En route to the IPCC's 5th Report an Expert Meeting on Assessing and Combining Multi Model Climate Projections took place in Boulder, CO, in January 2010 (Stocker et al. (Eds), 2010b) where the Good Practice Guidance Paper was consolidated. It provides a treatment of numerous relevant issues including sets of recommendations for (i) ensembles, (ii) model evaluation, (iii) model selection, averaging and weighting, (iv) reproducibility and (v) regional assessments. Of particular importance when building ensembles which aim at the description of regional climate change is the correction of biases which occur in the individual participating models (see, e.g., Christensen et al. (2008)).
Implementing the aforementioned mitigation and adaptation strategies is the responsibility of regional agencies. In Germany, these are federal and state agencies, dealing with environment, ecology, land use, agriculture or consumer protection (see the Acknowledgements section of this paper). There is a meeting series initiated by members of the respective agencies of the German Federal States which addresses the inter-agency information exchange as well as the co-ordination of mitigation and adaptation activities. It involves expertise from a range of sources, including the German Weather Service, the German Federal Environment Agency and private companies. Early on, it became clear that the overwhelming amount of data from the climate modellers needs to be structured and 'translated' to improve its usability for climate impact research and policy advice. Furthermore, the interpretation itself and the choice of interpretation aids, such as viewgraphs, are discussed in this meeting series. Here, a strong need for advice and recommendations has been identified. A related paper is, e.g., Spekat and Kreienkamp (2007), where examples of good practices in conveying graphical information are given.
Among the aims of the inter-agency meeting series is the establishment of minimum requirements that foster the comparability of project results. Moreover, thought is given to the harmonization of the way in which relevant information is presented. This has been achieved by founding an informal working group Interpretation of regional climate scenario results. Among its activities is the drafting of such a set of recommendations e which form the basis of this paper. In the ongoing discussion process this topic is scheduled for re-evaluation twice a year.

Structure of this paper
The paper is structured as follows: Section General information on climate model data deals with general and necessary information on climate model data, Section Recommendation on the use of climate model data includes recommendations for the use of climate model data, Section Recommendation on the interpretation of climate model data gives recommendations for the interpretation of climate model data and recommendations for the presentation of the results are summarized in Section Recommendation on the presentation of climate model data. Conclusions are presented in Section That's that?

Terminology
A few fundamental and recurring terms of this paper are defined below.

GCM
Global Climate Model or Global Circulation Model is a time-dependent numerical model of the global atmosphere, oceans, land surface and ice. In it, physical processes of the global climate system are represented in a numerical way. Characteristic GCM grid sizes are on the order of 250 × 250 km.
It is a common practice to describe the model's resolution using the terminology Txxx, e.g., T159. This refers to the concept of spectral modelling (McGuffie and Henderson-Sellers, 2005). The atmospheric features are decomposed into harmonics and a truncation at a certain wave number Txxx is applied, i.e., the higher the wave number, the higher the degree of detail. In T159 resolution, e.g., the model will resolve features with a size of 1/159th of the earth's circumference, which can be transformed into a resolution (at the equator) that is either expressed in km or in degrees.
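The conversion described above can be sketched in a few lines. This is only an illustration of the rule of thumb given in the text (feature size = circumference divided by the truncation wave number); the function name and the rounded value of the equatorial circumference are assumptions made here, and the grid spacing on which a given model is actually run may differ from this simple estimate.

```python
# Approximate equatorial circumference of the earth in km (rounded value,
# an assumption of this sketch).
EARTH_CIRCUMFERENCE_KM = 40075.0


def truncation_to_resolution(wave_number):
    """Return the resolved feature size at the equator as (km, degrees).

    Follows the rule of thumb from the text: a truncation at wave number N
    resolves features of 1/N of the earth's circumference.
    """
    if wave_number <= 0:
        raise ValueError("wave number must be positive")
    size_km = EARTH_CIRCUMFERENCE_KM / wave_number
    size_deg = 360.0 / wave_number
    return size_km, size_deg


# Truncations mentioned in this paper: T62, T159 and T255.
for n in (62, 159, 255):
    km, deg = truncation_to_resolution(n)
    print(f"T{n}: about {km:.0f} km or {deg:.2f} degrees at the equator")
```

For T159 this gives a feature size of roughly 250 km, which matches the characteristic GCM grid size quoted above.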

Downscaling
Also called regionalization. According to Benestad et al. (2008) it is the process of making the link between the state of some variable representing a large space [. . .] and the state of some variable representing a much smaller space [. . . ]. One downscaling approach is the application of an RCM in the area of interest (dynamical downscaling); an alternative approach is by way of statistical methods.

RCM
Regional Climate Model is a high-resolution version of a GCM, e.g., with a common grid size of 25 × 25 km. It is run in a limited spatial domain; a GCM is employed for information outside that domain and as driving conditions at the lateral borders.

ESD
Empirical Statistical Downscaling is a generic term for methods that derive statistical linkages between large scale climate information and the local climate.

Model cascade
Two definitions occur frequently, the first of which applies to the topic of this paper. (i) As described above there are different strategies to achieve downscaling. They involve chains or cascades of models of different resolution (dynamical, statistical, or both). (ii) In the terminology of the IPCC, a chain or cascade of different kinds of models is employed when devising emission scenarios: Building upon model assumptions of economic development, a model transfers this information into greenhouse gas (GHG) emissions which are converted into atmospheric GHG concentrations using a carbon cycle model and consequently determine the radiative forcing (see definition below). This constitutes the input to a GCM that may then be used to force an RCM. The results of the latter are ultimately used as driving data for, e.g., climate impact models.

Radiative forcing
The net radiant flux density is a core property governing the radiative and thermal equilibrium of the atmosphere. Changes in atmospheric constituents, such as greenhouse gases, can perturb this flux density. In a GCM the changing atmospheric conditions (historical, present day, or possible future) are expressed as changes of radiative forcing.

Climate change signal
The difference between the value of a climate property at a time interval t and a later time interval t + Δt. Only a change that exceeds certain threshold boundaries (depending on the climate property considered) constitutes a signal. These threshold boundaries are determined, e.g., from the climate variability of this property at time interval t.

Climate variability
The variation in time of the climate system around a mean state. The most frequently used time scale for this variability ranges from months to years to decades. The term natural climate variability is used for the portion which is not attributable to human activity.

Scenario
A plausible story line that describes the political, economic and ecological condition and/or the development of a future global society. Scenarios provide the basis for calculating GHG emissions (SRES, Nakićenović et al. (2000)) or GHG concentrations (RCP, Moss et al. (2008)) and are thus the central assumption for any climate change projection (cf. radiative forcing).

Projection
A description of a possible and plausible future of the climate system including the pathway leading to it. Those descriptions are usually model-derived.

Forecast
A deterministic description of the future state of the climate system which can be verified with what really occurs at that point in time. This future state has to be a most likely one. For the time t it is based on knowledge at time t − 1. Forecasts are usually restricted to weather prediction time scales. Current research aims at forecasting time scales of several months to a few decades (Keenlyside and Ba, 2010).

Data types
There are four types of model-derived atmospheric data which need to be considered.
1. Reanalyses: These data represent a homogenized, long-term climatology of the observed atmosphere in three dimensions. In essence they rebuild climate statistics from daily weather data (Edwards, 2010), using an assimilation system that consistently employs the same model of the atmosphere for the whole time series available (instead of changing models used over time during the collection of the data). One example is the NCAR/NCEP data set (Kalnay et al., 1996), which has been calculated using a horizontal T62 resolution f and a vertical resolution of 28 atmospheric layers for the years 1948 to the present. The ERA40 data set (see Kållberg et al. (2005)) has been produced using a much higher (T159) horizontal resolution and 60 vertical layers but is fixed to a time frame of 1957-2002. The ERA-INTERIM Reanalyses project (Dee et al., 2011), using T255 resolution in 60 vertical layers, has been producing data for the period 1979-2010. It is considered the state-of-the-art for reanalysis-driven simulations of Regional Climate Models (RCM).
2. 20C: These runs of the Global Climate Models (GCM) aim to statistically reproduce the climate conditions of the 20th century. They are driven by all or a selection of (i) natural climate drivers: volcano eruptions and solar variability and (ii) anthropogenic climate drivers: aerosol contents and the equivalent radiative forcing due to GHG emissions as they occurred during that time. Comparing Reanalyses and 20C data is necessary to assess the climate models' offset (bias) with respect to what actually occurred (Christensen et al., 2008). When a cascade (cf. Section Model cascades) of GCM→RCM is used, additional 20C information for both model types has to be considered.
3. 20C-Reanalysis driven: RCM simulations can be driven with Reanalysis data to assess the model performance, i.e., the ability to reproduce the observed climate. For determining climate signals, however, it is necessary to use the GCM-driven 20C RCM runs for comparison with future scenario runs to avoid including any GCM-borne errors in the climate signal analysis.
4. Future climate projections: These are GCM simulation results under the assumption of a certain scenario of the future temporal evolution of GHG emissions or concentrations, used to assess the climate system's response to these changing conditions. Climate signals can be determined by computing differences between future (scenario) and current (20C) time intervals.
Note: To determine climate signals it is paramount to compare 20C simulations with scenario simulations driven by the same GCM.

Model cascades
The different types of regional models (RCM or ESD) enable different routes or cascades to provide the regionalizations.
• GCM directly: A large scale overview of expected climate signals can be determined by extracting that information from GCM results.
• GCM →RCM or GCM →ESD: In this case the GCM serves as external forcing at the lateral boundaries of a nested window if an RCM is considered, or it serves as the source for climatological information upon which an ESD builds its transfer functions.
• GCM →RCM →ESD: In this case the basis for generating the transfer function of the ESD are results of a dynamical downscaling using an RCM.

Ensembles
Since there is no single 'optimal' model, the analysis of a number of simulations (ensemble) is a scientifically suitable strategy to describe the range of climate changes that can be expected. It has been empirically indicated that a joint application of several models is able to represent reality better than each individual model, see Zhang et al. (2007), Reichler and Kim (2008) or Gleckler et al. (2008) g. Therefore it appears permissible to consider information that draws from many model simulations for the present (20C) and a future scenario (e.g., SRES A1B), e.g., the magnitude of the climate signal of a certain climate parameter, as a probable consequence of that scenario. Whether the ensemble members need to be weighted or should be treated with equal weight is a matter of ongoing research, see, e.g., Tab. 6.1 in van der Linden and Mitchell (Eds.) (2009) or Knutti R (Eds.) (2010b). An ensemble is a group of comparable model simulations. It needs to be set up keeping a unified aim in mind, i.e., the different kinds of ensembles listed below are not supposed to be mixed:
• Initial-condition ensemble: Ensemble encompassing different initial conditions (the same model, the same scenario but different simulations starting from different time points in the constant-forcing pre-industrial control simulation), often also simply called multiple runs of a model for the same time period and scenario.
• Perturbed physics ensembles: The same model with different assumptions and approximations (Murphy et al., 2009).
• Multi-model ensemble: Different models but the same scenario, e.g., SRES A1B.
• Multi-model multi-scenario ensemble: Several models, several scenarios.
Note: Conceptually, we are dealing with collections of models (Parker, 2011) or ensembles of opportunity (Carter, 2010; Knutti et al., 2010). This means that the ensembles' members come from a set of models the spread of which does not necessarily span the full possible range of uncertainty (Parker, 2011) but simply collects what is available.

Climate variability
Besides the uncertainty that stems from the scenario and the selection of the GCM and the RCM (van der Linden and Mitchell (Eds.), 2009) or ESD, climate variability induces uncertainty into climate model results. The term refers to fluctuations of the mean state and/or other statistical properties (such as standard deviation, occurrence of extremes, et cetera) of climate on all temporal and spatial scales which extend beyond individual weather events (Kravtsov and Spannagle, 2008). The variability may be rooted in (i) natural internal processes within the climate system, e.g., El Niño/La Niña or the multi-decadal Atlantic Oscillation (internal variability) or (ii) in natural or anthropogenic external influences (external variability), e.g., aerosol contents and the equivalent radiative forcing due to GHG emissions. Recent developments in the attribution of climate change to natural and anthropogenic sources are compiled in Stocker et al. (Eds) (2010b). One of the main outcomes is a set of good practice guidances regarding detection and attribution of anthropogenic climate change (Hegerl et al., 2010). Numerous studies to assess climate variability and its representation in the models have been carried out. Hawkins and Sutton (2009) break down variability in climate projections into fractions that belong to the scenario itself, the climate model used and internal variability. The shares of these fractions vary with the lead time of the projections and with the region for which the projections are made.

Recommendations for Data Types
As pointed out in Section Data types, when computing climate signals that compare the current climate conditions with those of a future time frame it is of utmost importance to use, for the current climate, 20C simulation data produced by the same GCM as the projection, and not reanalyses. It is virtually certain that reanalysis data, which are produced in a different way, do not include the specific errors of the GCM used for the future climate projections, so a difference between projection and reanalysis would mix the GCM's errors into the climate signal. Under the assumption that the errors of a certain GCM are constant over time or only weakly time-dependent, comparing the projection with the 20C simulation of the same GCM minimizes the GCM-specific bias.
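The reasoning behind this recommendation can be illustrated with a small sketch. All numbers below are invented for illustration, and the assumption of a bias that is constant over time is exactly the assumption named in the text; the function name is hypothetical.

```python
def climate_signal(future_mean, baseline_mean):
    """Climate signal as the difference between two period means."""
    return future_mean - baseline_mean


# Invented 30-year mean temperatures in degrees C (illustration only).
true_current = 9.0
true_future = 11.0
gcm_bias = 1.5  # assumed to be constant over time

# The reanalysis stands in for the observed climate and carries no GCM bias.
reanalysis_current = true_current
# Both GCM simulations carry the same (constant) bias.
gcm_20c = true_current + gcm_bias
gcm_scenario = true_future + gcm_bias

# Same GCM for baseline and projection: the bias cancels in the difference.
signal_same_gcm = climate_signal(gcm_scenario, gcm_20c)
# Reanalysis as baseline: the bias is mixed into the signal.
signal_vs_reanalysis = climate_signal(gcm_scenario, reanalysis_current)

print("true signal:               ", true_future - true_current)
print("scenario minus GCM 20C:    ", signal_same_gcm)
print("scenario minus reanalysis: ", signal_vs_reanalysis)
```

If the bias changes over time, the cancellation is only partial, which is why the text states the constant-bias assumption explicitly.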

Absolute values vs. anomalies
It is recommended for the minimization of systematic errors (bias) of the participating models, to evaluate anomalies, i.e., deviations from a mean h instead of absolute values.

Climate projections vs. forecasts
The results produced by global climate models, and all regionalizations based upon them, cannot be interpreted as forecasts; the notion that they can is misguided. Rather, they are climate projections which, unlike weather forecasts, only have the capability to simulate possible climate developments in a statistical sense, i.e., concerning means, variability and extremes. They are, however, not capable of predicting the climate, neither for a distinct future point in time nor for a point in space. Among the main reasons for this is the fact that climate has a chaotic component and even the current climate is just one of several equally probable realizations of the current state of the climate system. Another important reason is that, in order to perform their computations, climate models require information about the future development of factors which have an intrinsic uncertainty. Some of these developments which crucially influence climate are relatively well-known, e.g., mean solar irradiation or the earth's orientation with respect to the sun. Others are considerably less well known, such as the future greenhouse gas (GHG) emissions (which are in turn strongly dependent on the economic development and the growth rate of the world's population) and their ensuing concentrations in the atmosphere, volcanism, or land use and land cover changes. The longer the time horizons, the less well known are these factors. Moreover, climate change scenarios themselves are subject to change. For example, the IPCC Special Report on Emission Scenarios (SRES, Nakićenović et al. (2000)) established standards for climate change scenarios, climate impact research and mitigation/adaptation measures that have been in use for about a decade.
It was acknowledged that the SRES Scenarios were an adequate set of boundary conditions for the climate model versions of that time, albeit a first set that requires improvement, e.g., with respect to the assumptions concerning atmospheric sulphur (Johns et al., 2011) or the inclusion of direct political climate change mitigation measures. Right after the publication of the IPCC's 4th Assessment Report in 2007 activities were launched to establish new scenarios which are describing the concentrations of GHGs in the atmosphere (Hibbard et al., 2007), and are not hinged, as in SRES, on GHG emissions. In Moss et al. (2008), this new approach was presented extensively; figure I.1 on page 10 of that report juxtaposes the new and the SRES approaches. The core of these new scenarios is the concept of Representative Concentration Pathways (RCP) which assess the temporal evolution of the atmosphere's fraction of GHGs i .

Multitude of scenarios and Ensembles
If possible, all available emission scenarios should be analysed in order to have a representation of the scope of the projections. If this is not possible, e.g., for funding reasons, it is recommended that the analysis is based on at least two scenarios: one with a rather high (e.g., SRES A1FI/RCP8.5, see endnote i) and one with a rather low (e.g., SRES B1/RCP4.5) temperature response, to build up an envelope of possible developments. If there are limitations with respect to the number of scenarios used, an explanation of the scenario choice is required (e.g., selection of a worst case scenario or a mitigation scenario). This does not, however, imply that any one scenario is considered more probable than another. For example, the development of the GHG emissions in recent years is above the course of all projections (Poruschi et al., 2010), which makes it difficult to single out the most appropriate scenario.
All simulations which are used in an ensemble constitute possible climate developments. Yet, there is the issue of representativity of an ensemble (Parker, 2011), particularly when it has only a few members. According to Knutti (2010a) it should be interpreted as a kind of 'best guess', see also the note to Subsection Ensembles.
When assembling an ensemble to describe the scope of climatic change, the task at hand needs to be considered:
1. If a mean value is to be computed, only initial-condition ensembles or multi-model ensembles should be used since the computation of a mean across several scenarios is not physically meaningful.
2. If the bandwidth of possible changes is to be depicted, an ensemble across several scenarios is permissible. However, no ensemble mean can be computed for the reasons mentioned above.
In order to evaluate the climate variability of an ensemble of simulations, one of the factors (scenario; model) in the simulations of the ensemble should be kept constant (e.g., the same model for several scenarios or the same scenario for several models, including GCMs, RCMs and ESD).
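The two cases distinguished above (mean value versus bandwidth) can be sketched as follows. The ensemble members, model names and temperature changes are invented for illustration, and the function names are hypothetical; the sketch only encodes the rule that a mean is computed within one scenario, whereas a bandwidth may span several scenarios.

```python
from collections import defaultdict

# Hypothetical ensemble members: (model, scenario, temperature change in K).
# All values are invented for illustration.
ENSEMBLE = [
    ("model_a", "A1B", 2.5),
    ("model_b", "A1B", 3.0),
    ("model_c", "A1B", 3.5),
    ("model_a", "B1", 1.5),
    ("model_b", "B1", 2.0),
]


def by_scenario(members):
    """Group ensemble members by their scenario."""
    groups = defaultdict(list)
    for member in members:
        groups[member[1]].append(member)
    return dict(groups)


def ensemble_mean(members):
    """Mean change of an ensemble; refuses to average across scenarios."""
    scenarios = {scenario for _, scenario, _ in members}
    if len(scenarios) != 1:
        raise ValueError("a mean across several scenarios is not meaningful")
    values = [value for _, _, value in members]
    return sum(values) / len(values)


def bandwidth(members):
    """Range (minimum, maximum) of changes; permissible across scenarios."""
    values = [value for _, _, value in members]
    return min(values), max(values)


for scenario, members in by_scenario(ENSEMBLE).items():
    print(scenario, "ensemble mean:", ensemble_mean(members))
print("bandwidth over all scenarios:", bandwidth(ENSEMBLE))
```

Raising an error for a mixed-scenario mean is a design choice of this sketch; it makes the restriction visible instead of silently returning a number without physical meaning.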

Type of regionalization
When interpreting climate model results it should be kept in mind that regional signals are embedded in signals on a larger scale; for that purpose it is important to have an idea of the respective GCM results, if possible from several realizations of several models and scenarios.
It is recommended, in order to ensure the quality of subsequent analyses using the downscaled results, to employ at least two different downscaling methods. This encompasses the use of regional climate models (RCM), nested into global climate models (GCM) for an area of interest, as well as employing empirical statistical downscaling methods (ESD). If possible, both model types, i.e., an RCM and an ESD, should be considered as well as a range of scenarios to determine the 'event space'.

Compatibility with older versions of climate models
There are simulations with older model versions. The question arises as to how those data sets should be treated. Since the results differ concerning the models' assumptions and approximations, the principle of precaution should be applied. A quote from Knutti et al. (2010) may be advisable here: If we indeed do not clearly know how to evaluate and select models for improving the reliability of projections, then discarding older results out of hand is a questionable practice.
It should be noted that each model version constitutes a representation of 'reality' which is specific to the chain of information (economy→GHG→climate projections) employed. In accordance with the above quote, it is recommended to indicate that there are other projections which may be derived from older, i.e., less sophisticated, model versions, but which cannot be ruled out for certain as long as no direct model flaws are detected.
Note: Many GCMs derive from the same model code some years back which implies that these models cannot be considered as fully independent from each other (Edwards, 2011).

Temporal averaging
Climate variability on a multi-decadal scale is a source of uncertainty with respect to the timing of the expected changes. Thus it is not meaningful to carry out an analysis for one distinct future year or short time period. For special analyses, addressing, e.g., decadal variability and short-term environmental responses to it, a time horizon of 10 years may be appropriate.
For other climate studies and in accordance with WMO (2010) the time intervals to be considered for climate changes of atmospheric properties including temperature and precipitation should at least encompass 30 years j . A majority of climate change and impact studies focuses on mean values of the climate parameters. In addition, the climate variability on different time scales (hourly to decadal) as well as the analysis of extreme events should be considered. If applicable, the latter may require longer time intervals, particularly concerning precipitation.
As to the actual 30-year climate normal period used, the WMO recommendation (WMO (2010) and Arguez and Vose (2011)) is 1971-2000, which has the advantage of good data coverage for many areas and, being more recent, better captures variables that have a trend over time.
When processing climate scenario data for public display all bodies involved (Federal, State, third parties) should, as a matter of principle, take into account the following:
• Usage of 30-year periods to reduce over-interpretation of decadal climate variations.
• Usage of the reference period recommended by WMO, i.e., 1971-2000.
• Usage of the seasons winter (December-January-February), spring (March-April-May), summer (June-July-August) and autumn (September-October-November). This ensures inter-comparability with other studies and enables the tracking of climatological parameters across a year in an optimal manner.
• Furthermore, the type of temporal aggregation of meteorological properties needs to be taken into account. Depending on the individual model, the temporal resolution may be hours or days. Therefore it is neither meaningful nor possible to interpret daily data on shorter time scales.
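The first three points of this list can be sketched as a small aggregation routine. The input layout (a mapping from year and month to a monthly value), the function name and the example data are assumptions of this sketch; in particular, December is simply counted in the calendar year in which it falls, which is a simplification.

```python
# Seasons as named in the text above, keyed by month number.
SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def seasonal_means(monthly, start=1971, end=2000):
    """Average monthly values per season over the years start..end (inclusive).

    monthly maps (year, month) to a value, e.g. a monthly mean temperature.
    Simplification: December is counted in the calendar year it falls in.
    """
    if end - start + 1 < 30:
        raise ValueError("the period should span at least 30 years")
    sums = {}
    counts = {}
    for (year, month), value in monthly.items():
        if start <= year <= end:
            season = SEASON_OF_MONTH[month]
            sums[season] = sums.get(season, 0.0) + value
            counts[season] = counts.get(season, 0) + 1
    return {season: sums[season] / counts[season] for season in sums}


# Invented example data: the value of each month is simply its month number.
example = {(y, m): float(m) for y in range(1971, 2001) for m in range(1, 13)}
print(seasonal_means(example))
```

The check on the period length reflects the 30-year recommendation; shorter periods, such as the 10-year horizon mentioned for special analyses, would need a deliberate change of that check.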

Spatial averaging
It is not always meaningful to analyse results for individual meteorological stations. Results are more representative when regions are considered. Problems arise when station (point) and grid-box (area) data are compared (Ballester and Moré, 2007). Grid-box based studies should, in particular with respect to precipitation, aggregate over several grid-boxes. Maraun et al. (2010) state in their section on limitations of RCMs that a typical RCM grid scale of 25-50 km provides information on scales of ≈ 100 km with an additional dependency on season and topography. Moreover, according to the Nyquist principle from information theory, the detection of features requires at least two grid points, preferably more. For the REMO model k a spatial interpolation using at least 4, preferably 9 grid points is recommended by the modellers. When analysing data from the regional model CCLM l , the modellers recommend the use of 5 × 5 grid boxes. If station-based data are used, the evaluation should aim at incorporating several stations which exhibit close spatial correlation and which have large similarities in their climatic characteristics. In order to be independent of station names and grid point coordinates, a nomenclature based upon region names is recommended. When impact models are being run, it is advisable to use the resolution provided by the downscaling method (RCM or ESD) and perform the spatial averaging on the impact models' results.
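Averaging over a block of neighbouring grid boxes, such as the 3 × 3 or 5 × 5 blocks mentioned above, could look like the following sketch. The function name and the treatment of grid edges (boxes outside the grid are skipped) are assumptions of this example, not a recommendation from the modelling groups.

```python
def box_mean(field, row, col, size=3):
    """Mean over a size x size block of grid boxes centred on (row, col).

    `field` is a list of rows (a 2-D grid). Boxes that would fall outside
    the grid are skipped, so edge points are averaged over fewer boxes.
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd number")
    half = size // 2
    values = [
        field[r][c]
        for r in range(max(0, row - half), min(len(field), row + half + 1))
        for c in range(max(0, col - half), min(len(field[0]), col + half + 1))
    ]
    return sum(values) / len(values)


# Synthetic 5 x 5 field (hypothetical values 0..24, not model output).
grid = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
centre_3x3 = box_mean(grid, 2, 2, size=3)   # 9 grid boxes
centre_5x5 = box_mean(grid, 2, 2, size=5)   # 25 grid boxes
corner_3x3 = box_mean(grid, 0, 0, size=3)   # only 4 boxes lie inside the grid
```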

Magnitude of change signals
For the analysis of trends, applying the signal-to-noise strategy is a recommended step. Using the standard deviation of a climate parameter, e.g., annual mean temperature in the modelled period 1971-2000, an indicator for the variability of that parameter under current climate conditions can be determined. As an example, for Germany the order of magnitude of that variability is approximately ±0.3 K for temperature and approximately ±10% for the percentage change of precipitation.
In relation to the above assessment of climate variability, signals of climate change, i.e., the difference between projected scenario data and the results of the models' 20C runs, can only be considered significant if the magnitude of the change is larger than the model-specific variability of the respective climate parameter on decadal time scales. Additionally, a distinction between statistical and physical significance should be made m .
In this context it is important to bear in mind that the uncertainty embedded in the models is large and only partly quantifiable (van der Linden and Mitchell (Eds.), 2009). This restricts the applicability of rigorous statistical significance tests. Therefore the robustness of the results n is to be taken into account (Parker, 2011). In the case of probability distributions of an atmospheric property in a future climate this can be assessed by looking at (i) the sensitivity concerning assumptions that are currently under debate o and (ii) the near-future continuity in the projections, i.e., whether they are subject to implausible major change with the incremental development of the models (Parker, 2010).
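The comparison of a change signal with the variability of the reference climate can be sketched as follows. This is a minimal illustration, assuming that the variability is estimated as the sample standard deviation of annual values from the model's reference period; the function names and the synthetic numbers are hypothetical, and the estimate of decadal-scale variability used in practice may differ.

```python
from statistics import mean, stdev


def change_signal(reference, scenario):
    """Return (signal, noise) for two series of annual values.

    signal: difference of the period means (scenario minus reference)
    noise:  sample standard deviation of the reference series
    """
    signal = mean(scenario) - mean(reference)
    noise = stdev(reference)
    return signal, noise


def exceeds_variability(reference, scenario):
    """True if the magnitude of the change is larger than the noise."""
    signal, noise = change_signal(reference, scenario)
    return abs(signal) > noise


# Synthetic annual mean temperatures (hypothetical, shortened series).
reference = [8.0, 8.3, 7.7, 8.0]
strong_change = [10.0, 10.2, 9.8]
weak_change = [8.1, 8.0, 8.2]

signal, noise = change_signal(reference, strong_change)
strong_is_signal = exceeds_variability(reference, strong_change)
weak_is_signal = exceeds_variability(reference, weak_change)
```

Such a check addresses only the magnitude criterion; it does not replace the assessment of robustness and of physical significance discussed above.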

Visualization of model results
If results of climate scenarios are to be visualized, thorough consideration should be given to the application of the colour scale as well as the selected value intervals for the colour coding. Spekat and Kreienkamp (2007) address the parameter-dependent choice of colour schemes. For topics of colour psychology and cultural particularities in colour perception, readers should refer to Williams (2003). Brewer et al. (2003) p specify some 35 colour schemes for a variety of cartographic purposes and give detailed technical information on how to reproduce them. Gardner (2005) is a valuable source of information on the barrier-free usage of colours.
A few dos and don'ts concerning colour coding and scale setup are given below; relevant examples can be found in Figures 1, 2 and 3 q . Note that these figures are deliberately stripped of additional visual aids such as cities, borders, rivers or mountains, to show more clearly the effects of the different colour codings and scales. Also note that the properties displayed in Figures 1, 2 and 3 are temperature and precipitation, yet the principles apply to other atmospheric properties as well.
1. The usage of a continuous colour scale is not recommended since it would lead the viewer to assume that there is quite precise knowledge about the magnitude of the values displayed at each point of the map (see left-hand panel of Figure 1) r .
2. Rather, an adequate colour scale should be set up so that the uncertainty of the values displayed in a map is reflected in the value range covered by the individual colour classes; this means that it is advisable to use just a few (on the order of 10) discrete colours (see centre panel of Figure 1); if the property to be displayed has large margins of error this number might need to be reduced further.
3. If the value range to be displayed is generally of the same sign (as, e.g., for future temperature increases) the colour coding should be built upon the shades of just one colour, as in the centre panel of Figure 1.
4. When selecting colours, apart from colour-psychological aspects such as warm-red and cool-blue, attention should be given to applying colour in a barrier-free way, e.g., avoiding red-green scales. The colour scale in the right-hand panel (Figure 1) is an example of several mistakes: (i) The value range is positive throughout, yet the colour bar goes from colour A (here: red) to colour B (here: blue) by way of a third colour (here: yellow); (ii) The 'signal colour' yellow over-emphasizes the value range around the middle; (iii) Particularly strong temperature increases are associated with the colour blue, which is psychologically unsound.
5. Frequently the data to be displayed can assume both positive and negative values (e.g., for changes in precipitation, sunshine duration, wind speed, or humidity). In such a case, (i) the scale should indeed start at a colour A, then pass a mid-point for zero values with white colour (avoiding a signal colour, such as yellow) and continue to reach a colour B; (ii) the colour scale should be organized symmetrically, i.e., with zero in its centre, as given in the centre panel of Figure 2. The example in the left-hand panel of Figure 2 shows a selection to avoid, since it uses a colour scale that is applicable for properties that do not change sign.
6. As an option, values below the computed significance threshold (cf. Magnitude of change signals section) or inside the IPCC likelihood scale item About as likely as not (Mastrandrea et al., 2010) can be masked out, as in the right-hand panel of Figure 2; this removes information from the map but it ultimately helps the eye to focus on what is noteworthy.
7. In order to improve the comparability of graphs, the mapping of values to colours should aim for consistency between graphs displaying the same parameter, as shown in Figure 3 which underlines the recommendation for unified colour scales. When justified exceptions need to be made, these have to be explicitly mentioned.
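Items 2, 5 and 6 can be illustrated with a small sketch that builds a symmetric set of discrete classes around zero and masks values below a threshold. It is only one possible setup: the function names, the choice of equal-width classes and the use of an odd class count (so that one central class straddles zero and can be drawn in white) are assumptions of this example.

```python
from bisect import bisect_right


def symmetric_edges(values, n_classes=9):
    """Edges of n_classes equal-width classes arranged symmetrically around zero.

    An odd class count gives one central class straddling zero, which can
    be assigned the white mid-point colour.
    """
    if n_classes < 3 or n_classes % 2 == 0:
        raise ValueError("n_classes must be an odd number >= 3")
    limit = max(abs(v) for v in values)
    width = 2 * limit / n_classes
    return [-limit + i * width for i in range(n_classes + 1)]


def classify(value, edges, threshold=0.0):
    """Index of the colour class for value, or None if it is masked.

    Values whose magnitude is below `threshold` are masked (item 6).
    """
    if abs(value) < threshold:
        return None
    index = bisect_right(edges, value) - 1
    return min(max(index, 0), len(edges) - 2)  # keep the upper limit in the last class


# Synthetic precipitation changes in percent (hypothetical values).
changes = [-18.0, -4.0, 0.5, 7.0, 12.0]
edges = symmetric_edges(changes, n_classes=9)
classes = [classify(v, edges) for v in changes]
masked = [classify(v, edges, threshold=10.0) for v in changes]
```

The threshold of 10 used here merely echoes the order of magnitude quoted for precipitation variability in the Magnitude of change signals section; an actual threshold would have to be derived from the data at hand.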

Visualization of ensemble results
When showing ensemble results, a list of all ensemble members included in the analyses displayed needs to be given. Furthermore, it should be documented if the results of individual members of the ensemble bear little similarity to the rest. Besides the average value, a quantification of the bandwidth (e.g., the standard deviation or another measure of dispersion) must be given. Additional information on extremes or so-called outliers can be valuable if the reason for the outlier is known (e.g., a less realistic representation of a relevant process in the model showing the outlier value).
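A summary that reports the members, the average, the bandwidth and conspicuous members could be assembled as in the following sketch. The function name, the member names and the factor of 1.5 standard deviations used to flag members that differ from the rest are hypothetical choices for this illustration, not an agreed criterion.

```python
from statistics import mean, stdev


def ensemble_summary(members, outlier_factor=1.5):
    """Summarize change signals of an ensemble.

    `members` maps a member name to its change signal. Members whose
    distance from the ensemble mean exceeds outlier_factor standard
    deviations are listed so that they can be documented and examined.
    """
    values = list(members.values())
    centre = mean(values)
    spread = stdev(values)
    outliers = sorted(
        name for name, value in members.items()
        if spread > 0 and abs(value - centre) > outlier_factor * spread
    )
    return {
        "members": sorted(members),
        "mean": centre,
        "stdev": spread,
        "min": min(values),
        "max": max(values),
        "outliers": outliers,
    }


# Synthetic temperature change signals in K (hypothetical member names).
signals = {"run_a": 2.0, "run_b": 2.2, "run_c": 1.8,
           "run_d": 2.1, "run_e": 1.9, "run_f": 4.0}
summary = ensemble_summary(signals)
```

Flagging a member in this way only marks it for documentation; whether it is reported as an outlier depends on whether the reason for its deviation is known.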
It has been a common practice, e.g., in IPCC Assessment Reports, to use the agreement between models as an indicator for confidence in the results, although Parker (2011) cautions that this agreement alone should not be the basis for statements on future climate change.
Figure 3 Rationale for using a joint colour scale instead of single scales for analyzing groups of data. On display are the colour scales for the change in mean temperature between 1971-2000 and 2071-2100 according to the cascade SRES A1B→ECHAM5/MPI-OM run 1→WETTREG2010 (Kreienkamp et al., 2010) in Germany. The left-hand side shows the colour scales for the four seasons and the right-hand side shows a unified scale covering the whole range of signals in all four seasons. The unified scale is further modified to yield range separations that are either whole or half degrees.
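The unified scale described in the caption of Figure 3 can be sketched as follows: the ranges of all seasons are combined and the class limits are placed on whole or half degrees. The function name and the seasonal ranges are hypothetical values chosen for illustration; they are not taken from the figure.

```python
import math


def unified_edges(ranges, step=0.5):
    """Class limits covering all given (low, high) ranges, aligned to `step`.

    With step=0.5 every limit is a whole or half degree.
    """
    low = min(lo for lo, _ in ranges.values())
    high = max(hi for _, hi in ranges.values())
    first = math.floor(low / step)
    last = math.ceil(high / step)
    return [k * step for k in range(first, last + 1)]


# Hypothetical seasonal ranges of the temperature change signal in K.
seasonal_ranges = {
    "winter": (3.2, 4.6),
    "spring": (2.1, 2.9),
    "summer": (2.8, 3.7),
    "autumn": (2.9, 3.8),
}
edges_all_seasons = unified_edges(seasonal_ranges)
```

Using the same limits for all four seasonal maps means that a given colour represents the same value range in each of them.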

That's that?
The majority of the above recommendations reflect what is common sense within the climate modelling community. However, publications and presentations frequently show a dash of 'artistic license'. This might, in some cases, increase the entertainment level in a presentation and thus might in fact improve the concentration of the audience. It might also in some cases highlight the results more prominently. Unfortunately, it might reduce the understandability and comparability of the results for non-members of that community.
This paper aims at identifying common ground for good practices when setting up climate change studies and communicating climate change scenario results. The recommendations are a general agreement which is periodically reconsidered, updated and advanced by the informal working group Interpretation of regional climate scenario results of the German federal environment agencies.
We provide this paper to share our current status of agreement as a basis for other user groups to review their presentation and communication practice. We hope to further the joint efforts to improve our communication skills and the communication between the 'producers' and 'users' of climate modelling results. In the end, if climate modelling results are not perceived and understood correctly by climate impact researchers, politicians or the general public, our information does not reach the relevant communities.
Endnotes
a By this term we summarize the community of climate modellers ('producing' climate scenarios).
b By this term we summarize those who deal with the application of model results (using climate scenario results). This includes climate impact modellers who use the output of climate models as input for their climate impact models as well as those involved in decision making processes.
c See, e.g., the preface to Stocker et al. (Eds) (2010a) which states the necessity to clarify methods, definitions and terminology or the preface to Stocker et al. (2010b) which points out the need to summarize methods used in assessing the quality and reliability of climate model simulations.
d Nine major sources of uncertainty in climate models and their relevance for various aspects of the modelling and application process are given on p. 11ff of van der Linden and Mitchell (Eds.) (2009).
e http://klimawandel.hlug.de/fileadmin/dokumente/ klima/fachgespraech/Leitlinien zur Interpretation-reg-KMD-05-2011.pdf (in German).
f The Txx nomenclature is explained in the GCM entry of the Terminology section.
g However, Parker (2011) cautions against over-confidence with respect to the expectancy that robust conclusions can be drawn from the fact that a collection (as it is called there) or an ensemble of models are in agreement; this has to do, e.g., with the fact that the models used for an ensemble are a sample of unknown representativity.
h For example the deviations from a long-term mean over a period or the deviations from the annual cycle of a climate variable.
i In broad terms, a mapping of SRES onto RCP scenarios can be achieved. RCP4.5 is close to SRES B1, RCP6.0 is located between SRES B1 and SRES A1B and RCP8.5 is higher than SRES A2 and close to SRES A1FI. RCP2.6 is below any SRES scenario and close to the ENSEMBLES E1 scenario which aims at keeping the global warming below 2 °C above pre-industrial levels.
j WMO (2010), for one, indicates that there is a distinction to be made between the climatological standard normals, which should use non-overlapping 30-year intervals, starting in 1901, i.e., 1901-30, 1931-60 and 1961-90 on the one hand and the climate normal period on the other hand, which is of 30 years length but subject to updates in ten-year intervals. For two, this source notes that the interval of 30 years itself is a compromise, because temperature studies may require fewer years whereas precipitation or extremes studies (e.g., Trewin (2007)) frequently require much longer averaging periods, see also, e.g., Angel et al. (1993) or Carter et al. (2007). Concerning this aspect, the IPCC reports, for example, draw from a number of sources in which the treatment of the averaging period issue is not consistent.
k REMO information can be found at www.remo-rcm.de.
l Not to be confounded with the Community Land Model; CCLM information can be found at www.clmcommunity.eu
m This means that the statistical concept of significance, on the one hand, as it is used in hypothesis testing, may not always be applicable. Physical significance, on the other hand, describes events with a high impact. Example: A precipitation event yielding 25 mm/day may be physically significant (of high impact) in a flat area, but not in mountainous terrain, although it is probably statistically significant in both.
n The concept of robustness has a distinct meaning in the context of statistics where a robust result is understood to be derived using non-parametric methods, i.e., methods in which not values of a property but their ranked sequence is analysed. However, the implied meaning of robustness in this text adheres to the context of climate model evaluations or decision making, where this term is linked to sturdiness and low dependency on influencing factors.
o There are properties concerning which the climate modelling community has a variety of concepts and approaches. Their importance is frequently subject to judgement about their quality and relevance.
p A relevant web site is colorbrewer2.org.
q The reader is referred to Spekat and Kreienkamp (2007) for a more detailed picture of this matter.
r A note of caution: Users of colour-bar maps should refrain from over-interpretation. Even though such a map contains discrete values, their assignment to a specific location needs to take into account that they were produced by an interpolation algorithm. The placement of the colour bars is specific to that algorithm. Nevertheless, the interpretation of displays with a limited number of colours is more straightforward, a point which is taken up in item 2.