The Impact of Model Uncertainties on Analyzed Data in a Global Data Assimilation System

The impact of model uncertainties on analyzed data is investigated using a global data assimilation system. This issue is explored in a 3D-Var system based on the National Centers for Environmental Prediction (NCEP)/Department of Energy (DOE) system using two convective parameterization schemes, the simplified Arakawa-Schubert (SAS) scheme and the Community Climate Model (CCM) scheme. Two sets of six-hourly analysis data are generated for the summer of 2004. The difference between the resulting analyses using different convective parameterization schemes is found to be significantly greater than that between two well-known analyzed data sets, the NCEP/National Center for Atmospheric Research (NCAR) reanalysis (RA1) and the NCEP/DOE reanalysis (RA2). This dependency is more pronounced in data-sparse areas like the East Asian region than in data-rich areas like the North American region. Our study indicates that predictability for short- to medium-range forecasts in the global forecast system is indirectly influenced by forecast model accuracy via the quality of the initial conditions.


INTRODUCTION
A data assimilation system is a comprehensive system utilizing meteorological measurements, a statistical approach, and an atmospheric model to describe the state closest to the true atmospheric state (Daley 1991). The key objective in such a system is generally to create analysis fields that represent optimal states of the atmosphere and, ultimately, to provide initial fields as input to numerical weather prediction models. Additionally, such systems are used to reproduce sequences of long-term, reliable analysis datasets for weather and climate research, also known as reanalysis data (Kalnay et al. 1996; Kistler et al. 2001; Kanamitsu et al. 2002b; Uppala et al. 2005).
The major components of a data assimilation system** are the observational data, a data assimilation algorithm, and a forecast model, all of which are closely coupled. The application of a data assimilation system may include various sources of error, mostly due to observations and the forecast model (Lu and Browning 1998). Observed data commonly contain unavoidable subjective, instrumental, or other errors. Therefore, observation error statistics corresponding to the observation type are assigned in the data assimilation module.
Moreover, forecast models for background fields inherently suffer from errors due to uncertainty in the initial data and from model error, which is related to the chaotic behavior of nonlinear systems in the evolution of the atmosphere. The data assimilation algorithm itself is not considered to possess these errors, since it is the method used to find the best estimate of the atmospheric state using the difference between the background of the forecast model and the corresponding observations. However, the assimilation system is processed with the pre-determined error statistics from the forecast model and the observations. Thus, there are relative pros and cons associated with the methods used in various assimilation algorithms, such as the three- or four-dimensional variational (3D-Var or 4D-Var) method and the Kalman filter. The reader is referred to Kalnay (2003) for an overview of data assimilation.
** Throughout this paper, we define the assimilation algorithm as a matrix solver for the model and observation errors, used to obtain the optimal state of the atmosphere, as distinct from the data assimilation system, which includes the quality control component, the assimilation algorithm, and the forecast model used to generate the guess field.
Terr. Atmos. Ocean. Sci., Vol. 22, No. 1, 41-47, February 2011
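For orientation, the variational analysis step that the text calls the assimilation algorithm is commonly written as the minimization of a cost function. The expression below is the standard textbook 3D-Var form (see, e.g., Kalnay 2003) and is not taken from this paper; the symbols $\mathbf{x}_b$, $\mathbf{y}$, $H$, $\mathbf{B}$, and $\mathbf{R}$ are notation assumed here for illustration.

```latex
% Standard 3D-Var cost function (textbook form; notation assumed, not from the paper)
J(\mathbf{x}) =
  \tfrac{1}{2}\,(\mathbf{x}-\mathbf{x}_b)^{\mathrm{T}}\,\mathbf{B}^{-1}\,(\mathbf{x}-\mathbf{x}_b)
  + \tfrac{1}{2}\,\bigl(\mathbf{y}-H(\mathbf{x})\bigr)^{\mathrm{T}}\,\mathbf{R}^{-1}\,\bigl(\mathbf{y}-H(\mathbf{x})\bigr)
```

Here $\mathbf{x}_b$ is the background (guess field) produced by the forecast model, $\mathbf{y}$ the observations, $H$ the observation operator, and $\mathbf{B}$ and $\mathbf{R}$ the pre-determined background and observation error covariances. Because $\mathbf{x}_b$ enters the first term directly, a change in the forecast model can alter the analysis even when the observations, the error statistics, and the solver are unchanged.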
The majority of data assimilation research has been conducted on data assimilation algorithms and the associated data acquisition and quality control, rather than on the exploration of the forecast model component. It is generally believed that errors in the forecast model are mostly due to model physics, which causes systematic errors in the analyzed data. Although it is known that differences in physical parameterization schemes can result in different forcings in a prediction model (Hack 1994; Lu and Browning 1998), impact studies of model uncertainty on the quality of the initial conditions are relatively rare.
In this study, the influence of forecast model uncertainty on analyzed data is investigated in a 3D-Var system by employing two different cumulus parameterization schemes, the simplified Arakawa-Schubert (SAS; Pan and Wu 1995; Hong and Pan 1998) and the Community Climate Model (CCM; Zhang and McFarlane 1995) algorithms, in an identical assimilation system. The primitive variables, such as temperature and moisture, are evaluated against the corresponding radiosonde observations. The resulting precipitation is assessed in relation to the observations over the time frame studied. The analyzed data sets are also compared to two other popular reanalysis data sets. The discussion focuses primarily on the differences resulting from the choice of forecast model, rather than on the departure from the in-situ observations. Kang and Hong (2008) demonstrated that the SAS scheme generally outperforms the CCM scheme when simulating the East Asian summer monsoon in a regional climate model study. It is important to note, however, that the purpose of our study is not to judge the superiority of one cumulus parameterization scheme over another, but rather to assess model uncertainty, since parameterization of deep convection is one of the most important and uncertain components in the forecast model.
Section 2 presents the experimental design, while section 3 contains evaluations of the precipitation and observation verifications of the analyzed fields. This paper ends with concluding remarks in section 4.

EXPERIMENTAL DESIGN
Two sensitivity experiments are conducted to discuss the impact of different model physics, the SAS and CCM schemes, under an identical data assimilation system. Hereafter, the resulting data sets are referred to as the Gsas and Gccm data, respectively. The 3D-Var system used in these two experiments closely follows the National Centers for Environmental Prediction/Department of Energy (NCEP/DOE) reanalysis-2 system (RA2; Kanamitsu et al. 2002b). However, the forecast model for generating the guess field for Gsas and Gccm is different from that in RA2. The dynamical framework in the model is based on that of the RA2 system, but the physical processes are updated as described in Kanamitsu et al. (2002a). In addition, the Yonsei University scheme (Hong et al. 2006) for vertical turbulence diffusion is adopted in these sensitivity experiments. Since Gsas and Gccm experiments use a similar data assimilation algorithm to that of RA2, but with a different forecast model, a comparison of the experimental results with those predicted by RA2 can provide an estimation of the uncertainty due to forecast model errors in a data assimilation system. The summer of 2004, which recorded nearly normal seasonal precipitation over East Asia, was selected for this study.
The NCEP/National Center for Atmospheric Research (NCEP/NCAR) reanalysis (RA1; Kalnay et al. 1996) and NCEP/DOE reanalysis-2 (RA2) systems, released to the public, are compared as references. The RA2 system is an updated version of RA1, with certain components enhanced, corrected, or newly introduced (Kanamitsu et al. 2002, see Appendix). Most of these changes concern components related to the forecast model, such as model physics and fixed boundary fields; the principal module of the data assimilation algorithm is still used in RA2. Within the assimilation algorithm, the only changes in RA2 are the new precipitation data assimilation and the smoothed orography. Data assimilation systems are generally regarded as a combination of a model forecast and an assimilation algorithm that digests observations. Although the assimilation algorithm is mostly unchanged in RA2, the analysis fields are significantly altered because the background field used in the assimilation changed with the improved forecast model. RA2 shows significant differences from RA1 in parameters such as soil moisture, global radiation budget, tropospheric humidity, precipitation, cloud cover, and near-surface temperature.
The Gsas and Gccm data differ only with respect to the forecast model, having different cumulus parameterization schemes. The differences between RA1 and RA2 involve the forecast model physics, corrections of various errors in the assimilation algorithm resulting in improvements to fixed fields such as surface albedo, snow, and ice, and the introduction of new system components. That is, the RA1 and RA2 data are the two different analyses with major differences in their respective assimilation systems, whereas the differences in the Gsas and Gccm data are due to a specific, uncertain component of the forecast model within identical assimilation systems.
To evaluate the accuracy of the analyzed data, radiosonde observation data (RAOBs) for specific humidity and temperature and the Global Precipitation Climatology Project (GPCP) data for precipitation are used. The RAOBs originate from the University of Wyoming website (UWYO, http://weather.uwyo.edu/) and are processed through a simple quality control. The GPCP (Huffman et al. 2001) data, with a 1° × 1° spatial resolution, are used for evaluation of precipitation in each analysis.

Temperature and Moisture
Analysis fields are compared with the corresponding radiosonde observations in order to compute the bias (analysis minus observation). With regard to data assimilation, the sign of the bias is opposite to that of the observation increment, which is the correction of the model error (observation minus guess field). The biases in temperature over East Asia are relatively small for the RA1 and RA2 data in the lower troposphere and for the RA2 and Gsas data in the middle troposphere (Fig. 1a). Overall cold biases appear in the upper troposphere in all of the analyses studied. The bias in RA2 decreases with warmer temperatures in the middle troposphere and with colder temperatures in the lower troposphere compared to the bias of the RA1 profile. This may be due in part to enhanced vertical mixing in the Hong and Pan (1996) scheme as compared to that in the local diffusion scheme of Louis (1979). The non-local mixing of Hong and Pan (1996) tends to exhibit warming within the boundary layer and cooling above it, as compared to the local approach of Louis (1979).
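The two sign conventions defined above (bias as analysis minus observation, observation increment as observation minus guess field) can be made concrete with a short sketch. This is a minimal illustration, not the authors' verification code: the function names, the array layout (soundings by pressure levels), the use of NaN for missing reports, and the sample temperatures are all assumptions introduced here.

```python
import numpy as np

def bias_profile(analysis, observation):
    """Mean bias (analysis minus observation) at each pressure level.

    Inputs have shape (n_soundings, n_levels); NaN marks a missing report
    (an assumed convention, not something specified in the paper).
    """
    analysis = np.asarray(analysis, dtype=float)
    observation = np.asarray(observation, dtype=float)
    return np.nanmean(analysis - observation, axis=0)

def observation_increment(observation, guess):
    """Observation minus guess field, i.e. the correction applied to the model."""
    return np.asarray(observation, dtype=float) - np.asarray(guess, dtype=float)

# Hypothetical temperatures (K) at three levels for two soundings; not from the paper.
obs = np.array([[290.0, 270.0, 230.0],
                [292.0, 272.0, np.nan]])
ana = np.array([[290.5, 269.8, 229.0],
                [292.5, 271.8, 231.0]])

bias = bias_profile(ana, obs)             # warm low levels, cold upper level
increment = observation_increment(obs, ana)  # same fields, opposite sign
```

A positive value in `bias` means the analysis is warmer than the sounding at that level; treating the same field as a guess gives an increment of the opposite sign.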
The biases over North America are considerably smaller than those in East Asia and are smaller for the Gsas and RA2 than for the Gccm and RA1 throughout the entire troposphere (Fig. 1b). This may be due to differences in the amounts and qualities of observation data between the two regions, since these differences in turn determine the dependency of the resulting analysis on the assimilation module or forecast model (Kistler et al. 2001). In other words, changes in the data assimilation system could affect the quality of analysis in a data-rich region, though not in a data-sparse region. Because rawinsonde and surface observations are not available over oceans, the East Asian region contains fewer observational data than does the North American region. The typical distribution of observations can be found in Kalnay (2003). The improvements from RA1 to RA2 indicate the overall effect of changes in the data assimilation system, including the observational data, the assimilation algorithm, and the forecast model.
Figure 2 shows the verification of humidity against radiosonde observations. Note that the bias patterns in specific humidity are similar over both regions in spite of the different amounts and qualities of observational data between the two regions. This can be explained by the fact that the humidity variable is strongly influenced not only by observational data but also by the model, unlike the temperature variable, which is strongly influenced by observational data (Kalnay et al. 1996). As was the case for temperature, the Gsas and RA2 profiles exhibit comparatively smaller biases than those of the other two sets. The humidity profiles from the Gccm and RA1 data reveal severe dry biases centered around 800 hPa, and the bias for the Gsas data is much smaller than that for the Gccm data at altitudes below the middle troposphere. It is also clear that the bias difference between the Gsas and Gccm data appears greater with respect to specific humidity than with respect to temperature, highlighting the idea that the model physics affect moisture fields more significantly than they do temperature fields (see Hong and Pan 1996). Notably, the dry bias in the lower troposphere frequently appears in the Gccm and RA1 data even though these data are generated by two assimilation systems that differ in many aspects.
Our results also demonstrate that the difference between the Gsas and Gccm profiles is, on the whole, greater than that between RA2 and RA1. The vertically averaged differences in temperature (Fig. 1) between Gsas and Gccm are 0.188 and 0.135 K over East Asia and North America, respectively, whereas the corresponding differences between RA1 and RA2 are 0.167 and 0.082 K. For moisture (Fig. 2), the differences between Gsas and Gccm are 0.165 and 0.163 g kg⁻¹ over East Asia and North America, respectively, whereas the corresponding values between RA1 and RA2 are 0.152 and 0.102 g kg⁻¹. This suggests that the uncertainty in the forecast model pertaining to the data assimilation system can significantly affect the analyzed data.
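The comparison above can be restated numerically. The sketch below uses only the eight values quoted in the text and forms the ratio of the Gsas-Gccm difference to the RA1-RA2 difference for each variable and region. The helper `mean_abs_difference` shows one plausible way a vertically averaged difference could be computed (an unweighted mean of absolute differences over levels); the paper does not state the exact averaging, so that helper and all names are assumptions.

```python
import numpy as np

def mean_abs_difference(profile_a, profile_b):
    """Unweighted mean absolute difference over levels (assumed definition)."""
    a = np.asarray(profile_a, dtype=float)
    b = np.asarray(profile_b, dtype=float)
    return float(np.mean(np.abs(a - b)))

# Vertically averaged differences as quoted in the text (K and g/kg).
reported = {
    ("temperature", "East Asia"): {"Gsas-Gccm": 0.188, "RA1-RA2": 0.167},
    ("temperature", "North America"): {"Gsas-Gccm": 0.135, "RA1-RA2": 0.082},
    ("humidity", "East Asia"): {"Gsas-Gccm": 0.165, "RA1-RA2": 0.152},
    ("humidity", "North America"): {"Gsas-Gccm": 0.163, "RA1-RA2": 0.102},
}

# Ratio > 1 means the convection-scheme difference exceeds the RA1-RA2 difference.
ratios = {key: v["Gsas-Gccm"] / v["RA1-RA2"] for key, v in reported.items()}

for (variable, region), ratio in ratios.items():
    print(f"{variable:12s} {region:14s} ratio = {ratio:.2f}")
```

All four ratios exceed one, which is the sense in which the Gsas-Gccm difference is described as greater than the RA1-RA2 difference.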

Precipitation
To investigate the effects of the convective parameterization scheme in the forecast model on precipitation in each data set, the three-month-averaged zonal-mean precipitation is calculated and compared (Fig. 3). It is apparent that the analyzed data tend to overestimate the precipitation in both hemispheres, but the patterns are more diverse in the Northern Hemisphere than in the Southern Hemisphere.
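The three-month-averaged zonal mean described above amounts to averaging a precipitation field over time and longitude, leaving one value per latitude. The following is a minimal sketch under stated assumptions: a regular latitude-longitude grid (so that the longitude average needs no weighting), an array ordered as (time, latitude, longitude), and a made-up demonstration array rather than any data from the paper.

```python
import numpy as np

def seasonal_zonal_mean(precip):
    """Average a (n_time, n_lat, n_lon) field over time and longitude.

    Returns an array of shape (n_lat,). Assumes a regular lat-lon grid,
    so every longitude point at a given latitude carries equal weight.
    """
    precip = np.asarray(precip, dtype=float)
    return precip.mean(axis=(0, 2))

# Hypothetical field: 2 time steps, 3 latitudes, 4 longitudes (not real data).
demo = np.arange(24, dtype=float).reshape(2, 3, 4)
zonal = seasonal_zonal_mean(demo)
```

Applying the same function to each analysis and to the observed field would give curves that can be compared latitude by latitude.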
Tropical precipitation values over the ITCZ from the Gsas data are closer to what was observed than are those from the Gccm data in terms of latitudinal location, although both data sets exaggerate the peak intensity. This peak is also overestimated by the RA2. Again, the difference between the Gccm and Gsas data sets is greater than that between RA1 and RA2, except for the ITCZ.
Three-month total precipitation values over the globe are compared with the GPCP precipitation results (Fig. 4). The tabulated amounts and pattern correlation scores for the seasonal mean precipitations are shown in Table 1. Precipitation values other than those from RA1 are similar to each other; the greatest agreement is observed between the RA1 and GPCP data. Precipitation peaks over the tropics are exaggerated by the RA2, Gsas, and Gccm data. The relative deterioration of the precipitation distributions when changing from the RA1 to the RA2 data is explained by the negative effects of the physics change in the NCEP operational model (see Kanamitsu et al. 2002a for the details). An indistinct yet discernible impact is seen between the Gsas and Gccm results. The pattern correlation of the three-month precipitation from the Gsas data is better than that from the Gccm over the different geographical regions of the globe.
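A pattern correlation score of the kind reported in Table 1 can be sketched as follows. The paper does not specify how its coefficients were computed, so this is one common choice rather than the authors' method: a centered correlation with cos(latitude) area weights on a regular latitude-longitude grid. The function name, the weighting, and the small test field are assumptions made for illustration.

```python
import numpy as np

def pattern_correlation(field_a, field_b, lat_deg):
    """Centered, area-weighted pattern correlation of two (n_lat, n_lon) fields.

    cos(latitude) weights approximate grid-cell area on a regular lat-lon
    grid (an assumed weighting; the paper does not state its choice).
    """
    a = np.asarray(field_a, dtype=float)
    b = np.asarray(field_b, dtype=float)
    lat = np.deg2rad(np.asarray(lat_deg, dtype=float))
    weights = np.cos(lat)[:, None] * np.ones_like(a)
    weights = weights / weights.sum()
    # Remove the area-weighted mean so the score measures pattern, not offset.
    a_anom = a - (weights * a).sum()
    b_anom = b - (weights * b).sum()
    covariance = (weights * a_anom * b_anom).sum()
    variance_a = (weights * a_anom ** 2).sum()
    variance_b = (weights * b_anom ** 2).sum()
    return float(covariance / np.sqrt(variance_a * variance_b))

# Hypothetical 3-latitude by 2-longitude field (mm/day); not from the paper.
lats = np.array([-30.0, 0.0, 30.0])
model = np.array([[1.0, 2.0], [6.0, 8.0], [2.0, 3.0]])

same = pattern_correlation(model, model, lats)
scaled = pattern_correlation(model, 2.0 * model + 1.0, lats)
flipped = pattern_correlation(model, -model, lats)
```

Because the score is centered and normalized, it is insensitive to a uniform offset or scaling of either field, so it measures agreement in spatial pattern and not in amount. That is presumably why Table 1 lists the precipitation amounts alongside the correlation coefficients.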

CONCLUDING REMARKS
The impact of model uncertainties on analyzed data is examined using a global data assimilation system that closely follows the NCEP/DOE reanalysis system. The uncertainties of model errors are estimated by applying the forecast model with two different cumulus parameterization schemes, namely, the SAS and CCM algorithms. A six-hourly assimilation cycle is performed for the summer of 2004, and the resulting analyses are evaluated for accuracy against radiosonde observations collected over East Asia (a data-sparse region) and North America (a data-rich region). The resulting precipitation values are also compared. Two popular reanalyzed data sets, the NCEP/NCAR RA1 and NCEP/DOE RA2, are used as references.
Our major finding is that the quality of the analysis depends highly on the forecast model embedded within the data assimilation system. Given identical data assimilation algorithms and observation data, the bias from a system using the SAS scheme is typically smaller than that from a system using the CCM scheme, a finding which is consistent with evaluations of convection schemes in simulating the East Asian summer monsoon (Kang and Hong 2008). The differences in large-scale fields (such as temperature and humidity) between the analyses with the two different convection schemes are as great as the disparities between the RA1 and RA2 data. These features are more prominent in the data-sparse region over East Asia than they are in the data-rich region over North America.
Note that the RA2 assimilation system differs significantly from the RA1 system in both the observational data and the assimilation system, owing to system improvements, the correction of human processing errors and many problems in observational processing, and changes to the model physics and to fixed fields such as albedo, snow, and ice. The effective use of observational data and the development of data assimilation algorithms can certainly decrease the deviation of analysis fields from actual observations. Advanced assimilation algorithms such as 4D-Var are recognized to bring a significant increase in the quality of initial conditions, which leads to improvements in medium-range forecast accuracy (Rabier et al. 2000; Laroche et al. 2005). For example, the quality of the ECMWF reanalysis data using 4D-Var is generally better than that of RA1 or RA2 (e.g., Ponte and Dorandeu 2003). It has also been shown that the prediction skill of a medium-range global forecast system is highly dependent on the accuracy of the initial data (Lorenz 1963, 1993; Palmer 2000). However, our evaluation of the analyzed data sets against radiosonde observations also indicates that improvements to the internal physics of the forecast model can be as significant as improvements in the assimilation system and increases in the quantity of archived data. Therefore, we conclude that continuous efforts should be made toward refinement of model physics as well as enhancement of data assimilation systems in order to improve forecast skill in global forecast systems.

Fig. 1. Vertical profiles of temperature biases (K) for four different analyzed data sets relative to the radiosonde observations: RA2 (thick dotted line), RA1 (thin dotted line), Gsas (thick solid line), Gccm (thin dotted line) over (a) East Asia and (b) North America.

Table 1. Precipitation amounts (mm day⁻¹) in June-July-August of 2004, and pattern correlation coefficients relative to the GPCP observations.