Statistics of 6-hour Forecast Errors Derived from Global Data Assimilation System at the Central Weather Bureau in Taiwan

The statistics of 6-hour forecast errors for z, u, and v derived from the global data assimilation system at the Central Weather Bureau in Taiwan are presented. One point moments, including mean, standard deviation, skewness, and kurtosis of the forecast errors, are calculated at radiosonde stations to evaluate the statistical properties and define how close the dis­ tribution of the forecast error is to the Gaussian distribution. The degree to which the analyses fit the observation is also examined. The overall evaluations with respect to different domains show that the lower order statistics, mean and standard deviation, are reasonable and comparable to the results of other operational centers. The higher order statistics show that the distributions of the forecast error form an approxi­ mate Gaussian distribution. The spatial distribution of the one point moment shows that the mean and standard deviation of forecast errors are sensitive to the orographic effect (e.g., the Tibetan Plateau), the Asia and North American monsoon activities, and the mid-latitude disturbances. The pattern of the mean and standard deviation exhibits large-scale variability, which may be attrib­ uted to the background errors and suggest that the model error is domi­ nated by large scales. The skewness and kurtosis have many local extremes, suggesting that observational errors dominate these higher order statistics. (


INTRODUCTION
The optimal interpolation (OI) analysis technique is based on the assumption that the deviations from a background field are normally distributed (Lorenc 1986). The background fields are usually provided by short range forecasts produced by an operational data assimila-1 central Weather Bureau, Taipei, Taiwan, ROG lion system and such deviations can be called forecast errors. For a meteorological variable F, the forecast errors (Efi ) can be separated into two terms, i.e., observation error (F b -F T ) and � OS background error (F1,, -F T ): where subscripts "fest", "obs" "!st", and 'T' refer to forecast error, observed, forecast, and true values, respectively. Because the true values are never precisely known, certain assump tions must be made for the observation and background error. The observation error is as sumed to have a spatially independent Gaussian distribution with uniform variance. The back ground errors are also assumed to have a Gaussian distribution, and the spatial correlation is assumed to be homogeneous and isotropic (Daley 1991 ). However, there are many factors that may cause the observation and background errors to deviate from a Gaussian distribution. For example, nonlinearity of the dynamical or physical processes in the numerical model and gross errors due to measurement and data transmission problems could result in a non-Gaussian distribution of background and observation error. Thus, the assumption about the statistical structure of the forecast error must be examined, especially with an operational data assimilation system, e.g., Devenyi and Schlatter (1994) for MAPS (Mesoscale Analysis Prediction System) in FSL/NOAA, Hollingsworth et al. (1986) for ECMWF data assimilation system.
The magnitudes of the observation/analysis differences are usually used to indicate how well the analysis fits the observations (Hollingsworth et al. 1986, and Mitchel et al. 1990, 1996, while the magnitude of the forecast error is the measure of the forecast accuracy. Following the theory of optimal interpolation, that the analysis is equal to the observation only occurs when the observation error (including instrument error and error of representativeness) has been exactly removed in the analysis procedure. In principle, we would like to reduce the statistics of the observation/analysis difference as well as forecast error throughout the analysis, although a closer fit to the observation does not necessarily lead to better 6-h forecast.
At the Central Weather Bureau (CWB) in Taiwan, a second-generation Global Forecast System (GFS) was implemented in January 1995 (Liou et al. 1997). The multivariate optimal interpolation procedure following the schemes of Lorenc (1981) and Barker (1992) was used to produce the analyzed field in a 6-h cycle on a 360x 181 global Gaussian latitude-longitude grid (resolution about 1°). The goals of this paper are (I) to introduce the data assimilation system (GFS) at CWB and show that it works in a reasonable way by verifying the forecast and analysis against the observations; (2) to compare the statistics of forecast error and analy sis performance to other operational centers; (3) to investigate whether the statistical proper ties of the forecast error follow an approximate Gaussian distribution. The assimilation experi ment was performed from 1 May to 30 June 1998 by taking advantage of the 6-h intensive radiosonde observations around East Asia during South China Sea Monsoon Experiment (SCSMEX, Lau et al. 2000). An overview of the data assimilation system at CWB is given in section 2. Section 3 describes the dataset used to compute the statistics. Section 4 presents the results, and a summary is given in section 5.

DESCRIPTION OF DATA ASSIMILATION SYSTEM
The global data assimilation system at CWB operates on a 6-h cycle with a data window of ± 3 hours. One cycle consists of performing (1) 6-h forecasts on cr surface to produce the first guess; (2) objective analysis of the forecast error on P surfaces. The forecast error is the difference between the observations and the first guess interpolated to the observation location.
(3) An incremental nonlinear normal-mode initialization procedure on cr surface to eliminate the gravity wave activity. A flow chart of the data assimilation cycle is depicted in Fig. 1. The details and performance of GFS are available in Liou et al. (1997).
The forecast model is a hydrostatic PE model with a resolution of 18 cr levels in the vertical. The horizontal resolution is 120 waves of triangular truncation (T120). The forecast model is initialized by incremental nonlinear normal-mode initialization (Ballish et al. 1992). The model includes various physical parameterizations: a TKE (turbulent kinetic energy) dis sipation scheme for planetary boundary layer parameterization (Detering and Etling 1985), a relaxed version of Arakawa-Schubert cumulus parameterization (Arakawa and Schubert 1974), a shallow convection parameterization (Tiedtke 1984), a longwave and shortwave radiative transfer calculation with consideration of fractional cloud following Harshvardhan et al. (1987), a gravity wave drag parameterization (Palmer et al. 1986), and a grid scale condensation calculation. The three-dimensional optimal interpolation (OI) scheme (Lorenc 1981;Baker 1992) ana lyzes geopotential height and horizontal wind components at a 360x 181 global Gaussian latitude-longitude grid (resolution about 1 °). The analysis is performed on 16 constant pres sure levels including 15 mandatory levels from 1000 to 10 hPa plus 925 hPa level. The over lapped-volume method is used to maximize the utilization of available observations. The dy namic constraints of hydrostatic and geostrophic balance are included in the formulas defining the error covariance matrix.
The observational data used in global data assimilation system are mainly from the Glo bal Telecommunications System (GTS), including TEMP (upper air temperature, wind, and humidity by radiosondes), PILOT (wind observation by the pilot balloons), SYNOP (surface observation over land), SHIP (surface observation on ship), AIREP (aircraft observation), SATEM (retrieved temperature profile), and SATOE (cloud track wind). The data quality control is pe r formed after receipt from GTS. The check includes the departure of the reported value from climatology, hydrostatic check (Collins and Gandin 1990), and consistency com parison among nearby observations (DiMego 1988). The observations which pass the data quality control are called screened data and used in the verification of the data assimilation fields in section 4.

3.DATASET
The assimilation experiment was performed from 1 May to 30 June 1998. The datasets were collected from 5 May to 20 June because intensive 6-h rawinsonde observations around the East Asia area operated between 5-25 May and 5-20 June during SCSMEX. The collected data include analyzed fields (A), 6-h model forecasts (FJ, and screened RAOBs (0) for height (z), u, and v wind component. The statistics were derived at RAOB locations and only assimi lation times at OOZ and 12Z are available. The RAOB stations are included only if the observa tions contained 60 or more samples. This was an attempt to avoid irregular observations and promote the statistical representation. A total of 575 stations, shown in Fig. 2, met the criterion. The analyzed fields and 6-h model forecasts are grid data and interpolated into the RAOB locations using the bi-cubic interpolation method.

4.RESULT
The behavior of one-point statistical moments of forecast error ( 0 -FJ, mean, standard deviation, skewness, and kurtosis, are discussed. The latter two quantities are related to the symmetry and length of the tails of the distribution. For a realization of N samples, x1, x2, x3, ••• , xN , the estimate of the skewness (S) and kurtosis (K) can be formulated as (Panofsky and Brier 1958) and where x ands are the mean and standard deviation of the sample, respectively. For an asym metrical distribution, S * 0, S<O indicates that a distribution is skewed to the left (i.e., long left-hand tail) while S>O indicates that a distribution is skewed to the right (i.e., long right hand tail). The asymmetry of a distribution is significant as the absolute value of Sis greater than I. Two distributions may have the same mean, standard deviation, and skewness, but differ in kurtosis, which measures whether the central portion of the distribution is peak or flat. One distribution may have relatively few cases near the center, so that the histogram appears flat or has long tails (high KJ, or most of the cases may lie near the center (low K) such that the distribution has short tails. For a Gaussian distribution, Sis equal to 0 and K is 3. Therefore, it is possible to determine the significance for a given distribution from the Gaussian distribution by the estimation of skewness and kurtosis. Hollingsworth et al. ( 1986) indicated that various types of upper air sound used over North America and Europe might lead to the different statistical properties of the forecast errors. It is also well known that the characteristics of atmospheric disturbance as well as the biases of the model forecast are expected to be geographically dependent due to orography and land-sea contrast. Thus, the statistics of the forecast errors, defined as RAOB observations minus model forecasts The coverage of each domain is shown in Fig. 2. Both level-averaged and spatial distributions of the statistics on particular levels are presented as follows. Figure 3 is the mean, standard deviation, kurtosis, and skewness for z over each of the five domains. The figure shows that for z the negative mean forecast errors increase from the sur face and reach a peak at 700 hPa with larger negative value (--8.7 m) for North America and the Southern Hemisphere and relative small values (--2.6 m) for Asia. The mean forecast errors at the upper troposphere tend to scatter for various domains: increase on the positive side with height for Asia whereas on the negative side for North America and small values for the European and Southern Hemispheric areas. The standard deviations of the forecast error for z are quite similar below 300 hPa except for slightly smaller values for the Southern Hemi spheric area below 700 hPa. In general, the standard deviations are less than 13 m below 700 hPa and increase with height. Both the mean and standard deviation over the global domain and those over the European area are very close. A larger spread of the standard deviation occurs above 300 hPa, with smaller values for North American and Southern Hemisphere and larger values for Asia. The magnitude and profiles of the mean and standard deviation for z are reasonable and similar to the results of Mitchell et al. (1993Mitchell et al. ( , 1996 and Hollingsworth (1986). The absolute value of skewness in Fig. 3 is usually less than 0.3 in the troposphere, implying  a nearly symmetrical distribution. A slight asymmetry occurs at 250 hPa for the Southern Hemisphere area and above 200 hPa for the American and European areas. The K value is usually between 3.5 and 4.5, which indicates a distribution with longer tails than associated with a standard Gaussian distribution. It is observed that over Asia the mean height error increases positively at heights above 400 hPa and negatively over North America, as shown in Fig.3. Such a statistical features should link some specific physical characteristics. Figure 4 is the spatial distribution of the mean height errors at 200 hPa over the Asian and North American areas. The Cressman scheme was used to create the contour maps. The influence radius used in the Cressman scheme was 5° and grid values were set as null if the number of RAOB stations was less than 3 within the influence radius in order to avoid unnecessary "bull's eyes" in the contour map. Figure 4a shows that the systematic positive mean height error is dominant over Mainland China, espe cially over the downstream area of the Tibetan Plateau. Though limited information can be provided by the mean forecast error statistics, the systematic bias could be attributed to the inappropriate handling of the orographic effects on Tibetan Plateau in the model. Fig. 4b shows that the negative bias dominates the North American area and increases negatively toward the north. The latitude-dependence of the bias might be associated with the increased variability to the north due to the activities of the upper level trough during the season. One of the inter esting phenomena in Fig. 4b is that there is a positive bias around the southern United States and Mexico, which is evident from 500 to 200 hPa. The feature is well correlated to the activi ties of the North American monsoon (Douglas et al. 1993;Higgins et al. 1999) during the warm season. Thus, the disturbance associated with the North American monsoon is possibly attributed to the systematic positive bias around Mexican area. Figures 5 and 6 are the statistics for forecast errors of u and v. The mean forecast errors of u and v are usually small negative with absolute values less than 0.5 mis in the troposphere except for v below 850 hPa over the Asian area. The standard deviation of the forecast error profiles for u and v over each domain are similar throughout the troposphere. The typical values are about 3 mis at 1000 hPa, with slight variations below 700 hPa, increasing to less than 5.5 mis at 250 hPa, and then decrease. Both the mean and standard deviations of forecast errors of u and v are reasonable and comparable to the results of Mitchell et al. ( 1990Mitchell et al. ( , 1993Mitchell et al. ( , 1996 and Hollingsworth (1986). However, larger mean and standard deviation values exist over the Asian and Southern Hemispheric areas above 300 hPa, which may be attributed to the lower forecast skill over these regions. The lower forecast skill might be possible related to the Fi g. 5. The same as fig. 3, but for u. The unit is mis.  Figure 7 shows that the statistics of the forecast error for z over Mainland China are characterized by a relative smaller value and larger scale pattern of mean and standard deviation and rather close to the Gaussian distribution (S-0, K-3). In general, the observa tional error is spatially uncorrelated. Thus, the large scale pattern of mean and standard devia tion are mainly attributed from background errors and suggest that the model error is domi nated by large scales. There are larger mean and standard deviations around the Southern Asia and Tibetan Plateau area, which is possibly due to the weaker forecast skill resulting from the Asia monsoon disturbance and the orographic effect of Tibetan Plateau. Figures 8 and 9 define the statistical behavior of the forecast errors for u and v. Similar to z, the larger mean forecast errors occur around the Tibetan Plateau and Southern Asia areas and have a smaller and smoother pattern over Mainland China. The spatial distribution of the standard deviation for the wind field has rather large scale characteristics and varies smoothly across the domain. Many extremes of the skewness and kurtosis statistics are centered around specific RAOB locations, which suggests that individual observation errors might dominate these higher-order statistics. That locations of the extremes on u and v are not always the same implies that the errors in the vector wind observations have different effects on the u and v components. There is a pattern in the spatial distribution of the standard deviation of u, v, and z, that is a systematic increase northward (towards the midlatitude). The structure also exists over the North American and European area (not shown), which is possibly related to the increased synoptic disturbance in midlatitude. Figure 10 is the mean and standard deviation of the 0-A and 0-F for z, u, and v over the Asian area. The figure shows that the magnitudes of 0-A, either in its mean or standard deviation, are smaller than 0-F, and reveals that the analyses fit the observation more closely than does the first guess. The mean values of 0-A are generally small and less than 2 m for z, 0.2 mis for u and v below 150 hPa. The vertical profiles of the standard deviation of 0-A are quite similar to 0-F. The standard deviation of 0-A for z decreases from 8.6 mat 1000 hPa to Fig. 8. Same as Fig. 7, but for u. The contour intervals are 0.5 mis for mean and standard deviation, 0.1 and 0.5 for skewness and kurtosis. 645 about 6.5 m below 700 hPa and then linearly increases to about 29 ma t 100 hPa. 0-A for u and v has a smaller standard deviation in the troposphere, about 2.2 mis, and reaches a maxi mum of about 3.1 mis for u and 3.5 mis for v in the layer of 200-250 hPa. The profile and the order of magnitude for the statistics of 0-A over Asia, especially at the 1000 hPa, are slightly larger than the results of the Canadian Meteorological Centre (Mitchell et al. 1996) and ECMWF (Hollingsworth et al. 1986).

SUMMARY
The statistics of the forecasts errors for z, u, and v derived from the global data assimila tion system at the Central Weather Bureau in Taiwan were presented. The statistical results could serve as a reference basis for the forecast and the model developer. In general, the data assimilation system was found to be functioning well, as revealed by the reasonable statistical results of the forecast error and compared to other operational centers. The level-averaged distribution of the forecast error with respect to different domains shows that the lower order statistics, mean and standard deviation, are reasonable and comparable to the results of other forecasting centers (Mitchell et al. 1990(Mitchell et al. , 1993(Mitchell et al. , 1996Hollingsworth 1986). The distribution of the forecast errors verified by the skewness and kurtosis, appears as an approximate Gaussian distribution. 0-A quantities were used to evaluate how well the analysis fit the observations. The profile and the order of magnitude for the statistics of 0-A over the Asian area, espe cially at the 1000 hPa, were slightly larger than the results of the Canadian Meteorological Centre (Mitchell et al. 1996) and ECMWF (Hollingsworth et al. 1986). The spatial distribution of the one point moment shows that the statistics of forecast errors are sensitive to the orographic effect (e.g., the Tibetan Plateau), the Asia and North American monsoon activities, and the mid-latitude disturbances. The spatial distribution of the mean and standard deviation exhibited large-scale variability, which may be attributed to the background errors and suggest that the model error is dominated by larger scales. The skewness and kurto sis have many local extremes, suggesting that the observation errors, e.g., gross errors in the rawinsonde observation, dominate these higher order statistics. Thus, a detailed investigation and modeling the higher order moment station by station may serve as a criterion for the quality control of radiosonde observations.