The Initial Field Constructions for NuIDerical Weather Prediction in the Low Latitudinal Regions by a New Mixed Type Autocorrelation Function statistical a n a l y s i s scheme for meteorological

A univariate statistical scheme for the analysis of geopotential height fields has been developed based upon optimum interpolation. The data archived from a set of subjective analyses were used to compute the required correlation functions. The computed height autocorrelation functions were fitted by an appropriate function which would definitely reftect the charac­ teristics of the statistical structure in the low latitudinal regions. Rather than representing the model by the negative squared exponential form proposed by Gaudin (1963), a representation consisting of a polynomial exponential func­ tion was used to fully trace the finer structure of the statistical quantities. The analyzed geopotential height fields were obtained through the compu­ tation of the geopotential height autocorrelation model. A s a measurement of accuracy, this scheme has been tested at target stations. The root mean squares between the analysed data and those obtained from radiosonde data have been carried out to verify its performance as well.


INTRODUCTION
This paper describes the te-sts of a univariate statistical objective scheme intended for use with numerical weather prediction models in the low latitudinal regions. In the past few decades, many scientists have shown considerable interest in the research of the initial field constructions for meteorological variables. The initial stage of optimum interpolation was first carried out using univariate variables, such as the geopotential height, the wind compo nent, the temperature and the dew point temperature to compute their statistical quantities. Later, Gandin (1963) proposed a multivariate statistical analysis scheme for meteorological 190 TAO, vT ol. 7, 1Vo.2, .Ju11e 1996 . . applications. Difficulties he met were due to the lack of information on the cros s-correlation elements that had not been determined at that time. Meanwhile, howe\i·er, Buell (1958Buell ( , 1960Buell ( , 1972a, Gandin (1964) Petersen and Middle.ton (1964) and Petersen (1973c) found out that correlations between the height and wind components are related to geostrophic approxima tions at mid-and high latitudes. Since then, Kluge ( 1970), Rutherford ( 1973) and Schlatter (1975Schlatter ( , 1976 have used the multivariate objective analysi s scheme in weather predicti on mod els. Additionally and almost at the same time., Eddy (1973) and Petersen (1973c) proposed a multivariate. statistical scheme such that, in the vertical, the final analyzed geopotential height fields a14e subject to thermal wind 14elations. In this paper, the uni\1ariate optimum interpolation in the low latitudinal regions and its application in initial field constructions are the primary concern. More specifically� this paper atte.mpts to construct a 2D autocorrelation model which is suitale for the analysis of geopotential height fields in the present research area. By using the manual subjective anal)'Sis data sets which are 500 hPa, 1200 GMT during the winter months (December through February) of 1977 to 198 3, it is hoped that the scheme will demonstrate itself as being capable of practical implementati on and of use in operational initial height field constructions for numerical weather prediction.
The remainder of this paper is organized as follows: Section 2 of this paper presents the definitions of various correlation functions, the construction of the h-h autocorrelation matrix as well as the modeling of the autocorrelation functions in tenns of distance. Section 3 lays out the mathematical treatment of these algorithms in optimum interpolation. Section 4 demonstrates the. ability of the analysis scheme to fi t the observation data, the response of the scheme. to \1arious inputs and modifications and characteristics of the analysis scheme are also mentioned. Finally, in Section 5, the findings are summarized.

ANALYSES
The procedure followed here is specified to the problem at hand, namely, the statistical structure constructions of the geopotential height fi elds and its application in the initial height fie. Id analysis for numerical weather prediction models. Specifically, the interest in th is paper is in the optimum interpolation of scattered station data interpolate to regular grids simultaneousl)'. As these data are univariate in themselves, they may include a spatial and temporal set of meteorological variables such as the geopotential height, the scalar wind component, the temperature and the dew point temperature as well.

Correlation Function
In the method of optimum interpolation, the statistical structure of the. meteorological fi elds is the main role in the analy sis process. Autocorrelation and cross-correlation functions are usually selected to represent statistic. al quantities. The important properties are briefly mentioned belo\v: f ( r ) can be taken as the meteorological variable; it is defined as the sum . . of the mean .f ( r ) and the deviation .f·' ( .,.� ) , where r· is the geographic point with respect to a reference point. The cross-covariance . f' unction C'1:9 for variables .f and g is given by: where r1 and · r 2 are two different geog14aphic points. Furthermore, the autocovariance. func tion C1JJ('r .. 1,1--2 ) of the. univariate .f can also be defi ned from Equation (1).

191
The mean square of the anomaly of the quantity f on the point ri is \Vritten as: The square root of D f is sometimes called the variance of f' at that point. The cross-correlation function of variables j' and g on two distinct points r 1 defi ned as: and r? is -The autocorrelation function of the same variable is defined when .f· -g, which is in the t' orm of: (4).
When the covari ance or correlation function depends not upon the coordinate of the points but rather upon the distance alone, it can be said that the function satisfies the homo geneous and isotropic conditions.
In order to check to what extent the homogeneity and isotropy hypothesis is to be satisfied, it is convenient to proceed as follow·s: fixing one of the points, the correlation fu nctions for all possible positions of the second point are calculated. If the isolines of such autocorrelation functions are circles, it means that the hypothesis of homogeneity and isotropy are. met.
As Gandin suggested in his paper, the station distribution can be selected arbitrarily'. In order to find the patterns of correlation functions (among h, u and v variables) between Taipei and all the other 77 stations, some of the stations must be put on the right hand side of Taiwan Island which is borde-red by the vast Pacific Ocean. The data on these prespecified points were archived from the historical Japane. s weather map and have been analysed by man)' experienced meteorologists there. During the readout of these prespecified station data, the task was mainly done by careful subjective analyses. In Figure 1, the stations used in these experiments are numbered in order of increasing di stance from Taipei, with the city of Taipei numbered as 58 and lying almost in the center of the station network. In this experiment, the correlations were calculated fr om a set of I 08 case-s on each point. Each observations were separated by a fiv·e-day interv·al, thereby making this kind of pick up independent to ensure isolation among cases.
For the large scale meteorological phenomena, the ( (h-h)) autocorrelation is indeed true in satisfying the homogeneous and isotropic conditions for the. order of some thousands of kilometers on some isobaric levels. Studies investigated by others have shown that for the geopotential height, tempe1·ature, humidity, etc., the-assumption is fulfilled with sufficient ac.curacy up to a distance of several thousands of kilo1neters. In addition, this assumption substantially simplifies the problems at hand here . The autocorrelaton fu nctions of the wind components ((u-u)) _ and ((v-v)) are a little bit defonned from true circle, and the patterns with the major axis were shown as being oriented toward the zonal direction in the case of the u component but toward the meridional direction for the v component. Such differences do not strictly satisfied the conditions of homogeneity and isotropy. The test of the analysis scheme was based upon the autocorrelation model computed from the geopotential height fields in the geographic domain. In this section, the autocor relation computation of this univariate variable, the modeling of these correlations and its applications in constructing initial geopotential height fields are discussed.
Autocorrelations of the geopotential height field were computed for ((hi-hfi)(h j -hfj)) where i and j range over 78 stations in the geographic domain.
The bracket symbol in ( (h�-hj)) denotes the statistical averaging over many cases. The definition gives: The ensemble mean is given by:

h-u cross-cor r elation function
Correlations among the variables h, u and v. The master station is Taipei.
In this way, the ensemble mean of the long term data (108 cases involved) in the present study were selected as the initial guess h f i. The autocorrelati on functions for every possible pair points are plotted as a function of separation distance. Because this is a symmetric matrix, 78 stations or (78*77) I 2 + 78 possible station pairs \Vere considered. The resultant 3081 autocorrelations for the station pairs are presented in Figure 3, (refer to Onn, Wang and Ts eng 1993 ). The figure shows that the qualitative agreement is the same as othe. rs obtained from the mid-and high latitude regi ons of the earth; however the lines of equal correlation are slightly different. The autocorrelation functions of the geopotential heights generally decrease mo1·e slowly with an increasing separation distances in the farther distance than in the ne.ar distance.
It is important to consider these autocorrelations as a two-point function. The full corre lation arrays \:vere replaced by arrays of correlations averaged over intervals in the separati on distance before being fit to the autocorrelation mode l. In this way, the computer time for the fitting process can be reduced to an acceptable level, while much of the heterogeneity within these data can be eliminated. In calculating the distance averaged data for these au tocorre lation fu nctions, it is reasonable to use the classification of the . results based on the . Fig. 3.

195
Autocorrelations for height fields as a fu nction of separation distance (Km) among the station pairs for a total of 78 stations. (The climatological mean is chosen as the fi rst guess.) gradation of the distance. Usually the gradation of about 200-250 km was taken to be the unit interval. Points c. orresponding to negati\1e correlations are not appropriate for the curve fitting of the exponential fu nction, and therefore were not considered in the least-squared curve fitting process.
In the past few decades, the Gaussian fu nction p( r) a exp( -br 2 ) has been adopted to fi t the height-height autocorrelation data in the least squared sense. Improving the specifi cations of the autocorrelation model has been approached both observationally and theoreti c.ally, and extensi\1e literature now exits on the subject. A number of functional fo r1r1s have also been proposed for the continuous two-dimensional homogeneous isotropic correlation function.

The Least-squared Method for Construction of the Height Autocorrelation Model
In this paper, the new model that is proposed to analyse the distance averaged autocor relation data is a fu nction combining the polynomial and exponential fo rm.
where ma and mb are two arbitrary positive integers. The resulting sum of the squared errors between the correlation model and the correlation data is given by: 196 1�0, vTol. 7, No.2, June 1996 where ( d j , p j ) represent the characteristic distance and the associated distance averaged autocorrelation data respectively. B)1 differentiating Equation (6) with respect to a k and bk, From Equations (7) and (8), the (1na + m. b + 2) nonlinear algebraic equations can be solved simultaneously using the Newton-Raphson method by choosing the arbitrary initial \7a}ue for a k and bk. The final solution is considered to be consistent when both of the fallowing conditions are met: Max a �e r+ la � e r < 1 0 -5 w he re k 0 , ma Max b �ter+ lb � e r < 10 -5 w he r e k 0 , m b.
The convergence processes of the maximun errors for ak and bk are indicated in Figure  4. The curve fitting of the distance a\'eraged data for the autocorrelation model is shown in Figure 5 . The respective coefficients for lLk and bk of order 5 are also shown in Table I i=l where a i rep r esents the weighting factor at the observation point i and N is the·number of stations used for interpolation. The weights a i were determined from the required condition such that the above equation yields the best results when averaging is done over a large number of cases. In other words, the mean square of errors, represent a minimum, where zo is the true value at grid o, and the bar denotes the statistical averaging. Substituting Equation (9) into Equation ( 10) yields: ( 1 1 ) ( 1 2 ) ( 1 3) It can be shown that the conditions of Equation (13) are both necessary and sufficient for the minimum of E. This yields a system of sin1ulta·neous N linear algebraic equations to solve the weights of the associated stations.
In order to detenr1i11e the weights from the system, the mean product of Z i z j and z.i zo are required. However, since zo is 11ever known in the physical domain, it is better to use the expression below. Let ( 1 4 ) where the bar and the prime denote . the statistical ensemble mean and the deviation from its mean respectively. Substituting these into Equation ( 11 ) , it becomes N � z� zj a j -z� zb (i-1,2�···�N).
( 1 5 ) j== 1 Unlike Equation (13), in Equatior1 (15) these formulas have the mean products of the valu. es not of the elements themselves, but of the deviations from the ensemble mean. It should be noticed that when (15) is substituted into (11) and let the result divided by the variance C00, the relative interpolation error can also be given by This quantity can be denoted as a measurement of accuracy in the analysis scheme. Let 3?0, � i be the variance of the geopotential height fields on the grid point o and the observation point i respectively: Wang et al.

199
The autocovariance function can be defined as: ( 1 7 ) Similarly, the autocorrelation function can be expressed in terms of the autocovariance and variance., which is in the for111 of: Equatio· n (15) now becomes Pi j a j -Pio , ( 1 9 ) where i and j are the observing points, and o is the target grid point. Once the members of the observing points are known, then the coordinates of the grid point and the surrounding observing points are used to compute the relative distance among these points. The optimum weight a i associated with each of the observation points i are then computed from Equation ( 19). The deviation of the analyzed geopotential height can thus be obtained from: The analyzed z0 is the sum of the deviation and its respective climatological mean z0 , thus: (21) At this point, the analyzed geopotential height on the two dimensional grids can be obtained. In practice, for the univariate 01 scheme, there are two ways to construct the autocorrelation function in Equation ( 19). First, as for the explicit form, the autocorrelation in (19) is constructed by using the distance-dependent autocorrelation model. The optimum interpolation coefficients in (19) can be obtained directly from the computation of the rel ative distances among the stations and the grids. Second, regarding the implicit form, the autocorrelation function in ( 19) is constructed by using the observational data on the stations and the analysed data on the target grids; numerical weather prediction model output data are also valid accordingly. In fact, the explicit autocorrelation model (distance dependent autocorrelation model) is not a necessary requirement for the construction of the initial field analysis. The optimum interpolation coeffients associated with the specific grid can be ob tained directly from the nearby real observational data to compute the correlation functions among the pairs of grids and stations. However, in this way (i.e., the implicit form), the memory requirements would be large if the global grid data are computed. By using the distance dependent autocorrelation model, there are no such problems. 200 TAO, Vo l. 7, No.2, June 1996 The optimum interpolations can be easily done by calculating the relative distances among the grid and the stations. The optimum coefficients associated with the nearby stations can thus be solved from the simultaneous equati ons.

RESULTS
Excellent evaluation for the test analysis scheme was based upon the root mean squares between the observational data and the analysed data from the scheme previously discussed.
The univariate statistical analysis scheme with autocorrelation calculated in section 2 was applied at 500 hPa to compute the root mean square differences between observations and analysed values for the 108 cases involved.
Both the explicit and implicit forms are used under different conditions. In solving Equation ( 19), the observational data are used to compute the left hand side of the element Pi j while the proposed autocorrelation model is used to compute the right hand side of this element Pio . The analysed target grids include 25 stations, ranging from the observation index points 1 to 24, and the Taipei station was indexed as 58, the climatological mean was chosen as the first guess for the computation on the left hand side. The root mean square differences between the analyzed and observed values were computed for the selected 108 days involved.
Tab le 2 shows how the results depend upon the number of reports used to estimate the univari ate ge-opotential height at these target stations.
In Table 3, the relative interpolation error E' was computed using the observational data for the analyses of the previous 2 5 target stati ons. Since the-quantit)' denotes the measure ment of accuracy, the observational data instead of the explicit autocorrelation model would be Table 2. Root mean square differences of the geopotential height fields between the analysed and observed values(m). more appropriate for the computation of Pio in the relative interpolation error computatio11. It can be easily seen that the root mean squares and the relative interpolation errors definitely decrease as more stations are used in the analysis scheme. Gandin (1963) assumed that the root mean squares would decrease indefinitely as more stations are used in the interpolation process, but in fact all measurements are subject to error. Instrumental errors especially are always inherent in instrument or manual operations. Additional reports in the estimation improve the accuracy of the estimate but only up to a certain limit. Normally, for a report providing high-quality data, most of the information pertinent to the analysis can be obtained . from the nearest 6-8 stati ons. The addition of more �tations does not significantly improve the estimate. The previous re sults can also be shown in Tables 4 and 5 which are the analysis of the relative interpolation error computation for the u and v components of the wind fields.    The 500 hPa geopotential heights at Ta ipei 'Ai'ere analysed based upon a single obser vation of height. The experiments were conducted repeatedly, and each time the observation station report was chosen to be farther from Ta ipei. From Figure 6, it is seen that as the distance of the station is farther from Taipei, the root mean square is increasingly higher than the previously near station. This implies that the more distant observations are only slightly correlated with the grid points and have little influence on the estimates.

Number of Stations
It is important to make a comparison be.tween the classical Gandin fo rm and the new proposed mixed-type form . The distance averaged autocorrelation functions of the 500 hPa geopotential height fields are fitted to the negative e. xponential form as proposed by Gandin  1963). Figure 7 shows the results with a= 0. 87654 and b = 0.31 543; the optimum constants are determined by the same least square method as that of the mixed-typed autocorrelation models. Accordingly; it is obvious that due to the fi xed form of the negative exponential autocorrelation model, the curve fitting is not as good as the new proposed mixed-type autocorrelation 1nodel. In order to determine the superior capability of the two forms in analyzing the geopo tential height fields, the two autocorrelation models were also utilized to compute the root mean squares between the analysed and the observational values. In Figures 8 and 9, the eight stations are used for interpolation. The root mean squares for the i-th station were computed using the formula: 108 Sig( i)j =l Figure 8 is for the classical Gandin form, while Figure 9 is for the proposed mixed typed form. Comparing the two fi gures, it is evident that the amplitudes of Figure 9 are smaller than those of Figure 8 in estimating the root mean squares of the total 78 stati ons. This implies that the proposed mixed-typed autocorrelati on model has a better capability in analyzing the geopotential height fields.

CONCLUSIONS
. The statistical structure of the three meteorological variables (h, u and v) have been derived by computing the autocorrelation and cross-correlation fu nctions, and the geopotential height autocorrelation pairs among the 78 stations have also be. en computed. These pairs were grouped in 200-kilometer distance intervals fr om the range of zero to six thousand ki lometers. The re sulting autocorrelation . fu nctions were then plotted against station separation; the new autocorrelation model was fitted to the scatter diagram in the least squared sense. The. fo llowing conclusions can be made: (1) The scatter pattern shows that the clustering does normally di stribute along the expo nential function patterns with the magnitude of the correlation coefficients decreasing slowly with distance.
(2) Zero correlation coefficients occur at separations of some three thousand and eight hun dred kilometers. The distance averaged data are shown to spread along the exponential '"11ang et al. 205 fu nctions, a finding can be determined by the new autocorrelation model in the leas� square sense which is a function of the relative distance only. (3) The statistical correlation functions of the geopotential heights and wind components derived from the lowand mid latitudinal regions have been found to be extremely useful in the initial field constructions for numerical weather prediction. The original data base are from historical observational data. Thus, it is likely that the autocorrelation model for the geopotential height fields would definitely reflect the statistical properties in this research area. By adding the order of the polynomial terms, the autocorrelation model proposed in this paper proves to be more powerful in tracing the finer structure of the autocorrelation fu nctions, especially in the low latitudinal region where the large area is surrounded by the sea. The accuracy of the autocorrelation model is important fo r the full range of separation distances between the target points and the interpolation stations. (4) Because of the similar patterns in autocorrelation data distributions for the mid-and low latitudinal regions, it can be anticipated that the algorithms for the autocorrelation model are also valid for different la ti tu des of the earth as well.