A Statistical Algorithm for Clear-sky Temperature and Humidity Retrievals from TOVS Radiance

In this study a regression algorithm was developed to retrieve the at­ mospheric parameters from TOVS (TIROS Operational Vertical Sounder) clear-sky radiances. The development of this algorithm requires calcula­ tions, at RAOB (RAdiosonde OBservation) sites, of the cross-covariances among fields obtained from rawinsonde observations, satellite radiances, and forecast products. The data were collected in every assimilation cycle for three months from May to August 1997. The retrievals were verified statistically with the co-located RAOB measurements for two periods of experiments. The results show that the bias is comparable to the first guess. Compared with the forecast, the root mean square errors of the retrievals in the troposphere were better by about 1-1.5 K in temperature and 0.2-0.4 g/Kg in the mixing ratio. The root mean square error of retrievals at surface showed an improvement of 0.5-1 K in temperature and 1.5-2 g/Kg in the mixing ratio. The results during the August period were noticeably better than in the June period. This implies that the retrieved profiles were closer to the RAOB profiles than the 12hour forecast profiles were. The contour field of the retrievals showed a slight correction on the first guess field both in temperature and the mixing ratio except in the sub­ tropical area. The results are also encouraging in that the retrievals in this case had a local coherence both in the horizontal and vertical. (

(NWP).For example, the SATEMs (satellite temperature and humidity profile) generated by NOAA/NESDIS (National Environmental Satellite, Data and Information Service) are used in many operational NWP centers, such as the Central Weather Bureau (CWB) in Taiwan.Though investigations have shown that the SATEMs have a positive impact on the southern hemi sphere and a neutral or negative impact on the northern hemisphere, the data show a system atic, air mass dependent bias arising from the included statistical or climatological component in the SATEMs (Andersson et al., 1991).Hong (1999) has recently presented the impact of SA TEMs on the operational limited area model in the CWB and has concluded two major points.First, the objective analysis which includes the SATEM shows a low correlation at levels below 500 hPa.This is due to the low resolution of the SATEMs in the lower tropo sphere.Second, there is significant analysis increment around cloud areas, which may be due to the systematic bias of SATEM data in cloudy regions.
Recent progress in satellite data assimilation is due to two distinct features from earlier approaches.First, the analysis equation is written in nonlinear form, which must rely on inten sive computer power (Eyre et al., 1993, Chou 1998).Second, the measured radiances are used instead of retrieved profiles, thereby ensuring error covariances used in the assimilation are being managed properly (Andersson et al., 1994, Derber andWu 1998).However, the regres sion retrieval scheme has the advantage of simplicity, and the retrieved profiles are convenient as they have the same physical unit of predictive variables as the NWP model.
In general, for the regression approach, the desired solution can be written_ in finite differ ence form as:

T=Cx,
where T is a matrix whose columns are the temperature differences from the mean for one particular profile out of a large sample; x is a matrix.whose columns represent the brightness temperature differences for a given profile; and C is the matrix of solution coefficients which are given from a large sample of observed brightness temperatures and initial guess and which can be expressed as:

C=S T xsx-1'
(1) where STx and Sx stand for sample cross-covariance and covariance matrices.In general, re gression retrieval methods can be considered as being made up of two primary components: a method of selecting the initial guess, and a retrieval algorithm which modifies the initial guess.
For example, several techniques based on constrained regression have been developed and discussed for satellite meteorology application (Crone 1996).With regard to the initial guess, it has been common practice to use climatology (Smith et al., 1979).However, the initial guess profiles used in the retrieval are spatially correlated and the correlated errors can propagate to the retrieved profiles, which means isolating retrieval errors is made impossible.Thus, the NWP forecast results, which provide a better estimate than climate or a library of profiles, are recommended as the initial guess fields (Hayden 1988, Eyre et al., 1993).
In this paper, a retrieval algorithm that takes advantage of the existing analysis system as well as the real-time ingestion of radiance data from the NOAA's polar orbiting TIROS-N series satellites is presented.The retrieval scheme proposed by Kim (1993) is based on statis tical regression and requires information from the cross-covariances among the observed and computed radiance, temperature/humidity profiles from rawinsode as well as forecasts.The computed radiances are simulated by using a radiative transfer model (called RTTOV) devel oped by Eyre (1991), and the first guess profile is the 12-hour forecast products from the operational Limited Area Forecast System (LAFS, Yeh et al., 1994) in the CWB.
This paper mainly focuses on the evaluation of the performance of the regression retrieval system in the CWB LAFS.A description of the algorithm, data processing, and the statistics for retrievals versus co-located RAOB (RAdiosonde OBservation) measurements are presented in the following sections.

METHODOLOGY AND DATA PROCESSING
Figure 1 presents the flow chart of the regression retrieval procedure from TOYS radi ance in the LAFS.The procedure is composed of three major components.First, the computed radiances are simulated by the radiative transfer model based on the atmospheric profile pro vided by the LAFS's 12-hours forecast.The radiative transfer model used here was the Euro pean Center for Medium-Range Weather Forecasts' (ECMWF) forward model called RTTOV (Eyre 1991).The information of the cloud fraction used in the forward model was estimated by the observed AVHRR (Advanced Very High Resolution Radiometer) pixels within the HIRS (High resolution Infrared Radiation Sounder) field of view (jov).Since only clear cases are presented in this paper, the complexity of the estimation of cloud information has been greatly reduced.Second, the difference between the observed radiances and computed radi ances were taken as the observation increment.Third, the observation increment in radiances was applied to retrieve the profile increment from the first guess.The term of first guess was defined as the LAFS' s 12-hours forecast here.
The use of locally ingested radiance data provides important advantages in three ways: (1) The NWP center can understand error sources attributed to satellite data processing proce dures, such as cloud clearing, limb correction and bias corrections.(2) The NWP center can exploit full resolution AVHRR data in screening cloud affected HIRS sounder data.(3) The NWP center can monitor radiance data in the region and easily check its quality.

Algorithm for Regression Retrieval
Here we used the constrained least-square estimation proposed by Kim (1993) to retrieve the profile increment (the details are shown in Appendix): (2) where T' is the retrieved profile increment from the first guess which is defined as the LAFS's 12-hours forecast; x is the observation increments in brightness temperature (HIRS channel 3 -8 and 10 -15), which is the difference between the observed brightness temperature and the computed brightness temperature from first guess; y is the increment in atmosphere profiles, which is the difference between RAOB and the first guess.
There are 18 elements in the profile increment ( T' and y) vectors which are composed of 9 mandatory pressure levels in temperature ( 1000 -200 hPa), 7 levels in the mixing ratio ( 1000 -300 hPa), surface temperature, and the surface mixing ratio.Thus, (2) provides a simulta neous retrieval of the temperature and humidity profile as well as surface information.The matrices in (2) are defined as follows: where R 0 :measured brightness temperature; R b :computed brightness temperature from the NWP profiles; R r :computed brightness temperature from the RAOB profiles; P h :the NWP profiles of T and q; P r :the RAOB profiles of T and q; () stands for the sample mean; cov(a) represents the autocovariance matrix of a; and cov(a,b) is the cross-covariance of variable a and b for a large sample collocated at the RAOB site.
Since the matrix Eis the covariance between the computed radiance from the RAOB and first guess, it represents errors contained in the radiative transfer model and the NWP forecast.The scalar A in (2) is the weighting of E relative to S x x .In the case of A=O, the number of covari-ance matrices in (2) is reduced to two, and then the coefficient matrix in (2) is similar to (1).
This implies that the errors produced by the radiative transfer model and the NWP forecast were ignored.The value of A can be determined statistically.Kim (1993) suggested that A could be obtained via cluster analysis.We set A=l in this paper.The value was somewhat chosen arbitrarily Obviously, there are many weaknesses in the statistical regression algorithm.
(a) The linear form in (2) is unable to resolve the nonlinear characteristics of the atmosphere.
(b) The least-square solution tends to underestimate deviations in the predictands from their mean when the predictors have measurement errors.
(c) As the regression coefficient is determined in a certain period and area, the retrievals may be constrained by the specific weather pattern and may therefore be difficult to correct when such bias exists.Thus, it is necessary to update the error covariances to improve the results.
( d) A long period of time is required to collect the necessary samples because the construction of the covariance matrix in (2) relies on a large number of RAOB measurements.
On the other hand, the regression method in (2) also has several advantages.
(a) E and Sxx are symmetric and positive definite such that Sxx + 'AE is non-singular and its inverse can be solved efficiently.
(b) Since the calculation in ( 2) is only the multiplication of the matrices, high computing per formance can be reached once the covariance matrices are determined.
(c) (2) uses the NWP forecast as the initial guess, which can be expected to provide a better estimate than climate or a library of profiles (Eyre 1988).
(d) The rawinsonde data, which represents the ground truth, were used in (2) to construct the covariance matrices.Thus, the retrieved profile can be statistically expected to be closest to the rawinsonde (truth).

(e) E and S xx
implicitly contain a systematic error in the NWP forecast as well as in the forward radiative transfer model.This is especially important due to the great uncertainty in the surface information.
(f) Since the solution in ( 2) is the retrieved profile increment and has the same physical unit of predictive variables, these are the most suitable to ingest to the OI (Optimal Interpolation) module as an independent measurement, which is able to avoid the correlation between the background and observation error..... \ ..... .cific Ocean play an important role to in determining the weather system around the East Asian area, for example, the movement or evolutions of a Mei-Yu front system during the early summer, a typhoon track and even thermal thunderstorm activities in summertime etc.How ever, it is clear that there are not enough observations over the ocean area to correct forecast errors.Figure 2 also shows the image of the observed radiance of HIRS channel 8 in two typical consecutive paths of the NOAA-12 polar orbiting satellite when it passed over Taiwan.

The NWP Model
In order to construct the covariance matrix in (2), it was necessary to collect the measured radiances and profiles from NWP forecasts and RAOB measurements long enough to ensure statistical stability.Since the error covariance matrices in (2) were constructed at the RAOB site, the NWP forecasts were interpolated from the model grid to the RAOB station.The representative brightness temperatures of TOYS observations at the RAOB site are the aver age of the possible clear fovs which are located within the area of 0.5° centered at the RAOB site.In the meantime, not only did these data have to be collocated correctly, but also they were subject to quality control.The pre-processing of the data set is discussed in greater detail in the following sub-section.

Measured Radiance
The TOYS and A YHRR data were extracted from the raw High-Resolution Picture Trans mission (HRPT) data that are operationally received by the Meteorological Satellite Center in the CWB.The available satellite paths over Taiwan were chosen within a time window of ± 3 hours at synoptic times, OOZ and 12Z.Multi-channel measurements were required in (2) to retrieve the profile increment.Table 1 presents the descriptions of the characteristics of the TOYS channels.In this study, only HIRS channels 3-8 and 10-15 were considered.
One of the most important concerns in satellite measurement is the effect of clouds.Of particular concern in this study was determining which HIRS pixels were cloud-free.The HIRS instrument resolves a circular area that is 17.4 km in diameter at the satellite subpoint, whereas the A YHRR resolves a circular area that is 1.1 km in diameter at the satellite subpoint.
Thus, each HIRS fov can collect about 300 -450 A VHRR pixels depending on the viewing angle.The statistics of the A VHRR pixels within each HIRS fov can then provide the opportu nity to estimate the cloud information of that HIRS fov.
In this paper, we adopted the method proposed by Aoki (1980) which involves matching and extracting the A VHRR pixels in each HIRS fov.The methodology to estimate the cloud information followed that of Kim and Chou ( 1995) and Kim ( 1996).Spatial cloud distribution was not of interest here, but the fraction of cloud coverage was.The original design in Kim ( 1996) involves computing the smoothed density function of the A VHRR pixels in each HIRS fov and finding the local minimum as a threshold in order to separate one cloud mass from another.Thus each cloud mass has a unique representative value which implies a unique cloud top height The simulation results in Kim (1996) show that the method is capable of estimating the proportion of clouds with different cloud-top heights within a HIRS fov.Following Kim's algorithm, the density function of the AVHRR pixels should exhibit a single peak pattern when the atmosphere is cloud-free or overcast.Figure 3 is an example of the density function of the AVHRR pixels with a single peak in a TOVSfav.A total of 324 AVHRR pixels were • �� : .�ti t?4WJ: �l�� ; _ i�� ����H��!�{g: Q;:�: : • .: � ; \_\: : -� !: �:.\ �J�i)� -;�� ::;: gt:::J.J?� CQif ):/.: .: YPV���� )' .The figure shows most of the cloudy area was in fact removed.

Computed Radiance
The forward model used in this study was the RTTOY that was first designed for a satel lite sounding system (Eyre 1991).This model requires an input profile vector of 67 elements, including temperature at 40 fixed pressure levels, mixing ratios at 15 fixed levels, skin tem perature, surface temperature, surface mixing ratio, and surface pressure.These input profile vectors were interpolated or extrapolated from the NWP grid data or RAOB profile.
Figure 5 is the scatter diagram of the measured versus computed brightness temperature which was calculated from the RAOB profile during 20 June to 15 August 1997.HIRS chan nels 3 -8 and 10 -15 in Fig. 5 were used in the study (the characteristics of the TOYS channels are described in Table 1).The exclusion of HIRS channels 1 and 2 was based on the fact their peak level of the weighting function was far from the retrieval levels of interest (lower than 100 hPa) in (2).Channels 18 and 19 were also excluded because the reflection of the sunlight in daytime had a large impact on the observed radiances.A bit of scatter in channels 11 and 12 (water vapor channel), as shown in Fig. 5, was probably due to the inappropriate treatment of the humidity profile.The small bias and root mean square error shown in Fig. 5 not only implies a good performance on the radiative transfer calculations but also provides a good basis for statistical regression retrieval.

3.RESULTS
Two series of experiments were designed to demonstrate the performance of the regres sion retrieval.In the first experiment, hereafter referred to as the June experiment, the neces sary data from a total of 248 clear samples were collected at the RAOB site from 15 May to 20 June 1997 to construct the error covariance matrices.The statistics of the error covariance matrix became stable as the sample size became greater than 150 (not shown).Using the covariance constructed in the June experiment, the regression retrieval experiment collocated at the RAOB site was processed from 20 June to 1 July 1997 and verified by the RAOB measurements.In the second experiment, hereafter referred to as the August experiment, the covariance matrices were constructed using the data collected from 20 June to 1 August 1997 (239 cases), and the retrieval results were verified from 1 August to 15 August 1997.The two experiments represent two kinds of weather regimes.The June experiment was sampled dur ing the Mei-Yu season, which is the seasonal transition period and the atmosphere is more convectively unstable (Chen 1983(Chen ,1991)).The August experiment was sampled during sum mer time that was mainly dominated by the subtropical High and tended to be much more stable.Figure 6 is the regression coefficient: C =S [A.E+S J-1, for the June and August ex- periments, where the sub-script x denotes the channel index (HIRS channel 3 -8, 10 -15) and y is the profile index (temperature and mixing ratio for mandatory and surface levels).The figure shows that the coefficients in the June experiment either at peak value or the square summation were larger than those in the August experiment.The distribution of the coeffi cient was rather smooth for the August experiment.Because of the weather-regime depen dence of the regression retrieval, the two experiments provided the opportunity to highlight the differences in the regression coefficients under various weather regimes.
Figure 7 is the bias of the 12-hour forecast error (defined as the RAOB minus the first guess) and the retrieved profile (defined as the RAOB minus retrievals) for the temperature and mixing ratio in the June experiment (86 samples).Figure 8 is the August experiment (117 samples).Figures 7 and 8 show that the LAPS has a cold/wet bias through the troposphere, a  ,.,.,  ,..., ,.,..

---'
------�--"-t"-"?J'!-.�=• Tole 01000 0050 0850 QTOO 0000 0400 """' TIOOO ,... ,.,.,, noo. � , --7_ 1 __ ..,,_ __ l;------.warm bias in the near-surface layer, and a cold/dry bias at the surface layer.The bias of the retrieved temperature profile is larger than the forecast errors below 500 hPa in Fig. 7 but closer to the forecast errors through the entire atmosphere in Fig. 8.The bias of the retrieved mixing ratio profile is quite close to the forecast errors in Figs.7 and 8.That the retrieval bias is in the "vicinity" of the first guess is not surprising because the retrieval procedure is first guess dependent.Figures 9 and 10 are the root mean square errors of the June and August experiments.The reference ground truth is supposed to be the RAOB measurements.The figures show that the root mean square error of the fo recast temperature error is about 1-1.5 K within the tropo sphere and up to 2.1 Ka t the surface.However, the retrieved profile is only 1 K through the troposphere and 1.4 Kat the surface, which is significantly smaller than the forecast error.The retrieved mixing ratio profile increment also has an improvement of 0.2-0.4g/K � in the tropo sphere and up to 1.2 g/Kg at the surface.This improvement implies that the retrieved profiles are closer to the RAOB profiles (as truth) than the 12-hour forecast profiles.
Because the weather system during summer is primarily dominated by the sub-tropical High, which is more steady and stable, the root mean square error of the forecast errors in Fig. 10 is reduced substantially, especially the mixing ratio in the mid-troposphere .The figure shows that the smaller root mean square error of the retrievals up to 0.5 Kin temperature and less than 0.4 g/K g in the mixing ratio is achieved.The root mean square error of retrievals at surface has an improvement of 1 K in temperature and 2 g/Kg in the mixing ratio.The results are noticeably better than those in the June experiment.
Figure 11 shows the contour field of the retrieved temperature on 700 and 500 hPa for every 3 HIRS fov on 0928 LST, 27 Aug 1997.As expected, the patterns of both the retrieved It is reasonable that the modification of the retrievals relative to the background is limited if the forecast error is small (Eyre 1989).In spite of the limited correction, the figure shows that the retrieved field at 700 hPa enhances the temperature perturbation near Japan and Korea.This figure also shows that there exists a larger difference between the retrieval and background fields in the sub-tropical area (south of 30°) on 700 hPa and 500 hPa.This is common in other cases (not shown here).Such a geographic dependence on the retrievals may be due to the different air mass characteristics between the mid-latitude and subtropical areas.
The differences in the mixing ratio field (Fig. 12) on 500 hPa are more prominent than 700 hPa.This figure also shows that the retrieval field tends to weaken the moist tongue near N through 700-500 hPa.The results from this case study are encouraging in that the regres sion retrieval algorithm exhibits characteristics which are locally coherent in the horizontal and vertical.Such characteristics are important to ingest the retrieval products in the analysis module.

SUMMARY AND CONCLUSIONS
In this study a regression algorithm was developed to retrieve the atmospheric parameters from the TOYS (TIROS Operational Vertical Sounder) clear-sky radiances.To construct the covariance matrix, this retrieval system requires a database, which collects the measured radi ance, the profiles from the NWP forecast and RAOB observations.The RTTOV radiative transfer model used in the European Center for Medium Range Weather Forecasts (ECMWF) was chosen to simulate the computed radiance.The regression retrieval system and data pro- Clouds are an important factor in the radiative transfer processes in the atmosphere.In this paper, the authors combined the AVHRR and HIRS data to estimate the cloud information in a HIRSfov and determine which HIRS fov is "cloud-free".Two experiments were designed to show the retrieval performance, and were statistically compared with the co-located RAOB measurements.The results show that the bias was comparable to the first guess.This is not surprising because the retrieval is first guess dependent.The root mean square errors of the retrievals in the troposphere were better than the forecast by about 1-1.5 K in temperature and 0.2-0.4g/Kg in the mixing ratio.The root mean square error of retrievals at the surface had an improvement of 0.5-1 Kin temperature and 1.5-2 g/Kg in the mixing ratio.The results of the  Mixing ratio (g/Kg) Fig. 12. Same as Fig. 11, but for mixing ratio.The con tour interval is 0.5 g/Kg.
August experiment were noticeably better than those in the June experiment.This improve ment implies that the retrieved profiles were closer to the RAOB profiles than the 12-hour forecast profiles were.The contour field of the retrievals for the case on 0928 LST 27 Aug 1997 shows that the temperature enhanced the thermal ridge and weakened the moisture tongue.The differences between the retrieval and background fields in the sub-tropical area (south of 30°) were larger than in the mid-latitude area.This phenomenon has also been noted in other cases and there fore not a special case here.The reason may be due to the different air mass characteristics between the mid-latitude and subtropical area.
This paper focuses mainly on the evaluation of the performance of the regression retrieval system in CWB LAPS.The pre-processing of cloud-cleared radiances has also been discussed.

Fig. 1 .
Fig. 1.Procedure of the regression retrieval from the TOYS ra diance in the LAFS System.
One of the important components in (2) is the first guess supported by the NWP model.The model used here was the Limited Area Forecast System (LAPS) which is operated daily at the Central Weather Bureau (CWB) in Taiwan.The LAPS is a hydrostatic PE model incorpo rating various physical parameterizations: the Kuo scheme(Kuo 1974) for cumulus param-eterization, long wave and short wave radiative transfer calculations following Harshvardhan et.al. (1987) and the turbulent kinetic energy (TKE) dissipation scheme for planetary bound ary layer (PEL) parameterization (Detering and Etling 1985).Thi.'{ vertical coordinate, defined by the terrain pressure at 20 levels, is used in the model.The governing equation is horizon tally differentiated on the Arakawa-C grid system with a coarse mesh domain of 161x121 and nested fine mesh size of 91 X9 1.The resolution is 60 km in the coarse domain and 20 km in the fine domain.Thus, the entire model domain covers most of Asia and the Western Pacific Ocean, as shown in Fig. 2. The objective analysis in the LAPS is the OI scheme with an update cycle of 12 hours.Most of the upper air observations in the LAPS are RAOB data.The distribution of the RAOB sites within the LAPS domain is shown in Fig. 2. The figure also shows the rarity of RAOB measurements over the ocean area.In some circumstances, the subtropical High over the Pa-NOAA-12 HIRSCH 8 972031058 I ----� .--------.: ---.---.:

Fig. 2 .
Fig. 2. HIRS channel 8 brightness temperatures (K) in two consecutive paths of the NOAA-12 satellite when it passed over Taiwan in 09 18 LST and 1058 LST June 22 (Julian Day 203 as shown in the upper-right corner), 1997.The outer and inner boxes are the coarse and fine domains in the LAFS, respectively.The heavy dots are the RAOB site used in the LAPS assimilation system.
---+-------+--------ir---------l Surface temperature 19 TOYS fov.The figure shows the homogeneity of the A VHRR brightness temperatures with the mean of 292.49K and the standard deviation of 0.43K.However, it was necessary to further distinguish whether the HIRS fov was a clear or overcast case when the density function of the A YHRR pixels exhibited as a single peak pattern.The HIRS fov could be regarded as overcast if the difference between the observed and computed (from the NWP forecast) brightness temperatures in HIRS channel 8 was greater than 4 K. Figure4is the distribution of clear TOYS fov sifted according to the above algorithm in the case of Fig.2.

•Fig. 3 .Fig. 4 .
Fig. 3. Density plot of the AVHRR chan nel 4 brightness temperatures in a HIRS field of view.The horizontal and vertical axes are the brightness temperature (K) and density, re spectively.A total of 324 AVHRR pixels were collocated in the HIRS field of view .

Fig. 5 .
Fig. 5. Scatter diagrams of the measured HIRS brightness temperature (K) ver sus the computed brightness temperature (K) which were calculated from the RAOB profile.The sample number, bias (stands for mean), and root mean square errors (stands for RMSE) are shown in the bottom right comer of each panel.

Fig. 7 .Fig. 8 .
Fig. 6.Distributions ofregression coefficients, C =S [ 'AE+S ]-1, for the 12 chan-Y-" yx xx nels (x, xis the channel index) with respect to the components in the profile (y, y indicates the profile index, see details in context).Thick line: June experiment.Thin line: August experiment.The horizontal axis is the value of the coefficient and the vertical axis is the profile index.SM6 and SM 8 are the sum of the squares of the coefficients for the y index in the June and August experiments, respectively.

Fig. 11 .
Fig. 11.Temperature field of 700 hPa (a) and 500 hPa (b) for the path on 0928 LST, 27 Aug 1997.The dashed line is the 12-hour forecast field and the solid lines are the retrieved field.The contour interval is 0.5K.The dots represent ev ery three HIRS fovs available in the path.

Table 1 .
Characteristics of HIRS channels (adapted fromSmith et al., 1979).The shading indicates the channel used to retrieve the atmospheric sound mg.