Effects of Domain Selection on Singular-Value-Decomposition Based Statistical Downscaling of Monthly Rainfall Accumulation in Southern Taiwan

A singular-value-decomposition (SVD) statistical downscaling technique was developed for monthly rainfall over southern Taiwan. The statistical model was applied to seven different general circulation models. Seven different geographical domains for the large-scale atmospheric predictors were tested and their effects on rainfall projections were evaluated. Because different climate models indicate different future rainfall projections, a multi-model ensemble approach was applied to provide best-guess estimates. Using the multi-model ensemble and a range of metrics, it was found that the different predictor geographical domains had little influence on the projected monthly rainfalls. Two emission climate change scenarios (A1B and B1) were used to project the future rainfalls for the period from 2010 to 2045 across southern Taiwan. Overall, future rainfall shows an increasing trend during the May-to-October wet season and a decreasing trend during the November-to-April dry season.


INTRODUCTION
Rainfall has become more extreme in recent years. Climate change is one of the reasons and is a critical issue not only for climatic research but also for hydrological research (Tolika et al. 2007; Labraga 2010). The Fourth Assessment Report of the United Nations Intergovernmental Panel on Climate Change (IPCC 2007) points out that extreme rainfall events may become more severe under climate change and lead to floods or droughts. Therefore, assessing the impact of climate change on rainfall has become an important worldwide issue.
To infer the variability of future rainfall, general circulation models (GCMs) are used as the primary tool, driven by a range of plausible future emission scenarios (Chu and Yu 2010). Since GCMs have coarse grid resolutions, their output data cannot represent local characteristics. Downscaling methods are therefore used to transform coarse-resolution information into finer-resolution information. There are two categories of downscaling methods: statistical downscaling and dynamic downscaling. Statistical downscaling methods construct a statistical relationship between large-scale GCM outputs (i.e., atmospheric variables) and local weather variables; dynamic downscaling methods employ high-resolution regional climate models nested in a GCM to obtain local weather variables (Chen et al. 2010). Statistical downscaling methods are frequently used due to their low computational cost.
Before using statistical downscaling methods, users have to decide on the large-scale atmospheric variables of the GCM, the local weather variables, the statistical method, and the geographical domain. Large-scale atmospheric variables of the GCM are usually used as predictors and local weather variables as predictands. Statistical methods rely on a stable relationship between the predictor and the predictand. Moreover, the predictors have to be well simulated by GCMs (Wilby et al. 1999). The choice of predictor and predictand is largely driven by reliable existing observations and user needs. Generally, large-scale atmospheric variables for projecting local rainfall include geopotential height (Timbal et al. 2003), sea level pressure (Chu et al. 2008), and wind speed (Murphy 1999; Haylock et al. 2006).
In addition to the choice of predictor and predictand, a statistical method requires the determination of the geographical domain for the large-scale atmospheric predictors. The domain size applied to the predictors, chosen to optimize the recognition of synoptic systems against unnecessary background noise, is a key parameter of the statistical model (Timbal and McAvaney 2001). Timbal and McAvaney (2001) and Benestad (2001) indicated that different geographical domains may cause different results of temperature downscaling. Benestad (2001) also suggested that the geographical domain should change with the season. Benestad (2004) further suggested selecting a geographical domain whose boundary lies where the correlation coefficient between a predictor and a predictand is zero. Wetterhall et al. (2007) chose separate geographical domains for different seasons for rainfall downscaling. Paul et al. (2008) explained their selection of geographical domains from a meteorological point of view. The above studies found that the geographical domain affects the downscaling results of rainfall or temperature. In this work, different geographical domains for the large-scale atmospheric predictors were tested, and their effects on rainfall projections were evaluated.
Various statistical downscaling schemes have been developed for constructing the relationship between the predictor and the predictand, such as multiple regression models (Wilby et al. 1999), artificial neural networks (ANNs) (Harpham and Wilby 2005), canonical correlation analysis (CCA) (Juneng et al. 2010), support vector machines (Yu et al. 2006), and the singular-value-decomposition (SVD) based statistical downscaling method (Chu et al. 2008). The SVD-based statistical downscaling method has been used successfully in Taiwan (Chu et al. 2008; Chu and Yu 2010) and East Asia (Paul et al. 2008), and was therefore adopted in this work for downscaling monthly rainfall accumulations.
This paper describes an application of a statistical downscaling technique to southern Taiwan. A key focus is on the validation of the statistical downscaling model and in particular the geographical domain required for the predictors. The rest of this paper is organized as follows. Section 2, "Study Area and Data Set," provides a summary description of the study area (i.e., southern Taiwan) and the data set. Section 3, "Methodology," describes the SVD-based statistical downscaling model and the evaluation metrics. Section 4, "Results and Discussion," describes the validation of the statistical models, the determination of the geographical domain required for the predictors, and the effect of climate change on rainfall projections under two emission scenarios (A1B and B1). Finally, Section 5, "Conclusions," sums up and offers a direction for future work.

STUDY AREA AND DATA SET
The southern part of Taiwan, with an area of nearly 6000 km² and including Chiayi, Tainan, Kaohsiung, and Pingtung counties, was chosen as the study area. In this area, there are four important reservoirs (i.e., Tsengwen, Nanhua, Akungtien, and Mudan Reservoirs), which are used mainly for water supply. Because rainfall differs greatly between the May-to-October wet season and the November-to-April dry season, rainfall in the wet season has to be retained in reservoirs to prevent a shortage of water supply in the dry season, which poses a large challenge for water supply allocation in this area.
The data set used in this study includes the local weather variable (i.e., the monthly rainfall) and large-scale atmospheric variables of GCMs (i.e., the sea level pressure and the meridional wind field at 850 hPa). The monthly rainfall was used as the predictand in the statistical downscaling model. The statistical downscaling model was applied at each of the 82 rainfall stations in the study area. Figure 1 shows the spatial distribution of these rainfall stations. The spatially averaged monthly rainfalls during the period from 1975 to 2000 were calculated using the Thiessen polygon method (Thiessen 1911). The temporal distribution of mean monthly rainfall is shown in Fig. 2. This figure shows that rainfall in the study area is unevenly distributed over the year: around 85% of the annual rainfall occurs during the wet season and only 15% during the dry season.
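The Thiessen polygon averaging mentioned above can be illustrated with a minimal sketch. It approximates the polygon areas on a regular grid by assigning each grid cell to its nearest station. The rectangular bounding box, the function names `thiessen_weights` and `areal_mean`, and the grid size are assumptions made for illustration; the actual study area has an irregular boundary and the original computation may differ.

```python
import numpy as np

def thiessen_weights(station_xy, bounds, n_grid=100):
    """Approximate Thiessen (Voronoi) area weights on a regular grid.

    station_xy: array of shape (n_stations, 2) with station coordinates.
    bounds: (xmin, xmax, ymin, ymax) of an assumed rectangular study area.
    Each grid cell is assigned to its nearest station; a station's weight
    is its share of all grid cells.
    """
    xmin, xmax, ymin, ymax = bounds
    gx, gy = np.meshgrid(np.linspace(xmin, xmax, n_grid),
                         np.linspace(ymin, ymax, n_grid))
    cells = np.column_stack([gx.ravel(), gy.ravel()])
    # Squared distance from every cell to every station: (n_cells, n_stations)
    d2 = ((cells[:, None, :] - station_xy[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(station_xy))
    return counts / counts.sum()

def areal_mean(rain, weights):
    """Area-weighted rainfall; rain has shape (n_months, n_stations)."""
    return rain @ weights
```

With exact polygon areas (e.g., from a GIS tool) the same `areal_mean` step applies unchanged; only the weights would be computed differently.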
Two large-scale atmospheric variables of the GCMs, both on a monthly scale, were used as the predictors in the statistical downscaling model: the sea level pressure (SLP) and the meridional wind field at 850 hPa (v850). The statistical downscaling model was applied to seven different GCMs under the 20th Century Experiment Scenario (20C3M) and two future emission scenarios (i.e., A1B and B1). Table 1 lists the information about the seven GCMs.
The SLP and v850 data from the outputs of the seven GCMs under 20C3M for the period from 1975 to 2000 were used for downscaling rainfall, which served as the baseline rainfall projection for comparison with future rainfall projections. For projecting the future rainfalls, the SLP and v850 data from the outputs of the seven GCMs under A1B and B1 for the period from 2010 to 2045 were used. The Program for Climate Model Diagnosis and Intercomparison (http://www-pcmdi.llnl.gov/ipcc/about_ipcc.php) provided the aforementioned data for this work.
In this study, seven different geographical domains for the large-scale atmospheric predictors were tested and their effects on rainfall projections were evaluated.

Statistical Downscaling Method
One of the strategies for our downscaling study is to relate local rainfall to the observed large-scale predictors and build a downscaling scheme based upon the assumption that GCMs do not simulate well the interannual variability found in observed large-scale variables (Cheng et al. 2008). Downscaled results are then obtained by projecting the predictors of GCM outputs onto the scheme. However, when this strategy is applied, the systematic biases in a given model are not removed, and they may cause more uncertainty when the state of the future climate is estimated. Nevertheless, the effect of the systematic biases of a given model on climate change can be modified when the downscaling is made by fitting equations between a predictand and a predictor of model output (Feddersen et al. 1999). Therefore, in the present study, downscaling schemes are constructed by relating local rainfall to a predictor of a given model output. Similar procedures were provided by Nishimori and Kitoh (2006). First, the time series of regional rainfall and large-scale variables are reconstructed through the use of their respective empirical orthogonal functions and principal components to filter most spatial noise. Then, the SVD is applied to extract coupled patterns between regional rainfall and large-scale variables, which can be expressed in the following equations:
$$Z_{\mathrm{predictor}}(t, x) = \sum_{i=1}^{m} S_i(t)\, U_i(x) \quad (1)$$
$$Z_{\mathrm{predictand}}(t, x) = \sum_{i=1}^{m} K_i(t)\, R_i(x) \quad (2)$$
Here the total number of SVD modes is denoted by m. The large-scale circulation anomaly field, Z_predictor(t, x), and the observed station rainfall anomaly field, Z_predictand(t, x), are normalized. U_i(x) and R_i(x) denote the singular vector of the predictor and the singular vector of the predictand, respectively, in the i-th mode. S_i(t) and K_i(t) indicate the time series of the expansion coefficients of the i-th SVD mode for the predictor and the predictand. Finally, the downscaling results are obtained by applying the GCM data, which are based on different scenarios, to the transfer functions. In the transfer functions, PRJ(t, x) represents the downscaled projection and n denotes the total number of SVD modes retained. In this study, the leading 10 modes are retained. The details of the downscaling method are given by Kim et al. (2004) and Feddersen and Andersen (2005). For a given month, the downscaling procedure is applied separately to a selected predictor for each GCM, which means that the fitting equations are built separately by GCM, by predictor, and by month.
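To make the coupled-pattern idea concrete, the following is a minimal sketch of an SVD-based downscaling step. It is not the exact scheme of this study: the EOF pre-filtering is skipped, and the transfer step (a per-mode least-squares regression of the predictand expansion coefficients on the predictor expansion coefficients) is an assumption chosen for illustration. The function names `svd_fit` and `svd_project` are hypothetical.

```python
import numpy as np

def svd_fit(Zx, Zy, n_modes):
    """Fit coupled patterns between a predictor and a predictand field.

    Zx: predictor anomalies, shape (n_time, n_grid), assumed normalized.
    Zy: predictand anomalies, shape (n_time, n_station), assumed normalized.
    """
    n_time = Zx.shape[0]
    C = Zx.T @ Zy / (n_time - 1)              # cross-covariance matrix
    U, _, Vt = np.linalg.svd(C, full_matrices=False)
    U = U[:, :n_modes]                        # predictor singular vectors U_i(x)
    R = Vt[:n_modes].T                        # predictand singular vectors R_i(x)
    S = Zx @ U                                # expansion coefficients S_i(t)
    K = Zy @ R                                # expansion coefficients K_i(t)
    # Assumed transfer step: regress K_i(t) on S_i(t), one slope per mode.
    alpha = (S * K).sum(axis=0) / (S * S).sum(axis=0)
    return U, R, alpha

def svd_project(Zx_new, U, R, alpha):
    """Project new predictor anomalies onto the fitted modes."""
    S_new = Zx_new @ U                        # predictor expansion coefficients
    return (S_new * alpha) @ R.T              # downscaled anomalies (n_time, n_station)
```

In the setting described above, one such fit would be made per GCM, per predictor, and per calendar month, with `n_modes=10`.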
The data of each of the seven GCMs driven by 20C3M are subjected to the same process. To keep the downscaling scheme from over-fitting, cross validation is applied to generate a set of 26-year downscaled data for validation (Michaelsen 1987). The cross-validation method excludes one set of data during the construction of a statistical scheme and subsequently uses the model to project the value of the predictand that was excluded from the model calibration. The process of excluding one data point is repeated N times, where N is the total number of years of observation records. Following that sequence, a set of independent projection data forming a time series of length N is generated (Benestad et al. 2008).
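The leave-one-out procedure described above can be sketched as follows. The generic `fit` and `predict` callables and the straight-line stand-in model are assumptions for illustration only; in the study the fitted model would be the SVD-based scheme.

```python
import numpy as np

def leave_one_out(x, y, fit, predict):
    """Return N independent predictions, each made without the target year.

    x, y: arrays whose first axis indexes the N years.
    fit(x_train, y_train) -> model; predict(model, x_test) -> array.
    """
    n = len(y)
    out = np.empty(n)
    for k in range(n):
        keep = np.arange(n) != k              # drop year k from calibration
        model = fit(x[keep], y[keep])
        out[k] = predict(model, x[k:k + 1])[0]
    return out

# Stand-in model (hypothetical): a straight-line fit.
def fit_line(x, y):
    return np.polyfit(x, y, 1)

def predict_line(coef, x):
    return np.polyval(coef, x)
```

Because year k never enters the calibration of the model that predicts it, the N predictions can be compared with observations without rewarding over-fitting.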

Evaluation Metrics
In order to compare downscaling results, three evaluation metrics, which include the Gerrity skill score (GSS) (Gerrity 1992), root mean square error (RMSE), and mean absolute percent error (MPE), were used.
GSS is a skill score for categorical deterministic forecasts recommended by the World Meteorological Organization (WMO) in the standardized verification system for long-range forecasts (WMO 2002). In a three-by-three contingency table, the GSS can be expressed as follows:
$$\mathrm{GSS} = \sum_{i=1}^{3} \sum_{j=1}^{3} P_{ij}\, S_{ij} \quad (5)$$
Here P_ij denotes the relative sample frequency, which is defined as the ratio of the cell count n_ij to the total number of forecast/observation pairs N; and S_ij indicates a score matrix of the reward or penalty for every forecast/observation combination.
Fig. 3. Domains used for atmospheric predictors (from Domains 1 to 7).
GSS ranges from -1 to 1. The larger the skill score, the better the downscaling scheme performs. A skill score lower than zero indicates poorer skill than random guessing.
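As an illustration of how such a score can be computed, the sketch below builds the Gerrity (1992) score matrix from the observed category frequencies and applies it to a contingency table. The orientation of the table (rows as forecast categories, columns as observed categories) and the function names are assumptions; every observed category must have a non-zero frequency for the matrix to be defined.

```python
import numpy as np

def gerrity_score_matrix(p_obs):
    """Gerrity score matrix from observed category probabilities p_obs."""
    k = len(p_obs)
    cum = np.cumsum(p_obs)[:-1]               # cumulative probabilities, r = 1..k-1
    a = (1.0 - cum) / cum                     # odds ratios a_r
    s = np.empty((k, k))
    for i in range(k):
        for j in range(i, k):
            val = (np.sum(1.0 / a[:i]) - (j - i) + np.sum(a[j:])) / (k - 1)
            s[i, j] = s[j, i] = val           # the matrix is symmetric
    return s

def gerrity_skill_score(table):
    """GSS for a contingency table of counts (rows: forecast, cols: observed)."""
    table = np.asarray(table, dtype=float)
    p = table / table.sum()                   # relative frequencies P_ij
    s = gerrity_score_matrix(p.sum(axis=0))   # uses observed marginals
    return float((p * s).sum())
```

For equally likely terciles the diagonal rewards are 1.25, 0.5, and 1.25, so a perfect forecast scores 1 and a forecast spread evenly over all cells scores 0.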
RMSE and MPE are frequently used measures of the differences between the values predicted by a model or an estimator and the values actually observed; MPE expresses accuracy as a percentage. RMSE and MPE are defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left[ P_s(t) - P_o(t) \right]^2}$$
$$\mathrm{MPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{P_s(t) - P_o(t)}{P_o(t)} \right| \times 100\%$$
Here P_o(t) denotes the t-th observed rainfall (mm), P_s(t) denotes the t-th simulated rainfall (mm), and n is the number of data. The smaller the RMSE and MPE, the better the downscaling scheme performs.
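A minimal sketch of the two error metrics follows, assuming the usual definitions of root mean square error and mean absolute percent error; the function names are chosen here for illustration. MPE is undefined when an observed value is zero, which the sketch does not guard against.

```python
import numpy as np

def rmse(obs, sim):
    """Root mean square error between observed and simulated rainfall (mm)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def mpe(obs, sim):
    """Mean absolute percent error (%), relative to the observed values."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(100.0 * np.mean(np.abs(sim - obs) / obs))
```

RMSE keeps the units of rainfall and so is dominated by wet-season months, whereas MPE is relative and lets dry-season and wet-season errors be compared on the same scale.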

RESULTS AND DISCUSSION
Rainfall downscaling was performed at each of the 82 rainfall stations on a monthly scale in the study area. Based on the downscaled monthly rainfalls of the 82 rainfall stations, the spatially averaged monthly rainfalls were calculated by the Thiessen polygon method for further analyses. The monthly data of SLP and v850 from the outputs of seven GCMs under 20C3M, as well as the observed monthly rainfall at the 82 rainfall stations, for the period from 1975 to 2000 were used to build the statistical models for rainfall downscaling. The monthly data of SLP and v850 from GCM outputs under A1B and B1 emission scenarios for the period from 2010 to 2045 were used for future rainfall projection by the validated statistical models. The following subsections describe: (1) the validation of the statistical models, including the choice of suitable predictors and the detection of different numbers of GCM for MME (multi-model ensemble), for finding the suitable setting of the statistical models over southern Taiwan; (2) the determination of the geographical domain required for the predictors by comparison of downscaling results for different domains; (3) the effect of climate change on rainfall projections under two future emission scenarios.

Choice of Suitable Predictors for Statistical Model
For choosing the suitable predictors for rainfall downscaling, three cases of downscaling results were compared for a given domain and a given GCM. The cases are: (1) the downscaling result using the sea level pressure (SLP) as predictor; (2) the downscaling result using the meridional wind field at 850 hPa (v850) as predictor; and (3) the average downscaling result of the two former predictors (2 predictor AVEDR). To compare the downscaling performances among the three cases, the evaluation metrics GSS and RMSE were used. The values of GSS and RMSE were calculated for the three cases for each month, domain and GCM. For each month, domain and GCM, the best of the three cases was identified separately by the largest value of GSS and by the smallest value of RMSE. For each case, the number of months with the best downscaling result was counted; the ratio of this number to the total number of months for rainfall downscaling (i.e., 7 GCMs × 7 domains × 12 months = 588) was calculated as the "performance percentage." The performance percentages by GSS and RMSE, respectively, for the three cases are shown in Table 2. From this table, the performance percentages by GSS for the three cases are close to one another, which indicates that the evaluation metric GSS does not necessarily distinguish the downscaling performances of the three cases; the performance percentages by RMSE for the three cases are 9%, 32%, and 59%, respectively, which reveals that the third case (i.e., the average downscaling result of the two predictors, 2 predictor AVEDR) has the best downscaling performance. Therefore, the 2 predictor AVEDR was chosen for the following analyses.
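The counting behind the performance percentage can be sketched as below. The array layout (one row per case, one column per month/domain/GCM combination) and the tie rule (the first case wins a tie) are assumptions for illustration; the text does not say how ties were handled.

```python
import numpy as np

def performance_percentage(scores, best="min"):
    """Share (%) of experiments in which each case performs best.

    scores: shape (n_cases, n_experiments), e.g. (3, 588) for three cases
    and 7 GCMs x 7 domains x 12 months.
    best: "min" for error metrics such as RMSE, "max" for skill scores such as GSS.
    """
    scores = np.asarray(scores, dtype=float)
    if best == "min":
        winner = scores.argmin(axis=0)        # index of the best case per experiment
    else:
        winner = scores.argmax(axis=0)
    counts = np.bincount(winner, minlength=scores.shape[0])
    return 100.0 * counts / scores.shape[1]
```

The percentages of the cases sum to 100 by construction, which is consistent with the RMSE-based values of 9%, 32%, and 59% quoted above.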

Application of Statistical Model to Different GCMs
Because of its best downscaling performance, the 2 predictor AVEDR was used for evaluating the effects of the seven different geographical domains on rainfall projections using the metrics RMSE and MPE. Figures 4 and 5 show the values of RMSE and MPE of the downscaling results, respectively, in each month for the seven different GCMs and the five geographical domains (i.e., Domains 1 to 5). For a given GCM (e.g., CCCMA), Fig. 4a shows that the values of RMSE for the different domains are close to one another and small during the period from November to April, but more diverse and larger during the period from May to October; Fig. 5a shows that the values of MPE for the different domains are close to one another and less than 20% in each month except February and October. Because the temporal patterns of RMSE and MPE, respectively, for the seven different GCMs in Figs. 4 and 5 are roughly similar, it is difficult to decide which GCM is better.

Detection of Different Numbers of GCM for MME
More stable and skillful downscaled results can be obtained through the use of the MME, which is a simple average of the downscaled outputs from all models (Tebaldi and Knutti 2007). This raises the question of how many GCMs should be adopted in the MME approach. Therefore, the effect of the number of GCMs used for MME on downscaling performance was examined here. The number of GCMs used for MME varies from two to seven to form different combinations. The procedure for the combinations of two GCMs is as follows. Choosing two of the seven GCMs gives 21 different combinations (i.e., $C^{7}_{2}$). The downscaling performance of each combination was represented by its RMSE for each domain. Thereafter, the 21 values of RMSE from the 21 combinations were averaged to represent the downscaling performance of two GCMs used for MME for that domain.
The same procedure was performed for three to seven GCMs used for MME. The results show that the mean value of RMSE decreases as the number of GCMs used for MME increases, in each month and for each domain. That is, the MME approach performs better when more GCMs are used. In this study, the downscaled results averaged by MME from all seven GCMs are the best. Because the trends of downscaling performance are similar for all twelve months, only the results for March and August are illustrated in Fig. 6 to represent the relationship between the number of GCMs used for MME and the mean value of RMSE for each domain. Based on the above analysis, the MME approach uses all seven GCMs hereafter in this work.
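The combination procedure above can be sketched with `itertools.combinations`. The array layout and the function name are assumptions for illustration; the sketch evaluates one month and one domain at a time.

```python
from itertools import combinations

import numpy as np

def mean_rmse_by_ensemble_size(sim, obs):
    """Mean RMSE of the multi-model mean over all combinations of each size.

    sim: downscaled rainfall per model, shape (n_models, n_time).
    obs: observed rainfall, shape (n_time,).
    Returns {ensemble size: RMSE averaged over all combinations of that size}.
    """
    n_models = sim.shape[0]
    result = {}
    for size in range(2, n_models + 1):
        errs = []
        for combo in combinations(range(n_models), size):
            mme = sim[list(combo)].mean(axis=0)       # simple ensemble average
            errs.append(np.sqrt(np.mean((mme - obs) ** 2)))
        result[size] = float(np.mean(errs))
    return result
```

For seven models the loop visits 21, 35, 35, 21, 7, and 1 combinations for sizes two to seven, so the full enumeration is cheap.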

Comparison of Downscaling Results for Different Domains
For evaluating the effects of the seven different domains on rainfall downscaling, the evaluation metrics RMSE and MPE of the downscaling results were used. Such an evaluation is helpful in determining which domain is suitable for southern Taiwan. For each geographic domain, Figs. 8a - e reveal that the MME approach can reduce the downscaling errors compared with those of the individual GCMs; the MME results for the different domains (i.e., Domains 1 to 5) are almost identical. Because the choice among these domains has little influence on the MME downscaling performance, it is difficult to decide the optimal domain from these five domains. Domains 6 and 7, which take the seasonal situations into account, were further used to examine their effects on the downscaling results. As displayed in Figs. 7f - g and 8f - g, the MME approach again reduces the downscaling errors compared with those of the individual GCMs; Domains 6 and 7 differ little in downscaling performance using the MME approach, and have as little influence on the downscaling results as Domains 1 to 5. Figures 7h and 8h show the small differences among the MME results of all seven domains.
It is found that the MME results are insensitive to the domain selection, which may result from the fact that the fitting equations between the predictors and predictand are built separately by month. In this way, the "seasonal memory" of the predictors is lost and the adopted procedure might suffer from over-fitting, resulting in similar results. Although the downscaling results from the seven domains are similar, the computational efficiency of the different domains was considered in choosing the best domain in the study. The smaller the domain, the higher the computational efficiency. Therefore, Domain 1, with the best computational efficiency, was chosen for the following analysis.
Based on the observed monthly rainfalls and the downscaled monthly rainfalls using the MME in Domain 1, the mean and standard deviation of the observed and downscaled monthly rainfalls were estimated for each month and are listed in Table 3. The table reveals that the statistical characteristics of the observed and downscaled monthly rainfalls are close; the downscaled monthly rainfalls using the MME in Domain 1 are thus reasonable.

Effect of Climate Change on Rainfall Projections
In order to assess the effect of climate change on rainfall projections, the GCM outputs under the A1B and B1 emission scenarios for the period from 2010 to 2045 provide the large-scale atmospheric predictors (i.e., SLP and v850) for future rainfall projections. For each month, the change amount of the future rainfall projections relative to the baseline rainfall projections under the 20C3M scenario is defined as:
$$\Delta R(i) = R_{\mathrm{future}}(i) - R_{\mathrm{20C3M}}(i)$$
where ΔR(i) is the change amount for month i, R_future(i) denotes the future rainfall projection (i.e., mean monthly rainfall for month i) under a given scenario, and R_20C3M(i) denotes the baseline rainfall projection (i.e., mean monthly rainfall for month i) under the 20C3M scenario.
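The change amount can be computed directly from the downscaled series, as in the minimal sketch below. It assumes the change is the difference in mm between the future and baseline mean monthly rainfall, which is consistent with the values in mm reported for the results; the array layout (years by calendar months) and the function names are assumptions.

```python
import numpy as np

WET_MONTHS = slice(4, 10)                     # May to October (0-based indices 4..9)

def monthly_change(future, baseline):
    """Change in mean monthly rainfall (mm) for each calendar month.

    future: shape (n_future_years, 12), e.g. 36 years for 2010 to 2045.
    baseline: shape (n_baseline_years, 12), e.g. 26 years for 1975 to 2000.
    """
    return future.mean(axis=0) - baseline.mean(axis=0)

def seasonal_change(delta):
    """Sum the monthly changes over the wet and dry seasons."""
    wet = float(delta[WET_MONTHS].sum())
    dry = float(delta.sum() - wet)            # November to April
    return wet, dry
```

Applying the same two steps to each GCM and to the MME gives one change value per month and per scenario.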
Figure 9 shows the monthly change amounts for each individual GCM and the MME of the seven GCMs under the A1B and B1 emission scenarios. Based upon the MME results, the figure reveals that the change amounts under A1B and B1 are close in each month. Positive change occurs during the wet season (May to October), except in June, which has a value around -23 mm. The change in July is the largest, with 38 and 47 mm under A1B and B1, respectively, and the changes in May, August, September, and October are in the range of 5 to 10 mm. During the dry season (November to April) the change is small (around ±1 mm), except in February and April, which have values around -5 mm. Overall, future rainfall shows an increasing trend during the May-to-October wet season and a decreasing trend during the November-to-April dry season.
In southern Taiwan, the temporal rainfall distribution over the course of a year is uneven, which makes this region prone to droughts in the dry season. The results of the change analysis imply that southern Taiwan may face a worse situation in which the temporal rainfall distribution becomes even more uneven in the future. How to enhance the allocation and management of water resources over southern Taiwan in the future therefore deserves attention.

CONCLUSIONS
An SVD statistical downscaling technique was developed for monthly rainfall over southern Taiwan. A key focus is on the validation of the statistical downscaling model and in particular the geographical domain required for the predictors. The effects of seven different GCMs and seven different domains on the downscaling results were investigated. Since different GCMs give different rainfall downscaling results, the MME approach was applied to provide best-guess estimates. The analysis results show that the different geographical domains had little influence on the projected monthly rainfalls. Because the downscaling results for the different geographical domains are similar, their computational efficiency was taken into consideration. Domain 1 (120 - 122.5°E, 20 - 27.5°N), with the best computational efficiency, was chosen as the optimal domain in this work. Finally, two emission climate change scenarios (A1B and B1) were used to project the future rainfalls for the period from 2010 to 2045 across southern Taiwan. Overall, the future rainfalls show an increasing trend during the May-to-October wet season and a decreasing trend during the November-to-April dry season. The results warn that southern Taiwan may face a worse situation with a more uneven temporal rainfall distribution in the future. The allocation and management of water resources in this area will have to be enhanced.
It is notable that the MME results of the SVD-based downscaling process are almost identical among the seven different domains. The probable reason is that the transfer function between the predictors and predictand was built month by month in this work. Because of this, the "seasonal memory" of the predictors is lost and the adopted procedure might suffer from over-fitting, resulting in similar findings. However, for hydrological applications (e.g., rainfall-runoff modeling), a monthly scale for the downscaling procedure is more practical than longer time scales (e.g., seasonal and yearly scales). To verify this argument, future work may adopt other methods (e.g., CCA-based or ANN-based downscaling methods) or use different time scales to build the transfer function, for comparison with the results of the present work.

Fig. 6. Relationships between number of GCM and mean RMSE in March and August.

Fig. 7. Values of RMSE for different GCMs and the MME in Domains 1 to 7.

Fig. 8. Values of MPE for different GCMs and the MME in Domains 1 to 7.

Fig. 9. Change amounts for different GCMs and the MME under A1B and B1 emission scenarios.

Table 1. GCMs used in the study.

Table 2. Performance percentages (PPs) for three cases.
Note: Cases 1, 2, and 3 denote the downscaling results by SLP, v850, and the average downscaling results of Cases 1 and 2, respectively.

Table 3. Mean and standard deviation of rainfall observation and downscaling result by MME in Domain 1 for each month.