Probabilistic assessment of drought states using a dynamic naive Bayesian classifier

Drought is a slow-onset hazard that affects ecosystems and human society. Although the uncertainty associated with drought is difficult to assess, identifying drought severity is very important. Using a dynamic naive Bayesian classifier (DNBC), this study combined the strengths of three conventional drought indices, the Standardized Precipitation Index (SPI), the Evaporative Stress Index (ESI), and the Vegetation Health Index (VHI), into a DNBC-based drought index (DNBC-DI) that identifies overall drought conditions. After the drought indices were compared with recent actual drought events, drought severity was classified into five states: severe wet, moderate wet, normal, moderate drought, and severe drought. We evaluated how well the DNBC-DI represented the actual hydrological droughts that occurred from 2001 to 2014, using the Streamflow Drought Index (SDI) to represent actual hydrological drought. The accuracy of the DNBC-DI was 60%, higher than that of the SPI (40%), ESI (40%), and VHI (41%). Because drought evaluation in practice depends heavily on the choice of drought index, this study aimed to develop a practical drought index for comprehensive drought assessment.
Article history: Received 11 August 2018; Revised 29 July 2019; Accepted 8 October 2019


INTRODUCTION
Because of the widespread impacts and complex mechanisms of drought, many studies have been devoted to drought monitoring, planning, and mitigation (Prabhakar and Shaw 2008; Mishra and Singh 2011). In drought monitoring, it is important to estimate drought characteristics and to predict them accurately. These characteristics are generally evaluated using standardized drought indices, which are based on drought-related variables such as precipitation, temperature, and streamflow (Zargar et al. 2011; Hao and Singh 2015). Drought indices quantify drought and provide reference information for monitoring; however, they depend heavily on the purpose of the assessment and on data availability (Tsakiris et al. 2007). Accordingly, droughts are classified as meteorological, agricultural, or hydrological according to the purpose and the variables of interest (Wilhite and Glantz 1985). A drought index integrates one or more climatic or hydrological variables, such as precipitation, temperature, soil moisture, and streamflow (Steinemann et al. 2005). In recent years, significant progress has been made in employing multiple drought indices for comprehensive drought management (Sun et al. 2012).
It is important to assess the overall drought situation for regional drought planning and mitigation, and many studies have therefore addressed comprehensive drought evaluation. Drought indices are currently classified as either univariate or multivariate. Univariate drought indices use variables related to meteorological, hydrological, or agricultural drought (Niemeyer 2008). The Standardized Precipitation Index (SPI) (McKee et al. 1993) and the Palmer Drought Severity Index (PDSI) (Palmer 1965) are representative indices for meteorological drought; the Palmer Hydrological Drought Index (PHDI) (Palmer 1965) and the Surface Water Supply Index (SWSI) (Shafer and Dezman 1982) for hydrological drought; and the Relative Soil Moisture (RSM) (Thornthwaite and Mather 1955) and the Crop Specific Drought Index (CSDI) (Meyer et al. 1993) for agricultural drought.
The Korea Meteorological Administration (KMA) and the Korea Water Resources Corporation (K-water) mostly use the SPI to evaluate drought events in South Korea. However, univariate drought indices are not suitable for expressing complex droughts (Heim 2002;Steinemann and Cavalcanti 2006;Hao and AghaKouchak 2013).
Multivariate drought indices combine two or more drought variables or indices. Examples include the Hydrological Drought Index (HDI) (Karamouz et al. 2009), the Multivariate Standardized Drought Index (MSDI) (Hao and AghaKouchak 2013), and the Objective Blends of Drought Indicators (OBDI) (Svoboda et al. 2002), which were developed for comprehensive drought monitoring and assessment. Drought begins with an extreme shortage of rainfall, which then leads to deficits in soil moisture, river flow, and groundwater levels and flows. Drought should therefore be analysed by considering its various causes. Keyantash and Dracup (2002) developed the Aggregate Drought Index (ADI), which considers precipitation, runoff, and soil moisture together, and Wilhite (2005) noted that various weather variables, including precipitation, should be employed for drought monitoring and early warning. The National Drought Mitigation Center (NDMC) uses six indices to produce U.S. drought monitoring information.
Although combining various drought indices may provide a more comprehensive drought assessment than univariate approaches, systematic methods for their combination, application, and evaluation are lacking (Steinemann and Cavalcanti 2006). Sun et al. (2012) developed a multi-index drought (MID) model that combines various drought indices for agricultural drought assessment in Canada. They found that the MID model represented drought characteristics better than any univariate drought index and provided a more reliable and comprehensive drought assessment. In recent years, the hidden Markov model (HMM), a probabilistic statistical model with wide practical applications, has become popular for probabilistic drought forecasting and assessment (Mallya et al. 2013; Ramadas and Govindaraju 2016; Chen et al. 2017, 2018). In particular, Chen et al. (2018) used a dynamic naive Bayesian classifier (DNBC) to classify drought severity from various drought indices for integrated drought assessment, and compared the DNBC with existing univariate indices such as the SPI, the SDI, and the Normalized Vegetation Supply Water Index (NVSWI).
In the past, drought indices were developed to assess meteorological, hydrological, or agricultural drought individually. In recent years, however, comprehensive drought indices that incorporate various kinds of drought information have been developed to assess overall drought severity (Niemeyer 2008), and these can be used for drought planning or response. In this study, a representative index was chosen for each type of drought to calculate a comprehensive drought index: the SPI for meteorological drought, the ESI for hydrological drought, and the VHI for agricultural drought.
The purpose of this study is to develop a comprehensive drought index that considers the various causes of drought and can serve as a basis for drought planning and risk analysis. We used the dynamic naive Bayesian classifier (DNBC) to calculate a multivariate drought index. Various efforts have been made to apply Bayesian theory to drought assessment since the 1950s. Bayesian methods make it possible to learn the probability of an outcome systematically from one or more factors that contribute to it. In this study, the factors contributing to drought were identified as precipitation, streamflow, and evaporation. The DNBC is an extension of the HMM that describes a stochastic process evolving over time. It can directly use all the information generated by the dynamic process as attributes, which avoids losing information from the training data. Hence, this study employed the DNBC to perform a multi-index drought assessment that aggregates the effects of different physical dimensions of drought while accounting for their inherent uncertainty.

STUDY AREA AND DATA
Our study area covers South Korea, excluding Jeju Island (Fig. 1), between latitudes 33°N and 38°N and longitudes 126°E and 131°E. South Korea has five major river basins containing 113 subbasins: the Han River basin has 30 subbasins, the Nakdong River basin 33, the Geum River basin 21, the Seomjin River basin 15, and the Yeongsan River basin 14.
We used satellite data to analyze and compare drought in the subbasins. The precipitation data used to calculate the SPI combined Tropical Rainfall Measuring Mission (TRMM) data (January 2001 to March 2013) with Global Rainfall Measurement (GRM) data (April 2013 to December 2014). The evapotranspiration data used to produce the ESI were the actual and potential evapotranspiration products of the Moderate Resolution Imaging Spectroradiometer (MODIS). The land surface temperature (LST) and normalized difference vegetation index (NDVI) used to produce the VHI were also obtained from MODIS. The SDI used for verification was calculated from natural flow data provided by the Korea Institute of Civil Engineering and Building Technology (KICT), which were estimated using a continuous rainfall-runoff model. The procedural steps of this study are shown in Fig. 2.

Standardized Precipitation Index (SPI)
The SPI is a useful tool for defining drought severity from the standardized precipitation deficit at various time scales, e.g., 3, 6, 9, and 12 months. Short time scales are used for agricultural drought management and long time scales for water resource management. Chang et al. (2006) compared and evaluated the spatio-temporal characteristics of drought using the SPI, and Bonaccorso et al. (2003) analyzed the spatial variability of drought using the SPI. We calculated the SPI for the study subbasins from satellite data, as shown in Fig. 3a.
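The standard gamma-based SPI transformation can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the exact procedure of this study: the gamma parameters are estimated with Thom's maximum-likelihood approximation, months with zero precipitation are handled with a mixed distribution, and the input to `spi` is assumed to be the k-month totals of one calendar month across years (operational SPI fits each calendar month separately). The helper names `accumulate`, `gamma_cdf`, and `spi` are hypothetical.

```python
import math
from statistics import NormalDist


def accumulate(monthly: list[float], k: int) -> list[float]:
    """Rolling k-month precipitation totals (e.g. k = 3, 6, 9, or 12)."""
    return [sum(monthly[i - k + 1:i + 1]) for i in range(k - 1, len(monthly))]


def gamma_cdf(x: float, shape: float, scale: float) -> float:
    """Gamma CDF via the regularized lower incomplete gamma function P(shape, x/scale)."""
    if x <= 0.0:
        return 0.0
    a, z = shape, x / scale
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series expansion; converges quickly for z < a + 1.
        term = total = 1.0 / a
        ap = a
        for _ in range(1000):
            ap += 1.0
            term *= z / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz) for the upper tail Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = d if abs(d) > tiny else tiny
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        d = 1.0 / d
        step = d * c
        h *= step
        if abs(step - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefactor) * h


def spi(totals: list[float]) -> list[float]:
    """SPI for accumulated totals of one calendar month across years."""
    wet = [p for p in totals if p > 0.0]
    q = 1.0 - len(wet) / len(totals)  # probability of zero precipitation
    mean_wet = sum(wet) / len(wet)
    # Thom's approximation of the gamma maximum-likelihood estimates.
    A = math.log(mean_wet) - sum(math.log(p) for p in wet) / len(wet)
    shape = (1.0 + math.sqrt(1.0 + 4.0 * A / 3.0)) / (4.0 * A)
    scale = mean_wet / shape
    normal = NormalDist()
    result = []
    for p in totals:
        h = q + (1.0 - q) * gamma_cdf(p, shape, scale)  # mixed distribution H(x)
        h = min(max(h, 1e-6), 1.0 - 1e-6)               # keep the normal quantile finite
        result.append(normal.inv_cdf(h))
    return result
```

Each accumulated total is mapped to its cumulative probability under the fitted distribution and then to the standard normal quantile, so negative values indicate drier-than-usual conditions. The same transformation, applied to streamflow instead of precipitation, gives the SDI.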

Evaporative Stress Index (ESI)
The ESI describes the water exchange between the land surface and the atmosphere through the ratio of actual to potential evapotranspiration, as given by Eq. (1). Anderson et al. (2011) produced drought maps using the ESI across the United States and found it more applicable than existing drought indices such as the SPI and PDSI. We calculated the ESI from satellite data, as shown in Fig. 3b.

$$\mathrm{ESI} = \frac{\mathrm{AET}}{\mathrm{PET}} \qquad (1)$$

where AET and PET are the actual and potential evapotranspiration, respectively.
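Eq. (1) is a simple ratio. Because the Results section reports ESI values between -2 and +2, the ratio was presumably standardized before use; the sketch below assumes a plain z-score against each subbasin's own record, which is an illustration rather than the procedure stated in this study. The function names are hypothetical.

```python
from statistics import mean, pstdev


def esi_ratio(aet: list[float], pet: list[float]) -> list[float]:
    """Eq. (1): ratio of actual to potential evapotranspiration at each time step."""
    if len(aet) != len(pet):
        raise ValueError("AET and PET series must have the same length")
    if any(p <= 0.0 for p in pet):
        raise ValueError("PET must be positive")
    return [a / p for a, p in zip(aet, pet)]


def standardize(values: list[float]) -> list[float]:
    """Assumed step: z-score anomalies of the ratio relative to the series' own record."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]
```

Low ratios mean that evapotranspiration falls short of atmospheric demand, so negative standardized anomalies point toward evaporative stress.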

Vegetation Health Index (VHI)
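As an agricultural drought index, the VHI has been used to assess drought conditions in agricultural areas of Africa (Rojas et al. 2011). The VHI combines the Vegetation Condition Index (VCI) and the Temperature Condition Index (TCI), as given in Eqs. (2)-(4):

$$\mathrm{VCI} = 100 \times \frac{\mathrm{NDVI} - \mathrm{NDVI}_{\min}}{\mathrm{NDVI}_{\max} - \mathrm{NDVI}_{\min}} \qquad (2)$$

$$\mathrm{TCI} = 100 \times \frac{\mathrm{LST}_{\max} - \mathrm{LST}}{\mathrm{LST}_{\max} - \mathrm{LST}_{\min}} \qquad (3)$$

$$\mathrm{VHI} = \alpha\,\mathrm{VCI} + (1 - \alpha)\,\mathrm{TCI} \qquad (4)$$

where NDVI is the normalized difference vegetation index, $\mathrm{NDVI}_{\min}$ and $\mathrm{NDVI}_{\max}$ are its minimum and maximum, LST is the land surface temperature, $\mathrm{LST}_{\min}$ and $\mathrm{LST}_{\max}$ are its minimum and maximum, respectively, and $\alpha$ is a weighting factor (commonly 0.5). In this study, the SPI, ESI, and VHI were calculated for each subbasin to compare and evaluate the onset and affected areas of drought. Their values were then reclassified into five states (state 1, severe wet; state 2, moderate wet; state 3, normal; state 4, moderate drought; state 5, severe drought) for further comparison with the DNBC-DI.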
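The VCI, TCI, and VHI are commonly defined by min-max scaling of NDVI and LST, and the sketch below computes them for one subbasin. It assumes that the extremes are taken over the subbasin's own record and that the VCI and TCI receive equal weights (`alpha = 0.5`); neither choice is stated in this study, and the function names are hypothetical.

```python
def vci(ndvi: float, ndvi_min: float, ndvi_max: float) -> float:
    """Vegetation Condition Index on a 0-100 scale; greener than usual gives higher values."""
    return 100.0 * (ndvi - ndvi_min) / (ndvi_max - ndvi_min)


def tci(lst: float, lst_min: float, lst_max: float) -> float:
    """Temperature Condition Index on a 0-100 scale; hotter surfaces give lower values."""
    return 100.0 * (lst_max - lst) / (lst_max - lst_min)


def vhi_series(ndvi: list[float], lst: list[float], alpha: float = 0.5) -> list[float]:
    """Weighted VCI/TCI combination, with extremes taken from the series' own record."""
    n_lo, n_hi = min(ndvi), max(ndvi)
    t_lo, t_hi = min(lst), max(lst)
    return [
        alpha * vci(n, n_lo, n_hi) + (1.0 - alpha) * tci(t, t_lo, t_hi)
        for n, t in zip(ndvi, lst)
    ]
```

Low VHI values therefore combine sparse vegetation with high surface temperature, the signature of vegetation stress.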

Streamflow Drought Index (SDI)
The SDI is calculated in the same way as the SPI, except that streamflow is used instead of precipitation. As in Tabari et al. (2013) and Won and Chung (2016), the SDI was used in this study to assess hydrological drought.

DYNAMIC NAIVE BAYESIAN CLASSIFIER (DNBC)
The DNBC is a simple probabilistic classifier based on Bayes' theorem, with the strong (naive) assumption that the attributes are conditionally independent given the hidden state. The model observes a sequence $A = \{A_t \mid t = 1, \dots, T\}$, where each $A_t = \{A_t^n \mid 1 \le n \le N\}$ is a set of $N$ attribute values generated by the dynamic process in state $S_t \in \{1, \dots, m\}$ (Chen et al. 2018). In this study, $A_t^n$ is a specific attribute, i.e., an individual drought index, and $S_t$ is the drought state, with its associated severity, at time $t$. The joint likelihood of the observed attributes and latent states in a DNBC is given by Eq. (5):

$$P(A_{1:T}, S_{1:T}) = P(S_1) \prod_{t=1}^{T-1} P(S_{t+1} \mid S_t) \prod_{t=1}^{T} \prod_{n=1}^{N} P(A_t^n \mid S_t) \qquad (5)$$

where $P(S_1)$ is the initial probability distribution of the hidden state at $t = 1$, $P(S_{t+1} \mid S_t)$ is the transition probability from state $S_t$ to state $S_{t+1}$, and the factorization $P(A_t \mid S_t) = \prod_{n=1}^{N} P(A_t^n \mid S_t)$ follows from the naive assumption that the attributes are conditionally independent given the class. In addition, the DNBC rests on two main assumptions.
(1) The dynamic process of $S_t$ has the first-order Markov property, i.e., the next state depends only on the current state.
(2) The dynamic process is stationary, i.e., the transition probabilities do not depend on time.

We estimated the DNBC parameters using the R package 'depmixS4' (Visser and Speekenbrink 2010), which is based on the expectation-maximization (EM) algorithm and iteratively maximizes the expected joint log-likelihood of the parameters given the attribute observations and states. The complete parameter set of the DNBC is $\theta = (\theta_1, \theta_2, \theta_3)$, whose three vectors contain the parameters of the initial, transition, and emission distributions, respectively. The joint log-likelihood can therefore be written as Eq. (6):

$$\log P(A_{1:T}, S_{1:T} \mid \theta) = \log P(S_1 \mid \theta_1) + \sum_{t=1}^{T-1} \log P(S_{t+1} \mid S_t, \theta_2) + \sum_{t=1}^{T} \sum_{n=1}^{N} \log P(A_t^n \mid S_t, \theta_3) \qquad (6)$$

In this study, the previously calculated SPI, ESI, and VHI were adopted as the three input attributes of the DNBC for drought assessment. Assuming that the input attributes are conditionally independent of each other given the state, we chose a Gaussian distribution for the emission distribution of each attribute, as given by Eq. (7):

$$P(A_t^n \mid S_t = i) = \frac{1}{\sqrt{2\pi (\sigma_i^n)^2}} \exp\left(-\frac{(A_t^n - \mu_i^n)^2}{2 (\sigma_i^n)^2}\right) \qquad (7)$$

where $\mu_i^n$ and $(\sigma_i^n)^2$ are the mean and variance of the Gaussian emission distribution for the $i$th latent state and the $n$th observed variable. Chen et al. (2018) selected the Gaussian distribution because it is easy to compute and can account for the complex processes underlying drought-related indicators. With the estimated optimal DNBC parameters, the most probable path of latent drought states, which maximizes $P(S_{1:T} \mid A_{1:T})$, together with the probability of each state at every time step, can be obtained using the Viterbi algorithm (Chen et al. 2018).
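The following NumPy sketch illustrates Eqs. (5)-(7) and Viterbi decoding for a DNBC whose parameters are already known. It does not reproduce the depmixS4 EM fitting used in this study; the function names and the toy two-state parameters used to exercise it are hypothetical, and states are numbered from 0 rather than 1.

```python
import numpy as np


def log_emission(A: np.ndarray, means: np.ndarray, variances: np.ndarray) -> np.ndarray:
    """log P(A_t | S_t = i) for every t and i under Gaussian emissions (Eq. (7)).

    A has shape (T, N); means and variances have shape (m, N).
    Summing the per-attribute log-densities over n is the naive
    conditional independence assumption. Returns an array of shape (T, m).
    """
    diff = A[:, None, :] - means[None, :, :]  # (T, m, N)
    log_pdf = -0.5 * (np.log(2.0 * np.pi * variances)[None] + diff**2 / variances[None])
    return log_pdf.sum(axis=2)


def log_joint(states, A, init, trans, means, variances) -> float:
    """Joint log-likelihood of attributes and a given state path (Eq. (6))."""
    states = np.asarray(states)
    emis = log_emission(A, means, variances)
    T = len(states)
    total = np.log(init[states[0]])                          # initial term
    total += np.log(trans[states[:-1], states[1:]]).sum()    # transition terms
    total += emis[np.arange(T), states].sum()                # emission terms
    return float(total)


def viterbi(A, init, trans, means, variances) -> list[int]:
    """Most probable state path, i.e. the path maximizing P(S_{1:T} | A_{1:T})."""
    emis = log_emission(A, means, variances)
    log_trans = np.log(trans)
    T, m = emis.shape
    delta = np.log(init) + emis[0]        # best log-score of paths ending in each state
    back = np.zeros((T, m), dtype=int)    # best predecessor, for backtracking
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # scores[j, k]: move from state j to k
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + emis[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

For a five-state model with three drought indices as attributes, `init` would have shape (5,), `trans` shape (5, 5), and `means` and `variances` shape (5, 3). Working in log space keeps the long products in Eq. (5) from underflowing over many months.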

RESULTS
As introduced in the previous sections, three drought indices were calculated from monthly data for the 14 years from January 2001 to December 2014: the SPI for meteorological drought, the ESI for hydrological drought, and the VHI for agricultural drought. The SPI and ESI ranged between -2 and +2, whereas the VHI ranged from 0 to 100; we therefore rescaled the VHI to lie between -2 and +2.
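The study does not report how the VHI was converted or which class boundaries define the five drought states. The sketch below therefore assumes a linear mapping of [0, 100] onto [-2, +2] and hypothetical symmetric thresholds at ±0.5 and ±1.5; both the thresholds and the function names are illustrative only.

```python
def rescale_vhi(vhi: float) -> float:
    """Assumed linear mapping of VHI from [0, 100] onto [-2, +2]."""
    return (vhi - 50.0) / 25.0


# Hypothetical class boundaries; the study does not state its thresholds.
THRESHOLDS = (-1.5, -0.5, 0.5, 1.5)


def drought_state(value: float) -> int:
    """Map a value on the [-2, +2] scale to state 1 (severe wet) ... 5 (severe drought)."""
    lo_severe, lo_moderate, hi_moderate, hi_severe = THRESHOLDS
    if value <= lo_severe:
        return 5  # severe drought
    if value <= lo_moderate:
        return 4  # moderate drought
    if value < hi_moderate:
        return 3  # normal
    if value < hi_severe:
        return 2  # moderate wet
    return 1      # severe wet
```

Putting all three indices on a common scale before classification means that the same thresholds can be applied to the SPI, ESI, and rescaled VHI.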
In this study, we developed a DNBC-based drought index (DNBC-DI) that combines the SPI, ESI, and VHI, and used it to identify drought severity in the 113 subbasins. For example, Fig. 4 shows the DNBC-DI, SPI, ESI, and VHI for the Han River basin with their drought states. Severe droughts occurred in the Han River basin in 2001, 2008-2009, and 2012. As shown in Fig. 4, the SPI, ESI, and VHI did not clearly represent these drought events, whereas the DNBC-DI identified most of them.
Although comparison with spatio-temporal information on actual drought-damaged areas would have been the most reasonable validation, it could not be applied directly because information on drought damage is limited. Hydrological droughts can have widespread impacts by reducing or eliminating water supplies, limiting irrigation water, causing crop failures, and affecting a range of economic and social activities (Mishra and Singh 2010). We therefore chose hydrological drought, as the type closest to actual drought, as the verification target, and the SDI was selected […] basin. According to Chen et al. (2018) […]. A nationwide drought assessment is necessary for developing a comprehensive drought plan, so this study extended the results of Chen et al. (2018) nationwide and confirmed their applicability to drought evaluation. The ability of the drought indices to identify drought occurrence was quantified using the proportion correct (PC), given by Eq. (8):

$$\mathrm{PC} = \frac{a + d}{a + b + c + d} \qquad (8)$$

where a, b, c, and d are the numbers of hits, false alarms, misses, and correct rejections, respectively, as shown in Table 1. To evaluate accuracy, the PC of the DNBC-DI, SPI, ESI, and VHI was calculated against the SDI; the results are shown in Fig. 6, and Table 2 summarizes the corresponding average, maximum, and minimum PC. For the SPI, the average PC was 0.3990, the maximum 0.4630, and the minimum 0.3274. For the ESI, the average PC was 0.3980, the maximum 0.4702, and the minimum 0.3036. For the VHI, the average PC was 0.4091, the maximum 0.4702, and the minimum 0.3274. The DNBC-DI represents drought severity with five states that reflect several drought perspectives; compared with the SDI, its average PC was 0.5959, the maximum 0.8155, and the minimum 0.3869. Even the maximum PC of the SPI, ESI, and VHI was lower than the average PC of the DNBC-DI, so the DNBC-DI determined drought onset and severity more accurately than the univariate drought indices. As shown in Fig. 6, the DNBC-DI achieved higher accuracy across the country.
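The proportion correct can be computed from two state series as sketched below. The study does not say how five-state series were reduced to the hit/miss table, so this sketch assumes that states 4 and 5 (moderate and severe drought) count as "drought" and all other states as "no drought"; the function names are hypothetical.

```python
def contingency(predicted: list[int], observed: list[int], drought_states=(4, 5)):
    """Count hits (a), false alarms (b), misses (c), and correct rejections (d)."""
    a = b = c = d = 0
    for p, o in zip(predicted, observed):
        pred_dry, obs_dry = p in drought_states, o in drought_states
        if pred_dry and obs_dry:
            a += 1  # hit: drought indicated and observed
        elif pred_dry:
            b += 1  # false alarm: drought indicated but not observed
        elif obs_dry:
            c += 1  # miss: drought observed but not indicated
        else:
            d += 1  # correct rejection: no drought in either series
    return a, b, c, d


def proportion_correct(predicted, observed, drought_states=(4, 5)) -> float:
    """Eq. (8): PC = (a + d) / (a + b + c + d)."""
    a, b, c, d = contingency(predicted, observed, drought_states)
    return (a + d) / (a + b + c + d)
```

Here `predicted` would hold the monthly states of a candidate index (DNBC-DI, SPI, ESI, or VHI) and `observed` the states of the SDI for the same subbasin.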

CONCLUSION
Climate change has increased interest in drought assessment, and comprehensive drought planning is becoming necessary. Because a univariate drought index is not sufficient for this purpose, we developed the DNBC-DI, a multivariate drought index that reflects various drought effects. The DNBC-DI also supports probabilistic drought assessment, and it was evaluated against actual drought occurrences.
Drought can be assessed from various perspectives. As shown in Fig. 4, the SPI and ESI had similar patterns during the historical drought periods, whereas the VHI behaved differently. Because drought is classified as meteorological, hydrological, or agricultural and may appear differently depending on the point of view, establishing drought plans would otherwise require several drought indices to be used separately. However, this can lead to bias and confusion in decision-making. Instead, a comprehensive drought plan should be established that considers not only a specific type of drought but also its various causes. Thorough drought assessment is essential because the causes of drought vary and its consequences are extensive.
In this study, drought states assessed with the DNBC-DI determined the occurrence, extent, and termination of droughts more accurately than the commonly used SPI, ESI, and VHI. For the actual droughts of March to June 2001, September 2008 to May 2009, and January to June 2012 in the Han River basin, the SPI, ESI, and VHI did not clearly identify the drought events, whereas the DNBC-DI indicated overall dry conditions. The DNBC-DI classifies drought into five states and can be useful for drought planning, for example as a reference when establishing a step-by-step drought plan.