Statistical Tests for Pre-earthquake Ionospheric Anomaly

1 Institute of Statistics, National Central University, Chung-Li, Taiwan, ROC 2 Institute of Space Science, National Central University, Chung-Li, Taiwan, ROC 3 Center for Remote Sensing and Space Researches, National Central University, Chung-Li, Taiwan, ROC 4 Institute of Geophysics, National Central University, Chung-Li, Taiwan, ROC * Corresponding author address: Dr. Yuh-Ing Chen, Institute of Statistics, National Central University, Chung-Li, Taiwan, ROC; E-mail: ychen@stat.ncu.edu.tw The anomalous depression of the maximum plasma frequency in the ionosphere (foF2) appears significantly within 1-5 days before the M ≥ 5.0 earthquakes in the Taiwan area during 1994 1999 (Liu et al. 2003). In this paper, we propose two statistical tests against the foF2 anomaly as a candidate for a precursor of earthquakes based on criteria including the success rate, alarm rate, probability gain, and R score. One statistical test is conducted to investigate the significance of the observed foF2 anomalies related to the recorded M ≥ 5.0 earthquakes in the Taiwan area during 19941999. The other statistical test is designed to compare the foF2 anomalybased method with competitive alternatives for predicting the earthquakes under study. The involved alternatives are a naive prediction based on a coin-tossing experiment and a simple prediction method constructed from the current M ≥ 5.0 earthquakes catalog during 1994 1999. The simulation results indicate that, contrast to possible foF2 anomalies, the observed foF2 anomalies are significantly earthquake-related. Moreover, comparing with the alternative predictions under study, the foF2 anomaly remains valid for temporal alarming of the M ≥ 5.0 earthquakes in the Taiwan area during 1994 1999. (


INTRODUCTION
The question that is usually raised after a major and destructive earthquake is whether the catastrophe was predictable.In fact, many papers have discussed the possible signals of different phenomena including electrical, electromagnetic, and luminosity that either accompany or are followed by earthquakes (Fujinawa and Takahashi 1990;Hayakawa 1999;Enomoto et al. 1997).However, due to limited events, it is quite difficult to statistically prove the observed electromagnetic phenomena to be earthquake related.
After the occurrence of Chi-Chi earthquake (M = 7.3, 1999/9/21), the scientists in Taiwan, as usual, looked back at their records to try to discover some possible precursory, geophysical signals.Liu et al. (2000) reported that the maximum plasma frequency in the ionosphere, foF2, decreased dramatically in the afternoons of 1 -6 day before the 14 M ≥ 6.0 earthquakes, which occurred within 400 km of the Chung-Li ionospheric station (25°N, 121°E) in Taiwan during 1994 -1999.Liu et al. (2003) further reported that there are 307 foF2 anomalies before 184 M ≥ 5.0 earthquakes, which occurred on 170 days within 500 km of the ionospheric station during 1994 -1999 (Fig. 1).Moreover, the results showed a significant 1 -5 days lead time of the foF2 anomaly to the earthquakes under study.In this paper, to verify that the foF2 anomaly serves as a candidate for a reliable precursor (Wyss 1991), it remains a problem of whether or not the successful outlook in time for the associated earthquakes based on the foF2 anomaly is simply due to random chance.In the debate on evaluation of the VAN method (Varotsos et al. 1981) lead by Geller (1996), the statistical significance of the method for earthquake prediction has been emphasized, in which the set up of a reasonable null hypothesis is of major concern.Contrast of the VAN predictions to the historical or simulated earthquake catalog was suggested by, for example, Aceves et al. (1996), Kagan (1996), Mulargia and Gasperini (1996) and Wyss and Allmann (1996).Aceves et al. (1996) supported the VAN method, but wrongly assumed the uniform distribution for the occurrence of Greece's earthquakes in time.Mulargia and Gasperini (1996) as well as Wyss and Allmann (1996) proposed some statistical tests based on the Poisson distribution of the earthquakes in time which resulted in outcomes against the VAN method.However, the assumption of a Poisson distribution for the occurrence of moderate earthquakes with magnitudes between 5 and 7, which are frequently clustered in time and space, may not be appropriate.Therefore, Kagan (1996) pointed that the non-Poisson behavior of earthquake sequences needed to be properly accounted for in such a significance test.
On the other hand, since the earthquake times and places are difficult to simulate adequately, Stark (1996) suggested comparing the success rate of the VAN method with that of some reasonable random prediction methods.Stark (1997) then proposed two random prediction algorithms based on a coin-tossing experiment and a Gamma renewal process (Daley and Vere-Jones 2003), respectively, for testing against the VAN method.Following the spirit of Stark (1996Stark ( , 1997)), Musson (1997) further suggested a worthless prediction algorithm by taking into account the background seismic rate.The 90 -percentile of the success rates generated from the worthless prediction then provides the reference success rate for any other candidate predictions to check for their significance.Unfortunately, the worthless predictions proposed in Musson (1997) are generated from, again, the inappropriate Poisson distribution.
In this paper, we treat the 307 foF2 anomalies as 1 -5 day alarms for 170 M ≥ 5.0 earthquake days during 1994 -1999.We employ some reasonable criteria, in Section 2, to evaluate the efficiency of the foF2 anomaly in raising the alarm for the earthquakes under study (Kagan 1997;Aki 1981).We further construct two simulation tests, in Section 3, against the pre-earthquake foF2 anomaly with respective to the criteria.The first one is to check, among all possible anomalous foF2 sequences, if the observed foF2 anomalies are significantly related to the earthquakes under study.The second one is to compare the foF2 anomalybased method with alternatives for raising the alarm prior to M ≥ 5.0 earthquakes under study.
In Section 4, we draw some conclusions on the basis of the simulation results together with a discussion for possible future research.

EFFICIENCY CRITERIA
To issue an alarm for earthquakes is, in fact, a decision-making process.To evaluate the reliability and validity of any earthquake alarming or prediction, Kagan (1997) considered the alarm rate (proportion of earthquake days preceded by an alarm, denoted by AR) and the success rate (proportion of alarms successfully followed by earthquakes, referred as SR), respectively.However, a foF2 anomaly is often followed over certain days by several earthquakes owing to seismic clustering.On the other hand, an earthquake is often preceded in several days by some foF2 anomalies since the occurrences of foF2 anomalies are also clustered in time.Therefore, to investigate the efficiency of the foF2 anomaly in raising the alarm for impending earthquakes, we prefer to use probability gain, denoted by PG, which is the ratio of earthquake occurrence rate during the alarm days to the background earthquake occurrence rate (Aki 1981).In fact, the probability gain reflects how often the earthquakes occur during the alarm-time period compared to background earthquakes.
Denote the numbers of earthquake days with and without alarms by a and b, respectively.Moreover, let c and d be the numbers of non-earthquake days with and without any alarm, respectively.Then, we have a 2 2 × contingency table (Table 1) and the probability gain is given by Notice that PG can be written as [a/(a+b)]/[(a+c)/(a+b+c+d)].Therefore, PG is also the ratio of alarming rate during earthquake days to the background alarming rate.
However, the probability gain simply focuses on the correct decision for raising the alarm Table 1.The classification of study periods according to the earthquake and alarm days.
of associated events, it neglects the wrong decision of raising the alarm for non-earthquake days or false alarms.Therefore, to credit a correct alarming and penalize a false alarm, we compute the Hanssen-Kuiper skill score, or R score as The R score can be interpreted as the proportion of correct alarms less than the proportion of false alarms with regards to earthquakes under study.Notice that the R score preserves an equal weight for both the correct and wrong decisions.In other words, the R score is a reasonable criterion if loss due to a wrong decision and benefit from a correct decision are the same.Moreover, the value of R score is between -1 and +1, where a positive R score indicates a better chance for making correct decisions, while a negative R score provides evidence against the decision-making procedure.In fact, the R score also takes into account the correct decision of not raising an alarm for non-earthquake days by d/(c+d), as well as the wrong decision of not raising the alarm for earthquake days through b/(a+b).
The four criteria stated above for raising the alarm before the 170 earthquake days based on 307 observed foF2 anomalies are presented in Fig. 2. We learned from Figs. 2a, b that, for example, about 73% (81%) of M ≥ 5.0 (M ≥ 6.0) earthquakes experienced the foF2 anomalies within 5 days.Moreover, about 51% (9%) of the foF2 anomalies successfully alerted the following M ≥ 5.0 (M ≥ 6.0) earthquakes in 5 days.It can be seen that larger earthquakes tend to have a higher chance of experiencing the foF2 anomaly.However, the scarcity of M ≥ 6.0 earthquakes in the background leads to less likely of the successful of pre-earthquake foF2 anomaly.Therefore, probability gain, adjusted for the background seismic rate, provides a more reasonable criterion than the success rate for evaluating the efficiency of earthquake alarming.The probability gain of the foF2 anomaly shown in Fig. 2c reaches its maximum at 3.75 for alerting the M ≥ 6.0 earthquakes in 1 day.This means that, for instance, the M ≥ 6.0 earthquakes occur 3.75 times more frequently on the days one day after an foF2 anomaly than in the six years during 1994-1999.Finally, in Fig. 2d, the positive R scores imply that the decision to raising an alarm for the earthquakes under study based on the foF2 anomaly is more likely to be a correct decision than not.In addition, a 3-day alarming for M ≥ 6.0 earthquakes based on the foF2 anomaly has the largest R score at 0.43.

STATISTICAL TESTA
A statistical test is usually conducted to search for significant evidence in favor of the alternative hypothesis ( H A ) to the null hypothesis (H 0 ).In general, the burden of proof is on the alternative hypothesis.If the observations provide significant evidence for supporting the alternative hypothesis, then the null hypothesis is rejected and the conclusion is reached according to the alternative hypothesis.If, however, the observations do not show significant evidence in favor of the alternative hypothesis, the null hypothesis remains.Notice that the p-value of the test; namely, the probability that the observed test statistic leads to rejection of H 0 when H 0 is true is usually computed.We then reject H 0 for a small p-value ( ≤ 0.05, for example).
To demonstrate the significance of the observed foF2 anomalies related to the earthquakes under study, we perform a simulation-based statistical test.In this setting, we test for H 0 : The pre-earthquake foF2 anomaly is randomly observed versus H A : The pre-earthquake foF2 anomaly is not randomly observed.
If H 0 is true, then the observed foF2 anomalies can be regarded as anomalies in a possible sequence of the background foF2 anomalies that raise the alarm for the recorded earthquakes.However, if the background foF2 anomalies only show a small chance of being related to the recorded earthquakes then we reach the conclusion in favor of H A that the observed foF2 anomaly is not random and is a pre-earthquake event.
We investigate the efficiency of any possible sequence of foF2 anomalies for raising an earthquake alarm for those earthquakes under study via the criteria in Section 2. In addition, for generating possible occurrence of the foF2 anomalies during 1994 -1999, the inter-foF2 anomaly time was fitted by a Gamma distribution with scale parameter 10.3 and shape parameter 0.55 (Fig. 3) owing to the clustered behavior of the foF2 anomalies.Notice that the shape parameter of 0.55 indicates heavily clustered foF2 anomalies and the Gamma distribution gives, on the average, 307 foF2 anomalies during 1994 -1999.Moreover, a chi-squares goodness-of-fit test gives a large p-value of 0.686, which shows that the Gamma distribution provides with a reasonable fit to the observed inter-foF2 anomaly times.Based on the Gamma (0.55, 10.3) distribution, we generated the possible inter-foF2 anomaly times and, hence, simulated 10000 sequences of the foF2 anomalies.We then computed, for each sequence of such anomalies, the associated AR, SR, PG and R values for raising the alarm of impending M ≥ 5.0 earthquakes during 1994 -1999.The p-value of the test, namely, the proportion of the simulated sequences with a criterion value larger than the corresponding observed one in Fig. 2 is finally shown in Table 2. Notice that the small p-values (≤ 0.0052) lead to the rejection of the null hypothesis.Therefore, we conclude that the observed foF2 anomalies during 1994 -1999 are significantly related to the impending M ≥ 5.0 or M ≥ 6.0 earthquakes in 1-5 days.
To contrast the foF2 anomaly-based method to some alternative methods for raising the alarm to the M ≥ 5.0 earthquakes under study, we consider herein two random prediction Table 2.The p-value of the statistical test for the randomness of the observed pre-earthquake foF2 anomalies in the Taiwan area during 1994 -1999.
methods as competitors to the foF2 anomaly-based method.The first one, called the naive prediction, is based on a coin-tossing experiment with probability of head 0.1, which is close to the background seismic rate of 7.8% for the M ≥ 5.0 earthquakes under study.Such a coin is tossed everyday for predicting M ≥ 5.0 earthquakes during 1994 -1999.The second one, referred to simple prediction, is constructed from a Gamma renewal process, which is appropriate for modeling clustered earthquakes.For retrospectively testing against the pre-earthquake foF2 anomaly, we used the Gamma distribution with scale parameter 17.9 and shape parameter 0.75 for describing the inter-earthquake time of the M ≥ 5.0 earthquakes during 1994 -1999 (Fig. 4).Notice that the p-value of a chi-squares goodness-of-fit test as 0.743 shows that the Gamma distribution provides with a reasonable fit to the inter-earthquake time of the M ≥ 5.0 earthquakes during 1994 -1999.The shape parameter as 0.75 further indicates that the M ≥ 5.0 earthquakes in the Taiwan area during 1994 -1999 preserve slightly the clustering phenomenon.
Repeating the coin-tossing experiment, we obtain 10000 sequences of naive predictions for the M ≥ 5.0 earthquakes in the Taiwan area during 1994 -1999.On the other hand, numer- ous Gamma (0.75, 17.9) random variables were simulated to generate 10000 sequences of occurrence times of earthquakes; each is then regarded as predictions for the recorded M ≥ 5.0 earthquakes during 1994 -1999.We then calculated, for each sequence, the values of AR, SR, PG and R score for predicting the recorded earthquakes during 1994-1999.We further computed and reported, in Table 3, the 90 percentiles of the respective criteria values for the two alternative prediction methods.In the spirit of Musson (1997), these percentiles provide reference values for validating candidate prediction of earthquakes in the Taiwan area.For instance, if a prediction method under investigation only reaches a smaller criterion value than the associated 90 percentile, then the alternative prediction method has at least 10% chance to be more efficient than the candidate and, hence, the candidate prediction method can be ditched.Notice that the efficiency criteria in Fig. 2 are all larger than the corresponding 90 percentiles in Table 3.Therefore, we withhold the foF2 anomaly for alarming the recorded M ≥ 5.0 earthquakes during 1994 -1999.
According to the 90 percentile-efficiency rule stated above, we can construct a test against the observed foF2 anomalies for raising the alarm to the earthquakes under study.Let P be the probability that the alternative prediction method is more efficient than the observed foF2 anomalies for in raising the alarm to the earthquakes.Then, we have: H 0 : P ≤ 0.10 versus H A : P > 0.10.
In other words, the foF2 anomaly-based method is claimed to be worse than the alternative method if the alternative method has at least 10% chance being better than the observed foF2 anomalies for alarming the earthquakes.
For each of the two alternative prediction methods, we estimated the P value, denoted by P, the proportion of the simulated 10000 sequences of alternative prediction showing larger criteria values than the corresponding one in Fig. 2 (Table 4).Notice that there are at most 129 out of 10000 naive prediction sequences better than the foF2 anomaly for alarming within 4 days the M ≥ 5.0 earthquakes under study with respective to the success rate.Moreover, in terms of the four criteria for the simple predictions, the P values are all smaller than 0.0157.In fact, the approximated p-value of a Z-test for H 0 versus H A based on 100000 replications is given by 1-Φ [( P-0.10)/0.003],where Φ (.) is the cumulative distribution function of a Table 3.The 90 percentiles of the efficiency criteria on the basis of the alternative methods for predicting M ≥ 5.0 earthquakes in the Taiwan area during 1994-1999.
These results further demonstrate that the pre-earthquake foF2 anomaly remains to be a valid candidate as a short-term precursor of M ≥ 5.0 earthquakes in the Taiwan area.

DISCUSSIONS AND CONCLUSIONS
We learn from the debate on earthquake prediction hosted by Nature magazine in 1999 that the most difficult part about predicting an impending earthquake on the timescale of months to weeks is to identify reliable, unambiguous precursors (Main 1999).The ionospheric anomaly is one of several possible precursors of Taiwan's earthquakes under investigation by Tsai et al. (2003).According to the simulation-based tests together with reasonable criteria under study, we show that 1 -5 day alarms before M ≥ 5.0 earthquakes in the Taiwan area during 1994 - 1999 on the basis of the foF2 anomalies reported in Liu et al. (2003) is not due to random chance.Moreover, in comparison with the naive or simple method, the observed foF2 anomalies remain valid for alarming the earthquakes under study.Therefore, this work verifies that the foF2 anomaly serves as a candidate precursor for short-term prediction of earthquake occurrence in time.In addition, the observed foF2 anomalies preserve a significant ability for alarming the earthquakes under study, we may regard the observed criteria values shown in Fig. 2 as the best that we can have for using the foF2 anomalies as alarms of M ≥ 5.0 earth- quakes during 1994 -1999.
Many authors proposed models for describing the occurrence of earthquakes in time, for instance, the ETAS-model suggested by Ogata (1988).These models may provide reasonable prediction of earthquakes, however, it is difficult to generate the appropriate syntheses earthquake catalogs as predictions from the models.In addition, block bootstrapping from the observed foF2 anomalies or M ≥ 5.0 earthquakes in the Taiwan area during 1994 -1999 may preserve the relevant clustered structure.Yet, the optimal block size remains undecided.This Table 4.The estimated P value of the statistical test against the observed foF2 anomalies for alarming M ≥ 5.0 earthquakes in the Taiwan area dur- ing 1994 -1999.
is the reason why, in this paper, we use the Gamma renewal process to mimic the clustered earthquakes in the Taiwan area, although the Gamma renewal process inherits the assumption of independent occurrence of earthquakes.Nevertheless, in the future, if there is any reasonable model for describing the earthquake occurrence and the synthesis earthquake catalog is available or there is any better alternative prediction method, then we should have a new statistical test against the pre-earthquake foF2 anomaly.
Finally, we notice that many anomalous observations have been identified retrospectively and nominated as precursors.However, a proper monitoring of the precursors together with a complete record of earthquakes in the same place for a long time are needed for establishing a precursory connection to earthquakes which may lead to skills in prediction (Jackson 1999).In fact, the ionospheric anomaly learned from the recorded foF2 data during 1994 -1999 shows that the occurrence of the foF2 anomaly depends on the magnitude of the related earthquake and the distance from the epicenter to the ionospheric station (Liu et al. 2003).However, to have a probabilistic earthquake prediction, we need to build an appropriate statistical model for describing the relationship between well-defined precursory strength and the magnitude, occurrence time lag, and the epicenter of the related earthquake.All of these would be the valuable future statistical work on the seismo-ionospheric precursor.

Fig. 4 .
Fig. 4. The cumulative distribution function of the inter-earthquake time.The empirical ( o ) and fitted ( − ) cumulative distribution functions for the 170 M ≥ 5.0 Taiwan's earthquake days during 1994 -1999.