An Adaptive Neuro-Fuzzy Inference System for Sea Level Prediction Considering Tide-Generating Forces and Oceanic Thermal Expansion

This paper presents an adaptive neuro-fuzzy inference system for predicting sea level that considers tide-generating forces and oceanic thermal expansion, assuming a model in which sea level depends on sea surface temperature. The proposed model, named TGFT-FN (Tide-Generating Forces considering sea surface Temperature and Fuzzy Neuro-network system), is applied to predict tides at five tide gauge sites located in Taiwan and has a root mean square error of about 7.3 - 15.0 cm. The TGFT-FN model predicts sea level better than the previous TGF-NN model developed by Chang and Lin (2006), which considers the tide-generating forces only. The TGFT-FN model is trained on and predicts the sea level at Hua-Lien station, and is also appropriate for the same prediction at the tide gauge sites next to Hua-Lien station.


Introduction
Astronomical tides, which are generated by the gravitational forces of the moon and the sun, and the centrifugal force due to the rotation of the earth, generally account for 75 - 80% of water level variability in the open ocean (Molines et al. 1994). Sea level measured by tide gauges in coastal areas and in enclosed or semi-enclosed basins may differ from the astronomical constituents because of meteorological forcing, tidal interactions, and river outflow. For example, tides account for only about 50% of water level variability at Pensacola, Florida (Zetler et al. 1979), and about 40% at Baltimore, Maryland (Frison et al. 1999). Because of the magnitude of astronomical forcing, analysis of water levels has traditionally emphasized linear methods to decompose water levels into "tides" and other components. The amplitudes and phases of the tidal constituents are then determined on the basis of known periods that are driven by the astronomical motions of the earth, moon, and sun. However, measured water levels in coastal and estuarine areas may differ significantly from the astronomical constituents due to nonlinear effects that include meteorological forcing, tidal interactions, and river outflow. Tide analysis and classification techniques in common use are least squares analysis (e.g., Godin 1972), response analysis (e.g., Munk and Cartwright 1966; Cartwright et al. 1969), Fourier analysis (e.g., Godin 1972) and classification using ratios of tidal constituents (e.g., Defant 1961). Common to all of these methods is the assumption that measured water levels are a simple superposition of astronomical tides and other components. However, even these traditional methods acknowledge the importance of the nonlinear effects by considering the harmonics of the astronomical tidal frequencies (e.g., Schureman 1958; Godin 1972). Modern tidal research, such as by Munk and Cartwright (1966), recognizes the importance of nonlinearities and the role of the inherently nonlinear Navier-Stokes equations in water level dynamics.
Part of the sea level signal is due to varying barometric pressure, temperature, salinity and other factors, but this part is small and usually of less importance except when considering the seasonal variation of mean sea level. Damaging to claims that the sun causes periodicities in temperature, or other aspects of weather, are reported interruptions in solar-weather correlations, most conspicuous during the 1920s. Thermal tides are a variation in atmospheric pressure due to the diurnal differential heating of the atmosphere by the sun (e.g., Volkov and van Aken 2004). Munk and Cartwright (1966) first investigated the importance of thermal tides. Their results show that the annual variation of sea level, which is the predominant signal in sea level data recorded, for example, by tide gauges, is highly related to the thermal radiation of the sun.
The harmonic method developed by Darwin (1907) assumes that the tides can be regarded as a superposition of different harmonics whose frequencies are known from astronomy (Doodson 1921; Desai 1996). The fitted amplitudes and phases can then be used to provide reliable predictions of future tides at the same point. The harmonic method has been widely used due to its remarkably good accuracy. The length of record needed to extract different components depends primarily on the closeness in frequency of the components that are to be extracted and on the lowest frequency of the components chosen. Normally, 369 days of hourly data at a point are needed to extract 20 to 30 constituents with adequate separation of closely spaced constituents using the least squares method.
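As a minimal sketch of the least squares step in harmonic analysis, the following Python example fits a mean level plus one cosine/sine pair per constituent at known periods. The function name `fit_harmonics`, the synthetic record, and the chosen amplitudes and phases are illustrative assumptions; only the M2 and S2 periods are standard values.

```python
import numpy as np

def fit_harmonics(t_hours, eta, periods_hours):
    """Least-squares fit of eta(t) = mean + sum_k A_k cos(w_k t - phi_k)."""
    omega = 2.0 * np.pi / np.asarray(periods_hours, dtype=float)
    cols = [np.ones_like(t_hours)]
    for w in omega:
        cols.append(np.cos(w * t_hours))
        cols.append(np.sin(w * t_hours))
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, eta, rcond=None)
    a, b = coef[1::2], coef[2::2]      # cosine and sine coefficients
    amplitude = np.hypot(a, b)         # A_k
    phase = np.arctan2(b, a)           # phi_k in radians
    return coef[0], amplitude, phase

# Synthetic hourly record of 369 days (all values below are made up).
t = np.arange(369 * 24, dtype=float)
periods = [12.4206012, 12.0]           # M2 and S2 periods in hours
true_amp = [50.0, 20.0]                # cm, hypothetical
true_phase = [0.3, 1.1]                # rad, hypothetical
eta = 5.0 + sum(A * np.cos(2.0 * np.pi / T * t - p)
                for A, T, p in zip(true_amp, periods, true_phase))

mean_level, amp, phase = fit_harmonics(t, eta, periods)
```

With a noise-free record the fit recovers the amplitudes and phases that generated it; with real tide gauge data the residual contains the non-astronomical part of the signal.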
Artificial neural networks (ANN) offer fast computation and considerable memory for solving problems that involve extremely nonlinear interactions and complex effective variables, and they have been applied in particular to tidal prediction (Vaziri 1997; Deo and Chaudhari 1998; Tsai and Lee 1999; Kumar and Minocha 2001; Mandal 2001; Medina 2001; Walton and Garcia 2001; Lee et al. 2002; Lee and Jeng 2002; El-Rabbany and El-Diasty 2003; Lee 2004; Rajasekaran et al. 2006). Chang and Lin (2006) used the tide-generating forces (TGF) of astronomy as inputs in a back-propagation NN model, called the TGF-NN model, to establish a valid relationship between tides and tide-generating forces. The TGF-NN model was found to be as efficient at a single site as the harmonic method. Like the NAO.99b numerical model by Matsumoto et al. (2000), the extended application of the TGF-NN model at tide gauge sites next to the original site of interest accurately simulates multi-point tides.
In the past, the fuzzy inference system (FIS) has been used to predict uncertain systems, and its application does not require knowledge of the underlying physical processes as a precondition. Therefore the FIS has been applied to different subjects, such as reservoir operation (Russel and Camplell 1996; Shrestha et al. 1996; Dubrovin et al. 2002; Ponnambalam et al. 2003; Akyilmaz and Kutterer 2004), and wave study (Kazeminezhad et al. 2005; Őzger and Sen 2007). This paper employs an adaptive-network-based FIS (ANFIS), which combines ANN and FIS, to predict sea level and tides. The ANFIS-based and ANN-based predictions of tides are subsequently analyzed and compared.

The Theory of Tide-Generating Potential
The forces that are of importance in the tides of the oceans are the gravitational forces, $F_a$, of the moon and the sun, and the centrifugal force, $F_c$, due to the movement of the earth in its orbit. These two tide-generating forces are depicted in Fig. 1. Both the attractive force and the centrifugal force can be decomposed into radial and tangential components. Based on the theory of tidal potential, the tidal displacement of the oceanic free surface due to the moon and the sun is then expressed as (Lamb 1932; Chang and Lin 2006):

$$\eta = \frac{G M_m}{g}\left(\frac{1}{\rho_m} - \frac{1}{D_m} - \frac{R\cos\theta_m}{D_m^2}\right) + \frac{G M_s}{g}\left(\frac{1}{\rho_s} - \frac{1}{D_s} - \frac{R\cos\theta_s}{D_s^2}\right) \qquad (1)$$

where $G \approx 6.67 \times 10^{-11}$ N m$^2$ kg$^{-2}$ is the coefficient of universal gravitation; $R \approx 6.371 \times 10^{6}$ m is the mean radius of the earth; $D$ is the distance between the earth's center and the center of the body; $\rho$ is the distance between the point X and the center of an attracting celestial body; $\theta$ is the angle between the directions from the earth's center to the point X and to the body; $M$ is the mass of the body; $g$ is the acceleration due to the gravity of the earth; the subscripts m and s denote the physical quantity corresponding to the moon and the sun, respectively. A difference between the equilibrium tides computed by Eq. (1) and the tides observed near shore is described in numerous papers. The difference between equilibrium tides and realistic tides may result from non-uniform water depth around the ellipse-like earth and nonlinear interactions between tidal components. Such factors are not considered in the linear astronomical tide theory. The difference can be distinguished by an artificial neural network, which has the advantage of auto-learning the relationship between inputs and outputs. In this case the inputs are equilibrium tides and the outputs are observed tides.
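The equilibrium tide of one attracting body can be evaluated numerically as a quick plausibility check. The sketch below assumes the classical form (G M / g)(1/ρ − 1/D − R cos θ / D²) following Lamb (1932), with ρ obtained from the law of cosines; the masses, mean distances, and g = 9.81 m s⁻² are standard reference values that are not taken from the text, and the function names are hypothetical.

```python
import math

G = 6.674e-11        # N m^2 kg^-2, gravitational constant
R = 6.371e6          # m, mean radius of the earth
g = 9.81             # m s^-2, assumed surface gravity
M_MOON = 7.342e22    # kg (reference value, not from the paper)
M_SUN = 1.989e30     # kg (reference value, not from the paper)
D_MOON = 3.844e8     # m, mean earth-moon distance (held fixed here)
D_SUN = 1.496e11     # m, mean earth-sun distance (held fixed here)

def body_term(mass, dist, theta):
    """Equilibrium displacement (m) due to one body at zenith angle theta (rad)."""
    # Distance from the surface point X to the body, by the law of cosines.
    rho = math.sqrt(dist ** 2 + R ** 2 - 2.0 * dist * R * math.cos(theta))
    return G * mass / g * (1.0 / rho - 1.0 / dist - R * math.cos(theta) / dist ** 2)

def equilibrium_tide(theta_m, theta_s, d_m=D_MOON, d_s=D_SUN):
    """Sum of the lunar and solar contributions."""
    return body_term(M_MOON, d_m, theta_m) + body_term(M_SUN, d_s, theta_s)

moon_peak = body_term(M_MOON, D_MOON, 0.0)            # body directly overhead
sun_peak = body_term(M_SUN, D_SUN, 0.0)
moon_trough = body_term(M_MOON, D_MOON, math.pi / 2)  # body on the horizon
```

In practice D and θ vary with time and would come from an ephemeris; holding them at mean values only illustrates the order of magnitude, a few tenths of a metre, which is much smaller than many observed coastal tides.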

Annual Sea Level Signal due to Thermal Expansion
Various factors affect the volume or mass of the ocean, leading to long-term changes in sea level. Sea-level change is the change in sea level due to thermal expansion and salinity contraction (volume), and the addition of mass due to the melting of glaciers and polar ice sheets (mass). In addition to longer period variations (decadal or longer), three general modes of sea level variability are present in the ocean: (1) inter-annual change, which may have an irregular oscillating nature as well as a linear trend; (2) the annual signal, which is a result of ocean-atmosphere interactions at annual frequencies in terms of solar radiation changes, heat fluxes, wind forcing etc.; (3) high-frequency changes (periods less than 1 year) induced by direct wind forcing and current meandering resulting in eddy generation (periods of 10 to 100 days). Another type of oscillation at periods longer than 70 days includes the propagation of Kelvin (eastward) and Rossby (westward) waves across the ocean basins (e.g., Volkov and van Aken 2004).
Tides, atmospheric effects, steric (thermal and salinity) effects, and other oceanographic effects are part of the signals in the sea level measured by long-term tide gauges, such as the Hua-Lien tide gauge on the eastern coast of Taiwan.
Here we ignore the salinity, or halosteric, effect of the ocean, as it is assumed to be small (e.g., Matsumoto et al. 2006). The atmospheric effect, assuming an inverted barometer (IB) response, is not removed from the tide gauge sea level, but is assumed to be small. Finally, the thermosteric effect, or the thermal expansion part of the sea level signal, is modeled using a correlation between sea level and sea surface temperature, an approach also used by Matsumoto et al. (2006) in their ocean bottom pressure analysis of sea level signals.
The variations of sea level and thermal expansion (modeled assuming a dependence on sea surface temperature only) at the Hua-Lien (HL) tide gauge station for the years 2001 and 2002 are compared using a 720-hour moving average, as illustrated in Fig. 2. The 720-hour moving averages of sea level and sea surface temperature (SST) are also used to calculate the power spectrum density (PSD) from four years of data, as illustrated in Fig. 3. Figure 3 shows that the power spectra of the averaged sea level and SST peak at 8192 hours, which is about the annual period. Their correlation coefficient is 0.68, and Fig. 2 shows that the time series of averaged sea level and SST follow a similar trend with a quasi-period of about one year. The correlation coefficients between the 720-hourly averaged tide at Hua-Lien and the SST at Tou-Cheng (TC), Su-Ao (SA), Cheng-Gong (CG), and Lan-Yu (LY), whose positions are shown in Fig. 4, are 0.71, 0.73, 0.55, and 0.60, respectively. These high correlation coefficients indicate a relationship between the 720-hourly averaged sea level and sea surface temperature. Thus the relationship can be established by a neuro-fuzzy method and estimated by the developed model. Tou-Cheng and Su-Ao lie about 106 and 72 km, respectively, north of the Hua-Lien tide gauge. Cheng-Gong and Lan-Yu lie about 102 and 214 km, respectively, south of the Hua-Lien tide gauge. The Lan-Yu station is located on a small island about 78 km off eastern Taiwan. Observed sea level and sea surface temperature at these five stations are applied to predicting the sea level signal in the developed model.
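The smoothing and correlation steps described above can be sketched as follows. The hourly series are synthetic stand-ins (an annual cycle plus a semidiurnal tide and noise), not the Hua-Lien records, and the helper names are hypothetical.

```python
import numpy as np

def moving_average(x, window=720):
    """Boxcar mean over `window` hourly samples (fully overlapping part only)."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

def correlation(a, b):
    """Pearson correlation coefficient of two equally long series."""
    return float(np.corrcoef(a, b)[0, 1])

# Two years of made-up hourly data.
hours = np.arange(2 * 8760)
annual = 2.0 * np.pi * hours / 8760.0
rng = np.random.default_rng(0)
sst = 25.0 + 3.0 * np.cos(annual) + rng.normal(0.0, 0.5, hours.size)     # deg C
sea_level = (10.0 * np.cos(annual - 0.3)                                 # annual part, cm
             + 40.0 * np.cos(2.0 * np.pi * hours / 12.42)                # semidiurnal tide, cm
             + rng.normal(0.0, 5.0, hours.size))

r = correlation(moving_average(sea_level), moving_average(sst))
```

The 720-hour window spans about 58 semidiurnal cycles, so the tidal oscillation averages out almost completely and the correlation reflects the slowly varying part of both series.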

Brief Introduction to ANFIS
A fuzzy inference system is a framework which simulates the behavior of a given system as "IF-THEN" rules through the knowledge of experts or past available data of the system. It maps a set of given input variables to an output variable based on fuzzy logic theory. An FIS using neural networks has the great advantage that it can use the learning capability of neural networks while avoiding the rule-matching time of the inference engine in a traditional fuzzy logic system. Functionally, there are almost no constraints on the node functions of an adaptive network except piecewise differentiability. Structurally, the only limitation on network configuration is that it should be of feedforward type. Due to these minimal restrictions, the adaptive network's applications are immediate and immense in various areas. The learning algorithm tunes the system by adjusting the parameters of the membership functions of the input-output variables. ANFIS combines the advantages of both neural networks (e.g., learning capacities, optimization capacities, and connectionist structures) and FIS (e.g., human-like "IF-THEN" rule thinking and ease of incorporating expert knowledge). A class of adaptive networks is briefly introduced as follows (see Jang 1993). A Sugeno FIS defines the consequent variable of each rule as a linear combination of the input variables and has a final output equal to the weighted average of each rule's output. For example, a Sugeno FIS with two inputs, one output, and two fuzzy rules can be written as follows:

Rule 1: IF $x$ is $A_1$ and $y$ is $B_1$, THEN $f_1 = p_1 x + q_1 y + r_1$
Rule 2: IF $x$ is $A_2$ and $y$ is $B_2$, THEN $f_2 = p_2 x + q_2 y + r_2$

where $p_i$, $q_i$, and $r_i$ are the consequent parameters of the $i$th rule, and $A_i$ and $B_i$ are the linguistic labels, which are represented by fuzzy sets. The architecture of ANFIS, which includes five layers, is shown in Fig. 5. The node functions in the same layer are of the same function family, as described below.
Every node $i$ in the first layer is a square node with a node function

$$O_i^1 = \mu_{A_i}(x)$$

where $x$ is the input to node $i$. In other words, $O_i^1$ is the membership function of $A_i$ and it specifies the degree to which the given $x$ satisfies the quantifier $A_i$. Usually $\mu_{A_i}(x)$ is chosen to be bell-shaped with maximum equal to 1 and minimum equal to 0, such as the generalized bell function:

$$\mu_{A_i}(x) = \frac{1}{1 + \left[\left(\dfrac{x - c_i}{a_i}\right)^2\right]^{b_i}} \qquad (2)$$

where $\{a_i, b_i, c_i\}$ is the parameter set. As the values of these parameters change, the bell-shaped functions vary accordingly, exhibiting various forms of membership function on linguistic label $A_i$. Parameters in this layer are referred to as premise parameters.
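Every node in the second layer, labeled Π, is a circle node which performs a fuzzy intersection operation on the incoming signals from the first layer and delivers the result as the firing strength of a rule defined by the fuzzy subsets $A_i$ and $B_i$ as follows:

$$O_i^2 = w_i = \mu_{A_i}(x) \otimes \mu_{B_i}(y), \quad i = 1, 2$$

where $\mu_{A_i}(x)$ and $\mu_{B_i}(y)$ are the membership grades of $x$ and $y$ in fuzzy sets $A_i$ and $B_i$, and "$\otimes$" denotes a fuzzy T-norm operator, which is a function that describes a superset of fuzzy intersection (AND) operators, including minimum or algebraic product. Every node in layer 3 is a circle node labeled N. The $i$th node calculates the ratio of the $i$th rule's firing strength to the sum of all rules' firing strengths as follows:

$$O_i^3 = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2$$

For convenience, the outputs of this layer will be called normalized firing strengths. Every node $i$ in layer 4 is a square node with a node function:

$$O_i^4 = \bar{w}_i f_i = \bar{w}_i \left(p_i x + q_i y + r_i\right)$$

where $\bar{w}_i$ is the output of layer 3 and $\{p_i, q_i, r_i\}$ is the parameter set.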
The single node in the last layer, labeled Σ, computes the overall output by using the weighted average defuzzification method:

$$O^5 = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i}$$
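The five layers can be traced in a few lines of Python. This is a minimal sketch of a two-input, two-rule Sugeno system with generalized bell membership functions and the algebraic product as T-norm; the parameter values and the function names `gbell` and `anfis_forward` are illustrative assumptions, not values from this study.

```python
import numpy as np

def gbell(x, a, b, c):
    """Generalized bell membership function with parameters (a, b, c)."""
    return 1.0 / (1.0 + (((x - c) / a) ** 2) ** b)

def anfis_forward(x, y, premise_a, premise_b, consequent):
    """Forward pass through the five layers; returns (output, normalized strengths)."""
    # Layer 1: membership grades of x in A_i and of y in B_i.
    mu_a = np.array([gbell(x, *p) for p in premise_a])
    mu_b = np.array([gbell(y, *p) for p in premise_b])
    # Layer 2: firing strengths, using the product T-norm (one possible choice).
    w = mu_a * mu_b
    # Layer 3: normalized firing strengths.
    w_bar = w / w.sum()
    # Layer 4: first-order Sugeno consequents f_i = p_i x + q_i y + r_i.
    f = np.array([p * x + q * y + r for p, q, r in consequent])
    # Layer 5: weighted-average defuzzification.
    return float(np.sum(w_bar * f)), w_bar

# Hypothetical parameters chosen so the result is easy to verify by hand.
premise_a = [(2.0, 2.0, 0.0), (2.0, 2.0, 4.0)]   # (a_i, b_i, c_i) for A_1, A_2
premise_b = [(2.0, 2.0, 0.0), (2.0, 2.0, 4.0)]   # (a_i, b_i, c_i) for B_1, B_2
consequent = [(1.0, 1.0, 0.0), (2.0, 0.0, 1.0)]  # (p_i, q_i, r_i)

output, w_bar = anfis_forward(2.0, 2.0, premise_a, premise_b, consequent)
```

At x = y = 2 both rules fire equally, so the output is the plain average of f_1 = 4 and f_2 = 5.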

Introduction to the TGF-NN Model
The previously developed TGF-NN model by Chang and Lin (2006) has achieved good performance in predicting tides at a station of interest using its own data, and the data from other tide gauge sites close to this station. The key input variables are derived from the tide-generating forces, namely $D$, $\rho$, and $\theta$ between the sun and the earth and between the moon and the earth at any time. The JPL Solar System Ephemeris (DE200 series, which is in the J2000 system) (Standish 1982, 1990) is applied to calculate the parameters of the tide-generating forces as the inputs of the TGF-NN model. The angle $\varphi$ between the two vectors from the center of the earth to the centers of the sun and the moon determines the relative positions of the sun, the moon and the earth. The value of $\cos\varphi$ can be adopted to identify the spring-neap tidal cycle. The seven crucial parameters, $D_m$, $D_s$, $\rho_m$, $\rho_s$, $\theta_m$, $\theta_s$, and $\cos\varphi$, form an input vector and are related to the output (observed tides at Hua-Lien) in TGF-NN to establish the valid weight and bias matrices. The TGF-NN model, which is established as a back-propagation NN, has one hidden layer with five neurons. The number of neurons was determined by examining several cases with different numbers of neurons in the hidden layer, and inputs with a 2-hour lead time were found to give the smallest target error. The maximum number of iterations was set to 1500 in the sub-optimal procedure.
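A network of the size described above (seven inputs, one hidden layer of five neurons, one output, at most 1500 iterations) can be sketched with plain NumPy. This is not the authors' implementation: the tanh activation, learning rate, weight initialization, and the random stand-in inputs replacing the ephemeris-derived parameters are all assumptions made for illustration.

```python
import numpy as np

def train_mlp(X, y, hidden=5, max_iter=1500, lr=0.02, seed=0):
    """Batch gradient-descent back-propagation for a 1-hidden-layer network."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(max_iter):
        # Forward pass: tanh hidden layer, linear output.
        h = np.tanh(X @ W1 + b1)
        pred = (h @ W2 + b2).ravel()
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the mean squared error.
        grad_pred = (2.0 / len(y)) * err[:, None]
        grad_W2 = h.T @ grad_pred
        grad_b2 = grad_pred.sum(axis=0)
        grad_h = (grad_pred @ W2.T) * (1.0 - h ** 2)
        grad_W1 = X.T @ grad_h
        grad_b1 = grad_h.sum(axis=0)
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2

    def predict(X_new):
        return (np.tanh(X_new @ W1 + b1) @ W2 + b2).ravel()

    return predict, losses

# Stand-in data: 7 standardized input features and a smooth made-up target.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 7))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

predict, losses = train_mlp(X, y)
```

In an actual application the seven columns of `X` would hold the astronomical parameters at each hour (scaled to comparable ranges) and `y` the observed tide.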

The TGFT-FN Model
The histogram of the sea level data at Hua-Lien station for the year 2001 can be represented by a normalized Gaussian function, as illustrated in Fig. 6. Accordingly, the Gaussian membership function is chosen in the ANFIS. The 720-hourly averaged sea surface temperature is applied to train the ANFIS so that the parameters of the membership functions suitably fit the tide data. The hourly averaged sea surface temperature and tide are the input and output, respectively. Two Gaussian membership functions are chosen in this study, so that a combination of fuzzy rules and fuzzy membership functions (MFs) is extracted in the present model, as observed from the training and testing processes. The premise and consequent parameters should be properly obtained such that the pattern can be recognized. A least squares method is employed to optimize the consequent parameters with the premise parameters fixed, and the premise parameters are adapted by a gradient descent method. The output functions, which are linear functions of the input variable, can then be written with respect to the membership functions and rules as follows:

1. IF input is high THEN output = -10.37 × input + 237.48
2. IF input is low THEN output = -3.89 × input + 129.34

The changes in shape between the initial and the final membership functions of hourly averaged SST are illustrated in Fig. 7. The trained model, in which the ANFIS output is matched to the tide data, is then applied to predict the hourly averaged tides.
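The least squares half of this hybrid learning scheme can be illustrated for a single-input, two-rule system. In the sketch below the consequent coefficients are the ones quoted in the two rules above, while the Gaussian centers and widths for "high" and "low" are hypothetical, since the trained premise parameters are not listed in the text; the gradient descent update of the premise parameters is omitted.

```python
import numpy as np

def gauss(x, c, s):
    """Gaussian membership function with center c and width s."""
    return np.exp(-0.5 * ((x - c) / s) ** 2)

def normalized_strengths(x, centers, sigmas):
    """Normalized firing strengths of the rules, one column per rule."""
    w = np.stack([gauss(x, c, s) for c, s in zip(centers, sigmas)], axis=1)
    return w / w.sum(axis=1, keepdims=True)

def fit_consequents(x, target, centers, sigmas):
    """Least-squares estimate of (slope, intercept) per rule, premise fixed."""
    wb = normalized_strengths(x, centers, sigmas)
    design = np.column_stack([wb[:, 0] * x, wb[:, 0], wb[:, 1] * x, wb[:, 1]])
    theta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return theta.reshape(2, 2)

def predict(x, centers, sigmas, theta):
    """Weighted average of the two linear rule outputs."""
    wb = normalized_strengths(x, centers, sigmas)
    rule_out = theta[:, 0] * x[:, None] + theta[:, 1]
    return np.sum(wb * rule_out, axis=1)

centers = (28.0, 22.0)   # hypothetical centers (deg C) for "high" and "low"
sigmas = (2.0, 2.0)      # hypothetical widths (deg C)
paper_rules = np.array([[-10.37, 237.48],    # rule 1: input is high
                        [-3.89, 129.34]])    # rule 2: input is low

sst = np.linspace(20.0, 30.0, 200)
target = predict(sst, centers, sigmas, paper_rules)   # noise-free synthetic output
theta = fit_consequents(sst, target, centers, sigmas)
```

Because the output is linear in the consequent parameters once the membership functions are fixed, a single linear solve recovers them; only the premise parameters require iterative optimization. The outputs produced with these assumed membership functions are not the paper's predictions.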
The TGFT-FN model combines the TGF-NN model proposed by Chang and Lin (2006), which is driven by the tide-generating forces, with the ANFIS, which is driven by the mean sea surface temperature, in order to predict ocean tides. The training data of the TGF-NN and the ANFIS are the equilibrium and mean tides, respectively. Subtracting the mean tide from the tide gauge data allows the equilibrium tide to be identified. The predicted ocean tide is obtained from the equilibrium and mean tides produced by the two trained models. The construction chart of TGFT-FN is depicted in Fig. 8.
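The split into a slowly varying mean tide and a remaining tidal part, and their recombination, might be sketched as below. The centered 720-hour average with edge normalization and the function names are assumptions; the two trained models themselves are not reproduced here, so the example only checks that the two parts add back to the record.

```python
import numpy as np

def centered_moving_average(x, window=720):
    """Same-length boxcar mean; near the edges only available samples are averaged."""
    kernel = np.ones(window)
    total = np.convolve(x, kernel, mode="same")
    count = np.convolve(np.ones_like(x), kernel, mode="same")
    return total / count

def split_signal(observed, window=720):
    """Return (mean tide, tidal part) with observed = mean tide + tidal part."""
    mean_tide = centered_moving_average(observed, window)
    return mean_tide, observed - mean_tide

def combine(predicted_mean_tide, predicted_tidal_part):
    """Final sea level prediction as the sum of the two model outputs."""
    return predicted_mean_tide + predicted_tidal_part

# Made-up hourly record: semidiurnal tide on top of a slow annual variation (cm).
hours = np.arange(4000, dtype=float)
observed = (40.0 * np.cos(2.0 * np.pi * hours / 12.42)
            + 10.0 * np.cos(2.0 * np.pi * hours / 8760.0))

mean_tide, tidal_part = split_signal(observed)
reconstructed = combine(mean_tide, tidal_part)
```

In the combined model the first argument of `combine` would come from the SST-driven ANFIS and the second from the network driven by the tide-generating forces.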

Model Validity
The tide-generating forces and the 720-hourly averaged sea surface temperature at Hua-Lien for the year 2001 are utilized to establish the TGFT-FN model. The observed mean tides and the mean sea level predicted by the TGFT-FN model trained at Hua-Lien are related to those at the four nearby stations. The correlation coefficients between the averaged tides observed at Hua-Lien and those at the four other stations are shown in the second row of Table 1. The last row of Table 1 gives the correlation coefficients between the averaged tides simulated at Hua-Lien and the averaged tides observed at each of the four stations. For the observed data at Hua-Lien, the correlation coefficients vary within a range of 0.55 - 0.73, as shown in the second row of Table 1. For the simulated data at Hua-Lien, the correlation coefficients in the last row of Table 1 range between 0.72 and 0.91, higher than the former ones.
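Commonly, the simulation performance of a method or a model is evaluated by the root mean square error (RMS) or the square of the correlation coefficient ($R^2$). The RMS and $R^2$ are defined as:

$$\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[\eta_o(t_i) - \eta_p(t_i)\right]^2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left[\eta_o(t_i) - \eta_p(t_i)\right]^2}{\sum_{i=1}^{N}\left[\eta_o(t_i) - \bar{\eta}_p\right]^2}$$

where $\eta_o(t_i)$ and $\eta_p(t_i)$ are the observed and simulated sea levels, respectively, at time $t_i$; $N$ is the total number of data; $\bar{\eta}_p$ is the mean of all predictors.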
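Both scores are straightforward to compute. The sketch below assumes that R² takes the form 1 − SSE/SST with the mean of the predicted values as the reference level, which is one reading of the definitions above; the function names and the four-point example are made up.

```python
import numpy as np

def rms_error(observed, predicted):
    """Root mean square of the differences between observed and predicted levels."""
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

def r_squared(observed, predicted):
    """1 - SSE/SST, with SST taken about the mean of the predictions (assumed form)."""
    sse = np.sum((observed - predicted) ** 2)
    sst = np.sum((observed - np.mean(predicted)) ** 2)
    return float(1.0 - sse / sst)

observed = np.array([0.0, 10.0, 20.0, 30.0])   # cm, made-up values
predicted = observed + 2.0                     # constant 2 cm bias

rms = rms_error(observed, predicted)           # 2.0
r2 = r_squared(observed, predicted)            # 1 - 16/516
```

A constant bias of 2 cm gives an RMS of exactly 2 cm, while R² stays close to 1 because the bias is small compared with the spread of the observations.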
The sea level data (includes tides) at Hua-Lien for the year 2001 are used to train both the original TGF-NN model and the TGFT-FN model, and the results are shown in Table 2. The simulated data by the TGF-NN model corresponding to the observed data have an RMS of 6.37 cm and an $R^2$ of 0.977. The simulated data by the TGFT-FN model for 2001 have a higher RMS of 7.32 cm and a lower $R^2$ of 0.963 than those of the TGF-NN model. Both well-developed models are then examined by directly calculating the tides at Hua-Lien for the year 2002 once the input parameters are available. The RMS and $R^2$ of the computed tides corresponding to the observed ones by the two models are 10.57 cm, 0.935 and 8.73 cm, 0.951, respectively. The results indicate that the proposed TGFT-FN model improves the capacity of simulating the tides by considering the long-term variation of tides as a response to sea surface temperature. The scatter plots of the tides simulated by the TGFT-FN model against the observed data for the years 2001 and 2002 are depicted in Figs. 9 and 10, respectively. Both figures show that the plotted data, with a narrow-banded distribution, are located along the line of best fit of a linear function. The high values $R^2$ = 0.963 and $R^2$ = 0.951 in Figs. 9 and 10 indicate that the proposed TGFT-FN model is well trained on the ocean tides in the year 2001 and is practical for accurately predicting the tides in the next year.

Extended Application of the TGFT-FN Model to Different Stations
The developed TGFT-FN model is applied to directly calculate the sea level (includes tides) at four sites near Hua-Lien, using astronomical input parameters and sea surface temperature. The resulting RMS and $R^2$ are listed in Table 2. The four values of RMS obtained by the TGFT-FN model for the year 2001 range between 7.70 and 15.03 cm. These values are smaller than those computed by the TGF-NN model, which range between 13.47 and 18.57 cm. For 2002, the TGFT-FN model also simulates the tides better than the TGF-NN model, by a difference of 2.75 - 4.87 cm.
A relative error is defined as the ratio of the root mean square to the mean tidal range (MTR), representing an alternative criterion of simulation capacity that considers the variation of the mean tidal range at different points:

$$\mathrm{Err}(\%) = \frac{\mathrm{RMS}}{\mathrm{MTR}} \times 100\%$$

Tides at a point can be predominantly semidiurnal, predominantly diurnal, or mixed. Their nature is determined by the ratio

$$F = \frac{K_1 + O_1}{M_2 + S_2}$$

where $K_1$, $O_1$, $M_2$, and $S_2$ are the amplitudes of the four main constituents at a station. If F < 0.25, the tides are predominantly semidiurnal, and if F > 3.0, the tides are predominantly diurnal. If 0.25 < F < 1.5, the tides are mixed, but mainly semidiurnal, and if 1.5 < F < 3.0, the tides are mixed, but mainly diurnal. The four points from north to south are found to have F = 0.71, 0.64, 0.49, and 0.42, respectively. F = 0.48 is estimated for Hua-Lien. All values of F at the chosen points indicate mixed, but mainly semidiurnal, tides. The MTRs at the chosen points vary within a range of 65 - 109 cm.
The computed Err(%) at the five stations for the years 2001 and 2002 are listed in Table 3. Err(%) obtained by the TGF-NN model for the years 2001 and 2002 varies within a range of 6.64 - 28.57% and 10.76 - 27.09%, respectively. However, Err(%) obtained by the TGFT-FN model for the years 2001 and 2002 ranges within 7.06 - 23.12% and 7.79 - 22.86%, respectively. Err(%) obtained by the TGFT-FN model is generally smaller than that obtained by the TGF-NN model. Additionally, Err(%) obtained at a station far from Hua-Lien is larger than that at a station near Hua-Lien. As expected, this results from the tidal type of a station gradually departing from that of Hua-Lien as the distance between that site and Hua-Lien increases.

Conclusion
The predominant sea level signal is the seasonal variation caused by the seasonal heating of the ocean by the sun. The observed annual amplitude in sea level due to varying temperature can reach about 15 cm and becomes important when this signal is considered to improve the prediction of sea level (including tides). After applying a 720-hour moving average to the hourly data of tides and sea surface temperature, the mean tides and sea surface temperature are shown to be highly correlated, as indicated by high correlation coefficients, and there is an implied quasi-period of about one year. This signal is similar to the Sa component of tides in harmonic analysis, which is inseparable from the annual signal. The mean sea surface temperature is applied to the trained ANFIS to predict hourly averaged sea level data. The proposed TGFT-FN model combines the ANFIS, which is used to simulate the mean tides, and the original TGF-NN model, which considers the tide-generating forces as input parameters in the neural network.
The sea level data and sea surface temperature at Hua-Lien for the year 2001 are applied to establish the TGFT-FN model. The simulation capacity of the TGFT-FN model is found to be better than that of the original TGF-NN model by comparing the RMS and $R^2$ of the calculated tides for the year 2002. The proposed TGFT-FN model can be applied to directly calculate the sea level (includes tides) at four stations next to Hua-Lien. The TGFT-FN model has better predictive capability for tides at the five stations of interest than the original TGF-NN model, with an RMS lower by about 2 - 5 cm and a relative error lower by 5%. The proposed TGFT-FN model would be applicable to tidal and sea level simulation in cases where tides at a point are not available, so that harmonic analysis cannot be applied.

Fig. 1. Geometry of gravitational force and centrifugal force on a point X at the surface of the earth.

Fig. 2. Long-term variations of averaged sea level (solid line) and sea surface temperature (dashed line) at Hua-Lien for years 2001 and 2002 using a 720-hour moving average.

Fig. 3. The power spectrum density calculated for averaged sea level and sea surface temperature using four years of data.

Fig. 6. Histogram of tides at Hua-Lien for the year 2001 and fitting of a normal distribution (solid line).

Fig. 9. Scatter plot of simulated sea level/tides by the TGFT-FN model against the observed sea level/tides at Hua-Lien station for the year 2001.

Fig. 10. Scatter plot of computed sea level (includes tides) by the TGFT-FN model against observed sea level at Hua-Lien for the year 2002.

Table 1. Correlation coefficient of observed or simulated averaged sea level by the TGFT-FN model at Hua-Lien to those observed at the other four tide gauge sites.

Table 2. Simulation capacity of the TGF-NN or TGFT-FN models examined at five sites for years 2001 and 2002.

Table 3. Err(%) computed by the TGF-NN or TGFT-FN models at five sites for years 2001 and 2002.
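The relative error Err(%) and the tidal form ratio F used in the extended-application section can be sketched as small helper functions. The function names are hypothetical, the 100 cm tidal range and the constituent amplitudes in the example are made-up inputs, and the handling of values exactly on a class boundary is an assumption because the text leaves it open.

```python
def relative_error_percent(rms_cm, mtr_cm):
    """Err(%) as the ratio of RMS to mean tidal range, both in cm."""
    return 100.0 * rms_cm / mtr_cm

def form_ratio(k1, o1, m2, s2):
    """F = (K1 + O1) / (M2 + S2) from the constituent amplitudes."""
    return (k1 + o1) / (m2 + s2)

def tidal_type(f):
    """Classify the tide by F; boundary values are assigned to the upper class."""
    if f < 0.25:
        return "predominantly semidiurnal"
    if f < 1.5:
        return "mixed, mainly semidiurnal"
    if f <= 3.0:
        return "mixed, mainly diurnal"
    return "predominantly diurnal"

# F values reported for the four neighboring stations and for Hua-Lien.
reported_f = [0.71, 0.64, 0.49, 0.42, 0.48]
types = [tidal_type(f) for f in reported_f]

# Hypothetical example: RMS of 7.32 cm at a site with a 100 cm mean tidal range.
err = relative_error_percent(7.32, 100.0)
```

All five reported F values fall in the "mixed, mainly semidiurnal" class, in agreement with the statement in the text.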