Remote Sensing Spatiotemporal Assessment of Nitrogen Concentrations in Tampa Bay, Florida due to a Drought

A long-term low nitrogen to phosphorus (N:P) ratio in the Tampa Bay, Florida, estuary system suggests that nitrogen is more limiting than phosphorus. However, south Florida suffered from a drought around 2007, and the reduction in runoff flowing into the bay affected local ecosystem dynamics. This study uses remote sensing to retrieve spatiotemporal patterns of total nitrogen (TN) concentrations in Tampa Bay under drought impacts through the integration of Moderate Resolution Imaging Spectroradiometer (MODIS) images and a genetic programming (GP) model. Research findings show that the drought impact on TN in Tampa Bay is both a seasonal and yearly phenomenon. Without the presence of ocean water intrusion, the whole bay would show a relatively uniform TN distribution during the drought period until the flow input from rivers returned to normal. Based on yearly comparisons, temperature could be the limiting factor on plankton growth in Tampa Bay. To further substantiate the credibility of the nutrient estimation algorithm, a k-means clustering analysis was conducted to demonstrate sea-bay-land interactions among ebbs, tides, and river discharges. The seasonal cluster distribution in 2007 is generally consistent with the conventional segment division of Tampa Bay.


INTRODUCTION
Urbanized regions are more vulnerable to the impacts of climate change. Urban sprawl has led to an increase in impervious surface areas as well as a decrease in vegetation cover, which has weakened urban infiltration and flood control capacity. Recently, analyses exploring extreme climatic events in urban regions have received wide attention due to a possible increase in the regularity and magnitude of hurricane and drought events and an increase in deaths and economic losses due to these events (Karl and Easterling 1999). Recent extreme drought events in the east and southeast regions of the United States occurred in Maryland and the Chesapeake Bay area in 2001-2002, the Peace River and Lake Okeechobee in south Florida in 2006, and Lake Lanier in Atlanta, Georgia, in 2007, leading to studies on their impact, mostly on water availability or water shortages with regard to public needs and ecosystem conservation (Haase 2009). Drought stresses regional ecosystems by increasing the amount of highly concentrated and warmer polluted runoff in the receiving water catchment basins, resulting in increased eutrophication of surface waters.
Eutrophication is a major challenge to the ecological health of coastal waters that has received widespread attention in both industrialized and developing countries around the world (Seitzinger et al. 2002, 2005; Conley et al. 2009). About 65% of estuaries in the United States were impaired as reported by the National Estuarine Eutrophication Assessment (Bricker et al. 1999). Most eutrophication in US estuaries occurs along the Gulf of Mexico (NOAA 2011). Eutrophication in estuaries is generally triggered by overloading nutrients, including nitrogen and phosphorus, resulting in algal blooms and hypoxia, which pose a threat to both marine life and human health. Eutrophication also elevates turbidity levels, which reduces light penetration, causes loss of submerged aquatic vegetation (SAV), and further adversely affects the ecosystem balance of coastal regions. Hence, there is an urgent need to find an approach to support continuous, long-term, and full-scale monitoring of nutrients in coastal regions.
Nutrient concentrations are normally associated with the chlorophyll a (Chl-a) concentration, which is the primary indicator of estuarine health. Because Chl-a is strongly correlated with dissolved nitrogen and phosphorus (Brandini et al. 2000; Muslim and Jones 2003), it is the most-used indicator to estimate nutrient levels in water (Muslim and Jones 2003; Pacciaroni and Crispi 2007). Varying with the distribution of phytoplankton, the Chl-a concentrations can change the absorbance of natural radiation, thereby providing a tool for water quality monitoring using remote sensing (Bagheri and Dios 1990; Lavery et al. 1993; Thiemann and Kaufmann 2000; Volpe et al. 2007).
Moreover, marine systems are generally considered nitrogen limited (Smith 1984; Nixon et al. 1996; Howarth and Marino 2006). Nitrogen is a more limiting factor than phosphorus in Tampa Bay, a finding supported by the results of nutrient addition bioassays. Therefore, an expression derived from Chl-a is required to discover the highly nonlinear relationship between total nitrogen (TN) and Chl-a. Hydrologic conditions might be the mediating influences in the relationship between TN and Chl-a; the hydrologic conditions in southwest Florida are highly seasonal, and south Florida suffered from a drought around 2007. The variation of the runoff flowing into the bay can affect the ecosystem dynamics in the bay, resulting in differing absorbance and reflection of natural radiation on the surface of the water. As a consequence, this change can be monitored by space-borne sensors, including but not limited to the Moderate Resolution Imaging Spectroradiometer (MODIS) (Shutler et al. 2007), Sea-viewing Wide Field-of-View Sensor (SeaWiFS) (Erkkila and Kalliola 2004), Coastal Zone Color Scanner (CZCS) (Gordon et al. 1988), Medium Resolution Imaging Spectrometer (MERIS) (Bricaud et al. 1999), and Ocean Color and Temperature Scanner (OCTS) (Kawamura and the OCTS Team 1998).
This study aims to determine the spatiotemporal patterns of nutrients in Tampa Bay with the aid of MODIS reflectance bands and genetic programming (GP) models. We investigated the following questions: (1) Can TN concentrations be estimated using machine learning models (GP models) with MODIS reflectance bands and associated water quality parameters? (2) Which MODIS reflectance bands are most influential on TN concentrations? (3) How do nitrogen concentrations change during a drought year? (4) Will nitrogen concentrations remain stable after a drought? (5) What comparable computational intelligence approaches might perform a spatiotemporal assessment of nitrogen concentrations in Tampa Bay, Florida when experiencing a drought?

Background
Tampa Bay, Florida, the largest open-water estuary in Florida with an area of approximately 1031 km², receives the outfall from terrestrial wastewater treatment plants, urban stormwater, and agricultural runoff from four main rivers: the Hillsborough River, Alafia River, Manatee River, and Little Manatee River. The Tampa Bay estuary is composed of four bay segments: Old Tampa Bay, Hillsborough Bay, Middle Tampa Bay, and Lower Tampa Bay. The estuary spans an area of about 1000 km² and has an average depth of 3.7 m (Fig. 1). Old Tampa Bay receives runoff from Lake Tarpon; Hillsborough Bay receives river water from the Hillsborough River and the Alafia River; and the Little Manatee and Manatee rivers drain into Middle Tampa Bay and Lower Tampa Bay, respectively. Nitrogen sources in this region include point sources, nonpoint sources, material losses, atmospheric deposition, septic tank leachate, and contribution from groundwater and springs. Annual climatological summaries from Station "ST PETERSBURG AP" (27.75°N, 82.61°W) (http://gis.ncdc.noaa.gov/map/acs/#) represent the hydrologic conditions in Tampa Bay from 2002 to 2010 (Fig. 2). Years 2007 and 2008 were known as typical drought years with higher temperature and lower precipitation. Year 2007 had the peak temperature and the third lowest precipitation. Because of insufficient data at the mouth of the Manatee River basin, the average flow rates of three of the four major river basins (Hillsborough River, Alafia River, and Little Manatee River) were collected on a monthly basis from the United States Geological Survey (USGS) National Water Information System (Fig. 3). The three rivers had a similar pattern of flow rate: higher flow rate in summer and fall, and lower flow rate in winter and spring. A gradually increasing trend of flow rate was observed for all three rivers.
Note that the negative flow rate for Alafia River in winter 2007 indicates that the bay water flowed backward into Alafia River, which had low levels due to the synergy of high evaporation and low precipitation.
The goal of this study was to develop the relationships between in situ TN concentrations in the Tampa Bay area and corresponding satellite-derived estimates. The SeaWiFS Data Analysis System (SeaDAS) provides MODIS Aqua data in bands between 405 and 683 nm (Table 1). Other secondary products such as Colored Dissolved Organic Matter (CDOM) and Chl-a are derived from bands with different wavelengths, which could co-exist and co-vary; thus, we only used primary bands related to coastal water detection to investigate the relationships between them and TN concentrations by using a GP model. Finally, Bands 8, 9, 10, 11, 12, and 14 were selected for use in GP modeling after further removing some bands intended for land use. All the raw data were processed by the SeaDAS software package and exported as ASCII files. To increase the credibility of the MODIS data, cloud-free images were selected from the data downloaded for 2007 to 2009. The synchronous, ground-truth total nitrogen (TN) data were downloaded from a dedicated online web database (http://www.tampabay.wateratlas.usf.edu/) of the Tampa Bay Water Atlas. The workflow of MODIS Aqua image processing and machine learning was as follows: (1) arrange the sampling locations (Fig. 4) and dates of the in situ data points; (2) import both ground truth measurements and MODIS images into ArcGIS; (3) join the data from both ground truth measurements and satellite observations and extract transformed MODIS image pixel values that are temporally and spatially synchronous to in situ data; (4) export the combined data into a GP modeling platform; and (5) perform GP nonlinear regression analysis. The distribution dates of both MODIS images and ground-truth data used to calibrate and validate GP models were recorded (Table 2).
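Step (3) of the workflow, the join of in situ measurements with temporally and spatially synchronous pixel values, can be illustrated with a minimal Python sketch. This is not the ArcGIS procedure used in the study. The record types `Sample` and `Pixel`, the function `match_samples_to_pixels`, the same-day matching rule, and the 0.01 degree search radius are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from math import hypot


@dataclass
class Sample:
    """One in situ TN measurement (hypothetical record layout)."""
    day: date
    lat: float
    lon: float
    tn: float


@dataclass
class Pixel:
    """One MODIS pixel with per-band values (hypothetical record layout)."""
    day: date
    lat: float
    lon: float
    bands: dict  # band number -> pixel value


def match_samples_to_pixels(samples, pixels, max_deg=0.01):
    """Pair each sample with the nearest same-day pixel within max_deg degrees.

    Distance is measured in plain degrees, which is a crude assumption that is
    only reasonable over a small area; max_deg is an illustrative value.
    """
    rows = []
    for s in samples:
        same_day = [p for p in pixels if p.day == s.day]
        if not same_day:
            continue  # no image on the sampling date
        nearest = min(same_day, key=lambda p: hypot(p.lat - s.lat, p.lon - s.lon))
        if hypot(nearest.lat - s.lat, nearest.lon - s.lon) <= max_deg:
            row = {"tn": s.tn}
            row.update({f"band{b}": v for b, v in nearest.bands.items()})
            rows.append(row)
    return rows
```

Each returned row holds one TN value and the band values of its matched pixel, which is the kind of combined table that step (4) would export to a GP modeling platform.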
When the GP computational process was complete, the top 30 models with the highest level of fitness based only on the training data were saved and listed in the report. Among the top 30 models, the best models may frequently have poor fitness based on the applied data, a condition referred to as "over-fitting." Therefore, only a GP model with both training data and applied data fitness high enough to be among the top 30 models would eventually be chosen as a candidate. Once the GP nonlinear regression analysis was complete (i.e., when the relationship between point-measured TN data and pixel values was determined), the entire MODIS image was transformed inversely based on the derived equations. Finally, TN maps were processed and exported from ArcGIS.
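The screening against over-fitting described above can be sketched as a simple filter. The sketch assumes one reading of the selection rule, namely that a candidate must rank within the top N on both training fitness and applied-data fitness. The function `select_candidates`, the dictionary keys, and the convention that a higher fitness value is better are hypothetical and do not reflect the output format of the GP software.

```python
def select_candidates(models, top_n=30):
    """Keep models ranked in the top_n on both training and applied fitness.

    models: list of dicts with hypothetical keys 'name', 'train_fitness',
    and 'applied_fitness'; higher values are assumed to mean a better fit.
    """
    top_train = sorted(models, key=lambda m: m["train_fitness"], reverse=True)[:top_n]
    top_applied = sorted(models, key=lambda m: m["applied_fitness"], reverse=True)[:top_n]
    applied_names = {m["name"] for m in top_applied}
    # Preserve the training-fitness ranking among the surviving candidates.
    return [m for m in top_train if m["name"] in applied_names]
```

A model that ranks first on the training data but falls outside the top N on the applied data is dropped, which is the over-fitting case the paragraph describes.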

K-Means Clustering Analysis
To estimate the inter-zonal transfer among bay segments, a k-means clustering analysis was deployed to partition the dataset to evaluate the spatiotemporal patterns of water quality in Tampa Bay. The k-means clustering analysis is a data-mining technique; its algorithm has been used by several researchers across several disciplines (MacQueen 1967; Lloyd 1982). The k-means function is composed of a two-step iterative process: step 1 is to assign each data point to its closest centroid, and step 2 is to relocate each cluster centroid; the two steps repeat until the distance between the centroids and the data points no longer changes. The k-means clustering algorithm is superior to the traditional numerical simulation model in terms of its relatively simple implementation, ability to partition data into subsets based on non-linear relationships between characteristics of the ecosystem, and its ability to work with relatively noisy or incomplete input data. Clustering analysis has been used for nutrient control management in different regions (Niederhauser and Schanz 1993; Thornton et al. 2002; Waters et al. 2010); however, few studies were found to compare clustering analysis against remote sensing technology. We used these two approaches to assess the spatiotemporal nitrogen concentrations in Tampa Bay during drought impact and compare their performance. IBM SPSS Modeler®, Version 14.1, was used for the k-means clustering analysis. Grid-based TN values along with location information (latitude and longitude) were extracted from each GP-derived TN map and imported individually into k-means clustering models. Each clustering model was run with 7 scenarios using 4-10 clusters to determine the optimal cluster number according to the spatial distribution pattern of TN concentration. The default selections in the software were used to construct the k-means clustering models. The optimal model was selected from the models with the fewest clusters containing fewer than three sample points (Fig. 5), based on the cluster quality (silhouette of cohesion and separation) and minimal standard deviation between sample parameters within each cluster.

RESULTS AND DISCUSSION
To estimate changing levels of TN in Tampa Bay through a machine-learning-based regression model, 103 data points (88 for calibration and 15 for validation; Table 2) were used to develop the proposed GP models. Correlation was examined between measured and estimated TN concentrations with the R-square values of 0.75 based on the calibration dataset and 0.63 based on the validation (unseen) dataset (Fig. 6).
In our study, Discipulus™ sorted the 30 best models from millions of GP-based models and analyzed how often each input was used in the selected programs. A value of 1.00 (1.00 = 100%) indicates that this input variable appeared in all top 30 programs (i.e., the variable plays the decisive role in the prescribed model). The frequency of use of all input variables of interest for the top 30 models during the GP-based evolutionary process (Table 3) indicates that the in situ TN concentration correlated highly with six selected MODIS bands for ocean color use; Bands 8, 10, 11,
Fig. 5. Flowchart of the methodology in this study.
The spatiotemporal patterns were delineated and analyzed based on the seasonal snapshots of GP-derived TN distributions (Fig. 7) with regard to the higher TN in the spring, summer, and fall and lower TN in winter. In addition, the TN concentrations remained relatively uniform among four bay segments in spring, summer and fall due to the low inflow from major rivers. The Lower Tampa Bay area normally has the best water quality compared to the other three bay segments; however, the spatial distribution of TN in the winter of 2007 appeared to be reversed. After reviewing the average flow rates (Fig. 3), it becomes obvious that the decrease of TN concentration of Hillsborough Bay and Middle Tampa Bay was caused by the backward flow from Tampa Bay into the Alafia River, which also led to the TN aggregation at the Alafia River mouth. The higher TN concentration in Lower Tampa Bay could be explained as ocean water intrusion from the Gulf of Mexico or possibly contribution from the Manatee River.
To answer the question, "Will the nitrogen concentrations remain stable after drought?" data from 2008 and 2009 were used for a yearly comparison. Because all three rivers have a common high flow period around the fall season, two sets of data available within that time window were used to verify whether the river input is the dominant factor for the TN concentration changes; that is, would the TN concentration in Tampa Bay increase year by year with the recovery of inflow rate from the three rivers after the drought period? However, the TN concentration did not rise with the increase of river inflows as expected (Fig. 8); instead, it dropped to half of its 2007 level, demonstrating a reconstruction of the hydrology and ecosystem throughout Tampa Bay during the transition from drought to normal conditions. A traditional spatial pattern (higher TN concentrations in the upper bays and lower TN in the lower bay) reappeared from 2008 through 2009. Based on the yearly temperature comparison (Fig. 2), temperature also could be the limiting factor on plankton growth in Tampa Bay. Overall, the results (Figs. 7 and 8) strongly confirm the calibration and validation of the GP model, and the model was shown to be transferable to other years in Tampa Bay.
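The R-square values reported for the calibration and validation datasets (0.75 and 0.63) summarize the agreement between measured and estimated TN. The text does not state which definition of R-square was used; the sketch below assumes the squared Pearson correlation coefficient, and the function name `r_squared` is hypothetical.

```python
def r_squared(measured, estimated):
    """Squared Pearson correlation between measured and estimated values.

    Assumes both sequences have the same length and non-zero variance.
    """
    n = len(measured)
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n
    cov = sum((m - mean_m) * (e - mean_e) for m, e in zip(measured, estimated))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return (cov * cov) / (var_m * var_e)
```

Under this definition the value is insensitive to a constant bias or scale error in the estimates, so a plot of measured against estimated values is still needed to judge absolute agreement.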
A k-means clustering analysis was introduced in this study to enrich the comprehensive techniques for remote sensing assessment of spatiotemporal nitrogen concentrations under the impact of drought. In addition to gathering and grouping the data points with potential links, the colors indicating different clusters in the k-means clustering model output also can be considered as tracers tracking the flow from different watershed sources. This information elucidates the interaction among bay segments and leads to a better understanding of nutrient transport throughout Tampa Bay via sea-bay-land interactions among ebbs, tides, and river discharges over time.
In contrast to the relatively uniform TN spatial distribution patterns (Figs. 7a, b, and d), the seasonal cluster distributions in 2007 (Fig. 9) were unexpectedly highly consistent with the conventional division segments of Tampa Bay (Fig. 1) except in the fall of 2007. This captures the unexplained pattern showing the extremely complex hydrogeologic phenomena during the alternate inversion between ocean water and bay water. The water in the Middle Tampa Bay area stretching into Hillsborough Bay validates the backward flow from the bay into the Alafia River (Fig. 9d). The outspread of water flow from Lower Tampa Bay into Middle Tampa Bay also corroborates the conjecture of ocean water intrusion. The same color in both Hillsborough Bay and Middle Tampa Bay (Fig. 9e) implies that the Alafia River contributed most of the nitrogen loading toward both areas, a finding supported by the low flow rate of the Little Manatee River in fall 2008 (Fig. 3). Four points with higher TN concentrations were grouped and highlighted to mark the potential point source of pollution. When the flow input from rivers recovered in fall 2009, the turbulence and more intermediate zones caused by rushing inflow are apparent (Fig. 9f). In summary, a k-means clustering analysis following a GP-derived TN assessment enabled us to present a clearer and more detailed picture of the impact of drought on the spatiotemporal TN concentration distribution in Tampa Bay.
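The two-step k-means iteration and the scan over 4 to 10 clusters described in the K-Means Clustering Analysis section can be sketched in plain Python. This is not the IBM SPSS Modeler implementation used in the study. The function names, the random initialization, the fixed seed, and the use of unscaled (latitude, longitude, TN) tuples are assumptions for illustration, and the silhouette measure is omitted for brevity.

```python
import random
from math import dist


def kmeans(points, k, iters=100, seed=0):
    """Lloyd-style k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialization: k random points
    labels = None
    for _ in range(iters):
        # Step 1: assign each data point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids no longer move
        labels = new_labels
        # Step 2: relocate each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


def small_cluster_count(labels, k, min_size=3):
    """Number of clusters holding fewer than min_size sample points."""
    return sum(1 for c in range(k) if labels.count(c) < min_size)


def scan_cluster_numbers(points, ks=range(4, 11)):
    """Run the 7 scenarios (k = 4..10) and report small clusters for each k."""
    return {k: small_cluster_count(kmeans(points, k)[0], k) for k in ks}
```

Given a list of (latitude, longitude, TN) tuples extracted from one TN map, `scan_cluster_numbers` returns, for each candidate cluster number, how many clusters contain fewer than three points, which is one of the criteria the study used when choosing the optimal model.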

CONCLUSIONS
This study confirms that MODIS images can be correlated with estimated TN values to explore the spatiotemporal patterns of TN concentrations in Tampa Bay, Florida. By using the MODIS-based GP models that derived the highly nonlinear structure between Chl-a-related band data and nutrient concentrations in coastal waters, the potential of machine learning capacity was confirmed. The impact of drought on TN in Tampa Bay is both a seasonal and yearly phenomenon as indicated by the contrast between relatively uniform TN distribution across the whole bay during the drought period and water quality conditions typically observed with the exception of ocean water intrusion. We also found that the spatial TN concentration pattern returned to normal after drought, and the model was shown to be transferable to other years in Tampa Bay.
Based on the yearly comparisons, temperature could be the limiting factor on plankton growth in Tampa Bay in addition to the presence of TN. To further substantiate the credibility of the nutrient estimation algorithm, the k-means clustering analysis was conducted in this study to better demonstrate sea-bay-land interactions among ebbs, tides, and river discharges in a drought year. Generally, the seasonal cluster distribution in 2007 is consistent with the conventional segment division of Tampa Bay. A series of complex hydrogeologic phenomena during and after the drought period were also successfully captured by the k-means clustering analysis. Therefore, spatiotemporal analysis using a remote sensing GP model followed by the k-means clustering analysis is highly recommended for multitemporal change detection of time series in situ surface properties in coastal bays.