An Effective Gap Filtering Method for Landsat ETM+ SLC-Off Data

The Landsat 7 Enhanced Thematic Mapper Plus (ETM+) scan line corrector (SLC) failed on 31 May 2003, causing the SLC to turn off. Many gap-filled products were developed and deployed to combat this situation. The majority of these products used a primary image taken by the SLC when functioning properly in an attempt to correct SLC-off images. However, temporal atmospheric elements could not be reliably reflected using a primary image, and therefore the corrected image was not viable for use by monitoring systems. To bypass this limitation, this study has developed the Gap Interpolation and Filtering (GIF) method that relies on one-dimensional interpolation filtering to conveniently recover pixels within a single image at a high level of accuracy without borrowing from images acquired at a different time or by another sensor. The GIF method was compared to two other methods—Global Linear Histogram Match (GLHM), and the Local Linear Histogram Match (LLHM)—both developed by National Aeronautics and Space Administration (NASA) and United States Geological Survey (USGS) to determine its accuracy. The GIF method accuracy was found superior in land, sea, and cloud imaging. In particular, its sea and cloud images returned Root Mean Square Error (RMSE) values close to or less than 1. We expect the GIF method developed in this research to be of invaluable aid to monitoring systems that depend heavily on Landsat imagery.


InTRODuCTIOn
The Landsat satellite series began with the launch of its first satellite in 1972. By the launch of its eighth satellite in February of 2013, the program had traced the Earth over the same broadband spectrum for just over forty years. The Landsat maintains an observation cycle of 16 days. Its images have aided landslide, volcano, forest fire, oceanography, and agriculture monitoring systems (Sidjak 1999;Nichol and Wong 2005;Lagios et al. 2007;Bédard et al. 2008;Masek et al. 2008). However, on 31 May 2003 as the satellite was in mid-orbit, its scan line corrector (SLC)which makes adjustments for the curvature of the Earth's surface-failed. This failure resulted in the satellite transmitting Landsat 7 Enhanced Thematic Mapper Plus (ETM+) images suffering a pixilation loss of roughly 22% (Digital Number; DN). This satellite state is referred to as SLC-off. Being unable to acquire images at a constant interval according to the satellite's flight direction, the Landsat in its SLC-off state renders images with gaps that tend to widen out from the image's bottom (Fig. 1).
Because of the Landsat 5 Thematic Mapper (TM)'s inability to capture images, a hoard of scientists tried in succession to fix the Landsat 7 ETM+ SLC. These attempts were unsuccessful (USGS and NASA 2003). Ten years later in February 2013, the Landsat 8 Landsat Data Continuity Mission (LDCM) was launched successfully. The dearth of data during that 10-year span from 2003 to 2013 is difficult to surmount. As a result, many methods to minimize a similar loss in imagery data were developed, with such efforts continuing to this day.
The methods presented as solutions may be classified into two groups. First are those methods that approximate pixels by interpolating via spectrum data alone without any request for additional imagery. Such a geostatistical interpolation method, through Kriging and Co-Kriging application to calculate pixels, offer the advantage of being able to recover images even without reference to SLC-on imagery or images acquired by other sensors. However, because Kriging and Co-Kriging approximate pixels based on distance through two-dimensional interpolation, there is a probability for calculation error due to incorrect spectrum data in diverse topography or land areas where the distribution of pixels cannot reach (Zhang et al. 2007;Pringle et al. 2009). The second method seeks to replace lost pixels with Landsat 7 ETM+ SLC-on imagery or imagery acquired by sensors like the Landsat 5 TM, Satellite Pour I'Observation de la Terre (SPOT), or the Moderate Resolution Imaging Spectroradiometer (MODIS). The United States Geological Survey (USGS) and the National Aeronautics and Space Administration (NASA) developed the Global Linear Histogram Match (GLHM), Global Nonlinear Histogram Match (GNHM), and the Local Linear Histogram Match (LLHM) (Scaramuzza et al. 2004;USGS 2004;Storey et al. 2005)-methods which use two Landsat 7 ETM+ SLCon images to insert pixels that compensate for the regions lost after radiometric correction. Compared to the GLHM and GNHM methods, LLHM provides decidedly improved results, but shows significant error in certain regions not reflected in SLC-on imagery such as clouds. Other methods were also presented including methods based on LLHM and taking satellite imagery from different times to replace lost pixels. The fact remains that numerous images are required to recover even one SLC-off image, and therefore such methods carry the disadvantage of requiring a much more complicated process for data processing compared to GLHM and LLHM (Storey et al. 2005). Following the introduction of these methods, Maxwell et al. (2007) put forth a method that extracts segments from SLC-imagery and inserts the values representing each of these segments into the applicable lost region. The drawback of this method is that sometimes values cannot be assigned to smaller segments contained within a gap of lost data, making full image recovery impossible (Roy et al. 2008). MODIS introduced a method for recovering imagery using satellite spectrum data, but there were limits in recovering Landsat 7 ETM+ imagery-with its 30 m resolution-into 1 km low-resolution MODIS satellite imagery (Chen et al. 2004(Chen et al. , 2012. SLC-on discovered a way to recover an identical pixel by assigning weighted values to similar pixels (taken from satellite images or images from other sensors) extracted from adjacent regions based on the pixel's level of influence on the image. As mentioned before, those correction methods that use SLC-on imagery or imagery information gathered by other sensors have the advantage of being able to accurately recover the edges of an object despite wide gaps on either side of an image's exterior. On the other hand, their drawback is that they rely on imagery collected on cloudless days and have difficulty in correcting for snow, vegetation or other topographical elements. In addition, because they calculate based on pixels it takes much longer to recover an image as opposed to methods that rely solely on spectrum information. This notwithstanding, corrective methods us-ing primary images to solve the SLC-off problem continue to be used extensively due to their high accuracy when compared to the former method (Bédard et al. 2008).
This study introduces a new and accurate method for correcting SLC-off imagery via single image spectrum data. Because this method corrects imagery through one-dimensional as opposed to two-dimensional interpolation, data processing is quick and simple. When the Gap Interpolation and Filtering (GIF) method's output is compared to that of GLHM and LLHM, the GIF method shows remarkable improvement. Moreover, when comparing the GIF method correction results over the same area against those of both the GLHM and LLHM, the GIF method brings remarkably improved Root Mean Square Error (RMSE) values and is unaffected by temporal atmospheric cover or atmospheric phenomena such as clouds, returning a stable level of accuracy. This new correction method is called the GIF method. Its algorithm is currently undergoing review for an international patent (Lee et al. 2013).

GLhM and LLhM Methods
Because atmospheric conditions affect images day in and day out, radiometric correction is necessary when using historical imagery to correct an image. The GLHM defines the statistical relationship between the distribution of selected primary imagery and SLC-off imagery. In order to create identical statistic properties, it adjusts the pixels of the primary image and inserts this value into the lost region (Beck 2003;Storey et al. 2005).
Here X is the selected primary image for correction and Y is the SLC-off image. G and B are calculated via the following formula.
X is the average of the primary image X; Y is the average of the SLC-off image Y, X v the standard deviation of X, and Y v the standard deviation of the SLC-off image Y. Through the radiometric correction computed using this simple formula, the correct pixel to be inserted into the lost region can be found. Commonly it is set based on a primary image unaffected by atmospheric conditions like clouds. If clouds are present in the SLC-off image to be corrected, their presence will not be reflected in the corrected image, causing a large error. This issue may also arise because of seasonal vegetation or agricultural land or by coastal areas affected by the temporally changing algal bloom phenomenon (Storey et al. 2005).
Because gaps occurring in Landsat 7 ETM+ SLC-off images have pixels of up to 14 on bands with a 30 m resolution, pixels of up to 7 on thermal infrared bands with a resolution of 60 m, and pixels of up to 28 on panchromatic band with a resolution of 15 m. Data processing is identical to that of GLHM with lost pixels forming a 19 × 19 window on 30 m, a 11 × 11 window on thermal infrared bands with a resolution of 60 m, and a 31 × 31 window on panchromatic bands with a resolution of 15 m. This is referred to as LLHM. Compared to GLHM, clouds and temporally changing land surface are well reflected, but as expected, there are errors due to clouds or other seasonal sensitive topography like vegetation, farmland and seacoasts (Storey et al. 2005).

GIF Method
The GIF method can correct lost regions within an SLC-off image without using SLC-on imagery or imagery from other sensors. Therefore, it is unaffected by atmospheric conditions like clouds or seasonally changing elements like vegetation, farmland, and seacoasts. Existing methods for interpolating within a current SLC-off image all feature two-dimensional interpolation. We have chosen to use onedimensional interpolation as the X-axis of an SLC-off image because it is narrower than the Y-axis. Figure 2 is a mimetic diagram of the GIF method's data processing. The one-dimensional, Y-axis-based interpolation explained in the picture works using monotone cubic Hermite interpolation, while the X-axis-based smoothing filtering works using the Savitzky-Golay method. Algorithms in previous studies for solving the SLC-off problem use complicated computational procedures and require a SLC-on optical satellite scene. We suggest using a simple interpolation algorithm based on one-dimensional processing in this study. Figure 3 is a view that explains the first interpolation method and a second interpolation method according to another exemplary embodiment of this study. When carrying out the onedimensional interpolation, the first interpolation unit carries out the one-dimensional interpolation for a missing pixel using normal pixels, a Y-axis based on the missing pixel in each satellite image according to the spectral information (Fig. 3a). The second interpolation unit carries out twodimensional interpolation for the satellite images in which each one-dimensional interpolation is carried out according to the spectral information. The second interpolation unit carries out two-dimensional interpolation for a missing pixel using the normal pixels within a predetermined range based on the missing pixel in the satellite images in which the one-dimensional interpolation is carried out (Fig 3b). Cubic spline interpolation, by interpolating through a value that supersedes maximum and minimum regions containing primary data, is often deemed unsuitable by scientists (Fig. 4). To solve this issue a monotone cubic Hermite interpolation technique that interpolates by protecting the monotonicity of circular data was developed (Fritsch and Carlson 1980). The present research unfolds under the presumption that lost pixels along the Y-axis do not exceed the boundary of the pixels anterior and posterior values to the lost region. Its equation is as follows: If the interpolating data points are (x k , y k ), and k = 1, …, n, the slope between these two points is: As the mean of the line connecting these two points, the tangent of all data points is as follows.
should k a or k 1 bbe computed to be less than zero, the input data points are not strictly monotone and (x k , y k ) is a local maximum value. In such case piecewise monotone curves m k = 0 can be set to zero and maintains monotonicity. To ensure monotonicity and prevent overshoot at least one of the following conditions must be met: When the above is equal to or greater than zero, or + -must be true. If monotonicity must be strict then ( , ) z a b must have a value strictly greater than zero. One simple way to satisfy this constraint is to restrict the magnitude of vector ( , ) and b is calculated in the same way (Fritsch and Carlson 1980).
If one of these conditions is true, using x k , y k , and m k , it is calculated the same as the cubic Hermite spline (k = 1, …, n). If x upper is the least value of all values greater than x, and x lower is the greatest value of all values lesser than x, x lower ≤ x ≤ x upper .
t h The interpolation (IP) is calculated as follows.
Where, h 00 , h 01 , h 10 , and h 11 are the cubic Hermite spline functions (Catmull and Rom 1974). Figure 5b is the result of monotone cubic Hermite interpolation on the Y-axis of the SLC-off Landsat image Fig. 5a. The width of the Y-axis gap in narrower compared to the width of the X-axis of the SLCoff image and the width of the Y-axis gap is narrower. This proves that Y-axis-based one-dimensional interpolation is an improvement over two-dimensional interpolation (Fig. 5b). However, as seen in Fig. 5c, because the pixel transformation along the X-axis is ignored, there is a distortion in the pixels. The Savitzky-Golay smoothing filter is applied to compensate for this. Savitzky and Golay (1964) presented an algorithm that finds the polynomial method of least square with a k-degree best revealing the surrounding points, and identifies data values at these points when given data of a certain width, including noise. This smoothing filter does a relatively good job of preserving the minimums and maximums, and the widths of the peaks of a given range of data. We can assume the X-axis pixel distortion seen in Fig. 5c is due to noise in its persistent pixel value. The Savitzky-Golay smoothing filter is used to decrease the noise present in the X-axis pixels. Its equation is below.
Here S is a pixel of the SLC-off image interpolated along the Y-axis, S * is the smoothing filtering result, and C i the number of pixels from the SLC-off image i outputted from the Savitzky-Golay smoothing filter. N, is the number of pixels counted within the filter window, and is the same as the filter size. j is the number of pixels in the SLC-off image interpolated one-dimensionally along the Y-axis. It counts the pixels as the filter window moves along index j. The filter size is 2m + 1. Here m is half the size of the filter window (Savitzky and Golay 1964;Chen et al. 2004Chen et al. , 2011. Because the correction quantity increases per any increase in the filter size, its curves are smooth. No blurring occurred throughout the present research and the length of the filter that guards against pixel contorting noise was set to 5 pixels (m = 2). Here the value of the filtered pixel S * is expressed as (Savitzky and Golay 1964;Steinier et al. 1972 Figure 5d shows the corrected pixel results being reinserted into the lost region via the Savitzky-Golay smoothing filter. Compared to Fig. 5c the pixel transformation along the Xaxis is much smoother.

RESuLTS
Three SLC-on Landsat 7 ETM+ images were collected, masked and simulated into SLC-off images to test the GIF method accuracy. Each image was run through the GLHM, LLHM, and the GIF method. The results from each method were compared to each respective SLC-on image to calculate the RMSE values (Fig. 6). Figure 6 shows the data processing from the present research. Table 1 shows the imagery data collected. Two tracks of images with regionally distinct factors were collected. On  the track where path/row = 039/028, images where clouds were both present and not present were collected. Primary images with adjusted pixel values from the GLHM and LLHM are needed to match the statistical distributions between the primary SLC-on image and the SLC-off image to be corrected. When this is so, the primary image is closest in time to the simulated SLC-off image, and collected images taken during the same season when clouds are not present can reduce errors in the correction process (Storey et al. 2005). This time interval was short in this research; images from 29 April 2000 (path/row = 116/036) and 26 August 2001 (path/ row = 039/028) were collected as primary images to be put through the GLHM and LLHM (Table 1). Figure 7 shows the results from applying the GLHM, LLHM, and the GIF method to three images to solve a Landsat 7 ETM+ (SLC-on) and a simulated SLC-off phenomenon.

Gap-Filled Results of the Simulated SLC-Off Images
The GLHM, which showed the poorest results on all three images, used primary images taken during the same season for correction and consequently showed a relatively high land imagery correction ratio compared to temporally sensitive sea images or those with clouds (Figs. 7d -f). The LLHM, which used the same primary image as the GLHM. Comparison to Figs. 7d -f shows improved results in Figs. 7g -i. However, there is correction error in some regions where vegetation growth differs from that in the primary image, such as with the algal bloom phenomenon. The GIF method developed in this research provides correction results nearly identical to the original image when seen by the naked eye, despite correction performed with no additional imagery data .
RMSE values were calculated to differentiate each set of results from the original images (Table 2). The value of the image taken on 16 April 2001 featuring the algal bloom phenomenon in a sea region was low by both the LLHM and the GIF method. However, as confirmed by Fig. 7g the LLHM results show errors in certain areas where the algal bloom boundary is unclear. Comparatively, the GIF method returned RMSE values less than 1 on all bands with fantastic two-dimensional correction (Table 2). Bands 4, 5, 6-1, 6-2, and 7, recording infrared wavelengths, showed little reflection on water. Therefore, RMSE values on these bands were low irrespective of the correction method used. In particular, the RMSE values for GLHM bands 6-1 and 6-2, recorded with thermal infrared sensors gauging the temperature of the earth face, were close to zero. This is presumably because this study used primary images from the same season with hydrothermal temperatures that are relatively stable. Within this narrow area of research, a similar hydrothermal temperature was distributed. However, besides the GLHM bands that do not have an infrared wavelength range, high RMSE values were seen on all other bands. In addition, on bands 1 and 2, which were most sensitive to the maritime algal bloom phenomenon, the RMSE values were 9.78 and 7.23 respectively. Except for sea imagery, on bands 6-1 and 6-2 with land and cloud formation imagery, the GLHM values were vastly higher (Table 2).
In a land image from 13 August 2002 with the same primary image as mentioned before, the GLHM returned RMSE values lower than the sea and cloud images. Furthermore, all bands, besides 4 and 6-2, showed the same RMSE values as the LLHM and GIF methods (Table 2). However, despite the low RMSE values, two-dimensional correction brings lackluster results and thus should be considered unsuitable for use by various monitoring systems (Fig. 7e). In this study, the GIF method returned the lowest RMSE values on all bands except for band 1 in correcting the image from 13 August 2002. Two-dimensional correction also did well (Fig. 7k, Table 2).
The two-dimensional GLHM correction results on the clouds image from 29 August 2002 were not only less than par, but returned RMSE values greater than 30 on the most bands (Fig. 7f, Table 2). Compared to the GLHM, the LLHM results were relatively better and returned lower RMSE values. However, pixels in certain cloud formations became distorted, as seen in Fig. 7i. This phenomenon is confirmed by Storey et al. (2005). On the other hand, the GIF method showed excellent correction results in the clouds image despite not relying on a primary image. A RMSE value was returned of approximately 1 over all bands (Fig. 7m, Table 2).
In comparison to those of the other bands from the same timeframe, low RMSE values were observed on all images used in this study on bands 6-1 and 6-2 due to low resolution of 60 m corresponding to the half of the resolution comparison with the other bands. As for the GIF method, which interpolates a single image for its correction, the lowest RMSE values were returned on the majority of the bands seen in all of the images used. RMSE values were especially low-at about or less than 1-on sea and cloud images (Table 2).

Accuracy of GIF Method
The gap-fill correction accuracy is not aptly gauged by RMSE values alone, as images with low RMSE values may have errors in their two-dimensional interpolation, as shown in Fig. 7g. Accordingly, this study engaged in additional accuracy analysis to generate a DN scatter plot for the correction results on original images corrected via the LLHM and GIF methods (Fig. 8). The blue scatter plot in the graph of Fig. 8 represents the correction results for the sea image taken on 16 April 2001. The red scatter plot shows the results for the land image taken on 13 August 2002. The green scatter plot displays the cloud image results taken on 29 August 2002. As seen in Fig. 8, the scatter distribution calculated via the GIF method tended to be denser overall in images for all bands compared to the LLHM scatter distribution.
Because changes in land topography are more pronounced, such changes can occur in even narrow gaps. For both the LLHM and the GIF methods, the land scatter density is significantly loose compared to cloud or sea scatter plots (Fig. 8). On all land image bands, scatter density was higher on the GIF method correction results compared to the LLHM. In order to confirm this numerically, a coefficient of determination defined R2, which reflects how much original data a regression straight line equation will reflect was calculated. In land scatter plots the coefficient of determination for the GIF method, was on average, improved by 0.02 compared to the LLHM method. In particular, the coefficient of determination on bands 1, 5, and 7 gave better than average results with a higher scatter density (Fig. 8).
The coefficients of determination for cloud scatter was close to 1 on all bands processed using the GIF method, and almost matched identically with linear regression (Fig. 8)  = + , a is the slope of the straight line, with b as the constant defining the y section. The cloud scatter for the GIF method carried a close to 1 on all bands and b close to zero (Fig. 8). These figures mean the GIF method shows results similar to those in the original image. Therefore, unlike with the LLHM, which brings about errors in cloudy regions, the GIF method is capable of great cloudy imagery results (Storey et al. 2005).
As for sea scatter plots, their pixel distribution is narrower than that of land scatter plots. This phenomenon is particularly noticeable on infrared wavelength bands. This is because, as mentioned previously, water reflection is lessened on infrared wavelength bands. On the graphs of bands 4, 5, and 7, all sea scatter plots held a DN value less than 50 (Fig. 8). The sea scatter plots seem to show dense distributions on all bands processed by both the LLHM and GIF methods. However, the GIF method brought a coefficient of determination, R2, improved on average by 0.05. Amongst the 3 scatter plots, the coefficient of determination for the sea scatter plot improved by the greatest margin when processed via the GIF method as opposed to the LLHM. Improvement in the coefficient of determination means the regression straight line is better reflecting its data. If we think of this in another way, the GIF method results are a better fit with the original images. Therefore, as with cloud images, the GIF method is more accurate in sea image correction when compared to the LLHM.

DISCuSSIOn
The RMSE value analysis results reveal the GIF method returned RMSE values that were greater across the board than those produced by the other methods. Specifically, they were on average 1.42 times greater than GLHM and 1.3 times greater than LLHM on land images, 4.11 times greater than GLHM and 1.28 times greater than LLHM on sea images, and 40.02 times greater than GLHM and 2.74 times greater than LLHM on cloud formation images. The GIF method's superior accuracy over two-dimensional correction methods was also confirmed (Figs. 7 and 8, Table 2). Even if it does not need a SLC-on primary image, the GIF method provides a particularly higher level of correction accuracy on images with sea or clouds than land covering extreme changes in topography. This is because the GIF method corrects via the pixel value of a single image and therefore the data on vastly changing topography within a gap cannot be comprehended. Accordingly, the GIF method is better suited for use on sea or mountainous area, plains and farmland images than urban centers. Because it reflects changes in time, it is well suited for use by systems monitoring landslides, volcanoes, forest fires, seas, and agricultural land. However, because the GIF method shows a slump in accuracy as gaps widen, if the areas being researched are far from the bottom of the Landsat 7 ETM+ SLC-off image, the GIF method cannot guarantee perfect correction results. As long as the gaps within an image are less than 30 m resolution and fewer than 7 pixels, the GIF method will provide excellent correction results. Table 3 shows the time exhausted by each correction method in calculating one band under identical conditions. The LLHM uses the pixel as its unit of calculation, and therefore requires the longest amount of time. The GLHM showed the fasted processing time. However, the GLHM's low accuracy means it is unsuitable for use by monitoring systems despite its speed. In contrast, the GIF method processes data some 108 times faster than the LLHM and with superior accuracy. Because it does not use additional imagery, only 3 minutes is required to process a Landsat 7 ETM+ SLC-off image from a particular time. For these reasons, the GIF method should contribute to an increase in open-source Landsat 7 ETM+ SLC-off image use.

COnCLuSIOnS
With the failure of the Landsat 7 ETM+ on 31 May 2003 SLC, the Landsat 7 ETM+ gradually lapsed into a SLC-off state. Newly existing gaps within the Landsat 7 ETM+ images struck a painful blow to the plethora of monitoring systems that had come to depend on its images. The majority of gap-fill techniques developed to overcome this setback are contingent upon using SLC-on primary imagery. Such techniques suffer for their unreliable rendering of various factors that change over time, and are therefore unsuitable for monitoring systems. Furthermore, errors occur when trying to correct an image wherein atmospheric conditions are observed such as clouds or the temporal algal bloom phenomenon.
This study seeks to circumvent these drawbacks through the GIF method, which provides a high level of accuracy in correcting pixels from a single image via the one-dimensional interpolation and filtering techniques. The GIF method accuracy was tested alongside the GLHM and LLHM techniques developed by NASA and the USGS, respectively. This study compared simulated SLC-off images with original images for the pixel distribution and calculated RMSE values between each image. An analysis of the coefficients of determination of two-dimensional correction results, in terms of the RMSE and pixel distribution, show the GIF accuracy is greatest in sea, land, and cloud formation images. In particular, sea and cloud images, where there are no extreme changes in topography, the RMSE values lie close to or less than 1 over all bands. The coefficient of determination for cloud images held a value close to 1

Method GLhM LLhM GIF
Processing Time (sec) 1.16 1065.57 9.86 Table 3. Processing time of each gap-filled method. over all bands.For sea images, the GIF method returned, on average, a coefficient of determination improved by 0.05 as compared to that of the LLHM. In land images as well, pixel distribution density tended to be high over all bands. The GIF correction method developed through this study does not make use of imagery acquired at previous times or by other sensors. It makes quick and easy work of individual SLC-off image correction. Its processing speed is 108 times faster than that of the LLHM and it is more accurate. The GIF method shows a slight decrease in accuracy, but is still reliable in topography where the research area is far from the image's bottom with a large gap, or where the terrain changes drastically, such as in urban centers, compared to images taken of consistent topography, such as sea or mountain ranges, or vast agricultural land. We expect the GIF method to bring great results for natural topography monitoring, and predict that it will increase Landsat 7 ETM+ SLC-off imagery usage, and serve as a tremendous resource for monitoring systems that rely on Landsat imagery.