Mapping CHM and LAI for Heterogeneous Forests Using Airborne Full-Waveform LiDAR Data

Canopy height model (CHM) and leave area index (LAI) are essential forest structure attributes that are estimated to understand the ecological states and processes occurring in forest ecosystems. Airborne light detection and ranging (LiDAR) systems have proven efficient in producing both CHM and LAI maps for heterogeneous forests at the regional scale. The unique advantage of airborne LiDAR over optical and radar sensors is its vegetation penetration capability. Although the LiDAR penetration capability decreases in dense, complex forests, full-waveform LiDAR systems are currently available to provide critical point observations under the forest canopy. This research developed and tested methods to map CHM and LAI in heterogeneous forests using airborne waveform LiDAR datasets acquired using two different LiDAR systems and flight altitudes. Since using waveform data significantly increases the laser penetration rate, the test results strongly recommend using waveform data for the estimation of both CHM and LAI. These experiments also revealed that the flight data collection altitude will not affect LAI estimation. Through the analysis of CHMs and LAI data variations derived from 4 different datasets, CHM estimation may be good to 0.8 m and LAI estimation may be as precise as 0.5.


InTRoDUCTIon
Quantitative characterization of vertical and horizontal forest structure at landscape scales plays an important role in multiple-use forest management.Among the forest structure attributes, canopy height model (CHM) and leave area index (LAI) are two of the most important parameters.The CHM and LAI parameters represent the forest vertical structure and composition respectively, and provide essential information for understanding the ecological states and processes in forest ecosystems (Lefsky et al. 2002).
The quantitative procedure of forest structure relies on field measurements and data processing.Although ground measurements are accurate and have been applied for decades, they are far from enough to map the structural parameters of a large area.Many remotely sensed images, from both aircraft and spacecraft, have been used successfully to retrieve forest structural parameters (Roberts et al. 2007).
Although remote sensing techniques are highly suitable for a wide range of observations (Lillesand et al. 2004;Li et al. 2010;Kumar et al. 2013), mapping forest structure has been challenging for early remote sensing systems with existing optical and radar sensors due to the limitations of sensing vertical information.Optical and radar systems can provide canopy surface height information based on multi-angle data, but observations for the forest canopy interior are frequently invalid.
Light detection and ranging (LiDAR) systems are active sensors capable of collecting three-dimensional spatial information.LiDAR is currently attracting much attention from the forestry community as a rapid and efficient tool for obtaining forest inventories (Nelson et al. 1988;Lefsky et al. 1999;Zimble et al. 2003;Solberg et al. 2006;Hyyppä et al. 2008;Hopkinson and Chasmer 2009).Ground-based (or terrestrial) LiDAR can be used to measure forest structure in detail, but these measurements are at local scales (Yao et al. 2011).Equipped with accurate position and attitude sensors, airborne or space borne LiDAR systems are able to collect forest structure measurements at regional or global scales.However, space borne LiDAR cannot provide detailed enough spatial resolution information due to the large incident laser beam footprint from space.Airborne LiDAR systems deliver spatial datasets in sub-meter level resolution that provide detail and accurate measurements of target areas.The unique advantage of airborne LiDAR over optical and radar sensors is its vegetation penetration capability (Baltsavias 1999).This penetration characteristic especially enables the collection of terrain elevation under a forest canopy, so that it improves the accuracy of the derived digital elevation model (DEM) (Wang 2012).Forest structure and biomass studies benefit greatly from airborne LiDAR systems (Nelson et al. 1988;Vargas et al. 2002;Popescu et al. 2011).Those factors allow airborne LiDAR systems to produce both useful CHM and LAI maps of heterogeneous forests at the regional scale, which may reveal forest age, local topographic condition, or local growing environment characteristics (Baldocchi et al. 2002).
The LiDAR penetration capability decreases in dense, complex forests (Solberg 2010).Consequently, the number of ground points maybe largely reduced in a LiDAR dataset and the derived DEM can be degraded.Recent advances in laser scanning technology have rendered commercial airborne full-waveform LiDAR systems, which are capable of recording the complete waveform of each returning signal (Wagner et al. 2006;Alexander et al. 2010).For the forest area dataset one may retrieve about 20% more ground points using a refined waveform data echo detector (Wang 2012), so that the DEM generation quality can be maintained.This is a significant merit of using waveform data.
This study estimated the CHM and LAI of a heterogeneous forest using airborne waveform LiDAR datasets.It is commonly expected that DEM and DSM (digital surface model) are regular products after LiDAR data are processed.The CHM derivation is then obtained directly by taking the difference between DSM and DEM.Solberg (2010) indicated that the canopy penetration and corresponding LAI could be derived from the laser penetration rate, i.e., from the fraction of echoes located on and below the canopy.Many researchers considered that the laser penetration variables are strongly related to the field-measured gap fraction and LAI.Alternative LiDAR penetration variables are derived in this study and compared for their suitability in mapping LAI.Three LiDAR datasets acquired with Leica ALS60 and Riegl Q680i systems at the same area and in the same season were tested.We also tested how much the LAI calculation can be improved using LiDAR waveform data (Ma et al. 2015;Nie et al. 2016).

Study Site and In-Situ Data Collection
The study site is located in Najenshan, an ecological reserve area in southern Taiwan (Fig. 1a).The elevation of this area ranges from 10 -460 m.This area remains a natural tropical heterogeneous rainforest, which is the only lowattitude primeval forest in Taiwan.Two test areas, A and B, as shown in Fig. 1a, were selected for data analysis.The forests in these two areas are named forest A and B respectively.Forest A is windward side with the average forest canopy height less than 10 m (Fig. 1b).Forest B is leeward side with many trees higher than 15 m (Fig. 1c).
An LAI 2000 Canopy Analyzer was utilized to obtain the in-situ LAI data.The LAI 2000 is designed to estimate LAI indirectly from radiation measurements above and below the canopy, based on the relationship between the leaf area and canopy transmittance (Stenberg et al. 1994).For instrument calibration two LAI 2000 measurements taken under the canopy and on an open-sky area should be operated simultaneously.A calibrated LAI can then be calculated by taking the ratio of brightness from the two places.The open sky area measurements were taken from a grass filed (Fig. 1b) in the test area.
In order to match the in-situ LAI measurements with LiDAR derived LAI variables, the ground survey positions should be determined.Although GPS positioning is an ideal technique for accurate positioning, GPS signals are mostly blocked when LAI measurements are taken under a canopy.We set two control points on the grass field with GPS and determined the measurement locations by traversing with a total station.Due to the limitation in performing a ground survey under the dense test area B forest, we collected measurements only in test area A. There were 10 LAI measurements acquired.

Airborne LiDAR Data
Three airborne LiDAR datasets were applied for the test.The first two datasets were acquired with Leica LAS60 at flight heights of 1000 and 2500 m, respectively.The third dataset was collected using Riegl LMS-Q680i at a flight height of 1000 m.Table 1 shows the scanning specifications for the three datasets.
Each dataset includes originally delivered point clouds and waveform data.The refined point clouds were also extracted from waveform data using the wavelet-based echo detector (Wang 2012).Since the Riegl system records waveform data directly without performing echo detection, the delivered point cloud data from Riegl software are actually derived from waveform data.We therefore treated the Riegl delivered data as waveform point cloud.In summary, four point-cloud datasets were involved in the test, which are listed as follows: Lei-Orig-L: The original point-cloud dataset delivered by Leica system of low-altitude scanning.Lei-Wave-L: The refined point-cloud dataset derived from Leica waveform data of low-altitude scanning.Lei-Wave-H: The refined point-cloud dataset derived from Leica waveform data of high-altitude scanning.Rie-Wave-L: The refined point-cloud dataset derived from Riegl waveform data of low-altitude scanning.
It is important to check some point-cloud dataset fac-tors including the total number of laser hits (TNLH), laser density (LD), total number of points (TNP), point density (PD), total number of ground points (TNGP), and ground point density (GPD).Tables 2 and 3 summarize the corresponding four point-cloud dataset factors for test areas A and B, respectively.To examine the benefits of using waveform data one can compare the numbers of Lei-Org-L and Lei-Wave-L datasets.Regarding test area A, TNP and TNGP are increased by about 16 and 8%, respectively.This means that when waveform data are available we can have 16% more under canopy points and 8% more ground points.These increasing rates are much higher for the test area B datasets.They are about 30 and 66%.This indicates that using waveform data is significantly important for dense forest areas in improving the penetration rate.Paying attention to the high-altitude scanning dataset, one may notice a phenomenon that the GPD factors are almost maintained the same as low-altitude scanning, even though the corresponding PD factors drop dramatically.This phenomenon may be due to the enlarged laser footprint of high-altitude scanning, which increases the possibility of vegetation penetration.
When the GPD factors are maintained we would expect that the high-altitude scanning datasets will be as good as lowaltitude scanning for CHM and LAI mapping.One would also notice that all of the GPD factors for test area B are much smaller than those for test area A. Increasing the penetration rate is, therefore, critical for a very dense forest.

CHM estimation
The products delivered by a regular airborne LiDAR project would include DEM and DSM.One can get the corresponding CHM by subtracting DEM from DSM (Kellner and Asner 2009).Figure 2 shows the CHM estimation concept.However, for a dense forest, the DEM quality may be degraded due to insufficient detected ground point distribution.Under this circumstance using waveform LiDAR will increase the number of ground points by about 20% (Fig. 3), so that the DEM quality can be improved (Wang 2012).The differences between DEMs generated from discrete-echo and waveform data could be up to a couple meters in some dense vegetation areas (Fig. 4).This strongly suggests that waveform data should be applied if high quality CHM estimation is required.for point classification.The outlier points are considered as abnormal measurements and should be removed.The ground points group will be used to generate DEM.The second step is laser penetration index (LPI) calculation.This calculation involves DEM and non-ground points data.The final step is tuning the relationship between the calculated LPIs and the corresponding LAI in-situ measurements.A linear regression model is usually applied for this tuning process.Once the regression equation is determined we can translate all LPIs into LAI parameters for all data-covered areas.Figure 5 shows the LAI estimation procedure using LiDAR point-cloud data.

Calculation of LPIs
A variety of functions have been proposed in the last decade for LiDAR LPI calculation.We applied five different LPI calculation functions.The first three are found in the literature (Solberg et al. 2006;Solberg 2008;Zhao and Popescu 2009;Hopkinson and Chasmer 2009), and the other two were developed in this study.
The first LPI function is proposed by Solberg et al. (2006).By counting the number of ground points, this function simply takes the ratio of the total numbers of ground points (N g ) and all points (N total ), expressed as: The second LPI function was also proposed by Solberg (2008) to take the advantage of LiDAR intensity information, and is formulated as the ratio of the summation of ground-point intensities (I g ) and all point intensities (I): The third function is a modified function of the first one.In order to avoid LPI under estimation, Zhao and Popescu (2009) suggested counting the total number of emitted laser pulses rather than counting all points.This function is expressed as the ratio the total numbers of ground points (N g ) and emitted laser pulses (N l ): In this function the single-return ground points are actually counted twice, i.e., in N g and N sg .This can be considered as a weighting effect on the single-return ground points, which emphasizes the forest canopy sparseness.When LiDAR waveform data are available continuous backscattered energy signals are recorded.Features derived from waveform data such as echo width and backscatter cross-section have been demonstrated useful in many applications; for example land cover classification and DEM generation.We propose a new LPI function that uses waveform data.Based on the study of Wagner et al. (2006), the laser backscatter cross-section can be defined as the backscattering characteristics of a target, so that one can obtain the illustrated area (A s ) of a return signal as: where X is the backscatter solid angle, t is the reflectivity, and v is the laser backscatter cross-section.The t and v values can be derived from waveform data after the radiometric calibration (Alexander et al. 2010).With illuminating Lambertian surfaces assumption the backscatter solid angle, X, can be substituted by r: According to Eq. ( 6) we can calculate the illustrated area of a ground point (A g ) and the illustrated area of a canopy point (A c ).Therefore, the fifth LPI function is formulated / / (7)

Regression Model
A linear regression model is applied to tune the relationship between the calculated LPIs and the corresponding LAI in-situ measurements.For an observation site i, the measured LAI i is a linear function of estimated LPI i , i.e.,

LAI L PI
The linear regression is applied to find the solution for coefficients a and b with the condition of best fit.After the regression the coefficient of determination, or called Rsquared (R 2 ), is calculated to evaluate how well the data fits the statistical model.The adjusted R 2 is applied to obtain an unbiased estimation of R 2 by taking into account the degree of freedom (Li et al. 2010).

Search Area Determination for LPI Calculation
A search area in the LiDAR point cloud should be assigned to constitute the involved points for LPI calculation.Since we are correlating the calculated LPI to the corresponding LAI ground measurements, it is reasonable to set the search area the same as the LAI 2000 receiving scope.
The LAI 2000 uses a fish-eye optical sensor which has 148° field-of-view (from zenith 0° to zenith 74° towards each horizontal direction), so that the search area can be a circle defined with a given radius.In a 3D view the search space is a cylinder (Fig. 6a).However, the actual LAI 2000 receiving scope is subject to the local canopy height and forest structure.A proper search radius is still unknown.A proper search radius is determined in this study by testing the correlation between the measured LAI values and calculated LPIs when setting different search radii from 1 -15 m (Fig. 6b). Figure 6c indicates the LAI-2000 measurement setup.Each LAI measurement is obtained by averaging the eight observations surrounding the center of a test site.
A test was conducted to determine the proper search radius for LPI calculation.By changing the search radius from 1 -15 m, the R 2 values are estimated by correlating the LAI measurements with all derived LPIs from each LiDAR dataset corresponding to the search radii.The diagrams in Figs.7a -d show the value changes in R 2 versus search radius for all possible LPI calculations with respect to the four LiDAR datasets.By inspecting the diagrams we concluded that the most proper search radius is 13 m, which is therefore used to estimate LAI for the datasets applied in this study.

CHM Results and Data Variations
All LiDAR datasets for test areas A and B were carefully processed to generate DEM and DSM, so the corresponding CHM was obtained by subtracting DEM from DSM.For each test area four CHMs were generated using Lei-Orig-L, Lei-Wave-L, Lei-Wave-H, and Rig-Wave-L datasets, respectively.We took the mean CHMs as the results.Figure 8 shows the averaged CHMs for the four derived CHMs for forests A and B, in which the canopy height variations are intuitively presented.One may easily notice that the canopy height of forest B is much higher than that of forest A. The image in each CHM also clearly exhibits the canopy structure.One can certainly derive the woody volume, e.g., the whole volume between the canopy and ground surfaces of a certain forest area.This may be an important factor for the estimation of above ground biomass and carbon stocks (Rich 1990).
We also plotted histograms for the two averaged CHMs to show the frequency distributions for tree height in these two test areas (Fig. 9).One can see that the tree heights of forest A are lower than 16 m and most trees are about 6 m.On the other hand, the tree heights of forest B are lower than 21 m and most are about 10 m.The area l averaged canopy height of forest A is 5.04 m, and the area l averaged canopy height of forest B is 9.79 m.This means that forest B is about two times thicker than forest A. It also indicates that the age of forest B is greater than forest A, or forest B has much better local topographic conditions or growing environment than forest A.
In order to show the variations in the derived CHMs, we also calculated the standard deviations (STD) for the CHMs. Figure 10   than forest A mainly due to the larger DEM discrepancies in test area B. The reason for this can also be interpreted as the laser penetration rate of test area B is lower than that of test area A.

Linear Regression
Linear regression is applied to tune the relationship between the derived LPIs from LiDAR data and the corresponding LAI in-situ measurements.For each dataset the regression is applied with 5 different LPI calculations except Lei-Orig-L, which does not have LPI 5 .The regression results are shown in Table 4, including the regression coefficients a and b as well as the coefficient of determination R 2 .The coefficient of determination indicates the fitness between the derived LPIs and the corresponding LAI in-situ measurements.
First of all, we examined the improvement in regression when waveform data are applied.We compared the Lei-Orig-L and Lei-Wave-L dataset performance.These two datasets were actually collected with Leica LAS60 in the same flight mission.Lei-Orig-L is the original point-cloud dataset delivered by the Leica system, while Lei-Wave-L is the refined point-cloud dataset derived from waveform data.Table 2 shows that Lei-Wave-L has about 8% more ground points.
The GPD, therefore, increases from 1.4 -1.5 (pt m -2 ).This accordingly improves the DEM generation quality, which is a significant merit of using waveform data.Comparing the R-squared coefficients resulting from Lei-Orig-L and Lei-Wave-L, the advantage of using waveform data is obvious for all LPI functions.This test strongly suggests using Li-DAR waveform data for LAI mapping.
Second, we tested the flight height effect on data collection.The Lei-Wave-L and Lei-Wave-H datasets were intentionally collected for this test.Comparing the point number and density factors in Table 2, Lei-Wave-H has much lower TNLH, LD, TNP, PD, TNGP, and GPD than Lei-Wave-L.It is the nature of laser scanning that the higher the flight height, the lower the dataset point density.However, increasing the flight altitude increases the ground swath and coverage accordingly, which makes data collection more efficient.Comparing the R-squared coefficients resulting from Lei-Wave-L and Lei-Wave-H, the correlation can be maintained that data collection improved when the flight altitude reaches to 2500 m except for LPI 5 .This suggests that high flight altitude is acceptable for LAI mapping.We originally expected that using LPI 5 would improve the result, but it turned out to be a worse case.We do not have a theory to explain this outcome.Perhaps, the function applied to estimate laser beam illustrated area is not quite good for the high flight case.Third, the LAI mapping performance using different LiDAR systems was also examined.The Lei-Wave-L and Rie-Wave-L datasets were used for this test.Identical flight conditions were applied for these two datasets, which resulted in similar point number and density factors, as shown in Table 2.Although the overall Riegl LMS-Q680i performance was better than Leica ALS60 based on the R-squared coefficients comparison, we conclude that both instruments are qualified for LAI mapping.
The ultimate goal of this experiment was to choose the most appropriate LPI function for LAI mapping.By comparing the column average of R-squared coefficients in Table 4, one can find that LPI 4 outperforms the others.The R-squared coefficients of LPI 4 remain consistently higher than 0.55 in all cases, even although they may not be the best one for some datasets.We hereby use LPI 4 to derive the LAI maps for the two test areas.Although the regression coefficients a and b were determined using the LAI ground measurements acquired in test area A, it is assumed that they can be applied for test area B to map LAI as well.This assumption was made because both test areas were to the same type of heterogeneous forest area.The ground measurements obtained in test area B were needed to find the most suitable regression equation.However, it is not valid due to the positioning difficulty inside the forest.

LAI Results and Data Variations
LAI maps of forests A and B were estimated with each LiDAR dataset using the LPI 4 function and its corresponding regression coefficients.For each test area four LAI maps were generated using Lei-Orig-L, Lei-Wave-L, Lei-Wave-H, and Rig-Wave-L datasets, respectively.We took the mean LAIs as the results.Figure 11 shows the averaged LAI maps for forests A and B, in which the vegetation den-sity distributions are intuitively presented.One may easily notice that the vegetation density of forest B is much higher than that of forest A. Comparing Figs. 8 and 11 similar distribution patterns can be found in CHM and LAI maps.Both partially exhibit the forest structure and are important in estimating the forest biomass and carbon stocks (Ni-Meister et al. 2010).
We also plotted histograms for the two averaged LAIs to show the forest density frequency distributions (Fig. 12).One can see that the LAIs of forest A are lower than 6.3 and most are 5.8 except for the non-forested area.On the other hand the LAIs of forest B are lower than 6.8 and most are 6.6.The area l averaged LAI of forest A is 3.37 and the area l averaged LAI of forest B is 5.52.Corresponding to the estimated CHM values, this indicates again that forest B is much thicker than forest A. This difference may related to differences in forest age, local topographic condition or local growing environment (Baldocchi et al. 2002).
In order to show the variations in the derived LAIs, we also calculated the STD of the LAIs. Figure 13 shows the STD of the four derived LAIs for forests A and B. This figure presents the degree of consistency among the derived LAIs from different datasets.The average STDs for forests A and B are ±0.69 and ±0.51 respectively.

ConCLUSIonS
This paper demonstrates that contemporary commercial airborne LiDAR systems are efficient in producing both CHM and LAI maps for heterogeneous forest areas.This is especially true for those LiDAR systems that are capable of collecting waveform data to improve the vegetation penetration capability.We developed and tested some methods to map CHM and LAI for a heterogeneous forest using airborne waveform LiDAR datasets acquired using two different LiDAR systems and flying altitudes.It can be concluded that GDP is essentially the most important factor for CHM and LAI estimation.The higher GDP will lead to higher CHM and LAI estimation accuracy.We recommend that the collected data have GDP higher than 0.5 pt m -2 to obtain reliable CHM and LAI estimation.Since using waveform data significantly increases the laser penetration rate, the test results strongly recommend using waveform data for both CHM and LAI estimation.Our experiments also revealed that increasing the flight data collection altitude would not affect CHMs and LAIs estimation accuracy.
In order to find an appropriate LPI function, five Li-DAR LPIs were tested for LAI estimation.Linear regression was applied to tune the relationship between the derived LPIs from LiDAR data and the corresponding LAI in-situ measurements.The R-squared coefficients for LPI 4 remain consistently higher than 0.55 for all datasets.We therefore considered LPI 4 the best one.We expect that the regression coefficients a and b determined using LAI ground measurements acquired in test area A can be applied for other heterogeneous forest areas.
One may expect to know the quality of CHM and LAI estimation for a heterogeneous forest.Through the analysis of CHMs and LAIs data variations derived from 4 different datasets, the CHM estimation may be good to 0.8 m and LAI estimation may be as precise as 0.5.
Fig. 1.(a) Aerial image of the test area; (b) the enlarged image of test area A; (c) a ground view in test area A; (d) a ground view of test area B. (Color online only) Fig. 2. The CHM estimation concept: one can subtract (a) DEM from (b) DSM to get (c) CHM (Kellner and Asner 2009).(Color online only) suggested that one should only count the number of single-return ground points for LPI 1 calculation(Morsdorf et al. 2006;Solberg 2010).Instead of taking only the number of single-return ground points (N sg ) into account, we suggest adding this number into the first function as a modification.Therefore, we propose the fourth function as:

Fig. 4 .
Fig. 4. Differences between DEMs generated from discrete-echo and waveform data.(Color online only) Fig. 5. Procedure of LAI estimation using LiDAR point-cloud data.

Fig. 6 .Fig. 7 .
Fig. 6.Diagram of scope searching strategy for LPI calculation: (a) constituted points for LPI calculation; (b) searching radius changing from 1 -15 m; (c) LAI measurement set up at a test site.(Color online only)

Fig. 9 .
Fig. 9. Histograms of averaged CHMs for forests A and B.

Fig. 10 .
Fig. 10.Standard deviations of the four derived CHMs for forests A and B.

Fig. 11 .
Fig. 11.Averaged LAIs of the four derived LAIs for forests A and B. (Color online only)

Table 1 .
Three major steps are involved in LAI estimation using LiDAR point-cloud data.The first step is point classification filtering.The point cloud should be categorized into three groups: ground, non-ground, and outlier (air) points.The commercial software, TerraSolid, is used in this study Scanning specifications of the applied datasets.

Table 2 .
Characteristics of the point-cloud datasets of test area A.

Table 3 .
Characteristics of the point-cloud datasets of test area B.

Table 4 .
Regression results of all datasets.