Predicting soil properties in the Canadian boreal forest with limited data: Comparison of spatial and non-spatial statistical approaches
Journal article, Peer reviewed
Original version: Geoderma. 2017, 306, 195-205. 10.1016/j.geoderma.2017.06.016
Digital soil mapping (DSM) involves the use of georeferenced information and statistical models to map predictions and uncertainties related to soil properties. Many remote regions of the globe, such as boreal forest ecosystems, are characterized by low sampling efforts and limited availability of field soil data. Although DSM is an expanding topic in soil science, little guidance currently exists for selecting the appropriate combination of statistical method and model formulation when data availability is limited. Using the Canadian managed forest as a case study, the main objective of this study was to investigate to what extent the choice of statistical method and model specification could improve the spatial prediction of soil properties with limited data. More specifically, we compared the performance of every combination of eight statistical approaches (linear, additive and geostatistical models, and four machine-learning techniques) and three model formulations (“covariates only”: a suite of environmental covariates only; “spatial only”: a function of geographic coordinates only; and “covariates + spatial”: a combination of both covariates and spatial functions) to predict five key forest soil properties in the organic layer (thickness and C:N ratio) and in the top 15 cm of the mineral horizon (carbon concentration, percentage of sand, and bulk density). Our results show that 1) although strong differences in predictive performance occurred across all statistical approaches and model formulations, spatially explicit models consistently had higher R² and lower RMSE values than non-spatial models for all soil properties, except for the C:N ratio; 2) Bayesian geostatistical models were among the best methods, followed by ordinary kriging and machine-learning methods; and 3) comparative analyses made it possible to identify the best-performing models and statistical methods to predict specific soil properties.
We make modeling tools and code available (e.g., Bayesian geostatistical models) that increase DSM capabilities and support existing efforts toward the production of improved digital soil products with limited data.
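The abstract ranks models by R² and RMSE. As a reading aid, the sketch below shows one common way to compute these two metrics from held-out observations and predictions. It is not the authors' code: the function names, the labels "model A" and "model B", and all numbers are hypothetical and chosen only for illustration, and the abstract does not state the exact R² definition used in the study.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error; lower values indicate better predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot; higher is better."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical held-out values of a soil property (illustrative only,
# not data from the study).
observed = [2.0, 4.0, 6.0, 8.0]

# Hypothetical predictions from two candidate models.
predictions = {
    "model A": [3.0, 3.0, 7.0, 7.0],
    "model B": [2.0, 4.5, 6.0, 7.5],
}

# Score each candidate with both metrics: (R², RMSE).
scores = {
    name: (r_squared(observed, pred), rmse(observed, pred))
    for name, pred in predictions.items()
}

for name, (r2, err) in scores.items():
    print(f"{name}: R2 = {r2:.3f}, RMSE = {err:.3f}")
```

In a comparison like the one described above, such scores would typically be computed on data withheld from model fitting (for example, by cross-validation), so that models are ranked on predictive rather than in-sample performance.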