DANS - Data Archiving and Networked Services


Model overfitting using hyperspectral remote sensing

Cite as:

Duarte Rocha, A. (Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente) (): Model overfitting using hyperspectral remote sensing. DANS. https://doi.org/10.17026/dans-2c7-vdat

2017-12-31 Duarte Rocha, A. (Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente) 10.17026/dans-2c7-vdat

The growing number of narrow spectral bands in hyperspectral remote sensing improves the capacity to describe and predict biological processes in ecosystems. But it also makes it harder to fit empirical models to such high-dimensional data, which often contain correlated and noisy predictors. Because the sample sizes available to train and validate empirical models do not seem to be increasing at the same rate, overfitting has become a serious concern. Overly complex models overfit by capturing more than the underlying relationship and by fitting random noise in the data. Many regression techniques claim to overcome these problems by using different strategies to constrain complexity, such as limiting the number of terms in the model, creating latent variables, or shrinking parameter coefficients. This paper proposes a new method, named Naïve Overfitting Index Selection (NOIS), which uses artificially generated spectra to quantify relative model overfitting and to select an optimal model complexity supported by the data. The robustness of this new method is assessed by comparing it to traditional model selection based on cross-validation. The optimal model complexity is determined for seven different regression techniques, including partial least squares regression, support vector machine, artificial neural network and tree-based regressions, using five hyperspectral datasets. The NOIS method selects less complex models, with accuracies similar to those of the models selected by cross-validation. The NOIS method reduces the chance of overfitting, thereby avoiding models whose accurate predictions are valid only for the data used and which are too complex to support inferences about the underlying process.
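The abstract does not spell out the NOIS algorithm, so the following is only a minimal sketch of the general idea of using artificially generated spectra to gauge overfitting, not the author's implementation. It makes several assumptions that are not in the source: the artificial spectra are smoothed Gaussian noise, the model family is a principal component regression whose complexity is the number of components, the overfitting index is the apparent R² obtained when the real response is fitted to the artificial spectra, and the `tolerance` threshold and all function names are hypothetical.

```python
import numpy as np

def artificial_spectra(n_samples, n_bands, rng, smooth=5):
    """Random, spectrally smooth curves carrying no information about the response (assumed generator)."""
    noise = rng.normal(size=(n_samples, n_bands + smooth - 1))
    kernel = np.ones(smooth) / smooth
    # A moving average along the band axis mimics correlation between neighbouring bands.
    return np.array([np.convolve(row, kernel, mode="valid") for row in noise])

def pcr_r2(X, y, n_components):
    """Apparent (training) R^2 of a principal component regression with n_components."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    coef, *_ = np.linalg.lstsq(scores, yc, rcond=None)
    resid = yc - scores @ coef
    return 1.0 - (resid @ resid) / (yc @ yc)

def overfitting_index(y, n_bands, complexities, n_repeats=20, seed=0):
    """Mean R^2 achieved on meaningless spectra: any fit here is overfitting by construction."""
    rng = np.random.default_rng(seed)
    index = np.zeros(len(complexities))
    for _ in range(n_repeats):
        X_fake = artificial_spectra(len(y), n_bands, rng)
        index += [pcr_r2(X_fake, y, k) for k in complexities]
    return index / n_repeats

def select_complexity(index, complexities, tolerance=0.1):
    """Largest complexity whose index stays within the (hypothetical) tolerance."""
    allowed = [k for k, v in zip(complexities, index) if v <= tolerance]
    return max(allowed) if allowed else min(complexities)

# Illustrative use with a synthetic response of 30 samples and 50 bands.
y = np.random.default_rng(1).normal(size=30)
complexities = list(range(1, 11))
index = overfitting_index(y, n_bands=50, complexities=complexities)
best = select_complexity(index, complexities)
```

Because the artificial spectra are unrelated to the response, the index rises with the number of components purely through noise fitting, which is why it can serve as a relative measure of overfitting across complexities. A real analysis would replace the synthetic `y` with measured values and compare the selected complexity with one chosen by cross-validation.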
Published by: International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS), 2017.
