Performance metrics for multi-step forecasting measuring win-loss, seasonal variance and forecast stability: an empirical study
Journal article, Peer reviewed
Published version
Date
2024
Abstract
This paper addresses the evaluation of multi-step point forecasting models. Currently, deep learning models for multi-step forecasting are evaluated on datasets by selecting one or several error metrics and aggregating errors across the time series and the forecast horizon. This approach hides insights that would otherwise be useful to practitioners and industry when evaluating and selecting the best forecasting models. We propose four novel metrics that provide additional insights when evaluating models: 1) a win-loss metric that shows how models perform across the time series in the dataset, 2) a variance-weighted metric that accounts for differences in variance across the seasonal period, 3) a delta horizon metric that measures how much models update their estimates over the forecast horizon, and 4) decomposed errors that relate the forecasting error to trend, seasonality, and noise. To show the applicability of the proposed metrics, we implement four recent deep learning architectures and conduct experiments on five benchmark datasets. Our results show how the current practice of aggregating metrics neglects valuable information, and we demonstrate the importance of considering seasonality and errors on individual time series. Lastly, we highlight several use cases for the proposed metrics and discuss their applicability in light of the empirical results.
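
To illustrate what two of the proposed metrics could look like in practice, the sketch below computes a simple win-loss score and a delta horizon score. It is a minimal, simplified interpretation based only on the abstract's descriptions; the function names, input shapes, and exact formulas are our own assumptions, not the paper's reference implementation.

```python
import numpy as np


def win_loss(per_series_errors: dict[str, np.ndarray]) -> dict[str, float]:
    """Fraction of time series on which each model attains the lowest error.

    per_series_errors maps a model name to an array of shape (n_series,)
    holding that model's aggregated error (e.g. MAE) on each series.
    """
    names = list(per_series_errors)
    errors = np.stack([per_series_errors[m] for m in names])  # (n_models, n_series)
    winners = errors.argmin(axis=0)                           # best model per series
    return {m: float((winners == i).mean()) for i, m in enumerate(names)}


def delta_horizon(forecasts: np.ndarray) -> float:
    """Average absolute revision of forecasts for the same target timestamp.

    forecasts has shape (n_origins, horizon): row t holds the multi-step
    forecast issued at origin t. Consecutive origins overlap on all but one
    target timestamp, so forecasts[t, 1:] and forecasts[t + 1, :-1] predict
    the same future points; their mean absolute difference measures how
    strongly a model revises its estimates, i.e. forecast (in)stability.
    """
    revisions = np.abs(forecasts[1:, :-1] - forecasts[:-1, 1:])
    return float(revisions.mean())


# Hypothetical usage: compare two models on three series, then score stability.
errs = {"model_a": np.array([0.8, 1.2, 0.5]),
        "model_b": np.array([0.9, 1.0, 0.7])}
print(win_loss(errs))                       # per-model share of series won
print(delta_horizon(np.random.rand(20, 6)))  # 20 origins, 6-step horizon
```

A per-series win-loss view of this kind exposes cases where a model with the best aggregate error still loses on many individual series, which is exactly the kind of information the abstract argues is hidden by dataset-level aggregation.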