Changepoint model selection in Gaussian data by maximization of approximate Bayes Factors with the Pruned Exact Linear Time algorithm

Detection of multiple changepoints in a data set is treated as a model selection problem where the model complexity is dependent on the number of changepoints. The Bayes Factor is a practical model selection tool of which the Bayesian Information Criterion (BIC) is a popular approximation.

The BIC is twice the maximum log likelihood of the data under the model minus a penalty for number of changepoints, and is to be maximized. We develop the log likelihood for both univariate and multivariate Gaussian data.

Although the changepoint model is an irregular statistical model, BIC is asymptotically consistent when the data are univariate and independently Gaussian distributed with a known variance. For Gaussian data also two versions of the modified BIC (mBIC) are asymptotically consistent approximations of the Bayes Factor.

As the penalty for model complexity is often treated as a tuning parameter in applications, we propose a range for it when the data are independently Gaussian distributed and the approximate value of the variance is known.

For data that are univariate Gaussian distributed with known variances the mBIC involves an additional penalty on the relative positions of the changepoints which is small when the changepoints are evenly distributed in the data and large when they are clustered together.

Although in the mBIC criterion the penalty on the relative positions of the changepoints are set by maximization of the likelihood term, we instead let them be set by maximization of the sum of the likelihood and the penalty terms.

Thus we get a criterion that we can maximize with the algorithm Pruned Exact Linear Time (PELT), which runs on $O(n)$ time under certain conditions. In the thesis we also suggest a modification to the algorithm Changepoint Detection for a Range of Penalties (CROPS) that lets us maximize the original mBIC using PELT.

In simulations we see that PELT performs better than the popular changepoint detection algorithm Binary Segmentation (BinSeg) when both are applied to maximize BIC. Although BIC is usually a strict criterion in the sense that it prefers a parsimonious model,

on simulations where the variance is known it is outperformed by mBIC which has a higher penalty on model complexity than BIC for most data.

For the case where the data are univariate Gaussian but the variance is not known, we do not find a simple criterion to maximize. Rather we propose an ad hoc criterion similar to both BIC as applied to these changepoint data, and to mBIC for Gaussian data with known variance. When $p$ parameters are estimated in the likelihood, changepoints need to be separated by at least $p-1$ points. We generalize PELT to account for this, and use Directed Asyclic Graphs to illustrate the inner workings of OP, PELT and our generalized PELT.

Publisher

NTNU