On Eigen-decompositions and Adaptivity of Markov Chains
Parts I and III-V concern Markov chain Monte Carlo, and in particular algorithms related to the Metropolis-Hastings algorithm. In Parts I-II we apply an eigen-decomposition of the transition probability matrix. The eigen-decomposition is well known for reversible Markov kernels, but seems to be less common in the analysis of non-reversible kernels. In Part I we use this decomposition to visualise and give a geometric interpretation of the path to stationarity on the space of probability distributions. Moreover, it is a tool in the analysis of the coefficient of ergodicity under the χ²-norm. The result is that we can state conditions under which a non-reversible combination of reversible Markov kernels outperforms any of the reversible kernels. A dramatic acceleration may occur when the left eigenvectors corresponding to the largest eigenvalues of the two kernels point in different directions under the χ²-norm. We also obtain necessary and sufficient conditions for reversibility in terms of the eigen-decomposition. Finally, several second-order properties of both reversible and non-reversible Markov chains and processes are presented and compared, including the asymptotic variance of a time average, the correlation function and the spectral density.

The spectral density f(ω) is the main theme of Part II. For processes in nature, the spectral density is a useful tool for investigating long-term dependencies. The omnipresence of natural processes with estimates of f(ω) proportional to 1/ω^α for low frequencies has puzzled scientists for the last half century, and it is one of the oldest problems in contemporary physics still lacking a generally accepted explanation that embraces the whole variety of instances. The phenomenon has been observed in a large number of fields, including astrophysics, electrical engineering, geophysics, hydrology, economics, medicine and psychology, with α usually lying in the interval (0.5, 1.5).
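The eigen-decomposition of a reversible kernel can be illustrated with a small toy example (a sketch with an invented 3-state matrix, not taken from the thesis): detailed balance makes the kernel symmetrisable, so its spectrum is real, and the gap below the largest eigenvalue governs the geometric rate of convergence to stationarity.

```python
# Illustrative sketch: eigen-decomposition of a toy reversible Markov kernel.
import numpy as np

# A reversible transition matrix P on three states.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

# Stationary distribution pi: left eigenvector of P for eigenvalue 1.
evals, levecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(evals - 1.0))
pi = np.real(levecs[:, k])
pi = pi / pi.sum()

# Reversibility (detailed balance): pi_i P_ij == pi_j P_ji for all i, j.
flow = pi[:, None] * P
assert np.allclose(flow, flow.T)

# For a reversible kernel, S_ij = sqrt(pi_i) P_ij / sqrt(pi_j) is symmetric,
# so P has a real spectrum with an orthogonal eigenbasis for S.
d = np.sqrt(pi)
S = (d[:, None] * P) / d[None, :]
lam = np.linalg.eigh(S)[0]          # real eigenvalues, ascending order

# The spectral gap 1 - lambda_2 controls the rate of convergence.
lam_desc = np.sort(lam)[::-1]
gap = 1.0 - lam_desc[1]
print(lam_desc, gap)
```

For this matrix the spectrum is {1, 0.5, 0}, so the chain forgets its starting distribution at geometric rate 0.5 per step.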
Specific examples are sunspot activity, music and speech, human cognition, various electrical systems, neuronal spike trains, the flow of rivers, stock exchange price indices and freeway traffic (Davidsen and Schuster, 2000, 2002; Bak et al., 1988; Press, 1978). Various mathematical models can explain particular instances of this phenomenon. Probably the most common models are obtained when a number of certain Markov chains called relaxation processes are aggregated in a specific way (see e.g. Granger (1980)). We note that the same kind of spectral density also appears for a single general Markov chain on a finite state space. In terms of the eigen-decomposition, we can thereby state precise conditions for the 1/ω^α behaviour of Markov chains. This is done in a way that includes values of α > 1, which for other models often implies non-stationarity. Given the eigenvalue function, there is a variety of ways to assign values to the states such that the 1/ω^α condition is satisfied. While aggregations of relaxation processes form a rather limited class of models that can show 1/ω^α behaviour, we can consequently extend the class by adding single Markov chains and aggregations thereof. We conclude by constructing several examples exhibiting this behaviour in a specified range of frequencies.

In Parts III-IV we focus on recent developments in the field of adaptive Markov chains. These chains are based on Markov chains, but can to a certain extent learn from their history, and are therefore non-Markovian. As in Part II, the adaptive Markov chains are combinations of different kernels, and the coefficient of ergodicity is an important ingredient in both cases. An optimal tuning of the proposal distribution q in the Metropolis-Hastings algorithm usually depends on various characteristics of the target π. These are, however, often unknown until π is explored. Previously, q has been tuned by tedious preliminary runs.
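The aggregation construction attributed to Granger (1980) can be sketched numerically (a toy illustration with invented parameters, not a computation from the thesis): superposing AR(1) relaxation components whose relaxation rates are spread log-uniformly yields a spectral density close to 1/ω over the corresponding range of frequencies.

```python
# Sketch: equal-weight aggregate of AR(1) relaxation components with
# log-uniformly spaced relaxation rates approximates a 1/omega spectrum.
import numpy as np

lam = np.logspace(-4, 0, 200)       # relaxation rates lambda_i
rho = np.exp(-lam)                  # AR(1) coefficients rho_i = exp(-lambda_i)

def agg_spectral_density(omega):
    """Spectral density of the equal-weight aggregate at frequency omega."""
    # Each unit-variance AR(1) component contributes a Lorentzian-like term.
    num = 1.0 - rho**2
    den = 1.0 - 2.0 * rho * np.cos(omega) + rho**2
    return np.sum(num / den) / len(rho)

# Evaluate on a log grid of frequencies well inside (min(lam), max(lam)).
omegas = np.logspace(-3, -1, 50)
f = np.array([agg_spectral_density(w) for w in omegas])

# Fit log f = c - alpha * log omega; alpha should be close to 1 here.
slope, _ = np.polyfit(np.log(omegas), np.log(f), 1)
alpha = -slope
print(alpha)
```

Other choices of weights or rate distributions shift the fitted exponent, which is the sense in which such aggregations realise different values of α.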
If q can be tuned during the exploration of π, the idea behind adaptive MCMC is that this exploration will proceed faster. A review of recent results on adaptive MCMC appears in Part IV. We explain and compare the different approaches and categorise most of them as either diminishing or regenerative adaptation. In diminishing adaptation the extent of adaptation has to diminish at a certain rate, while in regenerative adaptation, adaptation can be performed after each regeneration; these regenerations, however, tend to occur seldom in high dimensions. Having this option of adaptation, it should be exploited to the full. The common strategy has been to choose one of the many proposal distributions in the existing MCMC toolbox and find ways to adapt it. In Part III we develop a particular algorithm which we believe may be able to exploit the ability to adapt even better. The non-adaptive Metropolised Hit-and-Run algorithm is superior to a corresponding Metropolised independence sampler in terms of convergence to stationarity. Preliminary investigations also indicate that the performance of the algorithm is more robust to deviations from optimal tuning of the proposal distribution than both the Metropolised independence sampler and the Metropolis algorithm. This is an important characteristic for an adaptive algorithm. In addition, the algorithm has an inherent ability to search in various directions for unexplored modes of the state space. Finally, we show that if an adaptive Metropolised independence sampler can be proved to converge under the conditions of Gåsemyr (2003), the same holds for the corresponding adaptive Metropolised Hit-and-Run sampler.

In Part V we develop a non-adaptive Metropolis-Hastings algorithm designed for a certain class of Bayesian models, known as unimodal hidden Gaussian Markov random fields with likelihoods consisting of mutually independent data.
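The Metropolised independence sampler that serves as the baseline above is simple to state (a minimal generic sketch with an assumed standard normal target and normal proposal, not the thesis' algorithm): proposals are drawn afresh from q, independently of the current state, and accepted with the usual Metropolis-Hastings ratio, which here reduces to a ratio of importance weights w = π/q.

```python
# Sketch of a Metropolised independence sampler on a toy 1D target.
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    return -0.5 * x * x                 # standard normal target, unnormalised

def log_proposal(x, scale=2.0):
    return -0.5 * (x / scale) ** 2      # independence proposal q = N(0, scale^2)

def independence_sampler(n_iter, scale=2.0):
    x = 0.0
    log_w = log_target(x) - log_proposal(x, scale)   # log importance weight
    chain = np.empty(n_iter)
    for i in range(n_iter):
        y = scale * rng.standard_normal()            # draw independent of x
        log_w_y = log_target(y) - log_proposal(y, scale)
        # Acceptance probability min(1, w(y)/w(x)).
        if np.log(rng.random()) < log_w_y - log_w:
            x, log_w = y, log_w_y
        chain[i] = x
    return chain

chain = independence_sampler(50_000)
print(chain.mean(), chain.var())
```

A hit-and-run variant replaces the global draw from q by first choosing a random direction and then proposing a move along that line, which is the source of the directional search ability mentioned above.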
Such models have previously mainly been handled by schemes in which each variable or block of variables is updated one by one, which often results in very slow convergence to the stationary distribution. We propose instead to update all variables at once in a Metropolised independence sampler. This approach does, however, require a good approximation to the posterior distribution of the Bayesian model. The main contribution of Part V is the development of a class of non-Gaussian approximations that can be constructed for a wide range of likelihood models. They have the appealing properties that exact samples can be drawn from them, the normalisation constant is computable, and the computational complexity is only O(n²) in the spatial case with n variables. The non-Gaussian approximations are refined versions of a Gaussian approximation. They are applied to models in spatial disease mapping and to model-based geostatistics models with different likelihoods. The algorithm is a major improvement over the single-site schemes.
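The Gaussian approximation underlying such proposals can be sketched in one dimension (a hypothetical toy model, not the construction of Part V): for a Poisson observation y with log-intensity x and a standard normal prior on x, a Newton iteration finds the posterior mode, and the negative inverse curvature there gives the variance of the approximating Gaussian.

```python
# Sketch: Gaussian (Laplace-type) approximation to a toy unimodal posterior.
# Model (hypothetical): x ~ N(0, 1), y | x ~ Poisson(exp(x)), observed y = 3.
# Log-posterior up to a constant: l(x) = y*x - exp(x) - x^2 / 2.
import math

y = 3

def grad_log_post(x):
    return y - math.exp(x) - x          # l'(x)

def hess_log_post(x):
    return -math.exp(x) - 1.0           # l''(x), always negative: unimodal

# Newton iteration to the posterior mode.
x = 0.0
for _ in range(50):
    step = grad_log_post(x) / hess_log_post(x)
    x -= step
    if abs(step) < 1e-12:
        break

mode = x
var = -1.0 / hess_log_post(mode)        # curvature-based Gaussian variance
print(mode, var)
```

The N(mode, var) density can then serve as an independence proposal for the full posterior; the non-Gaussian refinements of Part V improve on exactly this kind of baseline approximation.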