Investigating the Consistency and Convexity of Restricted Boltzmann Machine Learning
Abstract
In this thesis we assess the consistency and convexity of parameter inference in Boltzmann machine learning algorithms based on gradient ascent on the likelihood surface. We do this by first developing standard tools for generating equilibrium data drawn from a Boltzmann distribution, as well as analytically exact algorithms for inferring the parameters of restricted and semi-restricted Boltzmann machine architectures.
After testing and demonstrating the functionality of our algorithms, we assess how different network properties affect the inference quality of restricted Boltzmann machines. Subsequently, we take a closer look at the likelihood function itself, in an attempt to uncover more precise details about its curvature and the nature of its convexity.
As we present the results of our investigation, we discuss the findings, before suggesting possible future directions to take, improvements to make, and aspects to investigate further.
We conclude that the standard, analytically exact restricted Boltzmann machine algorithm is convex up to certain permutations of the parameters, when initialized within reasonable ranges of parameter values, and given that the strength of connectivity in the underlying model is within a specified range. Additionally, for such strengths of connectivity, the distribution of Hessian eigenvalues of the likelihood function, as a function of the distance to a peak, may be stable both within and across network sizes.