Learning for Model Predictive Control
Abstract
This thesis focuses on learning-based control, with an emphasis on control designs for which we can analyze stability and robustness properties. The topic is motivated by the lack of controllers for complex, nonlinear dynamical systems that are hard to model and that must nevertheless be suitable for safety-critical applications.
Recent successes in the field of machine learning (ML), together with increased sensing and computational capabilities, have led to a growing interest in data-driven control techniques. For systems that require systematic handling of constraints, model predictive control (MPC) has established itself as the primary control method. The combination of ML and MPC has therefore become a popular field of research, as data can be exploited to improve controller performance, while tools for stability and robustness analysis are well-established.
The most intuitive combination of MPC and ML is to use available data to improve the MPC prediction model. Supervised ML methods based on rich function approximators, such as neural networks (NNs) and Gaussian processes (GPs), can be leveraged to learn parts of, or entire, dynamical models from data. In Part I of this thesis, we propose two different MPC formulations that leverage ML to learn the dynamics. Using techniques from robust control, we provide stability guarantees under suitable assumptions on the approximation error.
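As a concrete illustration of this first approach, the sketch below learns a Gaussian-process residual model on top of a crude nominal model and then uses the combined model as the prediction model of a simple sampling-based MPC. The pendulum dynamics, data-collection loop, horizon, and cost weights are hypothetical choices made for illustration only; they do not correspond to the specific formulations or guarantees developed in Part I.

```python
# Illustrative sketch (not the thesis's exact formulation): learn a residual
# dynamics model with a Gaussian process and use it as the MPC prediction model.
# The pendulum-like system, horizon, and cost weights below are hypothetical.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def true_dynamics(x, u, dt=0.05):
    # Unknown "true" system: a damped pendulum (angle, angular velocity).
    theta, omega = x
    omega_next = omega + dt * (-9.81 * np.sin(theta) - 0.1 * omega + u)
    return np.array([theta + dt * omega, omega_next])

def nominal_model(x, u, dt=0.05):
    # Known but crude prior model: linearized, undamped dynamics.
    theta, omega = x
    return np.array([theta + dt * omega, omega + dt * (-9.81 * theta + u)])

# Collect data and fit a GP to the model mismatch (residual).
X_train, Y_train = [], []
x = np.array([0.5, 0.0])
for _ in range(200):
    u = rng.uniform(-2.0, 2.0)
    x_next = true_dynamics(x, u)
    X_train.append(np.concatenate([x, [u]]))
    Y_train.append(x_next - nominal_model(x, u))
    x = x_next if np.all(np.abs(x_next) < 5) else np.array([0.5, 0.0])
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-4)
gp.fit(np.array(X_train), np.array(Y_train))

def learned_model(x, u):
    # Prediction model used by the MPC: nominal model plus learned residual.
    residual = gp.predict(np.concatenate([x, [u]]).reshape(1, -1))[0]
    return nominal_model(x, u) + residual

def mpc_control(x0, horizon=8, n_samples=64, u_max=2.0):
    # Simple sampling-based MPC: pick the input sequence minimizing a quadratic
    # cost under the learned prediction model, apply only its first input.
    best_cost, best_u0 = np.inf, 0.0
    for _ in range(n_samples):
        u_seq = rng.uniform(-u_max, u_max, horizon)
        x_pred, cost = x0.copy(), 0.0
        for u in u_seq:
            x_pred = learned_model(x_pred, u)
            cost += x_pred @ x_pred + 0.01 * u**2
        if cost < best_cost:
            best_cost, best_u0 = cost, u_seq[0]
    return best_u0

# Closed loop: regulate the pendulum to the origin with the learned MPC.
x = np.array([0.8, 0.0])
for t in range(20):
    u = mpc_control(x)
    x = true_dynamics(x, u)
print("final state:", x)
```

Note that in this sketch the learned model is only used to improve prediction accuracy; the learning objective (fitting the residual) is decoupled from the closed-loop performance objective, which is the distinction taken up again at the end of this abstract.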
There has also been increasing interest in inferring the parameterization of the MPC controller, covering not only the prediction model but also the cost and constraints, that leads to the best closed-loop performance. Reinforcement learning (RL) is a framework for developing self-optimizing controllers that adjust their behavior based on observed outcomes of their actions. As the policies are usually modeled using NNs, the resulting closed-loop behavior is difficult to analyze. In Part II of this thesis, we consider RL as a tool to infer the optimal parameterization of an MPC scheme. Leveraging existing theory on the stability analysis of MPC, we propose a cost parameterization and constrained RL parameter updates such that nominal closed-loop stability of the learned MPC is ensured by design. We also consider different approaches for combining RL methods as a way to speed up learning. Finally, we propose a new method for exploration during learning.
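To make the second approach concrete, the following sketch tunes the cost parameters of a simple MPC with a gradient-free, RL-style update of the closed-loop performance, and projects the parameters after every update so that the stage cost remains positive definite. The projection step illustrates the general idea of constrained parameter updates that preserve a stability certificate by design; the double-integrator system, the zeroth-order gradient estimate, and all tuning constants are hypothetical and not the algorithm proposed in Part II.

```python
# Illustrative sketch (not the thesis's exact algorithm): tune MPC cost
# parameters from closed-loop data and keep them in a set where the
# stage cost is positive definite, via a projection after each update.
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 0.1], [0.0, 1.0]])          # nominal prediction model
B = np.array([[0.0], [0.1]])
A_true = np.array([[1.0, 0.1], [0.02, 0.97]])   # slightly different plant

def param_to_weights(theta):
    # MPC cost parameterization: diagonal Q and scalar R built from theta.
    return np.diag(theta[:2]), np.array([[theta[2]]])

def mpc_input(x0, Q, R, horizon=15):
    # Unconstrained finite-horizon MPC via a backward Riccati recursion;
    # returns the first input of the optimal sequence (receding horizon).
    P = Q.copy()
    K = np.zeros((1, 2))
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return (-(K @ x0)).item()

def closed_loop_cost(theta, steps=50):
    # Roll out the true plant under the MPC parameterized by theta and
    # accumulate a fixed performance measure (independent of theta).
    Q, R = param_to_weights(theta)
    x, J = np.array([1.0, 0.0]), 0.0
    for _ in range(steps):
        u = mpc_input(x, Q, R)
        J += x @ x + 0.1 * u**2
        x = A_true @ x + (B * u).ravel()
    return J

def project(theta, eps=1e-2):
    # Constrained update: keep all weights above eps so that Q > 0 and R > 0,
    # preserving the positive definiteness used in the stability argument.
    return np.maximum(theta, eps)

theta = np.array([1.0, 1.0, 1.0])
for it in range(30):
    # Zeroth-order (finite-difference) estimate of the performance gradient,
    # standing in for an RL policy-gradient estimate from closed-loop data.
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        d = np.zeros_like(theta)
        d[i] = 1e-2
        grad[i] = (closed_loop_cost(theta + d) - closed_loop_cost(theta - d)) / 2e-2
    theta = project(theta - 0.05 * grad)
print("tuned MPC cost parameters:", theta)
```

Here the MPC itself remains the policy, so the learned parameters are always interpretable, and the projection confines learning to parameterizations for which a nominal stability certificate can still be stated.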
On a more general level, this thesis considers two conceptually different approaches to learning-based MPC. The combination of supervised ML and MPC can be used to design MPC schemes with highly accurate prediction models, possibly learned from already available data sets, but the models are not learned to optimize the closed-loop performance directly. The combination of MPC and RL, on the other hand, allows us to learn not only the prediction model but also the cost and constraints in a way that directly optimizes the closed-loop performance.