A Painless Deterministic Policy Gradient Method for Learning-based MPC
Chapter
Published version
Permanent link
https://hdl.handle.net/11250/3110304
Publication date
2023
Abstract
The combination of Reinforcement Learning (RL) and Model Predictive Control (MPC) has attracted considerable interest in the recent literature as a way of computing optimal policies from MPC schemes based on inaccurate models. In that context, Deterministic Policy Gradient (DPG) methods are often observed to be the most reliable class of RL methods for improving the MPC closed-loop performance. DPG methods are fairly easy to formulate when a compatible function approximation is used for the advantage function. However, this formulation requires an additional value function approximation, often carried out using Deep Neural Networks (DNNs). In this paper, we propose to build the required value function approximation as a first-order expansion of the value function estimate delivered by the MPC scheme providing the policy. The proposed approach drastically simplifies the use of DPG methods for learning-based MPC, as no additional structure for approximating the value function needs to be constructed. We illustrate the proposed approach with two numerical examples of varying complexity.
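For readers unfamiliar with the setup the abstract refers to, the following is a minimal sketch in our own notation, not taken verbatim from the paper. The compatible-advantage formulation is the standard one from Silver et al. (2014); the MPC value symbol V_theta^MPC, the expansion point s_bar, and the exact form of the expansion are our labels and our reading of the abstract.

In the compatible setup, the action-value estimate splits into a compatible advantage plus a baseline value function:

    Q^w(s, a) = (a - \pi_\theta(s))^\top \nabla_\theta \pi_\theta(s)^\top w + V^v(s),

so that \nabla_a Q^w(s, a) = \nabla_\theta \pi_\theta(s)^\top w, and the deterministic policy gradient reads

    \nabla_\theta J(\theta) = \mathbb{E}_s\!\left[ \nabla_\theta \pi_\theta(s)\, \nabla_a Q^w(s, a)\big|_{a = \pi_\theta(s)} \right] = \mathbb{E}_s\!\left[ \nabla_\theta \pi_\theta(s)\, \nabla_\theta \pi_\theta(s)^\top w \right].

The baseline V^v(s) is the extra structure usually fitted with a DNN. As we read the abstract, the proposal replaces it with a first-order expansion of the value function estimate that the MPC scheme already provides, around some reference state \bar{s}:

    V^v(s) \approx V_\theta^{\mathrm{MPC}}(\bar{s}) + \nabla_s V_\theta^{\mathrm{MPC}}(\bar{s})^\top (s - \bar{s}),

so that no separate value-function approximator has to be constructed or trained.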