A Painless Deterministic Policy Gradient Method for Learning-based MPC
Chapter
Published version
Permanent link
https://hdl.handle.net/11250/3110304
Publication date
2023
Abstract
The combination of Reinforcement Learning (RL) and Model Predictive Control (MPC) has attracted considerable interest in the recent literature as a way of computing optimal policies from MPC schemes based on inaccurate models. In that context, Deterministic Policy Gradient (DPG) methods are often observed to be the most reliable class of RL methods for improving the MPC closed-loop performance. DPG methods are fairly easy to formulate when a compatible function approximation is used for the advantage function. However, this formulation requires an additional value function approximation, often carried out using Deep Neural Networks (DNNs). In this paper, we propose to build the required value function approximation as a first-order expansion of the value function estimate delivered by the MPC scheme providing the policy. The proposed approach drastically simplifies the use of DPG methods for learning-based MPC, as no additional structure for approximating the value function needs to be constructed. We illustrate the proposed approach with two numerical examples of varying complexity.
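For readers unfamiliar with the setup the abstract refers to, the following is a minimal sketch in our own notation, not taken verbatim from the paper. The compatible-advantage formulation is the standard one from Silver et al. (2014); the MPC value symbol V_theta^MPC, the expansion point s_bar, and the exact form of the expansion are our labels and our reading of the abstract.

In the compatible setup, the action-value estimate splits into a compatible advantage plus a baseline value function:

    Q^w(s, a) = (a - \pi_\theta(s))^\top \nabla_\theta \pi_\theta(s)^\top w + V^v(s),

so that \nabla_a Q^w(s, a) = \nabla_\theta \pi_\theta(s)^\top w, and the deterministic policy gradient reads

    \nabla_\theta J(\theta) = \mathbb{E}_s\!\left[ \nabla_\theta \pi_\theta(s)\, \nabla_a Q^w(s, a)\big|_{a = \pi_\theta(s)} \right] = \mathbb{E}_s\!\left[ \nabla_\theta \pi_\theta(s)\, \nabla_\theta \pi_\theta(s)^\top w \right].

The baseline V^v(s) is the extra structure usually fitted with a DNN. As we read the abstract, the proposal replaces it with a first-order expansion of the value function estimate that the MPC scheme already provides, around some reference state \bar{s}:

    V^v(s) \approx V_\theta^{\mathrm{MPC}}(\bar{s}) + \nabla_s V_\theta^{\mathrm{MPC}}(\bar{s})^\top (s - \bar{s}),

so that no separate value-function approximator has to be constructed or trained.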