Show simple item record

dc.contributor.author: Seel, Katrine
dc.contributor.author: Gros, Sebastien Nicolas
dc.contributor.author: Gravdahl, Jan Tommy
dc.date.accessioned: 2024-06-06T11:31:11Z
dc.date.available: 2024-06-06T11:31:11Z
dc.date.created: 2024-01-24T16:16:28Z
dc.date.issued: 2023
dc.identifier.citation: IEEE Conference on Decision and Control. Proceedings. 2023, 62. (en_US)
dc.identifier.issn: 0743-1546
dc.identifier.uri: https://hdl.handle.net/11250/3132876
dc.description.abstract: This paper considers adjusting a fully parametrized model predictive control (MPC) scheme to approximate the optimal policy for a system as accurately as possible. By adopting MPC as a function approximator in reinforcement learning (RL), the MPC parameters can be adjusted using Q-learning or policy gradient methods. However, each method has its own specific shortcomings when used alone. Indeed, Q-learning does not exploit information about the policy gradient and therefore may fail to capture the optimal policy, while policy gradient methods miss any cost function corrections not affecting the policy directly. The former is a general problem, whereas the latter is an issue when dealing with economic problems specifically. Moreover, it is notoriously difficult to perform second-order steps in the context of policy gradient methods, while it is straightforward in the context of Q-learning. This calls for an organic combination of these learning algorithms, in order to fully exploit the MPC parameterization as well as speed up convergence in learning. (en_US)
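The combination the abstract describes, updating one shared parameter vector with both a Q-learning (TD) step and a deterministic policy-gradient step, can be illustrated on a toy problem. The sketch below is not the paper's MPC-based algorithm: it uses a scalar linear system with an assumed quadratic value model and linear policy, and all dynamics, gains, and step sizes are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only (not the paper's MPC parameterization):
# scalar system x' = a*x + b*u with quadratic stage cost.
# A quadratic value model Q(x, u) = w_x*x^2 + w_u*u^2 and a linear
# policy u = -k*x share one parameter vector theta = [w_x, w_u, k].
# Each step applies BOTH a Q-learning update (on the value weights)
# and a deterministic policy-gradient update (on the gain).

rng = np.random.default_rng(0)
a_true, b_true = 0.9, 0.5          # assumed toy dynamics
gamma = 0.95                       # discount factor

theta = np.array([1.0, 1.0, 0.1])  # [w_x, w_u, k]

def policy(x, th):
    return -th[2] * x

def q_model(x, u, th):
    return th[0] * x**2 + th[1] * u**2

alpha_q, alpha_pi = 1e-3, 1e-3     # learning rates (assumed)
x = 1.0
for _ in range(2000):
    u = policy(x, theta)
    cost = x**2 + 0.1 * u**2
    x_next = a_true * x + b_true * u
    u_next = policy(x_next, theta)

    # Q-learning: semi-gradient TD step on the value parameters.
    td = cost + gamma * q_model(x_next, u_next, theta) - q_model(x, u, theta)
    grad_q = np.array([x**2, u**2, 0.0])   # dQ/d[w_x, w_u]
    theta[:2] += alpha_q * td * grad_q[:2]

    # Deterministic policy gradient on the gain: descend
    # dQ/du * du/dk (cost is minimized, so we step downhill).
    dq_du = 2.0 * theta[1] * u
    du_dk = -x
    theta[2] -= alpha_pi * dq_du * du_dk

    # Reset the state when it decays, to keep exploring.
    x = x_next if abs(x_next) > 1e-3 else rng.uniform(-1.0, 1.0)
```

In the paper's setting, both gradients are instead taken through a single parametrized MPC scheme, so the two updates act on the same parameterization rather than on separate value and policy heads.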
dc.language.iso: eng (en_US)
dc.publisher: IEEE (en_US)
dc.title: Combining Q-learning and Deterministic Policy Gradient for Learning-Based MPC (en_US)
dc.title.alternative: Combining Q-learning and Deterministic Policy Gradient for Learning-Based MPC (en_US)
dc.type: Peer reviewed (en_US)
dc.type: Journal article (en_US)
dc.description.version: publishedVersion (en_US)
dc.rights.holder: © Copyright 2023 IEEE - All rights reserved. (en_US)
dc.source.pagenumber: 8 (en_US)
dc.source.volume: 62 (en_US)
dc.source.journal: IEEE Conference on Decision and Control. Proceedings (en_US)
dc.identifier.doi: 10.1109/CDC49753.2023.10383562
dc.identifier.cristin: 2233972
dc.relation.project: Norges forskningsråd: 294544 (en_US)
cristin.ispublished: true
cristin.fulltext: original
cristin.qualitycode: 1


Associated file(s)


This item appears in the following collection(s)
