dc.contributor.author | Seel, Katrine | |
dc.contributor.author | Gros, Sebastien Nicolas | |
dc.contributor.author | Gravdahl, Jan Tommy | |
dc.date.accessioned | 2024-06-06T11:31:11Z | |
dc.date.available | 2024-06-06T11:31:11Z | |
dc.date.created | 2024-01-24T16:16:28Z | |
dc.date.issued | 2023 | |
dc.identifier.citation | IEEE Conference on Decision and Control. Proceedings. 2023, 62. | en_US |
dc.identifier.issn | 0743-1546 | |
dc.identifier.uri | https://hdl.handle.net/11250/3132876 | |
dc.description.abstract | This paper considers adjusting a fully parameterized model predictive control (MPC) scheme to approximate the optimal policy for a system as accurately as possible. By adopting MPC as a function approximator in reinforcement learning (RL), the MPC parameters can be adjusted using Q-learning or policy gradient methods. However, each method has its own shortcomings when used alone. Q-learning does not exploit information about the policy gradient and may therefore fail to capture the optimal policy, while policy gradient methods miss any cost function corrections that do not affect the policy directly. The former is a general problem, whereas the latter is an issue specifically when dealing with economic problems. Moreover, it is notoriously difficult to perform second-order steps in the context of policy gradient methods, while it is straightforward in the context of Q-learning. This calls for an organic combination of these learning algorithms, in order to fully exploit the MPC parameterization as well as to speed up convergence in learning. | en_US |
dc.language.iso | eng | en_US |
dc.publisher | IEEE | en_US |
dc.title | Combining Q-learning and Deterministic Policy Gradient for Learning-Based MPC | en_US |
dc.title.alternative | Combining Q-learning and Deterministic Policy Gradient for Learning-Based MPC | en_US |
dc.type | Peer reviewed | en_US |
dc.type | Journal article | en_US |
dc.description.version | publishedVersion | en_US |
dc.rights.holder | © Copyright 2023 IEEE - All rights reserved. | en_US |
dc.source.pagenumber | 8 | en_US |
dc.source.volume | 62 | en_US |
dc.source.journal | IEEE Conference on Decision and Control. Proceedings | en_US |
dc.identifier.doi | 10.1109/CDC49753.2023.10383562 | |
dc.identifier.cristin | 2233972 | |
dc.relation.project | Norges forskningsråd (Research Council of Norway): 294544 | en_US |
cristin.ispublished | true | |
cristin.fulltext | original | |
cristin.qualitycode | 1 | |