Show simple item record

dc.contributor.author: Seel, Katrine
dc.contributor.author: Gros, Sebastien Nicolas
dc.contributor.author: Gravdahl, Jan Tommy
dc.date.accessioned: 2024-06-06T11:31:11Z
dc.date.available: 2024-06-06T11:31:11Z
dc.date.created: 2024-01-24T16:16:28Z
dc.date.issued: 2023
dc.identifier.citation: IEEE Conference on Decision and Control. Proceedings. 2023, 62. (en_US)
dc.identifier.issn: 0743-1546
dc.identifier.uri: https://hdl.handle.net/11250/3132876
dc.description.abstract: This paper considers adjusting a fully parametrized model predictive control (MPC) scheme to approximate the optimal policy for a system as accurately as possible. By adopting MPC as a function approximator in reinforcement learning (RL), the MPC parameters can be adjusted using Q-learning or policy gradient methods. However, each method has its own specific shortcomings when used alone. Indeed, Q-learning does not exploit information about the policy gradient and therefore may fail to capture the optimal policy, while policy gradient methods miss any cost function corrections not affecting the policy directly. The former is a general problem, whereas the latter is an issue when dealing with economic problems specifically. Moreover, it is notoriously difficult to perform second-order steps in the context of policy gradient methods, while it is straightforward in the context of Q-learning. This calls for an organic combination of these learning algorithms, in order to fully exploit the MPC parameterization as well as speed up convergence in learning. (en_US)
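The combination the abstract describes, updating one shared parameter vector with both a Q-learning (TD) step and a deterministic policy-gradient step, can be illustrated on a toy problem. The sketch below is not the paper's MPC-based algorithm: it uses a scalar linear system with an assumed quadratic value model and linear policy, and all dynamics, gains, and step sizes are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only (not the paper's MPC parameterization):
# scalar system x' = a*x + b*u with quadratic stage cost.
# A quadratic value model Q(x, u) = w_x*x^2 + w_u*u^2 and a linear
# policy u = -k*x share one parameter vector theta = [w_x, w_u, k].
# Each step applies BOTH a Q-learning update (on the value weights)
# and a deterministic policy-gradient update (on the gain).

rng = np.random.default_rng(0)
a_true, b_true = 0.9, 0.5          # assumed toy dynamics
gamma = 0.95                       # discount factor

theta = np.array([1.0, 1.0, 0.1])  # [w_x, w_u, k]

def policy(x, th):
    return -th[2] * x

def q_model(x, u, th):
    return th[0] * x**2 + th[1] * u**2

alpha_q, alpha_pi = 1e-3, 1e-3     # learning rates (assumed)
x = 1.0
for _ in range(2000):
    u = policy(x, theta)
    cost = x**2 + 0.1 * u**2
    x_next = a_true * x + b_true * u
    u_next = policy(x_next, theta)

    # Q-learning: semi-gradient TD step on the value parameters.
    td = cost + gamma * q_model(x_next, u_next, theta) - q_model(x, u, theta)
    grad_q = np.array([x**2, u**2, 0.0])   # dQ/d[w_x, w_u]
    theta[:2] += alpha_q * td * grad_q[:2]

    # Deterministic policy gradient on the gain: descend
    # dQ/du * du/dk (cost is minimized, so we step downhill).
    dq_du = 2.0 * theta[1] * u
    du_dk = -x
    theta[2] -= alpha_pi * dq_du * du_dk

    # Reset the state when it decays, to keep exploring.
    x = x_next if abs(x_next) > 1e-3 else rng.uniform(-1.0, 1.0)
```

In the paper's setting, both gradients are instead taken through a single parametrized MPC scheme, so the two updates act on the same parameterization rather than on separate value and policy heads.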
dc.language.iso: eng (en_US)
dc.publisher: IEEE (en_US)
dc.title: Combining Q-learning and Deterministic Policy Gradient for Learning-Based MPC (en_US)
dc.title.alternative: Combining Q-learning and Deterministic Policy Gradient for Learning-Based MPC (en_US)
dc.type: Peer reviewed (en_US)
dc.type: Journal article (en_US)
dc.description.version: publishedVersion (en_US)
dc.rights.holder: © Copyright 2023 IEEE - All rights reserved. (en_US)
dc.source.pagenumber: 8 (en_US)
dc.source.volume: 62 (en_US)
dc.source.journal: IEEE Conference on Decision and Control. Proceedings (en_US)
dc.identifier.doi: 10.1109/CDC49753.2023.10383562
dc.identifier.cristin: 2233972
dc.relation.project: Norges forskningsråd: 294544 (en_US)
cristin.ispublished: true
cristin.fulltext: original
cristin.qualitycode: 1


Associated file(s)


This item appears in the following collection(s)
