
Field                     Language  Value
dc.contributor.author               Seel, Katrine
dc.contributor.author               Bemporad, Alberto
dc.contributor.author               Gros, Sebastien Nicolas
dc.contributor.author               Gravdahl, Jan Tommy
dc.date.accessioned                 2023-11-29T08:37:31Z
dc.date.available                   2023-11-29T08:37:31Z
dc.date.created                     2023-06-06T14:08:10Z
dc.date.issued                      2023
dc.identifier.citation    en_US     IEEE Access. 2023, 11 60724-60736.
dc.identifier.issn                  2169-3536
dc.identifier.uri                   https://hdl.handle.net/11250/3105167
dc.description.abstract   en_US     The combination of model predictive control (MPC) and learning methods has gained increasing attention as a tool to control systems that may be difficult to model. Using MPC as a function approximator in reinforcement learning (RL) is one approach to reduce the reliance on accurate models. RL depends on exploration to learn, and simple heuristics based on random perturbations are currently the most common choice. This paper considers variance-based exploration in RL geared towards using MPC as a function approximator. We propose to use a non-probabilistic measure of uncertainty of the value function approximator in value-based RL methods. Uncertainty is measured by a variance estimate based on inverse distance weighting (IDW). The IDW framework is computationally cheap to evaluate and therefore well-suited to an online setting, using already sampled state transitions and rewards. The gradient of the variance estimate is then used to perturb the policy parameters in a direction where the variance of the value function estimate is increasing. The proposed method is verified on two simulation examples, considering both linear and nonlinear system dynamics, and compared to standard exploration methods using random perturbations.
dc.language.iso           en_US     eng
dc.publisher              en_US     IEEE
dc.rights                 *         Attribution 4.0 International
dc.rights.uri             *         http://creativecommons.org/licenses/by/4.0/deed.no
dc.title                  en_US     Variance-Based Exploration for Learning Model Predictive Control
dc.title.alternative      en_US     Variance-Based Exploration for Learning Model Predictive Control
dc.type                   en_US     Peer reviewed
dc.type                   en_US     Journal article
dc.description.version    en_US     publishedVersion
dc.source.pagenumber      en_US     60724-60736
dc.source.volume          en_US     11
dc.source.journal         en_US     IEEE Access
dc.identifier.doi                   10.1109/ACCESS.2023.3282842
dc.identifier.cristin               2152302
dc.relation.project       en_US     Norges forskningsråd: 294544
dc.relation.project       en_US     Norges forskningsråd: 300172
cristin.ispublished                 true
cristin.fulltext                    original
cristin.qualitycode                 1
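The abstract describes exploration driven by the gradient of an IDW-based variance estimate. The Python/NumPy sketch below illustrates that idea under stated assumptions. It uses squared-inverse-distance weights w_i(z) = 1/||z - z_i||^2, normalized to v_i(z) = w_i(z) / sum_j w_j(z). It also uses an IDW variance of the form s^2(z) = sum_i v_i(z) (y_i - f_hat(z))^2, which is a common IDW construction. The paper's exact weighting is not given in this record. Neither is the input z on which the variance is evaluated, nor how the ascent direction is mapped onto the MPC policy parameters, so all three are assumptions here. The function names `idw_weights`, `idw_variance`, `variance_gradient` and `ascend_variance` are hypothetical.

```python
import numpy as np


def idw_weights(z, Z):
    """Normalized IDW weights v_i(z) for query z against samples Z (N x n).

    Assumption: plain squared-inverse-distance weights w_i = 1 / ||z - z_i||^2.
    """
    d2 = np.sum((Z - z) ** 2, axis=1)
    hit = d2 == 0.0
    if np.any(hit):
        # Limit z -> z_i: all weight goes to the coincident sample(s).
        return hit / np.sum(hit)
    w = 1.0 / d2
    return w / np.sum(w)


def idw_variance(z, Z, y, f_hat):
    """IDW variance estimate s^2(z) = sum_i v_i(z) * (y_i - f_hat(z))^2.

    y holds observed targets (e.g. value targets built from sampled
    transitions and rewards); f_hat is the current value approximator.
    """
    v = idw_weights(z, Z)
    return float(np.sum(v * (y - f_hat(z)) ** 2))


def variance_gradient(z, Z, y, f_hat, eps=1e-6):
    """Central finite-difference gradient of s^2 with respect to z."""
    z = np.asarray(z, dtype=float)
    g = np.zeros_like(z)
    for j in range(z.size):
        e = np.zeros_like(z)
        e[j] = eps
        g[j] = (idw_variance(z + e, Z, y, f_hat)
                - idw_variance(z - e, Z, y, f_hat)) / (2.0 * eps)
    return g


def ascend_variance(z, Z, y, f_hat, step=0.1):
    """Move z a fixed distance along the normalized ascent direction of s^2.

    Hypothetical simplification: the step is taken directly in z; mapping it
    onto MPC policy parameters is application-specific and not modeled here.
    """
    z = np.asarray(z, dtype=float)
    g = variance_gradient(z, Z, y, f_hat)
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return z.copy()
    return z + step * g / norm


def zero_value(z):
    """Toy zero-initialized value approximator (hypothetical)."""
    return 0.0


# Toy 1-D data: targets 0 at z=0 and 1 at z=1 (illustrative only).
Z = np.array([[0.0], [1.0]])
y = np.array([0.0, 1.0])
print(idw_variance(np.array([0.5]), Z, y, zero_value))  # 0.5
```

Finite differences keep the sketch dependency-free. An analytic or automatic-differentiation gradient would be the natural choice when z is high-dimensional. Normalizing the gradient makes `step` a fixed exploration radius, independent of the variance scale.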



Except where otherwise noted, this item is licensed under Attribution 4.0 International.