dc.contributor.advisor: Gros, Sebastien
dc.contributor.advisor: Lekkas, Anastasios
dc.contributor.author: Esfahani, Hossein Nejatbakhsh
dc.date.accessioned: 2024-06-26T08:10:07Z
dc.date.available: 2024-06-26T08:10:07Z
dc.date.issued: 2024
dc.identifier.isbn: 978-82-326-8129-7
dc.identifier.issn: 2703-8084
dc.identifier.uri: https://hdl.handle.net/11250/3135857
dc.description.abstract: A new Reinforcement Learning (RL) algorithm based on Model Predictive Control (MPC) has recently been proposed, in which the optimal state(-action) value function and the optimal policy can be captured by a parameterized MPC scheme even if the system model underlying the MPC scheme cannot capture the real system perfectly. However, this idea was originally investigated for Markov Decision Processes (MDPs), where full observation of the states of the real system is required. Moreover, MPC-based RL can be extended to other types of MPC schemes, such as robust MPC and Linear Parameter-Varying MPC (LPV-MPC). To pursue these ideas and develop new frameworks in the context of MPC-based reinforcement learning, the first part of this thesis investigates the MPC-based RL framework in the context of Partially Observable Markov Decision Processes (POMDPs). We next show that the core idea of modifying the MPC scheme via RL can also be used to modify a Moving Horizon Estimation (MHE) scheme, so that the MHE performance is improved even if the system model underlying the MHE scheme is imperfect. Moreover, we propose an MHE/MPC-based RL method for LPV systems. In the second part of the thesis, we investigate the use of MPC-based RL for an approximate Robust Nonlinear MPC (RNMPC), using a second-order Q-learning algorithm to adjust a set of parameters attached to the approximate RNMPC scheme so as to achieve the best closed-loop performance.

In the context of POMDPs, we propose an observer-based framework for solving POMDPs, where the real system is partially observable. We first propose a combined Moving Horizon Estimation-Model Predictive Control (MHE-MPC) scheme to provide a policy for the POMDP problem, where the states of the real system are neither fully measurable nor necessarily known. We parameterize both the MPC and MHE formulations, introducing adjustable parameters for tuning the policy. To tackle the unmodeled and partially observable dynamics, we leverage RL to tune the parameters of the MPC and MHE schemes jointly, with the closed-loop performance of the policy as the goal rather than model fitting or MHE performance.

To deal with model-based state estimation under imperfect models, we next present an RL-based observer/controller using MHE and MPC schemes, where the model used in the MHE-MPC scheme cannot accurately capture the dynamics of the real system. We show how a modification of the MHE cost can improve the performance of the MHE scheme such that accurate state estimates are delivered even if the underlying MHE model is imperfect. A compatible Deterministic Policy Gradient (DPG) algorithm is then proposed to directly tune the parameters of both the estimator (MHE) and the controller (MPC), aiming to achieve the best closed-loop performance.

LPV models use a linear structure to capture the time-varying and nonlinear dynamics of complex systems, and thereby facilitate computationally efficient observer and controller synthesis for nonlinear systems. In the LPV framework, we propose an MHE/MPC-based RL method for polytopic LPV systems with inexact scheduling parameters (exogenous signals with inexact bounds), where the Linear Time-Invariant (LTI) models (vertices) combined via the scheduling parameters become inaccurate. We first propose an MHE scheme that simultaneously estimates the convex combination vector and the unmeasured states from the observations and the model-matching error. To tackle the inaccurate LTI models used in both the MPC and MHE schemes, we then exploit a Policy Gradient (PG) method to learn both the estimator (MHE) and the controller (MPC) so that the best closed-loop performance is achieved.

In the context of robust MPC, we present an RL-based Robust Nonlinear Model Predictive Control (RL-RNMPC) framework for controlling nonlinear dynamical systems in the presence of disturbances and uncertainties. An approximate RNMPC of low computational complexity is used, in which the uncertainty of the state trajectory is modelled via ellipsoids. RL is then used to handle the ellipsoidal approximation and improve the closed-loop performance of the scheme by adjusting the MPC parameters that generate the ellipsoids.
dc.language.iso: eng
dc.publisher: NTNU
dc.relation.ispartofseries: Doctoral theses at NTNU; 2024:269
dc.relation.haspart: Nejatbakhsh Esfahani, Hossein; Bahari Kordabad, Arash; Gros, Sebastien. Reinforcement Learning based on MPC/MHE for Unmodeled and Partially Observable Dynamics. In: Proc. 2021 American Control Conference. IEEE conference proceedings, 2021, pp. 2121-2126. https://doi.org/10.23919/ACC50511.2021.9483399
dc.relation.haspart: Nejatbakhsh Esfahani, Hossein; Bahari Kordabad, Arash; Cai, Wenqi; Gros, Sebastien Nicolas. Learning-based state estimation and control using MHE and MPC schemes with imperfect models. European Journal of Control, 2023, Vol. 73. https://doi.org/10.1016/j.ejcon.2023.100880 (open access under the CC BY license)
dc.relation.haspart: Nejatbakhsh Esfahani, Hossein; Gros, Sebastien Nicolas. Policy Gradient Reinforcement Learning for Uncertain Polytopic LPV Systems based on MHE-MPC. IFAC-PapersOnLine, 2022, Vol. 55(15), pp. 1-6. https://doi.org/10.1016/j.ifacol.2022.07.599 (open access under the CC BY-NC-ND license)
dc.relation.haspart: Nejatbakhsh Esfahani, Hossein; Bahari Kordabad, Arash; Gros, Sebastien. Approximate Robust NMPC using Reinforcement Learning. In: 2021 European Control Conference (ECC 2021). IEEE conference proceedings. https://doi.org/10.23919/ECC54610.2021.9655129
dc.title: Reinforcement Learning-based Control and State Estimation using Model Predictive Control and Moving Horizon Estimation
dc.type: Doctoral thesis
dc.subject.nsi: VDP::Technology: 500::Information and communication technology: 550::Technical cybernetics: 553
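
The central mechanism described in the abstract, a Q-learning update applied to the parameters of an MPC scheme whose internal model is wrong, can be sketched compactly. The following Python toy is a minimal sketch under our own assumptions (a scalar linear system, a horizon-1 MPC with a learned quadratic terminal cost, and made-up numerical values); it is not code from the thesis, whose schemes use full nonlinear MPC/MHE formulations.

    import numpy as np

    # Toy setup (illustrative values, not from the thesis): the real system and
    # the deliberately wrong model used inside the "MPC" Q-function approximator.
    a_true, b_true = 0.9, 0.5    # real dynamics: s+ = a_true*s + b_true*u + noise
    a_mod, b_mod = 0.7, 0.4      # imperfect model available to the MPC scheme
    gamma, r_pen = 0.95, 0.1     # discount factor and input penalty
    rng = np.random.default_rng(0)

    def q_mpc(s, u, th):
        # Q_theta(s, u): stage cost plus discounted parameterized terminal cost,
        # evaluated at the (wrong) one-step model prediction.
        s_pred = a_mod * s + b_mod * u
        return s**2 + r_pen * u**2 + gamma * (th[0] + th[1] * s_pred**2)

    def grad_q(s, u, th):
        # Gradient of Q_theta w.r.t. theta; in a full MPC scheme this would come
        # from parametric NLP sensitivities, here it is available in closed form.
        s_pred = a_mod * s + b_mod * u
        return np.array([gamma, gamma * s_pred**2])

    def greedy_u(s, th):
        # argmin_u Q_theta(s, u), closed form for this quadratic toy.
        return -gamma * th[1] * a_mod * b_mod * s / (r_pen + gamma * th[1] * b_mod**2)

    theta = np.array([0.0, 1.0])   # learned terminal-cost parameters
    s, alpha = 1.0, 1e-2
    for _ in range(5000):
        u = greedy_u(s, theta) + 0.1 * rng.standard_normal()    # exploratory input
        s_next = a_true * s + b_true * u + 0.01 * rng.standard_normal()
        cost = s**2 + r_pen * u**2                              # observed stage cost
        # Q-learning (semi-gradient) update: the TD error is computed against the
        # real transition, so theta compensates for the model mismatch.
        td = cost + gamma * q_mpc(s_next, greedy_u(s_next, theta), theta) - q_mpc(s, u, theta)
        theta += alpha * td * grad_q(s, u, theta)
        theta[1] = max(theta[1], 1e-3)   # keep the learned terminal cost convex
        s = s_next
    print("learned terminal-cost parameters:", theta)

Because the temporal-difference error is driven by transitions of the real system, the learned terminal cost absorbs the mismatch between the model (a_mod, b_mod) and the true dynamics (a_true, b_true); this is the sense in which a parameterized MPC scheme can capture the optimal value function despite an imperfect model.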

