
dc.contributor.advisor: Gros, Sebastien
dc.contributor.advisor: Lekkas, Anastasios
dc.contributor.author: Kordabad, Arash Bahari
dc.date.accessioned: 2023-04-12T10:51:13Z
dc.date.available: 2023-04-12T10:51:13Z
dc.date.issued: 2023
dc.identifier.isbn: 978-82-326-6983-7
dc.identifier.issn: 2703-8084
dc.identifier.uri: https://hdl.handle.net/11250/3062609
dc.description.abstract: Recently, the core idea of using Model Predictive Control (MPC) as a function approximator for Reinforcement Learning (RL) methods has been proposed and justified. More specifically, it has been shown that a parameterized MPC scheme with a possibly inaccurate model can capture the optimal value functions and policy of a given Markov Decision Process (MDP). This thesis develops this idea further, providing theorems that support and extend it and that answer fundamental questions at the intersection of MDPs, MPC, Moving Horizon Estimation (MHE), and RL, based on the publications produced during the Ph.D. We apply MPC-based RL to engineering problems such as Autonomous Surface Vehicles (ASVs), including path planning, obstacle avoidance, and docking, as well as to the smart-grid context, including learning the optimal bang-bang policy and multi-agent battery management under a power peak constraint. At the intersection of MDPs and MPC, we provide a theory on the equivalence of optimality criteria for MPC and MDPs. We show that an (undiscounted) MPC scheme can capture the optimal value function and optimal policy of a (possibly discounted) MDP, even if an inaccurate model is used in the MPC scheme. This equivalence can be established by a proper selection of the stage cost and the terminal cost of the MPC scheme. This observation leads us to parameterize the MPC scheme fully, including its cost function; in practice, RL algorithms can then be used to tune the parameterized MPC scheme. Using the same cost-modification idea, we also eliminate the bias of the optimal steady state in the discounted setting. In the context of MDPs and RL, we provide a Quasi-Newton technique with a novel approximate Hessian of the performance function that yields superlinear convergence of learning with the policy gradient method. In addition, we characterize the stability of MDPs with discounted cost using Economic Model Predictive Control (EMPC) dissipativity theory in the measure space. In the context of EMPC, we propose using Q-learning to capture a valid storage function that satisfies the dissipation inequality, and we verify dissipativity in both the discounted and undiscounted settings. Robust Model Predictive Control (RMPC) is used in several forms and for several purposes. We address the bias issue in the MPC-based policy gradient method when a linear compatible advantage-function approximator is used in the actor-critic scheme: when hard constraints restrict the policy, the exploration may not be centred or isotropic (non-CI), and the policy gradient estimate can consequently be biased. We resolve this issue with an RMPC approach that accounts for the exploration via a first-order Taylor approximation of the constraint tightening. Moreover, we investigate using RL methods to adjust an RMPC scheme with an ellipsoidal uncertainty set for stochastic nonlinear systems. A scenario-tree-based RMPC is implemented to handle potential failures of a ship's thrusters, with Q-learning used to improve the closed-loop performance. Finally, we provide a generic convex function approximator for the stage cost of the MPC scheme, and we address the safe RL problem using a Distributionally Robust Model Predictive Control (DRMPC) scheme with chance constraints.
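To make the core idea concrete, below is a minimal sketch (not taken from the thesis) of a parameterized MPC scheme used as a Q-function approximator and tuned with Q-learning. The scalar dynamics, the quadratic cost parameterization theta = [q, r, p], the learning rate, and all function names are illustrative assumptions; the thesis treats the general constrained, nonlinear case.

```python
# Minimal sketch of MPC-as-Q-function tuned by Q-learning (illustrative only).
# Assumptions: scalar system, deliberately inaccurate MPC model, quadratic
# parameterized stage/terminal costs theta = [q, r, p].
import numpy as np
from scipy.optimize import minimize

A_true, B_true = 0.9, 0.5        # "real" system used to generate transitions
A_model, B_model = 0.8, 0.6      # inaccurate prediction model inside the MPC
gamma, N = 0.95, 5               # discount factor and MPC horizon
theta = np.array([1.0, 1.0, 1.0])  # [q, r, p]: stage cost q*x^2 + r*u^2, terminal p*x^2

def mpc_q_value(theta, x0, a0):
    """Q_theta(x0, a0): optimal MPC cost with the first input fixed to a0."""
    q, r, p = theta
    def objective(u_rest):
        u = np.concatenate(([a0], u_rest))
        x, cost = x0, 0.0
        for k in range(N):
            cost += gamma**k * (q * x**2 + r * u[k]**2)
            x = A_model * x + B_model * u[k]      # prediction with the wrong model
        return cost + gamma**N * p * x**2         # parameterized terminal cost
    res = minimize(objective, np.zeros(N - 1))
    return res.fun, np.concatenate(([a0], res.x))

def grad_theta_q(theta, x0, a0):
    """dQ_theta/dtheta: differentiate the cost along the optimal trajectory
    (the optimal inputs are stationary, so their sensitivity drops out)."""
    _, u_opt = mpc_q_value(theta, x0, a0)
    x, g = x0, np.zeros(3)
    for k in range(N):
        g += gamma**k * np.array([x**2, u_opt[k]**2, 0.0])
        x = A_model * x + B_model * u_opt[k]
    g[2] = gamma**N * x**2
    return g

def greedy_action(theta, x):
    """MPC policy: the first input that minimizes Q_theta(x, .)."""
    res = minimize(lambda u: mpc_q_value(theta, x, u[0])[0], np.zeros(1))
    return res.x[0]

alpha = 1e-3
true_stage_cost = lambda x, u: x**2 + 0.5 * u**2   # stage cost of the real MDP
x = 1.0
for step in range(100):
    a = greedy_action(theta, x) + 0.1 * np.random.randn()   # exploratory action
    x_next = A_true * x + B_true * a                         # real system transition
    q_sa, _ = mpc_q_value(theta, x, a)
    v_next, _ = mpc_q_value(theta, x_next, greedy_action(theta, x_next))
    td_error = true_stage_cost(x, a) + gamma * v_next - q_sa
    theta += alpha * td_error * grad_theta_q(theta, x, a)    # TD(0) parameter update
    theta = np.maximum(theta, 1e-3)   # keep costs positive so the MPC stays bounded
    x = x_next
```

Despite the deliberate model mismatch, tuning the cost parameters in this way illustrates how the MPC scheme itself, rather than the model alone, is adapted toward closed-loop performance.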
dc.language.iso: eng
dc.publisher: NTNU
dc.relation.ispartofseries: Doctoral theses at NTNU;2023:82
dc.relation.haspart: Paper A: Bahari Kordabad, Arash; Nejatbakhsh Esfahani, Hossein; Lekkas, Anastasios M.; Gros, Sebastien. Reinforcement Learning based on Scenario-tree MPC for ASVs. In: Proc. 2021 American Control Conference. IEEE conference proceedings 2021. ISBN 978-1-7281-9704-3. pp. 1985-1990. https://doi.org/10.23919/ACC50511.2021.9483100
dc.relation.haspart: Paper B: Bahari Kordabad, Arash; Cai, Wenqi; Gros, Sebastien. MPC-based Reinforcement Learning for Economic Problems with Application to Battery Storage. In: 2021 European Control Conference, ECC 2021. IEEE conference proceedings 2021. ISBN 978-9-4638-4236-5. pp. 2573-2578. https://doi.org/10.23919/ECC54610.2021.9654852
dc.relation.haspart: Paper C: Bahari Kordabad, Arash; Cai, Wenqi; Gros, Sebastien. Multi-agent Battery Storage Management using MPC-based Reinforcement Learning. In: 2021 IEEE Conference on Control Technology and Applications (CCTA). IEEE conference proceedings 2021. ISBN 978-1-6654-3643-4. pp. 57-62. https://doi.org/10.1109/CCTA48906.2021.9659202
dc.relation.haspart: Paper D: Bahari Kordabad, Arash; Nejatbakhsh Esfahani, Hossein; Gros, Sebastien. Bias Correction in Deterministic Policy Gradient Using Robust MPC. In: 2021 European Control Conference, ECC 2021. IEEE conference proceedings 2021. ISBN 978-9-4638-4236-5. pp. 1086-1091. https://doi.org/10.23919/ECC54610.2021.9654962
dc.relation.haspart: Paper E: Bahari Kordabad, Arash; Nejatbakhsh Esfahani, Hossein; Cai, Wenqi; Gros, Sebastien. Quasi-Newton Iteration in Deterministic Policy Gradient. 2022 American Control Conference (ACC). https://doi.org/10.23919/ACC53348.2022.9867217
dc.relation.haspart: Paper F: Bahari Kordabad, Arash; Gros, Sebastien. Functional Stability of Discounted Markov Decision Processes Using Economic MPC Dissipativity Theory. 2022 European Control Conference (ECC); 2022-07-12 - 2022-07-15. https://doi.org/10.23919/ECC55457.2022.9838064
dc.relation.haspart: Paper G: Bahari Kordabad, Arash; Gros, Sebastien. Q-learning of the storage function in Economic Nonlinear Model Predictive Control. Engineering Applications of Artificial Intelligence 2022; Volume 116, November 2022, 105343. https://doi.org/10.1016/j.engappai.2022.105343. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
dc.relation.haspart: Paper H: Bahari Kordabad, Arash; Zanon, Mario; Gros, Sebastien. Equivalence of Optimality Criteria for Markov Decision Process and Model Predictive Control. arXiv preprint. https://doi.org/10.48550/arXiv.2210.04302
dc.relation.haspart: Paper I: Bahari Kordabad, Arash; Wisniewski, Rafael; Gros, Sebastien. Safe Reinforcement Learning Using Wasserstein Distributionally Robust MPC and Chance Constraint. IEEE Access 2022; Volume 10, pp. 130058-130067. https://doi.org/10.1109/ACCESS.2022.3228922
dc.relation.haspart: Paper J: Bahari Kordabad, Arash; Reinhardt, Dirk; Anand, Akhil S. Reinforcement Learning for MPC: Fundamentals and Current Challenges
dc.relation.haspart: Paper K: Bahari Kordabad, Arash; Gros, Sebastien. Bias correction of discounted optimal steady state using cost modification
dc.title: Theoretical Properties of Learning-based Model Predictive Control
dc.type: Doctoral thesis
dc.subject.nsi: VDP::Technology: 500::Information and communication technology: 550::Technical cybernetics: 553

