Reinforcement Learning for Optimization of Nonlinear and Predictive Control
Doctoral thesis
Permanent link: https://hdl.handle.net/11250/2981909
Published: 2022

Abstract
Autonomous systems extend human capabilities and can be equipped with superhuman attributes such as durability, strength, and perception. They can provide numerous benefits, including superior efficiency, accuracy, and endurance, as well as the ability to explore dangerous environments. Delivering on this potential requires a control system that can skillfully operate the autonomous system to complete its objectives. A static control system must be carefully designed to handle any situation that might arise. This motivates introducing learning into the control system, since a learning system can learn from its experiences to manage novel, unexpected events and changes in its operating environment.
Traditional formal control techniques are typically designed offline assuming exact knowledge of the dynamics of the system to be controlled. These knowledge-based approaches have the important benefit that the stability properties of the control algorithm can be analyzed and certified, so that one can have confidence in the control system's ability to safely operate the controlled system. However, linear control techniques applied to nonlinear systems (which all real systems are to some extent) lead to increasingly conservative, and therefore suboptimal, control performance the more nonlinear the controlled system is. Nonlinear control techniques often have considerable online computational complexity, which makes them infeasible for systems with fast dynamics and for embedded control applications where computational power and energy are limited resources.
Reinforcement learning is a framework for developing self-optimizing controllers that learn to improve their operation through trial and error, adjusting their behaviour based on the observed outcomes of their actions. In general, reinforcement learning requires no knowledge about the dynamics of the controlled system, can learn to operate arbitrarily nonlinear systems, and its online operation can be designed to be highly computationally efficient. It is therefore a valuable tool for control systems whose dynamics are fast, nonlinear, or uncertain, and difficult to model. A central challenge of reinforcement learning control, on the other hand, is that its behaviour is complex and difficult to analyze, and it has no inherent support for the specification of operating constraints.
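To make the trial-and-error principle concrete, the following is a minimal sketch of tabular Q-learning, one of the simplest reinforcement learning algorithms; it is illustrative only and not the method of this thesis. The environment interface (`reset`, `step`) and the hyperparameter values are assumptions.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: improve action-value estimates from
    observed outcomes. `env` is assumed to expose reset() -> state
    and step(action) -> (next_state, reward, done), integer states."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Trial and error: mostly exploit current knowledge,
            # occasionally explore a random action.
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # Adjust behaviour based on the observed outcome (r, s_next).
            td_target = r + gamma * np.max(Q[s_next]) * (not done)
            Q[s, a] += alpha * (td_target - Q[s, a])
            s = s_next
    return Q
```

Note that no model of the environment's dynamics appears anywhere in the update; all improvement comes from observed transitions, which is what makes such methods attractive when the dynamics are uncertain or hard to model.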
An approach to remedying these challenges for reinforcement learning control is to combine its learning capabilities with an existing, trusted control technique. In Part I of this thesis, we employ reinforcement learning for optimization of the model predictive control (MPC) scheme, a powerful yet complex control technique. We propose the novel idea of optimizing its meta-parameters, that is, parameters affecting the structure of the control problem the MPC solves, as opposed to internal parameters affecting the solution to a given problem. In particular, we optimize the meta-parameters of when to compute the MPC and with what prediction horizon, and show that by intelligently selecting the conditions under which it is computed, control performance and computational complexity can be improved simultaneously. We subsequently present a framework in which these meta-parameters, as well as any other internal parameter of the MPC, can be jointly optimized with a configurable objective. Finally, Part I of the thesis also considers how an existing controller can be used to accelerate the learning process of a learning controller.
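As an illustration of the meta-parameter idea, the sketch below shows a control loop in which a trigger condition decides when to recompute the MPC solution and a prediction horizon is passed to the solver. This is a deliberately simplified assumption-laden sketch, not the thesis implementation; `solve_mpc`, `step`, and `trigger` are hypothetical stand-ins.

```python
import numpy as np

def solve_mpc(x, horizon):
    """Hypothetical MPC solver: returns an open-loop input sequence
    of length `horizon` for state x (stub for illustration)."""
    return np.zeros(horizon)

def run_control_loop(x0, step, n_steps, horizon, trigger):
    """Apply MPC with two meta-parameters exposed: the prediction
    `horizon`, and a `trigger(x, remaining_plan)` condition deciding
    when to recompute. Between recomputations, the previously
    computed input sequence is applied open-loop."""
    x = x0
    plan, k = solve_mpc(x, horizon), 0
    for _ in range(n_steps):
        if k >= len(plan) or trigger(x, plan[k:]):
            plan, k = solve_mpc(x, horizon), 0  # recompute the MPC
        x = step(x, plan[k])  # apply current input, advance dynamics
        k += 1
    return x
```

Both the trigger and the horizon could in principle be selected by a learned policy as a function of the current state, which is the sense in which such meta-parameters are amenable to reinforcement learning: recomputing less often and with shorter horizons saves computation, while recomputing at the right moments preserves control performance.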
Control of unmanned aerial vehicles (UAVs) is precisely such an embedded application with limited computational and energy resources, and moreover one where the dynamics are highly nonlinear and affected by significant disturbances such as turbulence. In Part II of this thesis, we propose the novel idea of employing deep reinforcement learning (DRL) for low-level control of fixed-wing UAVs, a UAV design that exhibits superior range and payload capacity compared to the popular multirotor drone design. We present a method capable of learning flightworthy DRL controllers with as little as 3 minutes of interaction with the controlled system, and demonstrate through field experiments with the real UAV that the DRL controller is competitive with the existing state-of-the-art autopilot, generating smooth responses in the controlled states and in the control signals to the actuators.
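One reason DRL suits such embedded applications is that, once trained, the controller reduces to a cheap feed-forward pass from the observed state to actuator commands at every timestep. The sketch below illustrates this deployment-time structure with a small PyTorch policy network; the network architecture and the observation/actuator dimensions are illustrative assumptions, not those of the thesis.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Small MLP mapping a UAV state vector (e.g. attitude errors,
    angular rates, airspeed) to actuator commands (e.g. elevator,
    aileron, throttle). Dimensions are illustrative assumptions."""
    def __init__(self, obs_dim=10, act_dim=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
            nn.Linear(64, act_dim), nn.Tanh(),  # commands scaled to [-1, 1]
        )

    def forward(self, obs):
        return self.net(obs)

# At deployment, control is a single cheap forward pass per timestep:
policy = PolicyNet()
obs = torch.zeros(10)        # placeholder observation vector
with torch.no_grad():
    action = policy(obs)     # actuator commands in [-1, 1]
```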