Solving Adaptive Optimal Control Problems with Dynamic Programming

Fuglstad, Hilde

Master thesis

Open

16539_FULLTEXT.pdf (3.873Mb)

16539_COVER.pdf (1.556Mb)

Permanent link

http://hdl.handle.net/11250/2450460

Publication date

2017

Collections

Institutt for teknisk kybernetikk (Department of Engineering Cybernetics) [3742]

Abstract

In optimal control of uncertain systems, a lack of crucial information about the system can lead to unacceptable performance, such as the violation of constraints. In these and similar situations, where it is important to reduce uncertainty quickly, excitation can be used for learning purposes. The optimal balance between learning and control is achieved with dual control. This concept was introduced over fifty years ago and is still relevant. It has been shown that dynamic programming (DP), along with a number of approximate methods, can be used to solve these problems. Analytical solution of the problems is in most cases impossible, and it is therefore necessary to solve them numerically.

The purpose of this thesis is to provide an overview of adaptive optimal control problems (AOCPs) and the use of DP for solving them. The method is explored through several illustrative examples, and the dual control algorithm is evaluated through computer simulations. The main examples considered are a simple integrator problem with unknown gains and a minimum-time problem with an unknown braking coefficient. The unknown parameters and noise in the systems are modelled as stochastic variables with known statistical distributions that are utilized by the dual controller.

It is shown how the different AOCPs can be formulated and how the DP algorithms can be implemented. Different noise model assumptions are evaluated to see how they affect the problem. Numerical experiments assess the capabilities of typical hardware configurations, and parallelization options are explored as a means of reducing runtime. Results from simulations demonstrate how the dual controller manages to both control the process and learn about it simultaneously. The controller is also compared to a certainty equivalent (CE) controller and a cautious controller to further emphasize its advantages over these heuristic adaptive controllers. Despite the well-known problems related to the curse of dimensionality, it is shown that the given AOCPs can be solved using DP with the desired accuracy within reasonable time.
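
To make the comparison between the heuristic controllers concrete, the following is a minimal sketch of a CE and a cautious controller for a scalar integrator with an unknown gain. The abstract does not state the thesis's exact problem formulation, so everything here is an assumption chosen for illustration: the model x[k+1] = x[k] + b*u[k] + w[k], the Gaussian belief b ~ N(b_hat, p), the one-step quadratic cost, and all function names and default values. The dual controller computed with DP in the thesis is not implemented here; neither controller below excites the system deliberately in order to learn.

```python
import random

def ce_control(x, b_hat, p):
    """Certainty equivalent: treat the estimate b_hat as the true gain (p is ignored)."""
    return -x / b_hat

def cautious_control(x, b_hat, p):
    """Minimise E[(x + b*u)^2] for b ~ N(b_hat, p); the input shrinks as p grows."""
    return -b_hat * x / (b_hat ** 2 + p)

def update_gain_belief(b_hat, p, u, dx, r):
    """Scalar Kalman update of the belief on b from dx = b*u + w, with w ~ N(0, r)."""
    k = p * u / (u * u * p + r)
    return b_hat + k * (dx - u * b_hat), p * (1.0 - k * u)

def simulate(controller, b_true=0.5, b_hat=1.0, p=1.0, r=0.01,
             x0=1.0, steps=20, seed=0):
    """Run one closed-loop simulation (hypothetical default values)."""
    rng = random.Random(seed)
    x, cost = x0, 0.0
    for _ in range(steps):
        u = controller(x, b_hat, p)
        x_next = x + b_true * u + rng.gauss(0.0, r ** 0.5)
        # Learning here is passive: it only uses whatever input the controller chose.
        b_hat, p = update_gain_belief(b_hat, p, u, x_next - x, r)
        cost += x_next ** 2
        x = x_next
    return cost, b_hat, p
```

Calling `simulate(ce_control)` and `simulate(cautious_control)` with the same seed gives a rough comparison of accumulated cost and final gain estimate. When the variance p is zero, the two controllers coincide. A dual controller would additionally weigh the future value of reducing p when choosing u.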

Publisher

NTNU