
Safe Reinforcement Learning for Autonomy in Resource-Constrained Environments

Larsen, Thomas Nakken
Doctoral thesis
Thomas Nakken Larsen.pdf (84.37Mb)
URI
https://hdl.handle.net/11250/3199395
Date
2025
Collections
  • Institutt for teknisk kybernetikk [4104]
Abstract
Artificial Intelligence (AI) has rapidly permeated various sectors of society in recent years. From consumer electronics and education to finance, medicine, autonomous driving, and chatbots, AI is transforming the way we live, work, and interact with the world around us. Its success follows the recent increase in the availability and capacity of hardware accelerators capable of efficiently processing artificial neural networks (ANNs). In the context of control, Deep Reinforcement Learning (DRL) methods leverage ANNs to enable intelligent control where classical methods fail, and as a result have attained superhuman performance in several domains. These successes, however, mask several practical challenges.

DRL is a computationally expensive approach that is fundamentally based on trial and error and requires a large number of attempts to train autonomous agents. This inhibits applying DRL directly in real-world applications, both from a time perspective and, more importantly, a safety perspective, limiting most current approaches to training in simulation. However, even for an agent fully trained in simulation, the black-box nature of the ANN parameterization precludes the safety guarantees necessary for deployment in safety-critical applications. At the edge, assuming the safety concerns can be addressed, the computational capacity of the hardware restricts the model complexity of the onboard control system, particularly on small platforms like quadrotor drones.

This thesis presents topics related to increasing the safety and efficiency of DRL in such resource-constrained systems. It is split into an introduction, three main parts, and a conclusion. The introduction provides context for the publications, describes the considered applications, presents the research questions, and outlines the structure of this thesis.

The main parts incrementally introduce new topics and detail the context and preliminaries leading to the published works presented. The first part introduces DRL as a model-free control framework. It establishes Proximal Policy Optimization (PPO) as the best-performing and most versatile DRL algorithm for continuous control through a comparative analysis against other leading algorithms. The second part introduces safe control and presents two articles, each proposing a different approach to improving safety in DRL-based control. Through reward engineering, DRL agents demonstrate compliance with the International Regulations for Preventing Collisions at Sea (COLREG). Safety guarantees are established via a Predictive Safety Filter (PSF), combining model-based and model-free control. The third part moves into DRL for resource-constrained environments and is twofold in its presentation. First, transfer learning through latent representations of high-dimensional sensor data is proposed to reduce the computational complexity of DRL. Second, digital twins show great promise for the training and validation of DRL agents prior to deployment in their real-world counterparts, inherently minimizing the sim-to-real gap in cluttered indoor environments. Furthermore, surrogate modeling and efficient representations enable the online use of digital twins on edge hardware, enabling safe, dynamic path planning and DRL-based control using otherwise immeasurable information in an urban environment.
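The predictive-safety-filter idea mentioned above can be illustrated conceptually: a model-based filter sits between the learned policy and the actuators, passing the policy's action through only when a certifiably safe fallback trajectory still exists from the resulting state, and otherwise substituting a conservative action. The sketch below is purely illustrative and not taken from the thesis; the 1-D double-integrator model, the braking fallback, and all names and parameters are assumptions chosen for brevity.

```python
def simulate(x, v, a, dt=0.1):
    """One Euler step of a 1-D double integrator (position, velocity)."""
    return x + v * dt, v + a * dt

def is_safe(x, v, x_max=10.0, a_min=-2.0, horizon=50):
    """Check that braking at maximum deceleration keeps position below x_max."""
    for _ in range(horizon):
        if x > x_max:
            return False
        x, v = simulate(x, v, a_min if v > 0 else 0.0)
    return x <= x_max

def safety_filter(x, v, a_rl, a_min=-2.0):
    """Pass the learned action through if its successor state still admits
    a safe braking trajectory; otherwise fall back to maximum braking."""
    x_next, v_next = simulate(x, v, a_rl)
    if is_safe(x_next, v_next):
        return a_rl   # learned action certified safe
    return a_min      # conservative model-based fallback

# A deliberately reckless "policy" that always accelerates toward the boundary.
policy = lambda x, v: 2.0

x, v = 0.0, 0.0
for _ in range(200):
    a = safety_filter(x, v, policy(x, v))
    x, v = simulate(x, v, a)
print(f"final position {x:.2f} (limit 10.0)")
```

Despite the unsafe policy, the filtered closed loop never crosses the position limit, because each applied action is certified to leave a feasible braking trajectory. Real PSFs solve a receding-horizon optimal control problem and return the safe input closest to the learned one rather than a fixed fallback.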

Ultimately, this thesis explores the challenges and proposes advancements in enabling autonomy in resource-constrained environments using DRL. Through this series of research contributions, the work highlights the importance of balancing safety and efficiency to develop autonomous systems capable of operating in complex and dynamic scenarios. Each paper contributes to a layered understanding of how these challenges can be addressed within the DRL framework, culminating in a promising outlook for its position in future real-world autonomy.
Has parts
Paper 1: Larsen, Thomas Nakken; Teigen, Halvor Ødegård; Laache, Torkel; Varagnolo, Damiano; Rasheed, Adil. Comparing Deep Reinforcement Learning Algorithms' Ability to Safely Navigate Challenging Waters. Frontiers in Robotics and AI, 2021, Vol. 8. © 2021 Frontiers Media S.A. All rights reserved. Available at: http://dx.doi.org/10.3389/frobt.2021.738113

Paper 2: Heiberg, Amalie; Larsen, Thomas Nakken; Meyer, Eivind; Rasheed, Adil; San, Omer; Varagnolo, Damiano. Risk-based implementation of COLREGs for autonomous surface vehicles using deep reinforcement learning. Neural Networks, 2022, Vol. 152, pp. 17-33. Published by Elsevier under the CC BY license. Available at: http://dx.doi.org/10.1016/j.neunet.2022.04.008

Paper 3: Vaaler, Aksel; Husa, Svein Jostein; Menges, Daniel; Larsen, Thomas Nakken; Rasheed, Adil. Modular control architecture for safe marine navigation: Reinforcement learning with predictive safety filters. Artificial Intelligence, 2024, Vol. 336. Published by Elsevier under the CC BY license. Available at: http://dx.doi.org/10.1016/j.artint.2024.104201

Paper 4: Larsen, Thomas Nakken; Hansen, Hannah; Rasheed, Adil. Risk-based Convolutional Perception Models for Collision Avoidance in Autonomous Marine Surface Vessels using Deep Reinforcement Learning. IFAC-PapersOnLine, 2023, Vol. 56(2), pp. 10033-10038. Published by Elsevier under the CC BY-NC-ND license. Available at: http://dx.doi.org/10.1016/j.ifacol.2023.10.870

Paper 5: Larsen, Thomas Nakken; Barlaug, Eirik Runde; Rasheed, Adil. Variational Autoencoders for Exteroceptive Perception in Reinforcement Learning-Based Collision Avoidance. In: Proceedings of the ASME 2024 43rd International Conference on Ocean, Offshore and Arctic Engineering. Volume 1: Offshore Technology. The American Society of Mechanical Engineers (ASME), 2024. ISBN 978-0-7918-8778-3.

Paper 6: Barlaug, Eirik Runde; Fløystad, Jørgen Lind; Larsen, Thomas Nakken; Rasheed, Adil. Resource-Constrained Model-Free Control in Indoor Digital Twin Environments. Expert Systems with Applications, 2025.

Paper 7: Larsen, Thomas Nakken; Tabib, Mandar; Rasheed, Adil. Resource-Constrained Dynamic Planning and Model-Free Control in Turbulent Urban Environments. Robotics and Autonomous Systems, 2025.
Publisher
NTNU
Series
Doctoral theses at NTNU;2025:235

Contact Us | Send Feedback

Privacy policy
DSpace software copyright © 2002-2019  DuraSpace

Service from  Unit