This thesis investigates the current landscape of state-of-the-art methods that use deep reinforcement learning to train self-driving cars. Autonomous driving has recently garnered the interest of researchers, governments, and private companies, as the technology promises to solve several problems prominent in modern society, such as the time individuals lose to traffic congestion and the financial burden that car accidents caused by human error place on people and institutions. Advancements in machine learning drive autonomous vehicle technology forward, and several major actors in the automobile and artificial intelligence industries have already taken advantage of this, having autonomous vehicles drive many miles on public roads without incident.

The primary goal of this thesis is to provide a comprehensive analysis of current methods in deep reinforcement learning for training autonomous vehicle agents. Our main contribution is a working example of a Proximal Policy Optimization (PPO) based agent that can reliably learn to drive in the urban driving simulator CARLA. Through our work, we provide two OpenAI Gym-like environments for CARLA, designed to (1) minimize overall training time and (2) provide the metrics necessary for comparing models across runs. The first environment is concerned only with following a predetermined lap, while the second focuses on navigating arbitrary paths provided by a topological planner, similar to how we navigate in real life. In creating these environments, we analyze how various environment design decisions, such as different reward formulations, asynchronous versus synchronous execution, and the use of checkpoints, affect the resulting agent.
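Both environments expose the familiar Gym-style interface of `reset()` and `step(action)`. As a minimal illustration of that interface, the sketch below uses a toy lap-following stand-in; the class name, the simplified kinematics, and the placeholder reward are hypothetical and do not reproduce the thesis's actual CARLA environments.

```python
class LapEnv:
    """Toy stand-in for a Gym-style lap-following environment.

    Hypothetical sketch: real CARLA environments would return camera
    observations and compute rewards from vehicle state, not the
    simplified scalar progress used here.
    """

    def __init__(self, lap_length_m=1245.0):
        self.lap_length_m = lap_length_m
        self.progress_m = 0.0

    def reset(self):
        # Start a new episode at the beginning of the lap.
        self.progress_m = 0.0
        return self._observation()

    def step(self, action):
        # action[0] is interpreted as throttle in [0, 1].
        throttle = max(0.0, min(1.0, action[0]))
        self.progress_m += 10.0 * throttle  # simplified kinematics
        reward = throttle                   # placeholder reward signal
        done = self.progress_m >= self.lap_length_m
        info = {"progress_m": self.progress_m}
        return self._observation(), reward, done, info

    def _observation(self):
        # Fraction of the lap completed, as a 1-D observation.
        return [self.progress_m / self.lap_length_m]
```

An agent interacts with it in the usual loop: `obs = env.reset()`, then repeated `obs, reward, done, info = env.step(action)` until `done` is true.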
Furthermore, we present experiments on the use of variational autoencoders in the training pipeline and show that we were able to significantly improve the quality of our agent by training a variational autoencoder to reconstruct semantic segmentation maps rather than the source RGB images themselves. For the lap environment, we provide models that reliably learn to drive along the 1245 m lap in approximately 8 hours of training. For the route environment, we show that a PPO agent with multiple policy networks can learn to follow the commands of a topological planner with moderate success.
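The variational autoencoder objective combines a reconstruction term with a KL regularizer; reconstructing segmentation maps instead of RGB frames amounts to replacing a pixel-wise RGB loss with a per-pixel cross-entropy over semantic classes. The sketch below shows that loss in isolation; the function names and the β weight are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def kl_divergence(mu, logvar):
    # KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior,
    # summed over latent dimensions.
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))

def segmentation_recon_loss(logits, targets):
    # Per-pixel cross-entropy over semantic classes.
    # logits: (num_pixels, num_classes), targets: (num_pixels,) int class ids.
    z = logits - logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(targets)), targets])

def vae_loss(logits, targets, mu, logvar, beta=1.0):
    # Total objective: reconstruction term plus beta-weighted KL term.
    return segmentation_recon_loss(logits, targets) + beta * kl_divergence(mu, logvar)
```

With uniform logits the reconstruction term equals log(num_classes), and a standard-normal posterior (mu = 0, logvar = 0) contributes zero KL, which makes the two terms easy to sanity-check independently.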