Path Following and Collision Avoidance for Quadcopters using Deep Reinforcement Learning

Sundøen, Ludvig Løken

dc.contributor.advisor	Rasheed, Adil
dc.contributor.advisor	Larsen, Thomas Nakken
dc.contributor.author	Sundøen, Ludvig Løken
dc.date.accessioned	2022-09-28T17:43:23Z
dc.date.available	2022-09-28T17:43:23Z
dc.date.issued	2022
dc.identifier	no.ntnu:inspera:102231297:37187344
dc.identifier.uri	https://hdl.handle.net/11250/3022408
dc.description.abstract	Classical control methods depend on accurate models. Such models do not always exist for complex systems, and control is limited to simple low-level tasks. Model-free Reinforcement Learning (RL) methods can find near-optimal controllers from data and experience without the need for such models, and can learn complex control policies with an arbitrary level of abstraction in the input data. Research on RL for continuous control has only started in recent years, especially with the rise in popularity and applicability of deep neural networks as universal function approximators. Proximal Policy Optimization (PPO) is widely accepted as the preferred RL algorithm for control applications. This thesis used the PPO algorithm to solve path following and collision avoidance in a simulated quadrotor application, using guidance-theoretic navigation features for proprioception and spherical LIDAR for exteroception. A learning framework is implemented to stage the learning incrementally; the autonomous agent learns, step by step, to stabilize the system, follow arbitrary paths in 3D, and avoid obstacles. The resulting RL agent largely succeeds in following the path but has difficulty avoiding obstacles. Furthermore, the resulting agent generalizes successfully to scenarios that do not arise during the training phase. Future research directions are proposed to improve the agent's collision avoidance performance.
dc.description.abstract	Classical control methods depend on accurate models. Such models may not exist for complex systems, and control is limited to simple low-level tasks. Model-free Reinforcement Learning (RL) methods can find near-optimal controllers from data and experience without the need for such models, and can learn complex control policies at an arbitrary level of input abstraction. Research in RL for continuous control has gained momentum only in the last few years, especially with the rise in popularity and applicability of deep neural networks as universal function approximators. Proximal Policy Optimization (PPO) is widely accepted as the preferred RL algorithm for control applications. This thesis applies the PPO algorithm to solve simultaneous path following and collision avoidance in a synthetic quadcopter environment, using guidance-theoretic navigation features for proprioception and spherical LIDAR for exteroception. A curriculum learning framework is implemented to stage the learning in incremental steps; the autonomous agent learns, step by step, to stabilize the system, follow arbitrary paths in 3D, and avoid obstacles. The resulting RL agent largely succeeds in path following but has difficulty avoiding obstacles. Furthermore, the resulting agent successfully generalizes to scenarios not encountered during the training phase. Future research directions are suggested to improve the agent's collision avoidance performance.
dc.language	eng
dc.publisher	NTNU
dc.title	Path Following and Collision Avoidance for Quadcopters using Deep Reinforcement Learning
dc.type	Master thesis

Associated file(s)

Filename: no.ntnu:inspera:102231297:3718 ...
Size: 8.656Mb
Format: PDF


This item appears in the following collection(s)

Institutt for teknisk kybernetikk [3666]
