Deep Reinforcement Learning for Valve Manipulation

Vermedal, Arild

dc.contributor.advisor	Lekkas, Anastasios
dc.contributor.author	Vermedal, Arild
dc.date.accessioned	2019-10-31T15:12:30Z
dc.date.issued	2019
dc.identifier	no.ntnu:inspera:35771502:17406400
dc.identifier.uri	http://hdl.handle.net/11250/2625748
dc.description.abstract	This thesis is inspired by the rapid development of artificial intelligence, in particular reinforcement learning, and its growing use for practical purposes, including subsea operations in the oil and gas industry, and especially the application of reinforcement learning to the control of robotic manipulators. The purpose of this thesis is to develop a platform that enables training a reinforcement learning agent in simulation and on a real manipulator, such that the agent can switch between them without reconfiguration and can reuse the same policy in both cases, and then to use the platform to solve a modelled subsea task using both reinforcement learning and behavioral cloning. The "OpenManipulator-X (RM-X52-TNM)" manipulator, with a simulation implemented in ROS and Gazebo, was purchased after a search for a robotic manipulator in a price range reasonable for research and with an accompanying simulation solution. This simulation software was evaluated for reinforcement learning and proved poorly suited to episodic training, in particular because it could not reset the manipulator to its initial configuration and had a poor real-time factor. Experience from the simulated environment nevertheless transfers well to real-world surroundings. A simplified environment for the task of reaching and turning a valve was designed and built for both simulation and the real world. Proximal Policy Optimization (PPO) was then used to learn to solve each task in simulation, and the resulting policy is used to control the real manipulator. Reaching the valve was solved well in simulation and gave equivalent results when applied to the manipulator, while turning the valve was moderately successful in simulation but was performed very well on the real manipulator.
Another solution was attempted using behavioral cloning and several sets of expert demonstrations performing the tasks, which gave better performance than reinforcement learning with PPO for reaching the valve, but worse for turning it. Finally, the two approaches were combined: behavioral cloning was used to generate a policy in the form of an artificial neural network, and this policy function was used as the initial policy for the PPO algorithm. This performed at the same level as the PPO approach and greatly reduced the training time. This thesis presents an overview of the development of robotic manipulator control over the last 50 years, followed by an introduction to the current field of reinforcement learning, the basic theory underlying it, and the specific aspects used by PPO. The process and details of setting up the software and hardware are then presented, along with the results of the various task solutions. Finally, the process and results are discussed, and the conclusions of the thesis are given.
dc.description.abstract	This thesis is inspired by the recent rapid advancements in artificial intelligence, in particular reinforcement learning (RL), and the increased viability of practical applications such as subsea operations in the oil and gas industry, specifically the application of reinforcement learning to the task of controlling robotic manipulators. The aim of this thesis is to develop a platform enabling the training of an RL agent in a simulation and on a real-world manipulator such that the agent is able to switch between them with no reconfiguration and is able to reuse the same policy in both cases, and then to use that platform to solve a modelled subsea task using both reinforcement learning and behavioral cloning. An "OpenManipulator-X (RM-X52-TNM)" manipulator with a simulation implemented in ROS and Gazebo was purchased after a search for a robotic manipulator in a price range affordable for research and with an accompanying simulation solution. This simulation software was evaluated for reinforcement learning and proved ill-suited for episodic training, in particular because it was unable to reset the manipulator to its initial configuration and had a poor real-time factor. However, experience in the simulated environment transferred well to a real-world setting. A simplified environment for the task of reaching and turning a valve was designed and created for both simulation and the real world. Proximal Policy Optimization (PPO) was then used to learn to solve each task in simulation, and the resulting policy was used to control the real-world manipulator. Reaching the valve was solved well in simulation and gave equivalent results when applied to the manipulator, while turning the valve was moderately successful in simulation but performed very well on the real-world manipulator.
A second solution was attempted using behavioral cloning and several sets of expert demonstrations performing the tasks, giving better performance than reinforcement learning with PPO on reaching the valve, but worse on turning it. Lastly, the two approaches were combined: behavioral cloning was used to generate a policy in the form of an artificial neural network, and this policy function was used as the initial policy for the PPO algorithm. This performed similarly to the PPO approach and greatly reduced the training time. In this thesis, an overview of the development of the control of robotic manipulators over the last 50 years is presented, followed by an introduction to the current field of reinforcement learning, the basic theory underlying it, and the specifics employed by PPO. The process and particulars of setting up the software and hardware are described, and the results of the various task solutions are presented. Lastly, the process and results are discussed, and the thesis' conclusions are given.
dc.language	eng
dc.publisher	NTNU
dc.title	Deep Reinforcement Learning for Valve Manipulation
dc.type	Master thesis

Associated file(s)

Filename: no.ntnu:inspera:35771502:17406 ...
Size: 14.33 MB
Format: PDF

Filename: no.ntnu:inspera:35771502:17406 ...
Size: 2.157 MB
Format: application/zip

This item appears in the following collection(s)

Institutt for teknisk kybernetikk [3789]
