Active Perception for 3D Reconstruction of Objects based on RGB-D images and Deep Reinforcement Learning

Mikkel Svagård

dc.contributor.advisor	Theoharis Theoharis
dc.contributor.author	Mikkel Svagård
dc.date.accessioned	2019-11-16T15:00:29Z
dc.date.available	2019-11-16T15:00:29Z
dc.date.issued	2019
dc.identifier.uri	http://hdl.handle.net/11250/2628789
dc.description.abstract	With the advances in computer vision, 3D reconstruction of objects has become an integral component in a wide variety of fields. Active vision, which is 3D reconstruction using cameras mounted on robot arms, has developed accordingly. This is partly due to progress in computer vision methodology, but also due to the steadily increasing availability of high-performance hardware. The "strategy" in active vision is the conceptual approach to how the camera is maneuvered during the ongoing reconstruction. Intuitively, advances in such technology call for a correspondingly well-developed strategy. In recent years, disciplines such as machine learning and deep learning have developed rapidly and revolutionized fields such as computer vision. The latest machine learning techniques, applied to information-rich problems such as image and object recognition, have produced impressive results. SINTEF Ocean came up with the interesting idea of pairing deep reinforcement learning with active vision, an approach not yet seen in 3D reconstruction. In this thesis, modern computer vision methods and relevant deep learning studies are investigated. Based on these findings, a sensible approach that uses deep learning, together with a basis for a simulated system, is proposed and implemented. The system uses the concept of voxel representation with the aim of training the deep reinforcement learning agent. The results and findings of the experiments on the 3D reconstruction system will be transferred to a real system at SINTEF Ocean. Consequently, the developed system must take into account the differences between the two environments, and must be developed with flexibility and modularity in mind. The proposed system is called Voxel-based NBV policy-model; voxel-based because it uses voxel representations, and NBV because it follows the next-best-view approach. 
The architecture of the system is based on VoxNet (Maturana and Scherer, 2015), chosen in light of the findings and investigations of relevant architectures and their assumed affinity with this project. Furthermore, a pipeline for the system, a complex yet sensible reward function, and a detailed system setup are presented. The system is developed in the virtual environment Unity and uses the machine learning framework TensorFlow. Depth-image and point-cloud processing is developed using modern, recognized techniques, and the reward function is composed of several reward components inspired by similar learning systems. Ten unique 3D models were constructed to test and evaluate a wide range of variations of the Voxel-based NBV policy-model.
dc.description.abstract	In recent years, the disciplines of machine learning and deep learning have progressed rapidly, revolutionizing fields such as computer vision. State-of-the-art machine learning techniques applied to information-dense tasks such as image and object classification have yielded impressive, previously unseen results. SINTEF Ocean proposed the interesting idea of pairing deep reinforcement learning with active vision, an approach to 3D reconstruction that has yet to be thoroughly explored. In this thesis, state-of-the-art active vision methods are investigated, along with relevant deep learning studies and findings. Based on the investigations, viable active vision approaches using deep learning and the basis of a simulated system are proposed and implemented. This system uses the concept of voxel representations, and its aim is to train the deep reinforcement learner. The results of the investigations and experiments of the 3D reconstruction system are to be transferred to a real-world application hosted by SINTEF Ocean. As such, the developed system takes into account the differences between the two environments and is developed with flexibility and modularity in mind. The proposed reinforcement learning system is called Voxel-based NBV policy-model; Voxel-based due to its use of voxel representations, and NBV as it follows the next-best-view approach. The architecture of the system is based on VoxNet (Maturana and Scherer, 2015), grounded in the findings of the investigations of relevant architectures and their assumed affinity with this project. A full pipeline of the system is further proposed, along with a complex yet reasonable reward function and a detailed system setup. The system is developed in the Unity environment, using TensorFlow as the machine learning framework. 
Depth-image and point-cloud processing is developed using modern, well-known techniques, while the reinforcement learning reward function is constructed using several reward components inspired by other similar learning systems. A wide range of variations of the Voxel-based NBV policy-model is tested and evaluated using ten different 3D models created especially for this project. The experiments conducted using the Voxel-based NBV policy-model demonstrate strong performance, though some problems remain unresolved and are yet to be thoroughly investigated. The top-performing model variations produced satisfactory camera paths with few steps and reasonable distance, demonstrating the success of the proposed approach. Moreover, the variations are assumed to hold great promise in the real-world application as well. As such, the investigations of the proposed deep reinforcement learning approach in the field of active vision have yielded novel and pioneering findings, with the ambition of carrying this success over to continuing experiments with the Voxel-based NBV policy-model in both the simulated environment and the real world.
dc.language	eng
dc.publisher	NTNU
dc.title	Active Perception for 3D Reconstruction of Objects based on RGB-D images and Deep Reinforcement Learning
dc.type	Master thesis

This item appears in the following Collection(s)

Institutt for datateknologi og informatikk [6551]
