Rotation Representations Methods for Pose Estimation with Deep Learning

Grüner, Henrik

dc.contributor.advisor	Egeland, Olav
dc.contributor.author	Grüner, Henrik
dc.date.accessioned	2022-10-07T17:32:31Z
dc.date.available	2022-10-07T17:32:31Z
dc.date.issued	2022
dc.identifier	no.ntnu:inspera:109479168:33700882
dc.identifier.uri	https://hdl.handle.net/11250/3024739
dc.description.abstract	Estimating rotation matrices in 3D is an old problem in computer vision that has resurfaced with the rapid growth of deep learning methods. Attempts have been made to use neural networks to estimate 3D rotations more accurately. The problem is generally regarded as difficult, because there are topological differences between the output of neural networks and the rotation space. Neural networks usually output data in Euclidean space, whereas rotations lie in the special orthogonal group SO(3), which is not homeomorphic to any subset of four-dimensional Euclidean space. This mismatch can be overcome by using a parameterization function that maps the output of the neural network to a matrix in the special orthogonal group SO(3). The traditional methods for representing rotations, Euler angles and quaternions, are three- and four-dimensional, respectively, and consequently a different representation is required. Other proposals are Gram-Schmidt orthogonalization or symmetric orthogonalization through singular value decomposition. The master's thesis is intended to give an overview of the difficulty of estimating rotation matrices in deep learning and of which parameterization functions exist and are in use, and to illustrate which properties are desirable in such a function. The theoretical arguments are strengthened by experiments with state-of-the-art machine learning architectures that i) compare the parameterization functions, and ii) illustrate the potential of symmetric orthogonalization within rotation estimation.
dc.description.abstract	Regressing 3D rotation matrices is an old problem in the field of computer vision. With the rise of deep learning methods, researchers have attempted to leverage neural networks for regression on rotation matrices. The task is difficult due to topological differences between the space of rotations and the output of the models. Neural networks usually output data in Euclidean space, whereas rotations in 3D are represented by the special orthogonal group SO(3), which is not homeomorphic to any subset of real 4D Euclidean space. This mismatch calls for a mapping function between the model output and the estimated rotation. The traditional methods for representing rotations, such as Euler angles and quaternions, have dimensions 3 and 4, respectively, and are hence discontinuous in the real space. Other representation functions, such as Gram-Schmidt orthogonalization and symmetric orthogonalization with singular value decomposition (SVD), have been proposed as a solution. This thesis gives an overview of the state of rotation matrix regression and of the desirable properties for a mapping function from real Euclidean space to non-Euclidean manifolds, and shows that SVD orthogonalization is the mapping with the best performance. The theoretical arguments are strengthened by experiments using state-of-the-art deep learning methods which i) show in a comparison environment that symmetric orthogonalization outperforms the other methods, and ii) show the potential of symmetric orthogonalization within the field of pose estimation.
dc.language	eng
dc.publisher	NTNU
dc.title	Rotation Representations Methods for Pose Estimation with Deep Learning
dc.type	Master thesis

Associated file(s)

Filename: no.ntnu:inspera:109479168:3370 ...
Size: 27.04Mb
Format: PDF

This item appears in the following collection(s)

Institutt for maskinteknikk og produksjon [4033]