Pose Estimation with Convolutional Neural Networks.

Vakhidi, Fabian.

Master thesis

Open

no.ntnu:inspera:109479168:64498681.pdf (9.185Mb)

Permanent link

https://hdl.handle.net/11250/3036873

Publication date

2022

Collections

Institutt for maskinteknikk og produksjon [4028]

Abstract

Pose estimation with convolutional neural networks (CNNs) falls under the umbrella of deep rotation regression. Deep rotation regression determines a rotation matrix from point cloud measurements, and the solution depends strongly on the representation used for the rotation matrix. This master's thesis is inspired by the contribution of Chen et al. \cite{RPMG}, which studies the gradients of learning-friendly rotation representations (quaternion, 6D, 9D and 10D) during the backpropagation stage of a CNN. The simulations conducted in this thesis demonstrate that employing Riemannian optimization to compute manifold-aware gradients through a goal rotation $R_{g}$ consistently improves network performance when $g_{M}$ and $g_{RPM}$ are used with the quaternion, 6D, 9D and 10D representations. The simulations show that $g_{RPM}$ with the 6D, 9D and 10D representations gives the best convergence and network learning. They further demonstrate that the homeomorphic rotation representations achieve better network performance than their discontinuous counterparts when using Euclidean gradients, $g_{M}$ and $g_{RPM}$.
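To illustrate what a continuous rotation representation such as the 6D one mentioned in the abstract can look like, the sketch below maps a 6D vector to a rotation matrix by Gram-Schmidt orthogonalization, which is the construction commonly used for the 6D representation in the rotation-regression literature. This is a minimal NumPy sketch under that assumption and not code from the thesis; the function name `rotation_from_6d` and the example input are hypothetical.

```python
import numpy as np


def rotation_from_6d(x):
    """Map a 6D vector to a 3x3 rotation matrix (hypothetical helper).

    Assumes the common Gram-Schmidt construction: the first three entries
    and the last three entries are treated as two non-parallel 3D vectors.
    """
    a1 = np.asarray(x[:3], dtype=float)
    a2 = np.asarray(x[3:], dtype=float)

    # First column: normalize the first vector.
    b1 = a1 / np.linalg.norm(a1)

    # Second column: remove the component of a2 along b1, then normalize.
    a2_orth = a2 - np.dot(b1, a2) * b1
    b2 = a2_orth / np.linalg.norm(a2_orth)

    # Third column: the cross product completes a right-handed frame.
    b3 = np.cross(b1, b2)

    return np.stack([b1, b2, b3], axis=1)


# Example with an arbitrary (hypothetical) network output.
R = rotation_from_6d(np.array([1.0, 0.2, 0.0, 0.0, 1.0, 0.3]))
```

The result is orthogonal with determinant +1 for any input whose two halves are non-zero and not parallel, so a network can output six unconstrained numbers and still yield a valid rotation.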

Publisher

NTNU