Parallelization of Local Learning Rules

Hovind, Ingebrigt Kristoffer Thomassen; Sletta, Erling Sung

dc.contributor.advisor	Eidheim, Ole Christian
dc.contributor.author	Hovind, Ingebrigt Kristoffer Thomassen
dc.contributor.author	Sletta, Erling Sung
dc.date.accessioned	2022-07-30T17:19:30Z
dc.date.available	2022-07-30T17:19:30Z
dc.date.issued	2022
dc.identifier	no.ntnu:inspera:111604085:111608604
dc.identifier.uri	https://hdl.handle.net/11250/3009254
dc.description.abstract	This bachelor's thesis builds on an article written by Ole Christian Eidheim, which describes a "novel unsupervised learning rule based on Gaussian functions that can perform online clustering without needing to specify the number of clusters prior to training". More specifically, the thesis concerns translating a Python algorithm released together with Eidheim's article to C++, and then parallelizing it on both CPU and GPU. We have produced five different, well-optimized implementations. The first is a standard C++ implementation that uses only one thread. The second and third are extended versions of the first, and use multiple threads but with different strategies. The last two implementations use two different third-party libraries, ArrayFire and Boost Compute, to parallelize the algorithm on the GPU. We have gathered data on several different platforms about how different implementations of the same algorithm perform, which gives us valuable information about exactly which strategies pay off and how much one can expect to gain from parallelization. We have found that with the right implementation on the right system, GPU parallelization can run faster than CPU versions, even when running only 16 parallel loops. Even with less optimal implementations, the GPU-parallelized implementations usually scale better than most CPU implementations, and achieve equal or faster execution times with 169 parallel computations on our machines.
dc.description.abstract	This thesis builds on an article written by Ole Christian Eidheim, in which he describes "a novel unsupervised learning rule based on Gaussian functions that can perform online clustering without needing to specify the number of clusters prior to training". The specific focus has been to improve the computational speed of the Python algorithm released together with Eidheim's article, by porting it to C++ and parallelizing it on both CPU and GPU. We have developed five different, well-optimized implementations. The first is a standard C++ implementation using only one thread. The second and third are extensions of the first, both using several threads but with different strategies. The final two implementations use the third-party libraries ArrayFire and Boost Compute to GPU-accelerate the algorithm. We have gathered data on the execution time of the various implementations on two different systems, giving valuable information on exactly which strategies pay off and how much one can expect to benefit from parallelism. We have found that with the right implementation running on the right system, GPU acceleration can be faster than ordinary CPU implementations, even when running only 16 calculation loops in parallel. For less optimal implementations, the GPU-accelerated programs scale better than most of the CPU implementations, and achieve a faster execution time on our hardware setups when running 169 calculation loops.
dc.language	eng
dc.publisher	NTNU
dc.title	Parallelization of Local Learning Rules
dc.type	Bachelor thesis

Associated file(s)

File name: no.ntnu:inspera:111604085:1116 ...
Size: 8.241Mb
Format: PDF

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6552]