Automatic Model Parallelism for Deep Learning Using Execution Time Modelling and Evolutionary Computation

Andreassen, Eivind Lie

dc.contributor.advisor	Downing, Keith L.
dc.contributor.advisor	Chandra, Arjun
dc.contributor.advisor	Cevolani, Lorenzo
dc.contributor.author	Andreassen, Eivind Lie
dc.date.accessioned	2021-09-15T16:12:43Z
dc.date.available	2021-09-15T16:12:43Z
dc.date.issued	2020
dc.identifier	no.ntnu:inspera:57320302:23169505
dc.identifier.uri	https://hdl.handle.net/11250/2777823
dc.description.abstract	This project aims to automatically find model parallel configurations for deep neural networks using a combination of evolutionary algorithms and execution time simulation for deep learning. In recent years the power and utility of deep learning have increased significantly, but so has the complexity of the models, which has created a large demand for computing power. This has made parallelization techniques essential, both to reduce the time it takes to train such models and to fit the models into the memory of the available processing units. Applying parallelization to deep learning is a non-trivial problem, but much of this complexity can be offloaded to modern optimization techniques such as evolutionary algorithms. Furthermore, simulating the training process makes such an approach possible without access to expensive training hardware. It also lets the process terminate faster than if every evaluation had to be carried out by testing on physical hardware. This report presents an execution time simulator for neural networks and two optimization algorithms that can find training configurations for neural networks: a genetic algorithm and a MAP-Elites algorithm. The focus is on solving the device placement problem, which involves placing the individual operations of a neural network onto processing units so that the network can be trained in a model parallel configuration. In the experiments, the two optimization algorithms give better results than a baseline consisting of a hill climbing algorithm and a simulated annealing algorithm. The evolutionary algorithms are able to find good solutions for a range of problem instances, and find optimal solutions in the simplest instances. The impact of the execution time simulator on the solutions is also evaluated through the experiments.
The results indicate that the simulator gives an approximately correct ordering of solutions by execution time. This means that an optimization process that uses the simulator to evaluate solutions will arrive at a final solution that is valid for real-world use.
dc.description.abstract	Using methods from the field of evolutionary computation combined with an execution time simulator for deep neural networks, this project aims to automate the configuration of model parallel training strategies for deep learning. Recent years have seen significant advances in the power and utility of deep learning, but also an increase in the complexity of deployed models, leading to large computational requirements. Parallelism techniques have become essential, both to reduce training time and to fit models into the memory of the available computational devices. Applying parallelism to deep learning is non-trivial, but much of the complexity of the task can be alleviated by applying optimization techniques such as evolutionary algorithms. Moreover, by simulating the training process, parallel configurations can be found without access to expensive training hardware, and the search terminates faster than it would if every candidate had to be evaluated on real hardware. This report presents an execution simulator for neural networks and two optimization algorithms for finding configurations for neural networks: a genetic algorithm and a MAP-Elites algorithm. The focus is on solving the device placement problem, in which the individual operations in a neural network are assigned to a set of devices for model parallel execution. In the experiments, both algorithms outperform a baseline consisting of a hill climbing algorithm and a simulated annealing algorithm. They find good solutions across several problem instances, and find the optimal solution in the simplest instances. The impact of the execution simulator is also evaluated experimentally.
These experiments indicate that the execution simulator gives an approximately correct ordering of solutions with regard to their quality, suggesting that an optimization process run against the simulator will yield solutions that remain valid when applied in the real world.
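The abstract describes simulator-driven device placement only at a high level. The sketch below is a minimal illustration under stated assumptions, not the thesis's simulator or genetic algorithm: it assumes a hypothetical five-operation graph with fixed per-operation compute times and fixed cross-device transfer costs, a simple list-scheduling cost model in which each device runs one operation at a time in topological order, and a basic genetic algorithm (tournament selection, uniform crossover, single-gene mutation, elitism). All names (`COMPUTE`, `EDGES`, `ORDER`, `simulate`, `genetic_search`) are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy workload: op name -> compute time (ms), same on every device.
COMPUTE: Dict[str, float] = {"in": 1.0, "a": 4.0, "b": 4.0, "c": 4.0, "out": 1.0}
# Hypothetical data dependencies: (src, dst) -> transfer time (ms), paid only
# when src and dst are placed on different devices.
EDGES: Dict[Tuple[str, str], float] = {
    ("in", "a"): 0.5, ("in", "b"): 0.5, ("in", "c"): 0.5,
    ("a", "out"): 0.5, ("b", "out"): 0.5, ("c", "out"): 0.5,
}
ORDER: List[str] = ["in", "a", "b", "c", "out"]  # a topological order of the graph
NUM_DEVICES = 2


def simulate(placement: List[int]) -> float:
    """Estimate one step's execution time (makespan); placement[i] is ORDER[i]'s device.

    Simplifying assumption: each device runs one op at a time in topological
    order, and an op starts once its device is free and all its inputs arrived.
    """
    device = dict(zip(ORDER, placement))
    device_free = [0.0] * NUM_DEVICES
    finish: Dict[str, float] = {}
    for op in ORDER:
        ready = 0.0
        for (src, dst), cost in EDGES.items():
            if dst == op:
                transfer = cost if device[src] != device[op] else 0.0
                ready = max(ready, finish[src] + transfer)
        start = max(ready, device_free[device[op]])
        finish[op] = start + COMPUTE[op]
        device_free[device[op]] = finish[op]
    return max(finish.values())


def genetic_search(pop_size: int = 20, generations: int = 50,
                   mutation_rate: float = 0.2, seed: int = 0) -> Tuple[List[int], float]:
    """Minimize simulated execution time with a basic genetic algorithm."""
    rng = random.Random(seed)
    n = len(ORDER)
    # Seed with the trivial all-on-device-0 placement plus random placements.
    population = [[0] * n] + [
        [rng.randrange(NUM_DEVICES) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        next_gen = [min(population, key=simulate)]  # elitism: keep the current best
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            p1 = min(rng.sample(population, 3), key=simulate)
            p2 = min(rng.sample(population, 3), key=simulate)
            # Uniform crossover: each op inherits its device from either parent.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Mutation: occasionally move one op to a random device.
            if rng.random() < mutation_rate:
                child[rng.randrange(n)] = rng.randrange(NUM_DEVICES)
            next_gen.append(child)
        population = next_gen
    best = min(population, key=simulate)
    return best, simulate(best)


if __name__ == "__main__":
    placement, step_time = genetic_search()
    print(dict(zip(ORDER, placement)), step_time)
```

Because the fitness function only calls the simulator, the search never touches real hardware, which is the trade-off the abstract describes; the price is that the result can only be as good as the cost model's ranking of placements.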
dc.language
dc.publisher	NTNU
dc.title	Automatic Model Parallelism for Deep Learning Using Execution Time Modelling and Evolutionary Computation
dc.type	Master thesis

Associated file(s)

Filename:: no.ntnu:inspera:57320302:23169 ...
Size:: 13.04Mb
Format:: PDF


This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6830]
