Detection of Vulnerabilities in Source Code Using Machine Learning and Natural Language Processing

Knutsen, Mathias; Lervik, Eivind Hestnes

dc.contributor.advisor	Morrison, Donn
dc.contributor.author	Knutsen, Mathias
dc.contributor.author	Lervik, Eivind Hestnes
dc.date.accessioned	2022-10-07T17:31:13Z
dc.date.available	2022-10-07T17:31:13Z
dc.date.issued	2022
dc.identifier	no.ntnu:inspera:112046434:23311757
dc.identifier.uri	https://hdl.handle.net/11250/3024691
dc.description.abstract	Vulnerability detection is not a new topic, but in recent years it has only grown in importance. As security requirements for software solutions grow ever stricter, and the costs of development and testing increase, vulnerabilities need to be found before production. The measures in place today include heavyweight static and dynamic analysis tools, as well as time-consuming testing and fuzzing. Modern machine learning techniques have been used on top of static analysis tools, and as standalone solutions, to help ease the workload of developers. In this thesis, we review the literature on vulnerability detection using machine learning. Based on the literature, we propose several ways to solve the problem with machine learning. Our models train only on code snippets of functions, without any context about the surrounding code, in order to simplify the problem and reduce processing time. Our best model detects 70% of all vulnerabilities in a test set drawn from real-world source code, while producing fewer than one false positive for each real vulnerability found.
dc.description.abstract	Vulnerability detection is not a new topic, but it has become more important in recent years. As security requirements for software solutions become increasingly strict and the cost of development and testing rises, vulnerabilities need to be caught before production. The countermeasures in place today include heavyweight static and dynamic analysis, as well as time-consuming testing and fuzzing. Modern machine learning techniques have been applied both on top of static analyzers and as standalone solutions, to take some weight off the shoulders of developers. In this thesis, we review the literature on vulnerability detection using machine learning. Based on the literature, we propose several machine learning approaches to the problem. Our models train only on function snippets, without any context about the surrounding code, to simplify the problem and improve processing speed. Our best model detects 70% of all vulnerabilities in a test set extracted from real-world source code, while producing fewer than one false positive for each truly vulnerable function found.
dc.language	eng
dc.publisher	NTNU
dc.title	Detection of Vulnerabilities in Source Code Using Machine Learning and Natural Language Processing
dc.type	Master thesis

Associated file(s)

File name: no.ntnu:inspera:112046434:2331 ...
Size: 7.598Mb
Format: PDF

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6623]
