Early cyber-grooming detection: Combining the Sliding window and Continuous risk-metric methods

Stien, Trond.

dc.contributor.advisor	Bours, Patrick.
dc.contributor.author	Stien, Trond.
dc.date.accessioned	2024-03-05T18:19:29Z
dc.date.available	2024-03-05T18:19:29Z
dc.date.issued	2023
dc.identifier	no.ntnu:inspera:155686180:16723619
dc.identifier.uri	https://hdl.handle.net/11250/3121160
dc.description	Full text not available
dc.description.abstract	Cyber-grooming is a growing problem. Modern technology gives child predators access to children who were previously outside their geographical reach. The issue is not only a societal concern but also a big data problem, since the vast majority of messages sent over the internet each day are benign. Detection through manual review of conversations is no longer feasible, which creates a need for effective automated tools that support detection methods suited to this scale. Research in the cyber-grooming domain has mainly taken a post-incident forensic approach, classifying conversations after they have already taken place. In the worst case, this means the predator has already succeeded in their cyber-grooming goal. Research is sorely needed on early detection methods that can automatically process large amounts of data and flag conversations that need a closer look by an analyst, thereby creating opportunities for proactive intervention in ongoing conversations. This research addresses that gap by iterating and expanding upon two existing early detection techniques: the Sliding window method and the Continuous risk-metric method. It also introduces a third method, the Combined method, which is built by combining the two. The methods are evaluated using the F-Latency metric and a newly proposed Dynamic F1-Speed (DF1S) metric. These metrics take into account not only how well a method classifies conversations but also how early the classification is made. With the improvements made in this project, an F-Latency score of 0.86 and a DF1S score of 44.06 are achieved on the PAN-12 dataset. In testing on the FullPJ dataset, suspicious conversations are identified after an average of only 10.1 messages.
These improvements facilitate a further shift from a forensic, post-incident focus to a more proactive and preventive approach, through detection of cyber-grooming at an earlier stage.
dc.language	eng
dc.publisher	NTNU
dc.title	Early cyber-grooming detection: Combining the Sliding window and Continuous risk-metric methods
dc.type	Master thesis

This item appears in the following collection(s)

Institutt for informasjonssikkerhet og kommunikasjonsteknologi [2526]