Online Grooming Detection on Social Media Platforms

Borj, Parisa Rezaee

dc.contributor.advisor	Bours, Patrick
dc.contributor.advisor	Hellan, Dorothee Beermann
dc.contributor.advisor	Raja, Kiran
dc.contributor.author	Borj, Parisa Rezaee
dc.date.accessioned	2023-03-30T12:57:47Z
dc.date.available	2023-03-30T12:57:47Z
dc.date.issued	2023
dc.identifier.isbn	978-82-326-5587-8
dc.identifier.issn	2703-8084
dc.identifier.uri	https://hdl.handle.net/11250/3061201
dc.description.abstract	Online grooming detection has become a critical research topic in the era of extensive data analysis. It is essential to protect vulnerable users, particularly adolescents, against sexual predation on online platforms and media. However, many factors make online grooming detection difficult, leaving young people at high risk. The primary goal of this research work is to provide techniques that increase children’s security on online chat platforms. To this end, many experiments have been conducted to create models that fulfil our research goal. This thesis also contains a comprehensive survey of child exploitation in chat logs that gives readers a thorough understanding of the problem, possible research gaps, and proposed solutions. In this research, we split the online grooming detection problem into several subproblems, including author profiling, predatory conversation detection, predatory identification, and data limitation issues. The leading theory behind author profiling in this problem comes from the fact that online predators provide fake identities to trap their young victims. At the same time, children’s characteristics differ from those of people who imitate a minor, which leads us to detect the gender of users in this research. In this thesis, we propose a gender detection model that can recognize the gender of authors based on their keystroke dynamics features. This research also provides a high-performance fake identity detection technique that detects users who are dishonest about their identity. An automatic predatory conversation detection system helps law enforcement authorities act in time, before any tragedy occurs. Therefore, we have examined and proposed several predatory conversation detection and predatory identification techniques, focusing on finding the feature vectors and embeddings that lead to the best performance in online grooming detection.
This thesis also aims to gain deep knowledge about predatory behaviour through semantic analysis. We might lose some semantic information by applying conventional embeddings such as Word2vec or GloVe feature vectors, since they provide a single embedding for a term regardless of its context. At the same time, humans express their motivations in phrases or sentences rather than single terms. So, we provide an online grooming detection model based on extracting embeddings from sentences rather than single words. We apply contextual models, such as BERT-based and RoBERTa-based systems, to each sentence. Several constraints, such as privacy and security issues, availability, and the imbalanced nature of the data, challenge online grooming datasets. The number of predatory chat logs is considerably lower than that of other online conversations, leading to a highly imbalanced data problem. It is challenging to build a machine learning model on imbalanced datasets, which motivates us to provide a model to handle this issue. This research proposes a model that uses hybrid sampling and class re-distribution to obtain augmented data for coping with highly imbalanced datasets. We also improve the diversity of classifiers and feature vectors by perturbing the data along with the augmentation in an iterative manner. Finally, we conclude our research by discussing potential research gaps and open problems and proposing possible solutions, to give readers insight into future work that builds on this thesis.	en_US
dc.language.iso	eng	en_US
dc.publisher	NTNU	en_US
dc.relation.ispartofseries	Doctoral theses at NTNU;2023:111
dc.relation.haspart	Paper 1: Rezaee Borj, Parisa; Raja, Kiran; Bours, Patrick Adrianus. Online grooming detection: A comprehensive survey of child exploitation in chat logs. Knowledge-Based Systems 2022 ;Volume 259. This is an open access article under the CC BY license	en_US
dc.relation.haspart	Paper 2: Borj, Parisa Rezaee; Bours, Patrick. Detecting Liars in Chats using Keystroke Dynamics. In: Proceedings of the 2019 International Conference on Biometric Engineering and Applications (ICBEA 2019). Association for Computing Machinery (ACM) 2019 ISBN 978-1-4503-6305-1. Copyright © 2019 ACM	en_US
dc.relation.haspart	Paper 3: Li, Guoqiang; Borj, Parisa Rezaee; Bergeron, Loic; Bours, Patrick. Exploring Keystroke Dynamics and Stylometry Features for Gender Prediction on Chatting Data. In: Proceedings of the International Convention MIPRO. IEEE conference proceedings 2019 ISBN 978-1-5386-9296-7. © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.relation.haspart	Paper 4: Borj, Parisa Rezaee; Bours, Patrick. Predatory Conversation Detection. In: International Conference on Cyber Security for Emerging Technologies. IEEE conference proceedings 2019 ISBN 978-1-7281-4539-6. © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.relation.haspart	Paper 5: Borj, Parisa Rezaee; Bylappa Raja, Kiran; Bours, Patrick. On Preprocessing the Data for Improving Sexual Predator Detection. In: 15th International Workshop on Semantic and Social Media Adaptation and Personalization. IEEE 2020 ISBN 978-1-7281-5920-1 © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.relation.haspart	Paper 6: Borj, Parisa Rezaee; Raja, Kiran; Bours, Patrick. Detecting Sexual Predatory Chats by Perturbed Data and Balanced Ensembles. In: Proceedings of the 20th International Conference of the Biometrics Special Interest Group (BIOSIG2021). Gesellschaft für Informatik 2021 ISBN 978-1-6654-2693-0. © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.relation.haspart	Paper 7: Borj, Parisa Rezaee; Raja, Kiran; Bours, Patrick. (2023). Detecting Online Grooming By Simple Contrastive Chat Embeddings, 9th ACM International Workshop on Security and Privacy Analytics (IWSPA 2023) [Accepted]. This paper is not yet published and is therefore not included.	en_US
dc.title	Online Grooming Detection on Social Media Platforms	en_US
dc.type	Doctoral thesis	en_US
dc.subject.nsi	VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550	en_US

Files in this item

Name: Parisa Rezaee Borj.pdf
Size: 4.650Mb
Format: PDF

Name: Parisa Rezaee Borj_PhD.pdf
Size: 5.014Mb
Format: PDF
Description: Fulltext not available

This item appears in the following Collection(s)

Institutt for informasjonssikkerhet og kommunikasjonsteknologi [2527]