dc.contributor.author | Johnsen, Jan William | |
dc.contributor.author | Franke, Katrin | |
dc.date.accessioned | 2020-04-01T09:56:03Z | |
dc.date.available | 2020-04-01T09:56:03Z | |
dc.date.created | 2020-03-02T20:29:44Z | |
dc.date.issued | 2019 | |
dc.identifier.citation | TEMP 2017 IEEE International Conference on Big Data (Big Data). 2019, 4248-4254. | en_US |
dc.identifier.issn | 2639-1589 | |
dc.identifier.uri | https://hdl.handle.net/11250/2649835 | |
dc.description.abstract | Underground forums serve as gathering places for like-minded cyber criminals and are a continued threat to law and order. Law enforcement agencies can use Open-Source Intelligence (OSINT) to gather valuable information to proactively counter existing and new threats, for example by shifting the focus of criminal investigations onto certain cyber criminals with large impact in underground forums and related criminal business models. This paper presents our study on text preprocessing requirements and document construction for the topic model algorithm Latent Dirichlet Allocation (LDA). We identify a set of preprocessing requirements based on a literature review and demonstrate them on a real-world forum, similar to those used by cyber criminals. Our results show that topic modelling processes need to follow a very strict procedure to provide significant results that can be useful in OSINT. Additionally, more reliable results are produced by tuning the hyper-parameters and the number of topics for LDA. We demonstrate improved results by iterative preprocessing to continuously improve the model, which provides more coherent and focused topics. | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | en_US |
dc.title | The impact of preprocessing in natural language for open source intelligence and criminal investigation | en_US |
dc.type | Journal article | en_US |
dc.description.version | acceptedVersion | en_US |
dc.source.pagenumber | 4248-4254 | en_US |
dc.source.journal | TEMP 2017 IEEE International Conference on Big Data (Big Data) | en_US |
dc.identifier.doi | 10.1109/BigData47090.2019.9006006 | |
dc.identifier.cristin | 1799094 | |
dc.description.localcode | © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | en_US |
cristin.unitcode | 194,63,30,0 | |
cristin.unitname | Institutt for informasjonssikkerhet og kommunikasjonsteknologi | |
cristin.ispublished | true | |
cristin.fulltext | original | |