
dc.contributor.author     Johnsen, Jan William
dc.contributor.author     Franke, Katrin
dc.date.accessioned       2020-04-01T09:56:03Z
dc.date.available         2020-04-01T09:56:03Z
dc.date.created           2020-03-02T20:29:44Z
dc.date.issued            2019
dc.identifier.citation    2019 IEEE International Conference on Big Data (Big Data). 2019, 4248-4254.  [en_US]
dc.identifier.issn        2639-1589
dc.identifier.uri         https://hdl.handle.net/11250/2649835
dc.description.abstract   Underground forums serve as gathering places for like-minded cyber criminals and are a continued threat to law and order. Law enforcement agencies can use Open-Source Intelligence (OSINT) to gather valuable information and proactively counter existing and new threats, for example by shifting the focus of criminal investigations onto specific cyber criminals who have a large impact in underground forums and onto related criminal business models. This paper presents our study on text preprocessing requirements and document construction for the topic model algorithm Latent Dirichlet Allocation (LDA). We identify a set of preprocessing requirements based on a literature review and demonstrate them on a real-world forum similar to those used by cyber criminals. Our results show that topic modelling needs to follow a very strict procedure to produce significant results that can be useful in OSINT. Additionally, tuning the hyper-parameters and the number of topics for LDA produces more reliable results. We demonstrate improved results through iterative preprocessing that continuously improves the model and yields more coherent and focused topics.  [en_US]
dc.language.iso           eng  [en_US]
dc.publisher              Institute of Electrical and Electronics Engineers (IEEE)  [en_US]
dc.title                  The impact of preprocessing in natural language for open source intelligence and criminal investigation  [en_US]
dc.type                   Journal article  [en_US]
dc.description.version    acceptedVersion  [en_US]
dc.source.pagenumber      4248-4254  [en_US]
dc.source.journal         2019 IEEE International Conference on Big Data (Big Data)  [en_US]
dc.identifier.doi         10.1109/BigData47090.2019.9006006
dc.identifier.cristin     1799094
dc.description.localcode  © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.  [en_US]
cristin.unitcode          194,63,30,0
cristin.unitname          Institutt for informasjonssikkerhet og kommunikasjonsteknologi (Department of Information Security and Communication Technology)
cristin.ispublished       true
cristin.fulltext          original

