
The impact of preprocessing in natural language for open source intelligence and criminal investigation

Johnsen, Jan William; Franke, Katrin
Journal article
Accepted version
URI
https://hdl.handle.net/11250/2649835
Date
2019
Collections
  • Institutt for informasjonssikkerhet og kommunikasjonsteknologi [2805]
  • Publikasjoner fra CRIStin - NTNU [41952]
Original version
TEMP 2017 IEEE International Conference on Big Data (Big Data). 2019, 4248-4254. DOI: 10.1109/BigData47090.2019.9006006
Abstract
Underground forums serve as gathering places for like-minded cyber criminals and are a continued threat to law and order. Law enforcement agencies can use Open-Source Intelligence (OSINT) to gather valuable information to proactively counter existing and new threats, for example by shifting the focus of criminal investigations onto certain cyber criminals who have a large impact in underground forums and related criminal business models. This paper presents our study on text preprocessing requirements and document construction for the topic model algorithm Latent Dirichlet Allocation (LDA). We identify a set of preprocessing requirements based on a literature review and demonstrate them on a real-world forum, similar to those used by cyber criminals. Our results show that topic modelling processes need to follow a very strict procedure to provide significant results that can be useful in OSINT. Additionally, more reliable results are produced by tuning the hyper-parameters and the number of topics for LDA. We demonstrate improved results by iterative preprocessing to continuously improve the model, which provides more coherent and focused topics.
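To make the idea of preprocessing and document construction for LDA more concrete, the following is a minimal sketch, not the authors' procedure. The abstract does not list the individual preprocessing requirements, so every step here (URL stripping, lower-casing, stop-word removal, document-frequency filtering) is an assumption based on common practice, and the names `STOP_WORDS`, `tokenize`, and `build_corpus` are hypothetical. It uses only the Python standard library and produces a vocabulary plus bag-of-words documents of the kind an LDA implementation typically takes as input.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption for illustration).
STOP_WORDS = {"the", "and", "for", "this", "that", "with", "are", "was"}

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    """Lower-case a post, strip URLs, and keep alphabetic tokens longer than 2 chars."""
    text = URL_RE.sub(" ", text.lower())
    return [t for t in TOKEN_RE.findall(text)
            if len(t) > 2 and t not in STOP_WORDS]


def build_corpus(posts, min_doc_freq=2, max_doc_ratio=0.9):
    """Build a vocabulary and bag-of-words documents from raw forum posts.

    Tokens that occur in fewer than `min_doc_freq` posts, or in more than
    `max_doc_ratio` of all posts, are dropped. Both thresholds are assumed
    values that would normally be tuned iteratively.
    """
    docs = [tokenize(p) for p in posts]
    # Document frequency: in how many posts does each token appear?
    doc_freq = Counter(tok for d in docs for tok in set(d))
    n_docs = len(docs)
    vocab = sorted(t for t, df in doc_freq.items()
                   if df >= min_doc_freq and df / n_docs <= max_doc_ratio)
    index = {t: i for i, t in enumerate(vocab)}
    # Each document becomes a sorted list of (token_id, count) pairs.
    bows = [sorted(Counter(index[t] for t in d if t in index).items())
            for d in docs]
    return vocab, bows


if __name__ == "__main__":
    posts = [
        "Selling fresh accounts, see http://example.com now",
        "Fresh accounts for sale",
        "How to configure the proxy?",
    ]
    vocab, bows = build_corpus(posts)
    print(vocab)
    print(bows)
```

In an iterative workflow of the kind the abstract describes, one would inspect the resulting topics, adjust the stop-word list and frequency thresholds, and rebuild the corpus before refitting the model.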
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Journal
TEMP 2017 IEEE International Conference on Big Data (Big Data)
