The study of keyword search in open source search engines and digital forensics tools with respect to the needs of cyber crime investigations

Hansen, Joachim

Master thesis

URI

http://hdl.handle.net/11250/2479196

Date

2017

Collections

Institutt for informasjonssikkerhet og kommunikasjonsteknologi

Abstract

This master thesis consists of a theoretical part and a practical part. The theoretical
part covers three areas of study: 1) an exploration of experimental search methodologies
used in a digital forensics setting; 2) an analysis of the differences in documented
search capabilities between a set of open source search engines and open source forensics
tools capable of keyword search; 3) an identification and summary of publicly available
digital forensics datasets.

For the first area of exploration, no surveys published in the period 2014-2017 could
be found. This exploration therefore addresses a gap in current knowledge.

The second exploration provides an in-depth and up-to-date analysis of differences
in search capabilities that is not found anywhere else. This analysis is useful for
forensic examiners and researchers who want to know which application is most suitable
for their problem domain.

The third exploration extends previous lists of its kind and adds many previously
unlisted forensics-related datasets. This list is, to the best of my knowledge, the
largest collection of publicly available forensics-related datasets published in any
paper. It will be useful for researchers in many subfields of information security who
are looking for a dataset to use in their research. Using publicly available datasets
will also make their experiments more reproducible.

Some of the datasets are also used in the practical part of the thesis. The practical
part is a benchmark experiment that tests how well the open source search engines
perform at indexing and searching, and how they perform with regard to memory during
searching. Elasticsearch generally had a shorter index creation time, a smaller index
size, and a faster response time for the first run of search terms. Solr outperformed
Elasticsearch on the second run of search terms. The difference between the search
engines with regard to memory performance during searching was negligible.
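
The distinction between the first and second run of search terms can be illustrated with a small timing sketch. This is only a minimal illustration under stated assumptions, not the benchmark harness used in the thesis: `search_fn` is a hypothetical stand-in for whatever client call sends a query to Solr or Elasticsearch, and the function names `time_search_runs` and `summarize` are introduced here for illustration only.

```python
import time
from statistics import median
from typing import Callable, Dict, Iterable, List


def time_search_runs(
    search_fn: Callable[[str], object],  # hypothetical: wraps a query to the engine
    terms: Iterable[str],
    runs: int = 2,
) -> Dict[str, List[float]]:
    """Time every search term once per run; return seconds per term, one entry per run."""
    # Drop duplicate terms while keeping their original order.
    unique_terms = list(dict.fromkeys(terms))
    timings: Dict[str, List[float]] = {term: [] for term in unique_terms}
    # Run the complete term list once, then repeat it, so that run 2 of each
    # term happens only after every term has been queried in run 1.
    for _ in range(runs):
        for term in unique_terms:
            start = time.perf_counter()
            search_fn(term)
            timings[term].append(time.perf_counter() - start)
    return timings


def summarize(timings: Dict[str, List[float]]) -> List[float]:
    """Return the median response time across all terms, one value per run."""
    if not timings:
        return []
    run_count = len(next(iter(timings.values())))
    return [median(times[i] for times in timings.values()) for i in range(run_count)]
```

Calling `summarize(time_search_runs(search_fn, terms))` yields one median per run, so a first-run value can be compared with a second-run value for each engine. Any caching between runs happens inside the engine and the operating system; the sketch only measures wall-clock time on the client side.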

The experiment has two main limitations. First, the experiments were performed on only
one virtual host machine. This environment does not allow testing how well the search
engines perform at distributed search. Second, only the default (out-of-the-box)
configurations of Solr and Elasticsearch were tested. If more configurations had been
tested, some of the variables, such as sharding and segment count, could have been
controlled. Up-to-date experiments with the same testing methodology could not be found.
The experiments provide information that is useful for forensic examiners when deciding
which search engine is best suited to their forensics tasks.

Publisher

NTNU