dc.description.abstract | This master's thesis consists of both a theoretical and a practical part. The theoretical part covers three main areas of study: 1) an exploration of experimental search methodologies used in a digital forensics setting; 2) an analysis of the differences in documented search capabilities between a set of open source search engines and open source forensics tools capable of keyword search; and 3) an identification and summary of publicly available digital-forensics-related datasets.
For the first area of exploration, no surveys published in the period 2014-2017 could be found. This exploration therefore fills a gap in the current knowledge.
The second exploration provides an in-depth and up-to-date analysis of differences in search capabilities that is not found anywhere else. This analysis is useful for forensic examiners and researchers who want to know which application is most suitable for their problem domain.
The third exploration extends previous lists of its kind and adds many previously unlisted forensics-related datasets. This list is, to the best of my knowledge, the largest collection of publicly available forensics-related datasets published in any paper. It will be useful for researchers in many subfields of information security who are looking for a dataset to use in their research. Using publicly available datasets will also make their experiments more reproducible.
Some of the datasets are also used in the practical part of the thesis. The practical part is a benchmark experiment in which the open source search engines are tested on how well they perform at indexing, searching, and memory usage during searching. Elasticsearch was generally better than Solr at index creation time, minimizing index size, and response time for the first run of search terms. Solr outperformed Elasticsearch on the second run of search terms. The difference between the search engines with regard to memory usage during searching was negligible.
The experiment has two main limitations. First, the experiments were performed on only one virtual host machine; this environment does not allow testing how well the search engines perform at distributed search. Second, only the default (out-of-the-box) configurations of Solr and Elasticsearch were tested. If more configurations had been tested, variables such as sharding and segment count could have been controlled. No up-to-date experiments with the same testing methodology could be found. The experiments provide information that is useful for forensic examiners when deciding which search engine is best suited for their forensics tasks. | |