dc.description.abstract | The use of mobile internet is increasing as the service becomes faster
and more reliable. It is not only used by smartphones and tablets;
regular computers are also connected. With the increase in usage comes
the need for increased security. Companies have for the last 15
years been aware of Domain Name System (DNS) tunneling as a means to
perform data exfiltration and Command and Control (C&C) attacks in
their networks. Before that, DNS tunnels were used to access the internet
at cafés and hotels without having to pay for it.
Mobile devices today contain more and more data which may be sensitive
to both the user and their employer, and DNS tunnels are already in use on
mobile devices to avoid paying for internet data usage. If history repeats
itself, as it often does, DNS tunnels will soon be used to exfiltrate data
from mobile devices without anyone noticing. This is what this study
aims to prevent. The study tries to find a viable machine learning
classifier for detecting DNS tunnels.
Machine learning is a powerful tool for finding statistical properties of datasets,
and since DNS tunnels are irregularities, their properties should differ
from those of normal traffic. The K-means classifier, a cluster classifier,
and the One-Class SVM (OCSVM) classifier, an outlier detector, are studied
and tested in this study.
The data was originally planned to be gathered using the open-source software
openGGSN, but after much time was spent trying to set it up, this plan had to
change. The data was instead gathered with Wireshark, which captured DNS
traffic generated from four Virtual Machines (VMs), one of which was using
a DNS tunnel. At first the DNS tunnel accounted for over 50% of the data
collected, so it had to be reduced to be more representative of a larger
network. The data was reformatted by merging each request and response
into one line so the classifier could use those features together.
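The merging step described above can be sketched in plain Python. This is a hypothetical illustration: the field names (`id`, `qname`, `qlen`, `rcode`, `answer_len`) are invented, and the study's actual records were extracted from a Wireshark capture; only the idea of pairing each request with its response by DNS transaction ID is taken from the text.

```python
# Invented example records; a real pipeline would parse these from
# a packet capture.
requests = [
    {"id": 0x1A2B, "qname": "example.com", "qlen": 11},
]
responses = [
    {"id": 0x1A2B, "rcode": 0, "answer_len": 4},
]

# Index requests by DNS transaction ID, then merge each response
# with its matching request so both feature sets share one row.
by_id = {req["id"]: req for req in requests}
merged = [
    {**by_id[resp["id"]], **resp}
    for resp in responses
    if resp["id"] in by_id
]
```

Each merged row now carries request and response features together, which is the format the classifiers consume.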
The precision, recall and F-score of the classifiers were tested with different
initiation parameters and features. For K-means the results started out
poor, and neither changing the parameters nor the features improved them.
The OCSVM has multiple kernels, which were tested; the poly kernel
looked very promising in the first test, but when the nu parameter and the
features were changed, its results deteriorated drastically. The Radial
Basis Function (RBF) kernel maintained a high score, particularly on the
recall of the outliers and the precision of the inliers. More tests were
executed using the RBF kernel, varying both the gamma and the nu parameters,
which are the most sensitive parameters for the kernel. This in the end
resulted in a 96% F-score, where only the precision on outliers was below
90%, which means the model's largest weakness is a few false positives. | |