Anonymization of real data for IDS benchmarking
Abstract
Most intrusion detection system (IDS) evaluation approaches use
simulated network traffic as the basis for the test data sets used
in the evaluation. Simulated network traffic lacks the diversity
characteristic of a real-world network. This diversity may be
caused by non-standard implementations of protocols or abnormal
protocol behavior, such as unfinished three-way TCP handshakes and
teardowns.
For realistic IDS evaluations, there is a need for test data sets
based on real recorded network traffic. Such data sets must also
be distributable, since other evaluators should be able to
reproduce a valid test. Due to legal concerns, test data sets based
on real recorded traffic must be anonymized.
This thesis presents a methodology for anonymization of real network
data. The methodology focuses on information at the application
layer, and HTTP/1.1 in particular. A prototype, called
Anonymator, is implemented based on the methodology. A data
set anonymized with this methodology can be used in IDS
evaluations, making them more realistic. It can also be
distributed, since identifying information is anonymized. This way,
evaluations can be validated by third parties.
The methodology and prototype are tested thoroughly through
experiments using a data set consisting of HTTP traffic mixed
with attacks. The prototype implements anonymization schemes of
different strengths that can be chosen by the operator. The
experiments show the differences between the schemes, and these
differences are explained. Results show that the two strongest
anonymization schemes give a good level of anonymity without
losing too much realism.