Anonymization of real data for IDS benchmarking

2006

ENGLISH:

Most IDS evaluation approaches use simulated network traffic as the
basis for the test data sets used in the evaluation. Simulated network
traffic lacks the diversity characteristic of a real-world network.
This diversity may be caused by non-standard implementations of
protocols or by abnormal protocol behavior, such as unfinished
three-way TCP handshakes and teardowns.

For realistic IDS evaluations, there is a need for test data sets
based on real recorded network traffic. Such data sets must also be
distributable, since a valid test should be reproducible by other
evaluators. Due to legal concerns, test data sets based on real
recorded traffic must be anonymized.

This thesis presents a methodology for anonymization of real network
data. The methodology focuses on information at the application layer,
and on HTTP/1.1 in particular. A prototype, called Anonymator, is
implemented based on the methodology. A data set anonymized using such
a methodology can be used in IDS evaluations, making them more
realistic. It can also be distributed, since identifying information
is anonymized. This way, evaluations can be validated by third parties.
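
To illustrate what anonymization at the HTTP/1.1 application layer can look like, the following is a minimal sketch and not the Anonymator implementation. It assumes a keyed hash (HMAC-SHA256) as the replacement function and a hypothetical set of header fields regarded as identifying; the function names, the header list, and the 16-character truncation are all assumptions made for this example. The request line and message body are left untouched here, although in practice they may also carry identifying information.

```python
import hashlib
import hmac

# Hypothetical selection of identifying header fields (an assumption for
# this sketch, not a list taken from the thesis).
SENSITIVE_HEADERS = {"host", "cookie", "referer", "authorization", "user-agent"}


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a keyed hash so equal inputs map to equal outputs."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated only to keep the output short


def anonymize_request(raw: str, key: bytes) -> str:
    """Pseudonymize selected header values of one HTTP/1.1 request message."""
    head, sep, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    out = [lines[0]]  # request line is kept unchanged in this sketch
    for line in lines[1:]:
        name, colon, value = line.partition(":")
        if colon and name.strip().lower() in SENSITIVE_HEADERS:
            line = f"{name}: {pseudonymize(value.strip(), key)}"
        out.append(line)
    return "\r\n".join(out) + sep + body


if __name__ == "__main__":
    request = (
        "GET /index.html HTTP/1.1\r\n"
        "Host: www.example.com\r\n"
        "Accept: */*\r\n"
        "\r\n"
    )
    print(anonymize_request(request, b"secret-key"))
```

Because the hash is keyed and deterministic, the same host name is replaced by the same token throughout a data set, which keeps requests to the same host correlated while hiding the original value from anyone who does not hold the key.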

The methodology and prototype are tested thoroughly through
experiments using a data set consisting of HTTP traffic mixed with
attacks. The prototype implements different anonymization strengths
that can be chosen by the operator. The experiments show the
differences between the anonymization schemes, and these differences
are carefully explained. Results show that the two strongest
anonymization schemes give a good level of anonymity without losing
too much realism.