Anonymization of real data for IDS benchmarking
Most IDS evaluation approaches use simulated network traffic as the basis for the test data sets used in the evaluation. Simulated network traffic lacks the diversity characteristic of a real-world network. This diversity may be caused by non-standard protocol implementations or abnormal protocol behavior, such as unfinished three-way TCP handshakes and teardowns. Realistic IDS evaluations therefore require test data sets based on real recorded network traffic. Such data sets must also be distributable, since other evaluators should be able to reproduce a valid test. Because of legal concerns, test data sets based on real recorded traffic must be anonymized. This thesis presents a methodology for anonymizing real network data. The methodology focuses on information at the application layer, and on HTTP/1.1 in particular. A prototype, called Anonymator, is implemented based on the methodology. A data set anonymized with this methodology can be used in IDS evaluations, making them more realistic. It can also be distributed, since identifying information is anonymized, which allows third parties to validate the evaluations. The methodology and prototype are tested thoroughly through experiments using a data set consisting of HTTP traffic mixed with attacks. The prototype implements different anonymization strengths that the operator can choose between. The experiments show the differences between the anonymization schemes, and these differences are explained in detail. Results show that the two strongest anonymization schemes give a good level of anonymity without losing too much realism.
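The abstract does not say how Anonymator rewrites HTTP/1.1 messages. Purely as an illustration, the Python sketch below shows one way operator-selectable anonymization strengths could be applied to a single HTTP/1.1 request. Everything in it is an assumption rather than the thesis's actual scheme: the three level names ("low", "medium", "high"), the set of headers treated as identifying, and the use of keyed HMAC-SHA256 pseudonyms, which map equal inputs to equal outputs so that relationships between requests are preserved.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical strength levels; not necessarily those used by Anonymator.
LEVELS = {"low": 1, "medium": 2, "high": 3}

# Headers assumed (for illustration) to carry identifying information.
IDENTIFYING_HEADERS = {"host", "cookie", "authorization", "referer", "user-agent", "from"}


def pseudonym(value: str, key: bytes) -> str:
    """Replace a value with a stable keyed-hash pseudonym (HMAC-SHA256)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "anon-" + digest[:12]


def anonymize_target(target: str, key: bytes, level: int) -> str:
    """Anonymize the request target according to the chosen level."""
    parts = urlsplit(target)
    # An absolute-form target carries a host name; always pseudonymize it.
    netloc = pseudonym(parts.netloc, key) if parts.netloc else ""
    query = parts.query
    if level >= 2 and query:
        # Keep parameter names, pseudonymize parameter values.
        pairs = parse_qsl(query, keep_blank_values=True)
        query = urlencode([(name, pseudonym(val, key)) for name, val in pairs])
    path = parts.path
    if level >= 3:
        # Pseudonymize every non-empty path segment, keeping the slashes.
        path = "/".join(pseudonym(seg, key) if seg else "" for seg in path.split("/"))
    return urlunsplit((parts.scheme, netloc, path, query, parts.fragment))


def anonymize_request(raw: str, key: bytes, strength: str = "medium") -> str:
    """Anonymize one raw HTTP/1.1 request; the message body is left untouched."""
    level = LEVELS[strength]
    head, sep, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    out = [f"{method} {anonymize_target(target, key, level)} {version}"]
    for line in lines[1:]:
        name, colon, value = line.partition(":")
        # Header names are kept for realism; identifying values are replaced.
        if colon and name.strip().lower() in IDENTIFYING_HEADERS:
            value = " " + pseudonym(value.strip(), key)
        out.append(f"{name}{colon}{value}")
    return "\r\n".join(out) + sep + body
```

Higher levels hide more, but they also remove more of what an IDS inspects: at "high", for example, an attack string in the URI path is replaced by a pseudonym. This mirrors the anonymity versus realism trade-off described above. A fuller tool would also need to handle message bodies and keep `Content-Length` consistent after rewriting them.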