Reliable Machine Learning Algorithms for Intrusion Detection Systems: Machine Learning for Information Security and Digital Forensics

Nguyen, Hai Thanh

Doctoral thesis

URI

http://hdl.handle.net/11250/144370

Date

2012-11-09

Collections

Institutt for informasjonssikkerhet og kommunikasjonsteknologi [2577]

Abstract

The principal focus of this dissertation is the development of new machine learning methods that increase the reliability, efficiency and effectiveness of intrusion detection systems. The dissertation studies (i) feature selection methods, (ii) supervised learning algorithms and (iii) unsupervised learning algorithms. Applications in intrusion detection include (1) general network-based intrusion detection systems, (2) general host-based intrusion detection systems, (3) Web application firewalls, (4) botnet-malware detection systems, and (5) testing systems of Web application firewalls. For the new machine learning methods, we propose to reformulate (i) a class of feature selection methods, e.g. correlation-based and mutual-information-based feature selection, (ii) Lp-norm support vector machines and (iii) the K-means clustering algorithm as discrete optimization problems, and to unify them in one framework. We prove that these algorithms can be cast as mixed 0-1 linear programming (M01LP) problems in which the numbers of variables and constraints are linear in the number of input features. The resulting M01LP is solved with suitable algorithms, such as branch and bound or the D.C. (Difference of Convex functions) programming approach. The new formulation makes it possible to (a) represent many different algorithms in the same way, (b) easily combine these algorithms to study their reliability, including their optimality, generalization, consistency and robustness, and (c) optimize the feature selection process and the learning model selection process. For the applications in intrusion detection systems, we conduct experiments on five different datasets: KDD CUP 1999, the UNM audit dataset, the CSIC 2010 HTTP dataset, the ECML/PKDD 2007 HTTP dataset, and Botnet Malware.
The experimental results show that our newly proposed approaches (a) reduce the computational effort through optimal learning algorithms and optimal feature selection, (b) increase reliability, including generalization and robustness, and (c) increase the efficiency and effectiveness of network-based intrusion detection systems, host-based intrusion detection systems, Web application firewalls, botnet-malware detection systems and testing systems of Web application firewalls.
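
To make the idea of feature selection as a discrete optimization problem concrete, the following minimal sketch encodes a feature subset as a 0-1 vector and maximizes the standard correlation-based feature selection (CFS) merit over all such vectors. It is an illustration only, not the thesis's method: it uses exhaustive enumeration rather than the M01LP reformulation, branch and bound, or D.C. programming, and the function names and toy correlation values (cfs_merit, best_subset, R_CF, R_FF) are assumptions introduced here.

```python
from itertools import product
from math import sqrt

# Hypothetical toy data (not from the thesis):
# R_CF[i]    = correlation between feature i and the class label
# R_FF[i][j] = correlation between features i and j
R_CF = [0.8, 0.7, 0.6]
R_FF = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]

def cfs_merit(x, r_cf, r_ff):
    """CFS merit of the feature subset encoded by the 0-1 vector x.

    merit = sum_i r_cf[i] x_i / sqrt(sum_i x_i + 2 * sum_{i<j} r_ff[i][j] x_i x_j)
    """
    chosen = [i for i, xi in enumerate(x) if xi == 1]
    k = len(chosen)
    if k == 0:
        return 0.0  # the empty subset carries no information
    sum_cf = sum(r_cf[i] for i in chosen)
    sum_ff = sum(r_ff[i][j] for i in chosen for j in chosen if i < j)
    return sum_cf / sqrt(k + 2.0 * sum_ff)

def best_subset(r_cf, r_ff):
    """Exhaustively search all 2^n binary vectors (feasible only for small n)."""
    n = len(r_cf)
    return max(product((0, 1), repeat=n),
               key=lambda x: cfs_merit(x, r_cf, r_ff))

if __name__ == "__main__":
    x = best_subset(R_CF, R_FF)
    print(x, round(cfs_merit(x, R_CF, R_FF), 4))
```

On this toy data the search selects features 0 and 2: feature 1 is almost redundant with feature 0, so including it lowers the merit. Exhaustive search grows as 2^n in the number of features, which is why a formulation with a linear number of variables and constraints, as described in the abstract, matters in practice.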

Series

Doctoral dissertations at Gjøvik University College;4/2012
Doktorgradsavhandlinger ved Høgskolen i Gjøvik;4/2012