URL Crawling & classification system

Today, malware is often found on legitimate web sites that have been hacked. The aim of this thesis was to create a system to crawl potential malicious web sites and rate them as malicious or not. Through research into current malware trends and mechanisms to detect malware on the web, we analyzed and discussed the problem space, before we began design the system architecture. After we had implemented our suggested architecture, we ran the system through tests. These test shed some light on the challenges we had discussed. We found that our hybrid honey-client approach was of benefit to detect malicious sites, as some malicious sites were only found when both honey-clients cooperated. In addition, we got insight into how a LIHC can be useful as a queue pre-processor tool for a HIHC. On top of that, we learned the consequence of operating a system like this without a well built proxy server network: false-negatives.

Utgiver

Institutt for telematikk