Event Detection using Wikipedia
Abstract
The amount of information on the web is ever growing, and analysing and making sense of it is difficult without specialised tools. Big Data is the field concerned with processing and storing such enormous quantities of data. Wikipedia is an example of a source that can benefit from the techniques this field provides. It is an online, user-driven encyclopedia that contains vast amounts of information on nearly every subject. Every hour, Wikipedia releases logs showing the number of views each page received, and all edits are also accessible, as Wikipedia frequently releases the entire encyclopedia, including all revisions. This makes it a great source for studying trends of recent years. To systematise these page views and edits, we design a scalable database and implement a number of analysis jobs to process this information. From the page view and edit counts, we perform burst and event detection, and compare the usefulness of two methods used for this purpose. We test the validity of our system by examining the case of the football transfer window in August 2011. Our system generates accurate results, detecting these football transfers as events. To visualise the information we gather, we design a web application that gives users structured access to our findings.
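As a rough illustration of what burst detection over hourly page view counts can look like, the sketch below flags hours whose count lies far above a trailing-window average. It is an assumption made for illustration only and is not necessarily either of the two methods compared in this work; the function name `detect_bursts` and the parameters `window`, `threshold` and `min_sigma` are hypothetical.

```python
from statistics import mean, stdev


def detect_bursts(counts, window=24, threshold=3.0, min_sigma=1.0):
    """Return the indices of hours that look like bursts.

    An hour is flagged when its count exceeds the mean of the preceding
    `window` hours by more than `threshold` standard deviations.
    All parameter names and defaults are illustrative assumptions.
    """
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        # Floor the deviation so a perfectly flat history does not
        # cause a division by zero.
        sigma = max(stdev(history), min_sigma)
        if (counts[i] - mu) / sigma > threshold:
            bursts.append(i)
    return bursts


# Hypothetical hourly view counts for one page: a steady baseline
# followed by a single sharp spike.
hourly_views = [100, 110] * 12 + [105, 98, 500, 102]
print(detect_bursts(hourly_views))
```

A trailing window keeps the baseline local, so a page that is always popular is not flagged merely for having many views; only a sudden departure from its own recent history counts as a burst.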