Extracting and Exploring Examples from Semi-Structured Text
Abstract
Active learning has been a widely promoted form of learning over the last decades. One of the exercises active learning makes use of is task and problem solving. Examples that solve similar problems can be very helpful when performing these exercises. This project will therefore create a searchable database of examples to help users find relevant examples that can aid them in exercises involving tasks and problems. We will use Wikipedia as the source of examples. The system will extract examples found in an XML dump of Wikipedia and transform them into example objects, which will be inserted into a database. A user interface will be created for searching the database and displaying the examples to the user. A software pipeline will be used as the system's main architectural pattern. The independent processes of a software pipeline are very beneficial in the time-consuming task of parsing the XML dump of the entire Wikipedia.
The work of the thesis resulted in a system that can parse the XML dump of Wikipedia and create a database of examples. A search interface lets the user enter keywords and displays the returned examples. The system is able to find relevant examples to a satisfactory degree. Since the final implemented system acts as a minimum viable product, a number of proposals for future improvements are also included in the thesis.
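The extract, transform, store, and search stages described above can be illustrated with a minimal sketch. It is not the thesis implementation: the heading-based rule for recognising an example, the table layout, and the names `extract_examples`, `build_database`, and `search` are all assumptions made for illustration, and a tiny inline XML string stands in for the real Wikipedia dump.

```python
import io
import re
import sqlite3
import xml.etree.ElementTree as ET

# Assumption: an "example" is any wikitext section whose heading mentions "example".
SECTION = re.compile(r"^==+[ \t]*(.+?)[ \t]*==+[ \t]*$", re.MULTILINE)


def extract_examples(xml_stream):
    """Stage 1: stream pages from the dump and yield (title, heading, body)."""
    title, text = None, ""
    for _, elem in ET.iterparse(xml_stream, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop an XML namespace if present
        if tag == "title":
            title = elem.text
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            # split() with one capture group gives [lead, heading, body, ...]
            parts = SECTION.split(text)
            for heading, body in zip(parts[1::2], parts[2::2]):
                if "example" in heading.lower():
                    yield title, heading, body.strip()
            elem.clear()  # release the finished page to keep memory flat
            title, text = None, ""


def build_database(examples, conn):
    """Stage 2: insert the extracted example tuples into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS examples (title TEXT, heading TEXT, body TEXT)"
    )
    conn.executemany("INSERT INTO examples VALUES (?, ?, ?)", examples)
    conn.commit()


def search(conn, keyword):
    """Stage 3: naive keyword search over titles and example bodies."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT title, heading, body FROM examples "
        "WHERE title LIKE ? OR body LIKE ?",
        (pattern, pattern),
    )
    return rows.fetchall()


# Hypothetical stand-in for the Wikipedia XML dump.
SAMPLE = """<mediawiki>
  <page>
    <title>Bubble sort</title>
    <revision><text>Intro text.
== Example ==
Sorting 5 1 4 2 8 step by step.
== History ==
Not an example.</text></revision>
  </page>
</mediawiki>"""

conn = sqlite3.connect(":memory:")
build_database(extract_examples(io.BytesIO(SAMPLE.encode("utf-8"))), conn)
results = search(conn, "sorting")
print(results)
```

Because each stage only consumes the output of the previous one (the extractor is a generator feeding the database writer), the stages stay independent, which is the property of the pipeline pattern that the abstract highlights for processing a very large dump.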