System for Distributed Text Mining
Text mining presents us with new possibilities for the use of collections of documents. These collections contain a large amount of hidden, implicit information, which text mining techniques may help us uncover. Unfortunately, these techniques generally require large amounts of computational power. This need is addressed by the introduction of distributed systems and methods for distributed processing, such as Hadoop and MapReduce. This thesis aims to describe, design, implement and evaluate a prototypical system for distributed text mining in a MapReduce/Hadoop environment, called TextMiner.
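To show how the MapReduce model splits a text mining task into independent pieces, here is a minimal single-process sketch. It counts term frequencies across a document collection. It is not TextMiner's implementation and does not use the Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the simple lowercase tokenizer are hypothetical choices made for illustration. In a real Hadoop job, the framework would run the map and reduce steps on many machines and would perform the shuffle itself.

```python
import re
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    # Hypothetical mapper: emit (term, 1) for every lowercase word token in one document.
    # doc_id is unused here but mirrors the (key, value) input records of MapReduce.
    for term in re.findall(r"[a-z]+", text.lower()):
        yield term, 1


def shuffle(pairs):
    # Group intermediate values by key, the step the framework performs between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term, counts):
    # Hypothetical reducer: sum the partial counts emitted for one term.
    return term, sum(counts)


def run_job(documents):
    # Each document is mapped independently, which is what makes the task distributable.
    intermediate = chain.from_iterable(
        map_phase(doc_id, text) for doc_id, text in documents.items()
    )
    grouped = shuffle(intermediate)
    return dict(reduce_phase(term, counts) for term, counts in grouped.items())


if __name__ == "__main__":
    docs = {
        "d1": "Text mining uncovers hidden information.",
        "d2": "Mining text needs computation.",
    }
    print(run_job(docs)["mining"])  # prints 2
```

Each mapper sees only one document and each reducer sees only one term. Because of this, neither step needs shared state, so a framework such as Hadoop can spread both steps across a cluster.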