Search Application in the Cloud

This thesis has focused on how to process and store big data in the cloud. It gives special attention to the challenges of building an information retrieval system and to how distributed information retrieval methods can be used in the cloud. After evaluating three cloud platforms, I chose Windows Azure for two reasons. Its free trial offered more hardware resources than the others, and it included an emulator for setting up the system locally before testing it in the cloud. A search engine also had to be chosen. Because Windows Azure was the preferred platform, the choice was limited to search engines written in the .NET languages. I chose Lucene.NET because it is a powerful search tool and because it is open source.

The evaluation was carried out on a distributed information retrieval system with a client-server architecture, in which partial indexes were distributed to the clients. It used a small data set in order to find the optimization problems that must be addressed when building a distributed system that handles large amounts of data. I carried out four evaluations on four different clients.

The results revealed optimization problems that are specific to the cloud and that must be addressed when building a distributed system that processes and stores big data there. Scaling systems is easier in the cloud. One of the optimization problems concerns the search speed of the search engine. The recommendation is therefore that the scaling of the clients should depend on how much Azure Cache is left on the clients.

With further tuning and solutions to these optimization problems, the cloud should be an advantageous place to process and store big data.
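The abstract does not say how the server routes queries to the clients or how it combines their answers. What follows is a minimal sketch of one common way to search partial indexes, the scatter-gather pattern. It assumes the following:

- The indexes are document-partitioned, so each client indexes a disjoint subset of the documents.
- A plain term-frequency sum stands in for Lucene.NET's scoring.

Under these assumptions, the server sends the query to every client, and each client returns its local top-k hits. The server then merges these into a global top-k. `ClientNode` and `distributed_search` are hypothetical names. The sketch is in Python, not the C#/.NET used in the thesis.

```python
import heapq
from collections import Counter, defaultdict


class ClientNode:
    """Hypothetical client holding a partial inverted index over its own documents."""

    def __init__(self, docs):
        # docs: mapping doc_id -> text; each client gets a disjoint subset (assumption).
        self.index = defaultdict(dict)  # term -> {doc_id: term frequency}
        for doc_id, text in docs.items():
            for term, tf in Counter(text.lower().split()).items():
                self.index[term][doc_id] = tf

    def search(self, query, k):
        # Score = sum of raw term frequencies (a stand-in for Lucene.NET's scoring).
        scores = Counter()
        for term in query.lower().split():
            for doc_id, tf in self.index.get(term, {}).items():
                scores[doc_id] += tf
        # Local top-k, highest score first, ties broken by ascending doc_id.
        return heapq.nsmallest(k, scores.items(), key=lambda hit: (-hit[1], hit[0]))


def distributed_search(clients, query, k):
    # Scatter: every client searches only its own partial index.
    partial_results = [client.search(query, k) for client in clients]
    # Gather: merge the per-client top-k lists into one global top-k.
    merged = [hit for hits in partial_results for hit in hits]
    return heapq.nsmallest(k, merged, key=lambda hit: (-hit[1], hit[0]))


clients = [
    ClientNode({"d1": "azure cloud storage", "d2": "lucene search engine"}),
    ClientNode({"d3": "cloud search cloud", "d4": "big data"}),
]
result = distributed_search(clients, "cloud search", k=2)
print(result)  # [('d3', 3), ('d1', 1)]
```

Merging the local top-k lists gives the exact global top-k here, because a document's score depends only on its own partition. With TF-IDF-style scoring this no longer holds. Each client would compute IDF from its partial index alone, so scores from different clients would not be directly comparable unless global term statistics are shared.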

Publisher

Institutt for datateknikk og informasjonsvitenskap