TODO 05-11-2017

Version 2 - Updated on 06 Nov 2017 at 1:54AM by Joachim Hansen

Description

  • Find out how to formulate the search in each search engine: exact and full-text (sentence/phrase/sequence of symbols)
  • Find out how to get index size and index creation time (the latter should be provided by the application ...)
  • Make scripts to import/index to Solr and Sphinx in bulk and try them with a dummy dataset
  • Preprocess candidate datasets and import them, measure performance during import, capture index creation time and index size (I already have a table to fill in index creation time and index size)
  • Find exact words and sentences that I will search for in each dataset 
  • Find out what I should search for that should not be there, for example the MD5 hash of my name or the thesis title
  • Could I start monitoring with TOP, execute all the searches on one search engine, and terminate TOP only after all of them are done? Or should I terminate TOP after each search? The latter will result in searches * 2 graphs (one cold cache and one warmer), given that I run each search only 2 times to warm up the cache.
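
A rough Python sketch of two of the items above: building a search term that should not be in any dataset (MD5 of a seed string) and running every search twice so each one yields a first and a second measurement. The function names absent_term and time_cold_and_warm are made up for this note, and the search argument is a placeholder for whatever call ends up querying Solr or Sphinx. The "cold" label only means "first run"; it is an assumption that the cache was actually empty before that run.

```python
import hashlib
import time
from typing import Callable, Dict, List


def absent_term(seed: str) -> str:
    """Return the MD5 hex digest of seed.

    Assumption: a 32-character hex digest is very unlikely to occur
    in any of the candidate datasets, so it works as a "no hit" query.
    """
    return hashlib.md5(seed.encode("utf-8")).hexdigest()


def time_cold_and_warm(
    search: Callable[[str], object], queries: List[str]
) -> List[Dict[str, object]]:
    """Run each query twice and record the wall-clock time of both runs.

    The first run is labelled "cold" and the second "warm". This gives
    len(queries) * 2 measurements, matching the searches * 2 graphs
    counted in the last bullet.
    """
    rows: List[Dict[str, object]] = []
    for query in queries:
        for label in ("cold", "warm"):
            start = time.perf_counter()
            search(query)  # placeholder: the real search engine call goes here
            elapsed = time.perf_counter() - start
            rows.append({"query": query, "run": label, "seconds": elapsed})
    return rows
```

With 3 queries this returns 6 rows, so the per-search variant of the TOP monitoring would likewise produce 6 graphs per search engine.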