Observations, what I expected to happen, and arguments
Version 12 - Updated on 29 Nov 2017 at 5:04PM by Joachim Hansen
Description
(unexpected) I thought that because Solr has a simpler JSON import format, Solr would perform better, since the JSON parser would have less information to process. Solr also wrote less to the command line, and I assumed this would be reflected in the timings from the time command... but the opposite was the case.
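To make the "simpler import format" concrete, here is a rough sketch of how the same documents might be serialized for each engine: Solr's JSON update handler accepts a plain JSON array, while Elasticsearch's _bulk API expects newline-delimited JSON with an action line before every document. The sample documents and the index name are hypothetical, and the exact action metadata depends on the Elasticsearch version (6.x, for instance, may also expect a "_type").

```python
import json

def solr_update_body(docs):
    # Solr's JSON update handler accepts a plain JSON array of documents.
    return json.dumps(docs)

def es_bulk_body(docs, index_name):
    # Elasticsearch's _bulk API expects newline-delimited JSON: one action line
    # per document followed by the document itself, ending with a newline.
    # index_name is a placeholder; older versions (e.g. 6.x) may also need "_type".
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# Hypothetical sample documents, only used to compare payload sizes.
docs = [{"id": "1", "text": "shoppin list"}, {"id": "2", "text": "Shoppin cart"}]
solr_body = solr_update_body(docs)
es_body = es_bulk_body(docs, "hypothetical-index")
print(len(solr_body.encode("utf-8")), len(es_body.encode("utf-8")))
```

In this tiny example the Elasticsearch body is larger because of the per-document action line; the observation above suggests that this payload difference is not what dominates indexing time.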
(observation) The gap between Elasticsearch's took value and the time command is larger than the gap between Solr's QTime and the time command.
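Both took and QTime are measured inside the server, while the time command wraps the whole curl invocation (process start-up, reading the input file and the network transfer), so some gap is expected; as far as I know, neither server value includes that client-side work. A minimal sketch for recording both numbers per request, assuming the response has already been parsed into a dict; send_request and fake_solr_request are hypothetical stand-ins for real HTTP calls.

```python
import time

def timed_request(send_request, reported_ms):
    """Run one request and return (wall_ms, server_ms, overhead_ms).

    send_request: hypothetical callable that performs the HTTP request and
        returns the parsed JSON response as a dict.
    reported_ms: function that extracts the server-reported time from it.
    """
    start = time.perf_counter()
    response = send_request()
    wall_ms = (time.perf_counter() - start) * 1000.0
    server_ms = float(reported_ms(response))
    return wall_ms, server_ms, wall_ms - server_ms

def es_took(response):
    # Elasticsearch reports "took" (milliseconds) at the top level.
    return response["took"]

def solr_qtime(response):
    # Solr reports QTime (milliseconds) inside "responseHeader".
    return response["responseHeader"]["QTime"]

def fake_solr_request():
    # Stand-in for a real request so the sketch runs without a server.
    time.sleep(0.05)
    return {"responseHeader": {"status": 0, "QTime": 10}}

wall, server, overhead = timed_request(fake_solr_request, solr_qtime)
```

Logging overhead_ms for every import round of both engines would show whether the larger gap for Elasticsearch is constant per request or grows with the amount of data.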
Some of the elapsed time reported by the time command might be accounted for by multiple supporting threads or forked supporting processes (workers)... but I largely think it is about right, as I also followed the start and end times on my PC clock. (I could check this again more closely for the new datasets to be indexed... but the number of workers might not be fixed...)
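The time command reports real (wall-clock), user and sys time; if user + sys is larger than real, the timed process used several threads or child processes in parallel. Worth keeping in mind: when the timed command is curl, the worker threads of Solr and Elasticsearch belong to the server JVM, not to curl, so their CPU time never appears in user/sys, and only real reflects the server's work. A sketch that mimics this measurement from Python; the command is a placeholder, and it only runs on Unix because it relies on the resource module.

```python
import resource
import subprocess
import sys
import time

def run_and_measure(cmd):
    """Run cmd to completion and return (wall_s, cpu_s).

    wall_s corresponds to the "real" value of the time command; cpu_s is the
    user + sys time of waited-for child processes. cpu_s > wall_s means the
    child used several threads or processes in parallel.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall_s = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu_s = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall_s, cpu_s

# Example: time a trivial Python process instead of a real curl import.
wall_s, cpu_s = run_and_measure([sys.executable, "-c", "pass"])
```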
(I expected minimal differences) (observation) There is a big difference between Solr and Elasticsearch in index sizes and in the QTime, took and time command values. Am I comparing like for like with respect to index sizes? Some of this might be due to suboptimal configuration in Solr, but my configurations are largely, if not exclusively, out-of-the-box default values... With that in mind, Elasticsearch is better at indexing. Maybe I should have changed the curl command so that it only commits on the last import round? Maybe Elasticsearch takes better advantage of caching or parallelization during indexing?
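One way to check whether the index sizes are like for like is to measure both on disk in the same way, after the final commit, and normalize by the number of documents. Which directory to measure is an assumption to verify: depending on the path, transaction logs (Solr's tlog, Elasticsearch's translog) or replicas may be included for one engine and not the other. The demo below uses a throwaway directory instead of a real index.

```python
import os
import tempfile

def directory_size_bytes(path):
    """Sum the sizes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total

def bytes_per_document(path, num_docs):
    # Normalizing by document count makes datasets of different sizes comparable.
    return directory_size_bytes(path) / num_docs

# Self-contained demo with a temporary directory instead of a real index.
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "segments"))
    with open(os.path.join(demo_dir, "a.bin"), "wb") as f:
        f.write(b"x" * 100)
    with open(os.path.join(demo_dir, "segments", "b.bin"), "wb") as f:
        f.write(b"y" * 50)
    demo_total = directory_size_bytes(demo_dir)
    demo_per_doc = bytes_per_document(demo_dir, 3)
```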
(Expectation) The extra indexing overhead reflected in Solr's larger index size should result in better search speed; otherwise the extra size would be wasted... why else store so much information?
(Question) What are the performance gains from committing less often in Solr?
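To answer this, and to try committing only on the last import round, the two strategies could be timed against each other. A minimal sketch that builds the update URLs for each strategy; the base URL, the core name and the number of rounds are placeholders, and each URL would then go into the curl command for that round. The autoCommit/autoSoftCommit settings in solrconfig.xml can still trigger commits between rounds, so they should be checked before drawing conclusions.

```python
def solr_update_urls(base_url, core, rounds, commit_every_round):
    """Build one /update URL per import round.

    With commit_every_round=False only the final round sends commit=true,
    so the data is explicitly committed once instead of once per round.
    base_url and core are placeholders for the actual installation.
    """
    urls = []
    for i in range(rounds):
        url = f"{base_url}/solr/{core}/update"
        if commit_every_round or i == rounds - 1:
            url += "?commit=true"
        urls.append(url)
    return urls

per_round = solr_update_urls("http://localhost:8983", "hypothetical_core", 3, True)
final_only = solr_update_urls("http://localhost:8983", "hypothetical_core", 3, False)
```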
Observation: Solr performed better than Elasticsearch on smaller datasets. This is surprising (2 commits were still performed... so maybe commits are more expensive on larger documents, or the number of commits is what makes Solr perform worse than Elasticsearch).
Is Solr's large index size caused by it handling many small documents badly? Hmm.
The index size is not fixed for the same input vector in either Solr or Elasticsearch. Future research might look into why the same input vector can result in different index sizes.
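Until the cause is known, it may help to index the same input several times and report the spread instead of a single number. A small sketch using the standard statistics module; the sizes would come from repeated runs measured the same way each time. One candidate cause, not verified here, is that Lucene merges segments in the background, so the size depends on which merges have completed when it is measured.

```python
import statistics

def summarize_runs(sizes_bytes):
    """Return (mean, sample stdev, coefficient of variation) of repeated index sizes."""
    mean = statistics.mean(sizes_bytes)
    stdev = statistics.stdev(sizes_bytes) if len(sizes_bytes) > 1 else 0.0
    return mean, stdev, stdev / mean
```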
There are misses in Solr: there is a big difference in the number of search hits, although for some queries the hit counts are comparable.
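A quick way to see whether the misses are systematic is to compare the document IDs each engine returns for the same query. One caveat: both engines return only the top 10 hits by default (Solr's rows, Elasticsearch's size), so total counts are better compared via Solr's numFound and Elasticsearch's hits.total, and ID lists only after raising rows/size. The function below is a sketch; the IDs passed to it are placeholders.

```python
def compare_hits(solr_ids, es_ids):
    """Split the IDs returned by each engine for one query into overlap and misses."""
    solr_set, es_set = set(solr_ids), set(es_ids)
    return {
        "both": sorted(solr_set & es_set),
        "only_solr": sorted(solr_set - es_set),
        "only_es": sorted(es_set - solr_set),
    }
```

Looking at the documents in "only_es" for a few queries should show whether the misses share a pattern (case, term order, missing terms), which ties directly into the explanations below.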
Elasticsearch seems to be generally faster on cold searches, but Solr is better on second runs.
Which case handling do Solr and Elasticsearch use? Does either of them automatically convert case, and what is the default? If one is case sensitive by default and the other case insensitive, that could partly explain the big difference. Still, Solr misses some strings it should have found... (NOPE: the search string "shoppin" matched both "Shoppin" and "shoppin" with Solr search... could show this in an appendix...)
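The "shoppin" result is what one would expect if the field is analyzed with a lowercase filter at both index and query time, as the common text field types in both engines' default setups are; exact, case-sensitive matching is typical of non-analyzed field types (Solr's string, Elasticsearch's keyword). Checking which field type the searched text ended up in therefore seems worthwhile. The toy sketch below only illustrates why lowercasing on both sides makes case irrelevant; analyze and term_matches are hypothetical helpers, not how either engine actually tokenizes.

```python
import re

def analyze(text, lowercase=True):
    """Rough stand-in for a text analyzer: split on non-word characters and
    optionally lowercase. Real Solr/Elasticsearch analyzers do more than this."""
    tokens = [t for t in re.split(r"\W+", text) if t]
    return [t.lower() for t in tokens] if lowercase else tokens

def term_matches(term, text, lowercase=True):
    # The same analysis is applied to the indexed text and to the query term.
    indexed = analyze(text, lowercase)
    return any(q in indexed for q in analyze(term, lowercase))
```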
Another possible explanation is that Solr is stricter than Elasticsearch in enforcing the order of the terms and in requiring all terms to be present.
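These hypotheses differ in which documents count as a match. A toy sketch of the three semantics being compared: any term (OR), all terms in any order (AND), and all terms adjacent and in order (phrase). As far as I know, both Solr's standard query parser and Elasticsearch's match query default to OR, so stricter behaviour in Solr would more likely come from the query syntax or parser settings used (for example quoting the search string, q.op, or mm); that is an assumption to verify against the actual queries.

```python
def matches(query_tokens, doc_tokens, mode):
    """mode: "or" (any term), "and" (all terms, any order),
    "phrase" (all terms, adjacent and in order)."""
    if mode == "or":
        return any(t in doc_tokens for t in query_tokens)
    if mode == "and":
        return all(t in doc_tokens for t in query_tokens)
    if mode == "phrase":
        n = len(query_tokens)
        return any(doc_tokens[i:i + n] == query_tokens
                   for i in range(len(doc_tokens) - n + 1))
    raise ValueError(f"unknown mode: {mode}")
```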
It could also be that Elasticsearch spends more resources on finding all possible matches, while Solr makes a tradeoff. (A combination of all three explanations seems possible, as Solr misses some strings that have the same case as the search string.)
Maybe the big difference in memory consumption between Solr and Elasticsearch can partly be attributed to the memory Solr uses being spread across multiple Solr processes, whereas Elasticsearch is mainly one big process with many threads... hmm, but do Solr threads take more memory than Elasticsearch threads???
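To check how many processes are involved and what they add up to, the RSS column of "ps -eo rss=,args=" can be summed per engine. A sketch, assuming Linux procps output (RSS in KiB) and that each engine's processes can be recognised by a substring of their command line; the patterns are placeholders. Two caveats: ps reports memory per process, not per thread, because threads share their process's memory, so thread-level comparisons cannot be made this way; and RSS counts shared pages in every process, so summing several processes overestimates their combined footprint. The JVM heap limits (-Xmx) configured for each engine are worth comparing as well.

```python
def sum_rss_kib(ps_output, pattern):
    """Return (process_count, total_rss_kib) for lines of "ps -eo rss=,args="
    output whose command line contains pattern.

    ps_output can be obtained e.g. with:
    subprocess.run(["ps", "-eo", "rss=,args="], capture_output=True, text=True).stdout
    """
    count = 0
    total = 0
    for line in ps_output.splitlines():
        rss, _, args = line.strip().partition(" ")
        if rss.isdigit() and pattern in args:
            count += 1
            total += int(rss)
    return count, total
```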