Observations, what I expected to happen, and reasoning
Version 2 - Updated on 22 Nov 2017 at 12:08AM by Joachim Hansen
Description
(unexpected) I thought that because Solr has a simpler JSON import format, Solr would perform better, as its JSON parser has less information to process. Solr also wrote less to the command line, and I assumed this would be reflected in the results of the time command... but the opposite was the case.
(observation) The gap between took and the time command in Elasticsearch is bigger than the gap between QTime and the time command in Solr.
Some of the elapsed time in the time command results might be attributable to multiple supporting threads or forked supporting processes (workers)... but I largely think it is about right, as I also followed the start and end times on my PC clock. (I could check this again more closely for the new datasets to be indexed... but the number of workers might not be fixed...)
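To make the comparison between the server-reported time (QTime in Solr, took in Elasticsearch, both in milliseconds) and the time command more concrete, here is a minimal sketch in Python. The function names and the numbers in the usage lines are hypothetical and are not measurements from my runs.

```python
def client_overhead_ms(wall_seconds, reported_ms):
    """Wall-clock time (from the time command, in seconds) that is NOT
    covered by the server-reported time (QTime or took, in ms)."""
    return wall_seconds * 1000.0 - reported_ms


def reported_share(wall_seconds, reported_ms):
    """Fraction of the wall-clock time that the server itself reports."""
    return reported_ms / (wall_seconds * 1000.0)


# Hypothetical example values, not real measurements:
# the time command says 2.0 s, the server reports 1500 ms.
print(client_overhead_ms(2.0, 1500))
print(reported_share(2.0, 1500))
```

A low share would mean that most of the elapsed time is spent outside what the server counts (for example curl start-up, sending the file, or work the server does not include in its own number).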
(I expected minimal differences) (observation) There are big differences between Solr and Elasticsearch in index size, and in the QTime, took and time command values. Am I comparing like for like with respect to index sizes? Some of this might be due to suboptimal configuration in Solr. But my configuration is largely, if not exclusively, out-of-the-box default values... with that in mind, Elasticsearch is better at indexing... Maybe I should have changed the curl command so that it only commits on the last import round? Maybe Elasticsearch takes better advantage of caching or parallelization during indexing?
(Expectation) The extra overhead in Solr's index size should result in better search speed. Otherwise this extra size would be wasted... why else store so much information?
(question) What are the performance gains from less frequent commits in Solr?
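The idea of committing only on the last import round could be sketched as below. This is only a sketch under assumptions: the core name mycore, the host and port, and the function name are hypothetical, and I have not run my imports this way. The commit=true request parameter on the update handler is the only Solr feature it relies on. Solr may still commit on its own if autoCommit is configured in solrconfig.xml.

```python
def build_update_urls(base_url, n_batches):
    """Return one update URL per import batch.
    Only the last batch asks Solr to commit (commit=true)."""
    urls = []
    for i in range(n_batches):
        is_last = (i == n_batches - 1)
        if is_last:
            urls.append(base_url + "?commit=true")
        else:
            urls.append(base_url)
    return urls


# Hypothetical core name and batch count, for illustration only.
base = "http://localhost:8983/solr/mycore/update"
for url in build_update_urls(base, 3):
    print(url)
```

Each URL would then be used as the target of one curl import round. Timing the whole run this way and comparing it with a run that commits on every round would give a first answer to the question above.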