Challenges importing

Version 6 - Updated on 31 Oct 2017 at 11:33PM by Joachim Hansen

Description

It seems that both Elasticsearch and Solr can import JSON files. I have figured out how to add JSON entries in bulk in Elasticsearch (sketched below). Next I should figure out if this is possible in Solr; if so, that would make my life easier. Sphinx, on the other hand, does not seem to support indexing JSON files, but it does have ways to index SQL sources, CSV files and XML. I could use an online/offline parser to convert from JSON to one of the formats supported by Sphinx. But does the tool parse correctly? Does it put the IDs (which have to be unique in Sphinx) in the right place in the generated files? A further challenge with the XML format is that it has to be 'well formed' in order to be indexable.
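For reference, a minimal sketch in Python of the Elasticsearch bulk request, assuming a local node and an index named "docs" (both placeholders; on older Elasticsearch versions the action line may also need a "_type" field):

import json
import requests

entries = [
    {"id": 1, "title": "first entry"},
    {"id": 2, "title": "second entry"},
]

# The _bulk endpoint takes newline-delimited JSON: one action line
# followed by the document itself, for each entry.
lines = []
for entry in entries:
    lines.append(json.dumps({"index": {"_index": "docs", "_id": entry["id"]}}))
    lines.append(json.dumps(entry))
payload = "\n".join(lines) + "\n"  # the body must end with a newline

resp = requests.post(
    "http://localhost:9200/_bulk",
    data=payload.encode("utf-8"),
    headers={"Content-Type": "application/x-ndjson"},
)
print(resp.json())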


.csv or a SQL format may be the easiest to index on Sphinx. Probably best with SQL!

http://sphinxsearch.com/docs/current.html#confgroup-source
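Going by that page, a rough sketch of what a SQL-backed Sphinx source could look like, assuming a MySQL table json_entries(id, content) that the conversion script would fill (all names and paths are placeholders):

source json_src
{
    # The first column of sql_query becomes the unique document ID;
    # the remaining columns become indexed fields.
    type      = mysql
    sql_host  = localhost
    sql_user  = user
    sql_pass  = pass
    sql_db    = test
    sql_query = SELECT id, content FROM json_entries
}

index json_idx
{
    source = json_src
    path   = /var/lib/sphinx/json_idx
}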

There may be no quick fix for Sphinx, so I think I will run the experiments on Elasticsearch and Solr first, and then figure out how to get the data into Sphinx.


One crude solution could be to write a script that uses the IDs in the JSON entries as delimiters and then, for each ID, puts the entry's content into one column of a .csv or SQL file, with another column (the first) holding a unique numerical ID that is simply incremented... To make sure this scheme works I can try uploading the result in phpMyAdmin. A sketch of such a script follows the table below.

Incremented ID | JSON content/JSON entry
1              | ...
2              | ...
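
A minimal sketch of such a conversion script in Python, assuming the input file entries.json holds a list of JSON objects (file names are placeholders):

import csv
import json

with open("entries.json", encoding="utf-8") as f:
    entries = json.load(f)

with open("entries.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    for i, entry in enumerate(entries, start=1):
        # First column: unique numerical ID, simply incremented;
        # second column: the whole JSON entry as one string.
        writer.writerow([i, json.dumps(entry)])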

 

DO ALL the JSON files, and all the JSON entries in them, have an ID? This is important if I am going to use it as a delimiter. A quick check is sketched below.
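A quick check in Python, assuming the entries sit in entries.json as a list of objects and the identifier key is literally "id" (both assumptions):

import json

with open("entries.json", encoding="utf-8") as f:
    entries = json.load(f)

# Count the entries that are missing the assumed "id" key.
missing = [i for i, entry in enumerate(entries) if "id" not in entry]
print(f"{len(missing)} of {len(entries)} entries lack an 'id' field")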