https://lucene.apache.org/solr/guide/6_6/collections-api.html
https://wiki.apache.org/solr/SolrTerminology
Solr instance is not running in solrCloud mode.
https://lucene.apache.org/solr/guide/6_6/getting-started-with-solrcloud.html
/opt/solr-7.1.0/bin/solr -e cloud
The command above starts SolrCloud.
Have to use these parameters (when run with sudo, Solr needs -force to start as root):
sudo /opt/solr-7.1.0/bin/solr -e cloud -force
solr server on port 8985 pid 26517 and solr server on port 7576 pid 26726
collection name coltest
(collection is the terminology for index)
.... may need to close some of the Solr instances (some collections might be shared/blocked), as multiple instances are trying to access the same data.
I may have to add an id field to each JSON document in Solr... May not need a unique key/id??? https://wiki.apache.org/solr/UniqueKey
curl -X POST -H 'Content-Type: application/json' 'http://localhost:8985/solr/coltest/update/json/docs' --data-binary '
{
"id": "1",
"title": "Doc 1"
}'
The command above worked with a QTime of 702 milliseconds.
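Since every response reports its timing in QTime, a small helper can pull the number out when collecting measurements. This is only a minimal sketch: the function name qtime is hypothetical, and it assumes the response is JSON containing the usual "QTime":<number> entry in responseHeader.

```shell
# Print the QTime value (milliseconds) found in a Solr JSON response on stdin.
# Assumption: the response contains an entry of the form "QTime":<number>.
qtime() {
  sed -n 's/.*"QTime":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example (needs a running Solr node, so it is left commented out):
#   curl -s 'http://localhost:8985/solr/coltest/select?q=title:Doc' | qtime
```

Note that QTime is the time Solr reports for handling the request, so it does not include network transfer or curl start-up.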
curl -X POST -H 'Content-Type: application/json' 'http://localhost:8983/solr/my_collection/update/json/docs' --data-binary '
{
"title": "Doc bo ID"
}'
The second command also worked with no id field and with a misspelled "bo" (should be "no").... QTime 9 milliseconds.
If I choose not to have an id field this should be stated... some time may be saved... and Solr might have less processing to perform.
curl -X POST -H 'Content-Type: application/json' 'http://localhost:8985/solr/coltest/update' --data-binary '
[
{"title": "test 15"},
{"title": "Doc 20"}
]'
The command above added two documents successfully.
curl -X POST -H 'Content-Type: application/json' 'http://localhost:8985/solr/coltest/update' --data-binary '
[
{"title": "test 150"},
{"title": "Doc 200"},
]'
comma after last line also worked...
I DO NOT NEED the ' character in the JSON dataset file (in the commands above it is only shell quoting around the inline data).
I do need the characters [ and ] at the beginning and end of the file.
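To make the file layout concrete, here is a small sketch that writes such a dataset file. The function name write_sample_dataset and the file path in the example are hypothetical; the two documents are the ones used in the inline command earlier.

```shell
# Write a two-document JSON array to the path given as $1.
# The file starts with [ and ends with ], and contains no ' characters.
write_sample_dataset() {
  cat > "$1" <<'EOF'
[
{"title": "test 15"},
{"title": "Doc 20"}
]
EOF
}

# Example upload (needs a running Solr node; /tmp/sample.json is a made-up path):
#   write_sample_dataset /tmp/sample.json
#   curl -X POST -H 'Content-Type: application/json' \
#     'http://localhost:8985/solr/coltest/update?commit=true' --data-binary @/tmp/sample.json
```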
I went to localhost:8985 and the Collections tab to add a new collection named wizardnewcol (have to remember to set default config etc). I was also able to upload the data from the command above to this collection. Can use this approach when creating the collections/indexes for my dataset experiments.
created collection named dummy
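Collections can also be created from the command line through the Collections API (linked at the top of these notes) instead of the admin UI. The sketch below only builds the request URL; the function name create_collection_url is hypothetical, and the shard and replica counts in the example are assumptions, since the notes do not record which values were used for dummy.

```shell
# Build a Collections API CREATE URL for the node on port 8985.
# usage: create_collection_url <name> <numShards> <replicationFactor>
create_collection_url() {
  printf 'http://localhost:8985/solr/admin/collections?action=CREATE&name=%s&numShards=%s&replicationFactor=%s\n' "$1" "$2" "$3"
}

# Example (needs a running SolrCloud node; 1 shard and 1 replica are assumed values):
#   curl "$(create_collection_url dummy 1 1)"
```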
curl -XPOST 'http://localhost:8985/solr/dummy/update?commit=true' --data-binary @/home/search/Downloads/cleansolr.json -H 'Content-type:application/json'
The command worked with a QTime of 420 milliseconds.
For index size I can search for the folder in Ubuntu, e.g. /cloud/node1/solr/wizardnewcol_shard1_replica_n1, then right click on Properties and note the size... may also provide the size of the index subfolder (this may be the more appropriate number)? I can also get a size on localhost:8985 by choosing the dummy shard. Can use all 3 numbers for the index size.
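The folder sizes can also be read from the terminal rather than through the file manager. A minimal sketch using du follows; the function name dir_size_kb is hypothetical, and the data/index location mentioned in the comment is an assumption about where the index subfolder sits inside the replica folder.

```shell
# Print the disk usage of a directory tree in kilobytes.
dir_size_kb() {
  du -sk "$1" | awk '{print $1}'
}

# Examples (paths taken from the notes above; data/index is an assumed subfolder):
#   dir_size_kb /cloud/node1/solr/wizardnewcol_shard1_replica_n1
#   dir_size_kb /cloud/node1/solr/wizardnewcol_shard1_replica_n1/data/index
```

du reports blocks used on disk, so the figure may differ slightly from the byte count shown in the Properties dialog or the admin UI.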
cat /home/search/Downloads/blanLinesDS.json | sed -e 's/:/ /g' -e 's/]/ /g' -e 's/{/ /g' -e 's/}/ /g' -e 's/,/ /g' -e 's/\[/ /g' | sed -e "s/\"id\"//g" -e "s/\"first_name\"//g" -e "s/\"last_name\"//g" -e "s/\"email\"//g" -e "s/\"gender\"//g" -e "s/\"ip_address\"//g" | sed 's/"//g' | awk 'NF > 0' | sed 's/ \+/ /g' | sed -e 's/^/{"content":"/' | awk 'NF{print $0 " \"},"}' | awk 'BEGIN{print "[";}{print;}' | head -n -1 | awk 'END { print "]";}{print;}'
This command replaces the JSON punctuation (: ] { } , [) with spaces, removes the dataset-specific field names, removes the " character, removes empty lines, squeezes repeated spaces, adds {"content":" at the start of each line and "}, at the end of each line. At the start of the file it adds [ and at the end of the file it adds ]. Note that head -n -1 drops the final line before ] is appended.
I was able to upload this to the dummy collection with the command seen further up. It seems that it automatically adds an index for all of the uploaded documents.
I was also able to search using the standard query parser syntax: https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html#the-standard-query-parser
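The wrapping step (turning text lines into a JSON array of content documents) can be sketched separately from the dataset-specific clean-up. This is not the pipeline above but a simplified alternative under stated assumptions: the input already has one document per line as plain text, double quotes and backslashes are deleted rather than escaped, and no comma is written after the last document. The function name lines_to_solr_json is hypothetical.

```shell
# Read one document per line on stdin; print a JSON array of {"content": "..."} objects.
lines_to_solr_json() {
  awk '
    NF == 0 { next }                 # skip blank lines
    {
      gsub(/["\\]/, "")              # delete double quotes and backslashes
      gsub(/[ \t]+/, " ")            # squeeze runs of whitespace
      sub(/^ /, ""); sub(/ $/, "")   # trim leading and trailing space
      if ($0 == "") next             # nothing left after clean-up
      docs[++n] = "{\"content\": \"" $0 "\"}"
    }
    END {
      print "["
      for (i = 1; i <= n; i++) print docs[i] (i < n ? "," : "")
      print "]"
    }'
}

# Example (cleaned.txt is a made-up file name):
#   lines_to_solr_json < cleaned.txt > cleansolr.json
```

Buffering the documents in the END block is what lets the sketch leave the comma off the last entry without discarding any line.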
curl -XGET 'http://localhost:8985/solr/dummy/select?q=content:keen'
curl -XGET 'http://localhost:8985/solr/dummy/select?q=content:%22Conroy%20Tunsley%22'
Both commands successfully obtained the exact match in the collection dummy. Matching seems to be on whole terms, as the string con retrieves 0 results. Again I get the search time from QTime.
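The three kinds of query relevant here (single term, phrase, and prefix) can be sketched as small helpers. All function names are hypothetical. The prefix form uses the standard query parser's * wildcard, which is one way a partial string such as con could be made to match; that is a suggestion and was not tried in these notes. curl's -G and --data-urlencode options take care of encoding spaces and quotes.

```shell
# Build standard-query-parser strings for one field.
term_query()   { printf '%s:%s\n' "$1" "$2"; }      # e.g. content:keen
phrase_query() { printf '%s:"%s"\n' "$1" "$2"; }    # e.g. content:"Conroy Tunsley"
prefix_query() { printf '%s:%s*\n' "$1" "$2"; }     # e.g. content:con*

# Send a query to a collection on the node at port 8985.
# usage: solr_select <collection> <query>
solr_select() {
  curl -s -G "http://localhost:8985/solr/$1/select" --data-urlencode "q=$2"
}

# Examples (need a running Solr node):
#   solr_select dummy "$(phrase_query content 'Conroy Tunsley')"
#   solr_select dummy "$(prefix_query content con)"
```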