Script for Elasticsearch preprocessing:
cat /home/search/Downloads/blanLinesDS.json | awk 'NF > 0' | sed 's/"//g' | sed -e 's/^/{"content":"/' | awk 'NF{print $0 " \"}"}' | awk ' {print;} NR % 1 == 0 { print "{ \"index\":{} }"; }' | awk 'BEGIN{print "{ \"index\":{} }";}{print;}' |head -n -1 | awk 'END { print "";}{print;}'
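The script performs the following actions, in order from 1 to 8 (cat only feeds the file into the pipeline):
1. awk 'NF > 0' drops blank lines.
2. sed 's/"//g' removes every double quote from the text.
3. sed -e 's/^/{"content":"/' prefixes each line with {"content":"
4. awk 'NF{print $0 " \"}"}' appends a space followed by "} to close the JSON object.
5. awk ' {print;} NR % 1 == 0 {...}' prints a { "index":{} } action line after every line.
6. awk 'BEGIN{...}{print;}' prints one more { "index":{} } action line before the first line.
7. head -n -1 drops the last line, which is a surplus action line.
8. awk 'END { print "";}{print;}' appends an empty line at the end (the bulk body has to end with a newline).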
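To see what those stages produce, here is a minimal sketch that wraps the same pipeline in a function and runs it on a tiny made-up input. The function name to_bulk and the sample text are assumptions for illustration only. Note that head -n -1 relies on GNU head.

```shell
# to_bulk is a hypothetical helper name. It reads raw text lines on stdin
# and writes bulk NDJSON on stdout, using the same stages as the one-liner.
to_bulk() {
  awk 'NF > 0' |                                        # drop blank lines
    sed 's/"//g' |                                      # remove all double quotes
    sed -e 's/^/{"content":"/' |                        # open the JSON object
    awk 'NF{print $0 " \"}"}' |                         # close the JSON object
    awk '{print} NR % 1 == 0 {print "{ \"index\":{} }"}' |  # action line after each doc
    awk 'BEGIN{print "{ \"index\":{} }"} {print}' |     # action line before the first doc
    head -n -1 |                                        # drop the surplus last action line
    awk 'END{print ""} {print}'                         # finish with an empty line
}

# Made-up sample input: two text lines separated by a blank line.
printf 'first "quoted" line\n\nsecond line [x]: {y}, z\n' | to_bulk
# Output (followed by one empty line):
# { "index":{} }
# {"content":"first quoted line "}
# { "index":{} }
# {"content":"second line [x]: {y}, z "}
```

The curl commands read the result from a file, so the output would presumably be redirected, for example with > /home/search/Downloads/pretest.json (the redirect itself is not shown in the original command).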
curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/index_test3/working/_bulk?pretty --data-binary @/home/search/Downloads/pretest.json >> /home/search/Downloads/indexoutput2.txt
I was able to import the preprocessed dataset. The test dataset contained the characters [ ] { } : , within quotation marks, and that worked fine.
curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/index_test4/working/_bulk?pretty --data-binary @/home/search/Downloads/pretest.json | head -20
{
  "took" : 91,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "index_test4",
        "_type" : "working",
        "_id" : "AV-NVqBEFyZDNdvXeuVa",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "created" : true,
        "status" : 201
      }
    },
I added | head -20 to print only the first 20 lines of the response from the indexing process.
As the "took" field shows, the bulk request took 91 milliseconds.
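Besides "took", the response carries a top-level "errors" flag, which is false in the output above. As a rough sketch, the two fields can be pulled out of the pretty-printed response instead of reading the first 20 lines by eye. The function name summarize_bulk is an assumption, and the approach relies on "took" and "errors" being the first two matching lines, as they are in the response shown here.

```shell
# summarize_bulk is a hypothetical helper name. It reads a pretty-printed
# _bulk response on stdin and prints the first "took" and "errors" fields
# as key=value pairs.
summarize_bulk() {
  grep -m 2 -E '^[[:space:]]*"(took|errors)"[[:space:]]*:' |
    sed -E 's/^[[:space:]]*"([a-z]+)"[[:space:]]*:[[:space:]]*([^,]*),?[[:space:]]*$/\1=\2/'
}

# Example with the beginning of the response shown above:
printf '{\n  "took" : 91,\n  "errors" : false,\n  "items" : [\n' | summarize_bulk
# took=91
# errors=false
```

In practice the curl command would be piped into it in place of head -20, that is: curl ... --data-binary @/home/search/Downloads/pretest.json | summarize_bulk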