sudo systemctl start elasticsearch.service
sudo systemctl stop elasticsearch.service
List of non-printable ASCII control characters: http://www.theasciicode.com.ar/ascii-control-characters/unit-separator-ascii-code-31.html
Non-printable characters can also be removed with sed: https://www.cyberciti.biz/faq/unix-linux-sed-ascii-control-codes-nonprintable/
Non-ASCII characters can be removed like this: https://stackoverflow.com/questions/3337936/remove-non-ascii-characters-from-csv
#!/bin/bash
# GNU bash, version 4.3.46
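# Go through all the files (no directories). I don't really care about the order of the batches.
# A glob is used instead of parsing ls output, so file names containing spaces are handled.
for f in *;
do
    [ -f "$f" ] || continue; # skip directories
    echo "$1$f";
done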
#!/bin/bash
# GNU bash, version 4.3.46
# Should be called like: time ./bashname.sh indexName/outputFolderName
# Go through all the files (no directories). I don't really care about the order of the batches.
# This .sh file has to be run in the same directory as the batch files to be indexed.
# Uses pwd to get the current working directory.
# Uses the first command line argument as the index name.
currDir=$(pwd);
indexName=$1;
echo "index name=$indexName"
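# A glob is used instead of parsing ls output, so file names containing spaces are handled.
for f in *;
do
    [ -f "$f" ] || continue; # skip directories
    echo "index $currDir/$f";
done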
Remove non-ASCII characters, remove non-printable ASCII characters (except newline), and then escape \ with \\
cat /home/search/Downloads/Datasets/dumper2NoLinesLessthen21Char2 | perl -pe 's/[^[:ascii:]]//g;' | tr -cd '\12\40-\176' | sed -e 's/\\/\\\\/g' >> /home/search/Downloads/dumpertrLineFeedOnly
This fixed all of the remaining JSON parser errors, at least for the 1st batch (I think it solved it for all batches).
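To check what the cleanup does on a small input, here is a minimal sketch of the same idea wrapped in a function. The name clean_for_json and the sample line are made up for illustration and are not part of the notes above. The sketch leaves out the perl step on the assumption that tr -cd with a byte range already deletes every byte above octal 176, which makes a separate non-ASCII pass unnecessary here.

```shell
#!/bin/bash
# Hypothetical helper (not from the original notes): reads text on stdin,
# writes the cleaned text to stdout.
clean_for_json() {
    # 1. tr -cd deletes every byte that is NOT newline (octal 12)
    #    or printable ASCII (octal 40 through 176).
    # 2. sed doubles each backslash so a JSON parser does not see a stray escape.
    LC_ALL=C tr -cd '\12\40-\176' | sed -e 's/\\/\\\\/g'
}

# Sample line containing a unit separator (octal 037), a UTF-8 encoded
# non-ASCII character (bytes 303 251) and a single backslash.
printf 'a\037b caf\303\251 C:\\temp\n' | clean_for_json
```

On the sample line, the control byte and the two non-ASCII bytes are dropped and the single backslash comes out doubled.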