Elasticsearch misc

Version 38 - Updated on 02 Nov 2017 at 12:36AM by Joachim Hansen

Description

POST localhost:9200/accounts/person/1 
{
    "name" : "John",
    "lastname" : "Doe",
    "job_description" : "Systems administrator and Linux specialit"
}


This adds a document to the Elasticsearch server at address localhost, default port 9200, index accounts, document type person, ID 1 (if the index, document type or document ID does not exist, the system will create it automatically).

The document content is JSON, given as a block as seen above (the document content is stored in the _source field).
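The stored document can then be fetched back by ID, and the original JSON is returned under _source, for example:

GET localhost:9200/accounts/person/1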

GET localhost:9200/accounts/person/_search?q=job_description:linux

The command above retrieves information from the Elasticsearch server at address localhost, default port 9200, index accounts, document type person, via the _search endpoint with query q=<search field>:<search string>, here job_description:linux.

The search above is single-index and single-type, but Elasticsearch also supports multi-index and multi-type searches.
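A small sketch of a multi-index search (assuming a second index such as accounts exists alongside shakespeare): the indexes are comma-separated in the URL, and leaving out the type searches across all types in them.

GET localhost:9200/shakespeare,accounts/_search?q=linux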

POST localhost:9200/shakespeare/scene/_search/
{
    "query":{
        "match" : {
            "play_name" : "Antony"
        }
    }
}

Depending on the operation, the search can use either POST or GET. The set of commands above also shows an alternative way to formulate a search, as a request body where we expect a match for the search string Antony in the search field play_name.

POST localhost:9200/shakespeare/scene/_search/
{
    "query":{
     "bool": {
         "must" : [
             {
                 "match" : {
                     "play_name" : "Antony"
                 }
             },
             {
                 "match" : {
                     "speaker" : "Demetrius"
                 }
             }
         ]
     }
    }
}


The search above retrieves documents that contain both search strings in their respective fields (every clause under must has to match).
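If I instead wanted documents that match either field, the bool query also takes a should clause (my own variation on the example above, not something I have run yet):

POST localhost:9200/shakespeare/scene/_search/
{
    "query":{
        "bool": {
            "should" : [
                { "match" : { "play_name" : "Antony" } },
                { "match" : { "speaker" : "Demetrius" } }
            ]
        }
    }
}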


  • Elasticsearch can be started and stopped as follows:

    sudo systemctl start elasticsearch.service
    sudo systemctl stop elasticsearch.service
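
  • To check that the service is actually up and responding (assuming the default port 9200), the root endpoint should return the cluster and version info:

    sudo systemctl status elasticsearch.service
    curl localhost:9200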

Elasticsearch can also search all fields for a given string (which would be a way to perform full-text search).
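For example (reusing the accounts index from above), a URI search without a field name should query across all fields:

GET localhost:9200/accounts/person/_search?q=linux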



Useful link containing an example of adding JSON entries in bulk to Elasticsearch:

https://stackoverflow.com/questions/23798433/json-bulk-import-to-elasticstearch


Can delete an entire index like this: DELETE localhost:9200/accounts
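The curl equivalent should be along these lines (this removes the index and all documents in it, so use with care):

curl -XDELETE 'localhost:9200/accounts?pretty'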

Installed PHP curl (sudo apt-get install php-curl), which will be used to interface with Elasticsearch?

 

After installing curl this command worked

curl -XPUT 'localhost:9200/twitter/tweet/1?pretty' -H 'Content-Type: application/json' -d'
{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}
'

The Content-Type header seems to be important to include as well (in the command).

These commands did not work (at least before installing curl):


POST http://localhost:9200/accounts/person/1{"name" : "Joachim","lastname" : "Hansen","job_description" : "test 1 ape"}

POST localhost:9200/accounts/person/1 -H 'Content-Type: application/json' -d'{"name" : "Joachim","lastname" : "Hansen","job_description" : "test 1 ape"}

POST http://localhost:9200/accounts/person/1 -H 'Content-Type: application/json' -d'{"name" : "Joachim","lastname" : "Hansen","job_description" : "test 1 ape"}


Command:

GET localhost:9200/twitter 

Does not work... But the command

GET http://localhost:9200/twitter 

does work

Because this format uses literal \n's as delimiters, please be sure that the JSON actions and sources are not pretty printed. Here is an example of a correct sequence of bulk commands (REGARDING BULK) - https://www.elastic.co/guide/en/elasticsearch/reference/5.5/docs-bulk.html

Have to make sure that the bulk process can process the JSON datasets correctly.


You don't need to specify your JSON objects inside an array (i.e. [...]) and no commas between documents, just one JSON per line with newline characters at the end of each line (don't forget a newline after the last line). I've updated my answer with your latest code. – Val Oct 26 '15 at 8:22


- https://stackoverflow.com/questions/33340153/elasticsearch-bulk-index-json-data

http://queirozf.com/entries/elasticsearch-bulk-inserting-examples

Actions (index, delete, create and update) and the documents each seem to go on their own line when adding via bulk.

POST _bulk
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }

 

Example 2 (the solution I will use, as all the datasets get their own individual index and corresponding document type):

POST http://path.to.your.cluster/myIndex/person/_bulk
{ "index":{} }
{ "name":"john doe","age":25 }
{ "index":{} }
{ "name":"mary smith","age":32 }

So a solution could be 

  1. Ensure that all documents are on separate lines (one document per line)
  2. Have a script that adds { "index":{} } between each document, plus one before the first (this will add the documents to the same type and index, and probably auto-generate an ID for them)
  3. Make sure to have an empty line at the end of the file (like in SVN)

Will having an ID field in the document work or not? I have one dummy set with and one without.
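If an explicit ID turns out to be needed, the bulk index action can carry an _id, so the action/document pair would look something like this (sketch only):

{ "index" : { "_id" : "1" } }
{ "id" : 1, "first_name" : "Conroy", "last_name" : "Tunsley" }

An id field inside the document itself is just stored as a normal field in _source; it does not automatically become the document's _id.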


cat /home/search/Downloads/mockSed.json | sed ': loop; a insert  n; b loop'

cat /home/search/Downloads/mockSed.json | sed ': loop; a { "index":{} } n; b loop' > /home/search/Downloads/mockSed2.json


Almost a viable solution for inserting a line after every line (it does so between all lines and adds a newline at the end of the file, but not at the start of the file?):

cat /home/search/Downloads/mockSed.json | awk ' {print;} NR % 1 == 0 { print ""; }'

Almost there (still wrong for the first and last line)... the double-quote characters have to be escaped so that awk treats them as part of the string literal:

cat /home/search/Downloads/mockSed.json | awk ' {print;} NR % 1 == 0 { print "{ \"index\":{} }"; }' 

The command below also prepends { "index":{} } at the start of the file:

cat /home/search/Downloads/mockSed.json | awk ' {print;} NR % 1 == 0 { print "{ \"index\":{} }"; }' | awk 'BEGIN{print "{ \"index\":{} }";}{print;}'


head -n -1 removes the last line


The character ] is on the next-to-last line in my test file... don't know if that will be OK or not.


The command below opens the mock file, has awk add { "index":{} } between each line, has awk prepend { "index":{} } at the start of the file, has head remove the last { "index":{} } and has awk add a single empty line at the end of the file. (I could simply add "\n" in the print to add multiple empty lines, but I think I only need one empty line for the Elastic bulk import to work.)

cat /home/search/Downloads/mockSed.json | awk ' {print;} NR % 1 == 0 { print "{ \"index\":{} }"; }' | awk 'BEGIN{print "{ \"index\":{} }";}{print;}' |head -n -1 | awk 'END { print "";}{print;}'
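The same transformation can also be sketched as a single awk invocation (my own restructuring of the pipeline above, using the same mock file; the output filename is just an example). It prints { "index":{} } before every document line and a blank line at the end, which avoids the separate prepend and head -n -1 steps; the comma/bracket cleanup with sed below is still needed:

awk '{ print "{ \"index\":{} }"; print } END { print "" }' /home/search/Downloads/mockSed.json > /home/search/Downloads/mockBulk.json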


https://stackoverflow.com/questions/15856733/what-is-the-easiest-way-to-remove-1st-and-last-line-from-file-with-awk

https://stackoverflow.com/questions/1646633/how-to-detect-eof-in-awk

https://www.unix.com/unix-for-dummies-questions-and-answers/143507-add-new-line-top-file.html



curl -s -H "Content Type: application/x-ndjson" -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary /home/search/Downloads/MOCK_DATA.json


curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary /home/search/Downloads/MOCK_DATA.json

I might need to avoid having , (commas) at the end of a line

perl -00pe 's/,(?!.*,)//s' file

https://unix.stackexchange.com/questions/162377/sed-remove-the-very-last-occurrence-of-a-string-a-comma-in-a-file

sed 's/.$//' filename

it matches the last character of each line and removes it, whatever that character is (including commas)

sed 's/,$//' filename

Replaces the last character if it is a comma?

https://unix.stackexchange.com/questions/220576/how-to-remove-last-comma-of-each-line-on-csv-using-linux

The command below removes the comma at the end of each line (without removing the other commas):

sed 's/,$//' file.txt  

file.txt can be for example /home/search/Downloads/mockBulk2.json


$ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary @/home/search/Downloads/mockBulk3.json
{"took":162,"errors":false,

The command above had no errors ( {"took":162,"errors":false ... (omitted))

The command needed 

  1. -s -H "Content-Type: application/x-ndjson"
  2. --data-binary
  3. @filepath 
  4. The _bulk endpoint (for auto-generation of the document IDs?)
  5. One line for the action { "index":{} } followed by a separate line for the document
  6. Start the file with the action { "index":{} }
  7. Remove the { "index":{} } from the last line
  8. Have no comma at the end of each line
  9. I had removed the square brackets '[' and ']' (the opening and closing square brackets)


Example of mockBulk3.json

{ "index":{} }
{"id":1,"first_name":"Conroy","last_name":"Tunsley","email":"ctunsley0@github.com","gender":"Male","ip_address":"121.148.102.183"}
{ "index":{} }


GET localhost:9200/shakespeare/_search
{
    "query": {
            "match_all": {}
    }
}

Match all query


curl -XGET localhost:9200/index_local/_search { "query": {"match_all": {} }}

This command retrieves/matches all documents in the index? and shows a summary of 10 of the hits. I do get warnings in curl, but I don't think that matters... curl doesn't know what "query" is, but the request makes sense to Elasticsearch.
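To actually send the query body with curl (instead of curl treating the JSON as extra URLs), the body presumably needs to go through -d together with a JSON Content-Type header, along the lines of:

curl -XGET 'localhost:9200/index_local/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} }
}
'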


With the command below I use ?pretty=true to get more readable output.

curl -XGET localhost:9200/index_local/?pretty=true

Below is an example of the match-all query with pretty output.

Elasticsearch also reports how long the query took via the took field (here "took" : 1, in milliseconds).

curl -XGET localhost:9200/index_local/_search?pretty=true { "query": {"match_all": {} }}
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1002,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "index_local",
        "_type" : "my_doc_type",
        "_id" : "AV95vm0GFyZDNdvXesno",
        "_score" : 1.0,
        "_source" : {
          "id" : 4,
          "first_name" : "Eli",
          "last_name" : "Yerrall",
          "email" : "eyerrall3@rakuten.co.jp",
          "gender" : "Male",
          "ip_address" : "230.194.131.113"
        }