Is there a way to import a JSON file (containing 100 documents) into an Elasticsearch server?

Is there a way to import a JSON file (containing 100 documents) into an Elasticsearch server? I want to import a large JSON file into my ES server.

+32
json artificial-intelligence elasticsearch bigdata elasticsearch-plugin
Dec 17 '13 at 23:32
9 answers

You should use the Bulk API. Note that you will need to add an action/metadata line before each JSON document.

 $ cat requests
 { "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
 { "field1" : "value1" }
 $ curl -s -XPOST localhost:9200/_bulk --data-binary @requests; echo
 {"took":7,"items":[{"create":{"_index":"test","_type":"type1","_id":"1","_version":1,"ok":true}}]}
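As a small convenience (not shown in the answer above, but part of the bulk endpoint), you can also put a default index in the URL so that _index can be left out of each action line; a minimal sketch:

 # sketch: POSTing to /<index>/_bulk supplies a default index for every action line
 curl -s -XPOST localhost:9200/test/_bulk --data-binary @requests; echo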
+20
Dec 18 '13 at 6:19

As mentioned above, the Bulk API is probably the way to go. To convert a file to the bulk format, you can use jq.

Assuming the file contains only the documents themselves:

 $ echo '{"foo":"bar"}{"baz":"qux"}' | jq -c ' { index: { _index: "myindex", _type: "mytype" } }, . ' {"index":{"_index":"myindex","_type":"mytype"}} {"foo":"bar"} {"index":{"_index":"myindex","_type":"mytype"}} {"baz":"qux"} 

And if the file contains the documents in a top-level list, they have to be unwrapped first:

 $ echo '[{"foo":"bar"},{"baz":"qux"}]' | jq -c ' .[] | { index: { _index: "myindex", _type: "mytype" } }, . ' {"index":{"_index":"myindex","_type":"mytype"}} {"foo":"bar"} {"index":{"_index":"myindex","_type":"mytype"}} {"baz":"qux"} 

jq -c ensures that each document is on its own line.

If you want to pipe the output straight into curl, note that you need to use --data-binary @- and not just -d, otherwise curl will strip the newline characters again.
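Putting those pieces together, a minimal end-to-end sketch (the file name and index/type names here are placeholders, not taken from the answer above):

 # assumes docs.json holds plain JSON documents; index/type names are placeholders
 jq -c '{ index: { _index: "myindex", _type: "mytype" } }, .' docs.json \
   | curl -s -XPOST localhost:9200/_bulk --data-binary @-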

+39
Jun 17 '15 at 18:01

I'm sure someone else will want this, so I'll put it here where it's easy to find.

FYI, this uses Node.js (essentially as a batch script) on the same server as the new ES instance. I ran it on 2 files with 4000 items each, and it took only about 12 seconds on my shared virtual server. YMMV.

 var elasticsearch = require('elasticsearch'),
     fs = require('fs'),
     pubs = JSON.parse(fs.readFileSync(__dirname + '/pubs.json')), // name of my first file to parse
     forms = JSON.parse(fs.readFileSync(__dirname + '/forms.json')); // and the second set

 var client = new elasticsearch.Client({ // default is fine for me, change as you see fit
     host: 'localhost:9200',
     log: 'trace'
 });

 for (var i = 0; i < pubs.length; i++) {
     client.create({
         index: "epubs", // name your index
         type: "pub", // describe the data thats getting created
         id: i, // increment ID every iteration - I already sorted mine but not a requirement
         body: pubs[i] // *** THIS ASSUMES YOUR DATA FILE IS FORMATTED LIKE SO: [{prop: val, prop2: val2}, {prop:...}, {prop:...}] - I converted mine from a CSV so pubs[i] is the current object {prop:..., prop2:...}
     }, function(error, response) {
         if (error) {
             console.error(error);
             return;
         } else {
             console.log(response); // I don't recommend this but I like having my console flooded with stuff. It looks cool. Like I'm compiling a kernel really fast.
         }
     });
 }

 for (var a = 0; a < forms.length; a++) { // Same stuff here, just slight changes in type and variables
     client.create({
         index: "epubs",
         type: "form",
         id: a,
         body: forms[a]
     }, function(error, response) {
         if (error) {
             console.error(error);
             return;
         } else {
             console.log(response);
         }
     });
 }

Hope this helps more people than just me. Not rocket science, but it may save someone 10 minutes.

Cheers

+11
Aug 09 '14 at 12:02

jq is a lightweight and flexible command line JSON processor.

Usage:

cat file.json | jq -c '.[] | {"index": {"_index": "bookmarks", "_type": "bookmark", "_id": .id}}, .' | curl -XPOST localhost:9200/_bulk --data-binary @-

We take file.json and pipe its contents to jq with the -c flag to produce compact output. Here's the nugget: we take advantage of the fact that jq can construct not just one but multiple objects per line of input. For each line, we create the control JSON that Elasticsearch needs (with the ID taken from our original object) and a second line that is just our original JSON object (.).

At this point our JSON is formatted the way Elasticsearch's bulk API expects it, so we just pipe it to curl, which POSTs it to Elasticsearch!

Credit goes to Kevin Marsh

+10
Jul 05 '16 at 10:12

There is no built-in importer, but you can index the documents using the ES API.

You can use the index API to load each document individually (using some code to read the file and make the curl calls), or the bulk index API to load them all at once, assuming your data file can be formatted to work with it.

Read more here: ES API

A simple shell script would do the trick if you're comfortable with the shell, something like this (not tested):

 while read line
 do
   curl -XPOST 'http://localhost:9200/<indexname>/<typeofdoc>/' -d "$line"
 done < myfile.json
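For the bulk route mentioned above, an untested sketch along the same lines (placeholder index/type and file names, and assuming one JSON document per line in the file):

 # prepend an action line to every document, then POST the whole thing to _bulk
 awk '{ print "{\"index\":{\"_index\":\"myindex\",\"_type\":\"mydoc\"}}"; print }' myfile.json > bulk_body.json
 curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary @bulk_body.json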

Personally, I would probably use Python, with either pyes or the elasticsearch Python client.

pyes on github
elasticsearch Python client

Stream2es is also very useful for getting data into ES quickly, and may have a way to simply stream a file in. (I haven't tested it with a file, but have used it to load the Wikipedia docs for ES perf testing.)

+8
Dec 18 '13 at 1:14

Stream2es is the easiest way, IMO.

E.g. assuming a file "some.json" containing a list of JSON documents, one per line:

 curl -O download.elasticsearch.org/stream2es/stream2es; chmod +x stream2es
 cat some.json | ./stream2es stdin --target "http://localhost:9200/my_index/my_type"
+5
May 13 '15 at 6:25

You can use esbulk, a fast and simple bulk indexer:

 $ esbulk -index myindex file.ldj 

This asciicast shows it loading Project Gutenberg data into Elasticsearch in about 11 seconds.

Disclaimer: I am the author.

+4
Dec 02 '15 at 23:50

You can use the Elasticsearch Gatherer plugin.

The gatherer plugin for Elasticsearch is a framework for scalable data fetching and indexing. Content adapters are implemented in gatherer zip archives, which are a special kind of plugin distributable over Elasticsearch nodes. They can receive job requests and execute them in local queues. Job states are maintained in a special index.

This plugin is under development.

Milestone 1 - deploy gatherer zips to nodes

Milestone 2 - job specification and execution

Milestone 3 - migrating the JDBC river to a JDBC gatherer

Milestone 4 - distributing gatherer jobs by load / queue length / node name, cron jobs

Milestone 5 - more gatherers, more content adapters

Link: https://github.com/jprante/elasticsearch-gatherer

+3
Dec 28 '15 at 9:12

One way is to create a bash script that does a bulk insert (note that the file must already be in the bulk format, with an action line before each document):

 curl -XPOST http://127.0.0.1:9200/myindexname/type/_bulk?pretty=true --data-binary @myjsonfile.json 

After the insert has run, run this command to get the count:

 curl http://127.0.0.1:9200/myindexname/type/_count 
0
Mar 03 '16 at 21:42


