How can I read file lines in node.js or JavaScript with a delay (blocking-style), rather than with the default non-blocking behavior?

I am reading a file (300,000 lines) in node.js and want to send the lines in batches of 5,000 to another application (Elasticsearch) for storage. Whenever I finish reading 5,000 lines, I want to send them in bulk to Elasticsearch via its API, then continue reading the rest of the file and send every subsequent batch of 5,000 lines in the same way.

If I were using Java (or any other blocking language such as C, C++, Python, etc.) for this task, I would do something like this:

    int countLines = 0;
    String bulkString = "";
    String currentLine;
    BufferedReader br = new BufferedReader(
            new InputStreamReader(new FileInputStream("filePath.txt")));
    while ((currentLine = br.readLine()) != null) {
        countLines++;
        bulkString += currentLine;
        if (countLines >= 5000) {
            // send bulkString to Elasticsearch via the API
            countLines = 0;
            bulkString = "";
        }
    }

If I want to do the same thing in node.js, I would write:

    var fs = require('fs');
    var readline = require('readline');
    // "client" is assumed to be an already-configured Elasticsearch client instance

    var countLines = 0;
    var bulkString = "";
    var instream = fs.createReadStream('filePath.txt');
    var rl = readline.createInterface({ input: instream });

    rl.on('line', function (line) {
        countLines++;
        bulkString += line + '\n';
        if (countLines >= 5000) {
            // send bulkString to Elasticsearch via the API
            client.bulk({
                index: 'indexName',
                type: 'type',
                body: [bulkString]
            }, function (error, response) {
                // task is done
            });
            countLines = 0;
            bulkString = "";
        }
    });

The problem with node.js is that it does not block, so it does not wait for the response to the first API call before sending the next batch of lines. I know this can be considered an advantage of node.js, because it does not wait on I/O, but the problem here is that it pushes too much data to Elasticsearch at once. As a result the Elasticsearch queue fills up and it starts rejecting requests.

My question is: how can I make node.js wait for the response from the API before it continues reading the next lines, or before it sends the next batch of lines to Elasticsearch?

I know I could tune some Elasticsearch parameters to increase the queue size, but I am interested in making node.js behave in a blocking way for this problem. I am familiar with the concept of callbacks, but I cannot think of a way to use them in this scenario to prevent node.js from calling the Elasticsearch API in a non-blocking manner.

2 answers

Pierre's answer is correct. I just want to add code that shows how we can benefit from node.js's non-blocking nature while, at the same time, not overloading Elasticsearch with too many concurrent requests.

Here is pseudo code you can use to make this flexible by setting a limit on the number of requests that may be in flight at once:

    var countLines = 0;
    var bulkString = "";
    var queueSize = 3; // at most 3 bulk requests will be in flight to the Elasticsearch server
    var batchesAlreadyInQueue = 0;
    var instream = fs.createReadStream('filePath.txt');
    var rl = readline.createInterface({ input: instream });

    rl.on('line', function (line) {
        countLines++;
        bulkString += line + '\n';
        if (countLines >= 5000) {
            // send bulkString to Elasticsearch via the API
            batchesAlreadyInQueue++; // one more request is now in flight
            client.bulk({
                index: 'indexName',
                type: 'type',
                body: [bulkString]
            }, function (error, response) {
                // task is done: we decrease the number of in-flight requests
                // when we hear back from one of them, and resume reading
                batchesAlreadyInQueue--;
                rl.resume();
            });
            if (batchesAlreadyInQueue >= queueSize) {
                rl.pause(); // stop reading until at least one request completes
            }
            countLines = 0;
            bulkString = "";
        }
    });
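As a side note, in more recent versions of Node.js the same back-pressure idea can be written with async/await instead of pause()/resume(). The sketch below is only an illustration, not part of the original answer: it assumes an Elasticsearch client whose bulk() returns a promise when no callback is given, and it keeps the bulk body as loosely formatted as in the question.

    const fs = require('fs');
    const readline = require('readline');

    async function indexFile(filePath, client) {
        const rl = readline.createInterface({
            input: fs.createReadStream(filePath),
            crlfDelay: Infinity
        });

        let batch = [];
        for await (const line of rl) {
            batch.push(line);
            if (batch.length >= 5000) {
                // Awaiting here suspends the loop (and therefore further reads)
                // until Elasticsearch has acknowledged the current batch.
                await client.bulk({ index: 'indexName', type: 'type', body: batch });
                batch = [];
            }
        }
        if (batch.length > 0) {
            // flush whatever is left at the end of the file
            await client.bulk({ index: 'indexName', type: 'type', body: batch });
        }
    }

This version sends at most one bulk request at a time; to allow a small window of concurrent requests, like queueSize above, you would collect several promises and await them together.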

Use rl.pause() right after your if and rl.resume() after //task is done.

Please note that after calling pause() you may still receive a few more line events.
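To make the placement concrete, here is a minimal sketch of this suggestion, reusing the variables (countLines, bulkString, client, rl) from the question's code:

    rl.on('line', function (line) {
        countLines++;
        bulkString += line + '\n';
        if (countLines >= 5000) {
            rl.pause(); // stop emitting 'line' events while the bulk request is in flight
            client.bulk({
                index: 'indexName',
                type: 'type',
                body: [bulkString]
            }, function (error, response) {
                // task is done: resume reading the file
                rl.resume();
            });
            countLines = 0;
            bulkString = "";
        }
    });

Any lines that still arrive after pause() simply accumulate into bulkString and are picked up by the next batch.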

