I am reading a file (300,000 lines) in Node.js, and I want to send the rows to another application (Elasticsearch) in batches of 5,000 to store them. So whenever I finish reading 5,000 lines, I want to send them in bulk to Elasticsearch via its API, then continue reading the rest of the file and send every subsequent 5,000 lines in bulk.
If I wanted to use Java (or any other blocking language such as C, C++, Python, etc.) for this task, I would do something like this:
```java
import java.io.*;

int countLines = 0;
String bulkString = "";
String currentLine;
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("filePath.txt")));
while ((currentLine = br.readLine()) != null) {
    countLines++;
    bulkString += currentLine;
    if (countLines >= 5000) {
        // send bulkString to Elasticsearch here, then reset for the next batch
        countLines = 0;
        bulkString = "";
    }
}
```
If I want to do the same with Node.js, I would do:
```javascript
var fs = require('fs');
var readline = require('readline');

var countLines = 0;
var bulkString = "";
var instream = fs.createReadStream('filePath.txt');
var rl = readline.createInterface({ input: instream });
rl.on('line', function(line) {
    countLines++;
    bulkString += line;
    if (countLines >= 5000) {
        // send bulkString to Elasticsearch here, then reset for the next batch
        countLines = 0;
        bulkString = "";
    }
});
```
The problem with Node.js is that it does not block, so it does not wait for the response to one bulk API call before sending the next batch of lines. I know this is usually considered an advantage of Node.js, because it does not wait on I/O, but here it means too much data is pushed at Elasticsearch at once. The Elasticsearch bulk queue fills up, and it starts rejecting the requests.
My question is: how can I make Node.js wait for the response from the API before it continues reading the next lines, or before it sends the next batch of lines to Elasticsearch?
I know I can set some parameters in Elasticsearch to increase the queue size, but I'm interested in getting blocking-like behavior out of Node.js for this problem. I am familiar with the concept of callbacks, but I cannot think of a way to use callbacks in this scenario to prevent Node.js from calling the Elasticsearch API in a non-blocking way.
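For illustration, this is roughly the kind of pattern I am imagining, though I don't know if it is the right approach: pause the readline interface while a batch is in flight and resume it from the callback. Here `sendToElasticsearch` is a hypothetical placeholder for a wrapper around the bulk API, not a real client function, and I realize `pause()` may still deliver a few already-buffered lines even though it stops the underlying stream from being read further:

```javascript
var fs = require('fs');
var readline = require('readline');

// Hypothetical wrapper around the Elasticsearch bulk API that calls
// `done` once Elasticsearch has acknowledged the request.
// Placeholder only, not an actual client function.
function sendToElasticsearch(batch, done) {
    // ... perform the HTTP bulk request here, then call done() ...
    setImmediate(done); // stand-in so the sketch is runnable
}

var countLines = 0;
var bulkString = "";
var rl = readline.createInterface({ input: fs.createReadStream('filePath.txt') });

rl.on('line', function(line) {
    countLines++;
    bulkString += line;
    if (countLines >= 5000) {
        var batch = bulkString;
        countLines = 0;
        bulkString = "";
        rl.pause(); // stop emitting 'line' events while the batch is in flight
        sendToElasticsearch(batch, function() {
            rl.resume(); // continue reading only once the API has responded
        });
    }
});

rl.on('close', function() {
    // flush whatever is left over after the last full batch
    if (bulkString.length > 0) {
        sendToElasticsearch(bulkString, function() {});
    }
});
```

Is something like this the intended way to use callbacks here, or is there a better pattern for throttling the reads?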