Implementing generic long polling

I am trying to implement a simple long-polling service for use in my own projects, and may release it as SaaS if I succeed. These are the two approaches I've tried so far, using Node.js (with PostgreSQL being polled on the back end).

1. Check all clients periodically on a single interval

Each new connection is pushed onto a queue of connections, which a single interval timer walks through.

    var queue = [];

    function acceptConnection(req, res) {
      res.setTimeout(5000);
      queue.push({ req: req, res: res });
    }

    function checkAll() {
      queue.forEach(function (client) {
        // respond if there is something new for the client
      });
    }

    // this could be replaced with a timeout rescheduled after all clients are served
    setInterval(checkAll, 500);
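
To make the check concrete, here is a minimal sketch of how checkAll could answer and evict served clients; getNewMessages and the since cursor are hypothetical stand-ins for whatever database check is actually used:

    var queue = []; // as above, but entries also carry a `since` cursor

    // hypothetical helper: fetch messages newer than `since`,
    // e.g. SELECT * FROM messages WHERE id > $1
    function getNewMessages(since, callback) {
      process.nextTick(function () { callback(null, []); });
    }

    function checkAll() {
      queue.forEach(function (client) {
        if (client.pending) return; // a previous check is still in flight
        client.pending = true;
        getNewMessages(client.since, function (err, messages) {
          client.pending = false;
          if (err || messages.length === 0) return; // nothing new, keep holding
          client.res.end(JSON.stringify(messages)); // answer the long poll
          queue.splice(queue.indexOf(client), 1);   // drop the served client
        });
      });
    }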

2. Check each client on its own interval

Each client gets its own Ticker, which checks for new data.

    function acceptConnection(req, res) {
      // something which periodically checks data for the client
      // and responds if there is anything new
      new Ticker(req, res);
    }
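
A minimal sketch of what such a Ticker could look like, reusing the hypothetical getNewMessages from above:

    function Ticker(req, res) {
      var self = this;
      this.res = res;
      this.timer = setInterval(function () { self.check(); }, 500);
      res.on('close', function () { clearInterval(self.timer); }); // client went away
    }

    Ticker.prototype.check = function () {
      var self = this;
      getNewMessages(0, function (err, messages) { // 0 = everything; a real cursor would come from the request
        if (err || messages.length === 0) return;  // nothing new yet
        clearInterval(self.timer);                 // stop checking
        self.res.end(JSON.stringify(messages));    // answer the long poll
      });
    };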

While this keeps the minimum latency for each client lower, it also introduces overhead by setting a lot of timeouts.

Conclusion

Both of these approaches solve the problem easily enough, but I don't feel they would scale to something like 10 million open connections, especially since I query the database on every check for every client.

I have thought about skipping the database and broadcasting new messages immediately to all open connections, but that breaks if a client's connection is down for a few seconds during a broadcast, because the connection is not persistent. So essentially I need to be able to look up messages in a history when a client next polls.
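
What I mean by a history, sketched: a bounded in-memory log keyed by sequence number, so a reconnecting client can ask for everything after the last id it saw (all names here are illustrative):

    var history = [];        // recent messages, oldest first
    var HISTORY_LIMIT = 1000;
    var nextId = 1;

    function publish(message) {
      message.id = nextId++;
      history.push(message);
      if (history.length > HISTORY_LIMIT) history.shift(); // bound memory
      // ...then broadcast to the connections that are currently open
    }

    function messagesSince(lastSeenId) {
      return history.filter(function (m) { return m.id > lastSeenId; });
    }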

I think one step here would be to have a data source I can subscribe to for new data entering the system (CouchDB change notifications?), but maybe I'm missing something in the big picture here?
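
Since PostgreSQL is already in the picture, its LISTEN/NOTIFY might be such a source; a sketch using the node-postgres (pg) module, with a made-up connection string and channel name:

    var pg = require('pg');

    var sub = new pg.Client('postgres://localhost/mydb'); // made-up connection string
    sub.connect(function (err) {
      if (err) throw err;
      sub.query('LISTEN new_messages', function (err) { // illustrative channel name
        if (err) throw err;
      });
    });

    // fired whenever someone runs NOTIFY new_messages, '<payload>' in Postgres
    sub.on('notification', function (msg) {
      broadcast(msg.payload); // hypothetical fan-out to the held connections
    });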

What is the usual approach to large-scale long polling? I am not attached to Node.js; I would welcome any other suggestion, with an argument as to why.

2 answers

Not sure if this answers your question, but I like the Pushpin approach (plus its explanation of the concepts).

I like the idea (using a reverse proxy and communicating via status codes plus held, deferred REST responses), but I have reservations about the implementation. Perhaps I'm underestimating the problem, but the technologies involved seem a bit heavyweight to me. I'm not sure whether I'll use it; I'd prefer a lighter solution, but I find the concept phenomenal.
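
For a flavor of the mechanism, roughly how the hold works, going by Pushpin's documentation (headers, port, and channel name are as I recall them, so treat the details as approximate):

    var http = require('http');

    // backend behind Pushpin: instead of holding the socket ourselves,
    // instruct the proxy to hold the request and subscribe it to a channel
    http.createServer(function (req, res) {
      res.writeHead(200, {
        'Grip-Hold': 'response',   // hold this request open
        'Grip-Channel': 'updates'  // release it when something is published here
      });
      res.end();
    }).listen(8080);

    // delivery then goes through Pushpin's publish endpoint, e.g.
    //   POST http://localhost:5561/publish/
    //   {"items": [{"channel": "updates",
    //               "formats": {"http-response": {"body": "hello\n"}}}]}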

I would like to hear what you ultimately used.


Since you mentioned scalability, I have to get a little theoretical, since the only practical measure is load testing. So all I can offer is advice.

Generally speaking, once-per-anything is bad for scalability, especially once per connection or once per request, since that makes part of your application proportional to the amount of traffic. Node.js removed the thread-per-connection dependency with its single-threaded asynchronous I/O model. Of course, you can't completely eliminate having something per connection, for example a request and response object and a socket.

I suggest avoiding anything that opens a database connection for every HTTP connection; this is what connection pools are for.
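
For example, with the pg module's Pool, all HTTP connections share a fixed set of database connections (pool size and query are illustrative):

    var pg = require('pg');

    var pool = new pg.Pool({ max: 20 }); // 20 DB connections, regardless of HTTP traffic

    function getNewMessages(since, callback) {
      // checkout and return of the underlying connection is handled by the pool
      pool.query('SELECT * FROM messages WHERE id > $1', [since], function (err, result) {
        callback(err, result && result.rows);
      });
    }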

As for the choice between your two options above, I would personally go with the second, because it keeps each connection isolated. The first option loops over the connections, which means actual running time per connection. That probably doesn't matter much, since the I/O is asynchronous, but given the choice between iterating per connection and the mere existence of an object per connection, I would rather just have the object. Then I have nothing to worry about when 10,000 connections suddenly show up.

The C10K problem seems like a good reference for this, although honestly it really comes down to personal judgment.

http://www.kegel.com/c10k.html

http://en.wikipedia.org/wiki/C10k_problem

