Design Guidelines for Distributed Computing

Question


I have a software system that runs OCR on multiple machines at the same time. The current system works as follows:

All documents that need OCR are inserted into a table in the database.
Each client machine polls this table, and whenever it finds documents waiting for OCR, it locks the table and selects n files to process. The lock is there for atomicity.
After each document is finished, its status is updated to completed.

I know that using the database as the synchronization point is a serious design mistake. It mostly works, but sometimes I see database deadlocks.

So my question is: what is the best way to design such a system? I want the database to be storage only, not the place where synchronization happens. I would like to hear your thoughts.

+6

design c# .net system

crypted Sep 29 '10 at 6:13


2 answers

Instead of polling the database for files to OCR, it is better to use the Windows message queue service. Consider what happens if the database is down when your OCR service starts: the OCR service cannot do anything until the database service is up. With a Windows message queue, the OCR service can get the file information from the queue whether the queue is online or offline, so it can start automatically as soon as the machine has booted, and there is no locking problem in the database.
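As a rough illustration of the idea (not of the Windows message queue API itself), here is a minimal Python sketch in which workers pull file names from a shared queue instead of polling a table. The standard-library `queue.Queue` is only an in-process stand-in for a real message queue, and `run_workers` and the file names are hypothetical.

```python
import queue
import threading


def run_workers(file_names, worker_count):
    """Hand each file name to exactly one worker through a shared queue."""
    jobs = queue.Queue()  # in-process stand-in for a real message queue
    for name in file_names:
        jobs.put(name)

    processed = []  # (worker index, file name) pairs
    processed_lock = threading.Lock()

    def worker(index):
        while True:
            try:
                # The queue delivers each item to a single consumer, so the
                # workers never need to lock a table to avoid duplicate work.
                name = jobs.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            # Placeholder: the real OCR step would run here.
            with processed_lock:
                processed.append((index, name))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed
```

Calling `run_workers(["a.tif", "b.tif", "c.tif"], 2)` returns one entry per file, although which worker handled which file is not deterministic. Unlike a real message queue, this in-memory queue is not durable and does not survive a restart.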

+2

Syntax Sep 29 '10 at 6:38


Jon Skeet · Accepted Answer · 2010-09-29T06:21:02+0000

Well, you might have a column in the table that says if the record is being processed. As part of the transaction, retrieve the data for the record that is not currently being processed, and update the record to say that it is now being processed. The details of how conflicts will be handled will depend on the type of transactions you create and the database you use, but I suspect that transactions should be at the center of it.
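A minimal sketch of that claim-in-a-transaction idea, written in Python with the standard-library `sqlite3` module so that it is self-contained. The table layout, the `status` and `worker` columns, and the function names are assumptions made for illustration. The locking statement is specific to SQLite (`BEGIN IMMEDIATE`); other databases would use their own mechanism.

```python
import sqlite3


def claim_documents(conn, worker_id, batch_size):
    """Atomically mark up to batch_size pending documents as being processed."""
    # BEGIN IMMEDIATE takes the write lock up front, so two workers cannot
    # both select the same pending rows before either one updates them.
    conn.execute("BEGIN IMMEDIATE")
    try:
        rows = conn.execute(
            "SELECT id FROM documents WHERE status = 'pending' ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        ids = [row[0] for row in rows]
        conn.executemany(
            "UPDATE documents SET status = 'processing', worker = ? WHERE id = ?",
            [(worker_id, doc_id) for doc_id in ids],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return ids


def make_demo_db():
    """Create an in-memory database with five pending documents (demo data)."""
    # isolation_level=None puts sqlite3 in autocommit mode, so the explicit
    # BEGIN/COMMIT statements above are what control the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY, status TEXT NOT NULL, worker TEXT)"
    )
    conn.executemany("INSERT INTO documents (status) VALUES (?)", [("pending",)] * 5)
    return conn
```

Each call to `claim_documents` returns a batch of ids that no other caller will receive. The lock is held only for the short select-and-update step, not for the duration of the OCR work.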

That is assuming you really want to use a database, rather than a message queue of some description. You might want to use a message queue in conjunction with a database ... and some databases have queues built into them, which can also be useful. Even if you need the record in the database, you could have a queue of just identifiers: clients can simply pull the next item off the queue and then fetch the data. You can still record the time at which the item was taken off the queue, so that if the client crashes or something like that, a batch job can put any failed jobs (for example, those that were taken a day ago but still have no results) back on the queue.
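The queue-of-identifiers approach with a requeue pass might be sketched as follows. This is an in-memory Python illustration only: `JobTracker`, its method names, and the use of plain numbers as timestamps are all assumptions, and a real system would keep the dequeue times in the database.

```python
import queue


class JobTracker:
    """Queue of document ids plus a record of when each id was taken."""

    def __init__(self, doc_ids):
        self.pending = queue.Queue()
        self.taken_at = {}  # doc id -> time at which a client took it
        self.done = set()
        for doc_id in doc_ids:
            self.pending.put(doc_id)

    def take(self, now):
        """Pull the next id off the queue and remember when it was taken."""
        doc_id = self.pending.get_nowait()  # raises queue.Empty if none left
        self.taken_at[doc_id] = now
        return doc_id

    def complete(self, doc_id):
        """Record a finished document so it is never requeued."""
        self.done.add(doc_id)
        self.taken_at.pop(doc_id, None)

    def requeue_stale(self, now, max_age):
        """Batch job: put back ids taken more than max_age ago with no result."""
        stale = [d for d, t in self.taken_at.items() if now - t > max_age]
        for doc_id in stale:
            del self.taken_at[doc_id]
            self.pending.put(doc_id)
        return stale
```

A client would call `take`, fetch the document data by id, run OCR, and then call `complete`. A periodic job would call `requeue_stale` with whatever age limit suits the workload, for example one day as suggested above.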
