Distributed Computing in C#

I have a specific DLL that contains some classes and language-processing methods. One of these methods takes a word as an argument, performs a calculation that takes about 3 seconds, and saves the result to a SQL Server database.

I want to run this DLL method over a list of 900,000 words, and the task repeats every week. How can I easily spread this work across multiple systems in C# to save time?

+4
5 answers

Since this is existing code, I would look for a way to split the 900,000-word list across machines and run the existing application on each part.

Everything else would require much larger changes.
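A minimal sketch of that splitting step: divide the full list into roughly equal chunks, one per machine (the chunk count and sample words here are illustrative).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Split the full word list into roughly equal chunks, one per machine.
static List<List<string>> Split(IReadOnlyList<string> words, int machineCount)
{
    int chunkSize = (int)Math.Ceiling(words.Count / (double)machineCount);
    var chunks = new List<List<string>>();
    for (int i = 0; i < words.Count; i += chunkSize)
        chunks.Add(words.Skip(i).Take(chunkSize).ToList());
    return chunks;
}

var words900k = Enumerable.Range(0, 900_000).Select(i => $"word{i}").ToList();
var perMachine = Split(words900k, 5); // e.g. five machines
Console.WriteLine(perMachine.Count);    // 5
Console.WriteLine(perMachine[0].Count); // 180000
```

Each chunk could then be written to a file and handed to one machine, which runs the unmodified DLL over its slice.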

+4

Answer in the form: Requirement - Tool

Scheduled Runs - Quartz.NET

Quartz allows you to run "jobs" on any schedule. It also maintains state between runs, so if the server goes down for some reason, it knows where to pick up when it comes back. Pretty cool stuff.
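A sketch of what the weekly schedule might look like with Quartz.NET (the job name, cron expression, and job body are assumptions, not from the original answer):

```csharp
using System.Threading.Tasks;
using Quartz;
using Quartz.Impl;

// Schedule the weekly run; the identity and cron expression are illustrative.
var scheduler = await StdSchedulerFactory.GetDefaultScheduler();
await scheduler.Start();

var job = JobBuilder.Create<ProcessWordListJob>()
    .WithIdentity("process-word-list")
    .Build();

var trigger = TriggerBuilder.Create()
    .WithCronSchedule("0 0 2 ? * MON") // every Monday at 02:00
    .Build();

await scheduler.ScheduleJob(job, trigger);

public class ProcessWordListJob : IJob
{
    public Task Execute(IJobExecutionContext context)
    {
        // Kick off the distributed run here, e.g. enqueue the word list.
        return Task.CompletedTask;
    }
}
```

Quartz persists trigger state (misfires, last fire time) in its job store, which is what lets a restarted server resume correctly.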

Distributed Queue - NServiceBus

A good service bus is worth its weight in gold. Basically, you want each of your workers to perform one operation at a time while the remaining operations sit queued. If you can guarantee that your operations are idempotent, NServiceBus is a great way to accomplish this.

Queue → Worker 1 | Worker 2 | Worker 3 → Local data store → Sync queue + workers → Remote data store
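The queue-to-worker leg of that pipeline might look like this with NServiceBus (the message type, property, and comments are illustrative assumptions):

```csharp
using System.Threading.Tasks;
using NServiceBus;

// One message per word on the queue (type and property names are illustrative).
public class ProcessWord : ICommand
{
    public string Word { get; set; }
}

// Each worker machine hosts an endpoint running this handler.
public class ProcessWordHandler : IHandleMessages<ProcessWord>
{
    public Task Handle(ProcessWord message, IMessageHandlerContext context)
    {
        // Call the existing DLL's ~3 second method here, then write the
        // result to the local cache. Queue retries are only safe because
        // the calculation is assumed idempotent.
        return Task.CompletedTask;
    }
}
```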

Data Cache - RavenDb or SQLite

Basically, to keep the return values of these operations sufficiently isolated from SQL Server, you want to cache each value somewhere on the local system first. That could be something fast and non-relational like RavenDB, or something like SQLite. Then you throw an identifier onto another queue via NServiceBus and sync it back to SQL Server later. Queues are your friend! :-)
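The SQLite variant of that local cache could be as simple as the following (the schema, connection string, and sample values are assumptions; `Microsoft.Data.Sqlite` is one common driver):

```csharp
using Microsoft.Data.Sqlite; // NuGet package providing the SQLite driver

using var conn = new SqliteConnection("Data Source=results.db");
conn.Open();

using (var create = conn.CreateCommand())
{
    create.CommandText =
        "CREATE TABLE IF NOT EXISTS results (word TEXT PRIMARY KEY, result TEXT)";
    create.ExecuteNonQuery();
}

// Cache one result locally; a separate sync job drains this table to SQL Server.
using (var insert = conn.CreateCommand())
{
    insert.CommandText =
        "INSERT OR REPLACE INTO results (word, result) VALUES ($word, $result)";
    insert.Parameters.AddWithValue("$word", "example");
    insert.Parameters.AddWithValue("$result", "42");
    insert.ExecuteNonQuery();
}
```

`INSERT OR REPLACE` keyed on the word also makes local writes idempotent, which pairs well with queue retries.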

Async Operations - Task Parallel Library and TPL Dataflow

You essentially want to make sure none of your operations block, and that each is small and atomic enough. If you don't know the TPL yet, it is really powerful stuff! I hear plenty of complaints from the Java folks, but it's worth saying: C# is becoming a really great language for asynchronous and parallel workflows!

Also, one good thing coming out of the new Async CTP is TPL Dataflow. I have not used it myself, but it seems right up your alley!
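For a feel of TPL Dataflow, here is a sketch of a two-stage pipeline, calculate then save, with a stub standing in for the DLL's ~3 second call (the stub, sample words, and console "save" are all placeholders):

```csharp
using System;
using System.Threading.Tasks.Dataflow; // NuGet: System.Threading.Tasks.Dataflow

string[] words = { "alpha", "beta", "gamma" }; // stand-in for the 900k list

// Stub for the DLL's ~3 second calculation (assumed signature).
static string Calculate(string word) => word.ToUpperInvariant();

var calculate = new TransformBlock<string, (string Word, string Result)>(
    w => (w, Calculate(w)),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount });

var save = new ActionBlock<(string Word, string Result)>(
    item => Console.WriteLine($"{item.Word} -> {item.Result}")); // stand-in for the DB write

calculate.LinkTo(save, new DataflowLinkOptions { PropagateCompletion = true });

foreach (var word in words) calculate.Post(word);
calculate.Complete();
await save.Completion;
```

`MaxDegreeOfParallelism` lets the calculate stage saturate every logical processor while the save stage stays sequential.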

+14

I think this is a job for DryadLINQ. Just be aware that I have not used it myself, but it seems to fit the bill.

GJ

+2

You could create an application that acts as the server. It would manage the word list and distribute work to clients. Your client software would be installed on the distributed machines. Then you can use MSMQ to pass messages back and forth quickly.
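A rough sketch of that MSMQ handoff using `System.Messaging` (.NET Framework); the queue path and labels are illustrative:

```csharp
using System;
using System.Messaging; // .NET Framework assembly; queue path below is illustrative

const string path = @".\Private$\words";
if (!MessageQueue.Exists(path))
    MessageQueue.Create(path);

// Server side: push each word onto the queue.
using (var queue = new MessageQueue(path))
    queue.Send("example", "word job");

// Client side (on each worker machine): pull a word and process it.
using (var queue = new MessageQueue(path))
{
    queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
    Message msg = queue.Receive(TimeSpan.FromSeconds(5));
    string word = (string)msg.Body;
    // Run the DLL method on 'word' here, then report back on a reply queue.
}
```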

+2

You have the right idea. Divide and conquer. This is a typical task for distributed parallel computing. Say you have five machines, each with four hyperthreaded cores. That gives you 40 logical processors.

As you described it, you have 750 hours of processing, plus a bit of overhead. If you can divide the work among 40 processing threads, you can do it all in about 20 hours. Splitting the work is the easy part.
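The arithmetic behind those numbers:

```csharp
using System;

// 900,000 words at ~3 seconds each:
double totalSeconds = 900_000 * 3;         // 2,700,000 s
double totalHours   = totalSeconds / 3600; // 750 h on a single thread
double perLogical   = totalHours / 40;     // 18.75 h across 40 logical processors
Console.WriteLine($"{totalHours} h total, {perLogical} h across 40 threads");
// prints "750 h total, 18.75 h across 40 threads"
```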

The hard part is distributing the work and executing it in parallel. You have several options here, as others have pointed out. Let me add a few more for your consideration.

  • You can manually split the word list into files on a share or other medium and run a separate copy of a console application on each node/workstation, using the TPL to max out every logical processor on each machine.

  • You can use something like MPAPI and write your own nodes and workers.

  • You can install Windows Server on your nodes/workstations, run Microsoft HPC, and use something like MPI.NET to kick off jobs.

  • You can write a console application and use DuoVia.MpiVisor to distribute and run on your workstations. (Full disclosure: I am the author of MpiVisor)
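The first option above, a per-machine console app that saturates every logical processor, might look like this (the file argument and `ProcessWord` stub are assumptions standing in for the real DLL call):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Each machine receives one slice of the word list as a plain text file,
// passed as the first command-line argument.
string[] words = File.ReadAllLines(args[0]);

Parallel.ForEach(
    words,
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    word => ProcessWord(word));

static void ProcessWord(string word)
{
    // The existing ~3 second DLL calculation and DB save would go here.
    Console.WriteLine($"processed {word}");
}
```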

Good luck to you.

+1
