The easiest way to parallelize my C# program on multiple PCs

Question

I have many unused computers at home. What would be the easiest way to use them to parallelize my C# program with little or no code changes?

The task I am trying to do involves looping over many English sentences. The data set can easily be split into smaller pieces that are processed on different computers at the same time.

c# concurrency parallel-processing .net cloud

Hao Wooi Lim Nov 08 '08 at 19:18

9 answers

Konrad Rudolph · Answer 1 · 2008-11-08T19:23:05+0000

... with little or no code changes?

Difficult. Basically, look at WCF as a way to communicate between different instances of a program over a network. Depending on the algorithm, the structure may have to be drastically changed or not altered at all. In any case, you need to find a way to divide the problem into parts that act independently of each other. Then you must develop a way to distribute these parts between different instances and collect the resulting data.

PLINQ offers a great way to parallelize your program without major changes, but it only works within a single process, across threads, and then only if the algorithm can be parallelized at all. In general, manual refactoring is required.
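As a rough illustration of the single-process case, here is a minimal PLINQ sketch. It is not from the original answer: `CountWords` is a hypothetical stand-in for whatever the real per-sentence work is, and the sample sentences are made up.

```csharp
using System;
using System.Linq;

public static class PlinqSketch
{
    // Hypothetical stand-in for the real per-sentence processing.
    public static int CountWords(string sentence) =>
        sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;

    public static int[] ProcessAll(string[] sentences) =>
        sentences
            .AsParallel()                  // spread the work over the cores of this one machine
            .AsOrdered()                   // keep results in the same order as the input
            .Select(s => CountWords(s))
            .ToArray();

    public static void Main()
    {
        var sentences = new[] { "The quick brown fox", "jumps over", "the lazy dog" };
        Console.WriteLine(string.Join(", ", ProcessAll(sentences))); // prints "4, 2, 3"
    }
}
```

Note that this only uses the cores of one PC. Spreading the work over several machines still needs some form of communication between processes, as described above.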

user5913 · Answer 2 · 2008-11-08T19:21:15+0000

Perhaps this is not possible.

How to parallelize a program depends entirely on what your program does and how it is written, and usually requires extensive code changes and greatly increases the complexity of your program.

The usual way to easily increase the degree of concurrency in a program is to take a task that is repeated many times and simply write a function that breaks this task into chunks and sends them to different cores for processing.
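A minimal sketch of that split-and-dispatch idea, assuming the repeated task is a per-sentence computation. `CountWords`, `SplitIntoChunks` and `TotalWords` are hypothetical names chosen for illustration; they do not come from the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ChunkSketch
{
    // Hypothetical stand-in for the real per-sentence processing.
    public static int CountWords(string sentence) =>
        sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;

    // Splits items into at most `parts` contiguous chunks.
    public static List<List<T>> SplitIntoChunks<T>(IReadOnlyList<T> items, int parts)
    {
        if (parts < 1) throw new ArgumentOutOfRangeException(nameof(parts));
        var chunks = new List<List<T>>();
        int size = (items.Count + parts - 1) / parts; // ceiling division
        for (int start = 0; start < items.Count; start += size)
            chunks.Add(items.Skip(start).Take(size).ToList());
        return chunks;
    }

    // Processes each chunk on its own task and adds up the partial results.
    public static int TotalWords(IReadOnlyList<string> sentences, int parts)
    {
        Task<int>[] tasks = SplitIntoChunks(sentences, parts)
            .Select(chunk => Task.Run(() => chunk.Sum(s => CountWords(s))))
            .ToArray();
        return tasks.Sum(t => t.Result); // .Result blocks until that chunk is done
    }

    public static void Main()
    {
        var sentences = new[] { "The quick brown fox", "jumps over", "the lazy dog" };
        Console.WriteLine(TotalWords(sentences, Environment.ProcessorCount));
    }
}
```

The same chunking function could also decide which chunk goes to which machine; only the dispatch step would change.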

vmarquez · Answer 3 · 2008-11-08T19:28:47+0000

The answer depends on the nature of the work that your application will do. Different types of work have different possible solutions for parallelization. For some types, there is no feasible way to parallelize at all.

The simplest scenario I can come up with is an application whose work can easily be broken down into discrete units. If so, you simply build the application to process one unit of work. Give it the ability to accept new tasks and deliver finished ones. Then create a task scheduler on top of it. This scheduler can be part of the same application (configure one computer as the scheduler and the rest as clients) or a separate application.
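A minimal in-process sketch of that scheduler/client split, under loud assumptions: the shared queue stands in for whatever network transport the real machines would use, each task stands in for one client machine, and `CountWords` is a hypothetical unit of work. None of these names come from the original answer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class SchedulerSketch
{
    // Hypothetical stand-in for the real per-sentence processing.
    public static int CountWords(string sentence) =>
        sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;

    // Runs `workerCount` workers that accept jobs from a shared queue and deliver results.
    public static Dictionary<int, int> Run(IReadOnlyList<string> sentences, int workerCount)
    {
        var jobs = new BlockingCollection<KeyValuePair<int, string>>();
        var results = new ConcurrentDictionary<int, int>();

        // Workers: take a job, process it, deliver the finished result.
        Task[] workers = Enumerable.Range(0, workerCount)
            .Select(_ => Task.Run(() =>
            {
                foreach (var job in jobs.GetConsumingEnumerable())
                    results[job.Key] = CountWords(job.Value);
            }))
            .ToArray();

        // Scheduler: hand out one job per sentence, then signal that no more work is coming.
        for (int i = 0; i < sentences.Count; i++)
            jobs.Add(new KeyValuePair<int, string>(i, sentences[i]));
        jobs.CompleteAdding();

        Task.WaitAll(workers);
        return new Dictionary<int, int>(results);
    }

    public static void Main()
    {
        var sentences = new[] { "The quick brown fox", "jumps over", "the lazy dog" };
        foreach (var pair in Run(sentences, 2).OrderBy(p => p.Key))
            Console.WriteLine($"sentence {pair.Key}: {pair.Value} words");
    }
}
```

Because workers pull jobs rather than being assigned fixed shares, a slow machine simply takes fewer jobs, which is one reason to prefer a scheduler over a static split.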

There are other questions to consider: how will the machines communicate (files? network connections?); should the application be able to report, or be queried about, the percentage of work completed; do you need to be able to make the application stop its current work; and so on.

If you need a more detailed answer, edit your question and include information about the application, the problem it solves, the expected number of tasks, etc. Then the community can come up with more specific answers.

Mauricio Scheffer · Answer 4 · 2008-12-08T14:17:08+0000

Dryad (Microsoft's variant of MapReduce) solves exactly this problem (parallelizing .NET programs across several computers). It is in the research phase right now. It is a pity that there is no CTP yet :-(

CodeForNothing · Answer 5 · 2008-11-08T19:28:11+0000

You need to run the application on a distributed system. Google for "distributed computing windows" or "C# grid computing".

Adam Liss · Answer 6 · 2008-11-08T19:48:37+0000

Is each sentence processed independently, or are they combined in some way? If your processing handles one sentence at a time, you don't have to change your code at all. Just run the same code on each of your machines and divide the data (your list of sentences) between them. You can do this either by placing a portion of the data on each computer, or by sharing a database and assigning a separate block to each machine.
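One simple way to assign a separate block to each machine without any coordination is to give every machine its own index and let it pick its share. A minimal sketch, assuming each machine knows a hypothetical `machineIndex` and the total `machineCount` (for example from a command-line argument or a config file):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SliceSketch
{
    // Returns the sentences that the machine numbered `machineIndex` (0-based) should process.
    public static List<string> SliceFor(IEnumerable<string> sentences, int machineIndex, int machineCount)
    {
        if (machineCount < 1)
            throw new ArgumentOutOfRangeException(nameof(machineCount));
        if (machineIndex < 0 || machineIndex >= machineCount)
            throw new ArgumentOutOfRangeException(nameof(machineIndex));

        // Machine k takes sentences k, k + machineCount, k + 2 * machineCount, ...
        return sentences.Where((s, i) => i % machineCount == machineIndex).ToList();
    }

    public static void Main()
    {
        var sentences = new[] { "s0", "s1", "s2", "s3", "s4", "s5", "s6" };
        for (int machine = 0; machine < 3; machine++)
            Console.WriteLine($"machine {machine}: {string.Join(" ", SliceFor(sentences, machine, 3))}");
    }
}
```

Every sentence lands on exactly one machine as long as all machines see the same list in the same order.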

If you want to modify your code a bit to facilitate parallelism, share the entire database and have the code "check off" each sentence as it is processed, and then find the next unmarked sentence to process. This will give you a gentle introduction to the concept of thread safety: techniques that ensure that one processor does not interfere with another.
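The important detail is that finding an unmarked sentence and marking it must happen as one atomic step, otherwise two workers can grab the same sentence. A minimal in-memory sketch of that claim step, assuming an array of flags in place of the shared database (with a real database the claim would have to be made atomic by the database itself). `SentenceBoard` and `ClaimNext` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public sealed class SentenceBoard
{
    private readonly int[] _claimed; // 0 = unmarked, 1 = claimed by some worker

    public SentenceBoard(int count) { _claimed = new int[count]; }

    // Returns the index of a sentence the caller now owns, or -1 if none are left.
    public int ClaimNext()
    {
        for (int i = 0; i < _claimed.Length; i++)
        {
            // Atomically flip 0 -> 1; only one caller can observe the old value 0.
            if (Interlocked.CompareExchange(ref _claimed[i], 1, 0) == 0)
                return i;
        }
        return -1;
    }
}

public static class ClaimSketch
{
    // Lets several workers race for sentences and reports how often each one was processed.
    public static int[] ProcessedCounts(int sentenceCount, int workerCount)
    {
        var board = new SentenceBoard(sentenceCount);
        var timesProcessed = new int[sentenceCount];
        Task[] workers = Enumerable.Range(0, workerCount)
            .Select(_ => Task.Run(() =>
            {
                int i;
                while ((i = board.ClaimNext()) >= 0)
                    Interlocked.Increment(ref timesProcessed[i]); // stand-in for real work
            }))
            .ToArray();
        Task.WaitAll(workers);
        return timesProcessed;
    }

    public static void Main()
    {
        int[] counts = ClaimSketch.ProcessedCounts(1000, 8);
        Console.WriteLine(counts.All(c => c == 1)
            ? "every sentence was processed exactly once"
            : "some sentence was skipped or processed twice");
    }
}
```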

As always, the more details you can provide about your particular application, the better the SO community can tailor our answers to your goal.

Good luck - it sounds like an interesting project!

tvanfosson · Answer 7 · 2008-11-08T20:51:21+0000

Before investing in parallelizing your program, why not just try breaking the data set into pieces, manually running your program on each computer and combining the results by hand? If this works, try automating it with scripts and writing a program to combine the outputs.
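The final step could be as small as the following sketch, assuming each machine writes its results to a text file and all those files have been copied into one folder. The folder name, the `*.out` pattern and the output file name are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;

public static class MergeSketch
{
    // Concatenates all files in `inputDir` matching `pattern` into `outputFile`.
    // Returns the number of lines written.
    public static int Merge(string inputDir, string pattern, string outputFile)
    {
        var lines = Directory.GetFiles(inputDir, pattern)
            .OrderBy(path => path, StringComparer.Ordinal) // stable, predictable order
            .SelectMany(path => File.ReadLines(path))
            .ToList();                                     // read everything before writing
        File.WriteAllLines(outputFile, lines);
        return lines.Count;
    }

    public static void Main()
    {
        // Hypothetical layout: results/machine1.out, results/machine2.out, ...
        if (Directory.Exists("results"))
            Console.WriteLine(Merge("results", "*.out", "combined.txt") + " lines written");
        else
            Console.WriteLine("no results folder found");
    }
}
```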

Brett McCann · Answer 8 · 2008-12-10T21:06:16+0000

There are several software solutions that let you use commodity hardware for this. One of them is Appistry. I work at Appistry, and we have built many solutions that run C# applications on hundreds of machines.

Some useful links: http://www.appistry.com/resource-library/index.html

You can download the product for free here: http://www.appistry.com/developers/

Hope this helps - Brett

Paul Morrison · Answer 9 · 2009-02-22T13:29:36+0000

Perhaps you should take a look at Flow-Based Programming (FBP): it has implementations in Java and C#. Most approaches to this problem involve taking a regular single-threaded program and working out which parts can run in parallel. FBP takes a different approach: the application is designed from the very beginning in terms of several black-box components that work asynchronously (think of a production assembly line). Since a typical single-threaded program acts as a single component in an FBP environment, it is very simple to extend an existing application. In fact, fragments of an existing application can often be broken off and turned into separate components, provided that they can be executed asynchronously with the rest of the application (i.e. not as subroutines). Someone called this “turning an iceberg into ice cubes”.
