Parallelization / Cluster Options for Code Execution

I come from a Java background and have a CPU-bound problem that I am trying to parallelize to improve performance. I have broken my code into modular units of work so that it can (hopefully) be distributed and run in parallel.

    @Transactional(readOnly = false, propagation = Propagation.REQUIRES_NEW)
    public void runMyJob(List<String> someParams) {
        doComplexEnoughStuffAndWriteToMysqlDB();
    }

I have been thinking about the following options for parallelizing this problem, and I would appreciate input from people who have worked in this area.

The options I am currently considering:

1) Use a Java EE cluster (e.g. JBoss) with MessageDrivenBeans. The MDBs sit on the worker nodes of the cluster, and each MDB can pick up an event that kicks off a unit of work like the one above. AFAIK Java EE MDBs are multi-threaded by the application server, so this should hopefully also make use of multi-core machines. It ought to be scalable both vertically and horizontally. A rough sketch of such an MDB follows.
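Here is roughly what I imagine such an MDB would look like, assuming a JMS queue named jobQueue and a comma-separated parameter payload (both are assumptions of mine, not given above):

    // Sketch only: the queue name and the payload format are
    // assumptions for illustration.
    import javax.ejb.ActivationConfigProperty;
    import javax.ejb.MessageDriven;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.jms.TextMessage;

    @MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType",
                                  propertyValue = "javax.jms.Queue"),
        @ActivationConfigProperty(propertyName = "destination",
                                  propertyValue = "queue/jobQueue")
    })
    public class RunMyJobMdb implements MessageListener {

        @Override
        public void onMessage(Message message) {
            try {
                // One message = one unit of work; the container pools MDB
                // instances, so deliveries are processed concurrently on
                // each node and across the cluster.
                String payload = ((TextMessage) message).getText();
                // runMyJob(Arrays.asList(payload.split(",")));
            } catch (JMSException e) {
                throw new RuntimeException(e);
            }
        }
    }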

2) I could use something like Hadoop and MapReduce. My concern here is that my processing logic is actually quite high level, so I'm not sure how well it would translate into a map/reduce structure. Also, I am a total newbie to MR (but see the sketch below).
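For what it's worth, since each unit of work is independent and the results go to MySQL anyway, I imagine this might not even need a reduce phase: a map-only job (zero reducers) could fan the work out across the cluster. A rough sketch, with all class and path names invented for illustration:

    // Map-only sketch: one input line = one parameter list.
    // All names here are illustrative.
    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class RunMyJobMapper
            extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // runMyJob(Arrays.asList(value.toString().split(",")));
            // No output is emitted; configure the job with
            // job.setNumReduceTasks(0) so no reduce phase runs.
        }
    }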

3) I could look at something like Scala, which I understand makes concurrent programming much simpler. However, while it would scale vertically, it is not on its own a clustered / horizontally scalable solution.

In any case, I hope all of this makes sense, and thank you for any help provided.

java-ee scala architecture mapreduce cluster-computing
2 answers

The solution you are looking for is Akka, as sketched below. Clustering is a feature being developed and is included in Akka 2.1.

  • Great Scala and Java APIs, extremely complete
  • Purely message-oriented model with no shared state
  • Fault tolerance and scalability
  • Extremely easy to distribute tasks
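A minimal sketch of this model using the Akka 2.1-era Java API (the actor, router, and message names are illustrative, and the clustering/remoting configuration is omitted):

    // Sketch only: one String message = one unit of work, no shared state.
    import akka.actor.ActorRef;
    import akka.actor.ActorSystem;
    import akka.actor.Props;
    import akka.actor.UntypedActor;
    import akka.routing.RoundRobinRouter;

    public class JobActor extends UntypedActor {

        @Override
        public void onReceive(Object message) {
            if (message instanceof String) {
                // runMyJob(Arrays.asList(((String) message).split(",")));
            } else {
                unhandled(message);
            }
        }

        public static void main(String[] args) {
            ActorSystem system = ActorSystem.create("jobs");
            // A round-robin router spreads work over 8 local actor
            // instances; with the clustering/remoting modules the same
            // model spreads work over remote nodes instead.
            ActorRef router = system.actorOf(
                new Props(JobActor.class).withRouter(new RoundRobinRouter(8)),
                "jobRouter");
            router.tell("param1,param2", null);
        }
    }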

Please move away from J2EE if you still have the chance. You can join the Akka mailing list to ask your questions.


You should look at Spark. It is a cluster computing framework written in Scala that aims to be a viable alternative to Hadoop. It has a number of nice features:

  • In-memory computation: you can control the degree of caching
  • Hadoop I/O interoperability: Spark can read/write data from all Hadoop input sources such as HDFS, EC2, etc.
  • The concept of "Resilient Distributed Datasets" (RDDs), which lets you execute most MR-style workloads in parallel on a cluster just as you would locally
  • Primary API = Scala; optional Python and Java APIs
  • It uses Akka :)

If I understand your question correctly, Spark would combine your options 2) and 3). A minimal sketch follows.
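To make that concrete, here is a minimal sketch using Spark's Java API, where the master URL and input path are placeholders of my choosing:

    // Sketch only: reads one parameter list per line and processes each
    // in parallel. Paths and the master URL are placeholders.
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.VoidFunction;

    public class SparkRunMyJob {
        public static void main(String[] args) {
            // "local[4]" = 4 worker threads on this machine; point it at
            // a cluster master URL to scale the same code horizontally.
            JavaSparkContext sc = new JavaSparkContext("local[4]", "runMyJob");

            JavaRDD<String> params = sc.textFile("hdfs:///jobs/params.txt");

            params.foreach(new VoidFunction<String>() {
                @Override
                public void call(String line) {
                    // runMyJob(Arrays.asList(line.split(",")));
                }
            });

            sc.stop();
        }
    }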

