Parallelization / Cluster Options for Code Execution

I come from a Java background and have a CPU-bound problem that I am trying to parallelize to improve performance. I have broken my code into modular units of work so that it can (hopefully) be distributed and run in parallel.

    @Transactional(readOnly = false, propagation = Propagation.REQUIRES_NEW)
    public void runMyJob(List<String> someParams) {
        doComplexEnoughStuffAndWriteToMysqlDB();
    }

I have been thinking about the following options for parallelizing this problem, and I would appreciate input from people who have worked in this area.

The options I am currently considering:

1) Use a Java EE cluster (e.g. JBoss) with MessageDrivenBeans. The MDBs sit on the worker nodes of the cluster, and each MDB can pick up an event that kicks off a unit of work like the one above. AFAIK Java EE MDBs are multi-threaded by the application server, so this should hopefully also make use of multi-core machines. It ought to be scalable both vertically and horizontally. A rough sketch of such an MDB follows.
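Here is roughly what I imagine such an MDB would look like, assuming a JMS queue named jobQueue and a comma-separated parameter payload (both are assumptions of mine, not given above):

    // Sketch only: the queue name and the payload format are
    // assumptions for illustration.
    import javax.ejb.ActivationConfigProperty;
    import javax.ejb.MessageDriven;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.jms.TextMessage;

    @MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType",
                                  propertyValue = "javax.jms.Queue"),
        @ActivationConfigProperty(propertyName = "destination",
                                  propertyValue = "queue/jobQueue")
    })
    public class RunMyJobMdb implements MessageListener {

        @Override
        public void onMessage(Message message) {
            try {
                // One message = one unit of work; the container pools MDB
                // instances, so deliveries are processed concurrently on
                // each node and across the cluster.
                String payload = ((TextMessage) message).getText();
                // runMyJob(Arrays.asList(payload.split(",")));
            } catch (JMSException e) {
                throw new RuntimeException(e);
            }
        }
    }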

2) I could use something like Hadoop and MapReduce. My concern here is that my processing logic is actually quite high level, so I'm not sure how well it would translate into a map/reduce structure. Also, I am a total newbie to MR (but see the sketch below).
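For what it's worth, since each unit of work is independent and the results go to MySQL anyway, I imagine this might not even need a reduce phase: a map-only job (zero reducers) could fan the work out across the cluster. A rough sketch, with all class and path names invented for illustration:

    // Map-only sketch: one input line = one parameter list.
    // All names here are illustrative.
    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class RunMyJobMapper
            extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // runMyJob(Arrays.asList(value.toString().split(",")));
            // No output is emitted; configure the job with
            // job.setNumReduceTasks(0) so no reduce phase runs.
        }
    }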

3) I could look at something like Scala, which I understand makes concurrent programming much simpler. However, while it would scale vertically, it is not on its own a clustered / horizontally scalable solution.

In any case, I hope all of this makes sense, and thank you for any help provided.

java-ee scala architecture mapreduce cluster-computing
2 answers

The solution you are looking for is Akka, as sketched below. Clustering is a feature being developed and is included in Akka 2.1.

  • Great Scala and Java APIs, extremely complete
  • Purely message-oriented model with no shared state
  • Fault tolerance and scalability
  • Extremely easy to distribute tasks
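A minimal sketch of this model using the Akka 2.1-era Java API (the actor, router, and message names are illustrative, and the clustering/remoting configuration is omitted):

    // Sketch only: one String message = one unit of work, no shared state.
    import akka.actor.ActorRef;
    import akka.actor.ActorSystem;
    import akka.actor.Props;
    import akka.actor.UntypedActor;
    import akka.routing.RoundRobinRouter;

    public class JobActor extends UntypedActor {

        @Override
        public void onReceive(Object message) {
            if (message instanceof String) {
                // runMyJob(Arrays.asList(((String) message).split(",")));
            } else {
                unhandled(message);
            }
        }

        public static void main(String[] args) {
            ActorSystem system = ActorSystem.create("jobs");
            // A round-robin router spreads work over 8 local actor
            // instances; with the clustering/remoting modules the same
            // model spreads work over remote nodes instead.
            ActorRef router = system.actorOf(
                new Props(JobActor.class).withRouter(new RoundRobinRouter(8)),
                "jobRouter");
            router.tell("param1,param2", null);
        }
    }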

Please move away from J2EE if you still have the chance. You can join the Akka mailing list to ask your questions.


You should look at Spark. It is a cluster computing framework written in Scala that aims to be a viable alternative to Hadoop. It has a number of nice features:

  • In-memory computation: you can control the degree of caching
  • Hadoop I/O interoperability: Spark can read/write data from all Hadoop input sources such as HDFS, EC2, etc.
  • The concept of "Resilient Distributed Datasets" (RDDs), which lets you execute most MR-style workloads in parallel on a cluster just as you would locally
  • Primary API = Scala; optional Python and Java APIs
  • It uses Akka :)

If I understand your question correctly, Spark would combine your options 2) and 3). A minimal sketch follows.
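To make that concrete, here is a minimal sketch using Spark's Java API, where the master URL and input path are placeholders of my choosing:

    // Sketch only: reads one parameter list per line and processes each
    // in parallel. Paths and the master URL are placeholders.
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.VoidFunction;

    public class SparkRunMyJob {
        public static void main(String[] args) {
            // "local[4]" = 4 worker threads on this machine; point it at
            // a cluster master URL to scale the same code horizontally.
            JavaSparkContext sc = new JavaSparkContext("local[4]", "runMyJob");

            JavaRDD<String> params = sc.textFile("hdfs:///jobs/params.txt");

            params.foreach(new VoidFunction<String>() {
                @Override
                public void call(String line) {
                    // runMyJob(Arrays.asList(line.split(",")));
                }
            });

            sc.stop();
        }
    }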

