Hadoop or Hadoop Streaming for MapReduce on AWS

I am about to start a MapReduce project that will run on AWS, and I have been given the choice of using Java or C++.

I understand that writing the project in Java would make more functionality available to me; however, C++ could also do the job through Hadoop Streaming.

Mind you, I have little background in either language. A similar project was done in C++, and the code is available to me.

So my question is: is that additional functionality available through AWS, or is it only relevant if you have more control over the cloud? Is there anything else I should keep in mind when making the decision, for example, whether there are Hadoop plugins that work better with one language or the other?

Thank you in advance

+5
3 answers

You have several options for running Hadoop on AWS. The easiest way is to run your MapReduce jobs through the Elastic MapReduce service: http://aws.amazon.com/elasticmapreduce . You can also start a Hadoop cluster on EC2, as described in http://archive.cloudera.com/docs/ec2.html .

, /, , Java . , Hadoop - , , , EMR.

, !

: Cloudera.

,

+6

, Java , ++ Java.

.

+1

. /? ? ? ? ?

What I mean is that if you only need the basics of Hadoop, then streaming will be great. But if you need something a little more complicated (from the Hadoop framework, not from your own business logic), a Hadoop jar, that is, a job written in Java, will be more flexible.

Sagie

0