I am new to cluster computing and I am trying to set up a minimal two-node Spark cluster. What I am still a bit confused about: do I need to install a full Hadoop distribution first, or is the version of Hadoop that ships inside Spark enough?
What I find in the Spark documentation doesn't really make this clear. I understand that Spark is meant as an extension of Hadoop rather than a replacement, but whether it requires an independently running Hadoop installation is not clear to me.
If I do need HDFS, is it enough to just use the file system part of Hadoop?
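For context, here is roughly what I am hoping to run once the two nodes are up. This is only a minimal sketch assuming Spark's standalone cluster manager and the Hadoop client libraries bundled with the pre-built Spark download; the master URL and file path are placeholders for my setup:

```scala
import org.apache.spark.sql.SparkSession

object LocalFileTest {
  def main(args: Array[String]): Unit = {
    // Connect to the standalone master; "master-node" stands in for my first node.
    val spark = SparkSession.builder()
      .appName("LocalFileTest")
      .master("spark://master-node:7077")
      .getOrCreate()

    // Read from the ordinary local file system via a file:// URI instead of HDFS.
    // The path is hypothetical and would need to exist on every worker node.
    val lines = spark.read.textFile("file:///tmp/sample.txt")
    println(s"line count: ${lines.count()}")

    spark.stop()
  }
}
```

If something like this works without a separate Hadoop/HDFS installation, that would already answer most of my question.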
Can anyone point out what is probably an obvious answer for me?