Failed to run Pi example for Apache Spark from Python

I set up my first Spark cluster (1 master, 2 workers) and an IPython notebook server that I use to access the cluster. I am running Anaconda on every box to make sure the Python configuration is the same on each node. Everything seems to be set up correctly: from the notebook I can initialize Spark and get a context fine. However, when I run the job it fails, and I am not sure how to troubleshoot. Here is the code:

from pyspark import SparkContext
from numpy import random
CLUSTER_URL = 'spark://192.168.1.20:7077'
sc = SparkContext( CLUSTER_URL, 'pyspark')
def sample(p):
    from numpy import random
    x, y = random(), random()
    return 1 if x*x + y*y < 1 else 0

count = sc.parallelize(xrange(0, 20)).map(sample).reduce(lambda a, b: a + b)
print "Pi is roughly %f" % (4.0 * count / 20)

And here is the error:

Py4JJavaError                             Traceback (most recent call last)
<ipython-input> in <module>()
      3     return 1 if x*x + y*y < 1 else 0
      4
----> 5 count = sc.parallelize(xrange(0, 20)).map(sample).reduce(lambda a, b: a + b)
      6 print "Pi is roughly %f" % (4.0 * count / 20)

/opt/spark-1.2.0/python/pyspark/rdd.pyc in reduce(self, f)
    713             yield reduce(f, iterator, initial)
    714
--> 715         vals = self.mapPartitions(func).collect()
    716         if vals:
    717             return reduce(f, vals)

/opt/spark-1.2.0/python/pyspark/rdd.pyc in collect(self)
    674         """
    675         with SCCallSiteSync(self.context) as css:
--> 676             bytesInJava = self._jrdd.collect().iterator()
    677         return list(self._collect_iterator_through_file(bytesInJava))
    678

/opt/spark-1.2.0/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    536         answer = self.gateway_client.send_command(command)
    537         return_value = get_return_value(answer, self.gateway_client,
--> 538                 self.target_id, self.name)
    539
    540         for temp_arg in temp_args:

/opt/spark-1.2.0/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    298                 raise Py4JJavaError(
    299                     'An error occurred while calling {0}{1}{2}.\n'.
--> 300                     format(target_id, '.', name), value)
    301             else:
    302                 raise Py4JError(

Py4JJavaError: An error occurred while calling o28.collect.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 31 in stage 0.0 failed 4 times, most recent failure: Lost task 31.3 in stage 0.0 (TID 72, 192.168.1.21): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/spark-1.2.0/python/pyspark/worker.py", line 107, in main
    process()
  File "/opt/spark-1.2.0/python/pyspark/worker.py", line 98, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/opt/spark-1.2.0/python/pyspark/serializers.py", line 227, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/opt/spark-1.2.0/python/pyspark/rdd.py", line 710, in func
    initial = next(iterator)
  File "<ipython-input>", line 2, in sample
TypeError: 'module' object is not callable

	at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:137)
	at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:174)
	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:96)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:56)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
	at akka.dispatch.Mailbox.run(Mailbox.scala:220)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Any ideas on what is going wrong here, or how to troubleshoot it, would be appreciated.

numpy.random is a Python module, not the random() function, so calling it directly raises TypeError: 'module' object is not callable on the workers.
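
You can reproduce this in a plain Python 2 shell, with no Spark involved at all (a quick check, output abbreviated):

>>> from numpy import random
>>> type(random)
<type 'module'>
>>> random()          # exactly what sample() does, hence the worker error
Traceback (most recent call last):
  ...
TypeError: 'module' object is not callable
>>> random.random()   # the callable lives inside the module; returns a float in [0, 1)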

Change the call to random.random() and the job will run.
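
For reference, here is the sample function from the question with only that one line changed (everything else, including the unused p argument that map passes in, stays as in the question):

def sample(p):
    # import inside the function so it also happens on the worker processes
    from numpy import random
    # random.random() is the module-level function; bare random is the module
    x, y = random.random(), random.random()
    return 1 if x*x + y*y < 1 else 0

With that change, the reduce over sc.parallelize(xrange(0, 20)) completes and the Pi estimate prints (it will be rough with only 20 samples).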
