Spark throws a long, repeating timeout error when I increase the size of my data

I use a Cassandra database to host some data, and I wrote a script that pulls a certain amount of it into a pandas DataFrame. I then convert certain columns of this DataFrame into a Spark RDD so that I can extract the information I need from them.
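For reference, here is a minimal sketch of what my script does (the host, keyspace, table, and column names below are simplified placeholders, not my real ones):

    # Minimal sketch of my workflow: Cassandra -> pandas -> Spark RDD.
    # Names like "my_keyspace", "my_table", "value" are placeholders.
    from cassandra.cluster import Cluster
    import pandas as pd
    from pyspark import SparkContext

    sc = SparkContext(appName="cassandra-analysis")

    # Pull rows from Cassandra into a pandas DataFrame.
    cluster = Cluster(["127.0.0.1"])          # my Cassandra host
    session = cluster.connect("my_keyspace")
    rows = session.execute("SELECT id, value FROM my_table LIMIT 100000")
    df = pd.DataFrame(list(rows), columns=["id", "value"])

    # Convert one column into a Spark RDD to analyze it.
    rdd = sc.parallelize(df["value"].tolist())
    print(rdd.stats())  # e.g. count / mean / stdev of the column

The error below starts appearing once the amount of data loaded this way grows past a certain point.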

When I feed it more than a certain amount of data, the job does not crash, but it repeatedly prints this error. I am new to Python, Spark, and programming in general, so I really can't say much more about it.

Here is the error:

    Caused by: akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/HeartbeatReceiver#-1176983551]] had already been terminated.
        at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:132)
        at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.ask(AkkaRpcEnv.scala:299)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
        ... 14 more
    15/07/28 11:39:53 WARN AkkaRpcEndpointRef: Error sending message [message = Heartbeat(driver,[Lscala.Tuple2;@544f81f3,BlockManagerId(driver, localhost, 41320))] in 1 attempts
    akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/HeartbeatReceiver#-1176983551]] had already been terminated.
        at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:132)
        at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.ask(AkkaRpcEnv.scala:299)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
        at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:444)
        at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:464)
        at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:464)
        at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:464)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
        at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:464)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

The code runs fine up through the query that fetches the data from Cassandra, so I think the problem is on the Spark side, but I'm not sure.
