Apache Spark on EC2 "Killed"

My program, which I have run many times on different clusters, suddenly stops. Log:

15/04/20 19:19:59 INFO scheduler.TaskSetManager: Finished task 12.0 in stage 15.0 (TID 374) in 61 ms on ip-XXX.compute.internal (16/24)
15/04/20 19:19:59 INFO storage.BlockManagerInfo: Added rdd_44_14 in memory on ip-XXX.compute.internal:37999 (size: 16.0 B, free: 260.6 MB)
Killed

What does "Killed" mean and why is this happening? There are no other errors.

3 answers

"Killed" usually means that the OS terminated the process by sending a SIGKILL signal. This is a non-blocking signal that immediately terminates the process. It is often used as a killer of an OOM (out-of-memory) process - if the OS decides that memory resources become dangerously low, it can choose a process to kill to try to free some memory.

Without additional information it is impossible to tell whether your process was killed because of memory problems or for some other reason. Information that would help diagnose what is happening includes: how long did the process run before it died? Can you enable and provide more verbose debug output from the process? Is the termination associated with any particular communication or processing pattern?
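To get that more verbose output, a minimal sketch (assuming Spark 1.x with log4j 1.x on the classpath, as in the log above) is to raise the log level programmatically on the driver; editing conf/log4j.properties achieves the same thing. The logger names below are illustrative:

  import org.apache.log4j.{Level, Logger}

  // Turn up Spark's logging so there is more context around the moment
  // the process is killed.
  Logger.getLogger("org.apache.spark").setLevel(Level.DEBUG)
  Logger.getLogger("org.apache.spark.storage.BlockManagerInfo").setLevel(Level.TRACE)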


Try setting yarn.nodemanager.vmem-check-enabled to false in your Spark configuration, something like this:

  import org.apache.spark.{SparkConf, SparkContext}

  // Setting suggested by the linked thread: disable the NodeManager's
  // virtual-memory check for this application.
  val conf = new SparkConf()
    .setAppName("YourProgramName")
    .set("yarn.nodemanager.vmem-check-enabled", "false")
  val sc = new SparkContext(conf)

http://apache-spark-user-list.1001560.n3.nabble.com/How-to-avoid-being-killed-by-YARN-node-manager-td22199.html
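A commonly suggested alternative (not spelled out in this answer; the value below is illustrative only) is to leave the check enabled and instead give each executor container more headroom via spark.yarn.executor.memoryOverhead, which is specified in megabytes in Spark 1.x:

  import org.apache.spark.{SparkConf, SparkContext}

  // Illustrative sketch: request extra per-container memory overhead so the
  // executor stays inside YARN's memory limits.
  val conf = new SparkConf()
    .setAppName("YourProgramName")
    .set("spark.yarn.executor.memoryOverhead", "1024") // MB
  val sc = new SparkContext(conf)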


Maybe a virtual memory problem:

  • Make sure you have a swap partition.
  • Ensure vm.swappiness is not zero.
