Submit & Kill a Spark Application programmatically from another application

I am wondering if it is possible to submit, monitor, and kill Spark applications from another service.

My requirements are as follows:

I wrote a service that:

  • analyzes user commands
  • translates them into arguments understandable to an already prepared Spark-SQL application
  • submits the application with those arguments to the Spark cluster using spark-submit launched from ProcessBuilder (see the sketch after this list)
  • plans to launch the drivers of the generated applications in cluster mode
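
Roughly, the command the service assembles and hands to ProcessBuilder would look like the sketch below (the class name, jar location, and master URL are placeholders, not real values):

 # sketch only: placeholder class, jar, and master URL; arguments come from the translated user command
 spark-submit \
   --master spark://<master-host>:7077 \
   --deploy-mode cluster \
   --class com.example.SparkSqlApp \
   hdfs:///apps/spark-sql-app.jar \
   "<translated user arguments>"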

Other needs:

  • Query the status of submitted applications, for example the percentage of work remaining
  • Kill applications accordingly

What I found in the standalone-mode documentation suggests killing the application with:

 ./bin/spark-class org.apache.spark.deploy.Client kill <master url> <driver ID> 

and that I should find the driver ID through the standalone Master web UI at http://<master url>:8080.

So what should I do?

Related SO questions:
Spark Callback Completion
Deploy Apache Spark from another Java application, best practices

+5
6 answers

You can use a shell script for this.

Deployment script:

 #!/bin/bash
 spark-submit --class "xx.xx.xx" \
   --deploy-mode cluster \
   --supervise \
   --executor-memory 6G \
   hdfs:///spark-stat.jar > output 2>&1
 cat output

and you will get the result as follows:

 16/06/23 08:37:21 INFO rest.RestSubmissionClient: Submitting a request to launch an application in spark://node-1:6066.
 16/06/23 08:37:22 INFO rest.RestSubmissionClient: Submission successfully created as driver-20160623083722-0026. Polling submission state...
 16/06/23 08:37:22 INFO rest.RestSubmissionClient: Submitting a request for the status of submission driver-20160623083722-0026 in spark://node-1:6066.
 16/06/23 08:37:22 INFO rest.RestSubmissionClient: State of driver driver-20160623083722-0026 is now RUNNING.
 16/06/23 08:37:22 INFO rest.RestSubmissionClient: Driver is running on worker worker-20160621162532-192.168.1.200-7078 at 192.168.1.200:7078.
 16/06/23 08:37:22 INFO rest.RestSubmissionClient: Server responded with CreateSubmissionResponse:
 {
   "action" : "CreateSubmissionResponse",
   "message" : "Driver successfully submitted as driver-20160623083722-0026",
   "serverSparkVersion" : "1.6.0",
   "submissionId" : "driver-20160623083722-0026",
   "success" : true
 }

Based on this output, create your driver kill script:

 #!/bin/bash
 driverid=`cat output | grep submissionId | grep -Po 'driver-\d+-\d+'`
 spark-submit --master spark://node-1:6066 --kill $driverid
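
To also poll the job, a similar script can query the driver state. This is only a sketch: it assumes spark-submit's --status option (available for standalone cluster mode) and reuses the output file written by the deployment script above. Note that it reports the driver state (RUNNING, FINISHED, FAILED, ...), not a completion percentage.

 #!/bin/bash
 # sketch: reuses the "output" file from the deployment script above
 driverid=`cat output | grep submissionId | grep -Po 'driver-\d+-\d+'`
 spark-submit --master spark://node-1:6066 --status $driverid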

Make sure the scripts are given execute permission with chmod +x.

+3

A dirty trick to kill Spark applications is to kill the JVM process that jps lists as SparkSubmit. The main problem is that the application will be "killed", but in the Spark master's log it will appear as "finished"...

 user@user:~$ jps
 20894 Jps
 20704 SparkSubmit
 user@user:~$ kill 20704

Honestly, I do not like this solution, but so far it is the only way I have found to kill the application.

Hope this helps.

+2

This is what I do (a curl sketch follows the list):

  • To submit applications, use the (hidden) Spark REST API: http://arturmkrtchyan.com/apache-spark-hidden-rest-api

    • This way you get the driver ID (in the submissionId field), which you can use to kill your job later (you should not kill the application itself, especially if you use --supervise in standalone mode).
    • This API also lets you query the driver status.
  • Query application status using the (also hidden) master UI JSON API: http://[master-node]:[master-ui-port]/json/

    • This endpoint provides all the information shown in the master UI, in JSON format.
  • You can also use the "public" REST API to query applications on the master or executors on each worker, but it will not expose drivers (at least not as of Spark 1.6).
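
A rough curl sketch of these calls, assuming the default standalone REST port 6066 and master UI port 8080 (host name and driver ID are placeholders):

 # status of a driver that was submitted through the REST API
 curl http://<master-host>:6066/v1/submissions/status/<driver-id>

 # kill that driver
 curl -X POST http://<master-host>:6066/v1/submissions/kill/<driver-id>

 # dump the master state (workers, applications, drivers) as JSON
 curl http://<master-host>:8080/json/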

+2

You can run the yarn commands from ProcessBuilder to list applications, filter by your application name (which you already know), extract the application ID, and then use the yarn commands to poll its status, kill it, etc. For example:
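
A sketch of those commands, assuming a YARN deployment and a hypothetical application name "MySparkSqlApp":

 # find the application ID by name, then poll or kill it
 appid=$(yarn application -list | grep "MySparkSqlApp" | awk '{print $1}')
 yarn application -status $appid
 yarn application -kill $appid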

0

 kill -9 $(jps | grep SparkSubmit | grep -Eo '[0-9]{1,7}')

0

The driver ID can be found under [spark]/work/; the ID is the name of the directory. Kill the job with spark-submit. For example:
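
A sketch of that approach, assuming it runs on the worker node hosting the driver and that SPARK_HOME points at the [spark] directory (the master URL is a placeholder):

 # the newest driver-* directory under the work dir is the driver ID
 ls $SPARK_HOME/work/
 driverid=$(ls -t $SPARK_HOME/work/ | grep -m1 '^driver-')
 spark-submit --master spark://<master-host>:6066 --kill $driverid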

0
