Submit & Kill a Spark Application programmatically from another application

I am wondering if it is possible to submit, monitor, and kill Spark applications from another service.

My requirements are as follows:

I wrote a service that:

  • analyzes user commands
  • translates them into arguments understandable to an already prepared Spark-SQL application
  • submits the application with those arguments to the Spark cluster using spark-submit launched from ProcessBuilder (see the sketch after this list)
  • plans to launch the drivers of the generated applications in cluster mode
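
Roughly, the command the service assembles and hands to ProcessBuilder would look like the sketch below (the class name, jar location, and master URL are placeholders, not real values):

 # sketch only: placeholder class, jar, and master URL; arguments come from the translated user command
 spark-submit \
   --master spark://<master-host>:7077 \
   --deploy-mode cluster \
   --class com.example.SparkSqlApp \
   hdfs:///apps/spark-sql-app.jar \
   "<translated user arguments>"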

Other needs:

  • Query the status of submitted applications, for example the percentage of work remaining
  • Kill applications accordingly

What I found in the standalone-mode documentation suggests killing the application with:

 ./bin/spark-class org.apache.spark.deploy.Client kill <master url> <driver ID> 

and that I should find the driver ID through the standalone Master web UI at http://<master url>:8080.

So what should I do?

Related SO questions:
Spark Callback Completion
Deploy Apache Spark from another Java application, best practices

+5
6 answers

You can use a shell script for this.

Deployment script:

 #!/bin/bash
 spark-submit --class "xx.xx.xx" \
   --deploy-mode cluster \
   --supervise \
   --executor-memory 6G \
   hdfs:///spark-stat.jar > output 2>&1
 cat output

and you will get the result as follows:

 16/06/23 08:37:21 INFO rest.RestSubmissionClient: Submitting a request to launch an application in spark://node-1:6066.
 16/06/23 08:37:22 INFO rest.RestSubmissionClient: Submission successfully created as driver-20160623083722-0026. Polling submission state...
 16/06/23 08:37:22 INFO rest.RestSubmissionClient: Submitting a request for the status of submission driver-20160623083722-0026 in spark://node-1:6066.
 16/06/23 08:37:22 INFO rest.RestSubmissionClient: State of driver driver-20160623083722-0026 is now RUNNING.
 16/06/23 08:37:22 INFO rest.RestSubmissionClient: Driver is running on worker worker-20160621162532-192.168.1.200-7078 at 192.168.1.200:7078.
 16/06/23 08:37:22 INFO rest.RestSubmissionClient: Server responded with CreateSubmissionResponse:
 {
   "action" : "CreateSubmissionResponse",
   "message" : "Driver successfully submitted as driver-20160623083722-0026",
   "serverSparkVersion" : "1.6.0",
   "submissionId" : "driver-20160623083722-0026",
   "success" : true
 }

Based on this output, create your driver kill script:

 #!/bin/bash
 driverid=`cat output | grep submissionId | grep -Po 'driver-\d+-\d+'`
 spark-submit --master spark://node-1:6066 --kill $driverid
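
To also poll the job, a similar script can query the driver state. This is only a sketch: it assumes spark-submit's --status option (available for standalone cluster mode) and reuses the output file written by the deployment script above. Note that it reports the driver state (RUNNING, FINISHED, FAILED, ...), not a completion percentage.

 #!/bin/bash
 # sketch: reuses the "output" file from the deployment script above
 driverid=`cat output | grep submissionId | grep -Po 'driver-\d+-\d+'`
 spark-submit --master spark://node-1:6066 --status $driverid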

Make sure the scripts are given execute permission with chmod +x.

+3

A dirty trick to kill Spark applications is to kill the JVM process that jps lists as SparkSubmit. The main problem is that the application will be "killed", but in the Spark master's log it will appear as "finished"...

 user@user:~$ jps
 20894 Jps
 20704 SparkSubmit
 user@user:~$ kill 20704

Honestly, I do not like this solution, but so far it is the only way I have found to kill the application.

Hope this helps.

+2

This is what I do (a curl sketch follows the list):

  • To submit applications, use the (hidden) Spark REST API: http://arturmkrtchyan.com/apache-spark-hidden-rest-api

    • This way you get the driver ID (in the submissionId field), which you can use to kill your job later (you should not kill the application itself, especially if you use --supervise in standalone mode).
    • This API also lets you query the driver status.
  • Query application status using the (also hidden) master UI JSON API: http://[master-node]:[master-ui-port]/json/

    • This endpoint provides all the information shown in the master UI, in JSON format.
  • You can also use the "public" REST API to query applications on the master or executors on each worker, but it will not expose drivers (at least not as of Spark 1.6).
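
A rough curl sketch of these calls, assuming the default standalone REST port 6066 and master UI port 8080 (host name and driver ID are placeholders):

 # status of a driver that was submitted through the REST API
 curl http://<master-host>:6066/v1/submissions/status/<driver-id>

 # kill that driver
 curl -X POST http://<master-host>:6066/v1/submissions/kill/<driver-id>

 # dump the master state (workers, applications, drivers) as JSON
 curl http://<master-host>:8080/json/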

+2

You can run the yarn commands from ProcessBuilder to list applications, filter by your application name (which you already know), extract the application ID, and then use the yarn commands to poll its status, kill it, etc. For example:
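
A sketch of those commands, assuming a YARN deployment and a hypothetical application name "MySparkSqlApp":

 # find the application ID by name, then poll or kill it
 appid=$(yarn application -list | grep "MySparkSqlApp" | awk '{print $1}')
 yarn application -status $appid
 yarn application -kill $appid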

0

 kill -9 $(jps | grep SparkSubmit | grep -Eo '[0-9]{1,7}')

0

The driver ID can be found under [spark]/work/; the ID is the name of the directory. Kill the job with spark-submit. For example:
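
A sketch of that approach, assuming it runs on the worker node hosting the driver and that SPARK_HOME points at the [spark] directory (the master URL is a placeholder):

 # the newest driver-* directory under the work dir is the driver ID
 ls $SPARK_HOME/work/
 driverid=$(ls -t $SPARK_HOME/work/ | grep -m1 '^driver-')
 spark-submit --master spark://<master-host>:6066 --kill $driverid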

0
