We have a streaming application running on YARN, and we would like to ensure that it runs 24/7.
Is there a way to tell YARN to automatically restart a specific application when it fails?
Have you tried Hadoop Yarn - Restarting ResourceManager?
YARN will restart the driver if it fails, up to the number of attempts set by the configuration property "yarn.resourcemanager.am.max-attempts", which defaults to 2.
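As a rough sketch, raising that limit would look something like the fragment below in yarn-site.xml on the ResourceManager. The value 4 is only an illustration, not a recommendation. Note that this is the cluster-wide upper bound, so a per-application setting cannot exceed it.

```xml
<!-- yarn-site.xml: cluster-wide maximum number of application master attempts -->
<property>
  <name>yarn.resourcemanager.am.max-attempts</name>
  <!-- illustrative value; the default is 2 -->
  <value>4</value>
</property>
```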
You can set the maximum number of attempts for a specific application with ApplicationSubmissionContext::setMaxAppAttempts. Here is the documentation for this method