The project you are referring to may be related to this Jira ticket.
Currently, the JobControl class is pretty bare and lacks a few features that could make life easier for users. For example:
- Notifications when a job changes state: right now you call JobControl.run and that's it, but in practice it would be useful to get notified whenever something changes in my workflow.
- Retrying failed jobs: you could add a mechanism to resubmit a job if it fails. For example, ControlledJob could hold a maximum number of retries, and the job would be retried up to that limit before a failure notification is sent.
- Recurring jobs: many jobs run on a regular schedule (weekly, daily, hourly...). This is usually done through crontab, so it would be interesting to bring this into Hadoop. For example, users could define a recurring job by specifying a period, and JobControl would run it at those regular intervals.
- Perhaps a user interface to visualize your workflow and the dependencies between jobs, showing which steps have already completed and which have not.
- It would also be interesting to run not only MapReduce jobs but also Hive or Pig jobs, for example, so that users get a common interface through which they can submit any kind of job and track it easily.
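Taking the first two ideas together, here is a minimal Java sketch of what retries plus state-change notifications could look like. It is deliberately independent of Hadoop: JobListener, RetryState and RetryingJobRunner are hypothetical names, not part of the JobControl or ControlledJob API, and the job is modeled as a Callable<Boolean> standing in for "submit the real job and wait for it to finish".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical states for this sketch; not Hadoop's ControlledJob.State.
enum RetryState { RUNNING, RETRYING, SUCCEEDED, FAILED }

// Hypothetical callback a user would register to hear about state changes.
interface JobListener {
    void onStateChange(String jobName, RetryState newState, int attempt);
}

// Runs a job up to maxRetries + 1 times, notifying listeners on every transition.
class RetryingJobRunner {
    private final int maxRetries;
    private final List<JobListener> listeners = new ArrayList<>();

    RetryingJobRunner(int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        this.maxRetries = maxRetries;
    }

    void addListener(JobListener listener) {
        listeners.add(listener);
    }

    private void fire(String jobName, RetryState state, int attempt) {
        for (JobListener l : listeners) {
            l.onStateChange(jobName, state, attempt);
        }
    }

    // The job reports failure by returning false (or null) or by throwing.
    // Returns true if some attempt succeeded.
    boolean run(String jobName, Callable<Boolean> job) {
        int attempts = maxRetries + 1;
        for (int attempt = 1; attempt <= attempts; attempt++) {
            fire(jobName, RetryState.RUNNING, attempt);
            boolean ok;
            try {
                ok = Boolean.TRUE.equals(job.call());
            } catch (Exception e) {
                ok = false; // an exception counts as a failed attempt
            }
            if (ok) {
                fire(jobName, RetryState.SUCCEEDED, attempt);
                return true;
            }
            if (attempt < attempts) {
                fire(jobName, RetryState.RETRYING, attempt);
            }
        }
        // Retries exhausted: this is where the "job failed" notification goes.
        fire(jobName, RetryState.FAILED, attempts);
        return false;
    }
}
```

A user would register a listener once, for example runner.addListener((name, state, attempt) -> System.out.println(name + " " + state)), and every retry and the final outcome would then be reported without polling.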
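For the recurring-jobs idea, a client-side version could lean on the JDK's ScheduledExecutorService rather than crontab. The sketch below assumes a hypothetical RecurringJobScheduler wrapper (not a Hadoop class); the Runnable would submit the real workflow each period.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical recurring-job helper; not part of Hadoop's JobControl API.
class RecurringJobScheduler implements AutoCloseable {
    // A single thread keeps the sketch simple; all scheduled jobs share it.
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    // Runs `job` every `period` units, starting after `initialDelay`.
    // The returned future can be cancelled to stop this one recurring job.
    ScheduledFuture<?> schedule(Runnable job, long initialDelay, long period, TimeUnit unit) {
        Runnable guarded = () -> {
            try {
                job.run();
            } catch (RuntimeException e) {
                // scheduleAtFixedRate suppresses all later runs if one run throws,
                // so log the failure and keep the schedule alive.
                System.err.println("Recurring job failed: " + e);
            }
        };
        return executor.scheduleAtFixedRate(guarded, initialDelay, period, unit);
    }

    // Stops all recurring jobs.
    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

A daily job would then be something like scheduler.schedule(job, 0, 1, TimeUnit.DAYS). One design trade-off to keep in mind: unlike crontab, a fixed-rate period is not aligned to wall-clock times, and the schedule only lives as long as the client JVM, which matches JobControl's client-side model.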
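The last two ideas (a common interface for MapReduce, Hive and Pig, and a view of which steps have completed) could share one abstraction. The plain-Java sketch below assumes exactly that: WorkflowTask, TaskStatus and SimpleWorkflow are hypothetical, each task's execute() would in practice submit the underlying MapReduce job, Hive query or Pig script and wait for it, and the returned status map is the kind of data a UI could render.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical engine-agnostic task: a MapReduce job, a Hive query or a Pig
// script would each be wrapped behind this same interface.
interface WorkflowTask {
    String name();
    List<String> dependsOn();
    boolean execute() throws Exception;

    // Convenience factory for tasks whose body is a simple callable.
    static WorkflowTask of(String name, List<String> dependsOn, Callable<Boolean> body) {
        return new WorkflowTask() {
            public String name() { return name; }
            public List<String> dependsOn() { return dependsOn; }
            public boolean execute() throws Exception { return Boolean.TRUE.equals(body.call()); }
        };
    }
}

enum TaskStatus { PENDING, SUCCEEDED, FAILED, SKIPPED }

// Runs tasks in dependency order and keeps a per-task status a UI could display.
class SimpleWorkflow {
    private final Map<String, WorkflowTask> tasks = new LinkedHashMap<>();
    private final Map<String, TaskStatus> status = new LinkedHashMap<>();

    void add(WorkflowTask task) {
        tasks.put(task.name(), task);
        status.put(task.name(), TaskStatus.PENDING);
    }

    Map<String, TaskStatus> run() {
        boolean progress = true;
        while (progress) {
            progress = false;
            for (WorkflowTask task : tasks.values()) {
                if (status.get(task.name()) != TaskStatus.PENDING) {
                    continue;
                }
                boolean ready = true;
                boolean blocked = false;
                for (String dep : task.dependsOn()) {
                    TaskStatus depStatus = status.get(dep);
                    if (depStatus == null || depStatus == TaskStatus.FAILED
                            || depStatus == TaskStatus.SKIPPED) {
                        blocked = true; // unknown or unsuccessful dependency
                    } else if (depStatus != TaskStatus.SUCCEEDED) {
                        ready = false;  // dependency has not run yet
                    }
                }
                if (blocked) {
                    status.put(task.name(), TaskStatus.SKIPPED);
                    progress = true;
                } else if (ready) {
                    boolean ok;
                    try {
                        ok = task.execute();
                    } catch (Exception e) {
                        ok = false;
                    }
                    status.put(task.name(), ok ? TaskStatus.SUCCEEDED : TaskStatus.FAILED);
                    progress = true;
                }
            }
        }
        // Anything still PENDING here is in, or waits on, a dependency cycle.
        return new LinkedHashMap<>(status);
    }
}
```

Because the runner only sees WorkflowTask, adding a new engine means writing one more implementation rather than changing the scheduling logic, and downstream tasks of a failed step are marked SKIPPED instead of silently never running.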
In the end, I don't think you need to invent a completely new framework; the JobControl class already provides a good starting point. Try to think from a user's perspective about what you can do to simplify and streamline the submission and management of jobs. The ideas here and on the ticket are just examples; you can come up with your own.
As for Oozie, it provides a higher-level abstraction for controlling job flow, but it is also more complex to set up and is best reserved for more complex workflows. I know some people are hesitant to use Oozie because it adds extra overhead to their applications. Another big difference is that Oozie runs as a server, whereas JobControl runs only on the client machine, so Oozie comes with the additional cost of running and maintaining a server. Although some of the features mentioned above exist in Oozie in one form or another, in my opinion the key to your project is keeping things simple and running on the client machine, without the extra setup that something like Oozie requires.
Charles Menguy