Persist Completed Pipeline in Luigi Visualizer

I am starting to migrate our nightly data pipeline from a visual ETL tool to Luigi, and I really like that there is a visualizer for seeing the status of tasks. However, I noticed that a few minutes after the last task (named MasterEnd) completes, all nodes disappear from the graph except MasterEnd. This is a bit inconvenient, as I would like to see that everything completed for the current day and for past days.

In addition, if I go directly to the last task's URL in the visualizer, it cannot find any record of the task having run: Couldn't find task MasterEnd(date=2015-09-17, base_url=http://aws.east.com/, log_dir=/home/ubuntu/logs/). I confirmed that the task ran successfully this morning.

I should note that I have a cron job that runs this pipeline every 15 minutes to check for a file on S3. If the file exists, the pipeline runs; otherwise it stops. I'm not sure whether this causes the tasks to be removed from the visualizer. I noticed that each run gets a new PID, but I could not find a way in the docs to keep one PID per day.

So my questions are: is it possible to keep the completed graph for the current day in the visualizer? And is there a way to see what happened in the past?

Appreciate all the help

2 answers

I am not 100% sure this is correct, but this is what I would try first. When you call luigi.run, pass it --scheduler-remove-delay. I believe this is how long the scheduler waits before forgetting a task after all of its dependents have completed. If you look at the Luigi source, the default is 600 seconds. For instance:

 luigi.run(["--workers", "8", "--scheduler-remove-delay", "86400"], main_task_cls=task_name)

If you set the remove_delay parameter in the luigi.cfg file, the scheduler will keep tasks around longer.

 [scheduler]
 record_task_history = True
 state_path = /x/s/hadoop/luigi/var/luigi-state.pickle
 remove_delay = 86400
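
As a quick sanity check on the file above, here is a minimal sketch that parses the same settings with Python's standard configparser and prints remove_delay in hours. It does not import or call Luigi; the LUIGI_CFG string and the read_scheduler_settings helper are made up for illustration, and the only assumption is that luigi.cfg is an ordinary INI-style file.

```python
import configparser

# Hypothetical in-memory copy of the luigi.cfg fragment from the answer above.
LUIGI_CFG = """
[scheduler]
record_task_history = True
state_path = /x/s/hadoop/luigi/var/luigi-state.pickle
remove_delay = 86400
"""


def read_scheduler_settings(cfg_text):
    """Parse INI-style text and return the [scheduler] section as a dict of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return dict(parser["scheduler"])


settings = read_scheduler_settings(LUIGI_CFG)

# remove_delay is given in seconds; convert it to hours for readability.
remove_delay_hours = int(settings["remove_delay"]) / 3600

print(settings)
print(f"remove_delay = {remove_delay_hours:g} hours")
```

With 86400 seconds, the value works out to a full day, which matches the goal of keeping the current day's graph visible.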

Please note: there is a typo in the documentation ("remove-delay" instead of "remove_delay"), which is fixed in https://github.com/spotify/luigi/issues/2133
