One possibility is to run hadoop fs -lsr your_path once to list all the paths under that directory, and then check whether the paths you are interested in appear in that set.
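A rough sketch of that idea in Python, assuming placeholder values for base_path and paths_to_check, and relying on the fact that the last whitespace-separated column of each listing line is the full path:

import subprocess

# Placeholder inputs: adjust the base path and the paths you want to verify.
base_path = "/your/base/path"
paths_to_check = [base_path + "/part-00000", base_path + "/_SUCCESS"]

# Run the recursive listing once and collect every listed path into a set.
listing = subprocess.check_output(["hadoop", "fs", "-lsr", base_path]).decode()
existing = set()
for line in listing.splitlines():
    fields = line.split()
    if fields:
        existing.add(fields[-1])

# Membership tests against the set are cheap, no extra hadoop calls needed.
missing = [p for p in paths_to_check if p not in existing]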
As for your failure, it was probably caused by the sheer number of calls to os.system rather than by the hadoop command itself. Repeatedly spawning external processes can run into problems with buffers that are never released, in particular the I/O buffers for stdin/stdout.
One solution would be to make a single call to a bash script that checks all the paths. You can build the script from a string template in your code, fill in the array of paths, write it to disk, and then execute it.
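A minimal sketch of that approach, where the script name check_paths.sh and the paths are placeholders, and hadoop fs -test -e is used as one way to check a single path:

import subprocess

paths = ["/your/base/path/a", "/your/base/path/b"]  # placeholder paths

# Generate one bash script that tests every path, so hadoop is invoked
# from a single child process instead of once per path from Python.
script_lines = ["#!/bin/bash"]
for p in paths:
    script_lines.append(
        'hadoop fs -test -e "%s" && echo "EXISTS %s" || echo "MISSING %s"' % (p, p, p)
    )

with open("check_paths.sh", "w") as f:
    f.write("\n".join(script_lines) + "\n")

output = subprocess.check_output(["bash", "check_paths.sh"]).decode()
print(output)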
It might also be a good idea to switch to the Python subprocess module, which gives you more granular control over child processes. Here is a rough equivalent of os.system:
process = subprocess.Popen(your_script, stdout=subprocess.PIPE, shell=True)
output, _ = process.communicate()  # wait for the command to finish and collect its stdout
Note that you can switch stdout to something like a file handle if that helps you debug or makes the process more robust. You can also switch the shell=True argument to False, unless you are invoking an actual shell command line that uses things like pipes or redirection.
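For example, a sketch of the shell=False form, where the command is passed as a list of arguments instead of a shell string (the path is a placeholder):

import subprocess

# No shell involved: the arguments go straight to the hadoop executable.
process = subprocess.Popen(["hadoop", "fs", "-lsr", "/your/base/path"], stdout=subprocess.PIPE)
output, _ = process.communicate()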