I have a hadoop cluster that I use to analyze data using Numpy, SciPy and Pandas. I want my hadoop jobs to be represented as a zip / tar file using the '-file' argument to the command. This zip file should have EVERYTHING that my python program should execute so that no matter what node my script is running in the cluster, I will not run ImportError at runtime.
Due to company policy, installing these libraries on each node is not entirely feasible, especially for search / agile development. I have pip and virtualenv to create a sandbox as needed.
I looked through zipimport and python packaging , but none of this fits my needs / it's hard for me to use these tools.
Did anyone manage to do this? I can't seem to find success stories on the Internet.
Thank!
source
share