Cloudera explains this in a really good blog post. All credit goes to them.
But to answer your question in short - no, this is not possible. Any complex third-party dependency should be installed on each node of your cluster and properly configured there. For simple modules / dependencies, you can package them as *.egg, *.zip, or *.py files and ship them to the cluster with the --py-files flag of spark-submit.
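For illustration, a minimal spark-submit invocation along those lines might look like this (the file names deps.zip, helper_module.py, and my_job.py are placeholders, not from the original answer):

```
# Sketch: ship pure-Python dependencies alongside the job.
# deps.zip / helper_module.py / my_job.py are hypothetical names.
spark-submit \
  --master yarn \
  --py-files deps.zip,helper_module.py \
  my_job.py
```

This only works for dependencies that are plain Python; anything with compiled extensions falls outside what --py-files can deliver.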
However, xgboost is a numerical package that depends heavily not only on other Python packages but also on a low-level C++ library that must be compiled natively. If you shipped pre-compiled code to the cluster, you could run into errors caused by differing hardware architectures. Given that clusters are often heterogeneous in terms of hardware, this approach would behave very poorly.
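So for a package like xgboost, the practical route is to install it natively on every node. As a sketch (the pssh tool and the workers.txt host list are my assumptions; any orchestration tool such as Ansible or parallel-ssh would do):

```
# Sketch: install xgboost natively on each worker node so the
# compiled C++ parts are built/fetched for that node's architecture.
# 'pssh' and 'workers.txt' are assumptions, not from the original answer.
pssh -h workers.txt -i 'pip install xgboost'
```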