Do I need to install Python (or Jython) on every task tracker node?
Yes, since that is where the UDF code actually runs.
Do Python (or Jython) modules need to be installed on each task tracker node?
If you use a third-party module (for example, geoip), it must also be installed on the task trackers.
Do the task tracker nodes need to know how to find the modules? If so, how do you specify the path (through an environment variable, and how is that done for the task tracker)?
To quote Programming Pig:
register is also used to locate resources for Python UDFs that you use in your Pig Latin scripts. In this case, you do not register a jar; instead, you register a Python script that contains your UDF. The Python script must be in your current directory.
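For illustration, here is a minimal sketch of registering a Python UDF from the current directory (the file name my_udfs.py and the function to_upper are hypothetical names, not from the original answer):

    -- my_udfs.py is assumed to sit in the directory Pig is launched from
    REGISTER 'my_udfs.py' USING jython AS myfuncs;

    records = LOAD 'input.txt' AS (line:chararray);
    -- call the hypothetical to_upper() UDF defined in my_udfs.py
    upper   = FOREACH records GENERATE myfuncs.to_upper(line);
    STORE upper INTO 'output';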
This caveat is also important:
Caution: Pig does not trace dependencies inside your Python scripts and ship the needed Python modules to your Hadoop cluster. You must make sure that the modules you need are on the task nodes in your cluster and that the PYTHONPATH environment variable is set on those nodes so that your UDFs can find them for import. This issue was fixed after 0.9, but the fix had not yet been released at the time of this writing.
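How exactly PYTHONPATH gets set depends on how your cluster is managed; as a sketch, assuming the modules are installed under /opt/python-modules on each task node (a hypothetical path), the environment the tasks run with would need something like:

    # hypothetical install location of the third-party modules on each task node
    export PYTHONPATH=/opt/python-modules:$PYTHONPATH

On Hadoop 1.x, another option is to pass this per job through the mapred.child.env property, which sets environment variables for the task tracker's child processes.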
And if you use Jython:
Pig does not know where the Jython interpreter is on your system, so you must include jython.jar in your classpath when invoking Pig. This can be done by setting the PIG_CLASSPATH environment variable.
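A minimal sketch of that (the jython.jar path is an assumption; use wherever Jython is installed on your machine):

    # make the Jython runtime visible to Pig before launching the script
    export PIG_CLASSPATH=/usr/share/java/jython.jar
    pig my_script.pig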
To summarize: if you use streaming, you can use the SHIP clause in your script (see the sketch below), which will send your executable files to the cluster. If you use a UDF, then as long as it can be compiled (note the caveat about Jython above) and has no dependencies that you have not already put on PYTHONPATH or installed on the cluster, the UDF is shipped to the cluster at runtime. (As a hint, life gets much easier if you keep your simple UDF dependencies in the same folder as the Pig script you register.)
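For the streaming case, here is a sketch of the SHIP clause (process.py and the paths are hypothetical):

    -- SHIP copies the local executable into the working directory of every task
    DEFINE my_cmd `python process.py` SHIP('/local/path/process.py');

    data = LOAD 'input' AS (line:chararray);
    out  = STREAM data THROUGH my_cmd;
    STORE out INTO 'output';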
Hope this clears things up.