How to call a Pig script from another Pig script

I have a file in HDFS with 100 columns that I want to process with Pig. I want to load this file into a tuple with named columns in a separate Pig script and reuse that script from other Pig scripts. How can I do this?

Say the 100-column Pig script is 100col.pig. How can I call it from anotherone.pig?

4 answers

Check the exec command (for batch processing) or the run command (for interactive scripts). Also, if you need Hadoop file system shell commands from within Grunt, check the fs command. Here's a good reference:

http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html
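
As a rough sketch of the difference, assuming both scripts from the question sit in the current directory and that 100col.pig defines an alias named `wide` (a hypothetical name), a Grunt session might look like this:

```pig
-- run executes the script in the current Grunt session,
-- so aliases it defines remain available afterwards.
run 100col.pig
describe wide;   -- 'wide' is a hypothetical alias defined in 100col.pig

-- exec executes the script in batch mode, isolated from the session:
-- aliases defined inside it are not visible once it finishes.
exec anotherone.pig
```

With run, the commands from the script also end up in the shell's command history, which can help when exploring interactively.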


Try the macros that were introduced in Pig 0.9.

http://pig.apache.org/docs/r0.9.1/cont.html#macros
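
For illustration, a macro is defined with define ... returns ... and invoked like a function. The sketch below assumes Pig 0.9 or later; the macro name, file name, and columns are all hypothetical:

```pig
-- Hypothetical macro: keep only rows whose key column is not null.
-- $rel and $key are replaced with the arguments when the macro is expanded.
define drop_null_keys(rel, key) returns cleaned {
    $cleaned = filter $rel by $key is not null;
};

-- Hypothetical input with two columns.
raw  = load 'input_data' as (id:chararray, value:int);
good = drop_null_keys(raw, 'id');
```

The column name is passed as a quoted string here, following the style of the Programming Pig example quoted in another answer.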


It is a bit late for this answer, but I recently worked on this requirement and did not find anything useful until I came across the following. I hope it helps someone who needs it:

** This excerpt is from the book Programming Pig.

For a long time in Pig Latin, the entire script had to be in one file. This led to some unpleasantly long Pig Latin scripts. Starting in 0.9, the preprocessor can be used to include one Pig Latin script in another. Taken together with macros, you can now write modular Pig Latin that is easier to debug and reuse. import is used to include one Pig Latin script in another:

    --main.pig
    import '../examples/ch6/dividend_analysis.pig';

    daily   = load 'NYSE_daily' as (exchange:chararray, symbol:chararray,
                date:chararray, open:float, high:float, low:float, close:float,
                volume:int, adj_close:float);
    results = dividend_analysis(daily, '2009', 'symbol', 'open', 'close');

import writes the imported file directly into your Pig Latin script in place of the import statement. In the previous example, the contents of dividend_analysis.pig will be placed immediately before the load statement. Note that a file cannot be imported twice. If you want to use the same functionality several times, you should write it as a macro and import the file that contains that macro.
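
Applied to the original question, one possible layout is to put the load statement in a macro in its own file and import that file wherever it is needed. This is only a sketch: the file name load_100col.pig, the macro name, the paths, the comma delimiter, and the three columns shown are all assumptions, and the remaining columns would have to be filled in from the real schema.

```pig
-- load_100col.pig (hypothetical file that contains only the macro)
define load_100col(path) returns wide {
    -- Only three of the 100 columns are shown; the names and types are placeholders.
    $wide = load '$path' using PigStorage(',')
            as (col1:chararray, col2:int, col3:float);
};

-- anotherone.pig
import 'load_100col.pig';

-- The file is imported once, but the macro can be called as often as needed.
jan = load_100col('/data/wide/january');
feb = load_100col('/data/wide/february');
```

Because the schema lives in one place, a change to a column name or type only has to be made in the macro file.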


There are two options here, as indicated above. Pig provides the run and exec commands to meet your requirements.

The exec command invokes a Pig script that is independent and standalone. The run command runs a Pig script and retains its variables and aliases.

I suppose you need the run command to meet your requirements: http://pig.apache.org/docs/r0.9.1/cmds.html#run

