If the UDF throws an exception, the task fails and is retried.
It is retried three more times (4 attempts by default), after which the whole job is marked FAILED.
If you want to log the error without stopping the job, you can return null:
public Tuple exec(Tuple input) throws IOException {
    try {
        // do stuff with input
    } catch (Exception e) {
        System.err.println("Error with ...");
        return null;
    }
}
Then filter out the nulls later in Pig:
events_all = FOREACH logs GENERATE Extractor(line) AS line;
events_valid = FILTER events_all BY line IS NOT NULL;
events = FOREACH events_valid GENERATE FLATTEN(line);
In your example, the output will contain only the two valid lines. Be careful with this approach, though: the error shows up only in the logs and will not fail your job!
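To make the return-null-then-filter pattern concrete, here is a minimal sketch in plain Java with no Pig dependency. The class name `NullOnErrorSketch`, the `extract` and `extractValid` helpers, and the "a,b,c" input format are all hypothetical, chosen only for illustration; in a real UDF the same try/catch would sit inside `exec`, and the filtering would be done by the `FILTER ... IS NOT NULL` statement in the script.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class NullOnErrorSketch {

    // Hypothetical stand-in for the UDF's exec(): parses "a,b,c" into three ints.
    // On any failure it logs to stderr and returns null instead of throwing.
    static List<Integer> extract(String line) {
        try {
            String[] parts = line.split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected 3 fields, got " + parts.length);
            }
            List<Integer> fields = new ArrayList<>();
            for (String part : parts) {
                fields.add(Integer.parseInt(part.trim()));
            }
            return fields;
        } catch (Exception e) {
            System.err.println("Error with line '" + line + "': " + e.getMessage());
            return null;
        }
    }

    // Plain-Java counterpart of filtering with "line IS NOT NULL":
    // keep only the records that were extracted successfully.
    static List<List<Integer>> extractValid(List<String> lines) {
        return lines.stream()
                .map(NullOnErrorSketch::extract)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> logs = Arrays.asList("bad line", "1,2,3", "1,2,3", "", "1,2,3");
        List<List<Integer>> events = extractValid(logs);
        // Prints "3 of 5 lines kept"; the two failures appear only on stderr.
        System.out.println(events.size() + " of " + logs.size() + " lines kept");
    }
}
```

Comparing the number of kept records with the input size, as `main` does here, is one simple way to notice silently dropped lines.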
Reply to comment #1:
Actually, the entire resulting tuple will be null (so there will be no fields inside it).
For example, if your schema has 3 fields:
events_all = FOREACH logs GENERATE Extractor(line) AS line:tuple(a:int,b:int,c:int);
and some lines are invalid, the result looks like this:
()
((1,2,3))
((1,2,3))
()
((1,2,3))
If you do not filter out the null rows and then try to access a field, you will get a java.lang.NullPointerException:
events = FOREACH events_all GENERATE line.a;