It depends on where your job fails - if a line is corrupt and an exception is thrown somewhere in your map method, you should be able to wrap the body of your map method in a try / catch and just log the error:
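    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            // parse the value and write your output here
        } catch (RuntimeException e) {
            // e.g. a parse failure: log the error and move on to the next record
        }
    }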
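To make the per-record try / catch concrete, here is a minimal sketch in plain Java so it runs without Hadoop on the classpath. The `TolerantParse` class, the `parseAll` helper, and the choice of `NumberFormatException` are assumptions for illustration only. In a real mapper the try / catch would sit inside `map(...)`, and you would probably increment a job counter rather than a local tally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TolerantParse {
    /** Holds the successfully parsed values and a count of bad lines. */
    public static final class Result {
        public final List<Integer> values = new ArrayList<>();
        public int badLines = 0;
    }

    // Hypothetical helper: does per line what a map() body would do per record.
    public static Result parseAll(List<String> lines) {
        Result result = new Result();
        for (String line : lines) {
            try {
                result.values.add(Integer.parseInt(line.trim()));
            } catch (NumberFormatException nfe) {
                // Corrupt line: note the failure and carry on with the next record.
                result.badLines++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = parseAll(Arrays.asList("1", "oops", " 3 "));
        System.out.println(r.values + " bad=" + r.badLines);
    }
}
```

Catching the narrowest exception your parsing can throw keeps genuine bugs in the mapper visible instead of silently counting them as bad input.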
But if the error is thrown by your InputFormat's RecordReader, you need to change the mapper's run(..) method, which by default does the following:
    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
        cleanup(context);
    }
So you can change this to catch the exception in the call to context.nextKeyValue(), but you have to be careful about simply ignoring any error the reader throws. An IOException, for example, may not be something you can “skip over” just by ignoring it.
If you wrote your own InputFormat / RecordReader, and you have a specific exception that signals a failed record but still lets you skip it and continue parsing, then something like this may work:
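    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        while (true) {
            try {
                if (!context.nextKeyValue()) {
                    break;
                } else {
                    map(context.getCurrentKey(), context.getCurrentValue(), context);
                }
            } catch (SkippableRecordException sre) {
                // log the bad record and continue with the next one
            }
        }
        cleanup(context);
    }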
But just to repeat - your RecordReader must be able to recover from the error (that is, move past the bad record), otherwise the code above could send you into an endless loop.
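The endless-loop risk is easier to see with a toy reader. The sketch below is plain Java with no Hadoop classes. `ListReader` is a hypothetical stand-in for a RecordReader, and `SkippableRecordException` is defined here only for illustration. The important detail is that the reader advances its position before it validates the record, so a bad record is never handed out twice.

```java
import java.util.ArrayList;
import java.util.List;

public class SkippableLoop {
    /** Hypothetical exception: the current record is bad, but reading can continue. */
    public static class SkippableRecordException extends RuntimeException {
        public SkippableRecordException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for a RecordReader over an in-memory list. */
    public static class ListReader {
        private final List<String> records;
        private int pos = -1;

        public ListReader(List<String> records) { this.records = records; }

        public boolean nextKeyValue() {
            pos++; // advance BEFORE validating, so a bad record is not re-read
            if (pos >= records.size()) {
                return false;
            }
            if (records.get(pos).isEmpty()) {
                throw new SkippableRecordException("empty record at index " + pos);
            }
            return true;
        }

        public String getCurrentValue() { return records.get(pos); }
    }

    /** Same loop shape as the run(..) method above; returns the records that were processed. */
    public static List<String> run(ListReader reader) {
        List<String> seen = new ArrayList<>();
        while (true) {
            try {
                if (!reader.nextKeyValue()) {
                    break;
                }
                seen.add(reader.getCurrentValue()); // stands in for map(...)
            } catch (SkippableRecordException sre) {
                // The reader has already moved past the bad record, so just continue.
            }
        }
        return seen;
    }
}
```

If `nextKeyValue()` threw before incrementing `pos`, every retry would hit the same bad record and the loop would never finish.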
In your specific case - if you just want to give up on a file at the first failure, you can change the run method to something simpler:
    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        try {
            while (context.nextKeyValue()) {
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
            cleanup(context);
        } catch (Exception e) {
            // log the error and give up on the rest of this file
        }
    }
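One consequence of this approach is that anything emitted before the failure stays emitted. The plain-Java sketch below shows that behaviour. The `Source` interface and the `failingSource` helper are hypothetical stand-ins and not part of Hadoop.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class StopOnFirstFailure {
    /** Hypothetical stand-in for a reader whose next() may fail part-way through a file. */
    public interface Source {
        boolean next() throws IOException;
        String current();
    }

    /** Reads until the source is exhausted or throws; returns what was processed so far. */
    public static List<String> run(Source source) {
        List<String> emitted = new ArrayList<>();
        try {
            while (source.next()) {
                emitted.add(source.current()); // stands in for map(...)
            }
        } catch (IOException e) {
            // First failure: abandon the remainder of this input.
        }
        return emitted;
    }

    /** Test helper: yields the given records but throws when it reaches index failAt. */
    public static Source failingSource(final List<String> records, final int failAt) {
        return new Source() {
            private int pos = -1;

            @Override
            public boolean next() throws IOException {
                pos++;
                if (pos == failAt) {
                    throw new IOException("simulated corruption at index " + pos);
                }
                return pos < records.size();
            }

            @Override
            public String current() { return records.get(pos); }
        };
    }
}
```

Also note that in the run method above, cleanup(context) sits inside the try, so it is skipped when an exception is thrown. Moving it into a finally block is one option if your cleanup must always run.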
Some final words of warning:
- You need to make sure that it isn't your mapper code throwing the exception, otherwise you'll be ignoring files for the wrong reason.
- Files that look like GZip (for example by their name) but are not actually GZip compressed will fail when the record reader is initialized - so this type of error will not be caught by the code above (you'll need to write your own record reader implementation). This is true for any file error that is thrown while the record reader is being created.
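To illustrate why that last kind of failure shows up before any record is read, here is a small plain-Java sketch using `java.util.zip`. It does not use Hadoop's codec classes, and `GzipProbe` and its methods are hypothetical names. `GZIPInputStream` validates the GZip header in its constructor, so invalid data fails at open time rather than on the first read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipProbe {
    /** Returns true if the bytes can be opened as a GZip stream (header check only). */
    public static boolean opensAsGzip(byte[] data) {
        // The GZIPInputStream constructor reads and validates the header,
        // so bad input throws here, before any payload is read.
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Helper for the example: compresses bytes in memory. */
    public static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(opensAsGzip("plain text".getBytes()));
        System.out.println(opensAsGzip(gzip("hello".getBytes())));
    }
}
```

A custom record reader could apply the same idea by wrapping its own initialization in a try / catch and treating the split as empty when opening fails, though how you do that depends on the reader you are extending.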