In the Hadoop source code, in the constructor of LineRecordReader.java, I found this comment:
    // If this is not the first split, we always throw away first record
    // because we always (except the last split) read one extra line in
    // next() method.
    if (start != 0) {
      start += in.readLine(new Text(), 0, maxBytesToConsume(start));
    }
    this.pos = start;
From this I believe (not confirmed) that Hadoop reads one extra line for each split: when it reaches the end of the current split, it keeps reading to the end of the line that runs into the next split. Every split except the first one throws away its first line, because the previous split has already read it. So no line record is lost and none is left incomplete.
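To check this reasoning, here is a minimal simulation of the rule on an in-memory byte array. It is only a sketch, not Hadoop's code: the class `SplitLineReaderSketch` and the methods `lineLength` and `readSplit` are hypothetical names, and the loop condition `pos <= end` is my assumption about how the "one extra line" read in next() is bounded.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitLineReaderSketch {

    // Number of bytes in the line starting at pos, including the
    // trailing '\n' if there is one. Returns 0 at end of data.
    static int lineLength(byte[] data, int pos) {
        int i = pos;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        if (i < data.length) {
            i++; // consume the newline itself
        }
        return i - pos;
    }

    // Records that the split [start, start + length) would emit.
    static List<String> readSplit(byte[] data, int start, int length) {
        int end = start + length;
        int pos = start;
        if (start != 0) {
            // Not the first split: throw away the first (possibly partial)
            // line, because the previous split reads it as its extra line.
            pos += lineLength(data, pos);
        }
        List<String> records = new ArrayList<>();
        // Assumption: a line is read as long as it starts at or before
        // 'end', so the last line read may run past the split boundary.
        while (pos <= end && pos < data.length) {
            int n = lineLength(data, pos);
            int textLen = (data[pos + n - 1] == '\n') ? n - 1 : n;
            records.add(new String(data, pos, textLen, StandardCharsets.UTF_8));
            pos += n;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\ndelta\n"
                .getBytes(StandardCharsets.UTF_8);
        int splitSize = 8; // deliberately cuts through the middle of lines
        for (int start = 0; start < data.length; start += splitSize) {
            int len = Math.min(splitSize, data.length - start);
            System.out.println("split@" + start + " -> "
                    + readSplit(data, start, len));
        }
    }
}
```

With a split size of 8 bytes the four splits emit [alpha, bravo], [charlie], [delta] and [], so each line is read exactly once even though the boundaries fall in the middle of "bravo" and "charlie". A split can also end up empty when the previous split has already consumed its only line.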