How to parse CustomWritable from text in Hadoop

Say I have timestamped values for specific users in text files, for example:

#userid; unix-timestamp; value
1; 2010-01-01 00:00:00; 10
2; 2010-01-01 00:00:00; 20
1; 2010-01-01 01:00:00; 11
2; 2010-01-01 01:00:00; 21
1; 2010-01-02 00:00:00; 12
2; 2010-01-02 00:00:00; 22

I have a custom class "SessionSummary" that implements WritableComparable (i.e. its readFields and write methods). The goal is to sum up all the values for each user for each calendar day.
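For reference, a class like the SessionSummary described above might look roughly like this. It is a minimal sketch under assumptions: to run without Hadoop on the classpath it does not declare `implements WritableComparable<SessionSummary>` (a real job would), and the fields userId, day and total are hypothetical. The point it illustrates is that write() and readFields() define a binary layout, while toString() defines the text layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in for a Hadoop WritableComparable: same write/readFields/compareTo
// contract, but without the org.apache.hadoop dependency so it runs standalone.
// Field names are hypothetical.
class SessionSummary implements Comparable<SessionSummary> {
    int userId;
    String day;   // calendar day, e.g. "2010-01-01"
    long total;   // sum of the values for that user and day

    SessionSummary() {}  // Hadoop instantiates Writables through a no-arg constructor

    SessionSummary(int userId, String day, long total) {
        this.userId = userId;
        this.day = day;
        this.total = total;
    }

    // Binary serialization, as in Writable.write(DataOutput)
    public void write(DataOutput out) throws IOException {
        out.writeInt(userId);
        out.writeUTF(day);
        out.writeLong(total);
    }

    // Reads the fields back in exactly the order write() emitted them
    public void readFields(DataInput in) throws IOException {
        userId = in.readInt();
        day = in.readUTF();
        total = in.readLong();
    }

    // Orders by user, then by day
    @Override
    public int compareTo(SessionSummary o) {
        int c = Integer.compare(userId, o.userId);
        return c != 0 ? c : day.compareTo(o.day);
    }

    // The human-readable line that TextOutputFormat would write
    @Override
    public String toString() {
        return userId + "; " + day + "; " + total;
    }

    // Serializes with write(), then deserializes a fresh instance with readFields()
    static SessionSummary roundTrip(SessionSummary s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            s.write(new DataOutputStream(bytes));
            SessionSummary copy = new SessionSummary();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the class is used as a map output key, it should also override hashCode() and equals() consistently, since the default HashPartitioner uses hashCode() to pick a reducer.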

So the mapper maps the lines to each user, and the reducer sums all the values per day per user and writes out a SessionSummary via TextOutputFormat (using SessionSummary's toString, written as delimited UTF-8 lines):

1; 2010-01-01; 21
2; 2010-01-01; 41
1; 2010-01-02; 12
2; 2010-01-02; 22
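The aggregation itself can be checked outside Hadoop. Below is a minimal plain-Java sketch of what the mapper and reducer compute together, assuming the grouping key is the user id plus the calendar day (the first 10 characters of the timestamp); DailySums and its methods are hypothetical names, and a TreeMap stands in for the shuffle's key sorting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the job (no Hadoop): the "map" step builds a
// (user, day) key per line, the "reduce" step sums the values per key.
class DailySums {
    static Map<String, Long> sumPerUserPerDay(List<String> lines) {
        Map<String, Long> sums = new TreeMap<>();  // keys come out sorted, as after the shuffle
        for (String line : lines) {
            if (line.startsWith("#")) continue;    // skip the header comment
            String[] f = line.split(";");
            String user = f[0].trim();
            String day = f[1].trim().substring(0, 10);  // "2010-01-01 01:00:00" -> "2010-01-01"
            long value = Long.parseLong(f[2].trim());
            sums.merge(user + "; " + day, value, Long::sum);  // the reducer's summation
        }
        return sums;
    }

    // Formats each entry in the same "user; day; total" shape as the output above
    static List<String> format(Map<String, Long> sums) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : sums.entrySet()) {
            out.add(e.getKey() + "; " + e.getValue());
        }
        return out;
    }
}
```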

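Now I want to process this output in another Map/Reduce job. Can I reuse the readFields/write methods (from WritableComparable) to parse it? In other words, how do I get a DataInput from a String (in the mapper)? Something like this: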

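public void map(...) {
    // value is one line of the previous job's TextOutputFormat output
    SessionSummary ssw = new SessionSummary();
    // wrap the line's bytes in a DataInput and let readFields parse them
    ssw.readFields(new DataInputStream(new ByteArrayInputStream(value.getBytes("UTF-8"))));
}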
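Wrapping the bytes in a DataInputStream does produce a DataInput, but readFields expects the exact bytes that write() produced, not the characters that toString() printed. (Also, if value is a Hadoop Text, its getBytes() takes no argument and returns the backing array, which is only valid up to getLength().) The sketch below assumes, hypothetically, that readFields starts with readInt() followed by readUTF(), and shows what those calls do with a text line; MisreadDemo is an illustrative name.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Feeds a text line into binary read calls, as the mapper above would.
// Assumes (hypothetically) a readFields that starts with readInt() then readUTF().
class MisreadDemo {
    private static DataInputStream open(String line) {
        return new DataInputStream(new ByteArrayInputStream(line.getBytes(StandardCharsets.UTF_8)));
    }

    // readInt() consumes the first 4 characters as raw bytes: "1; 2" -> 0x313B2032
    static int misreadUserId(String line) {
        try {
            return open(line).readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // readUTF() takes the next 2 characters ("01") as a length prefix of 12337 bytes,
    // far more than the line contains, so it runs off the end of the data
    static boolean readUtfHitsEof(String line) {
        DataInputStream in = open(line);
        try {
            in.readInt();
            in.readUTF();
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

So the user id comes back as 825958450 instead of 1, and the next field fails outright.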

Or is there a better way to do this in Hadoop M/R?

(Hadoop version: 0.20.2 / CDH3u3)
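One option that keeps TextOutputFormat is to parse the text line directly instead of going through readFields, i.e. write the inverse of toString(). A minimal sketch, assuming toString() prints `userId; day; total` as in the sample output; ParsedSummary and parse are hypothetical names, and this is an alternative to the binary approach, not a replacement for it.

```java
// Hypothetical value class holding one parsed output line
class ParsedSummary {
    final int userId;
    final String day;
    final long total;

    ParsedSummary(int userId, String day, long total) {
        this.userId = userId;
        this.day = day;
        this.total = total;
    }

    // Inverse of a toString() that prints "userId; day; total"
    static ParsedSummary parse(String line) {
        String[] f = line.split(";");
        if (f.length != 3) {
            throw new IllegalArgumentException("expected 3 ';'-separated fields: " + line);
        }
        return new ParsedSummary(
                Integer.parseInt(f[0].trim()),
                f[1].trim(),
                Long.parseLong(f[2].trim()));
    }
}
```

With the default TextInputFormat the mapper receives each line as a Text value, so the call would be `ParsedSummary.parse(value.toString())`. The drawback is that the text format and the parser must be kept in sync by hand.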

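Use SequenceFileOutputFormat in the first MR job: it stores the Key/Values in binary form (via their write methods). Then use SequenceFileInputFormat in the second MR job, and the Key/Values are deserialized for you (via readFields). All you need is to set outputKeyClass and outputValueClass on the first Job.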

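In your case, that means setting SessionSummary as the output key or value class (whichever role it plays in your job).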
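The hand-off between the two jobs rests on the symmetry between write() and readFields(). The sketch below imitates it with records written back to back into a byte array and read back until the data runs out. It is a simplified analogy only, not the real SequenceFile format (which also stores a header with the key/value class names, plus sync markers); SummaryRecord and BinaryHandoff are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical record with the same write/readFields contract as a Writable
class SummaryRecord {
    int userId;
    String day;
    long total;

    SummaryRecord() {}

    SummaryRecord(int userId, String day, long total) {
        this.userId = userId;
        this.day = day;
        this.total = total;
    }

    void write(DataOutput out) throws IOException {
        out.writeInt(userId);
        out.writeUTF(day);
        out.writeLong(total);
    }

    void readFields(DataInput in) throws IOException {
        userId = in.readInt();
        day = in.readUTF();
        total = in.readLong();
    }
}

// Simplified stand-in for passing binary records from one job to the next
class BinaryHandoff {
    // "First job": append each record's write() output back to back
    static byte[] writeAll(List<SummaryRecord> records) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (SummaryRecord r : records) {
                r.write(out);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // "Second job": rebuild records with readFields(), stopping at the end of the data
    static List<SummaryRecord> readAll(byte[] data) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<SummaryRecord> records = new ArrayList<>();
        while (true) {
            SummaryRecord r = new SummaryRecord();
            try {
                r.readFields(in);
            } catch (EOFException e) {
                break;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            records.add(r);
        }
        return records;
    }
}
```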

To view the output of the first MR job in HDFS as text, you can run:

hadoop fs -libjars my-lib.jar -text output-dir/part-r-*

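This prints each Key/Value pair as text using its toString() method. The -libjars option tells hadoop where to find your custom classes, so that it can deserialize the Key/Value pairs.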
