How to parse CustomWritable from text in Hadoop

Question

Say I have timestamped values for specific users in text files, for example:

#userid; unix-timestamp; value
1; 2010-01-01 00:00:00; 10
2; 2010-01-01 00:00:00; 20
1; 2010-01-01 01:00:00; 11
2; 2010-01-01 01:00:00; 21
1; 2010-01-02 00:00:00; 12
2; 2010-01-02 00:00:00; 22

I have a custom class "SessionSummary" that implements the readFields and write methods of WritableComparable. Its purpose is to sum up all values per user for each calendar day.

So the mapper maps the lines to each user, and the reducer sums all values per day per user and outputs a SessionSummary through TextOutputFormat (using SessionSummary's toString, as delimiter-separated UTF-8 strings):

1; 2010-01-01; 21
2; 2010-01-01; 41
1; 2010-01-02; 12
2; 2010-01-02; 22

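If I need these summary entries in a second Map/Reduce stage, how should I parse them to populate the members? Can I reuse the existing readFields and write methods (of the WritableComparable implementation) by somehow using the text String as DataInput? This (obviously) did not work: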

public void map(...) {
    SessionSummary ssw = new SessionSummary();
    ssw.readFields(new DataInputStream(new ByteArrayInputStream(value.getBytes("UTF-8"))));
}

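More generally: is there a recommended way to implement custom keys and values in Hadoop so that they can be reused across several M/R stages?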

(Hadoop version is 0.20.2 / CDH3u3)

java mapreduce hadoop

thomers 15 . '12 14:25

Chris White · Accepted Answer · 2012-03-20T01:04:55+0000

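The output format of your first MR job should be SequenceFileOutputFormat. It stores the Key/Values emitted by the reducer in a binary format, which the second MR job can then read back using SequenceFileInputFormat. Also make sure you set outputKeyClass and outputValueClass on the Job accordingly.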

The mapper in the second job then receives SessionSummary objects directly (along with whatever the value type is).
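The wiring described above could look roughly like the sketch below. It assumes the newer org.apache.hadoop.mapreduce API and needs the Hadoop jars on the classpath. The class name TwoStageWiring, the method wire, and the choice of NullWritable as the value type are illustrative assumptions, not something stated in the question; pass SessionSummary.class as keyClass.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical helper: connects two jobs through a binary intermediate directory.
public final class TwoStageWiring {

    private TwoStageWiring() {
    }

    public static void wire(Job first, Job second, Class<?> keyClass, Path intermediate)
            throws IOException {
        // Stage 1 writes its reducer output as a sequence file, so each key is
        // serialized through its own write() method rather than toString().
        first.setOutputFormatClass(SequenceFileOutputFormat.class);
        first.setOutputKeyClass(keyClass);
        first.setOutputValueClass(NullWritable.class); // assumed value type
        FileOutputFormat.setOutputPath(first, intermediate);

        // Stage 2 reads the same directory; the framework calls readFields()
        // and hands fully populated key objects to the mapper.
        second.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(second, intermediate);
    }
}
```

With this setup the second mapper would be declared with the custom key class as its input key type, so no text parsing is needed.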

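If you need to see the output of the first MR job as text, you can run the following against the output files in HDFS: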

hadoop fs -libjars my-lib.jar -text output-dir/part-r-*

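This reads the Key/Value pairs from the sequence file and calls toString() on both objects, separating them with a tab when printing to stdout. The -libjars option tells hadoop where to find your custom Key/Value classes.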
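As for why the attempt in the question fails: readFields expects exactly the bytes that write produced, not the bytes of the toString text. The sketch below illustrates this using only java.io, so it runs without Hadoop. The class Summary and its fields (userId, day, total) are a hypothetical stand-in for SessionSummary, whose real layout is not shown in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class SummaryRoundTrip {

    // Hypothetical stand-in for SessionSummary; the field names are assumptions.
    static class Summary {
        int userId;
        String day;
        long total;

        // Binary layout: 4-byte int, length-prefixed string, 8-byte long.
        void write(DataOutput out) throws IOException {
            out.writeInt(userId);
            out.writeUTF(day);
            out.writeLong(total);
        }

        // Must read the fields in the same order and format as write().
        void readFields(DataInput in) throws IOException {
            userId = in.readInt();
            day = in.readUTF();
            total = in.readLong();
        }

        @Override
        public String toString() {
            return userId + "; " + day + "; " + total;
        }
    }

    static byte[] serialize(Summary s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        s.write(new DataOutputStream(buffer));
        return buffer.toByteArray();
    }

    static Summary deserialize(byte[] bytes) throws IOException {
        Summary s = new Summary();
        s.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        return s;
    }

    public static void main(String[] args) throws IOException {
        Summary original = new Summary();
        original.userId = 1;
        original.day = "2010-01-01";
        original.total = 21;

        // Bytes produced by write() round-trip cleanly through readFields().
        System.out.println(deserialize(serialize(original)));

        // The UTF-8 bytes of toString() do not match the binary layout.
        try {
            deserialize(original.toString().getBytes(StandardCharsets.UTF_8));
            System.out.println("text bytes were accepted (unexpected)");
        } catch (IOException e) {
            System.out.println("text bytes rejected: " + e.getClass().getSimpleName());
        }
    }
}
```

A sequence file stores exactly the output of write, which is why the second job can rebuild the objects without any custom parsing.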
