How does Hadoop read an input file?

Question

I have a CSV file that needs to be analyzed with Hadoop MapReduce. Can Hadoop process it line by line? If so, I want to split each line on commas so that the fields can be parsed. Or is there a better way to parse CSV and pass it to Hadoop? The file is 10 GB and comma-delimited. I want to use Java with Hadoop. Does the Text value parameter in the map() method below contain each line that Map/Reduce parses? This is what confuses me most.

This is my code:

public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    try {
        String[] tokens = value.toString().split(",");

        String crimeType = tokens[5].trim();
        int year = Integer.parseInt(tokens[17].trim());

        context.write(crimeType, year);

    } catch (Exception e) {...}
}


csv hadoop

Tonygw Oct 19 '13 at 19:56


3 answers

pumuckl · Answer 1 · 2013-10-20T15:41:51+0000

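Yes, by default Hadoop uses a text input format that feeds the mapper the input file line by line. The key passed to the mapper is the byte offset of that line within the file. Be careful with CSV files, though: a single column/field can contain a line break. You might want to look for a CSV input format such as this one: https://github.com/mvallebr/CSVInputFormat/blob/master/src/main/java/org/apache/hadoop/mapreduce/lib/input/CSVNLineInputFormat.java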
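To make the line-break caveat concrete, here is a small plain-Java sketch (no Hadoop dependency; the class and method names are hypothetical) that checks whether a single physical line ends inside an open double-quoted field. Under line-by-line reading, such a line is only the first half of a CSV record, and the rest arrives as a separate record. It assumes RFC 4180-style quoting, where a literal quote inside a quoted field is written as two quotes.

```java
class QuotedLineCheck {
    // Returns true if the line ends while still inside a double-quoted field,
    // i.e. the CSV record continues on the next physical line.
    // An escaped quote ("") toggles the state twice, so it cancels out.
    static boolean endsInsideQuotes(String line) {
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == '"') {
                inQuotes = !inQuotes;
            }
        }
        return inQuotes;
    }

    public static void main(String[] args) {
        // One CSV record whose second field contains a line break.
        String csv = "1,\"Main St\nApt 4\",THEFT";
        // Reading line by line, as TextInputFormat does, cuts the record in two.
        for (String line : csv.split("\n")) {
            System.out.println(endsInsideQuotes(line) + "\t" + line);
        }
    }
}
```

A mapper could use a check like this to detect (and at least count) broken records, but it cannot glue them back together, because the continuation line may be handed to a different call of map() or even a different mapper.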

Tariq · Answer 2 · 2013-10-20T22:54:15+0000

"Does the Text value parameter in the map() method contain each line that Map/Reduce parses? This is what confuses me most."
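Yes (assuming you are using the default InputFormat, which is TextInputFormat). The process is a bit more involved, though. It is actually the RecordReader that decides how exactly the InputSplit created by the InputFormat is sent to the mapper as records (key/value pairs). TextInputFormat uses LineRecordReader, which treats each entire line as one record. Keep in mind that the mapper does not process the whole InputSplit at once; rather, the InputSplit is fed to the mapper record by record.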
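To get a feel for how many InputSplits (and therefore map tasks) a 10 GB file produces, the sketch below mirrors the split-size rule in Hadoop's FileInputFormat, splitSize = max(minSize, min(maxSize, blockSize)), including the 10% "slop" allowance it grants the last split. It is a standalone re-implementation for illustration, not Hadoop code, and the 128 MB block size is an assumption (the Hadoop 2 default; Hadoop 1 used 64 MB, and clusters often override it). Split boundaries are byte offsets and can fall mid-line; LineRecordReader copes with this by skipping the partial first line of a split and reading past the end of its split to finish the last line, so each line still reaches exactly one mapper.

```java
class SplitEstimate {
    // Same rule as FileInputFormat.computeSplitSize in Hadoop's mapreduce API.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts splits the way FileInputFormat walks a splittable file:
    // cut full splits while more than 1.1 splits' worth of bytes remain,
    // then emit whatever is left as one final (possibly slightly larger) split.
    static int countSplits(long fileSize, long splitSize) {
        final double SPLIT_SLOP = 1.1;
        long bytesRemaining = fileSize;
        int splits = 0;
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            splits++;
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 10L * 1024L * mb;  // the 10 GB input file
        long blockSize = 128L * mb;        // assumed HDFS block size
        // minSize 1 and maxSize Long.MAX_VALUE correspond to unconfigured limits
        long splitSize = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println(countSplits(fileSize, splitSize) + " splits of " + splitSize / mb + " MB");
    }
}
```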
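"I am wondering if Hadoop can analyze it line by line?" Yes, see above: with TextInputFormat, each call to map() receives one line.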
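A rough, Hadoop-free simulation of what TextInputFormat's LineRecordReader hands to map(): the key is the byte offset where each line starts in the file (the LongWritable key) and the value is the line text without its terminator (the Text value). The class names are hypothetical, and the sketch assumes '\n' line endings and UTF-8 input held in memory; the real reader also accepts \r\n and \r and streams from HDFS.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LineRecords {
    // One (key, value) pair as map() would receive it.
    static final class Record {
        final long offset;  // stands in for the LongWritable key
        final String line;  // stands in for the Text value
        Record(long offset, String line) { this.offset = offset; this.line = line; }
    }

    // Splits raw bytes on '\n' and remembers the byte offset at which each line starts.
    static List<Record> read(byte[] data) {
        List<Record> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                records.add(new Record(start, new String(data, start, i - start, StandardCharsets.UTF_8)));
                start = i + 1;
            }
        }
        if (start < data.length) { // last line without a trailing newline
            records.add(new Record(start, new String(data, start, data.length - start, StandardCharsets.UTF_8)));
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] file = "id,type\n1,THEFT\n2,ARSON\n".getBytes(StandardCharsets.UTF_8);
        for (Record r : read(file)) {
            System.out.println(r.offset + " -> " + r.line);
        }
    }
}
```

Here map() would be called three times, with keys 0, 8 and 16. The keys are byte positions, not line numbers, which is why they are usually ignored in jobs like this one.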
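To split each line into fields you can use a CSV library, or simply String.split() if no field ever contains a comma or a line break. Note, however, that your code will not compile as written: Context.write() expects the mapper's declared output types, which should be Hadoop Writable types such as Text and IntWritable rather than a plain Java String and int. Wrap crimeType in a Text and year in an IntWritable.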
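One way to act on this is to keep the field extraction in a plain Java helper that can be tested outside Hadoop. This sketch assumes, as in the question's code, that the crime type is column 5 and the year is column 17 (zero-based). Instead of a catch-all exception handler, it returns null for lines it cannot parse (a header row, a short row, a non-numeric year) so the mapper can skip them. The class name CrimeLineParser is hypothetical. Inside a Mapper<LongWritable, Text, Text, IntWritable> you would call parse(value.toString()) and, when the result is non-null, emit context.write(new Text(p.crimeType), new IntWritable(p.year)).

```java
class CrimeLineParser {
    static final class Parsed {
        final String crimeType;
        final int year;
        Parsed(String crimeType, int year) { this.crimeType = crimeType; this.year = year; }
    }

    // Column positions taken from the question's code (zero-based).
    static final int CRIME_TYPE_COL = 5;
    static final int YEAR_COL = 17;

    // Returns null instead of throwing, so malformed lines can be skipped.
    static Parsed parse(String line) {
        // limit -1 keeps trailing empty fields, so column positions stay stable
        String[] tokens = line.split(",", -1);
        if (tokens.length <= YEAR_COL) {
            return null; // too few columns
        }
        try {
            int year = Integer.parseInt(tokens[YEAR_COL].trim());
            return new Parsed(tokens[CRIME_TYPE_COL].trim(), year);
        } catch (NumberFormatException e) {
            return null; // e.g. the header row
        }
    }

    public static void main(String[] args) {
        String row = "0,1,2,3,4,THEFT,6,7,8,9,10,11,12,13,14,15,16,2013";
        Parsed p = parse(row);
        System.out.println(p.crimeType + " " + p.year);
    }
}
```

This still splits on every comma, so a quoted field containing a comma would shift the later columns; in that case a quote-aware parser is needed.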

Hope this answers your question.

hrv · Answer 3 · 2013-10-19T21:54:16+0000

As far as I know, Hadoop has no built-in CSV parser: its default input format only hands the mapper plain lines of text.

So use something like the opencsv API to parse the data from the file and feed it to the Hadoop mapper class as key/value pairs.
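If you would rather not add a dependency, a quote-aware splitter for a single line is short. This is a minimal hand-rolled sketch, not opencsv's API: it assumes RFC 4180-style quoting (fields may be wrapped in double quotes, and "" inside a quoted field is a literal quote) and that each record fits on one line, so it does not solve the embedded line-break problem. With opencsv itself you would typically run its CSVParser over each line inside map() instead.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareSplit {
    // Splits one CSV line on commas that are not inside double quotes.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // escaped quote ""
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // commas are kept inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());      // last field
        return fields;
    }

    public static void main(String[] args) {
        String line = "7,\"Smith, John\",\"said \"\"hi\"\"\",2013";
        System.out.println(split(line));
    }
}
```

The quoted "Smith, John" stays one field here, which is exactly the case where value.toString().split(",") in the question would shift every later column by one.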

See this link for a detailed explanation.
