I have a CSV file that needs to be analyzed using Hadoop MapReduce. Can Hadoop process it line by line? If so, I want to split each line on commas so that I can parse the fields. Or is there a better way to parse CSV and pass it to Hadoop? The file is 10 GB and comma-delimited, and I want to use Java with Hadoop. Does the Text value parameter of the map() method below contain each line of the file that MapReduce reads? This is what confuses me most.
This is my code:
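public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    try {
        String[] tokens = value.toString().split(",");
        String crimeType = tokens[5].trim();
        int year = Integer.parseInt(tokens[17].trim());
        // context.write() takes Writable types, not String/int; this assumes
        // the mapper's output types are declared as Text and IntWritable.
        context.write(new Text(crimeType), new IntWritable(year));
    } catch (Exception e) {...}
}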
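To make the question concrete, here is a minimal standalone sketch of the per-line parsing I have in mind, in plain Java with no Hadoop dependency so it can be tried on its own. As far as I understand, with the default TextInputFormat each map() call receives one line of the file as value and that line's byte offset as key, so the logic below would run once per line. The class name CrimeLineParser, the parse() helper, and the sample line are made up for illustration; the field positions 5 and 17 are the ones from my mapper.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): parses one CSV line the way
// the mapper above is meant to, but without any Hadoop types.
final class CrimeLineParser {
    static final int CRIME_TYPE_INDEX = 5;   // column used in the mapper
    static final int YEAR_INDEX = 17;        // column used in the mapper

    // Returns (crimeType, year), or null when the line is too short
    // or the year column is not a number (for example a header row).
    static Map.Entry<String, Integer> parse(String line) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] tokens = line.split(",", -1);
        if (tokens.length <= YEAR_INDEX) {
            return null;
        }
        try {
            int year = Integer.parseInt(tokens[YEAR_INDEX].trim());
            return new SimpleEntry<>(tokens[CRIME_TYPE_INDEX].trim(), year);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Made-up sample line with 18 comma-separated fields.
        String line = "a,b,c,d,e, THEFT ,g,h,i,j,k,l,m,n,o,p,q, 2012";
        Map.Entry<String, Integer> result = parse(line);
        System.out.println(result.getKey() + " " + result.getValue());
    }
}
```

One limitation I am aware of: a plain split(",") gives wrong columns if any field is quoted and contains a comma itself, so this only works if the data has no such fields.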