What is the best approach to handling very large CSV files with Apache Camel?

I am evaluating Apache Camel as the best option for an ETL process that starts with a CSV file.

The file will have millions of lines and an unholy number of columns (~500).

So far I have considered several options: unmarshalling with the CSV data format, and unmarshalling with camel-bindy, but neither does what I need.

The CSV data format parses every line and then passes a list of lists to the next processor, so with millions of lines it blows up with an out-of-memory/heap error.
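For reference, this is roughly what that route looks like (endpoint and bean names here are hypothetical); the unmarshal step turns the entire file into one List<List<String>> inside a single exchange, which is what exhausts the heap:

```java
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.dataformat.csv.CsvDataFormat;

public class NaiveCsvRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("file:data/inbox?fileName=big.csv")   // hypothetical input endpoint
            .unmarshal(new CsvDataFormat())        // whole file -> List<List<String>> on the heap
            .to("bean:rowHandler");                // hypothetical downstream processor
    }
}
```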

Bindy's approach looked great, until I started developing with it and realized I would need to map every CSV column to the POJO, and I am not interested in 99% of them.
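To illustrate the mapping overhead, a Bindy record looks roughly like this (the field names below are hypothetical); every column you want needs its own annotated field, which does not scale well to ~500 columns:

```java
import org.apache.camel.dataformat.bindy.annotation.CsvRecord;
import org.apache.camel.dataformat.bindy.annotation.DataField;

@CsvRecord(separator = ",")
public class WideRow {
    @DataField(pos = 1)
    private String id;      // hypothetical column

    @DataField(pos = 2)
    private String name;    // hypothetical column

    // ...and so on, one annotated field per column you need to map
}
```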

So the question is: do I need to write a custom line-by-line processor or component that handles the conversion for each row and passes it to the next step in the route, or is there another option I haven't come across yet?

1 answer

Ah, a very similar question was asked some time ago (I did not find it in my first search):

The best strategy for handling large CSV files in Apache Camel

The answer is to use a splitter and pass the result on row by row.
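A minimal sketch of that approach, assuming the splitter runs in streaming mode and a hypothetical per-row bean, so the file is read line by line instead of being loaded into memory all at once:

```java
import org.apache.camel.builder.RouteBuilder;

public class StreamingCsvRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("file:data/inbox?fileName=big.csv")   // hypothetical input endpoint
            .split(body().tokenize("\n"))          // one exchange per CSV line
                .streaming()                       // read lazily, don't hold all lines in memory
                .to("bean:rowProcessor")           // hypothetical per-line processor
            .end();
    }
}
```

Each split exchange then carries a single line, which you can unmarshal with the CSV data format or parse yourself before passing it on.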
