How to combine 2 csv files by columns in Spoon (Pentaho) while managing data conversion?

Question

I ran into the following problem:

I have two inputs:
1) I have a basic csv file with 35 columns and corresponding headers.
2) I have many predefined files that are not controlled by me, which may or may not contain 35 columns, and, even worse, they may be out of order.

I need to map the columns from the second csv file to the columns in the first CSV file. If the second csv file does not have all 35 columns, I must create them in the correct order.

Once I have the correct csv file (one whose header matches the first csv header), I will pass it to a script, which processes the data by referring to the column headers.

One possible solution would be to read the existing input fields inside the script. However, I cannot do this, because the fields seem to be fixed by the existing column headers of the second csv file. So when I try to access a column that does not exist, I get an exception...

Any help would be greatly appreciated!

csv pentaho kettle

wleao Jul 18 '11 at 10:57

1 answer

simar · Answer 1 · 2015-09-07T14:16:10+0000

The phrase "fields in the second csv out of order" can mean several things:

1) The same source provides the csv file, but its fields change from time to time.
2) The field position (column number) differs between files provided by different sources.

The first case is really strange. The same source should provide the same layout every time, and if it does not, the decision logic can get very complex.

The second case looks more realistic. In this case, you can pad every source out to 35 fields. Then you need to identify the fields. Kettle has many tools for detecting data types, string manipulation, regular expressions, and so on.
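As a rough illustration of the kind of regular-expression check mentioned above, here is a minimal sketch in plain Java. It does not use the Kettle API. The class name `FieldTypeGuesser`, the type labels, and the three patterns are assumptions made for the example, and real data would probably need more patterns (locale-specific decimals, other date formats).

```java
import java.util.regex.Pattern;

// Hypothetical helper: guesses a coarse type label for one csv value.
final class FieldTypeGuesser {
    // Assumed formats: plain integers, dot-separated decimals, ISO dates.
    private static final Pattern INTEGER = Pattern.compile("[+-]?\\d+");
    private static final Pattern DECIMAL = Pattern.compile("[+-]?\\d+\\.\\d+");
    private static final Pattern ISO_DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    static String guess(String value) {
        String v = (value == null) ? "" : value.trim();
        if (v.isEmpty()) {
            return "empty";
        }
        if (INTEGER.matcher(v).matches()) {
            return "integer";
        }
        if (DECIMAL.matcher(v).matches()) {
            return "decimal";
        }
        if (ISO_DATE.matcher(v).matches()) {
            return "date";
        }
        // Anything unrecognised is treated as free text.
        return "string";
    }

    public static void main(String[] args) {
        for (String sample : new String[] {"42", "3.14", "2011-07-18", "abc"}) {
            System.out.println(sample + " -> " + guess(sample));
        }
    }
}
```

Running a check like this over a sample of rows per column gives a hint about which of the 35 expected fields an unlabeled column might be, but it cannot tell apart two columns of the same type.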

Actually, it sounds like you need automatic field detection.

But without real data, it is hard to see a pattern. If this field-detection logic can be implemented at the database level, it can also be implemented in Kettle.

Anyway, if the logic is really complicated, use a Java or JavaScript step (in Kettle, User Defined Java Class or Modified Java Script Value).
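For the case where the incoming files do carry headers, the core of the reordering from the question can be sketched in plain Java as below. This is only an illustration of the logic, not Kettle step code: the class `ColumnAligner`, the method `align`, and the choice of an empty string for missing columns are assumptions, and the header matching is assumed to be case-insensitive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: reorders a row to follow a reference header.
final class ColumnAligner {

    static List<String> align(List<String> referenceHeader,
                              List<String> sourceHeader,
                              List<String> sourceRow) {
        // Map each source column name to its position in the source file.
        Map<String, Integer> position = new HashMap<>();
        for (int i = 0; i < sourceHeader.size(); i++) {
            position.put(normalize(sourceHeader.get(i)), i);
        }

        List<String> aligned = new ArrayList<>(referenceHeader.size());
        for (String name : referenceHeader) {
            Integer idx = position.get(normalize(name));
            if (idx == null || idx >= sourceRow.size()) {
                // Column missing from the source (or row too short): placeholder.
                aligned.add("");
            } else {
                aligned.add(sourceRow.get(idx));
            }
        }
        return aligned;
    }

    // Assumption: headers match regardless of case and surrounding spaces.
    private static String normalize(String header) {
        return header.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        List<String> reference = Arrays.asList("id", "name", "city");
        List<String> header = Arrays.asList("city", "id");
        List<String> row = Arrays.asList("Lisbon", "7");
        // Prints: 7,,Lisbon
        System.out.println(String.join(",", align(reference, header, row)));
    }
}
```

Because the output always has exactly as many values as the reference header, a downstream script can refer to any of the 35 columns without hitting a missing-field exception.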
