I have input data in a flattened file, and I want to normalize it by splitting it into tables. Can I do this neatly with pandas, that is, by reading the flattened data into a DataFrame and then applying some functions to get the resulting DataFrames?
Example:
The data arrives on my disk as a CSV file that looks like this:
ItemId  ClientId  PriceQuoted  ItemDescription
1       1         10           scroll of Sneak
1       2         12           scroll of Sneak
1       3         13           scroll of Sneak
2       2         2500         scroll of Invisible
2       4         2200         scroll of Invisible
I want to create two DataFrames:
ItemId  ItemDescription
1       scroll of Sneak
2       scroll of Invisible
and
ItemId  ClientId  PriceQuoted
1       1         10
1       2         12
1       3         13
2       2         2500
2       4         2200
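For concreteness, here is a minimal sketch of the split I have in mind, assuming the file is comma-separated with the header row shown above. The inline `csv_text` string stands in for the real file, and the names `flat`, `items`, and `quotes` are only illustrative. It relies on plain column selection plus `drop_duplicates`, which may or may not be the neatest way to do this in pandas:

```python
import io

import pandas as pd

# Stand-in for the real CSV file (assumed comma-separated); in practice
# this would be pd.read_csv(<path to the file>).
csv_text = """ItemId,ClientId,PriceQuoted,ItemDescription
1,1,10,scroll of Sneak
1,2,12,scroll of Sneak
1,3,13,scroll of Sneak
2,2,2500,scroll of Invisible
2,4,2200,scroll of Invisible
"""
flat = pd.read_csv(io.StringIO(csv_text))

# "One" side: each item appears once, keyed by ItemId.
items = (
    flat[["ItemId", "ItemDescription"]]
    .drop_duplicates()
    .reset_index(drop=True)
)

# "Many" side: one row per quote, referring back to the item by ItemId.
quotes = flat[["ItemId", "ClientId", "PriceQuoted"]].copy()

# Sanity check: joining the two tables on ItemId should rebuild the flat data.
rebuilt = quotes.merge(items, on="ItemId")
```

This only works cleanly if every `ItemId` maps to exactly one `ItemDescription`; if the flat file contained conflicting descriptions for the same item, `drop_duplicates` would keep both rows and `ItemId` would no longer be a unique key.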
If pandas only has a good solution for the simplest case (normalization yields two tables with a many-to-one relationship, as in the example above), that may be enough for my current needs. However, I may need a more general solution in the future.
max