I have two CSV files with the same column names but different values.
The first column is the index (time), and one of the data columns is a unique identifier (id).
The index ( time ) is different for each csv file.
I read the data into two DataFrames using read_csv, which gives me the following:
+------+-----+------+-------+
|      | id  | size | price |
+------+-----+------+-------+
| time |     |      |       |
+------+-----+------+-------+
| t0   | ID1 | 10   | 110   |
| t2   | ID1 | 12   | 109   |
| t6   | ID1 | 20   | 108   |
+------+-----+------+-------+

+------+-----+------+-------+
|      | id  | size | price |
+------+-----+------+-------+
| time |     |      |       |
+------+-----+------+-------+
| t1   | ID2 | 9    | 97    |
| t3   | ID2 | 15   | 94    |
| t5   | ID2 | 13   | 100   |
+------+-----+------+-------+
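For anyone who wants to reproduce this, here is a minimal sketch that builds the same two frames without the actual files. The inline CSV strings and the names csv1 and csv2 are stand-ins I made up for the example; the real data comes from files on disk.

```python
import io

import pandas as pd

# Hypothetical CSV contents that mirror the two tables above
csv1 = """time,id,size,price
t0,ID1,10,110
t2,ID1,12,109
t6,ID1,20,108
"""
csv2 = """time,id,size,price
t1,ID2,9,97
t3,ID2,15,94
t5,ID2,13,100
"""

# index_col="time" makes the first column the index, as described above
df1 = pd.read_csv(io.StringIO(csv1), index_col="time")
df2 = pd.read_csv(io.StringIO(csv2), index_col="time")

print(df1)
print(df2)
```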
I would like to create a single large DataFrame with entries from both, and use ffill to forward-fill values from the previous time step.
I can achieve this using a combination of concat, sort and ffill.
However, this requires first renaming the columns of one of the DataFrames so that there are no name conflicts:
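df2.columns = ['id', 'id2_size', 'id2_price']
# sort_index() is used here because DataFrame.sort() is no longer available in recent pandas versions
df = pd.concat([df1, df2]).sort_index().ffill()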
This results in the following DataFrame:
+------+-----+------+-------+----------+-----------+
|      | id  | size | price | id2_size | id2_price |
+------+-----+------+-------+----------+-----------+
| time |     |      |       |          |           |
+------+-----+------+-------+----------+-----------+
| t0   | ID1 | 10   | 110   | nan      | nan       |
| t1   | ID2 | 10   | 110   | 9        | 97        |
| t2   | ID1 | 12   | 109   | 9        | 97        |
| t3   | ID2 | 12   | 109   | 15       | 94        |
| t5   | ID2 | 12   | 109   | 13       | 100       |
| t6   | ID1 | 20   | 108   | 13       | 100       |
+------+-----+------+-------+----------+-----------+
My current method is pretty clunky in that I need to rename the columns of one of the DataFrames.
I believe the best way to represent the data would be to use a MultiIndex on the columns, with the top-level values coming from the id column.
The resulting DataFrame would look like this:
+------+--------------+--------------+
|      | ID1          | ID2          |
+------+------+-------+------+-------+
|      | size | price | size | price |
+------+------+-------+------+-------+
| time |      |       |      |       |
+------+------+-------+------+-------+
| t0   | 10   | 110   | nan  | nan   |
| t1   | 10   | 110   | 9    | 97    |
| t2   | 12   | 109   | 9    | 97    |
| t3   | 12   | 109   | 15   | 94    |
| t5   | 12   | 109   | 13   | 100   |
| t6   | 20   | 108   | 13   | 100   |
+------+------+-------+------+-------+
Is this possible?
If so, what steps are required to go from the two DataFrames read from CSV to the final combined, multi-indexed DataFrame?
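To make the target concrete, here is a rough sketch of one approach that might work: passing a dict of frames to pd.concat with axis=1, so that the dict keys become the top column level. It assumes each frame contains exactly one id value, and it builds the sample frames inline instead of reading them from CSV. The names frames and combined are only illustrative, and I am not sure this is the idiomatic way to do it.

```python
import pandas as pd

# Sample frames matching the tables in the question (built inline, not read from CSV)
df1 = pd.DataFrame(
    {"id": ["ID1"] * 3, "size": [10, 12, 20], "price": [110, 109, 108]},
    index=pd.Index(["t0", "t2", "t6"], name="time"),
)
df2 = pd.DataFrame(
    {"id": ["ID2"] * 3, "size": [9, 15, 13], "price": [97, 94, 100]},
    index=pd.Index(["t1", "t3", "t5"], name="time"),
)

# Assumption: every frame holds a single id, so its first value can serve as the key.
# The id column is dropped because the id moves into the column MultiIndex.
frames = {df["id"].iloc[0]: df.drop(columns="id") for df in (df1, df2)}

# axis=1 places the frames side by side on the union of the time index;
# the dict keys become the outer level of the column MultiIndex.
combined = pd.concat(frames, axis=1).sort_index().ffill()

print(combined)
```

With this layout no renaming is needed, and a single id can be selected with combined["ID1"]. The values come back as floats because of the leading NaN entries for ID2.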