I imported and merged four CSV files into one DataFrame called data. However, when checking the DataFrame index with:
index_series = pd.Series(data.index.values)
index_series.value_counts()
I see that several index values have a count of 4. I want to completely reindex the DataFrame so that each row has a unique index value. I tried:
data.reindex(np.arange(len(data)))
which gave the error "ValueError: cannot reindex from a duplicate axis". A Google search suggests that this error occurs because up to 4 rows share the same index value. Any idea how I can do this reindexing without dropping any rows? I don't particularly care about the order of the rows, since I can always sort them afterwards.
UPDATE: In the end, I found a way to reindex the way I wanted:
data['index'] = np.arange(len(data))
data = data.set_index('index')
As I understand it, I just added a new "index" column to my DataFrame and then set that column as the index. As for my CSVs, they were the four CSV files in the "download credit details" section of this credit club credit history page.
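For comparison, here is a minimal sketch of the same renumbering done with built-in pandas calls. It assumes the four files were combined with pd.concat (the post only says "merged", so that is a guess), and the parts list and the value column are made-up stand-ins for the real CSV data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the four CSV files; each frame has its own 0..1 index.
parts = [pd.DataFrame({"value": [10 * i, 10 * i + 1]}) for i in range(4)]

# Plain concatenation keeps each frame's original labels, so every label repeats 4 times.
data = pd.concat(parts)
print(data.index.is_unique)  # False: the index contains duplicates

# Approach from the update above: add a running counter column and make it the index.
via_column = data.copy()
via_column["index"] = np.arange(len(via_column))
via_column = via_column.set_index("index")

# Alternative: discard the old labels and renumber the rows 0..n-1 in one call.
via_reset = data.reset_index(drop=True)

# Alternative: avoid the duplicates in the first place while concatenating.
via_concat = pd.concat(parts, ignore_index=True)
```

All three results keep every row and end up with unique labels running from 0 to len(data) - 1. The only visible difference is that the set_index version names its index "index", while the other two leave the index unnamed.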