Reindexing a DataFrame with duplicate index values

I imported and merged 4 CSV files into one DataFrame called data. However, when checking the DataFrame index with:

index_series = pd.Series(data.index.values)
index_series.value_counts()

I see that several index values have a count of 4. I want to completely reindex the DataFrame so that each row has a unique index value. I tried:

data.reindex(np.arange(len(data)))

which gave the error "ValueError: cannot reindex from a duplicate axis". A Google search suggests that this error occurs because up to 4 rows share the same index value. Any idea how I can do this reindexing without dropping any rows? I don't particularly care about the order of the rows, since I can always sort them.

UPDATE: In the end, I found a way to reindex the way I wanted:

data['index'] = np.arange(len(data))
data = data.set_index('index')

As I understand it, I just added a new "index" column to my DataFrame and then set that column as the index. As for my CSVs, they were the four CSVs in the "download credit details" section of this credit club credit history page.

1 answer

It is very easy to replicate your error using this data:

In [92]: data = pd.DataFrame( [33,55,88,22], columns=['x'], index=[0,0,1,2] )

In [93]: data.index.is_unique
Out[93]: False

In [94]: data.reindex(np.arange(len(data)))  # raises the same ValueError

The problem is that reindex requires unique index values. In this case, you do not want to keep the old index values; you just want new index values that are unique. The easiest way to do this:

In [95]: data.reset_index(drop=True)
Out[95]:
    x
0  33
1  55
2  88
3  22

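Note that if you omit drop=True, the old index values are kept as a new column. Also, reset_index returns a new DataFrame, so assign the result back to data if you want to keep it.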
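To make the difference concrete, here is a minimal sketch that reuses the toy data from above. It compares reset_index with and without drop=True, and also the column-based approach from the question's update. The names fresh, kept and manual are only illustrative, not part of the original post.

```python
import numpy as np
import pandas as pd

# Same toy data as above: the index label 0 appears twice.
data = pd.DataFrame([33, 55, 88, 22], columns=["x"], index=[0, 0, 1, 2])

# drop=True discards the old index; the result is labelled 0..n-1.
fresh = data.reset_index(drop=True)

# Without drop=True the old labels are kept as a column, named "index"
# here because the original index has no name.
kept = data.reset_index()

# The approach from the question's update gives the same row labels,
# except that the resulting index is named "index".
manual = data.copy()
manual["index"] = np.arange(len(manual))
manual = manual.set_index("index")

print(fresh.index.is_unique)   # True
print(kept.columns.tolist())   # ['index', 'x']
print(data.index.is_unique)    # False: the original frame is unchanged
```

Since reset_index does not modify data in place by default, write data = data.reset_index(drop=True) to keep the new index.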
