ValueError: Index contains duplicate entries, cannot reshape

I am trying to reshape a pandas DataFrame with the following call:

ar = ar.pivot(index='Received', columns='Merch Ref', values='acceptance_rate') 

The data set looks like this:

      Merch Ref             Received  acceptance_rate
  0          SF  2014-08-28 15:38:00                0
  1          SF  2014-08-28 15:44:00                0
  2          SF  2014-08-28 16:04:00                0
  3          WF  2014-08-28 16:05:00                0
  4          WF  2014-08-28 16:07:00                0
  5          SF  2014-08-28 16:34:00                0
  6          SF  2014-08-28 16:55:00                0
  7          BF  2014-08-28 17:59:00                0
  8          BF  2014-08-29 15:05:00                0
  9          SF  2014-08-29 21:25:00                0
  10         SF  2014-08-30 10:29:00                0
  ...

I would like to receive:

                       SF  WF  BF
  2014-08-28 15:38:00   0   1   0
  2014-08-28 15:44:00   0   1   0
  2014-08-28 16:04:00   0   0   1
  2014-08-28 16:05:00   1   1   0
  2014-08-28 16:07:00   0   0   1
  2014-08-28 16:34:00   1   1   0
  2014-08-28 16:55:00   1   1   0
  2014-08-28 17:59:00   0   1   0
  2014-08-29 15:05:00   0   0   1
  2014-08-29 21:25:00   0   0   1
  2014-08-30 10:29:00   0   1   0

However, I get the error message:

  ValueError: Index contains duplicate entries, cannot reshape 

This is because I have several orders placed at the same time. Is there any way to sum / aggregate these orders?

1 answer

As you determined, the error comes from duplicate (x, y) pairs, where x is a value in Received and y is a value in Merch Ref.

If you want to sum the duplicates, use

 ar.pivot_table(index='Received', columns='Merch Ref', values='acceptance_rate', aggfunc='sum') 

The default aggregation function is the mean, i.e.

 ar.pivot_table(index='Received', columns='Merch Ref', values='acceptance_rate') 

will pivot the table, and all records that share the same (x, y) pair will be aggregated by taking their mean.
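To make the difference concrete, here is a minimal sketch. The sample rows are made up for illustration (they are not the asker's data), and `fill_value=0` is my own addition to turn missing combinations into 0, which seems to match the desired output but is an assumption.

```python
import pandas as pd

# Hypothetical sample: the first two rows share the same (Received, Merch Ref) pair.
ar = pd.DataFrame({
    "Merch Ref": ["SF", "SF", "WF", "BF"],
    "Received": pd.to_datetime([
        "2014-08-28 15:38:00",
        "2014-08-28 15:38:00",
        "2014-08-28 16:05:00",
        "2014-08-28 17:59:00",
    ]),
    "acceptance_rate": [1, 0, 1, 0],
})

# pivot() cannot decide which of the two duplicate rows to keep, so it raises.
try:
    ar.pivot(index="Received", columns="Merch Ref", values="acceptance_rate")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() aggregates duplicates instead; here they are summed.
summed = ar.pivot_table(
    index="Received",
    columns="Merch Ref",
    values="acceptance_rate",
    aggfunc="sum",
    fill_value=0,  # assumption: show 0 where a merchant has no order at that time
)

# Without aggfunc, duplicates are averaged (the default is the mean).
averaged = ar.pivot_table(
    index="Received", columns="Merch Ref", values="acceptance_rate"
)

print(summed)
print(averaged)
```

Whether sum or mean is the right choice depends on what acceptance_rate means for simultaneous orders; for a rate, the mean is probably closer to what you want.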

Note: I initially got the same error, but after iterating through the (x, y) pairs I did not find any duplicates. It turned out that some of the pairs had the form (nan, nan) and were skipped by the iteration. So if you are debugging pairs that you believe to be unique, check the key columns for NaN values with pd.isnull or pd.notnull.
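One way to run that check without iterating by hand is sketched below. The data and the variable names (`keys`, `dupes`, `missing`, `clean`) are hypothetical; the idea is simply to list the rows whose key pair repeats and the rows whose key columns contain NaN/NaT before pivoting.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: rows 0 and 1 repeat a key pair, row 3 has missing keys.
ar = pd.DataFrame({
    "Merch Ref": ["SF", "SF", "WF", np.nan],
    "Received": pd.to_datetime([
        "2014-08-28 15:38:00",
        "2014-08-28 15:38:00",
        "2014-08-28 16:05:00",
        None,
    ]),
    "acceptance_rate": [1, 0, 1, 0],
})

keys = ["Received", "Merch Ref"]

# Every row whose (Received, Merch Ref) pair occurs more than once.
dupes = ar[ar.duplicated(subset=keys, keep=False)]

# Every row where at least one key column is NaN/NaT.
missing = ar[ar[keys].isnull().any(axis=1)]

# Drop rows with missing keys before pivoting, if they are not meaningful.
clean = ar.dropna(subset=keys)

print(dupes)
print(missing)
```

If `missing` is not empty, decide whether those rows should be dropped or filled before calling pivot or pivot_table.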
