Assigning containers to pandas cells

I want to replace the None entries in a specific column in Pandas with an empty list.

Please note that some entries in this column may already contain an empty list, and I do not want to touch those.

I tried:

 indices = np.equal(df[col],None)
 df[col][indices] = []

and

 indices = np.equal(df[col],None)
 df[col][indices] = list()

but both attempts fail with:

 ValueError: Length of replacements must equal series length 

Why? How can I set these specific rows to an empty list?

+1
python pandas
2 answers

Assigning a list as the value of a cell is not allowed, and storing lists inside a pandas object is not recommended at all.

You can do it if you construct the frame from scratch.

 In [50]: DataFrame({ 'A' : [[],[],1]})
 Out[50]:
     A
 0  []
 1  []
 2   1

 [3 rows x 1 columns]

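The reason it is not allowed on assignment is that a list on the right-hand side is treated as a sequence of values to place positionally, without index alignment (as in numpy), so you can do something like this: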

 In [51]: df = DataFrame({ 'A' : [1,2,3] })

 In [52]: df.loc[df['A'] == 2] = [ 5 ]

 In [53]: df
 Out[53]:
    A
 0  1
 1  5
 2  3

 [3 rows x 1 columns]

You can perform an assignment in which the number of True values in the mask equals the length of the list/tuple/ndarray on the rhs (i.e. the value you are setting). Pandas allows this, as well as a length that exactly equals that of the lhs, and a scalar. Anything else is expressly forbidden because it is ambiguous (for example, do you want it aligned or not?).

For example, imagine:

 In [54]: df = DataFrame({ 'A' : [1,2,3] })

 In [55]: df.loc[df['A']<3] = [5]
 ValueError: cannot set using a list-like indexer with a different length than the value

A list/tuple/ndarray of length 0 is treated as an error, not because the assignment is impossible, but because it is usually a user error and it is unclear what the intended result is.

Bottom line: do not store lists inside a pandas object. It is inefficient and makes interpretation difficult or impossible.
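The two unambiguous cases described above (a list whose length matches the number of selected rows, and a scalar) can be sketched as follows. The frame, the mask, and the values are made up for illustration, and exact behaviour for other shapes may differ between pandas versions.

```python
import pandas as pd

# Illustrative frame; the column name and values are assumptions.
df = pd.DataFrame({"A": [1, 2, 3]})
mask = df["A"] < 3  # selects the first two rows

# Case 1: list-like whose length equals the number of True values in the mask.
df.loc[mask, "A"] = [10, 20]
after_list = df["A"].tolist()

# Case 2: a scalar is broadcast to every selected row.
df.loc[mask, "A"] = 0
after_scalar = df["A"].tolist()

print(after_list)
print(after_scalar)
```

In both cases the row that the mask does not select is left unchanged.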

+6

Edit: I have kept my original answer below, but I posted it without testing, and it does not actually work for me.

 import pandas as pd
 import numpy as np

 ser1 = pd.Series(['hi',None,np.nan])
 ser2 = pd.Series([5,7,9])
 df = pd.DataFrame([ser1,ser2]).T

This is janky, I know. Also, apparently the DataFrame constructor (but not the Series constructor) casts None to np.nan. I have no idea why, so I set the None back explicitly:

 df.loc[1,0] = None 

So now we have

       0  1
 0  'hi'  5
 1  None  7
 2   NaN  9

 df.columns = ['col1','col2']
 mask = np.equal(df['col1'], None)
 df.loc[mask, 'col1'] = []

But this does nothing: the data frame looks the same as before. I am following the usage recommended in the documentation, and assigning basic types (strings and numbers) works. So it seems to me that the problem is assigning objects to data frame entries. I have no idea what is going on.
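One way to sidestep masked assignment altogether, which is not part of either original answer, is to rebuild the column element by element. The sketch below assumes an object-dtype column named col1 (a made-up example mirroring the frame above) and replaces only entries that are exactly None, leaving NaN values and existing lists untouched.

```python
import numpy as np
import pandas as pd

# Illustrative data: a string, a None, a NaN and an already-empty list.
col1 = pd.Series(["hi", None, np.nan, []], dtype=object)
col2 = pd.Series([5, 7, 9, 11])
df = pd.DataFrame({"col1": col1, "col2": col2})

# Build a new column: a fresh empty list for each None, everything else as is.
df["col1"] = df["col1"].apply(lambda v: [] if v is None else v)

print(df)
```

Because the lambda creates a new list on every call, the replaced cells do not share a single list object. The earlier advice still applies: list-valued cells are slow and awkward to work with.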


(Original answer)

Two things:

  1. I am not familiar with np.equal, but pandas.isnull() should also work if you want to catch all null values.
  2. You are doing what is called "chained assignment". I do not fully understand the problem, but I know that it does not work. See the docs.

Try this:

 mask = pandas.isnull(df[col])
 df.loc[mask, col] = list()

Or, if you want to catch only None, not np.nan:

 mask = np.equal(df[col].values, None)
 df.loc[mask, col] = list()

Note: while pandas.isnull works with None on data frames, Series, and arrays as expected, numpy.equal only works as expected with data frames and arrays. For a pandas Series containing None, it will not return True for any of the entries. This is because None only selectively behaves like np.nan. See the pandas issue "BUG: None is not equal to None", #20442.
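The difference between catching all nulls and catching only None can be illustrated with a small sketch. It uses an identity check (v is None) in place of np.equal, which is a substitution made here to avoid the comparison issue mentioned in the note, and the Series contents are made up.

```python
import numpy as np
import pandas as pd

# Illustrative object-dtype Series holding a string, a None and a NaN.
ser = pd.Series(["hi", None, np.nan], dtype=object)

# pd.isnull flags both None and NaN.
null_mask = pd.isnull(ser)

# An element-wise identity check flags only None.
none_mask = ser.map(lambda v: v is None)

print(null_mask.tolist())
print(none_mask.tolist())
```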

+1