Replacing blank values (white space) with NaN in pandas

I want to find all values in a pandas dataframe that contain white space (any arbitrary amount) and replace those values with NaN.

Any ideas how this can be improved?

Basically I want to turn this:

                    A    B  C
 2000-01-01 -0.532681  foo  0
 2000-01-02  1.490752  bar  1
 2000-01-03 -1.387326  foo  2
 2000-01-04  0.814772  baz
 2000-01-05 -0.222552       4
 2000-01-06 -1.176781  qux

Into this:

                    A    B    C
 2000-01-01 -0.532681  foo    0
 2000-01-02  1.490752  bar    1
 2000-01-03 -1.387326  foo    2
 2000-01-04  0.814772  baz  NaN
 2000-01-05 -0.222552  NaN    4
 2000-01-06 -1.176781  qux  NaN

I managed to do this with the code below, but man is it ugly. It is not Pythonic, and I'm sure it is not the most efficient use of pandas either. I loop through each column and do a boolean replacement against a column mask, generated by applying a function that runs a regex search on each value and matches on white space.

 for i in df.columns:
     df[i][df[i].apply(lambda i: True if re.search('^\s*$', str(i)) else False)] = None

This could be optimized a bit by only iterating over fields that could contain empty strings:

 if df[i].dtype == np.dtype('object') 

But that is not much of an improvement.

And finally, this code sets the target strings to None, which works with pandas functions like fillna(), but it would be nice for completeness if I could insert NaN directly instead of None.

+123
python pandas dataframe
Nov 18 '12 at 22:22
12 answers

I think df.replace() does the job:

 import numpy as np
 import pandas as pd

 df = pd.DataFrame([
     [-0.532681, 'foo', 0],
     [1.490752, 'bar', 1],
     [-1.387326, 'foo', 2],
     [0.814772, 'baz', ' '],
     [-0.222552, ' ', 4],
     [-1.176781, 'qux', ' '],
 ], columns='A B C'.split(), index=pd.date_range('2000-01-01', '2000-01-06'))

 # replace fields that are entirely whitespace (or empty) with NaN
 print(df.replace(r'^\s*$', np.nan, regex=True))

Produces:

                    A    B    C
 2000-01-01 -0.532681  foo    0
 2000-01-02  1.490752  bar    1
 2000-01-03 -1.387326  foo    2
 2000-01-04  0.814772  baz  NaN
 2000-01-05 -0.222552  NaN    4
 2000-01-06 -1.176781  qux  NaN



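As Temak pointed out, use df.replace(r'^\s+$', np.nan, regex=True) in case your valid data contains white space. That pattern matches only cells made up of one or more whitespace characters, so empty strings are left untouched.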
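As a rough end-to-end sketch of this answer (the date index is dropped for brevity, and the pd.to_numeric step and the names cleaned and blank_counts are my own additions, not part of the answer), the replacement can be checked by counting the NaN cells per column:

```python
import numpy as np
import pandas as pd

# Same data as the example above; columns B and C contain blank cells.
df = pd.DataFrame(
    [
        [-0.532681, "foo", 0],
        [1.490752, "bar", 1],
        [-1.387326, "foo", 2],
        [0.814772, "baz", " "],
        [-0.222552, " ", 4],
        [-1.176781, "qux", " "],
    ],
    columns=["A", "B", "C"],
)

# Cells that are empty or whitespace-only become NaN; all other cells are untouched.
cleaned = df.replace(r"^\s*$", np.nan, regex=True)

# Column C held a mix of ints and strings; with the blanks gone it can be numeric.
cleaned["C"] = pd.to_numeric(cleaned["C"])

blank_counts = cleaned.isna().sum()
print(blank_counts)
```

Converting C afterwards is optional; it only matters if the column is meant to be used for arithmetic.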

+151
Feb 21 '14 at 18:48

If you want to replace empty strings and entries that contain only spaces, the correct answer is:

 df = df.replace(r'^\s*$', np.nan, regex=True) 

The accepted answer

 df.replace(r'\s+', np.nan, regex=True) 

does not replace an empty string! You can try it yourself with this slightly updated example:

 df = pd.DataFrame([
     [-0.532681, 'foo', 0],
     [1.490752, 'bar', 1],
     [-1.387326, 'fo o', 2],
     [0.814772, 'baz', ' '],
     [-0.222552, ' ', 4],
     [-1.176781, 'qux', ''],
 ], columns='A B C'.split(), index=pd.date_range('2000-01-01', '2000-01-06'))

Note also that, with the anchored pattern, "fo o" is not replaced with NaN even though it contains a space. Further note that a simple:

 df.replace(r'', np.nan)

does not work either; try it out.
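To make the difference between the two anchored patterns concrete, here is a small sketch on a throwaway Series (the names samples, star, plus and comparison are illustrative only):

```python
import numpy as np
import pandas as pd

samples = pd.Series(["", " ", "   ", "fo o", "foo"])

# ^\s*$ : zero or more whitespace characters and nothing else, so "" matches too.
star = samples.replace(r"^\s*$", np.nan, regex=True)

# ^\s+$ : one or more whitespace characters and nothing else, so "" is skipped.
plus = samples.replace(r"^\s+$", np.nan, regex=True)

# Side-by-side view of which values each pattern turned into NaN.
comparison = pd.DataFrame({"value": samples, "star": star.isna(), "plus": plus.isna()})
print(comparison)
```

Neither pattern touches "fo o", because the anchors require the whole cell to be white space.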

+37
Dec 14 '17 at 10:20

What about:

 d = d.applymap(lambda x: np.nan if isinstance(x, basestring) and x.isspace() else x) 

The applymap function applies a function to every cell of the dataframe. Note that basestring exists only in Python 2; on Python 3, use str instead.
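A possible Python 3 version of the same idea, as a sketch: the helper name blank_to_nan and the sample frame are made up for illustration, and the fallback is there because newer pandas releases (2.1 and later) offer DataFrame.map as the element-wise method and deprecate applymap.

```python
import numpy as np
import pandas as pd


def blank_to_nan(value):
    """Return NaN for whitespace-only strings, otherwise the value unchanged."""
    # str takes the place of Python 2's basestring; non-strings skip isspace().
    if isinstance(value, str) and value.isspace():
        return np.nan
    return value


d = pd.DataFrame({"A": [1.5, 2.5], "B": ["foo", "  "], "C": [0, " "]})

# Use DataFrame.map where it exists, otherwise fall back to applymap.
elementwise = d.map if hasattr(d, "map") else d.applymap
d = elementwise(blank_to_nan)
print(d)
```

Because "".isspace() is False, empty strings are not replaced by this version; only strings made up of one or more whitespace characters are.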

+32
Nov 18 '12 at 23:15

I would do the following:

 df = df.apply(lambda x: x.str.strip()).replace('', np.nan) 

or, to leave non-string cells untouched, strip cell by cell:

 df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x).replace('', np.nan)

This strips every string and then replaces the resulting empty strings with np.nan.

+14
Apr 29 '16 at 9:34

The simplest of all solutions:

 df = df.replace(r'^\s+$', np.nan, regex=True) 
+6
Mar 22 '18 at 14:44

If you are importing the data from a CSV file, it can be as simple as this:

 df = pd.read_csv(file_csv, na_values=' ') 

This creates the dataframe and reads the blank values as NaN.
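A self-contained sketch of that call, with an in-memory string standing in for file_csv (the CSV content below is invented for illustration):

```python
import io

import pandas as pd

# Hypothetical CSV content; two fields hold exactly one space.
csv_text = "A,B,C\n-0.53,foo,0\n0.81,baz, \n-0.22, ,4\n"

# Fields equal to a single space are treated as missing while parsing.
df = pd.read_csv(io.StringIO(csv_text), na_values=" ")
print(df)
print(df.dtypes)
```

As far as I can tell, na_values is compared against the whole field, so a field with a different number of spaces would need its own entry in na_values or a replace step after loading.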

+5
Jan 07 '18 at 16:07

For a very quick and easy solution where you check equality against a single value, you can use the mask method.

 df.mask(df == ' ') 
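A short sketch of what the equality mask covers and one way it might be widened. The is_blank helper and the sample frame are assumptions of mine, not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame({"A": [0.5, 1.5, 2.5], "B": ["foo", " ", "bar"], "C": [0, "   ", ""]})

# Exact comparison: only cells equal to a single space are masked.
single_space = df.mask(df == " ")


def is_blank(col):
    # True for strings that are empty once stripped; non-string cells give False.
    return col.map(lambda v: isinstance(v, str) and v.strip() == "")


# Broader condition: mask every empty or whitespace-only string.
any_blank = df.mask(df.apply(is_blank))
print(any_blank)
```

The equality version is the quickest to write, but it misses the three-space and empty cells in column C, which is why the broader mask is shown next to it.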
+1
Nov 03 '17 at 22:48

You can also use a filter to do this.

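 df = pd.DataFrame([
     [-0.532681, 'foo', 0],
     [1.490752, 'bar', 1],
     [-1.387326, 'foo', 2],
     [0.814772, 'baz', ' '],
     [-0.222552, ' ', 4],
     [-1.176781, 'qux', ' '],
 ])
 df[df == ' '] = np.nan                  # boolean filter: blank cells become NaN
 df[[0, 2]] = df[[0, 2]].astype(float)   # the numeric columns can now be cast to float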
+1
Feb 01 '18 at 10:14

 print(df.isnull().sum())            # check the number of null values in each column
 modifiedDf = df.fillna("NaN")       # replace empty/null values with "NaN"
 # modifiedDf = df.dropna()          # remove rows with empty values
 print(modifiedDf.isnull().sum())    # check the number of null values in each column
0
Sep 29 '18 at 20:31

This is not an elegant solution, but what does seem to work is saving to XLSX and then importing it back. The other solutions on this page did not work for me, and I don't know why.

 data.to_excel(filepath, index=False)
 data = pd.read_excel(filepath)
0
Jan 14 '19 at 5:02

These are all close to the right answer, but I would not say any of them solves the problem while remaining the most readable to others reading your code. I would say the answer is a combination of BrenBarn's answer and tuomasttik's comment below that answer. BrenBarn's answer uses the built-in isspace, but it does not support replacing empty strings, as the OP requested, and I would tend to regard that as the standard use case of replacing strings with null.

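I rewrote it with .apply so you can call it on a pd.Series. Note that on a pd.DataFrame, .apply passes whole columns rather than single cells to the function, so the isinstance check never succeeds there; use .applymap for a dataframe.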




Python 3:

To replace empty strings or strings made up entirely of spaces:

 df = df.apply(lambda x: np.nan if isinstance(x, str) and (x.isspace() or not x) else x) 

To replace only strings made up entirely of spaces:

 df = df.apply(lambda x: np.nan if isinstance(x, str) and x.isspace() else x) 



To use this in Python 2, you need to replace str with basestring .

Python 2:

To replace empty strings or strings made up entirely of spaces:

 df = df.apply(lambda x: np.nan if isinstance(x, basestring) and (x.isspace() or not x) else x) 

To replace only strings made up entirely of spaces:

 df = df.apply(lambda x: np.nan if isinstance(x, basestring) and x.isspace() else x) 
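For reference, a sketch of how the Python 3 variant behaves on a Series and how it could be routed through each column of a dataframe. The helper name blank_to_nan and the sample data are illustrative, not taken from the answer:

```python
import numpy as np
import pandas as pd


def blank_to_nan(x):
    # Python 3 variant: NaN for empty or whitespace-only strings, else x unchanged.
    return np.nan if isinstance(x, str) and (x.isspace() or not x) else x


s = pd.Series(["foo", "", "   ", 3])
df = pd.DataFrame({"B": ["bar", " ", ""], "C": [1, "  ", 2]})

# Series.apply calls the function once per value.
s_clean = s.apply(blank_to_nan)

# DataFrame.apply calls its function once per column, so apply per value inside it.
df_clean = df.apply(lambda col: col.apply(blank_to_nan))
print(s_clean)
print(df_clean)
```

Dropping the `or not x` part gives the whitespace-only variant, which leaves empty strings in place.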
0
May 12 '19 at 4:05

I tried this code and it worked for me: df.applymap(lambda x: "NaN" if x == "" else x)

-1
Aug 07 '19 at 20:53