How to remove a carriage return in a data frame

I have a data frame containing the columns id, country_name, location and total_deaths. While cleaning the data, I came across a value that contains '\r'. Once the cleaning is complete, I store the resulting data in destination.csv. Because the row below contains \r, it always creates a new line in the output file.

    id                           29
    location        Uttar Pradesh\r
    country_name              India
    total_deaths                 20

I want to remove \r. I tried df.replace({'\r': ''}, regex=True), but it does not work for me.

Is there any other solution? Can anyone help?

Edit:

In the process described above, I iterate over df to check whether \r is present. If it is, it needs to be replaced. Here row.replace() or row.str.strip() does not seem to work, or I may be using them incorrectly.

I do not want to specify a column name or row number when using replace(), because I cannot be sure that only the "location" column will contain \r. Please find the code below.

    count = 0
    for row_index, row in df.iterrows():
        if re.search(r"\\r", str(row)):
            print type(row)  # Return type is pandas.Series
            row.replace({r'\\r': ''}, regex=True)
            print row
            count += 1
2 answers

Another solution is to use str.strip:

    df['29'] = df['29'].str.strip(r'\\r')
    print df
                 id             29
    0      location  Uttar Pradesh
    1  country_name          India
    2  total_deaths             20
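One caveat with str.strip: its argument is a set of characters, not a substring, so it also removes a trailing letter r that belongs to the value. A minimal sketch of the difference, assuming Python 3, a reasonably recent pandas, and made-up sample values (the "Kashmir" entry is hypothetical and only there to show the effect):

```python
import pandas as pd

# Hypothetical values ending in the literal two-character text backslash + r.
s = pd.Series(["Uttar Pradesh\\r", "Kashmir\\r"])

# strip() treats its argument as a set of characters (backslash and 'r'),
# so it keeps stripping until it meets a character outside that set.
stripped = s.str.strip(r"\\r")

# An anchored regex removes only the trailing backslash + r sequence.
replaced = s.str.replace(r"\\r$", "", regex=True)

print(stripped.tolist())  # ['Uttar Pradesh', 'Kashmi']
print(replaced.tolist())  # ['Uttar Pradesh', 'Kashmir']
```

If values can legitimately end in r, the anchored replace is probably the safer choice.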

If you want to use replace, add the r prefix (a raw string) and one more \ so that the pattern matches the literal text \r:

    print df.replace({r'\\r': ''}, regex=True)
                 id             29
    0      location  Uttar Pradesh
    1  country_name          India
    2  total_deaths             20
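Which pattern works depends on whether the cell holds the literal two-character text \r or a real carriage-return character. A minimal sketch that handles both at once without naming a column, assuming Python 3, a reasonably recent pandas, and made-up sample data:

```python
import pandas as pd

# Hypothetical sample data: the first value ends with the literal text
# backslash + r, the second with a real carriage-return character.
df = pd.DataFrame({
    "location": ["Uttar Pradesh\\r", "Orissa\r", "Assam"],
    "total_deaths": [20, 69, 679],
})

# In the regex, \\r matches a literal backslash followed by 'r',
# while \r matches the carriage-return character itself.
cleaned = df.replace({r"\\r|\r": ""}, regex=True)

print(cleaned["location"].tolist())  # ['Uttar Pradesh', 'Orissa', 'Assam']
```

Numeric columns are left untouched, because the regex replacement only applies to string values.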

In replace you can define the column to replace, for example:

    print df
                   id               29
    0        location  Uttar Pradesh\r
    1    country_name            India
    2  total_deaths\r               20

    print df.replace({'29': {r'\\r': ''}}, regex=True)
                   id             29
    0        location  Uttar Pradesh
    1    country_name          India
    2  total_deaths\r             20

    print df.replace({r'\\r': ''}, regex=True)
                 id             29
    0      location  Uttar Pradesh
    1  country_name          India
    2  total_deaths             20

EDIT based on comments:

    import pandas as pd

    df = pd.read_csv('data_source_test.csv')
    print df
       id  country_name           location  total_deaths
    0   1         India          New Delhi           354
    1   2         India         Tamil Nadu            48
    2   3         India          Karnataka             0
    3   4         India      Andra Pradesh            32
    4   5         India              Assam           679
    5   6         India             Kerala           128
    6   7         India             Punjab             0
    7   8         India      Mumbai, Thane             1
    8   9         India  Uttar Pradesh\r\n            20
    9  10         India             Orissa            69

    print df.replace({r'\r\n': ''}, regex=True)
       id  country_name       location  total_deaths
    0   1         India      New Delhi           354
    1   2         India     Tamil Nadu            48
    2   3         India      Karnataka             0
    3   4         India  Andra Pradesh            32
    4   5         India          Assam           679
    5   6         India         Kerala           128
    6   7         India         Punjab             0
    7   8         India  Mumbai, Thane             1
    8   9         India  Uttar Pradesh            20
    9  10         India         Orissa            69
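To check that the cleaned frame no longer breaks a row when it is written out, you can count the lines of the resulting CSV. A rough sketch, assuming Python 3 and a reasonably recent pandas; the two-row frame is made up, and an in-memory buffer stands in for destination.csv:

```python
import io

import pandas as pd

# Hypothetical two-row sample; the first location ends with a real CR LF.
df = pd.DataFrame({
    "id": [9, 10],
    "country_name": ["India", "India"],
    "location": ["Uttar Pradesh\r\n", "Orissa"],
    "total_deaths": [20, 69],
})

cleaned = df.replace({r"\r\n": ""}, regex=True)

# Write to an in-memory buffer instead of destination.csv.
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)
lines = buffer.getvalue().splitlines()

# One header line plus one line per row is expected.
print(len(lines))  # 3
```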

If you need to replace only the location column:

    df['location'] = df.location.str.replace(r'\r\n', '')
    print df
       id  country_name       location  total_deaths
    0   1         India      New Delhi           354
    1   2         India     Tamil Nadu            48
    2   3         India      Karnataka             0
    3   4         India  Andra Pradesh            32
    4   5         India          Assam           679
    5   6         India         Kerala           128
    6   7         India         Punjab             0
    7   8         India  Mumbai, Thane             1
    8   9         India  Uttar Pradesh            20
    9  10         India         Orissa            69
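Note for newer pandas releases: Series.str.replace treats the pattern as a plain string by default (since pandas 2.0), so r'\r\n' would no longer match a real CR LF unless regex=True is passed. A minimal sketch under that assumption, using Python 3 and made-up sample data:

```python
import pandas as pd

# Hypothetical sample; the first location ends with a real CR LF.
df = pd.DataFrame({
    "id": [9, 10],
    "location": ["Uttar Pradesh\r\n", "Orissa"],
})

# regex=True makes pandas interpret r"\r\n" as a regular expression,
# i.e. a carriage return followed by a line feed.
df["location"] = df["location"].str.replace(r"\r\n", "", regex=True)

print(df["location"].tolist())  # ['Uttar Pradesh', 'Orissa']
```

An alternative that needs no regex at all is to pass the actual characters, for example str.replace("\r\n", "") with a normal (non-raw) string.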

Use str.replace. You need to escape the backslash so that the pattern matches the literal two-character text \r rather than a carriage-return character:

    In [15]:
    df['29'] = df['29'].str.replace(r'\\r','')
    df

    Out[15]:
                 id             29
    0      location  Uttar Pradesh
    1  country_name          India
    2  total_deaths             20