How to remove a carriage return in a data frame

Question

I have a DataFrame with columns named id, country_name, location and total_deaths. During data cleaning, I came across a value that contains '\r'. Once the cleaning is done, I write the resulting data to destination.csv. Because that value contains \r, it always creates a new line in the output file.

    id                           29
    location        Uttar Pradesh\r
    country_name              India
    total_deaths                 20

I want to remove the \r. I tried df.replace({'\r': ''}, regex=True), but this does not work for me.

Is there another solution? Can anyone help?

Edit:

In the process described above, I iterate over df to check whether \r is present. If it is, it needs to be replaced. Here row.replace() or row.str.strip() does not seem to work, or I may be using them incorrectly.

I do not want to specify a column name or row number when using replace(), because I cannot be sure that it is the "location" column that will contain \r. Please find the code below.

    import re

    count = 0
    for row_index, row in df.iterrows():
        if re.search(r"\\r", str(row)):
            print(type(row))  # the type is pandas.Series
            row.replace({r'\\r': ''}, regex=True)
            print(row)
            count += 1

python pandas replace carriage-return data-cleaning

Saranya krishnamurthy May 11 '16 at 11:13

2 answers

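Use str.replace. The backslash has to be escaped: as a regular expression, r'\\r' matches the literal two-character sequence \r (a backslash followed by r), whereas r'\r' matches an actual carriage-return character. In recent pandas versions, pass regex=True so the pattern is treated as a regular expression:

    In [15]: df['29'] = df['29'].str.replace(r'\\r', '', regex=True)
             df
    Out[15]:
                 id             29
    0      location  Uttar Pradesh
    1  country_name          India
    2  total_deaths             20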
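Which pattern is needed depends on what the column really holds. The following is a minimal sketch with made-up sample values (not the asker's data) that contrasts a real carriage-return character with the literal text backslash + r; the variable names are only illustrative.

```python
import pandas as pd

# Hypothetical sample: the first value ends with a real carriage return,
# the second ends with the two literal characters backslash and "r".
s = pd.Series(["Uttar Pradesh\r", "Uttar Pradesh\\r"])

# A plain "\r" (no regex) targets the carriage-return character only.
real_cr_removed = s.str.replace("\r", "", regex=False)

# The regex r"\\r" targets a literal backslash followed by "r" only.
literal_removed = s.str.replace(r"\\r", "", regex=True)

print(real_cr_removed.tolist())
print(literal_removed.tolist())
```

If neither call changes the data, printing repr() of a single cell is a quick way to see which of the two forms is present.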

Edchum May 11 '16 at 11:14

jezrael · Accepted Answer · 2016-05-11T11:17:16+0000

Another solution is to use str.strip:

    df['29'] = df['29'].str.strip(r'\\r')
    print(df)
                 id             29
    0      location  Uttar Pradesh
    1  country_name          India
    2  total_deaths             20
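One caveat with str.strip is that its argument is a set of characters, not a pattern, so r'\\r' also strips a genuine leading or trailing "r". The sketch below uses hypothetical values to show the difference from an anchored regex replacement; it is an illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical values: "Kashmir" legitimately ends in "r".
s = pd.Series(["Uttar Pradesh\\r", "Kashmir"])

# str.strip removes any leading/trailing backslash or "r" characters,
# so it also eats the final "r" of "Kashmir".
stripped = s.str.strip(r"\\r")

# An anchored regex removes only a trailing backslash + "r" pair.
replaced = s.str.replace(r"\\r$", "", regex=True)

print(stripped.tolist())
print(replaced.tolist())
```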

If you want to use replace, add the r prefix and a second \ to the pattern:

    print(df.replace({r'\\r': ''}, regex=True))
                 id             29
    0      location  Uttar Pradesh
    1  country_name          India
    2  total_deaths             20

In replace you can also restrict the replacement to a given column with a nested dictionary. In the example below, the first replace cleans only column '29', so the \r in the id column remains; the second cleans every column:

    print(df)
                   id               29
    0        location  Uttar Pradesh\r
    1    country_name            India
    2  total_deaths\r               20

    print(df.replace({'29': {r'\\r': ''}}, regex=True))
                   id             29
    0        location  Uttar Pradesh
    1    country_name          India
    2  total_deaths\r             20

    print(df.replace({r'\\r': ''}, regex=True))
                 id             29
    0      location  Uttar Pradesh
    1  country_name          India
    2  total_deaths             20

EDIT after the comments:

    import pandas as pd

    df = pd.read_csv('data_source_test.csv')
    print(df)
       id  country_name           location  total_deaths
    0   1         India          New Delhi           354
    1   2         India         Tamil Nadu            48
    2   3         India          Karnataka             0
    3   4         India      Andra Pradesh            32
    4   5         India              Assam           679
    5   6         India             Kerala           128
    6   7         India             Punjab             0
    7   8         India      Mumbai, Thane             1
    8   9         India  Uttar Pradesh\r\n            20
    9  10         India             Orissa            69

    print(df.replace({r'\r\n': ''}, regex=True))
       id  country_name       location  total_deaths
    0   1         India      New Delhi           354
    1   2         India     Tamil Nadu            48
    2   3         India      Karnataka             0
    3   4         India  Andra Pradesh            32
    4   5         India          Assam           679
    5   6         India         Kerala           128
    6   7         India         Punjab             0
    7   8         India  Mumbai, Thane             1
    8   9         India  Uttar Pradesh            20
    9  10         India         Orissa            69
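The loop in the question also tries to count the affected rows. Series.replace returns a new object by default, and iterrows generally yields copies, so replacing inside that loop leaves df unchanged. A possible vectorized alternative is sketched below; the small frame is a hypothetical stand-in for the CSV data, and the pattern r'\r\n?' (a carriage return with an optional line feed) is an assumption about what the file contains.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of data_source_test.csv.
df = pd.DataFrame({
    "id": [8, 9, 10],
    "country_name": ["India", "India", "India"],
    "location": ["Mumbai, Thane", "Uttar Pradesh\r\n", "Orissa"],
    "total_deaths": [1, 20, 69],
})

# Flag rows where any cell, viewed as text, contains a carriage return.
as_text = df.astype(str)
has_cr = as_text.apply(lambda col: col.str.contains("\r", regex=False)).any(axis=1)
count = int(has_cr.sum())

# Remove CR (and a following LF, if present) from every column at once,
# without naming columns or rows.
cleaned = df.replace({r"\r\n?": ""}, regex=True)

print(count)
print(cleaned)
```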

If you need to replace only the location column (in recent pandas versions, pass regex=True so that r'\r\n' is treated as a regular expression):

    df['location'] = df.location.str.replace(r'\r\n', '', regex=True)
    print(df)
       id  country_name       location  total_deaths
    0   1         India      New Delhi           354
    1   2         India     Tamil Nadu            48
    2   3         India      Karnataka             0
    3   4         India  Andra Pradesh            32
    4   5         India          Assam           679
    5   6         India         Kerala           128
    6   7         India         Punjab             0
    7   8         India  Mumbai, Thane             1
    8   9         India  Uttar Pradesh            20
    9  10         India         Orissa            69
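To check that the cleaned frame no longer produces the extra line the question describes, the CSV output can be inspected before it is written to disk. This is only a sketch: the frame is hypothetical sample data, and an in-memory io.StringIO buffer stands in for destination.csv.

```python
import io

import pandas as pd

# Hypothetical sample with one embedded CR+LF in the location column.
df = pd.DataFrame({
    "id": [9, 10],
    "location": ["Uttar Pradesh\r\n", "Orissa"],
    "total_deaths": [20, 69],
})

# Strip carriage returns (and a following line feed) from all columns.
cleaned = df.replace({r"\r\n?": ""}, regex=True)

# Write to an in-memory buffer instead of destination.csv.
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)
csv_lines = buffer.getvalue().splitlines()

# One header line plus one line per row means no stray line breaks remain.
print(len(csv_lines) == len(cleaned) + 1)
```

Passing a file path such as 'destination.csv' to to_csv instead of the buffer writes the same content to disk.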
