Removing specific control characters (\n \r \t) from a string

Question

I have a fairly large amount of text that contains control characters such as \n, \t and \r. I need to replace each of them with a single space (" "). What is the fastest way to do this? Thanks

+6

python string

Hossein Feb 10 '11 at 9:44

6 answers

You can also try regular expressions:

 import re

 regex = re.compile(r'[\n\r\t]')
 regex.sub(' ', my_str)

+8

Michal Chruszcz Feb 10 '11 at 9:48

 >>> re.sub(r'[\t\n\r]', ' ', '1\n2\r3\t4')
 '1 2 3 4'

+5

Ignacio Vazquez-Abrams Feb 10 '11 at 9:48

If you want to normalize whitespace (replace each run of one or more whitespace characters with a single space, and strip leading and trailing whitespace), this can be done using string methods:

 >>> text = ' foo\tbar\r\nFred Nurke\t Joe Smith\n\n'
 >>> ' '.join(text.split())
 'foo bar Fred Nurke Joe Smith'
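To make the difference between one-for-one replacement and whitespace normalization concrete, here is a small sketch. The helper names replace_controls and normalize_whitespace are made up for this illustration and do not come from any of the answers.

```python
import re

# Hypothetical helper names, used only for this illustration.
_CONTROL_RE = re.compile(r'[\t\n\r]')

def replace_controls(text):
    """Replace each tab, newline, or carriage return with one space."""
    return _CONTROL_RE.sub(' ', text)

def normalize_whitespace(text):
    """Collapse every run of whitespace to one space and strip both ends."""
    return ' '.join(text.split())

sample = ' foo\tbar\r\nFred\n\n'
print(repr(replace_controls(sample)))      # same length as the input
print(repr(normalize_whitespace(sample)))  # runs collapsed, ends stripped
```

The first function keeps the string length unchanged, so a \r\n pair becomes two spaces. The second is usually what you want for display text, but it also collapses ordinary runs of spaces, which the question did not ask for.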

+3

John Machin Feb 10 '11 at 10:57

Using a regular expression:

 re.sub(r'\s+', ' ', '1\n2\r3\t4')

Without a regular expression:

 >>> ' '.join('1\n\n2\r3\t4'.split())
 '1 2 3 4'

+2

kurumi Feb 10 '11 at 10:50

s is the string from which you want to remove the control characters. Since strings are immutable in Python, the replacement returns a new string, so you need to assign the result back to a variable.

 s = re.sub(r'[\n\r\t]*', '', s)
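As a hedged illustration of the immutability point, the sketch below shows that re.sub leaves the original string untouched, and contrasts deleting the characters with replacing them by a space, as the question asks. The variable names removed and spaced are chosen here for the example only.

```python
import re

s = "a\nb\r\nc\td"

# re.sub returns a new string; s itself is unchanged until it is rebound.
removed = re.sub(r'[\n\r\t]+', '', s)   # delete the characters outright
spaced = re.sub(r'[\n\r\t]+', ' ', s)   # one space per run of control characters

print(repr(s))
print(repr(removed))
print(repr(spaced))
```

The pattern uses + rather than * on purpose: with a non-empty replacement, * also matches the empty string between ordinary characters, so a space would be inserted between every pair of letters.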

+1

Srikanth May 31 '17 at 12:40

Sven Marnach · Accepted Answer · 2011-02-10T09:50:33+0000

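I think the fastest way is to use str.translate():

 import string

 s = "a\nb\rc\td"
 # Python 2 syntax. maketrans needs two strings of equal length:
 # three control characters map to three spaces.
 print s.translate(string.maketrans("\n\t\r", "   "))

prints

 a b c d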
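The snippet above is written for Python 2 (print statement, string.maketrans). A minimal Python 3 equivalent, assuming the same input string, might look like this:

```python
# Python 3: str.maketrans replaces Python 2's string.maketrans.
s = "a\nb\rc\td"

# Both arguments must have the same length: each character in the first
# string maps to the character at the same position in the second.
table = str.maketrans("\n\t\r", "   ")

print(s.translate(table))  # a b c d
```

In Python 3 the table can also be given as a dict keyed by code point, for example {ord(c): ' ' for c in '\n\t\r'}, which avoids having to count spaces.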

EDIT: Since this has once again turned into a discussion about performance, here are some numbers. For long strings, translate() is much faster than using regular expressions:

 import re
 import string

 s = "a\nb\rc\td " * 1250000

 regex = re.compile(r'[\n\r\t]')
 %timeit t = regex.sub(" ", s)
 # 1 loops, best of 3: 1.19 s per loop

 table = string.maketrans("\n\t\r", "   ")
 %timeit s.translate(table)
 # 10 loops, best of 3: 29.3 ms per loop

That is a speedup of about a factor of 40.
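The timings above use IPython's %timeit under Python 2. As a rough sketch for re-running the comparison in plain Python 3, the standard timeit module can be used instead. The string here is shortened to keep the run brief, and the ratio you measure may differ noticeably from the figures above depending on the Python version and machine.

```python
import re
import timeit

# Shorter than the string in the answer, to keep the run brief.
s = "a\nb\rc\td " * 100_000

regex = re.compile(r'[\n\r\t]')
table = str.maketrans("\n\t\r", "   ")

# Both approaches must produce the same result before timing means anything.
assert regex.sub(" ", s) == s.translate(table)

# Take the best of three single runs for each approach.
regex_time = min(timeit.repeat(lambda: regex.sub(" ", s), number=1, repeat=3))
translate_time = min(timeit.repeat(lambda: s.translate(table), number=1, repeat=3))

print(f"regex:     {regex_time:.4f} s")
print(f"translate: {translate_time:.4f} s")
```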
