Removing specific control characters (\n \r \t) from a string

I have a fairly large amount of text that includes control characters such as \n, \t and \r. I need to replace each of them with a single space (" "). What is the fastest way to do this? Thanks


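I think the fastest way is to use str.translate():

    import string

    s = "a\nb\rc\td"
    # Both arguments to maketrans must have the same length:
    # each of \n, \t and \r is mapped to one space.
    print s.translate(string.maketrans("\n\t\r", "   "))

prints

    a b c d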
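The code above is Python 2 (print statement, string.maketrans on str). A minimal sketch of the same idea in Python 3, where the table is built with the built-in str.maketrans instead; TABLE and replace_controls are illustrative names, not part of the original answer:

```python
# Python 3 version of the str.translate() approach.
# str.maketrans(x, y) maps each character of x to the character
# at the same position in y, so both strings must be the same length.
TABLE = str.maketrans("\n\t\r", "   ")


def replace_controls(text: str) -> str:
    """Replace every newline, tab and carriage return in text with one space."""
    return text.translate(TABLE)


print(replace_controls("a\nb\rc\td"))  # prints: a b c d
```

The table is built once at module level so repeated calls only pay for the translate() pass itself.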

EDIT: Since this has turned into a discussion about performance, here are some numbers. For long strings, translate() is faster than a regular expression (IPython session, Python 2):

    import re
    import string

    s = "a\nb\rc\td " * 1250000

    regex = re.compile(r'[\n\r\t]')
    %timeit t = regex.sub(" ", s)
    # 1 loops, best of 3: 1.19 s per loop

    table = string.maketrans("\n\t\r", "   ")
    %timeit s.translate(table)
    # 10 loops, best of 3: 29.3 ms per loop

That is roughly 40 times faster (1.19 s vs. 29.3 ms).
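The timings above come from IPython's %timeit magic. To repeat the comparison as a plain Python 3 script, a sketch using the standard timeit module could look like this. The input size and repeat counts are arbitrary choices for a quick run, and the results depend on your machine and Python version, so no numbers are quoted here:

```python
import re
import timeit

# Input modelled on the benchmark above, but much smaller so it runs quickly.
s = "a\nb\rc\td " * 20_000

regex = re.compile(r"[\n\r\t]")
table = str.maketrans("\n\t\r", "   ")

# Both approaches should give identical output before their speed is compared.
assert regex.sub(" ", s) == s.translate(table)

# Best of 3 repeats of 5 calls each, converted to seconds per call.
regex_time = min(timeit.repeat(lambda: regex.sub(" ", s), number=5, repeat=3)) / 5
translate_time = min(timeit.repeat(lambda: s.translate(table), number=5, repeat=3)) / 5

print(f"regex.sub:     {regex_time * 1000:.3f} ms per call")
print(f"str.translate: {translate_time * 1000:.3f} ms per call")
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes.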


You can also try regular expressions:

    import re

    regex = re.compile(r'[\n\r\t]')
    # sub() returns a new string; the original is left unchanged
    my_str = regex.sub(' ', my_str)
Or without compiling the pattern first:

    >>> import re
    >>> re.sub(r'[\t\n\r]', ' ', '1\n2\r3\t4')
    '1 2 3 4'

If you want to normalize whitespace (replace each run of one or more whitespace characters with a single space and strip leading and trailing whitespace), you can do it with string methods:

    >>> text = ' foo\tbar\r\nFred Nurke\t Joe Smith\n\n'
    >>> ' '.join(text.split())
    'foo bar Fred Nurke Joe Smith'
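This is not the same operation as replacing each control character with a space: split() with no arguments also collapses runs of ordinary spaces and drops leading and trailing whitespace. A small comparison sketch (Python 3) on the same input:

```python
text = ' foo\tbar\r\nFred Nurke\t Joe Smith\n\n'

# Targeted replacement: every \n, \r and \t becomes exactly one space,
# so "\r\n" turns into two spaces and the original spaces are kept.
replaced = text.translate(str.maketrans("\n\r\t", "   "))

# Normalization: split() breaks on any run of whitespace and discards
# empty pieces, so the result has single spaces and no padding at the ends.
normalized = ' '.join(text.split())

print(repr(replaced))    # ' foo bar  Fred Nurke  Joe Smith  '
print(repr(normalized))  # 'foo bar Fred Nurke Joe Smith'
```

Pick the targeted replacement when the string length or existing spacing must be preserved, and the split/join form when you want tidy single-spaced text.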

Using a regular expression:

    >>> import re
    >>> re.sub(r'\s+', ' ', '1\n2\r3\t4')
    '1 2 3 4'

Without a regular expression:

    >>> ' '.join('1\n\n2\r3\t4'.split())
    '1 2 3 4'
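These two versions agree on that input, but they differ when the string starts or ends with whitespace: re.sub(r'\s+', ' ', ...) turns a leading or trailing run into a single space, while split() drops it. A quick sketch showing the difference, and that adding .strip() makes them match:

```python
import re

sample = '\n1\n\n2\r3\t4\n'

with_regex = re.sub(r'\s+', ' ', sample)              # ' 1 2 3 4 '
with_split = ' '.join(sample.split())                 # '1 2 3 4'
stripped_regex = re.sub(r'\s+', ' ', sample).strip()  # '1 2 3 4'

print(repr(with_regex), repr(with_split), repr(stripped_regex))
```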

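's' is the string from which you want to remove the control characters. Strings are immutable in Python, so re.sub() returns a new string and you need to assign the result back:

    import re

    # [\n\r\t]+ matches each run of one or more of these characters;
    # replacing with '' deletes them instead of turning them into spaces
    s = re.sub(r'[\n\r\t]+', '', s)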
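If the goal is to delete the characters rather than replace them with spaces, as in this answer, str.translate() can do that too: in Python 3, the optional third argument of str.maketrans lists characters that translate() should remove. A minimal sketch; DELETE_CONTROLS is just an illustrative name:

```python
import re

# Third argument: characters that translate() should delete.
DELETE_CONTROLS = str.maketrans("", "", "\n\r\t")

s = "a\nb\rc\td"
by_translate = s.translate(DELETE_CONTROLS)
by_regex = re.sub(r'[\n\r\t]+', '', s)

print(by_translate)  # prints: abcd
assert by_translate == by_regex
```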
