Replace special characters in a Python string

I am using urllib to fetch the HTML of a web page as a string, and I need to put each word in the HTML document into a list.

Here is the code I have so far. I keep getting an error, which I have copied below.

    import urllib.request

    url = input("Please enter a URL: ")
    z = urllib.request.urlopen(url)
    z = str(z.read())

    removeSpecialChars = str.replace(" !@ #$%^&*()[]{};:,./<>?\|`~-=_+", " ")
    words = removeSpecialChars.split()
    print("Words list: ", words[0:20])

Here is the error.

    Please enter a URL: http://simleyfootball.com
    Traceback (most recent call last):
      File "C:\Users\jeremy.KLUG\My Documents\LiClipse Workspace\Python Project 2\Module2.py", line 7, in <module>
        removeSpecialChars = str.replace(" !@ #$%^&*()[]{};:,./<>?\|`~-=_+", " ")
    TypeError: replace() takes at least 2 arguments (1 given)
+7
python string list replace urllib
5 answers

str.replace is the wrong function for what you want to do (quite apart from the fact that you are calling it incorrectly). You want to replace any character in the set with a space, not replace the entire set with a single space (the latter is what replace does). You can use translate as follows:

    removeSpecialChars = z.translate({ord(c): " " for c in " !@ #$%^&*()[]{};:,./<>?\|`~-=_+"})

This builds a mapping from each character in the special-character list to a space, then calls translate() on the string, which replaces every character in that set with a space.
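
The same kind of mapping can also be built with str.maketrans. The following is a minimal sketch, assuming Python 3; the names SPECIAL_CHARS and sample, and the sample text itself, are made up for illustration and do not come from the question.

```python
# Characters to turn into spaces. The backslash is doubled so that it is
# a literal backslash rather than the start of an escape sequence.
SPECIAL_CHARS = " !@ #$%^&*()[]{};:,./<>?\\|`~-=_+"

# With two equal-length strings, str.maketrans maps each character of the
# first string to the character at the same position in the second.
table = str.maketrans(SPECIAL_CHARS, " " * len(SPECIAL_CHARS))

sample = "Hello, world! (foo_bar) <p>baz</p>"  # illustrative input
cleaned = sample.translate(table)

# split() with no arguments collapses runs of whitespace.
words = cleaned.split()
print(words)
```

Note that tag names such as p survive as words here, because only the punctuation around them is replaced.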

+14

One way is to use re.sub, which is my preferred approach.

    import re

    my_str = "hey th~!ere"
    # remove everything except letters, digits, spaces, newlines and dots
    my_new_string = re.sub(r'[^a-zA-Z0-9 \n\.]', '', my_str)
    print(my_new_string)

Output:

    hey there

Another way is to use re.escape:

    import re
    import string

    my_str = "hey th~!ere"
    # escape the punctuation so it is safe inside a character class
    chars = re.escape(string.punctuation)
    print(re.sub('[' + chars + ']', '', my_str))

Output:

    hey there

One small style note: PEP 8 recommends snake_case for variable names, so this should be remove_special_chars rather than removeSpecialChars.

Also, the space inside the character class is what preserves spaces. If you want spaces removed as well, just change [^a-zA-Z0-9 \n\.] to [^a-zA-Z0-9\n\.]
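
To make the effect of that space concrete, here is a small sketch comparing the two patterns. The input string and the variable names keep_spaces and drop_spaces are only illustrative.

```python
import re

my_str = "hey th~!ere, you"  # illustrative input

# Space is listed in the negated class, so spaces are kept.
keep_spaces = re.sub(r"[^a-zA-Z0-9 \n\.]", "", my_str)

# Space is not listed, so spaces are removed along with the punctuation.
drop_spaces = re.sub(r"[^a-zA-Z0-9\n\.]", "", my_str)

print(keep_spaces)
print(drop_spaces)
```

For building a word list, the first form is usually the one you want, since the words stay separated and can be passed to split().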

+24

replace is a method that operates on a specific string, so you need to call it on that string, like this:

    removeSpecialChars = z.replace(" !@ #$%^&*()[]{};:,./<>?\|`~-=_+", " ")

But this is probably not what you need, since it searches for a single substring containing all of those characters in that exact order. You can do what you want with a regex, as Danny Michaud pointed out.

As a side note, you may want to look at BeautifulSoup, a library for parsing messy HTML such as what you usually get from scraping websites.
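
A short sketch of that substring behaviour, using a made-up input string (text, pair_replaced and each_replaced are illustrative names):

```python
text = "a!b@c!@d"  # illustrative input

# replace() looks for the whole substring "!@", not for "!" or "@"
# individually, so only the adjacent pair near the end is replaced.
pair_replaced = text.replace("!@", " ")
print(pair_replaced)

# Replacing each character on its own needs one replace() call per character.
each_replaced = text
for ch in "!@":
    each_replaced = each_replaced.replace(ch, " ")
print(each_replaced)
```

The loop works, but it makes one pass over the string per character, which is why a regex or translate is usually a better fit for a long character set.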
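
To illustrate why parsing the HTML first helps, here is a rough sketch that extracts only the visible text before splitting it into words. It deliberately uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra installs; the class TextExtractor, the helper visible_text and the sample markup are all hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script or style element.
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample_html = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><p>Hello <b>world</b></p><script>var x=1;</script></body></html>"
)
words = visible_text(sample_html).split()
print(words)
```

With the markup stripped this way, tag names, attributes and script code no longer end up in the word list.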

+2

You need to call replace on z, not on str, since you want to replace characters in the string variable z:

    removeSpecialChars = z.replace(" !@ #$%^&*()[]{};:,./<>?\|`~-=_+", " ")

But this will not work either, because replace looks for the whole substring. You will most likely need the re regular expression module and its sub function:

    import re

    # [, ], \ and - are escaped so they are treated as literal characters in the set
    removeSpecialChars = re.sub(r"[ !@ #$%^&*()\[\]{};:,./<>?\\|`~\-=_+]", " ", z)

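Do not forget the enclosing [], which indicates that this is a set of characters to replace. Characters such as [, ], \ and - have a special meaning inside the set and must be escaped with a backslash.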
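
If escaping the set by hand feels error-prone, re.escape can do it. The sketch below builds the character class that way and splits directly on runs of special characters; special_chars and sample are illustrative names, and the sample text is made up.

```python
import re

# The backslash is doubled so that it is a literal backslash.
special_chars = " !@ #$%^&*()[]{};:,./<>?\\|`~-=_+"

# re.escape backslash-escapes characters such as ], \ and - that would
# otherwise have a special meaning inside [...].
pattern = re.compile("[" + re.escape(special_chars) + "]+")

sample = "price: $4.50 [approx] - see_note"  # illustrative input

# Split on runs of special characters and drop any empty strings.
words = [w for w in pattern.split(sample) if w]
print(words)
```

Splitting on the pattern skips the intermediate string with spaces, though re.sub followed by split() gives the same word list.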

+2

You can replace special characters with desired characters as follows:

    specialCharacterText = "H#y #@w @re &*)?"
    inCharSet = " !@ #$%^&*()[]{};:,./<>?\\|`~-=_+\""
    # one replacement character for each character in inCharSet
    outCharSet = " " * len(inCharSet)
    # str.maketrans (Python 3) requires both strings to have the same length
    splCharReplaceList = str.maketrans(inCharSet, outCharSet)
    splCharFreeString = specialCharacterText.translate(splCharReplaceList)
0