Replacing characters is not working in Python

I am using Beautiful Soup to write a crawler, and I have the following code in it:

    print soup.originalEncoding
    #self.addtoindex(page, soup)
    links=soup('a')
    for link in links:
        if('href' in dict(link.attrs)):
            link['href'].replace('..', '')
            url=urljoin(page, link['href'])
            if url.find("'") != -1:
                continue
            url = url.split('?')[0]
            url = url.split('#')[0]
            if url[0:4] == 'http':
                newpages.add(url)
    pages = newpages

The line link['href'].replace('..', '') is supposed to clean up links that come out as ../contact/orderform.aspx, ../contact/requestconsult.aspx, etc. However, it does not work: the links still have the leading "..". Is there something I am missing?

+13
3 answers

string.replace() returns a new string with the replacements applied. It does not modify the original, so assign the result back:

 link['href'] = link['href'].replace("..", "") 
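To make the difference concrete, here is a minimal sketch on a plain string instead of a Beautiful Soup tag. It uses Python 3 syntax, the sample href comes from the question, and the variable names unchanged and cleaned are only illustrative.

```python
href = "../contact/orderform.aspx"  # sample value taken from the question

# Calling replace() and discarding the result leaves href as it was.
href.replace("..", "")
unchanged = href

# Assigning the result back rebinds the name to the new string.
href = href.replace("..", "")
cleaned = href

print(unchanged)  # ../contact/orderform.aspx
print(cleaned)    # /contact/orderform.aspx
```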
+43

string.replace() returns a copy of the string with the characters replaced, since strings in Python are immutable. Try:

    s = link['href'].replace("..", '')
    url=urljoin(page, s)
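As a rough, self-contained sketch of the same idea without Beautiful Soup: the loop below runs over a plain list of href strings. The base URL in page and the list hrefs are made up for illustration, and the import is the Python 3 location of urljoin (in Python 2 it lived in the urlparse module).

```python
from urllib.parse import urljoin  # Python 3 location of urljoin

page = "http://example.com/products/index.aspx"  # hypothetical base URL
hrefs = ["../contact/orderform.aspx", "../contact/requestconsult.aspx"]

urls = []
for href in hrefs:
    s = href.replace("..", "")     # new string; href itself is untouched
    urls.append(urljoin(page, s))  # join the cleaned path onto the base URL

print(urls)
```

Keeping the cleaned value in a separate variable leaves the parsed document unmodified, which may matter if the original href is needed later.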
+12

replace() does not modify the string in place. You need to assign the result:

 link['href'] = link['href'].replace('..', '') 

Example:

    a = "abc.."
    print(a.replace("..", ""))  # abc
    print(a)                    # abc.. (the original is unchanged)
    a = a.replace("..", "")     # rebind the name to the new string
    print(a)                    # abc
+6