Replace characters not working in python

Question

I am using Beautiful Soup to write a crawler, and I have the following code in it:

    print soup.originalEncoding
    #self.addtoindex(page, soup)
    links=soup('a')
    for link in links:
        if('href' in dict(link.attrs)):
            link['href'].replace('..', '')
            url=urljoin(page, link['href'])
            if url.find("'") != -1: continue
            url = url.split('?')[0]
            url = url.split('#')[0]
            if url[0:4] == 'http':
                newpages.add(url)
    pages = newpages

The call link['href'].replace('..', '') is supposed to clean up links that come out as ../contact/orderform.aspx, ../contact/requestconsult.aspx, etc. However, it does not work: the links still have the leading "..". Is there something I am missing?

+13

python

sdiener Aug 26 '11 at 18:09

3 answers

str.replace() returns a new copy of the string with the characters replaced, since strings in Python are immutable. The original string is left unchanged, so you have to use the returned value. Try:

    s = link['href'].replace("..", '')
    url = urljoin(page, s)
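The same idea can be wrapped in a small function. This is only a sketch: `resolve_link` and the example URLs are made up for illustration, and it uses Python 3 syntax (where `urljoin` lives in `urllib.parse`; in Python 2 it was in the `urlparse` module).

```python
from urllib.parse import urljoin  # Python 3; Python 2 used the urlparse module


def resolve_link(page, href):
    """Hypothetical helper: strip '..' from href, then join it with the page URL."""
    # str.replace() returns a new string; href itself is never modified,
    # so the result has to be stored in a variable before it is used.
    cleaned = href.replace("..", "")
    return urljoin(page, cleaned)


# Example values are invented for illustration only.
print(resolve_link("http://example.com/products/index.aspx",
                   "../contact/orderform.aspx"))
```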

+12

jan zegan Aug 26 '11 at 18:21

replace() does not modify the string in place. You need to assign the result back:

 link['href'] = link['href'].replace('..', '')

Example:

    a = "abc.."
    print a.replace("..", "")   # prints: abc
    print a                     # prints: abc..  (a is unchanged)
    a = a.replace("..", "")
    print a                     # prints: abc

+6

Urjit Aug 26 '11 at 18:17

joel goldstick · Accepted Answer · 2011-08-26T18:15:29+0000

str.replace() returns a new string with the replacements made. It does not modify the original, so assign the result back:

 link['href'] = link['href'].replace("..", "")
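Applied to the whole loop from the question, the fix might look like the sketch below. It rests on some assumptions: `collect_urls` is a hypothetical name, plain dicts stand in for Beautiful Soup tags so the example runs without the library, and it is written for Python 3 rather than the Python 2 used in the question.

```python
from urllib.parse import urljoin  # Python 3; Python 2 used the urlparse module


def collect_urls(page, links):
    """Hypothetical helper mirroring the question's loop with the fix applied.

    `links` is any iterable of dict-like objects. Plain dicts stand in
    for Beautiful Soup tags here so the sketch runs on its own.
    """
    newpages = set()
    for link in links:
        if "href" not in link:
            continue
        # Assign the result back; calling replace() alone discards it.
        link["href"] = link["href"].replace("..", "")
        url = urljoin(page, link["href"])
        if "'" in url:
            continue
        url = url.split("?")[0]  # drop the query string
        url = url.split("#")[0]  # drop the fragment
        if url[0:4] == "http":
            newpages.add(url)
    return newpages


# Example data is invented for illustration only.
links = [
    {"href": "../contact/orderform.aspx?id=1"},
    {"href": "../contact/requestconsult.aspx#top"},
]
print(sorted(collect_urls("http://example.com/products/index.aspx", links)))
```

Returning the set instead of rebinding `pages` inside the loop keeps the sketch self-contained; the caller can still write `pages = collect_urls(page, links)`.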
