">

Using Beautiful Soup, how do I iterate over all inline texts?

Say I wanted to remove vowels from HTML:

    <a href="foo">Hello there!</a>Hi!

becomes

    <a href="foo">Hll thr!</a>H!

I believe this is a job for Beautiful Soup. How can I select the text between tags and transform it as shown above?


Suppose the variable test_html holds the following HTML:

    <html>
    <head><title>Test title</title></head>
    <body>
    <p>Some paragraph</p>
    Useless Text
    <a href="http://stackoverflow.com">Some link</a>not a link
    <a href="http://python.org">Another link</a>
    </body></html>

Just do the following:

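    # Beautiful Soup 4 (bs4) on Python 3
    from bs4 import BeautifulSoup

    # The HTML shown above.
    test_html = """<html>
    <head><title>Test title</title></head>
    <body>
    <p>Some paragraph</p>
    Useless Text
    <a href="http://stackoverflow.com">Some link</a>not a link
    <a href="http://python.org">Another link</a>
    </body></html>"""

    soup = BeautifulSoup(test_html, "html.parser")

    # find_all(string=True) returns every text node in the tree.
    for t in soup.find_all(string=True):
        text = str(t)
        for vowel in "aeiou":
            text = text.replace(vowel, "")
        t.replace_with(text)  # swap the old text node for the new string

    print(soup)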

This prints:

    <html>
    <head><title>Tst ttl</title></head>
    <body>
    <p>Sm prgrph</p>
    Uslss Txt
    <a href="http://stackoverflow.com">Sm lnk</a>nt  lnk
    <a href="http://python.org">Anthr lnk</a>
    </body></html>

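Note that tags and attributes are not affected: the loop only visits text nodes, so tag names and attribute values such as the href URLs never pass through the replacement.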
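A minimal sketch, assuming Beautiful Soup 4 (bs4) on Python 3 with the built-in html.parser backend, that wraps the same loop in a reusable function and applies it to the example from the question. The helper name strip_vowels and the decision to skip comments, doctypes, and <script>/<style> contents are illustrative assumptions, not part of the answer above; without such a filter, find_all(string=True) may also hand back those nodes.

```python
from bs4 import BeautifulSoup, Comment, Doctype

# Hypothetical skip rules: text inside these tags is code, not prose.
SKIP_PARENTS = {"script", "style"}


def strip_vowels(html: str) -> str:
    """Remove lowercase vowels from the text nodes of an HTML string (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    for node in soup.find_all(string=True):
        # Leave comments and doctypes untouched.
        if isinstance(node, (Comment, Doctype)):
            continue
        # Leave script/style bodies untouched.
        if node.parent is not None and node.parent.name in SKIP_PARENTS:
            continue
        text = str(node)
        for vowel in "aeiou":
            text = text.replace(vowel, "")
        node.replace_with(text)
    return str(soup)


print(strip_vowels('<a href="foo">Hello there!</a>Hi!'))
# <a href="foo">Hll thr!</a>H!
```

Because html.parser does not add <html> or <body> wrappers, the returned string keeps the shape of the input fragment, and the href value "foo" comes through unchanged.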
