Using Beautiful Soup, how do I iterate over all inline texts?

Question

Say I wanted to remove vowels from HTML:

<a href="foo">Hello there!</a>Hi!

becomes

 <a href="foo">Hll thr!</a>H!

I believe this is a job for Beautiful Soup. How can I select the text between tags and operate on it as shown above?

python beautifulsoup

mike May 06 '09 at 18:26

1 answer

nosklo · Accepted Answer · 2009-05-06T20:18:58+0000

Suppose the variable test_html holds the following HTML:

    <html>
      <head><title>Test title</title></head>
      <body>
        <p>Some paragraph</p>
        Useless Text
        <a href="http://stackoverflow.com">Some link</a>not a link
        <a href="http://python.org">Another link</a>
      </body></html>

Just do the following:

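    # Beautiful Soup 3 on Python 2
    from BeautifulSoup import BeautifulSoup

    test_html = load_html_from_above()  # placeholder for the HTML shown above
    soup = BeautifulSoup(test_html)

    # findAll(text=True) returns every text node, not the tags themselves
    for t in soup.findAll(text=True):
        text = unicode(t)
        for vowel in u'aeiou':
            text = text.replace(vowel, u'')
        # swap the original text node for the modified string
        t.replaceWith(text)

    print soup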

This prints:

    <html>
      <head><title>Tst ttl</title></head>
      <body>
        <p>Sm prgrph</p>
        Uslss Txt
        <a href="http://stackoverflow.com">Sm lnk</a>nt  lnk
        <a href="http://python.org">Anthr lnk</a>
      </body></html>

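Please note that only the text nodes change: tags and attributes (such as the href URLs) are not affected. The capital "A" in "Another" also survives, because only lowercase vowels are replaced.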
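For readers on Python 3, the same idea can be sketched with the newer bs4 package. This is a rough equivalent rather than the answer's original code: it assumes Beautiful Soup 4 (4.4 or later, for the string= argument) is installed, uses the built-in "html.parser" backend, and the helper name strip_vowels is made up for illustration. Be aware that find_all(string=True) may also match comments and the contents of script or style tags, which you would probably want to skip on a real page.

```python
from bs4 import BeautifulSoup

VOWELS = "aeiou"  # lowercase only, matching the answer above


def strip_vowels(html):
    """Return html with lowercase vowels removed from text nodes only."""
    # "html.parser" is the stdlib backend; it does not wrap fragments
    # in <html>/<body> tags.
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list, so replacing nodes inside the loop is safe.
    for node in soup.find_all(string=True):
        text = str(node)
        for vowel in VOWELS:
            text = text.replace(vowel, "")
        # Swap the original text node for the modified string.
        node.replace_with(text)
    return str(soup)


print(strip_vowels('<a href="foo">Hello there!</a>Hi!'))
# <a href="foo">Hll thr!</a>H!
```

The attribute value "foo" keeps its vowels because attributes are not text nodes, which is the same behaviour the answer points out.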
