Using Beautiful Soup, how do I iterate over all inline texts?
Suppose the variable test_html has the following html content:
    <html>
    <head><title>Test title</title></head>
    <body>
    <p>Some paragraph</p>
    Useless Text
    <a href="http://stackoverflow.com">Some link</a>not a link
    <a href="http://python.org">Another link</a>
    </body></html>

Just do the following:
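    from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3, Python 2

    test_html = load_html_from_above()  # placeholder: supply the HTML shown above
    soup = BeautifulSoup(test_html)
    # findAll(text=True) returns every text node, including the <title> text
    for t in soup.findAll(text=True):
        text = unicode(t)
        # example transformation: strip lowercase vowels
        for vowel in u'aeiou':
            text = text.replace(vowel, u'')
        # swap the original text node for the modified string
        t.replaceWith(text)
    print soup

What prints: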
    <html>
    <head><title>Tst ttl</title></head>
    <body>
    <p>Sm prgrph</p>
    Uslss Txt
    <a href="http://stackoverflow.com">Sm lnk</a>nt lnk
    <a href="http://python.org">Anthr lnk</a>
    </body></html>

Note that only the text nodes change: tags and attribute values (such as the `href` URLs, which contain vowels) are left untouched.