Is it possible to remove script tags using BeautifulSoup?

Question


Is it possible to remove script tags and all their contents from HTML using BeautifulSoup, or do I need to use regular expressions or something else?

+55

python html beautifulsoup

Sam Apr 08 2018-11-11T00:


3 answers

As stated in the official documentation, you can use the extract method to remove the entire subtree that matches a search.

    from bs4 import BeautifulSoup  # the original answer used the older BeautifulSoup 3 import

    a = BeautifulSoup("<html><body><script>aaa</script></body></html>", "html.parser")
    # extract() detaches each <script> tag, contents included, from the tree
    for x in a.find_all('script'):
        x.extract()

+12

Santiago Alessandri Apr 08 '11 at 17:33


An updated answer for those who may need it for future reference: the right tool is decompose(). Other methods work too, but decompose() removes the tag and its contents in place.

Usage example:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup('<p>This is a slimy text and <i> I am slimer</i></p>', 'html.parser')
    soup.i.decompose()  # removes the <i> tag and its contents in place
    print(str(soup))  # prints '<p>This is a slimy text and </p>'

It's pretty useful for getting rid of detritus such as 'script' or 'img' tags.
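For the original question, the same idea can be applied to every script tag at once. This is only a minimal sketch, assuming bs4 with the standard-library html.parser; strip_tags and its names parameter are hypothetical names chosen for illustration, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup


def strip_tags(html, names=("script", "style")):
    """Return html with every tag listed in `names` removed, contents included.

    `strip_tags` is a hypothetical helper, not a BeautifulSoup API.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all() accepts a list of tag names and returns all matches up front,
    # so destroying tags inside the loop does not disturb the iteration.
    for tag in soup.find_all(list(names)):
        tag.decompose()
    return str(soup)


cleaned = strip_tags("<html><body><script>aaa</script><p>kept</p></body></html>")
print(cleaned)  # prints '<html><body><p>kept</p></body></html>'
```

Passing a different tuple, for example ("script", "img"), would drop those tags instead.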

+10

Vangel Oct 09 '16 at 15:11


Fábio Diniz · Accepted Answer · 2011-04-08 17:31

    >>> from bs4 import BeautifulSoup
    >>> soup = BeautifulSoup('<script>a</script>baba<script>b</script>', 'lxml')
    >>> [s.extract() for s in soup('script')]
    >>> soup
    baba
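The list comprehension works because extract() returns each tag it removes, so the removed scripts remain available if they are needed later. The sketch below illustrates this, assuming html.parser is used instead of lxml so that no html/body wrapper is added; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<script>a</script>baba<script>b</script>", "html.parser")

# extract() detaches each tag from the tree and returns it,
# so the removed <script> elements can still be inspected.
removed = [s.extract() for s in soup("script")]
removed_sources = [str(s.string) for s in removed]  # ['a', 'b']

remaining = str(soup)
print(remaining)  # prints 'baba'
```

If the removed tags are not needed afterwards, decompose() is the alternative that discards them outright.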
