How to remove all HTML tags in BeautifulSoup?

I am scraping a table from a web page and would like to rebuild it with all the tags removed, keeping only the text. Here is the source code:

```python
import requests
from bs4 import BeautifulSoup

response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
table = soup.find('table')
for row in table.find_all('tr'):
    for col in row.find_all('td'):
        # remove all the different tags?
        # col.replace_with('')
        # col.decompose()
        # col.extract()
        col = col.contents
```

How can I remove all the tags? Take the following cell as an example, which contains the tags a, br and td:

```html
<td><a href="http://www.irit.fr/SC">Signal et Communication</a> <br/><a href="http://www.irit.fr/IRT">Ingénierie Réseaux et Télécommunications</a> </td>
```

Expected Result:

```
Signal et Communication Ingénierie Réseaux et Télécommunications
```
2 answers

You are looking for get_text():

If you only want the text part of a document or tag, you can use get_text(). It returns all the text in a document or beneath a tag as a single Unicode string.

```python
td = soup.find("td")
td.get_text()
```
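Passing a separator and strip=True to get_text() gives exactly the one-line result the question asks for: each text fragment is stripped, empty fragments are dropped, and the rest are joined with the separator. A minimal sketch using the cell from the question; the choice of Python's built-in "html.parser" is an assumption:

```python
from bs4 import BeautifulSoup

html = """<td><a href="http://www.irit.fr/SC">Signal et Communication</a>
<br/><a href="http://www.irit.fr/IRT">Ingénierie Réseaux et Télécommunications</a>
</td>"""

soup = BeautifulSoup(html, "html.parser")
td = soup.find("td")

# separator=" " joins the text fragments with a single space;
# strip=True trims each fragment and drops the whitespace-only ones (the newlines)
text = td.get_text(" ", strip=True)
print(text)  # Signal et Communication Ingénierie Réseaux et Télécommunications
```

Without strip=True the newlines between the tags would be kept in the result.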

Note that .string will return None in this case, since td has several children:

If a tag contains more than one thing, then it is not clear what .string should refer to, so .string is defined to be None.

Demo:

```python
>>> from bs4 import BeautifulSoup
>>>
>>> soup = BeautifulSoup("""
... <td><a href="http://www.irit.fr/SC">Signal et Communication</a>
... <br/><a href="http://www.irit.fr/IRT">Ingénierie Réseaux et Télécommunications</a>
... </td>
... """, "html.parser")
>>>
>>> td = soup.td
>>> print(td.string)
None
>>> print(td.get_text())
Signal et Communication
Ingénierie Réseaux et Télécommunications
```
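Applied to the loop from the question, the same call rebuilds the whole table as a list of rows of plain strings, with no need for decompose() or extract(). This sketch parses a small inline table instead of fetching url with requests; the sample HTML (including the second cell) and the rows variable are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical sample table standing in for response.text
html = """
<table>
  <tr>
    <td><a href="http://www.irit.fr/SC">Signal et Communication</a>
      <br/><a href="http://www.irit.fr/IRT">Ingénierie Réseaux et Télécommunications</a></td>
    <td><b>IRIT</b></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

rows = []
for row in table.find_all("tr"):
    # get_text() ignores every tag inside the cell and returns only its text;
    # " " and strip=True give one clean line per cell
    rows.append([col.get_text(" ", strip=True) for col in row.find_all("td")])

print(rows)
# [['Signal et Communication Ingénierie Réseaux et Télécommunications', 'IRIT']]
```

With a live page, you would pass response.text to BeautifulSoup instead of the inline string.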

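Try col.string. This gives you only the text, but only when the cell has a single child; for a cell with several children, such as the example in the question, it returns None.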
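Note that .string is a property, not a method. A minimal sketch contrasting the two cases, using hypothetical sample cells: with one child, .string descends through the lone a tag and returns its text; with several children it is None, and get_text() is the fallback.

```python
from bs4 import BeautifulSoup

# Hypothetical cells: one with a single child, one with several children
single = BeautifulSoup("<td><a href='#'>Signal et Communication</a></td>", "html.parser").td
multiple = BeautifulSoup("<td><a href='#'>A</a><br/><a href='#'>B</a></td>", "html.parser").td

# One child: .string follows the lone <a> down to its text
print(single.string)  # Signal et Communication

# Several children (<a>, <br/>, <a>): .string is None
print(multiple.string)  # None

# get_text() works in both cases
print(multiple.get_text(" ", strip=True))  # A B
```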

