How to write output to an HTML file using Python BeautifulSoup

I modified an HTML file by removing some tags with BeautifulSoup. Now I want to write the result back to an HTML file. My code is:

    from bs4 import BeautifulSoup
    from bs4 import Comment

    soup = BeautifulSoup(open('1.html'), "html.parser")
    [x.extract() for x in soup.find_all('script')]
    [x.extract() for x in soup.find_all('style')]
    [x.extract() for x in soup.find_all('meta')]
    [x.extract() for x in soup.find_all('noscript')]
    [x.extract() for x in soup.find_all(text=lambda text: isinstance(text, Comment))]

    html = soup.contents
    for i in html:
        print i

    html = soup.prettify("utf-8")
    with open("output1.html", "wb") as file:
        file.write(html)

Because I used soup.prettify, the generated HTML puts every tag and string on its own line:

    <p>
     <strong>
      BATAM.TRIBUNNEWS.COM, BINTAN
     </strong>
     - Tradisi pedang pora mewarnai serah terima jabatan pejabat di
     <a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
      Polres
     </a>
     <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan">
      Bintan
     </a>
     , Senin (3/10/2016).
    </p>

I want the file to contain the same output that print i produces:

    <p><strong>BATAM.TRIBUNNEWS.COM, BINTAN</strong> - Tradisi pedang pora mewarnai serah terima jabatan pejabat di <a href="http://batam.tribunnews.com/tag/polres/" title="Polres">Polres</a> <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan">Bintan</a>, Senin (3/10/2016).</p>
    <p>Empat perwira baru Senin itu diminta cepat bekerja. Tumpukan pekerjaan rumah sudah menanti di meja masing masing.</p>

How can I get the result in the same way as print i (i.e. the tag and its contents are displayed on the same line)? Thank you

+22
python html beautifulsoup bs4
3 answers

Just convert the soup object to a string and write it to the file. Unlike prettify(), str(soup) does not add line breaks or indentation:

    with open("output1.html", "w") as file:
        file.write(str(soup))
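
To see the difference between the two serializations, here is a small sketch for Python 3. The markup string is a made-up sample, not the contents of 1.html from the question; the only calls used are str() on the soup and soup.prettify().

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the question's 1.html.
markup = '<p><strong>Title</strong> - some <a href="#">link</a>, text.</p>'
soup = BeautifulSoup(markup, "html.parser")

compact = str(soup)       # serializes the tree without adding whitespace
pretty = soup.prettify()  # puts each tag and each string on its own line

print(compact)
print(pretty)
```

The compact form keeps each tag and its contents on one line, which is the layout the question asks for.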
+34

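In Python 2, use unicode() to be safe, and open the file with an explicit encoding so that non-ASCII characters do not raise a UnicodeEncodeError:

    import io
    # io.open encodes the Unicode text as UTF-8 when writing
    with io.open("output1.html", "w", encoding="utf-8") as file:
        file.write(unicode(soup))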
+7

In Python 3, unicode was renamed to str, but I also had to pass an encoding argument to open() to avoid a UnicodeEncodeError:

    with open("output1.html", "w", encoding='utf-8') as file:
        file.write(str(soup))
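
Putting the pieces together, the whole task from the question might look like the sketch below in Python 3. It is only an illustration under some assumptions: the markup string is an invented stand-in for 1.html, the output goes to a temporary directory rather than the working directory, and decompose() is used in place of the question's extract() for the unwanted tags.

```python
import os
import tempfile

from bs4 import BeautifulSoup, Comment

# Hypothetical input standing in for the contents of 1.html.
markup = (
    "<html><head><meta charset='utf-8'><style>p{color:red}</style></head>"
    "<body><!-- ad slot --><script>var x = 1;</script>"
    "<p><strong>BINTAN</strong> - caf\u00e9 text.</p></body></html>"
)
soup = BeautifulSoup(markup, "html.parser")

# Remove the unwanted elements; decompose() detaches a tag and discards it.
for tag in soup.find_all(["script", "style", "meta", "noscript"]):
    tag.decompose()

# Comments are strings rather than tags, so they are removed separately.
for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
    comment.extract()

# Write the compact serialization with an explicit encoding.
out_path = os.path.join(tempfile.mkdtemp(), "output1.html")
with open(out_path, "w", encoding="utf-8") as f:
    f.write(str(soup))
```

Passing encoding="utf-8" makes the result independent of the platform's default encoding, which is what triggers the UnicodeEncodeError on some systems.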
0
