I just started working with scrapy in combination with BeautifulSoup, and I wonder if I am missing something very obvious, but I can't figure out how to get the doctype of the returned HTML document from the resulting soup object.
Given the following HTML:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
  <meta charset=utf-8 />
  <meta name="viewport" content="width=620" />
  <title>HTML5 Demos and Examples</title>
  <link rel="stylesheet" href="/css/html5demos.css" type="text/css" />
  <script src="js/h5utils.js"></script>
</head>
<body>
  <p id="firstpara" align="center">This is paragraph <b>one</b>
  <p id="secondpara" align="blah">This is paragraph <b>two</b>.
</html>
Can someone tell me if there is a way to extract the declared doctype from it using BeautifulSoup?
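For reference, here is a minimal sketch of one approach that seems to fit, assuming BeautifulSoup 4 (the `bs4` package) and the standard-library `html.parser` backend. BeautifulSoup 4 represents a doctype declaration as a `Doctype` node among the top-level children of the soup, so it can be picked out with an `isinstance` check. The helper name `get_doctype` is hypothetical, and the sample markup is a shortened version of the document above.

```python
from bs4 import BeautifulSoup
from bs4.element import Doctype


def get_doctype(soup):
    """Return the text of the first doctype node in the soup, or None.

    Hypothetical helper: it only scans the top-level children, which is
    where the parser places a doctype that precedes the <html> element.
    """
    for node in soup.contents:
        if isinstance(node, Doctype):
            return str(node)
    return None


# Shortened version of the sample document from the question.
html = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">\n'
    '<html lang="en"><head><title>HTML5 Demos and Examples</title></head>\n'
    '<body><p id="firstpara">This is paragraph <b>one</b></p></body></html>'
)

soup = BeautifulSoup(html, "html.parser")
doctype = get_doctype(soup)
print(doctype)

# A document without a doctype yields None.
print(get_doctype(BeautifulSoup("<p>no doctype here</p>", "html.parser")))
```

The returned string holds the contents of the declaration, which can then be searched for the public identifier (for example `-//W3C//DTD HTML 4.01//EN`). Other parser backends such as lxml may normalize the declaration slightly differently, so exact string comparisons are best avoided.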
Steerpike