Beautiful soup and table scraper - lxml vs html parser

Question

I am trying to extract the HTML of a table from a webpage using BeautifulSoup.

<table class="facts_label" id="facts_table">...</table>

I would like to know why the code below works with "html.parser" but prints None if I change "html.parser" to "lxml".

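#!/usr/bin/env python3

from urllib.request import urlopen

from bs4 import BeautifulSoup

# Fetch the page; urlopen returns a file-like object that BeautifulSoup can read
webpage = urlopen('http://www.thewebpage.com')
# Works with "html.parser"; the find() below returns None with "lxml"
soup = BeautifulSoup(webpage, "html.parser")
# Look up the first <table> whose class is "facts_label"
table = soup.find('table', {'class': 'facts_label'})
print(table)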

+4

python html-parsing web-scraping lxml beautifulsoup

Laguille Sep 7 '14 at 20:23

1 answer

alecxe · Accepted Answer · 2014-09-07T22:53:09+0000

The BeautifulSoup documentation has a dedicated section, Differences between parsers, which states:

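Beautiful Soup presents the same interface to a number of different parsers, but each parser is different. Different parsers will create different parse trees from the same document. The biggest differences are between the HTML parsers and the XML parsers.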

The differences become apparent on HTML documents that are not well-formed.

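To see the effect described in that section, here is a minimal sketch that parses the same malformed fragment with several parsers and prints each resulting tree. The helper name parse_with_each and the sample markup are illustrative assumptions, not part of the original question. lxml and html5lib are optional packages, so the sketch skips any parser that is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately malformed markup: a closing </p> with no matching opening tag.
BROKEN_HTML = "<a></p>"


def parse_with_each(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: serialized tree}; None marks a parser that is not installed.

    Hypothetical helper written for this illustration only.
    """
    results = {}
    for name in parsers:
        try:
            results[name] = str(BeautifulSoup(markup, name))
        except FeatureNotFound:
            # Raised by bs4 when the requested parser library is missing.
            results[name] = None
    return results


if __name__ == "__main__":
    for name, tree in parse_with_each(BROKEN_HTML).items():
        print(name, "->", tree if tree is not None else "(not installed)")
```

If the trees differ for a tiny fragment like this, they can also differ for a real page with invalid markup, which is a likely reason that find() locates the table with one parser and returns None with another. Running the same comparison on the downloaded page source should show which parser keeps the table.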
