SelfClosingTags in BeautifulSoup

Question

SelfClosingTags in BeautifulSoup

Using BeautifulSoup to parse my XML

import BeautifulSoup soup = BeautifulSoup.BeautifulStoneSoup( """<alan x="y" /><anne>hello</anne>""" ) # selfClosingTags=['alan']) print soup.prettify()

This will output:

 <alan x="y"> <anne> hello </anne> </alan>

those. the anne tag is a child of the alan tag.

If I pass selfClosingTags = ['alan'] when I create the soup, I get:

 <alan x="y" /> <anne> hello </anne>

Fine!

My question is: why can't I use the /> symbol to indicate a closing tag?

+4

python xml beautifulsoup

Alan Feb 06 '10 at 1:24

source share

2 answers

I don’t have a why, but that might interest you. If you use BeautifulSoup (no Stone) to parse XML with a self-closing tag, it works. Sorting:

 >>> soup = BeautifulSoup.BeautifulSoup( """<alan x="y" /><anne>hello</anne>""" ) # selfClosingTags=['alan']) >>> print soup.prettify() <alan x="y"> </alan> <anne> hello </anne>

Nesting is correct even if alan displayed as the start and end tag.

+1

Blair conrad Feb 06 '10 at 1:57

source share

John machin · Accepted Answer · 2010-02-06T01:57:54+0000

You ask what was in the author’s mind, noting that he gives names like Beautiful [Stone] Soup for classes / modules :-)

Here are two more BeautifulStoneSoup behavior examples:

 >>> soup = BeautifulSoup.BeautifulStoneSoup( """<alan x="y" ><anne>hello</anne>""" ) >>> print soup.prettify() <alan x="y"> <anne> hello </anne> </alan> >>> soup = BeautifulSoup.BeautifulStoneSoup( """<alan x="y" ><anne>hello</anne>""", selfClosingTags=['alan']) >>> print soup.prettify() <alan x="y" /> <anne> hello </anne> >>>

My answer is: a self-closing tag is not legal if it is not defined for the parser. Thus, the author had options when deciding how to process the illegal fragment, for example, <alan x="y" /> ... (1), suppose that the error / was an error (2) to consider alan as a self-closing tag completely regardless of how it can be used elsewhere at the entrance (3) to make 2 passes over the input fins in the first pass, as each tag was used. Which choice do you prefer?

SelfClosingTags in BeautifulSoup

More articles: