SelfClosingTags in BeautifulSoup

Using BeautifulSoup to parse my XML

import BeautifulSoup soup = BeautifulSoup.BeautifulStoneSoup( """<alan x="y" /><anne>hello</anne>""" ) # selfClosingTags=['alan']) print soup.prettify() 

This will output:

 <alan x="y"> <anne> hello </anne> </alan> 

those. the anne tag is a child of the alan tag.

If I pass selfClosingTags = ['alan'] when I create the soup, I get:

 <alan x="y" /> <anne> hello </anne> 

Fine!

My question is: why can't I use the /> symbol to indicate a closing tag?

+4
python xml beautifulsoup
source share
2 answers

You ask what was in the author’s mind, noting that he gives names like Beautiful [Stone] Soup for classes / modules :-)

Here are two more BeautifulStoneSoup behavior examples:

 >>> soup = BeautifulSoup.BeautifulStoneSoup( """<alan x="y" ><anne>hello</anne>""" ) >>> print soup.prettify() <alan x="y"> <anne> hello </anne> </alan> >>> soup = BeautifulSoup.BeautifulStoneSoup( """<alan x="y" ><anne>hello</anne>""", selfClosingTags=['alan']) >>> print soup.prettify() <alan x="y" /> <anne> hello </anne> >>> 

My answer is: a self-closing tag is not legal if it is not defined for the parser. Thus, the author had options when deciding how to process the illegal fragment, for example, <alan x="y" /> ... (1), suppose that the error / was an error (2) to consider alan as a self-closing tag completely regardless of how it can be used elsewhere at the entrance (3) to make 2 passes over the input fins in the first pass, as each tag was used. Which choice do you prefer?

+3
source share

I don’t have a why, but that might interest you. If you use BeautifulSoup (no Stone) to parse XML with a self-closing tag, it works. Sorting:

 >>> soup = BeautifulSoup.BeautifulSoup( """<alan x="y" /><anne>hello</anne>""" ) # selfClosingTags=['alan']) >>> print soup.prettify() <alan x="y"> </alan> <anne> hello </anne> 

Nesting is correct even if alan displayed as the start and end tag.

+1
source share

All Articles