Do you mean something like this?
from BeautifulSoup import BeautifulStoneSoup soup = BeautifulStoneSoup('<?xml version="1.0" encoding="UTF-8"?>') # make some more soup
Or,
soup = BeautifulStoneSoup() # make some more soup soup.insert(0, '<?xml version="1.0" encoding="UTF-8"?>')
From BeautifulSoup documentation :
Beautiful Soup tries to turn your document into Unicode in coding order:
- The encoding you pass as the fromEncoding argument to the soup constructor.
- The encoding found in the document itself: for example, in the XML declaration or (for HTML documents) the META-http-equiv tag. If Beautiful Soup finds this kind of encoding inside the document, it again analyzes the document from the very beginning and gives a new encoding. The only exception is if you explicitly specified an encoding, and this encoding really worked: then it will ignore any encoding found in the document.
- The coding snorted, looking at the first few bytes of the file. If encoding is detected at this stage, it will be one of the encodings UTF- *, EBCDIC or ASCII.
- The encoding sniffed by the font library, if installed.
- Utf-8
- Windows-1252
A beautiful soup will almost always guess correctly, if at all. But for documents without declarations and in strange encodings, he often will not be able to guess.
NB item # 2, which I read as: BeautifulSoup will automatically use the encoding in the xml declaration unless you explicitly specify one of the arguments fromEncoding. YMMV.
The previous referenced documentation has other, potentially useful, unicode examples.
Edit : @TorelTwiddler, if there is another way to add an xml declaration using BeautifulSoup without passing the tag as a string directly, I don't know about that.
However, consider the following:
soup = BeautifulStoneSoup('<?xml version="1.0" encoding=""?>') # <- no encoding mytag = Tag(soup, 'mytag') soup.append(mytag) print str(soup) # "<?xml version='1.0' encoding='utf-8'?><mytag></mytag>" # Wha!? :) print soup.prettify(encoding='euc-jp') # <?xml version='1.0' encoding='euc-jp'?> # <mytag> # </mytag>
Perhaps this will help you get where you want to go.
Marty source share