How do I render a doctype using Python's xml.dom.minidom?

I tried:

    document.doctype = xml.dom.minidom.DocumentType('html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd"')

There is no doctype in the output. How can I fix this without inserting it manually?


You should not instantiate minidom's classes directly. This is not a supported part of the API: the ownerDocument links will not be set up, and you may get some strange errors. Use the proper DOM Level 2 Core methods instead:

    >>> from xml.dom import minidom
    >>> imp = minidom.getDOMImplementation('')
    >>> dt = imp.createDocumentType('html', '-//W3C//DTD XHTML 1.0 Strict//EN', 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd')

('DTD/xhtml1-strict.dtd' is a commonly used but incorrect systemId. That relative URL is only valid inside the xhtml1 folder on w3.org.)
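To see why the assignment in the question produces no doctype, here is a minimal check, assuming CPython 3's xml.dom.minidom (the answer does not name a version). It deliberately repeats the unsupported direct construction: minidom's toxml() serializes whatever is in doc.childNodes, and assigning doc.doctype never puts the node there or sets its ownerDocument.

```python
from xml.dom import minidom

doc = minidom.parseString('<html/>')

# The question's approach (unsupported): build the class directly and
# assign the attribute. Only the qualified name is passed here.
doc.doctype = minidom.DocumentType('html')

# The node never enters doc.childNodes, which is what toxml() walks,
# and nothing links it back to the document.
print(any(node is doc.doctype for node in doc.childNodes))  # False
print(doc.doctype.ownerDocument is None)                     # True
print('DOCTYPE' in doc.toxml())                              # False
```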

Now that you have a DocumentType node, you can add it to the document. According to the standard, the only guaranteed way to do this is during document creation:

    >>> doc = imp.createDocument('http://www.w3.org/1999/xhtml', 'html', dt)
    >>> print(doc.toxml())
    <?xml version="1.0" ?><!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'><html/>
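For a brand-new document, the creation-time route can be wrapped in a small function. This is a sketch assuming Python 3's minidom; new_xhtml_document and XHTML_NS are hypothetical names, and adding the xmlns attribute by hand is an assumption of this sketch, since minidom serializes the root made by createDocument as a bare <html/>, as the transcript above shows.

```python
from xml.dom import minidom

XHTML_NS = 'http://www.w3.org/1999/xhtml'


def new_xhtml_document():
    """Hypothetical helper: a fresh XHTML 1.0 Strict document with its doctype."""
    imp = minidom.getDOMImplementation()
    dt = imp.createDocumentType(
        'html',
        '-//W3C//DTD XHTML 1.0 Strict//EN',
        'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd',
    )
    # The doctype is attached at creation time, the one standard way.
    doc = imp.createDocument(XHTML_NS, 'html', dt)
    root = doc.documentElement
    # minidom does not emit namespace declarations on its own, so add one.
    root.setAttribute('xmlns', XHTML_NS)
    for name in ('head', 'body'):
        root.appendChild(doc.createElementNS(XHTML_NS, name))
    return doc


doc = new_xhtml_document()
print(doc.doctype.ownerDocument is doc)  # True
print(doc.toxml())
```

Because createDocument links the doctype itself, both doc.doctype and doc.doctype.ownerDocument are set, and the DOCTYPE is the document's first child.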

If you want to change the doctype of an existing document, that is more of a problem. The DOM standard does not require that a DocumentType node without an ownerDocument can be inserted into a document. However, some DOMs allow it, for example pxdom. minidom sort of allows it:

    >>> doc = minidom.parseString('<html xmlns="http://www.w3.org/1999/xhtml"><head/><body/></html>')
    >>> dt = minidom.getDOMImplementation('').createDocumentType('html', '-//W3C//DTD XHTML 1.0 Strict//EN', 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd')
    >>> doc.insertBefore(dt, doc.documentElement)
    <xml.dom.minidom.DocumentType instance>
    >>> print(doc.toxml())
    <?xml version="1.0" ?><!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'><html xmlns="http://www.w3.org/1999/xhtml"><head/><body/></html>

but with glitches:

    >>> doc.doctype       # None
    >>> dt.ownerDocument  # None

which may or may not matter to you.
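If you do take the insertBefore route, the glitches can be detected programmatically. This sketch assumes Python 3's minidom; doctype_is_wired, PUBLIC_ID and SYSTEM_ID are hypothetical names. It checks the three things createDocument sets up (the document's doctype attribute, the node's ownerDocument, and the node's presence among the document's children) and compares a patched parsed document with a freshly created one.

```python
from xml.dom import minidom

PUBLIC_ID = '-//W3C//DTD XHTML 1.0 Strict//EN'
SYSTEM_ID = 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'


def doctype_is_wired(doc):
    """Hypothetical helper: is doc's DOCTYPE both linked and serialized?"""
    dt = doc.doctype
    return (
        dt is not None
        and dt.ownerDocument is doc
        and any(node is dt for node in doc.childNodes)
    )


imp = minidom.getDOMImplementation()

# insertBefore on a parsed document: the DOCTYPE shows up in the output...
patched = minidom.parseString(
    '<html xmlns="http://www.w3.org/1999/xhtml"><head/><body/></html>')
inserted = imp.createDocumentType('html', PUBLIC_ID, SYSTEM_ID)
patched.insertBefore(inserted, patched.documentElement)
print('<!DOCTYPE html' in patched.toxml())  # True
# ...but doc.doctype and the node's ownerDocument are left unset.
print(doctype_is_wired(patched))            # False

# createDocument links everything up.
fresh = imp.createDocument('http://www.w3.org/1999/xhtml', 'html',
                           imp.createDocumentType('html', PUBLIC_ID, SYSTEM_ID))
print(doctype_is_wired(fresh))              # True
```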

Technically, the only way the standard guarantees to set a doctype on an existing document is to create a new document and import the whole of the old document into it!

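    def setDoctype(document, doctype):
        """Return a new document containing `document`'s nodes plus `doctype`."""
        imp = document.implementation
        # The doctype can only be attached when the document is created; the
        # placeholder root element is swapped for the imported original below.
        newdocument = imp.createDocument(doctype.namespaceURI, doctype.name, doctype)
        # minidom has no DOM Level 3 xmlVersion attribute, so copy it only if present.
        if hasattr(document, 'xmlVersion'):
            newdocument.xmlVersion = document.xmlVersion
        # Nodes before the root go in front of it; nodes after it are appended.
        refel = newdocument.documentElement
        for child in document.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                newdocument.replaceChild(
                    newdocument.importNode(child, True), newdocument.documentElement
                )
                refel = None
            elif child.nodeType != child.DOCUMENT_TYPE_NODE:
                # Comments and processing instructions; any old doctype is dropped.
                newdocument.insertBefore(newdocument.importNode(child, True), refel)
        return newdocument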