Unable to find unclosed element in XML

I have a large XML file (~18 MB). Apparently there is a tag in it that is not closed. I know this because when I run it through the W3C Markup Validator (validator.w3.org), I get the following error:

You may have neglected to close an element, or perhaps you meant to "self-close" an element, that is, ending it with "/>" instead of ">".

My question is: how can I find this unclosed element among the 500,000 lines in the file? Is there a tool that could suggest places where the problem might be, for example an element that has not been closed after a certain number of lines?

Any ideas would be highly appreciated.

+8
xml
4 answers

I use Notepad++, which has a great XML Tools plugin that lets you check the XML syntax and shows you the line that is problematic. It also has other useful utilities.

+7

I just opened an XML file in VS 2010 (with ReSharper), deliberately broke the XML, and what do you know? The error was immediately highlighted. If you have access to the same tools, it's simple.

+3

xmllint is the standard tool for this. From the libxml2 "Validation & DTDs" page:

The simplest way is to use the xmllint program included with libxml. The --valid option turns on validation of the files given as input. For example, the following validates a copy of the first revision of the XML 1.0 specification:

 xmllint --valid --noout test/valid/REC-xml-19980210.xml 

--noout is used to disable output of the resulting tree.

--dtdvalid dtd allows validation of the document(s) against a given DTD.

Libxml2 exports an API to handle DTDs and validation; check the associated description.

If your document is not "pretty printed", it will still be hard to find the offending node, so you might want to use xmllint (its --format option) to rewrite the file with indentation first.
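For readers who do not have xmllint at hand, a similar well-formedness check can be sketched in Java with the SAX parser that ships with the JDK. This is only an illustration, not what xmllint does internally: the class name `WellFormedCheck` and the method `firstError` are made up for this example. It reports the line and column where the parser gave up, which is where the mismatch was detected and not necessarily where the unclosed element starts.

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, named for this example only.
class WellFormedCheck {

    /** Returns null if the input is well-formed, otherwise "line:column: message". */
    static String firstError(InputSource source) {
        try {
            // DefaultHandler ignores all content and rethrows fatal parse errors.
            SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            // I/O or parser configuration problems are not well-formedness errors.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Pass a file path as the first argument; without one, a small broken sample is used.
        InputSource source = args.length > 0
                ? new InputSource(new File(args[0]).toURI().toString())
                : new InputSource(new StringReader("<a>\n<b>\n</a>"));
        String error = firstError(source);
        System.out.println(error == null ? "well-formed" : error);
    }
}
```

Because SAX streams the document, this should cope with an 18 MB file without loading it all into memory.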

+3

Since you do not have an XML schema, there is no reliable way to find the offending element; for example, XML allows recursive structures. But you CAN write your own XML schema, although that could potentially be a lot to learn. Alternatively, I would create a simple, silly validator that checks nesting level and element name:

    private void parseAndCheckStructure(XMLStreamReader reader) throws XMLStreamException {
        // first read header, this is probably not the offending element (?)
        int event = -1;
        while (reader.hasNext()) {
            event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                break;
            } else if (event == XMLStreamConstants.END_DOCUMENT) {
                throw new XMLStreamException();
            }
        }

        // read the rest of the document.
        int level = 1;
        do {
            event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                level++;
                String localName = reader.getLocalName();
                if (localName.equals("FirstElement")) {
                    parseFirstElementWithALoopLikeTheCurrent(reader);
                    level--;
                } else if (localName.equals("SecondElement")) {
                    parseSecondElementWithALoopLikeTheCurrent(reader);
                    level--;
                } else {
                    throw new RuntimeException("Unknown element " + localName
                            + " at level " + level
                            + " and location " + reader.getLocation());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                // keep track of level
                level--;
            }
        } while (level > 0);
    }

Alternatively, parse the entire document in the do-while loop above and do checks of this type:

    if (level == 4 && localName.equals("MyElement")) {
        // ok
    } else {
        // throw exception with the location
    }

It sucks, but it works.
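A variation on the same level-tracking idea that needs no knowledge of the element names: keep a stack of the elements currently open, together with the line where each was opened, and dump that stack when the parser fails. This is a rough sketch, not a definitive tool. `OpenElementTracker` and `findUnclosed` are names invented for this example, and the recorded line is assumed to be the parser's position just after the start tag. The innermost open element is a likely suspect, but with nested elements of the same name the real culprit may sit further down the list.

```java
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, named for this example only.
class OpenElementTracker {

    /**
     * Returns null if the input parses cleanly. Otherwise returns a report that
     * lists the elements still open when the parser failed, innermost first.
     */
    static String findUnclosed(Reader input) {
        // push() adds at the head, so iterating the deque yields the innermost element first.
        Deque<String> open = new ArrayDeque<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(input);
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // Assumption: the location is the parser position after the start tag.
                    open.push("<" + reader.getLocalName() + "> opened near line "
                            + reader.getLocation().getLineNumber());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    open.pop();
                }
            }
            return null;
        } catch (XMLStreamException e) {
            StringBuilder report = new StringBuilder("Parser stopped: " + e.getMessage());
            for (String element : open) {
                report.append(System.lineSeparator()).append("  still open: ").append(element);
            }
            return report.toString();
        }
    }

    public static void main(String[] args) throws Exception {
        // Pass a file path as the first argument (assumed UTF-8); otherwise use a broken sample.
        Reader input = args.length > 0
                ? Files.newBufferedReader(Paths.get(args[0]))
                : new StringReader("<root>\n<item>\n<name>x\n</item>\n</root>");
        String report = findUnclosed(input);
        System.out.println(report == null ? "no unclosed elements found" : report);
    }
}
```

For the built-in sample, the parser fails at the `</item>` tag and the report should list `<name>` as the innermost element that was still open.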

0