How to parse an XML string and retrieve the char index of elements?

As the title says, I need to parse a string containing XML and record, for each element, the character index in the original string at which its start tag begins and its end tag ends. I have looked at SAX and DOM, but I cannot find anything that provides this information. Any suggestions?

Thanks.

2 answers

Not sure if this is useful, but since SAX is sequential, could you keep a running character count yourself? The only problem is that some whitespace may be ignored by the parser.


Take a look at Locator, DefaultHandler and SAXParser. Here is an example that prints the line number and column number for each element:

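    import java.io.IOException;
    import java.io.StringReader;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.InputSource;
    import org.xml.sax.Locator;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    public class LocatorExample {

        public static void main(String[] args)
                throws SAXException, IOException, ParserConfigurationException {
            String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                    + "<project \n"
                    + ">\n"
                    + "    <description>A description</description>\n"
                    + "</project>\n";
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser sp = spf.newSAXParser();
            InputSource inps = new InputSource(new StringReader(xml));
            DefaultHandler df = new XDefaultHandler();
            sp.parse(inps, df);
        }

        static class XDefaultHandler extends DefaultHandler {
            // Supplied by the parser before any other event is reported.
            Locator l = null;

            @Override
            public void setDocumentLocator(Locator locator) {
                l = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) throws SAXException {
                System.out.println("element: " + qName);
                // Position where the start tag ends, as line/column.
                System.out.println("locator: " + l.getLineNumber() + "/" + l.getColumnNumber());
            }
        }
    }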

Output:

    element: project
    locator: 3/2
    element: description
    locator: 4/18

Override other methods in XDefaultHandler, such as endElement, to receive callbacks for end tags and other events.
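To illustrate overriding endElement as well, here is a minimal sketch of a handler that records the reported position for both start and end tags. The class name TagPositionRecorder and its parse helper are made up for this example; it assumes the parser supplies a Locator (the JDK's built-in one normally does) and records -1/-1 otherwise.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of SAX): records where each start and end tag finishes.
class TagPositionRecorder extends DefaultHandler {
    // One entry per event, formatted as "<kind> <qName> <line>/<column>".
    final List<String> events = new ArrayList<>();
    private Locator locator;

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        record("start", qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        record("end", qName);
    }

    private void record(String kind, String qName) {
        // The locator reports where the event ENDS, i.e. just past the tag's '>'.
        int line = locator == null ? -1 : locator.getLineNumber();
        int column = locator == null ? -1 : locator.getColumnNumber();
        events.add(kind + " " + qName + " " + line + "/" + column);
    }

    // Parses the given XML string and returns the recorded events in document order.
    static List<String> parse(String xml) throws Exception {
        TagPositionRecorder recorder = new TagPositionRecorder();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), recorder);
        return recorder.events;
    }

    public static void main(String[] args) throws Exception {
        parse("<project>\n    <description>A description</description>\n</project>\n")
                .forEach(System.out::println);
    }
}
```

Each start/end pair gives you the position just after the corresponding tag, which is the raw material for working out where the element sits in the source.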

EDIT: (clicked too soon)

From the Locator.getLineNumber() documentation:

Return the line number where the current document event ends. Lines are delimited by line ends, which are defined in the XML specification.

Warning: The return value from the method is intended only as an approximation for the sake of diagnostics; it is not intended to provide sufficient information to edit the character content of the original XML document. In some cases, these "line" numbers match what would be displayed as columns, and in others they may not match the source text due to internal entity expansion.

The return value is an approximation of the line number in the document entity or external parsed entity where the markup triggering the event appears.
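Since the question asks for character indexes rather than line/column pairs, here is a rough sketch of how the locator values could be mapped back to offsets in the original string. LineColumnIndex is a hypothetical helper, not a SAX class. It assumes lines and columns are 1-based, that a column counts one Java char (so a tab counts as one column), and that the locator is accurate for your input; as the documentation above warns, that is only an approximation and entity expansion can throw it off.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: maps 1-based line/column pairs to 0-based char indexes.
final class LineColumnIndex {
    private final String text;
    // lineStarts.get(n) is the char index at which line n + 1 begins.
    private final List<Integer> lineStarts = new ArrayList<>();

    LineColumnIndex(String text) {
        this.text = text;
        lineStarts.add(0);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                i++; // treat CRLF as a single line end
                lineStarts.add(i + 1);
            } else if (c == '\n' || c == '\r') {
                lineStarts.add(i + 1);
            }
        }
    }

    // Char index of the position the locator reported (just past the tag's '>').
    int toOffset(int line, int column) {
        return lineStarts.get(line - 1) + (column - 1);
    }

    // Char index of the '<' that opens the tag ending at the reported position.
    // A literal '<' cannot appear inside a tag, so searching backwards is safe
    // as long as the reported position really is the end of the tag.
    int tagStart(int line, int column) {
        return text.lastIndexOf('<', toOffset(line, column) - 1);
    }
}
```

Build one LineColumnIndex from the same string you hand to the parser, then call tagStart and toOffset with the values from getLineNumber() and getColumnNumber() inside startElement and endElement. The tag then spans from tagStart (inclusive) to toOffset (exclusive).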
