Is there a reverse XML parser for .NET?

In my application, I have a known offset of interest within an XML string, and I want to answer questions such as "what is my parent element?" without parsing the entire document.

This article mentions a library, apparently written in Objective-C, for parsing XML "in reverse". My application does not require full XML support, so I'm happy to accept all the caveats about the impossibility of parsing everything correctly. Is there something similar for C#/.NET?

Clarification: I am not asking about parsing solutions or performance trade-offs in general. I am interested in the specific situation where I am at some point partway through a text stream and just want to know something about the local structure. Imagine a situation where I do not want to fetch the beginning of the document, because access has very high latency.

+4
4 answers

This cannot be done without making substantial assumptions about the nature of your text. At a minimum, you have to assume that it is well-formed XML and that it contains neither CDATA sections nor namespaces.

If you start at an arbitrary position in the middle of the stream and back up until you hit what appears to be the beginning of an element, you cannot know that the text you are looking at really is the beginning of an element. It could be inside a CDATA section. And you cannot say that it is not CDATA until you have backed up through the entire stream looking for <![CDATA[ and not found it.
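The CDATA check described above can be sketched as follows. This is a minimal illustration, not a complete solution: the class and method names (CDataProbe, IsInsideCData) are hypothetical, the whole text is assumed to be in memory as a string, and it ignores the possibility that the markers appear literally inside a comment or processing instruction.

```csharp
using System;

static class CDataProbe
{
    // Hypothetical helper: returns true when the nearest CDATA marker
    // before `offset` is an opening "<![CDATA[" rather than a closing "]]>".
    public static bool IsInsideCData(string text, int offset)
    {
        if (offset <= 0 || text.Length == 0)
        {
            return false;
        }

        // LastIndexOf searches backward from this index.
        int start = Math.Min(offset, text.Length) - 1;

        int open = text.LastIndexOf("<![CDATA[", start, StringComparison.Ordinal);
        int close = text.LastIndexOf("]]>", start, StringComparison.Ordinal);

        // If no opening marker exists, open is -1 and this is false.
        // Proving that absence is what forces the scan back to the very start.
        return open > close;
    }
}
```

Note that when the document contains no CDATA at all, both searches run all the way back to index 0, which is exactly the cost this answer warns about.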

Namespaces pose a similar problem. If you find a start tag such as <Foo, you cannot know for sure that Foo is in the default namespace until you have backed up all the way to the root element of the document and confirmed that no ancestor element has a namespace declaration. If you find <x:Foo, you need to back up until you find an enclosing element with an xmlns:x declaration.

If you know for sure that the text is well-formed XML, that it does not contain CDATA, and that its use of namespaces is limited (that is, you can tell which namespace an element is in simply by looking at its start tag), then some of what you are trying to do is at least possible.

You can back up to the first start tag that you encounter, create a StreamReader that starts at that position, and use it to create an XPathDocument that is configured to process document fragments. Note, by the way, that you have no guarantee that XPathDocument will not read all the way to the end of the text the first time you use it unless, again, you know the nature of the text and know that the corresponding end tag will be present.
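A rough sketch of that idea, under the assumptions stated above (well-formed text, no CDATA, no prefixes declared on ancestors). The names FragmentLoader and LoadElementAt are made up for illustration, and Substring on an in-memory string stands in for seeking within a real stream. Wrapping the reader with ReadSubtree is one way to stop reading at the matching end tag instead of continuing to the end of the text.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

static class FragmentLoader
{
    // Hypothetical helper: loads only the element whose start tag begins
    // at startTagOffset and returns a navigator over it.
    public static XPathNavigator LoadElementAt(string text, int startTagOffset)
    {
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment
        };

        using (var textReader = new StringReader(text.Substring(startTagOffset)))
        using (var xmlReader = XmlReader.Create(textReader, settings))
        {
            // Position the reader on the element itself.
            xmlReader.MoveToContent();

            // The subtree reader ends at this element's end tag, so
            // whatever follows it in the text is never parsed.
            using (XmlReader subtree = xmlReader.ReadSubtree())
            {
                return new XPathDocument(subtree).CreateNavigator();
            }
        }
    }
}
```

If the element's end tag is missing, the reader will still run to the end of the text and then fail, so the caveat above continues to apply.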

But this will not handle the specific case you mentioned, i.e. finding the parent element. To find the parent element, you need to find a start tag that is not preceded (as you move backward) by its corresponding end tag. This is not so difficult to do: each < character you find will be the beginning of either a start tag, an end tag, or an empty element, and you can simply push end tags onto a stack and pop them when you find their corresponding start tags. When you hit a start tag and the stack is empty, you are at the beginning of the parent element.
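The stack-based backward scan just described might look something like the sketch below. It is only an illustration under the same assumptions (well-formed text, no CDATA), plus a few more: the offset is not inside a tag, no attribute value contains a > character, and anything starting with <? or <! is simply skipped. The names BackwardXmlScanner and FindParentStart are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class BackwardXmlScanner
{
    // Returns the index of the '<' that opens the start tag of the element
    // enclosing `offset`, or -1 if the scan reaches the start of the text.
    public static int FindParentStart(string text, int offset)
    {
        var pendingEndTags = new Stack<string>();
        int lt = PreviousLessThan(text, Math.Min(offset, text.Length));

        while (lt >= 0)
        {
            int gt = text.IndexOf('>', lt);
            if (gt < 0)
            {
                return -1;
            }

            string tag = text.Substring(lt, gt - lt + 1);

            if (tag.StartsWith("</", StringComparison.Ordinal))
            {
                // End tag: remember it until its start tag shows up.
                pendingEndTags.Push(NameOf(tag, 2));
            }
            else if (tag.StartsWith("<?", StringComparison.Ordinal)
                  || tag.StartsWith("<!", StringComparison.Ordinal)
                  || tag.EndsWith("/>", StringComparison.Ordinal))
            {
                // Declaration, comment, or empty element: nothing to match.
            }
            else if (pendingEndTags.Count == 0)
            {
                // Start tag with no pending end tag: this is the parent.
                return lt;
            }
            else if (pendingEndTags.Pop() != NameOf(tag, 1))
            {
                throw new InvalidOperationException("Mismatched tags; text is not well-formed.");
            }

            lt = PreviousLessThan(text, lt);
        }

        return -1;
    }

    // Finds the last '<' strictly before `index`, or -1 if there is none.
    private static int PreviousLessThan(string text, int index)
    {
        return index > 0 ? text.LastIndexOf('<', index - 1) : -1;
    }

    // Reads the element name that starts at `start` inside the tag text.
    private static string NameOf(string tag, int start)
    {
        int end = start;
        while (end < tag.Length && !char.IsWhiteSpace(tag[end])
               && tag[end] != '>' && tag[end] != '/')
        {
            end++;
        }
        return tag.Substring(start, end - start);
    }
}
```

For an offset that sits between two sibling elements, the scan has to walk back over every preceding sibling before it reaches the shared parent.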

But this is also a process that can take you all the way back to the start of the stream, especially in the trivial case where the XML you are searching is the classic pathological XML log format:

    <log>
      <entry>...</entry>
      <entry>...</entry>

... repeated to infinity

+3

It looks like XPathDocument might be what you are looking for. This class provides a fast, read-only, in-memory representation of an XML document. It does not create a DOM and is optimized for XPath queries.

XPathDocument can also be used to parse XML fragments. To do this, you need to create it from an XmlReader whose ConformanceLevel is set to Fragment.

The following code example first selects a set of XML nodes from an XML fragment, and then selects the parent element of each node based on an XPath expression:

    using System;
    using System.IO;
    using System.Xml;
    using System.Xml.XPath;

    class Program
    {
        static void Main(string[] args)
        {
            string xml = File.ReadAllText(@"C:\tmp\smplInput.xml");

            XmlReaderSettings xrs = new XmlReaderSettings();
            xrs.ConformanceLevel = ConformanceLevel.Fragment;

            using (TextReader textReader = new StringReader(xml))
            {
                using (XmlReader xmlReader = XmlReader.Create(textReader, xrs))
                {
                    // Create a new XPathDocument
                    XPathDocument doc = new XPathDocument(xmlReader);

                    // Create navigator
                    XPathNavigator navigator = doc.CreateNavigator();

                    // Set up namespace manager for XPath
                    XmlNamespaceManager ns = new XmlNamespaceManager(navigator.NameTable);
                    ns.AddNamespace("w", "http://www.example.com/2010/");

                    // Select nodes
                    XPathNodeIterator users = navigator.Select("//w:user", ns);

                    while (users.MoveNext())
                    {
                        XPathNavigator user = users.Current;
                        XPathNavigator department = user.SelectSingleNode("parent::node()", ns);

                        Console.WriteLine(string.Format("User {0} is in department {1}",
                            user.GetAttribute("name", ns.DefaultNamespace),
                            department.GetAttribute("type", ns.DefaultNamespace)));
                    }
                }
            }
        }
    }

To try the code, you can use the following XML input document:

    <?xml version="1.0" encoding="utf-8" ?>
    <w:departments xmlns:w="http://www.example.com/2010/">
      <w:department type="A">
        <w:user name="w" />
        <w:user name="x" />
        <w:department type="B">
          <w:user name="x" />
          <w:user name="y" />
        </w:department>
        <w:department type="C">
          <w:user name="x" />
          <w:user name="y" />
          <w:user name="z" />
        </w:department>
      </w:department>
      <w:department type="D">
        <w:user name="w" />
      </w:department>
    </w:departments>
+2

Another approach is to parse the XML once and generate an index for it, so that the next time you can load the index instead of parsing the XML again. See the article below:

http://xml.sys-con.com/node/453082

+1

Xponentsoftware's CAX does exactly what you want.

0
