XDocument: is it possible to force-load a malformed XML file?

I have an invalid XML file: the root element is never closed. The final closing tag is missing.

When I try to load the malformed XML file in C#:

    StreamReader sr = new StreamReader(path);
    batchFile = XDocument.Load(sr); // Exception

I get the exception "Unexpected end of file. The following items are not closed: batch line. Line 54, position 1."

Is it possible to ignore the missing closing tag or force the load? I noticed that all my XML tools (such as XML Notepad) automatically fix or ignore the problem. I cannot fix the XML file myself: it is produced by third-party software, and sometimes the file is correct.

+4
3 answers

You cannot do this with XDocument, because that class loads the entire document into memory and has to parse it completely.
You can, however, process the file with XmlReader. It reads one node at a time, so you can read and process everything that is actually in the document, and only at the very end do you get the error about the missing closing tags.
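
The XmlReader approach can be sketched roughly as follows. This is a minimal illustration, not a drop-in solution: the class and method names (TruncatedXmlDemo, ReadElementNames) are made up for this example, and it assumes the only defect is that closing tags are missing at the end of the input.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class TruncatedXmlDemo
{
    // Hypothetical helper: collects the names of all elements that can be
    // read before the parser reaches the truncated end of the input.
    public static List<string> ReadElementNames(TextReader input)
    {
        var names = new List<string>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            try
            {
                // Read() advances one node at a time, so every node that is
                // present gets processed before the error is raised.
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element)
                        names.Add(reader.Name);
                }
            }
            catch (XmlException ex)
            {
                // Thrown when the input ends while elements are still open.
                Console.WriteLine("Stopped early: " + ex.Message);
            }
        }
        return names;
    }
}
```

Calling TruncatedXmlDemo.ReadElementNames(new StreamReader(path)) would then return whatever was readable, instead of failing outright as XDocument.Load does.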

+3

I suggest using Tidy.NET to clean up the messy input.

Tidy.NET has a good API for getting a list of the problems (a message collection) in your "XML", and you can use that list to fix the text stream in memory. The simplest approach would be to fix one error at a time, although I suspect that would not work well when there are many errors. Alternatively, you can apply the corrections in reverse document order, so that the offsets reported in the remaining messages stay valid while you make the corrections.
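
To illustrate why reverse order keeps offsets valid, here is a small library-independent sketch. It does not use the Tidy.NET API at all: the ReverseOrderFixes class and the (Offset, Text) tuples are hypothetical stand-ins for whatever positions and corrections you derive from the reported messages.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class ReverseOrderFixes
{
    // Hypothetical: each fix means "insert Text at character position Offset
    // of the ORIGINAL string". Applying the fixes from the highest offset
    // down means an insertion never shifts a position that is still pending.
    public static string Apply(string source, IEnumerable<(int Offset, string Text)> fixes)
    {
        var sb = new StringBuilder(source);
        foreach (var fix in fixes.OrderByDescending(f => f.Offset))
            sb.Insert(fix.Offset, fix.Text);
        return sb.ToString();
    }
}
```

If the same fixes were applied front to back, every insertion would shift the text after it, and all later offsets would have to be recomputed.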

Here is an example that converts HTML input to XHTML:

    /* Needs System.IO and System.Text in addition to the Tidy.NET namespace */
    Tidy tidy = new Tidy();

    /* Set the options you want */
    tidy.Options.DocType = DocType.Strict;
    tidy.Options.DropFontTags = true;
    tidy.Options.LogicalEmphasis = true;
    tidy.Options.Xhtml = true;
    tidy.Options.XmlOut = true;
    tidy.Options.MakeClean = true;
    tidy.Options.TidyMark = false;

    /* Declare the parameters that are needed */
    TidyMessageCollection tmc = new TidyMessageCollection();
    MemoryStream input = new MemoryStream();
    MemoryStream output = new MemoryStream();

    /* Copy the markup into the input stream and rewind it */
    byte[] byteArray = Encoding.UTF8.GetBytes("Put your HTML here...");
    input.Write(byteArray, 0, byteArray.Length);
    input.Position = 0;

    /* Parse; problems found are collected in tmc */
    tidy.Parse(input, output, tmc);

    string result = Encoding.UTF8.GetString(output.ToArray());

+3

What you can do is read the XML into memory, append the missing closing tag, and then load it.

In other words, after reading the file with the StreamReader, fix up the text before you pass it to XDocument.
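
One possible way to do this is sketched below, assuming the file is complete except for closing tags missing at the very end. The XmlRepair class and LoadWithMissingEndTags method are names invented for this example. It uses an XmlReader pass to find out which elements are still open, appends their closing tags, and only then hands the text to XDocument.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

static class XmlRepair
{
    // Hypothetical helper: appends closing tags for elements still open at
    // the end of the text, then parses the repaired text.
    public static XDocument LoadWithMissingEndTags(string xml)
    {
        var open = new Stack<string>();
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element && !reader.IsEmptyElement)
                        open.Push(reader.Name);   // element opened
                    else if (reader.NodeType == XmlNodeType.EndElement)
                        open.Pop();               // element closed
                }
            }
        }
        catch (XmlException)
        {
            // Input ended early; 'open' now holds the unclosed elements.
        }

        var repaired = new StringBuilder(xml);
        // A Stack enumerates from the top, so the innermost element is closed first.
        foreach (string name in open)
            repaired.Append("</").Append(name).Append('>');

        // Still throws if the text is damaged in some other way,
        // for example if it was cut off in the middle of a tag.
        return XDocument.Parse(repaired.ToString());
    }
}
```

With the code from the question, this would be used as batchFile = XmlRepair.LoadWithMissingEndTags(sr.ReadToEnd()). A well-formed file passes through unchanged, because nothing is left on the stack.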

+1