How to append to large XML files in C# memory-efficiently

Is there a way to combine two XmlDocuments without holding the first in memory?

I need to loop through a list of up to one hundred large (~300 MB) XML files, appending up to 1000 nodes to each, and repeat the whole process several times (the list of new nodes is cleared between passes to save memory). Currently I load the entire XmlDocument into memory before appending the new nodes, which is no longer sustainable.

What would you say is the best way to do this? I have a few ideas, but I'm not sure which is best:

  • Never load the whole XmlDocument; instead, use an XmlReader and an XmlWriter together to stream into a temporary file, which is then renamed over the original.
  • Build an XmlDocument containing only the new nodes, then write them manually into the existing file (i.e. file.WriteLine("<node>\n")).
  • Something else?
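The second idea can avoid rewriting the file entirely if the new nodes are written over the closing root tag and that tag is then re-appended. This is a minimal sketch, not a tested implementation. It assumes the file is UTF-8, that its closing root tag is the last markup in the file (followed only by whitespace), and that the tag lies within the last 4 KB. InPlaceAppender, AppendBeforeRootClose, rootName and tailSize are hypothetical names, and XElement from System.Xml.Linq stands in for the XmlDocument mentioned above so the new nodes are still serialized as well-formed XML:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml.Linq;

static class InPlaceAppender
{
    // Writes newNodes just before the closing root tag of an existing UTF-8 XML file.
    // Only the last tailSize bytes are read; the rest of the file is never loaded.
    // Assumption: "</rootName>" is the last markup in the file and lies in that tail.
    public static void AppendBeforeRootClose(string path, string rootName,
                                             IEnumerable<XElement> newNodes, int tailSize = 4096)
    {
        byte[] closeTag = Encoding.UTF8.GetBytes("</" + rootName);
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            long tailStart = Math.Max(0L, fs.Length - tailSize);
            byte[] tail = new byte[fs.Length - tailStart];
            fs.Seek(tailStart, SeekOrigin.Begin);
            int filled = 0;
            while (filled < tail.Length)
            {
                int n = fs.Read(tail, filled, tail.Length - filled);
                if (n == 0) throw new EndOfStreamException();
                filled += n;
            }

            int idx = LastIndexOf(tail, closeTag);
            if (idx < 0)
                throw new InvalidDataException("Closing root tag not found near end of file.");

            // Serialize the new nodes (XElement escapes text and attributes),
            // then re-append the closing root tag after them.
            var sb = new StringBuilder();
            foreach (XElement node in newNodes)
                sb.Append(node.ToString()).Append('\n');
            sb.Append("</").Append(rootName).Append(">\n");
            byte[] payload = Encoding.UTF8.GetBytes(sb.ToString());

            fs.SetLength(tailStart + idx); // cut the file at the old "</rootName"
            fs.Seek(0, SeekOrigin.End);
            fs.Write(payload, 0, payload.Length);
        }
    }

    // Last position of needle in haystack, or -1 (naive byte search).
    static int LastIndexOf(byte[] haystack, byte[] needle)
    {
        for (int i = haystack.Length - needle.Length; i >= 0; i--)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }
}
```

Only the tail of the file is read, so memory use does not depend on file size. The trade-offs: nothing validates the existing content, and a crash between truncating and writing leaves the file without its closing tag.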

Any help would be greatly appreciated.

Edit: a few more details in response to some of the comments:

The program parses several large logs into XML, grouping the entries into separate files by source. It only needs to run once a day, and once the XML is written, a lightweight proprietary reader program reports on the data. Because it runs only once a day it can be slow, but it runs on a server that performs other tasks, mainly compressing and transferring files, and those tasks must not be slowed down too much.

A database would probably be simpler, but the company is not going to switch to one any time soon!

As it stands, the program runs fine on the dev machine, using at most a few GB of memory, but throws out-of-memory exceptions when running on the server.

Final edit: the task is fairly low priority, so getting a database would only add extra cost (although I will look into MongoDB).

The files will be appended to but will not grow indefinitely: each final file covers only one day, and new files are created the next day.

I will probably use the XmlReader/XmlWriter approach, as it is the easiest way to guarantee the XML stays valid, but I have taken all your comments and answers into account. I know that keeping the data in large XML files is not a great solution, but it is a constraint I am stuck with, so thanks for all the help.
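The XmlReader/XmlWriter approach can be done in a single streaming pass: copy the existing children, write the new nodes before the root closes, then swap the temporary file in. This is a minimal sketch, not a tested implementation. It assumes a single root element with no DTD (XmlReader prohibits DTDs by default), that comments or processing instructions before the root may be dropped, and that a ".tmp" file next to the original is acceptable. StreamingAppender and Append are hypothetical names; File.Replace performs the final swap.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class StreamingAppender
{
    // Streams `path` through an XmlReader into an XmlWriter on a temporary file,
    // writes `newNodes` just before the root element closes, then swaps the files.
    // Only the current node is held in memory, not the whole document.
    public static void Append(string path, IEnumerable<XElement> newNodes)
    {
        string tempPath = path + ".tmp"; // hypothetical naming scheme

        using (XmlReader reader = XmlReader.Create(path))
        using (XmlWriter writer = XmlWriter.Create(tempPath))
        {
            reader.MoveToContent(); // skip the declaration, stop on the root element
            writer.WriteStartDocument();
            writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
            writer.WriteAttributes(reader, true); // copy the root's attributes

            bool rootIsEmpty = reader.IsEmptyElement;
            reader.Read(); // step past the root start tag
            if (!rootIsEmpty)
            {
                // WriteNode copies the current node (with its whole subtree) and
                // advances the reader; stop at the root's own end tag.
                while (!reader.EOF &&
                       !(reader.NodeType == XmlNodeType.EndElement && reader.Depth == 0))
                {
                    writer.WriteNode(reader, true);
                }
            }

            foreach (XElement node in newNodes)
                node.WriteTo(writer); // new nodes go last, inside the root

            writer.WriteEndElement(); // close the root
            writer.WriteEndDocument();
        }

        // Both files are closed here; replace the original with the rewritten copy.
        File.Replace(tempPath, path, null);
    }
}
```

Each pass reads and rewrites the whole file, which costs disk I/O on the busy server, but memory stays flat, and because every node goes through XmlWriter the output remains well-formed.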

Answer

If you want to be fully sure the XML structure stays valid, XmlWriter and XmlReader are the best options.

However, if you need the absolute best performance, you can write this quickly using plain string and stream functions. The trade-off is that you lose the ability to validate the XML structure: if one file contains an error, you will not be able to detect or fix it:

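```csharp
using System.IO;
using System.Security; // SecurityElement.Escape

// Assumes `files` (the input paths) and `maxPerformance` (bool) are defined by the caller.
using (StreamWriter sw = new StreamWriter("out.xml"))
{
    sw.Write("<inputfiles>"); // one root element keeps the output well-formed
    foreach (string filename in files)
    {
        // Escape the name so characters such as & or " cannot break the attribute.
        sw.Write(string.Format(@"<inputfile name=""{0}"">", SecurityElement.Escape(filename)));
        using (StreamReader sr = new StreamReader(filename))
        {
            if (maxPerformance)
            {
                // StreamReader has no CopyTo() (.NET 4 added it to Stream), so copy
                // the text in chunks; alternatively try http://bit.ly/RiovFX
                char[] buffer = new char[81920];
                int read;
                while ((read = sr.Read(buffer, 0, buffer.Length)) > 0)
                    sw.Write(buffer, 0, read);
            }
            else
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    // parse the line and make any modifications you want
                    sw.Write(line);
                    sw.Write("\n");
                }
            }
        }
        sw.Write("</inputfile>");
    }
    sw.Write("</inputfiles>");
}
```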

Depending on how your input XML files are structured, you may want to strip the XML declarations, perhaps the document element, or other unneeded structures. This can be done by parsing each file line by line.
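Stripping the XML declaration from each input file could look like the minimal sketch below. It assumes each declaration sits on its own line (common, but not required by the XML spec) and removes only "<?xml ...?>" declarations, leaving processing instructions such as <?xml-stylesheet?> alone; it does not attempt to remove document elements. HeaderStripper, CopyWithoutDeclaration and IsXmlDeclaration are hypothetical names.

```csharp
using System;
using System.IO;

static class HeaderStripper
{
    // Copies input to output line by line, dropping XML declaration lines
    // such as <?xml version="1.0" encoding="utf-8"?>.
    public static void CopyWithoutDeclaration(TextReader input, TextWriter output)
    {
        string line;
        while ((line = input.ReadLine()) != null)
        {
            if (IsXmlDeclaration(line)) continue; // skip the header line
            output.Write(line);
            output.Write("\n");
        }
    }

    // True for "<?xml" followed by whitespace; "<?xml-stylesheet" and similar PIs are kept.
    static bool IsXmlDeclaration(string line)
    {
        string t = line.TrimStart();
        return t.StartsWith("<?xml", StringComparison.Ordinal)
            && t.Length > 5 && char.IsWhiteSpace(t[5]);
    }
}
```

Since StreamReader is a TextReader and StreamWriter is a TextWriter, a call such as HeaderStripper.CopyWithoutDeclaration(sr, sw) could take the place of the line-by-line branch in the loop above.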
