The DOM object model relies on all data being loaded into memory. Even if you find an implementation that defers loading (a lazy DOM), you will still run out of memory as soon as a consumer of the DOM API traverses the entire tree.
Essentially, you would save memory at the point where you call a hypothetical `XMemorySavingXDocument.Load("big.xml")`, but the first XPath or LINQ query that walks the complete DOM tree would still throw an OutOfMemoryException. If you can guarantee that such a query never happens, you could live with a lazy DOM tree like this.
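To make that concrete, here is a minimal sketch (using the regular `XDocument`, since `XMemorySavingXDocument` is only hypothetical) of the kind of query that forces the whole tree into memory, which is exactly where a lazy DOM would still blow up:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class FullTreeQuery
{
    static void Main()
    {
        // The stock XDocument already materializes the whole tree on Load;
        // a lazy DOM would postpone that, but only until the query below runs.
        XDocument doc = XDocument.Load("big.xml");

        // Descendants() enumerates every element in the document, so any lazy
        // implementation would be forced to page the complete tree in here.
        int elementCount = doc.Descendants().Count();
        Console.WriteLine($"Elements: {elementCount}");
    }
}
```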
I do not know of such an implementation, and I doubt it would help in your case anyway. As you said, many consumers of the DOM API traverse the tree across all nodes, so you would hit an OutOfMemoryException within minutes with this approach.
The XML DOM object model “decompresses” an XML file into an in-memory representation that consumes roughly 7 times the size of the original file on x64; on x86 it is still about 3.5 times.
The reason the XML DOM model is so bloated is that every DOM node knows its children, its parent and its attributes. Each of these is an object reference stored per node, and those references add up quickly.
A managed class instance consumes at least 12/24 bytes (x86/x64) of overhead, and every node reference adds another 4/8 bytes (x86/x64) to the total memory consumption, so a large XML file quickly exhausts memory. See the article for more information on .NET object sizes.
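One rough way to check the blow-up factor for your own files is to compare the managed heap before and after loading; a minimal sketch (the 7x/3.5x figures above come from separate measurements, not from this snippet):

```csharp
using System;
using System.IO;
using System.Xml.Linq;

class DomBlowUpFactor
{
    static void Main(string[] args)
    {
        string path = args.Length > 0 ? args[0] : "big.xml";
        long fileSize = new FileInfo(path).Length;

        long before = GC.GetTotalMemory(forceFullCollection: true);
        XDocument doc = XDocument.Load(path);
        long after = GC.GetTotalMemory(forceFullCollection: true);

        Console.WriteLine($"File size: {fileSize / (1024 * 1024)} MB");
        Console.WriteLine($"DOM size:  {(after - before) / (1024 * 1024)} MB");
        Console.WriteLine($"Factor:    {(double)(after - before) / fileSize:F1}x");

        // Keep the document reachable until after the second measurement,
        // otherwise the GC may collect it and skew the numbers.
        GC.KeepAlive(doc);
    }
}
```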
Since the DOM is not a good idea for large XML files, but your current architecture requires a DOM, I am afraid you will need to ditch the DOM and replace it with an API that extracts (and potentially modifies) only the data you are actually interested in. In a large organization you can take this topic to the architects and present it as a major redesign that needs to be prioritized accordingly.
If you are lucky enough to get buy-in from the architects and managers, then some third-party programmers in countries you have never been to will get your next big thing to work ;-).
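As for what the replacement API could look like: a minimal sketch of a streaming extractor, assuming the data you care about lives in elements named `item` (a placeholder, adjust it to your schema). The `XmlReader` only ever holds the current node, so memory stays flat regardless of file size:

```csharp
using System;
using System.Xml;

class StreamingExtractor
{
    static void Main()
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create("big.xml", settings))
        {
            while (reader.Read())
            {
                // The inner loop is needed because ReadElementContentAsString
                // already positions the reader on the node that follows the
                // element it consumed, which may itself be the next <item>.
                while (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
                {
                    string value = reader.ReadElementContentAsString();
                    Process(value);
                }
            }
        }
    }

    static void Process(string value)
    {
        // Placeholder for whatever the consuming code really needs to do.
        Console.WriteLine(value);
    }
}
```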
To give you some numbers on how much the data format affects performance, I created a file with 1 million integers in 3 different data formats (a generation sketch follows the list):
- Binary file: 40 MB
- ASCII text file: 80 MB (`ddd\r\nddd\r\n...`)
- XML file: 170 MB (one element per integer, one per line)
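A sketch of how comparable test files could be generated (the element name `n` in the XML variant is a placeholder; the original layout is only hinted at above):

```csharp
using System.IO;
using System.Xml;

class TestFileGenerator
{
    const int Count = 1_000_000; // integer count quoted above

    static void Main()
    {
        // Binary: 4 bytes per integer.
        using (var bw = new BinaryWriter(File.Create("ints.bin")))
        {
            for (int i = 0; i < Count; i++) bw.Write(i);
        }

        // ASCII text: one integer per line (ddd\r\n).
        using (var sw = new StreamWriter("ints.txt"))
        {
            for (int i = 0; i < Count; i++) sw.WriteLine(i);
        }

        // XML: one element per integer, inside a single root element.
        var settings = new XmlWriterSettings { Indent = true };
        using (var xw = XmlWriter.Create("ints.xml", settings))
        {
            xw.WriteStartElement("ints");
            for (int i = 0; i < Count; i++) xw.WriteElementString("n", i.ToString());
            xw.WriteEndElement();
        }
    }
}
```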
Then I read them back in a 64-bit process:
- 0.1 s: binary file via memory-mapped file
- 0.5 s: BinaryReader
- 2.5 s: text file
- 5.3 s: XmlReader (streaming)
- 8.6 s: XDocument.Load
Memory consumption stayed flat at ~200 MB, except for XDocument.Load, which peaked at 1.2 GB. Your mileage may vary, but as a first step I would convert the XML data via a streaming XmlReader into a binary format that can be loaded much faster.
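A minimal sketch of that one-time conversion, again assuming the integers sit in elements named `n`: the `XmlReader`/`BinaryWriter` pair streams the data through, so memory stays flat, and the resulting binary file can afterwards be loaded with `BinaryReader` or a memory-mapped file in a fraction of the time:

```csharp
using System.IO;
using System.Xml;

class XmlToBinaryConverter
{
    static void Main()
    {
        using (var reader = XmlReader.Create("ints.xml"))
        using (var writer = new BinaryWriter(File.Create("ints.bin")))
        {
            while (reader.Read())
            {
                // Same pattern as the extractor above: ReadElementContentAsInt
                // advances the reader past the element it just consumed.
                while (reader.NodeType == XmlNodeType.Element && reader.Name == "n")
                {
                    writer.Write(reader.ReadElementContentAsInt());
                }
            }
        }
    }
}
```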