How to overcome an OutOfMemoryException when pulling large XML documents from an API?

I am extracting over 1M entries from the API. The pull itself works fine, but I get an OutOfMemoryException when I call ReadToEnd to store the response in a string variable.

Here is the code:

    XDocument xmlDoc = new XDocument();
    HttpWebRequest client = (HttpWebRequest)WebRequest.Create(uri);
    client.Timeout = 2100000; // 35 minutes
    WebResponse apiResponse = client.GetResponse();
    Stream receivedStream = apiResponse.GetResponseStream();
    StreamReader reader = new StreamReader(receivedStream);
    string s = reader.ReadToEnd();

Stack trace:

    at System.Text.StringBuilder.ToString()
    at System.IO.StreamReader.ReadToEnd()
    at MyApplication.DataBuilder.getDataFromAPICall(String uri) in c:\Users\RDESLONDE\Documents\Projects\MyApplication\MyApplication\DataBuilder.cs:line 578
    at MyApplication.DataBuilder.GetDataFromAPIAsXDoc(String uri) in c:\Users\RDESLONDE\Documents\Projects\MyApplication\MyApplication\DataBuilder.cs:line 543

What can I do to get around this?

+4
3 answers

XmlReader is the way to go when memory is an issue. It is also the fastest approach, since it reads the document forward-only instead of building it all in memory.
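
For example, a minimal sketch of parsing the response stream directly with XmlReader instead of buffering it into a string (the "entry" element name is a placeholder; substitute whatever repeating element your payload actually uses):

    using (WebResponse apiResponse = client.GetResponse())
    using (Stream receivedStream = apiResponse.GetResponseStream())
    using (XmlReader reader = XmlReader.Create(receivedStream))
    {
        while (reader.Read())
        {
            // "entry" is a placeholder for the repeating element in your payload
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "entry")
            {
                // process one entry at a time; nothing else is kept in memory
            }
        }
    }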

+2

It looks like your file is too large for your environment. Loading the whole DOM for a large file can be problematic, especially in a 32-bit process (you did not say whether that applies here).

You can combine the speed and memory efficiency of XmlReader with the convenience of XElement/XNode, and use an XStreamingElement to write out the transformed content after processing. This is significantly more memory-efficient for large files.

Here is an example in pseudo code:

    // Use an XStreamingElement for writing the transformed output
    var st = new XStreamingElement("root");
    using (var xr = XmlReader.Create(stream))
    {
        xr.MoveToContent(); // position the reader on the root element
        xr.Read();          // advance to the root's first child
        while (!xr.EOF)
        {
            // whatever you're interested in
            if (xr.NodeType == XmlNodeType.Element)
            {
                // ReadFrom materializes one element and advances the reader
                // past it, so don't call Read() again in this branch
                if (XNode.ReadFrom(xr) is XElement node)
                {
                    ProcessNode(node);
                    st.Add(node);
                }
            }
            else
            {
                xr.Read();
            }
        }
    }
    st.Save(outstream); // or st.WriteTo(xmlwriter);
+6

Unfortunately, you did not show the code that consumes the data, but it seems the entire file is being loaded into memory at once. That is what you need to avoid.

It is best if you can process the file as a stream, without loading all of its contents into memory.
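
For instance, if you ultimately need an XDocument, loading it straight from the response stream at least avoids materializing the huge intermediate string, though the resulting DOM can still be large. A sketch reusing the variable names from the question:

    using (WebResponse apiResponse = client.GetResponse())
    using (Stream receivedStream = apiResponse.GetResponseStream())
    {
        // The parser consumes the stream directly; no giant string is built
        XDocument xmlDoc = XDocument.Load(receivedStream);
        // ... work with xmlDoc ...
    }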

+1
