How to reduce memory usage with large data sets in EF5?

I am trying to pull a massive dataset (1.4 million records) from SQL Server and dump it to a file in a WinForms application. I tried to do this with paging so that I don't hold too much in memory, but the process keeps consuming more memory as it runs. At about 25% of the way through it was using 600,000 K. Am I paging wrong? Can I get some suggestions on how to keep memory usage down?

 var query = (from organizations in ctxObj.Organizations
              where organizations.org_type_cd == 1
              orderby organizations.org_ID
              select organizations);

 int recordCount = query.Count();
 int skipTo = 0;
 int take = 1000;

 if (recordCount > 0)
 {
     while (skipTo < recordCount)
     {
         if (skipTo + take > recordCount)
             take = recordCount - skipTo;

         foreach (Organization o in query.Skip(skipTo).Take(take))
         {
             writeRecord(o);
         }

         skipTo += take;
     }
 }
+7
c# entity-framework-5
4 answers

Get rid of the paging and use AsNoTracking.

Test code

  static void Main(string[] args)
  {
      var sw = new Stopwatch();
      sw.Start();

      using (var context = new MyEntities())
      {
          // large sample table, 146994 rows
          var query = (from organizations in context.LargeSampleTable.AsNoTracking()
                       where organizations.ErrorID != null
                       orderby organizations.ErrorID
                       select organizations);

          foreach (ApplicationErrorLog o in query)
          {
              writeRecord(o);
          }
      }

      sw.Stop();
      Console.WriteLine("Completed after: {0}", sw.Elapsed);
      Console.ReadLine();
  }

  private static void writeRecord(ApplicationErrorLog o)
  {
      ;
  }

Test result:

Reduced memory consumption: 96%
Reduced execution time: 50%

Interpretation

AsNoTracking provides the memory benefit for obvious reasons: we don't maintain references to the objects as we load them, so the GC can collect them almost immediately. Combining lazy evaluation with AsNoTracking means there is no need for paging, and disposing of the context can be deferred.

Although this is only one test, the large number of rows and the exclusion of most external factors make it a good representation of the general case.

+5

An object context keeps objects in memory until it is disposed. I would recommend disposing of the context after each batch to prevent the memory growth.

You can also use AsNoTracking() ( http://msdn.microsoft.com/en-us/library/gg679352(v=vs.103).aspx ), since you are not saving the entities back to the database.
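In case it helps to see the two suggestions together, here is a minimal sketch that pages through the rows, disposes the context after every batch, and reads with AsNoTracking. It reuses the Organizations entity and writeRecord method from the question; the context type name OrgContext is only an assumption.

 // Sketch: fresh context per batch + AsNoTracking.
 // "OrgContext" is an assumed name for the question's DbContext type.
 int skipTo = 0;
 const int take = 1000;
 bool more = true;

 while (more)
 {
     using (var ctxObj = new OrgContext())   // disposed after every batch
     {
         var batch = (from org in ctxObj.Organizations.AsNoTracking()
                      where org.org_type_cd == 1
                      orderby org.org_ID
                      select org)
                     .Skip(skipTo)
                     .Take(take)
                     .ToList();

         foreach (Organization o in batch)
         {
             writeRecord(o);
         }

         more = batch.Count == take;         // a short page means we are done
         skipTo += take;
     }
 }

This also avoids the separate Count() query: the loop stops as soon as a batch comes back smaller than the page size.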

+8

A few things.

  • Calling Count() executes your query. Then you run it a second time to get the results. You don't need to do that.

  • The memory you are seeing is the objects being loaded into memory. If you only need a subset of the fields, project onto an anonymous type (or a simpler named type). This avoids change tracking and its overhead.

Used this way, EF can be a good typed API for lightweight SQL queries.

Something like this should do the trick:

 var query = from o in ctxObj.Organizations
             where o.org_type_cd == 1
             orderby o.org_ID
             select new { o.Id, o.Name };

 foreach (var org in query)
 {
     write(org.Id, org.Name);
 }
+1

Why not just use the standard System.Data.SqlClient.SqlConnection class? You can read the query results row by row with a SqlDataReader and write each row to the file. Your code has full control and only ever references one row at a time.

 using (var writer = new System.IO.StreamWriter(fileName))
 using (var conn = new SqlConnection(connectionString))
 using (var cmd = conn.CreateCommand())
 {
     cmd.CommandText = "SELECT * FROM Organizations WHERE org_type_cd = 1 ORDER BY org_ID";
     conn.Open();

     using (var reader = cmd.ExecuteReader())
     {
         while (reader.Read())
         {
             int id = (int)reader["org_ID"];
             int org_type_cd = (int)reader["org_type_cd"];
             writer.WriteLine(...);
         }
     }
 }

Entity Framework is not designed to solve every problem or to be your exclusive data access framework. It is meant to make simple CRUD operations easier to write. Working with millions of rows is a good candidate for a more specialized solution.

0
