Convert a large dataset to C# objects

I am building a complex planning application (covering articles, sales, customers, production, machines, ...) that consumes the data provided by an ERP database on SQL Server.

I work with about 30 different related object types, each backed by its own table or view. Some of these tables hold between 20,000 and 100,000 records.

I need to turn these tables into C# objects for later processing that cannot be done in SQL. I do not need every row, but there is no way to know in advance which rows I will need, since that depends on events at runtime.

The question is how best to do this. I have tried the following approaches:

  • Load all the data into a DataSet using a SqlDataAdapter, which takes about 300 MB of RAM. The first issue here is keeping it synchronized, but that is acceptable because the data does not change significantly at runtime.

    I then iterated over every row and converted it into C# objects stored in static dictionaries for fast lookup by key. The problem is that creating so many objects (millions) pushes memory usage up to 1.4 GB, which is too much. On the other hand, in-memory access to the data is very fast. (A sketch of this approach follows below.)
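A minimal sketch of this first approach, under assumptions of my own: the Articulo class, table, and column names below are illustrative and not taken from the original code.

    // Sketch of approach 1: load everything up front, then materialize every row.
    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    public class Articulo
    {
        public string IdArticulo { get; set; }
        public string Descripcion { get; set; }
    }

    public static class CargaCompleta
    {
        public static Dictionary<string, Articulo> Cargar(string connectionString)
        {
            var dataSet = new DataSet();
            using (var connection = new SqlConnection(connectionString))
            using (var adapter = new SqlDataAdapter("SELECT IdArticulo, Descripcion FROM Articulos", connection))
            {
                // Pulls the whole table into memory (the ~300 MB DataSet).
                adapter.Fill(dataSet, "articulos");
            }

            // Materialize every row as an object: lookups are fast later,
            // but memory grows with the number of rows (the 1.4 GB problem).
            var articulos = new Dictionary<string, Articulo>(StringComparer.OrdinalIgnoreCase);
            foreach (DataRow fila in dataSet.Tables["articulos"].Rows)
            {
                var articulo = new Articulo
                {
                    IdArticulo = fila["IdArticulo"].ToString(),
                    Descripcion = fila["Descripcion"].ToString()
                };
                articulos[articulo.IdArticulo] = articulo;
            }
            return articulos;
        }
    }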

Since loading everything used too much memory, I figured I needed some kind of on-demand row loading, so I tried:

  1. Another option I considered is querying the database directly through a SqlDataReader, filtering by the element I need the first time it is needed, and then caching it in a static Dictionary. Memory usage is minimal this way, but it is slow (on the order of minutes), because it means issuing millions of separate queries, which the server does not handle well (poor performance). (See the sketch after this item.)
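A minimal sketch of this second approach, again with illustrative table, column, and class names of my own rather than the original ones:

    // Sketch of approach 2: query one item at a time and cache the result.
    using System;
    using System.Collections.Generic;
    using System.Data.SqlClient;

    public static class CargaBajoDemanda
    {
        private static readonly Dictionary<string, string> descripciones =
            new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        public static string ObtenerDescripcion(string idArticulo, string connectionString)
        {
            // Return the cached value if we already fetched this item.
            if (descripciones.TryGetValue(idArticulo, out var cached))
                return cached;

            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(
                "SELECT Descripcion FROM Articulos WHERE IdArticulo = @id", connection))
            {
                command.Parameters.AddWithValue("@id", idArticulo);
                connection.Open();
                using (var reader = command.ExecuteReader())
                {
                    // One round trip per uncached item: minimal memory,
                    // but millions of calls add up to minutes of waiting.
                    var descripcion = reader.Read() ? reader["Descripcion"].ToString() : null;
                    descripciones[idArticulo] = descripcion;
                    return descripcion;
                }
            }
        }
    }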

Finally, I tried an intermediate approach that does work, but I am not sure it is optimal; I suspect it is not:

  1. The third option is to fill a DataSet with all the information and keep a static local copy, but not convert every row to an object up front; instead, do it on demand (lazily), something like this:

     public class ProductoTerminado : Articulo
     {
         private static Dictionary<string, ProductoTerminado> productosTerminados =
             new Dictionary<string, ProductoTerminado>();

         public PinturaTipo pinturaTipo { get; set; }

         public ProductoTerminado(string id) : base(id) { }

         public static ProductoTerminado Obtener(string idArticulo)
         {
             idArticulo = idArticulo.ToUpper();
             if (productosTerminados.ContainsKey(idArticulo))
             {
                 return productosTerminados[idArticulo];
             }
             else
             {
                 ProductoTerminado productoTerminado = new ProductoTerminado(idArticulo);

                 // This is where I get new data from that static dataset.
                 var fila = Datos.bd.Tables["articulos"]
                     .Select("IdArticulo = '" + idArticulo + "'")
                     .First();

                 // Then I fill the object and add it to the dictionary.
                 productoTerminado.descripcion = fila["Descripcion"].ToString();
                 productoTerminado.paletizacion = Convert.ToInt32(fila["CantidadBulto"]);
                 productoTerminado.pinturaTipo = PinturaTipo.Obtener(fila["PT"].ToString());

                 productosTerminados.Add(idArticulo, productoTerminado);
                 return productoTerminado;
             }
         }
     }
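One possible refinement of this lazy lookup, purely as a sketch: if the key column of the cached DataTable is declared as a primary key, DataTable.Rows.Find can replace the string-concatenated Select filter. Datos.bd, the "articulos" table, and the "IdArticulo" column come from the question; everything else here is an assumption of mine.

    // Sketch: index the cached table by its key once, then look rows up directly.
    using System.Collections.Generic;
    using System.Data;

    public static class DatosIndexados
    {
        // Call once after filling the DataSet (assumes Datos.bd from the question).
        public static void IndexarArticulos()
        {
            DataTable articulos = Datos.bd.Tables["articulos"];
            articulos.PrimaryKey = new[] { articulos.Columns["IdArticulo"] };
        }

        // Direct key lookup instead of Select("IdArticulo = '...'").
        public static DataRow BuscarArticulo(string idArticulo)
        {
            DataRow fila = Datos.bd.Tables["articulos"].Rows.Find(idArticulo);
            if (fila == null)
                throw new KeyNotFoundException("Articulo no encontrado: " + idArticulo);
            return fila;
        }
    }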

So, is this a good way to proceed, or should I look into Entity Framework or something like a strongly typed DataSet?

1 answer

I work with about 30 different related object types, each backed by its own table or view. Some of these tables hold between 20,000 and 100,000 records.

I suggest making a different decision for different types of objects. Typically, tables with many thousands of records are more likely to change, while tables with fewer records are less likely to. In a project I worked on, we decided to cache the objects that do not change in a List<T> at startup. For a few hundred instances, this takes well under a second. (A sketch of this idea follows below.)
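A minimal sketch of that startup cache; the Maquina class and the table and column names are illustrative assumptions, not from the original project.

    // Sketch: cache small, rarely-changing tables in memory at application startup.
    using System.Collections.Generic;
    using System.Data.SqlClient;

    public class Maquina
    {
        public int Id { get; set; }
        public string Nombre { get; set; }
    }

    public static class CacheEstatica
    {
        public static List<Maquina> Maquinas { get; private set; }

        // Called once at startup; a few hundred rows load well under a second.
        public static void Cargar(string connectionString)
        {
            var maquinas = new List<Maquina>();
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("SELECT Id, Nombre FROM Maquinas", connection))
            {
                connection.Open();
                using (var reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        maquinas.Add(new Maquina
                        {
                            Id = reader.GetInt32(0),
                            Nombre = reader.GetString(1)
                        });
                    }
                }
            }
            Maquinas = maquinas;
        }
    }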

If you use LINQ to SQL, keep the small objects in a local List<T>, and define the FK relationships correctly, you can write obj.Items to access the Items table filtered by obj's identifier. (In this example, obj holds the PK and Items is the FK table.) A sketch of such a mapping follows below.
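A sketch of a LINQ to SQL mapping where navigating obj.Items lazily queries the FK table; the Order/Item classes and the table and column names are assumptions for illustration.

    // Sketch: attribute-mapped LINQ to SQL entities with a PK-to-FK association.
    using System.Data.Linq;
    using System.Data.Linq.Mapping;

    [Table(Name = "Orders")]
    public class Order
    {
        [Column(IsPrimaryKey = true)]
        public int OrderId { get; set; }

        private EntitySet<Item> items = new EntitySet<Item>();

        // Accessing Order.Items runs a query filtered by OrderId on first use.
        [Association(Storage = "items", OtherKey = "OrderId")]
        public EntitySet<Item> Items
        {
            get { return items; }
            set { items.Assign(value); }
        }
    }

    [Table(Name = "Items")]
    public class Item
    {
        [Column(IsPrimaryKey = true)]
        public int ItemId { get; set; }

        [Column]
        public int OrderId { get; set; }
    }

    // Usage (with a DataContext over the same connection):
    //   var db = new DataContext(connectionString);
    //   var order = db.GetTable<Order>().First(o => o.OrderId == 42);
    //   var count = order.Items.Count;   // lazily loads only that order's Items rows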

This design will also give users the performance they expect. When working with small sets, everything is instant (cached). When working with large sets but making small selections or inserts, performance is good (fast queries that hit the PK). You really only suffer when running queries that join several large tables, and in those cases users probably expect some delay (although I cannot be sure without knowing more about the use case).
