Cascade and DataSet Memory Consumption

I have an application that uses DataSet.WriteXml to export data and DataSet.ReadXml to import data. During the import process, I need to change some primary keys as part of the application logic.

With more than 500K records, the application writes to XML and reads the XML back successfully. But after I change a primary key, it hangs for a while and then throws an OutOfMemoryException. The reason, in my opinion, is that it has to perform a lot of cascading updates. I tried BeginEdit and EndEdit around the primary-key change, but it still fails, inside EndEdit.

As I understand it, a DataSet also keeps previous versions of the data in memory. Is there a way to optimize DataSet update operations so that they consume minimal memory?

+7
4 answers

If you need more control, you will have to give up some of the features the DataSet provides. One way to reduce the memory consumed by cascades is simply not to cascade: update the related tables' ID columns manually, using the table schema to find them.

The idea is that you then control which rows are updated, and can call AcceptChanges at any point, force a GC collection, or do anything else you may want to control.
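A minimal sketch of that idea, under some assumptions not in the original answer: the relation is named "Relation1", the parent table has an int "ID" primary key, and the batch size of 10,000 is arbitrary.

```csharp
using System;
using System.Data;

static class ManualCascade
{
    // Sketch: propagate a PK change by hand instead of letting the DataSet cascade,
    // and call AcceptChanges periodically to release retained row versions.
    public static void ChangeKeyWithoutCascade(DataSet ds, int oldId, int newId)
    {
        DataRelation relation = ds.Relations["Relation1"];
        ds.EnforceConstraints = false;                      // as in DisableRelations() below
        relation.ChildKeyConstraint.UpdateRule = Rule.None; // no automatic cascade

        // Requires the parent table to have "ID" as its primary key.
        relation.ParentTable.Rows.Find(oldId)["ID"] = newId;

        int pending = 0;
        DataColumn childColumn = relation.ChildColumns[0];
        foreach (DataRow child in relation.ChildTable.Rows)
        {
            if (Convert.ToInt32(child[childColumn]) == oldId)
            {
                child[childColumn] = newId;
                // Periodically drop the "original" row versions the DataSet keeps.
                if (++pending % 10000 == 0) ds.AcceptChanges();
            }
        }
        ds.AcceptChanges();
        ds.EnforceConstraints = true;
    }
}
```

The same structure appears in the full test code further down; this is just the core loop isolated.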

I created a simple test scenario that shows what I mean:


Schema:

<?xml version="1.0"?>
<xs:schema id="NewDataSet" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="NewDataSet" msdata:IsDataSet="true" msdata:UseCurrentLocale="true">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="Planet">
          <xs:complexType><xs:sequence>
            <xs:element name="ID" type="xs:int" />
            <xs:element name="Name" type="xs:string" minOccurs="0" />
          </xs:sequence></xs:complexType>
        </xs:element>
        <xs:element name="Continent">
          <xs:complexType><xs:sequence>
            <xs:element name="ID" type="xs:int" />
            <xs:element name="PlanetID" type="xs:int" />
            <xs:element name="Name" type="xs:string" minOccurs="0" />
          </xs:sequence></xs:complexType>
        </xs:element>
        <xs:element name="Country">
          <xs:complexType><xs:sequence>
            <xs:element name="ID" type="xs:int" />
            <xs:element name="ContinentID" type="xs:int" />
            <xs:element name="Name" type="xs:string" minOccurs="0" />
          </xs:sequence></xs:complexType>
        </xs:element>
        <xs:element name="County">
          <xs:complexType><xs:sequence>
            <xs:element name="ID" type="xs:int" />
            <xs:element name="CountryID" type="xs:int" />
            <xs:element name="Name" type="xs:string" minOccurs="0" />
          </xs:sequence></xs:complexType>
        </xs:element>
        <xs:element name="City">
          <xs:complexType><xs:sequence>
            <xs:element name="ID" type="xs:int" />
            <xs:element name="CountyID" type="xs:int" />
            <xs:element name="Name" type="xs:string" minOccurs="0" />
          </xs:sequence></xs:complexType>
        </xs:element>
        <xs:element name="Street">
          <xs:complexType><xs:sequence>
            <xs:element name="ID" type="xs:int" />
            <xs:element name="CityID" type="xs:int" minOccurs="0" />
            <xs:element name="Name" type="xs:string" minOccurs="0" />
          </xs:sequence></xs:complexType>
        </xs:element>
        <xs:element name="People">
          <xs:complexType><xs:sequence>
            <xs:element name="ID" type="xs:int" />
            <xs:element name="StreetID" type="xs:int" />
            <xs:element name="Name" type="xs:string" minOccurs="0" />
          </xs:sequence></xs:complexType>
        </xs:element>
        <xs:element name="Job">
          <xs:complexType><xs:sequence>
            <xs:element name="ID" type="xs:int" />
            <xs:element name="PeopleID" type="xs:int" />
            <xs:element name="Name" type="xs:string" minOccurs="0" />
          </xs:sequence></xs:complexType>
        </xs:element>
        <xs:element name="Pets">
          <xs:complexType><xs:sequence>
            <xs:element name="ID" type="xs:int" />
            <xs:element name="PeopleID" type="xs:int" minOccurs="0" />
            <xs:element name="Name" type="xs:string" minOccurs="0" />
          </xs:sequence></xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
    <xs:unique name="Constraint1">
      <xs:selector xpath=".//Planet" /> <xs:field xpath="ID" />
    </xs:unique>
    <xs:unique name="Continent_Constraint1" msdata:ConstraintName="Constraint1">
      <xs:selector xpath=".//Continent" /> <xs:field xpath="ID" />
    </xs:unique>
    <xs:unique name="Country_Constraint1" msdata:ConstraintName="Constraint1">
      <xs:selector xpath=".//Country" /> <xs:field xpath="ID" />
    </xs:unique>
    <xs:unique name="County_Constraint1" msdata:ConstraintName="Constraint1">
      <xs:selector xpath=".//County" /> <xs:field xpath="ID" />
    </xs:unique>
    <xs:unique name="City_Constraint1" msdata:ConstraintName="Constraint1">
      <xs:selector xpath=".//City" /> <xs:field xpath="ID" />
    </xs:unique>
    <xs:unique name="Street_Constraint1" msdata:ConstraintName="Constraint1">
      <xs:selector xpath=".//Street" /> <xs:field xpath="ID" />
    </xs:unique>
    <xs:unique name="People_Constraint1" msdata:ConstraintName="Constraint1">
      <xs:selector xpath=".//People" /> <xs:field xpath="ID" />
    </xs:unique>
    <xs:unique name="Job_Constraint1" msdata:ConstraintName="Constraint1">
      <xs:selector xpath=".//Job" /> <xs:field xpath="ID" />
    </xs:unique>
    <xs:unique name="Pets_Constraint1" msdata:ConstraintName="Constraint1">
      <xs:selector xpath=".//Pets" /> <xs:field xpath="ID" />
    </xs:unique>
    <xs:keyref name="Relation8" refer="People_Constraint1">
      <xs:selector xpath=".//Pets" /> <xs:field xpath="PeopleID" />
    </xs:keyref>
    <xs:keyref name="Relation7" refer="People_Constraint1">
      <xs:selector xpath=".//Job" /> <xs:field xpath="PeopleID" />
    </xs:keyref>
    <xs:keyref name="Relation6" refer="Street_Constraint1">
      <xs:selector xpath=".//People" /> <xs:field xpath="StreetID" />
    </xs:keyref>
    <xs:keyref name="Relation5" refer="City_Constraint1">
      <xs:selector xpath=".//Street" /> <xs:field xpath="CityID" />
    </xs:keyref>
    <xs:keyref name="Relation4" refer="County_Constraint1">
      <xs:selector xpath=".//City" /> <xs:field xpath="CountyID" />
    </xs:keyref>
    <xs:keyref name="Relation3" refer="Country_Constraint1">
      <xs:selector xpath=".//County" /> <xs:field xpath="CountryID" />
    </xs:keyref>
    <xs:keyref name="Relation2" refer="Continent_Constraint1">
      <xs:selector xpath=".//Country" /> <xs:field xpath="ContinentID" />
    </xs:keyref>
    <xs:keyref name="Relation1" refer="Constraint1">
      <xs:selector xpath=".//Continent" /> <xs:field xpath="PlanetID" />
    </xs:keyref>
  </xs:element>
</xs:schema>

And some code that generates a test case:

private void CreateRows(Int32 MaxBaseRows, Int32 MaxChildRows)
{
    dataSet1.Clear();
    Int32 RowCount = 0;
    Random R = new Random();
    foreach (DataTable DT in dataSet1.Tables)
    {
        Int32 NewCount = R.Next(1, MaxBaseRows);
        foreach (var FK in DT.Constraints.OfType<ForeignKeyConstraint>())
        {
            NewCount = NewCount * R.Next(1, MaxChildRows);
        }
        for (int i = 0; i < NewCount; i++)
        {
            DataRow DR = DT.NewRow();
            foreach (DataColumn DC in DT.Columns)
            {
                if (DC.ColumnName == "ID")
                {
                    DR[DC] = DT.Rows.Count;
                }
                else if (DC.DataType == typeof(Int32))
                {
                    Boolean ValueSet = false;
                    foreach (var FK in DT.Constraints.OfType<ForeignKeyConstraint>())
                    {
                        if (FK.Columns.Contains(DC))
                        {
                            DR[DC] = R.Next(0, FK.RelatedTable.Rows.Count);
                            ValueSet = true;
                        }
                    }
                    if (!ValueSet)
                    {
                        DR[DC] = R.Next(0, 10000);
                    }
                }
                else if (DC.DataType == typeof(String))
                {
                    DR[DC] = String.Format("{0}{1}", DT.TableName, DT.Rows.Count);
                }
            }
            DT.Rows.Add(DR);
            RowCount++;
        }
    }
    label19.Text = RowCount.ToString();
    dataSet1.AcceptChanges();
}

private void UpdateUsingCascade()
{
    EnableRelations();
    GC.Collect();
    long Mem = System.GC.GetTotalMemory(false);
    if (dataSet1.Tables["Planet"].Rows.Count > 0)
    {
        dataSet1.Tables["Planet"].Rows[0]["ID"] = new Random().Next(BaseRowCount, BaseRowCount + 10);
    }
    Mem = System.GC.GetTotalMemory(false) - Mem;
    DataSet ds = dataSet1.GetChanges();
    Int32 Changes = ds.Tables.OfType<DataTable>().Sum(DT => DT.Rows.Count);
    label19.Text = Changes.ToString();
    label21.Text = Mem.ToString();
    dataSet1.AcceptChanges();
}

private void UpdateManually()
{
    DisableRelations();
    GC.Collect();
    long Mem = System.GC.GetTotalMemory(false);
    DataTable DT = dataSet1.Tables["Planet"];
    Int32 ChangeCount = 0;
    if (DT.Rows.Count > 0)
    {
        DataColumn DC = DT.Columns["ID"];
        Int32 oldValue = Convert.ToInt32(DT.Rows[0][DC]);
        DT.Rows[0][DC] = new Random().Next(BaseRowCount + 20, BaseRowCount + 30);
        Int32 newValue = Convert.ToInt32(DT.Rows[0][DC]);
        foreach (DataRelation Relation in DT.ChildRelations)
        {
            if (Relation.ParentColumns.Contains(DC))
            {
                foreach (DataColumn CC in Relation.ChildColumns)
                {
                    foreach (DataRow DR in Relation.ChildTable.Rows)
                    {
                        if (Convert.ToInt32(DR[CC]) == oldValue)
                        {
                            DR[CC] = newValue;
                            ChangeCount++;
                            dataSet1.AcceptChanges();
                            GC.Collect();
                        }
                    }
                }
            }
        }
    }
    Mem = System.GC.GetTotalMemory(false) - Mem;
    label20.Text = ChangeCount.ToString();
    label22.Text = Mem.ToString();
    dataSet1.AcceptChanges();
}

private void EnableRelations()
{
    dataSet1.EnforceConstraints = true;
    foreach (DataRelation Relation in dataSet1.Relations)
    {
        Relation.ChildKeyConstraint.UpdateRule = Rule.Cascade;
    }
}

private void DisableRelations()
{
    dataSet1.EnforceConstraints = false;
    foreach (DataRelation Relation in dataSet1.Relations)
    {
        Relation.ChildKeyConstraint.UpdateRule = Rule.None;
    }
}
+1

SHCJ, you should use a BufferedStream:

DataSet dataSet = new DataSet();
using (FileStream fileStream = File.OpenRead(pathToYourFile))
using (BufferedStream bufferedStream = new BufferedStream(fileStream))
{
    dataSet.ReadXml(bufferedStream);
}

Update

Try this for your write operations:

using (XmlWriter xmlWriter = XmlWriter.Create(_pathToYourFile))
{
    /* write operations */
}
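Putting both suggestions together might look like the following sketch; the file name "data.xml" is a placeholder, and writing with XmlWriteMode.WriteSchema is an assumption so the file can be read back without a separate schema.

```csharp
using System.Data;
using System.IO;
using System.Xml;

string pathToYourFile = "data.xml"; // placeholder path

var dataSet = new DataSet();

// Read through a BufferedStream; the using blocks also dispose the streams.
using (var fileStream = File.OpenRead(pathToYourFile))
using (var bufferedStream = new BufferedStream(fileStream))
{
    dataSet.ReadXml(bufferedStream);
}

// ... change the primary keys here ...

// Write back through an XmlWriter, keeping the schema so the file can be re-read.
using (var xmlWriter = XmlWriter.Create(pathToYourFile))
{
    dataSet.WriteXml(xmlWriter, XmlWriteMode.WriteSchema);
}
```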
0

Try the following:

try
{
    // Logic to load your file
    var xelmOriginal = new XElement("Root");
    for (int i = 0; i < 500000; i++)
    {
        var item = new XElement("Item");
        item.SetAttributeValue("id", i);
        xelmOriginal.Add(item);
    }

    // Logic to transform each element
    var xelmRootTransformed = new XElement("Root");
    foreach (var element in xelmOriginal.Elements())
    {
        var transformedItem = new XElement("Transformed",
            element.Attributes().Single(x => x.Name.LocalName.Equals("id")));
        xelmRootTransformed.Add(transformedItem);
    }

    // Logic to save your transformed file
}
catch (Exception e)
{
    Console.WriteLine("Failed");
    return;
}
Console.WriteLine("Success");

The key point here is the separation of input and output. That is, you do not transform the file and immediately write back into the same structure you are reading; that would break your enumeration and keep everything in memory.

Instead, read your file one element at a time and write each transformed element to a temporary output as you go; in theory you then hold only one active element in memory.
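A streaming sketch of that element-at-a-time idea, using XmlReader/XmlWriter instead of the in-memory XElement trees above. The file names and element names ("Item", "Transformed", "id") are illustrative, carried over from the sample code.

```csharp
using System.Xml;

// Create a small sample input file so the sketch is self-contained.
using (XmlWriter w = XmlWriter.Create("input.xml"))
{
    w.WriteStartElement("Root");
    for (int i = 0; i < 5; i++)
    {
        w.WriteStartElement("Item");
        w.WriteAttributeString("id", i.ToString());
        w.WriteEndElement();
    }
    w.WriteEndElement();
}

// Stream: one element read, one element written, nothing accumulated.
using (XmlReader reader = XmlReader.Create("input.xml"))
using (XmlWriter writer = XmlWriter.Create("output.xml"))
{
    writer.WriteStartElement("Root");
    while (reader.ReadToFollowing("Item"))
    {
        writer.WriteStartElement("Transformed");
        writer.WriteAttributeString("id", reader.GetAttribute("id"));
        writer.WriteEndElement();
    }
    writer.WriteEndElement();
}
```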

0

DataSets are intelligent animals. They can not only read/write/hold/filter data, but also perform CHANGE TRACKING, so that later updates/inserts/deletes are faster (when working against a database, not just XML files).

What may have happened is that your DataSet has change tracking enabled, which makes it remember not only the current data, but also what the data looked like before and how the new data relates to the old. If you use the DataSet purely as a "container" for the current workload, you do not need that caching/change tracking, so just disable it. That is, if it is possible; I do not remember offhand whether and how it can be done. However, I am sure that you can discard the accumulated changes by calling .AcceptChanges(), or by throwing away the old DataSet and creating a new one for each batch of loaded data. The latter, of course, will NOT help with OOMs thrown during successive updates of the current batch. AcceptChanges cannot help either if the OOM is thrown during the very first PK update: you can "accept" changes only after a complete operation has finished, and in that case there is no point at which you could call it. But if the OOM occurs after several PK changes, then calling AcceptChanges after each of them, or after every few, may help.
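That "accept after every few changes" pattern could be sketched as follows; the table name "Planet", the column "ID", the idMap helper, and the batch size of 1000 are all illustrative assumptions, not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class BatchAccept
{
    // Sketch: remap primary keys, calling AcceptChanges every N changes so the
    // retained "original" row versions cannot pile up into an OutOfMemoryException.
    public static void RemapKeys(DataSet ds, IDictionary<int, int> idMap)
    {
        DataTable table = ds.Tables["Planet"];
        int changed = 0;
        foreach (DataRow row in table.Rows)
        {
            int id = Convert.ToInt32(row["ID"]);
            if (idMap.TryGetValue(id, out int newId))
            {
                row["ID"] = newId;
                // Release accumulated row versions periodically.
                if (++changed % 1000 == 0) ds.AcceptChanges();
            }
        }
        ds.AcceptChanges();
    }
}
```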

Please note that I am guessing. Your DataSets are not connected to a database, so change tracking may be disabled by default. But I doubt it: I remember that even for an XML file you can ask a DataSet to write out the data together with its change log. I think it is enabled by default.

0
