.NET XML compression for storage in SQL Server database

Currently, our .NET application builds XML data in memory and persists it to a SQL Server database. The XElement object is converted to a string using ToString() and then stored in a varchar(MAX) column. We do not want to use the SQL XML data type, since we do not need any validation and SQL Server never needs to query the XML.

Although this implementation works fine, we want to reduce the size of the database by compressing the XML before saving it and decompressing it after retrieving it. Does anyone have sample code for compressing an XElement object (decompression code would be great too)? Also, what changes do I need to make to the data type of the database column so that we can take full advantage of this compression?

I have looked again at the SQL Server 2005 XML data type, and the validation overhead it brings is too high for us to use. Also, although it compresses the XML somewhat, it does not compress nearly as much as the .NET DeflateStream class.

I tested the DeflateStream class by writing the XML that we use to disk and then saving the compressed version as a new file. The results are great: a 16 KB file shrinks to a 3 KB file, so now it is just a matter of getting this to work in memory and saving the resulting data to the database. Does anyone have sample code for the compression, and should I change the varchar(MAX) column to another type, possibly varbinary?

Thank you in advance

+4
4 answers

This article can help you get started.

The following snippet can compress a string and return a base-64 encoded result:

    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Text;

    public static string Compress(string text)
    {
        byte[] buffer = Encoding.UTF8.GetBytes(text);

        using (MemoryStream ms = new MemoryStream())
        {
            // leaveOpen: true keeps ms readable after the GZipStream is disposed;
            // disposing the GZipStream is what flushes the compressed data.
            using (GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true))
            {
                zip.Write(buffer, 0, buffer.Length);
            }

            byte[] compressed = ms.ToArray();

            // Prefix the compressed bytes with the 4-byte length of the original data.
            byte[] gzBuffer = new byte[compressed.Length + 4];
            Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gzBuffer, 0, 4);
            Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);

            return Convert.ToBase64String(gzBuffer);
        }
    }
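The snippet above only covers compression. A matching decompression routine could look roughly like the sketch below. It is not part of the original answer: the class name GzipBase64 and the method name Decompress are made up for illustration, and it assumes the input was produced by Compress, that is, Base64 text whose decoded bytes start with a 4-byte original length followed by the GZip data.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipBase64 // hypothetical helper class, not from the original answer
{
    // Assumed input layout: Base64( 4-byte original length + GZip stream ).
    public static string Decompress(string compressedText)
    {
        byte[] gzBuffer = Convert.FromBase64String(compressedText);

        // The first 4 bytes hold the length of the uncompressed UTF-8 data.
        int originalLength = BitConverter.ToInt32(gzBuffer, 0);
        byte[] buffer = new byte[originalLength];

        using (MemoryStream ms = new MemoryStream(gzBuffer, 4, gzBuffer.Length - 4))
        using (GZipStream zip = new GZipStream(ms, CompressionMode.Decompress))
        {
            // Read may return fewer bytes than requested, so loop until the buffer is full.
            int total = 0;
            while (total < buffer.Length)
            {
                int read = zip.Read(buffer, total, buffer.Length - total);
                if (read == 0)
                {
                    break; // stream ended early
                }
                total += read;
            }

            if (total != originalLength)
            {
                throw new InvalidDataException("Decompressed length does not match the stored length.");
            }
        }

        return Encoding.UTF8.GetString(buffer);
    }
}
```

Note that Base64 text is about a third larger than the bytes it encodes, so this format only makes sense if the column has to stay a text type.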

EDIT: As an aside, you may want to use a CLOB format even when storing XML as text, since plain varchar columns have a very limited length, which XML can often exceed quickly.

+3

I think you should also re-test the XML column. As far as I know, it is stored in a binary format, not as text. It may turn out smaller and may not perform badly, even if you do not really need the additional features.

+2

In addition to compressing the string itself (possibly using LBushkin's Base64 method above), you probably want to start by stripping all the insignificant whitespace. By default, the XElement.ToString() method produces indented output. Use the ToString(SaveOptions options) overload with SaveOptions.DisableFormatting if you want to make sure that you have only tags and data.
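As a rough sketch of how the two ideas could fit together: serialize without formatting, then compress with DeflateStream (the class the question mentions) into a raw byte array. The class and method names here (XmlCompression, CompressXml, DecompressXml) are hypothetical, and UTF-8 is an assumed encoding. Returning a byte[] instead of Base64 text assumes the column is changed to varbinary(MAX), as the question suggests.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

public static class XmlCompression // hypothetical helper class
{
    // Serializes without indentation, then Deflate-compresses the UTF-8 bytes.
    public static byte[] CompressXml(XElement element)
    {
        string xml = element.ToString(SaveOptions.DisableFormatting);
        byte[] raw = Encoding.UTF8.GetBytes(xml);

        using (MemoryStream output = new MemoryStream())
        {
            // leaveOpen: true so output can still be read after the DeflateStream is disposed.
            using (DeflateStream deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }

    // Reverses CompressXml: inflates the bytes and parses the XML text.
    public static XElement DecompressXml(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (DeflateStream deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (StreamReader reader = new StreamReader(deflate, Encoding.UTF8))
        {
            return XElement.Parse(reader.ReadToEnd());
        }
    }
}
```

The byte array from CompressXml can be passed to a varbinary(MAX) parameter as-is, which avoids the extra size that Base64 encoding would add.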

+1

I know you tagged the question SQL 2005, but you should consider upgrading to SQL 2008 and taking advantage of the great new compression features that come with it. It works out of the box, is transparent to your application, and will save you a huge implementation/testing/support cost.

-2
