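Here is the code I used: an interface and the class that implements it.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public interface IStringComplexity
{
    // Uncompressed size divided by compressed size.
    double GetCompressionRatio(string input);
}

public class GZipStringComplexity : IStringComplexity
{
    public double GetCompressionRatio(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (input.Length == 0) throw new ArgumentException("Input must not be empty.", nameof(input));

        byte[] inputBytes = Encoding.UTF8.GetBytes(input);
        byte[] compressed;
        using (var outStream = new MemoryStream())
        {
            // Dispose the GZipStream first so it flushes its final block.
            using (var zipStream = new GZipStream(outStream, CompressionMode.Compress))
            {
                zipStream.Write(inputBytes, 0, inputBytes.Length);
            }
            // ToArray still works after the MemoryStream has been closed.
            compressed = outStream.ToArray();
        }
        return (double)inputBytes.Length / compressed.Length;
    }
}
```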
Here is how you can use it.
```csharp
using System;

class Program
{
    static void Main(string[] args)
    {
        IStringComplexity c = new GZipStringComplexity();

        string input1 = "HHHFHHFFHHFHHFFHHFHHHFHAAAAHHHFHHFFHHFHHFFHHFHHHFHAAAAHHHFHHFFHHFHHFFHHFHHHFHAAAAHHHFHHFFHHFHHFFHHFHHHFH";
        string input2 = "mlcllltlgvalvcgvpamdipqtkqdlelpklagtwhsmamatnnislmatlkaplrvhitsllptpednleivlhrwennscvekkvlgektenpkkfkinytvaneatlldtdydnflflclqdtttpiqsmmcqylarvlveddeimqgfirafrplprhlwylldlkqmeepcrf";
        string inputMax = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";

        double ratio1 = c.GetCompressionRatio(input1);
        double ratio2 = c.GetCompressionRatio(input2);
        double ratioMax = c.GetCompressionRatio(inputMax);

        // A higher ratio means the string is more redundant, i.e. less complex.
        Console.WriteLine($"input1: {ratio1:F2}, input2: {ratio2:F2}, inputMax: {ratioMax:F2}");
    }
}
```
Some additional information I found helpful.
You can also try LZMA, LZMA2 or PPMd from the 7-Zip library. It is relatively easy to set up and gives you an interface through which you can use several compression algorithms. In my tests these algorithms compressed much better than GZip, but once you put the compression ratios on a relative scale, it doesn't really matter which one you use.
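To make the "several algorithms behind one interface" idea concrete without guessing at the 7-Zip library's API, here is a minimal sketch that keeps the same `IStringComplexity` contract but swaps in the compressors built into .NET (`GZipStream`, `DeflateStream` and `BrotliStream`) instead of LZMA or PPMd. `StreamCompressionComplexity` and `Compressors` are hypothetical names, and `BrotliStream` assumes .NET Core 2.1 or later.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Repeated from the answer so this snippet compiles on its own.
public interface IStringComplexity
{
    double GetCompressionRatio(string input);
}

// Hypothetical helper: measures the ratio with any compressor that wraps an output Stream.
public class StreamCompressionComplexity : IStringComplexity
{
    private readonly Func<Stream, Stream> _createCompressor;

    public StreamCompressionComplexity(Func<Stream, Stream> createCompressor)
    {
        _createCompressor = createCompressor ?? throw new ArgumentNullException(nameof(createCompressor));
    }

    public double GetCompressionRatio(string input)
    {
        if (string.IsNullOrEmpty(input))
            throw new ArgumentException("Input must not be null or empty.", nameof(input));

        byte[] inputBytes = Encoding.UTF8.GetBytes(input);
        using (var outStream = new MemoryStream())
        {
            // Disposing the compressor flushes its final block into outStream (and closes it).
            using (Stream compressor = _createCompressor(outStream))
            {
                compressor.Write(inputBytes, 0, inputBytes.Length);
            }
            // ToArray is still valid on a closed MemoryStream.
            return (double)inputBytes.Length / outStream.ToArray().Length;
        }
    }
}

// Hypothetical registry of the built-in .NET compressors, all behind the same interface.
public static class Compressors
{
    public static readonly IStringComplexity GZip =
        new StreamCompressionComplexity(s => new GZipStream(s, CompressionLevel.Optimal));

    public static readonly IStringComplexity Deflate =
        new StreamCompressionComplexity(s => new DeflateStream(s, CompressionLevel.Optimal));

    // BrotliStream requires .NET Core 2.1 or later.
    public static readonly IStringComplexity Brotli =
        new StreamCompressionComplexity(s => new BrotliStream(s, CompressionLevel.Optimal));
}
```

Each algorithm is just a factory that wraps the output stream, so a third-party compressor exposed as a writable `Stream` could be plugged in the same way; if a library exposes its encoder differently, a separate `IStringComplexity` implementation is probably simpler.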
If you need a normalized value, for example from 0 to 1, you first have to calculate the compression ratio for all sequences, because you cannot know in advance what the maximum possible compression ratio is.
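One way to get that 0 to 1 value is min-max scaling over the whole batch of ratios. The answer does not say which normalization it used, so this is a minimal sketch under that assumption; `RatioNormalizer` and `Normalize` are hypothetical names.

```csharp
using System;
using System.Linq;

public static class RatioNormalizer
{
    // Min-max scaling: the smallest ratio in the batch maps to 0, the largest to 1.
    public static double[] Normalize(double[] ratios)
    {
        if (ratios == null || ratios.Length == 0)
            throw new ArgumentException("Need at least one ratio.", nameof(ratios));

        double min = ratios.Min();
        double max = ratios.Max();
        double range = max - min;

        // If every ratio is the same there is no spread; map them all to 0.
        if (range == 0)
            return new double[ratios.Length];

        return ratios.Select(r => (r - min) / range).ToArray();
    }
}
```

Because the scale depends on the smallest and largest ratio in the batch, adding a new sequence can shift every normalized value, which is why all the ratios have to be computed before normalizing.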
oleksii