How to measure string complexity?

I have a few long strings (~1,000,000 characters each). Each string contains only characters from a specific alphabet, e.g.

A = {1,2,3} 

String Examples

    string S1 = "1111111111 ..."; // [meta complexity] = 0
    string S2 = "1111222333 ..."; // [meta complexity] = 10
    string S3 = "1213323133 ..."; // [meta complexity] = 100

What measures can I use to quantify the complexity of these strings? Intuitively, S1 is less complex than S3, but how can I compute this programmatically in .NET? Any algorithm, tool, or pointer to the literature would be appreciated.

Edit

I tried Shannon entropy, but it turned out not to be very useful for me: it gives the same value H for the sequences AAABBBCCC, ABCABCABC, ACCCBABAB and BBACCABAC.
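A minimal Python sketch (not .NET; chosen only for brevity) that reproduces the observation. It assumes "Shannon entropy" means first-order entropy computed from single-symbol frequencies; `shannon_entropy` is a hypothetical helper name. Each of the four sequences contains exactly three A, three B and three C, so the frequencies, and hence H, are identical:

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """First-order Shannon entropy in bits per symbol (hypothetical helper).

    Only symbol frequencies are used, so the order of symbols is ignored.
    """
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


for seq in ("AAABBBCCC", "ABCABCABC", "ACCCBABAB", "BBACCABAC"):
    # All four have the same symbol counts, so all print the same value.
    print(seq, round(shannon_entropy(seq), 4))
```

All four lines print 1.585, i.e. log2(3) bits per symbol, because the measure never looks at how the symbols are arranged.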


This is what I ended up with.
string algorithm complexity-theory approximation
1 answer

Compressing the strings with a standard method such as zip gives a good indication of their complexity:

- Good compression ratio and asymptotic behavior: low complexity
- Poor compression ratio and asymptotic behavior: higher complexity

