I have a few long strings (~1,000,000 characters each). Each string contains only characters from a fixed alphabet, e.g.
A = {1,2,3}
String Examples
string S1 = "1111111111 ..."; // [meta complexity] = 0
string S2 = "1111222333 ..."; // [meta complexity] = 10
string S3 = "1213323133 ..."; // [meta complexity] = 100
Q: What measures can I use to quantify the complexity of these strings? I can see that S1 is less complex than S3, but how can I compute this programmatically from .NET? Any algorithm, tool, or literature reference would be appreciated.
Edit
I tried Shannon entropy, but it turned out not to be very useful for me. It depends only on how often each symbol occurs, not on the order of the symbols, so I get the same H value for the sequences AAABBBCCC, ABCABCABC, ACCCBABAB, and BBACCABAC.
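To make the limitation concrete, here is a minimal sketch of that check. It assumes the usual zeroth-order definition, H = -Σ p·log2(p) over single-character frequencies. The names `EntropyDemo` and `ShannonEntropy` are made up for this illustration and are not necessarily the code that was actually run.

```csharp
using System;
using System.Linq;

static class EntropyDemo
{
    // Zeroth-order Shannon entropy in bits per symbol:
    // H = -sum(p * log2(p)), where p is the relative frequency
    // of each distinct character in the string.
    // (Hypothetical helper, written only for this illustration.)
    static double ShannonEntropy(string s)
    {
        return -s.GroupBy(c => c)
                 .Select(g => (double)g.Count() / s.Length)
                 .Sum(p => p * Math.Log(p, 2));
    }

    static void Main()
    {
        // The four sequences from the question: each contains
        // exactly three A's, three B's and three C's.
        string[] samples = { "AAABBBCCC", "ABCABCABC", "ACCCBABAB", "BBACCABAC" };

        foreach (string s in samples)
        {
            Console.WriteLine($"{s}: H = {ShannonEntropy(s):F3} bits/symbol");
        }
    }
}
```

Every sample has p = 1/3 for each of the three symbols, so all four give H = log2(3) ≈ 1.585 bits per symbol, even though their ordering is visibly different. The sketch only reproduces the problem described above. It is not a complexity measure for the strings in question.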
This is what I ended up doing.
oleksii