How to measure string complexity?

Question

I have a few long strings (~1,000,000 characters each). Each string contains only characters from a fixed alphabet, e.g.

A = {1,2,3}

String Examples

 string S1 = "1111111111 ..."; // [meta complexity] = 0
 string S2 = "1111222333 ..."; // [meta complexity] = 10
 string S3 = "1213323133 ..."; // [meta complexity] = 100

Q: What measures can I use to quantify the complexity of these strings? I can see that S1 is less complex than S3, but how can I measure this programmatically from .NET? Any algorithm, or a pointer to a tool or literature, would be appreciated.

Edit

I tried Shannon's entropy, but it turned out not to be very useful for me. It gives the same H value for all of these sequences: AAABBBCCC, ABCABCABC, ACCCBABAB and BBACCABAC, because it depends only on how often each character occurs, not on the order in which the characters appear.
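To illustrate the point, here is a minimal sketch in Python rather than .NET (the function name `shannon_entropy` is only illustrative). It computes the entropy from single-character frequencies, which is the variant that ignores ordering:

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character, based on single-character counts."""
    n = len(s)
    # Only the character counts enter the formula; positions are never used.
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


samples = ["AAABBBCCC", "ABCABCABC", "ACCCBABAB", "BBACCABAC"]
for s in samples:
    # Every sample has three A's, three B's and three C's,
    # so each line prints the same value, log2(3), about 1.585.
    print(s, round(shannon_entropy(s), 3))
```

All four strings contain each letter exactly three times, so the measure cannot tell the ordered ones from the shuffled ones.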

This is what I ended up with.

string algorithm complexity-theory .net approximation

oleksii May 21 '11 at 20:55

1 answer

aioobe · Accepted Answer · 2011-05-21T20:57:27+0000

Compressing the strings with a standard method such as zip gives a good indication of their complexity.

A good compression ratio suggests low complexity.
A poor compression ratio suggests higher complexity.
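As a rough sketch of this idea, assuming Python's `zlib` (DEFLATE, the method used by zip) in place of a .NET library, the ratio of compressed size to original size can serve as the score. The helper name `compression_ratio`, the string length and the fixed random seed are illustrative choices, not part of the question:

```python
import random
import zlib


def compression_ratio(s: str) -> float:
    """Compressed size divided by original size; lower means more compressible.

    Assumes a non-empty string of ASCII characters.
    """
    raw = s.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


n = 100_000
rng = random.Random(0)  # fixed seed so that repeated runs give the same string

s1 = "1" * n                                        # a single repeated symbol
s2 = "1111222333" * (n // 10)                       # a short repeating pattern
s3 = "".join(rng.choice("123") for _ in range(n))   # pseudo-random symbols

for name, s in [("S1", s1), ("S2", s2), ("S3", s3)]:
    print(name, round(compression_ratio(s), 4))
```

The regular strings shrink to a tiny fraction of their size, while the pseudo-random one stays far larger, so the ratio separates cases that a pure frequency count would treat alike. In .NET the same approach should be possible with `System.IO.Compression.DeflateStream`.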
