How to determine the randomness of a binary string?

That is, how can I write my own test that returns a single value measuring the statistical randomness of a string, from 0 to 1.0 (0 is not random, 1.0 is random)?

The test should work on binary strings of any size.

Working through it with pen and paper, you might judge strings like this:
0 (random; the only other choice is 1)
00 (not random; the repetition spans the whole string)
01 (better; two different values)
010 (less random; a palindrome)
011 (less random; more 1s than 0s, but still acceptable)
0101 (less random; a pattern)
0100 (better; fewer 1s, but any other arrangement would create a pattern)

Examples:

Size: 1, Possibilities: 2
0: 1.0 (random)
1: 1.0 (random)

Size: 2, Possibilities: 4
00: ?
01: 1.0 (random)
10: 1.0 (random)
11: ?

Size: 3, Possibilities: 8
000: ? (non-random)
001: 1.0 (random)
010: ? (less random)
011: 1.0 (random)
100: 1.0 (random)
101: ? (less random)
110: 1.0 (random)
111: ? (non-random)

And so on.

I suspect this could involve breaking the string into all possible substrings and comparing their frequencies, but it seems like this groundwork must already have been laid in the early days of computer science.

+4
Tags: c++, algorithm, random, entropy
5 answers

This will give you an entropy number from 0 to 1.0:

You might want to look into Shannon entropy, which is a measure of the entropy of data and information. In fact, it is nearly a direct analogue of the physical formula for entropy as defined by the most accepted interpretations of thermodynamics.

In particular, for your case of a binary string, you can look at the binary entropy function, which is the special case covering randomness in binary bits of data.

This is calculated using

H(p) = -p*log(p) - (1-p)*log(1-p) 

(logarithms are base 2; take 0*log(0) to be 0)

where p is the proportion of 1s in your string (or of 0s; the graph is symmetric, so the answer is the same either way).

Here is what the function gives:

[Plot of the binary entropy function H(p)]

As you can see, if p is 0.5 (equal numbers of 1s and 0s), the entropy is at its maximum, 1.0. If p is 0 or 1.0, the entropy is 0.

It looks like what you want, right?

The one exception is your size-1 case, which you could simply treat specially. Strings that are 100% 0s or 100% 1s don't look very entropic to me, though; handle them however you like.

Also, this does not take the ordering of the bits into account, only their overall counts, so repeats and palindromes get no penalty. You could add extra heuristics for that.

Here are your other examples:

 00:   -0*log(0) - (1-0)*log(1-0) = 0.0
 01:   -0.5*log(0.5) - (1-0.5)*log(1-0.5) = 1.0
 010:  -(1/3)*log(1/3) - (2/3)*log(2/3) = 0.92
 0100: -0.25*log(0.25) - (1-0.25)*log(1-0.25) = 0.81
+8

It seems you are asking for a way to find the Kolmogorov complexity of a binary string. Unfortunately, that is uncomputable. The size of the string after running it through a compression algorithm will give you an idea of how random it is, since more random strings are less compressible.

+10

Some time ago, I developed a simple heuristic that worked for my purposes.

You simply calculate the "parity" of 0s and 1s, not only in the string itself but also in derivatives of the string. For example, the first derivative of 01010101 is 11111111, because every bit changes, and the second derivative is 00000000, because no bit in the first derivative changes. Then you just weight these "parities" to taste.

Here is an example:

 #include <algorithm>
 #include <iostream>
 #include <string>

 float variance(const std::string& x)
 {
     int zeroes = std::count(x.begin(), x.end(), '0');
     float total = x.length();
     float deviation = zeroes / total - 0.5f;
     return deviation * deviation;
 }

 void derive(std::string& x)
 {
     char last = *x.rbegin();
     for (std::string::iterator it = x.begin(); it != x.end(); ++it)
     {
         char current = *it;
         *it = '0' + (current != last);
         last = current;
     }
 }

 float randomness(std::string x)
 {
     float sum = variance(x);
     float weight = 1.0f;
     for (int i = 1; i < 5; ++i)
     {
         derive(x);
         weight *= 2.0f;
         sum += variance(x) * weight;
     }
     return 1.0f / sum;
 }

 int main()
 {
     std::cout << randomness("00000000") << std::endl;
     std::cout << randomness("01010101") << std::endl;
     std::cout << randomness("00000101") << std::endl;
 }

For these examples it yields "randomness" values of 0.129032, 0.133333, and 3.2, respectively.

On a side note, you can get cool fractal graphics by printing the successive derivatives ;)

 int main()
 {
     std::string x = "0000000000000001";
     for (int i = 0; i < 16; ++i)
     {
         std::cout << x << std::endl;
         derive(x);
     }
 }

 0000000000000001
 1000000000000001
 0100000000000001
 1110000000000001
 0001000000000001
 1001100000000001
 0101010000000001
 1111111000000001
 0000000100000001
 1000000110000001
 0100000101000001
 1110000111100001
 0001000100010001
 1001100110011001
 0101010101010101
 1111111111111111
+4

It looks like you already have a bunch of heuristics for randomness in mind. Why not write something that runs all of those heuristics and scores the bit string on their average?

0

You could try a string compression algorithm: the more repetition (i.e. the less randomness), the further the string can be compressed.

0
