Here's the Python 3 code, which generates 1,000,000 "random" lowercase letters in 0.28 seconds (see also the 0.11 second solution at the end, the @Ashwini Chaudhary code from the question takes 0.55 seconds on my machine, @ Markku K. code is 0.53 ):
#!/usr/bin/env python3 import os import sys def write_random_lowercase(n): min_lc = ord(b'a') len_lc = 26 ba = bytearray(os.urandom(n)) for i, b in enumerate(ba): ba[i] = min_lc + b % len_lc
% len_lc crosses out the distribution (see the end on how to fix it), although it still satisfies the conditions (ascii, lower case, frequencies 1, 2, 3 letter sequences):
$ python3 generate-random.py | python3 check-seq.py
where check-seq.py :
#!/usr/bin/env python3 import sys from collections import Counter from string import ascii_lowercase def main(): limits = [40000, 2000, 100] s = sys.stdin.buffer.readline()
Note: on acm.timus.ru generate-random.py is given "Exceeding the output limit".
To improve performance, you can use bytes.translate() method ( 0.11 seconds):
#!/usr/bin/env python3 import os import sys # make translation table from 0..255 to 97..122 tbl = bytes.maketrans(bytearray(range(256)), bytearray([ord(b'a') + b % 26 for b in range(256)])) # generate random bytes and translate them to lowercase ascii sys.stdout.buffer.write(os.urandom(1000000).translate(tbl))
How to fix % len_lc skew
256 (the number of bytes) is not evenly divided by 26 (the number of Latin letters is lower), so the formula min_lc + b % len_lc makes some values ββless rare than others, for example:
#!/usr/bin/env python3 """Find out skew: x = 97 + y % 26 where y is uniform from [0, 256) range.""" from collections import Counter, defaultdict def find_skew(random_bytes): char2freq = Counter(chr(ord(b'a') + b % 26) for b in random_bytes) freq2char = defaultdict(set) for char, freq in char2freq.items(): freq2char[freq].add(char) return {f: ''.join(sorted(c)) for f, c in freq2char.items()} print(find_skew(range(256)))
Here, the input range(256) evenly distributed (each byte occurs exactly once), but the 'wxyz' letters on the output are less common than the other 9 vs. 10 occurrences. To fix this, you can discard unaligned bytes:
print(find_skew(range(256 - (256 % 26))))
Here, the input of uniformly distributed bytes in the range [0, 234) output evenly distributed over the strings of letters ascii.
bytes.translate() takes a second argument to specify the bytes to delete:
#!/usr/bin/env python3 import os import sys nbytes = 256 nletters = 26 naligned = nbytes - (nbytes % nletters) tbl = bytes.maketrans(bytearray(range(naligned)), bytearray([ord(b'a') + b % nletters for b in range(naligned)])) bytes2delete = bytearray(range(naligned, nbytes)) R = lambda n: os.urandom(n).translate(tbl, bytes2delete) def write_random_ascii_lowercase_letters(write, n): """*write* *n* random ascii lowercase letters.""" while n > 0:
If a random generator ( os.urandom here) creates long sequences of bytes that are outside the aligned range ( >=234 ), then the while can execute many times.
Time productivity can be improved by another order if instead of random.getrandbits(8*n).to_bytes(n, 'big') "rel =" NOFOLLOW "> os.urandom(n) . The first uses Mersenne Twister as the main generator , which may be faster than os.urandom() , which uses sources provided by the operating system, which is more secure if you use a random string for secrets.