Another possibility is to generate the array in blocks: compute many small submatrices and assemble them into the full array only at the very end. What you should not do is update a single huge array (`mask`) inside a for loop, as the OP does: every indexed update touches the whole array, which forces the entire thing to be resident in main memory.
Instead, for example: to build a `30000x30000` array, work with 90,000 separate `100x100` blocks (300 blocks per side). Update each `100x100` block in a for loop, and only at the end join all the blocks into one giant array. The total still fits in roughly 4 GB of RAM (for a float32 dtype; a boolean mask is far smaller) and the per-block updates are fast.
Minimal example:
```
In [9]: a
Out[9]:
array([[0, 1],
       [2, 3]])

In [10]: np.hstack([np.vstack([a]*5)]*5)
Out[10]:
array([[0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
       [2, 3, 2, 3, 2, 3, 2, 3, 2, 3],
       [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
       [2, 3, 2, 3, 2, 3, 2, 3, 2, 3],
       [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
       [2, 3, 2, 3, 2, 3, 2, 3, 2, 3],
       [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
       [2, 3, 2, 3, 2, 3, 2, 3, 2, 3],
       [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
       [2, 3, 2, 3, 2, 3, 2, 3, 2, 3]])

In [11]: np.hstack([np.vstack([a]*5)]*5).shape
Out[11]: (10, 10)
```
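For the case where each block is computed differently (not just a tiling of one array), the same idea can be sketched with `np.block`, which stitches a grid of blocks into one array. The block count, block size, and fill values below are toy placeholders, not the OP's actual computation:

```python
import numpy as np

n_blocks = 3     # 3x3 grid of blocks (toy size; ~300x300 for a 30000x30000 result)
block_size = 4   # each block is 4x4

# Compute each small block independently instead of indexing into one huge array.
# Here each block is simply filled with its own index as a placeholder "update".
blocks = [[np.full((block_size, block_size), i * n_blocks + j, dtype=np.uint8)
           for j in range(n_blocks)]
          for i in range(n_blocks)]

# Only at the very end, assemble the grid of blocks into the giant array.
mask = np.block(blocks)
print(mask.shape)  # -> (12, 12)
```

If all blocks are identical, `np.tile(a, (5, 5))` is a more concise equivalent of the `hstack`/`vstack` combination above.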