I think you can compute your results more efficiently with NumPy bitwise operations instead of string operations. Try:
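```python
import numpy as np

def binary_matrices(n):
    shift = np.arange(n * n).reshape(n, n)  # bit index assigned to each cell
    for j in range(2 ** (n * n)):
        yield (j >> shift) & 1  # cell with shift == k gets bit k of j
```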
You could also vectorize the loop over `j` with NumPy. However, that materializes all 2**(n*n) matrices at once, so it uses far more memory than the generator version, which holds only one matrix at a time.
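A minimal sketch of that vectorized variant, assuming n is small enough for everything to fit in memory. The function name `all_binary_matrices` is just illustrative. It relies on broadcasting: `j` gets shape `(2**(n*n), 1, 1)` and `shift` has shape `(n, n)`, so `j >> shift` expands to one `n x n` matrix per value of `j`.

```python
import numpy as np

def all_binary_matrices(n):
    """Illustrative vectorized variant: return every n x n 0/1 matrix at once.

    Memory grows as 2**(n*n) * n*n elements, so this only suits small n.
    """
    shift = np.arange(n * n).reshape(n, n)         # bit index for each cell, shape (n, n)
    j = np.arange(2 ** (n * n)).reshape(-1, 1, 1)  # one value per matrix, shape (2**(n*n), 1, 1)
    return (j >> shift) & 1                         # broadcasts to shape (2**(n*n), n, n)
```

For example, `all_binary_matrices(2)` has shape `(16, 2, 2)`, and entry 5 (binary `0101`) is `[[1, 0], [1, 0]]`. At n = 5 it would already need 2**25 * 25 int64 values, roughly 6.7 GB, which is why the generator is usually the better choice.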