I'm looking to reduce the memory footprint of np.packbits(A == A[:, None], axis=1), where A is a dense array of integers of length n. A == A[:, None] is memory-hungry for large n, since the resulting n x n Boolean array is stored inefficiently, with each Boolean value taking a full byte.
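To put numbers on it, here is a quick illustration (using the same n = 10000 as in the benchmark below): the intermediate Boolean array takes n * n bytes, while the packed result is 8x smaller.

import numpy as np

n = 10000
A = np.random.randint(0, 1000, n)

mask = A == A[:, None]                           # n x n Boolean array, 1 byte per entry
print(mask.nbytes / 1e6)                         # ~100 MB for n = 10000
print(np.packbits(mask, axis=1).nbytes / 1e6)    # ~12.5 MB, 8x smaller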
I wrote the script below to achieve the same result while packing the bits one 8-element section at a time, so the full Boolean array is never materialized. It is, however, about 3 times slower than np.packbits, so I'm looking for ways to speed it up, or, alternatively, for a better algorithm with low memory overhead.
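For context, the simplest low-memory variant I can think of is to build the mask in row blocks and pack each block as it goes; the block size (1024 here) is an arbitrary choice and this is only a rough sketch, not something I have benchmarked:

def packbits_blocked(A, block=1024):
    n = len(A)
    out = np.empty((n, (n + 7) // 8), dtype=np.uint8)
    for start in range(0, n, block):
        stop = min(start + block, n)
        # Only a (block x n) Boolean temporary is alive at any time.
        mask = A[start:stop, None] == A
        out[start:stop] = np.packbits(mask, axis=1)
    return out

This caps the temporary at roughly block * n bytes, but it still allocates a Boolean chunk per block rather than packing on the fly, which is what the Numba version below tries to do.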
Note: this is a follow-up to a question I asked earlier about comparing a numpy array with itself elementwise in a memory-efficient way.
The reproducible code below is for benchmarking.
import numpy as np
from numba import jit

@jit(nopython=True)
def bool2int(x):
    # Pack an 8-element Boolean buffer into a single uint8, most significant
    # bit first (matching np.packbits' default bit order).
    y = 0
    for i, j in enumerate(x):
        if j:
            y += int(j) << (7 - i)
    return y

@jit(nopython=True)
def compare_elementwise(arr, result, section):
    n = len(arr)
    for row in range(n):
        for col in range(n):
            # Fill the 8-slot buffer with one comparison at a time.
            section[col % 8] = arr[row] == arr[col]
            # Flush the buffer every 8 columns, or at the end of the row.
            if ((col + 1) % 8 == 0) or (col == (n - 1)):
                result[row, col // 8] = bool2int(section)
                section[:] = 0
    return result

n = 10000
A = np.random.randint(0, 1000, n)

# Each packed row holds ceil(n / 8) bytes.
result_arr = np.zeros((n, n // 8 if n % 8 == 0 else n // 8 + 1), dtype=np.uint8)
selection_arr = np.zeros(8, dtype=np.uint8)

packed = compare_elementwise(A, result_arr, selection_arr)
packed2 = np.packbits(A == A[:, None], axis=1)
assert (packed == packed2).all()

%timeit compare_elementwise(A, result_arr, selection_arr)
%timeit np.packbits(A == A[:, None], axis=1)
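To compare peak memory as well as speed, the memory_profiler IPython extension can be used in the same session (optional, and assumes the package is installed):

%load_ext memory_profiler
%memit np.packbits(A == A[:, None], axis=1)
%memit compare_elementwise(A, result_arr, selection_arr)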