What is the time complexity of indexing a numpy array directly

Question

What is the time complexity of indexing a numpy array directly

I assume that with a numpy array, let's say

>>>>nArray array([[ 23425. , 521331.40625], [ 23465. , 521246.03125], [ 23505. , 528602.8125 ], [ 23545. , 531934.75 ], [ 23585. , 534916.375 ], [ 23865. , 527971.1875 ]])

direct indexing should be quite effective.

I suppose something like this nArray[0, 1] = 69696420 should use a hash table that will give time complexity close to O (1). Is it correct?

Update

As noted in both answers, hashing is not related to indexing a numpy array. Both answers provide a clear explanation of how indexing occurs.

update 2

I added simple benchmarking to prove the answers are correct

+7

python big-o numpy

LetsPlayYahtzee Feb 18 '16 at 14:57

source share

3 answers

No hash table. Numpy arrays are arrays, as the name implies, and the address is calculated as follows:

 address of nArray[x, y] = base address + A * x + B * y

+3

Dietrich epp Feb 18 '16 at 15:03

source share

To add additional verification by testing in Ami Answer, I made a simple ring buffer from a numpy array that uses only direct indexing to create inserts. Basically, each insertion simply changes the values of the oldest element in the queue.

The code is not completely trouble-free, but it can serve as the basis for simple performance testing.

 import math import numpy as np class CircFifo(): """ helper class, uses a numpy array to provide a circular fixed size fifo interface. put(element): removes the oldest element and places a new one get(): returns the oldest entry empty(): returns true if fifo is empty full(): returns true if fifo is full """ def __init__(self, size): self.array = np.empty(shape=(size, 2)) self.size = size self.array[:] = np.NAN self.top = 0 self.bottom = 0 def put(self, row): self.array[self.top, :] = row self.top += 1 if self.top == self.size: self.top = 0 def get(self): if not math.isnan(self.array[self.bottom, 0]): row = copy.deepcopy(self.array[self.bottom, :]) self.array[self.bottom, :] = float('NaN') self.bottom += 1 if self.bottom == self.size: self.bottom = 0 if math.isnan(self.array[self.bottom, 0]): self.bottom = 0 self.top = 0 return row def empty(self): if math.isnan(self.array[self.bottom, 0]): return True else: return False def full(self): if self.size - np.count_nonzero( np.isnan(self.array[:, 0])) == self.size: return True else: return False

The correctness of the responses in the message seems to be confirmed by a simple test that I run. I checked the performance of the insert against the deque object. Even for 1000 attachments, deque, which is also a server as a dynamic rather than a static data structure (unlike my static circular spot), clearly outperforms the circular fifo.

Here is a simple test that I run

 In [5]: import time In [6]: circFifo = CircFifo(300) In [7]: elapsedTime = 0 In [8]: for i in range(1, 1000): ...: start = time.time() ...: circFifo.put(np.array([[52, 12]])) ...: elapsedTime += time.time() - start ...: In [9]: elapsedTime Out[9]: 0.010616540908813477 In [21]: queue = deque() In [22]: elapsedTime = 0 In [23]: for i in range(1, 1000): ....: start = time.time() ....: queue.append(np.array([[52, 12]])) ....: elapsedTime += time.time() - start ....: In [24]: elapsedTime Out[24]: 0.00482630729675293

I know that this test is not informative, but it becomes quite obvious that deque is much faster. At least this is the number of inserts.

I would expect that if this circular fifo were implemented in C with a static array, it could not be easily beaten. Since basically a static array of C is the simplest and has a smaller data structure available.

+1

LetsPlayYahtzee Feb 19 '16 at 11:44

source share

Ami tavory · Accepted Answer · 2016-02-18T15:07:05+0000

One side

should use a hash table that will give a time complexity close to O (1). Is it correct?

not quite right. The Numpy array is basically contiguous blocks of uniform memory , with additional information about the side in size and the like. Therefore, access is O (1) and simply includes some trivial math to determine the position inside the memory.

On the other hand,

indexing should be quite effective.

Unfortunately, this is not true at all. Based on bounds checking (what arrays do), everything related to pure python is extremely inefficient (and access involves pure-python calls). Access to an array of arrays is an exception . You should try to use vector operations whenever possible.

What is the time complexity of indexing a numpy array directly

More articles: