Why does this giant (non-sparse) NumPy matrix fit in RAM?

Question

I am very confused about what numpy.ndarray.nbytes actually reports.

I just created an identity matrix of size 1 million (10^6), which therefore has 1 trillion elements (10^12). NumPy reports that this array is 7.28 TB, but the Python process uses only 3.98 GB of memory, as reported by the OS X Activity Monitor.

Is the entire array resident in memory?
Does NumPy somehow compress its representation, or is that handled by the OS?
If I simply compute y = 2 * x, which should be the same size as x, the process memory grows to about 30 GB until the OS kills it. Why, and what kinds of operations can I perform on x without memory use growing so much?

This is the code I used:

    import numpy as np

    # int(...) because current NumPy versions reject a float size such as 1e6
    x = np.identity(int(1e6))
    x.size                 # 1000000000000
    x.nbytes / 1024 ** 4   # 7.275957614183426
    y = 2 * x              # python console exits and terminal shows: Killed: 9

+7

python numpy matrix

Rems Feb 13 '16 at 16:53


2 answers

The system initially allocates only virtual memory; a page is backed by physical memory only the first time you write to it. In your example you allocate 1 trillion float64 values (8 bytes each), which corresponds to about 2 billion 4 KiB pages of memory, but only 1 million (1e6) of those pages are written to when the diagonal is set, because each diagonal element lands in a different page. That is roughly the 4 GB of memory that you see.
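The arithmetic can be checked with a few lines of plain Python. This is only a back-of-the-envelope sketch: it assumes float64 elements (8 bytes), 4 KiB pages, and a page-aligned, row-major buffer, none of which is stated in the question. The constant and function names are made up for illustration.

```python
# Assumptions (not from the question): 8-byte float64 elements, 4 KiB pages,
# row-major layout starting at a page boundary.
N = 10**6          # matrix is N x N
ITEMSIZE = 8       # bytes per float64 element
PAGE_SIZE = 4096   # bytes per page

total_bytes = N * N * ITEMSIZE
total_pages = total_bytes // PAGE_SIZE


def diagonal_page(i):
    """Index of the page holding diagonal element x[i, i]."""
    byte_offset = i * (N + 1) * ITEMSIZE
    return byte_offset // PAGE_SIZE


# Consecutive diagonal elements are about 8 MB apart, far more than one page,
# so every diagonal element sits in its own page.
touched_pages = N
touched_bytes = touched_pages * PAGE_SIZE

print(total_pages)    # 1953125000, i.e. about 2 billion pages
print(touched_bytes)  # 4096000000, i.e. about 4 GB
```

The second figure is close to the 3.98 GB reported in the question; the exact resident size depends on the allocator and the OS.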

+1

Daniel Feb 13 '16 at 17:09


Colonel Thirty Two · Accepted Answer · 2016-02-13T17:03:44+0000

On Linux (and I assume the same thing happens on Mac), when a program allocates memory, the OS does not actually back it with physical RAM until the program uses it.

If the program never uses the memory, the OS does not have to spend RAM on it. The downside is that this puts the OS in a tight spot when a program asks for a ton of memory and then actually needs to use it, but the OS does not have enough.

If this happens, the OS can either start killing other, less important processes and give their RAM to the requesting process, or just kill the requesting process (which is what is happening here).

The initial 4 GB of memory that Python uses is most likely the pages where numpy sets the 1s on the diagonal of the identity matrix; the remaining pages have not been touched yet. Performing a mathematical operation such as 2*x has to go through every element and write a result of the same size, so it starts accessing (and thus allocating) all pages until the OS runs out of memory and kills your process.
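As for which operations avoid this: anything that only touches pages already in use should be safe. Below is a minimal sketch, shown on a small matrix, that scales just the diagonal in place through a strided view instead of building a new full-size array. The helper name `scale_diagonal_in_place` is hypothetical, and whether memory use really stays flat on the giant matrix depends on the OS's lazy allocation, so treat it as an illustration rather than a guarantee.

```python
import numpy as np


def scale_diagonal_in_place(x, factor):
    """Multiply only the main diagonal of a square, C-contiguous array in place."""
    if x.ndim != 2 or x.shape[0] != x.shape[1]:
        raise ValueError("expected a square 2-D array")
    if not x.flags["C_CONTIGUOUS"]:
        # reshape(-1) would return a copy here and the update would be lost
        raise ValueError("expected a C-contiguous array")
    n = x.shape[0]
    # reshape(-1) is a view of a C-contiguous array; a step of n + 1 selects
    # x[0, 0], x[1, 1], ... without reading or writing any other element.
    diagonal = x.reshape(-1)[:: n + 1]
    diagonal *= factor
    return x


x = np.identity(4)
scale_diagonal_in_place(x, 2)
print(x)
```

By contrast, `y = 2 * x` must allocate and fill a second array as large as `x`, which is what exhausts memory in the question.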
