Python string memory usage for FreeBSD

Question

I am observing a strange pattern of memory usage with Python strings on FreeBSD. Consider the following session. The idea is to create a list of strings whose cumulative length is 100 MB.

    l = []
    for i in xrange(100000):
        l.append(str(i) * (1000/len(str(i))))

It uses about 100 MB of memory as expected, and "del l" will clear it.

    l = []
    for i in xrange(20000):
        l.append(str(i) * (5000/len(str(i))))

This uses 165 MB of memory. I really don't understand where the extra memory is going. [The cumulative size of both lists is the same.]
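To check that the two snippets really request the same amount of character data, the string lengths can be summed without building the strings. This is only a sketch, written for Python 3 (range and // stand in for the xrange and integer / of the Python 2 session), and the helper name total_chars is made up for illustration.

```python
def total_chars(count, target_len):
    """Sum the lengths of the strings a snippet would build, without building them."""
    total = 0
    for i in range(count):
        digits = len(str(i))
        # str(i) * (target_len // digits) has exactly this many characters
        total += digits * (target_len // digits)
    return total

small = total_chars(100000, 1000)   # first snippet: 100000 strings of about 1000 chars
large = total_chars(20000, 5000)    # second snippet: 20000 strings of about 5000 chars
print(small, large)
```

Both totals come out just under 100,000,000 characters, so the payload alone does not explain the difference in process size.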

Python 2.6.4 on FreeBSD 7.2. On Linux / Windows, both use only 100 MB of memory.

Update: I am measuring memory using "ps aux". This can be done with os.system after each code snippet. The two snippets were also run separately.
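A minimal sketch of taking that measurement from inside the process is given here. It assumes a ps that accepts the POSIX options -o rss= and -p (which should report the same resident size as the RSS column of "ps aux"), and it uses subprocess rather than os.system so the value can be captured. The helper name rss_kb is hypothetical.

```python
import os
import subprocess

def rss_kb():
    """Return this process's resident set size as printed by ps, or None if unavailable."""
    try:
        out = subprocess.check_output(["ps", "-o", "rss=", "-p", str(os.getpid())])
        return int(out.strip())
    except (OSError, subprocess.CalledProcessError, ValueError):
        # ps is missing, failed, or printed something unexpected
        return None

before = rss_kb()
data = ["x" * 1000 for _ in range(1000)]   # small allocation, just to have something to measure
after = rss_kb()
print(before, after)
```

Calling rss_kb() before building the list, after building it, and after "del l" gives the three numbers needed to compare the two snippets.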

Update2: It looks like FreeBSD's malloc rounds allocations up to a power of two, so a 5 KB allocation actually takes 8 KB. However, I am not sure.
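The guess in Update2 can be put into numbers. The sketch below simply assumes that every 5000-byte request is rounded up to the next power of two; that is the hypothesis from the update, not confirmed allocator behavior. It also ignores the per-string object header. The function name round_up_pow2 is made up for illustration.

```python
def round_up_pow2(n):
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()

slot = round_up_pow2(5000)           # bytes actually reserved per 5000-byte request
estimate_mb = 100 * slot / 5000      # 100 MB of payload scaled by the rounding factor
print(slot, estimate_mb)
```

Under that assumption, 100 MB of 5000-byte strings would occupy roughly 164 MB, which is close to the 165 MB observed. Real allocators use size classes that need not be exact powers of two, so this is only a plausibility check.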

+6

python malloc freebsd

amit Mar 17 '11 at 17:08


3 answers

Fang-pen lin · Answer 1 · 2011-03-19T06:18:53+0000

In my opinion, this is probably memory fragmentation. First of all, chunks of memory larger than 255 bytes are allocated with malloc in CPython. You can refer to

Improving Python Memory Allocator

For performance reasons, most memory allocators, such as malloc, return an aligned address. For example, you will never get an address like

 0x00003

It is not aligned to 4 bytes, and accessing memory at such an address would be very slow for the computer. So every address you get from malloc should look like

 0x00000 0x00004 0x00008

and so on. 4-byte alignment is only a basic general rule; the actual alignment policy varies by OS.

The memory usage you are talking about is probably RSS (I am not sure). On most operating systems the virtual memory page size is 4 KB, so each of your 5000-byte allocations needs 2 pages. Let's look at an example to illustrate the wasted memory. Assume the alignment is 256 bytes.

    0x00000 {
        ... chunk 1
    0x01388 }
    0x01389 {
        ... fragment 1
    0x013FF }
    0x01400 {
        ... chunk 2
    0x02788 }
    0x02789 {
        ... fragment 2
    0x027FF }
    0x02800 {
        ... chunk 3
    0x03B88 }
    0x03B89 {
        ... fragment 3
    0x04000 }
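The chunk addresses in this layout can be reproduced with a little arithmetic. The sketch below takes the 256-byte alignment and the 4 KB page size from this answer as assumptions; neither is a measured FreeBSD value. The names align_up, CHUNK, ALIGN and PAGE are made up for illustration.

```python
import math

CHUNK = 5000   # bytes per allocation, as in the question
ALIGN = 256    # alignment assumed in the example layout
PAGE = 4096    # page size assumed in this answer

def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment."""
    return (addr + alignment - 1) // alignment * alignment

start = 0
starts = []
for n in range(1, 4):
    end = start + CHUNK           # first byte past the chunk
    nxt = align_up(end, ALIGN)    # earliest address where the next chunk may begin
    print("chunk %d: 0x%05X..0x%05X, gap of %d bytes" % (n, start, end, nxt - end))
    starts.append(start)
    start = nxt

pages_per_chunk = math.ceil(CHUNK / PAGE)
print(pages_per_chunk)
```

With these numbers every chunk is followed by a 120-byte gap, and each chunk touches two pages.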

As you can see, there are many fragments in memory. They cannot be used, but they still occupy space in the page. I'm not sure what FreeBSD's alignment policy is, but I think something like this is the cause. To use memory efficiently in Python, you can pre-allocate one large bytearray and carve it into pieces of a well-chosen size (you will need to test which size works best, since it depends on the OS).
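A minimal sketch of the pre-allocated bytearray idea follows, written for Python 3. The class SlotPool and its methods are hypothetical, and the slot size of 5000 is only the figure from the question, not a recommended value.

```python
class SlotPool:
    """Fixed-size slots carved out of one pre-allocated bytearray (illustrative only)."""

    def __init__(self, slot_size, slot_count):
        self.slot_size = slot_size
        self.buf = bytearray(slot_size * slot_count)     # one large allocation up front
        self.lengths = [0] * slot_count                  # bytes actually used in each slot
        self.free = list(range(slot_count - 1, -1, -1))  # stack of free slot indices

    def store(self, data):
        """Copy data into a free slot and return the slot index."""
        if len(data) > self.slot_size:
            raise ValueError("data does not fit in one slot")
        slot = self.free.pop()   # raises IndexError when the pool is full
        start = slot * self.slot_size
        self.buf[start:start + len(data)] = data   # same-length slice, buffer size unchanged
        self.lengths[slot] = len(data)
        return slot

    def load(self, slot):
        """Return a bytes copy of what was stored in the slot."""
        start = slot * self.slot_size
        return bytes(self.buf[start:start + self.lengths[slot]])

    def release(self, slot):
        """Mark the slot as reusable."""
        self.lengths[slot] = 0
        self.free.append(slot)

pool = SlotPool(slot_size=5000, slot_count=4)
idx = pool.store(b"42" * 2500)
print(len(pool.load(idx)), len(pool.buf))
```

Because the slots sit back to back inside a single buffer, the allocator sees one large request instead of many 5000-byte ones. The trade-off is that values come back as bytes copies rather than ordinary string objects.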

Apalala · Answer 2 · 2011-03-19T05:16:36+0000

The answer may be in this saga. I think you are witnessing some unavoidable memory manager overhead.

As @Hossein says, try executing both code snippets in a single run, and then swap their order.

fabrizioM · Answer 3 · 2011-03-19T08:58:41+0000

I think all memory addresses in FreeBSD have to be aligned to a power of two. Thus, all Python memory pools end up somewhat fragmented in memory rather than contiguous.

Try using some other tool to find something interesting.
