Array performance is very similar to LinkedList - What gives?

So, the title is somewhat misleading ... I will keep it simple: I am comparing these two data structures:

  • An array that starts at size 1; each subsequent addition calls realloc() to expand the memory by one element and then stores the new element at position n-1.
  • A linked list in which I track head, tail and size; each addition mallocs the new element and updates the tail pointer and the size.
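In code, the two append operations being compared look roughly like this (a sketch; the struct and function names are mine, not from the actual test harness):

```c
#include <stdlib.h>

/* Growable array: one realloc() per append, new element stored at n-1. */
typedef struct {
    int *data;
    size_t n;
} Array;

int array_append(Array *a, int value) {
    int *p = realloc(a->data, (a->n + 1) * sizeof *p);
    if (!p) return -1;            /* realloc failed; old block still valid */
    a->data = p;
    a->data[a->n++] = value;
    return 0;
}

/* Singly linked list with head/tail/size: one malloc() per append. */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

typedef struct {
    Node *head, *tail;
    size_t size;
} List;

int list_append(List *l, int value) {
    Node *n = malloc(sizeof *n);
    if (!n) return -1;
    n->value = value;
    n->next = NULL;
    if (l->tail) l->tail->next = n;   /* append after current tail */
    else         l->head = n;         /* first element */
    l->tail = n;
    l->size++;
    return 0;
}
```

Both appends make exactly one allocation call, which matters for interpreting the timings below.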

Do not worry about any other details of these data structures. Appending is the only operation involved in this test.

In theory, the LL should perform better. In practice, however, they are almost identical in time at 10, 100, 1000 ... up to 5,000,000 elements.

My gut feeling is that the heap is the reason. I think the default data segment is 10 megabytes on Red Hat? I could be wrong. In any case, realloc() first checks whether space is available at the end of the already allocated contiguous block (0..[n-1]). If the nth position is free, no moving of elements occurs; realloc() simply claims the old space plus the immediately following space. I am having a hard time finding evidence of this, and an even harder time proving that this array should perform worse in practice than the LL.

Further analysis, after reading the answers below:

[Update #1] I changed the code to malloc a separate, unrelated block every 50th iteration, for both the LL and the array, and to track relocations. For 1 million additions, the array moved almost 18 times. The LL has no concept of moving. I ran the time comparison again; they are still almost identical. Here are some results for 10 million additions:

(Array)
time ./a.out a 10,000,000
real    0m31.266s
user    0m4.482s
sys     0m1.493s

(LL)
time ./a.out l 10,000,000
real    0m31.057s
user    0m4.696s
sys     0m1.297s

With 18 moves I would have expected the times to differ dramatically. The array append even requires one extra assignment and one extra comparison per addition, to fetch and check realloc's return value and detect whether a move occurred.

[Update #2] I ran ltrace on the test described above, and I think the result is interesting ... It seems that realloc (or some part of the memory manager) proactively moves the array to larger contiguous locations based on its current size. For 500 iterations, the iterations that triggered a memory move were: 1, 2, 4, 7, 11, 18, 28, 43, 66, 101, 154, 235, 358. That is pretty close to a summation sequence. I found it interesting enough to post.

+4
8 answers

So, you are testing how quickly you can grow an array compared with a linked list?

In both cases, you call the memory allocation function. Typically, memory allocation functions grab a chunk of memory (possibly a page) from the operating system, and then split it into smaller parts, as your application requires.

Another guess is that, from time to time, realloc() will give up and allocate a large chunk of memory elsewhere, because it cannot get a contiguous chunk within the currently allocated pages. If you make no other calls to memory-allocation functions between your list appends, this will not happen. And the operating system's virtual memory may mean that your program's heap expands contiguously regardless of where the physical pages come from. In that case, the performance will be identical to a series of malloc() calls.

Expect the performance to differ once you mix calls to malloc() and realloc().

+3

You are correct: realloc will simply increase the size of the allocated block if nothing prevents it. In a real-world scenario, however, you will most likely have other objects allocated on the heap between subsequent additions to the list. In that case, realloc has to allocate a completely new chunk of memory and copy the elements already in the list.

Try allocating another object on the heap with malloc for every ten inserts or so, and see how they perform then.

+7

Assuming your linked list is just a pointer to the first item, then to add an item to the end you must first walk the whole list. That is an O(n) operation.

Assuming realloc has to move the array to a new location, it must traverse the array in order to copy it. That is also an O(n) operation.

In complexity terms, the two operations are equal. However, as others have noted, realloc can avoid moving the array, in which case appending to the array is O(1). Others have also noted that the vast majority of your program's time probably goes to malloc/realloc, which both implementations call once per addition.

Finally, another reason the array is probably faster is cache locality and the generally high performance of linear copies. Jumping around to erratic addresses with significant gaps between them (from both large elements and malloc bookkeeping) is usually not as fast as doing one bulk copy of the same amount of data.

+1

The performance of an array-based solution extended with realloc() will depend on your strategy for creating more space.

If you grow the space by adding a fixed amount of memory on each reallocation, each append will on average take time proportional to the number of elements already stored in the array. That assumes realloc (sometimes) needs to allocate space elsewhere and copy the contents, rather than just extending the existing allocation.

If you grow the space by adding a fraction of the current number of elements (doubling is pretty standard), appends take amortized constant time.
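A minimal sketch of the doubling strategy (the Vec name and layout are illustrative, not from the question):

```c
#include <stdlib.h>

/* Growable array that doubles its capacity when full, so realloc()
 * runs only O(log n) times over n appends and each individual append
 * is amortized O(1). */
typedef struct {
    int *data;
    size_t n, cap;
} Vec;

int vec_push(Vec *v, int value) {
    if (v->n == v->cap) {
        size_t cap = v->cap ? v->cap * 2 : 1;   /* double, starting at 1 */
        int *p = realloc(v->data, cap * sizeof *p);
        if (!p) return -1;       /* old block still valid on failure */
        v->data = p;
        v->cap = cap;
    }
    v->data[v->n++] = value;
    return 0;
}
```

Compare this with the question's one-realloc-per-element approach: over 1,000 pushes this version calls realloc only 10 times.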

+1

(Updated.) As others have noted, if there are no other allocations in between, no copying is required. Also, as others have noted, the risk of a memory copy is reduced (but so is its cost, of course) for very small blocks, smaller than a page.

In addition, if all your test does is allocate new memory, I am not very surprised that you see little difference, since the allocation calls themselves probably dominate the running time.

Instead, choose your data structures based on how you intend to use them. A framebuffer, for instance, is probably best represented by a contiguous array.

A linked list is probably better if you need to reorganize or sort the data in the structure quickly.

These operations will then be more or less efficient depending on what you want to do with them.

(Thanks for the comments below, I was initially confused about how this works.)

0

What is the basis of your theory that a linked list should perform better for inserts at the end? I would not expect it to, for precisely the reason you mention: realloc will only copy when it has to in order to maintain contiguity; in other cases it may simply coalesce free chunks and/or grow the allocation in place.

Each linked-list node, however, requires a fresh allocation and (for a doubly linked list) two pointer writes. If you want proof of how realloc works, just compare the pointer before and after realloc. You should find that it usually does not change.

I suspect that since you call realloc for every element (obviously unwise in production), the realloc/malloc calls are the biggest bottleneck in both tests, even though realloc often does not return a new pointer.

Also, you are mixing up the heap and the data segment. The heap is where malloc'd memory lives. The data segment holds global and static variables.

0

Would the compiler output be very different in these two cases?

0

This is not a real-life situation. Presumably, in real life you are also interested in looking up, or even deleting, elements from your data structures, not just adding them.

If you allow deletion, but only from the head, the linked list becomes better than an array: deleting an element is trivial, and if, instead of freeing the deleted element, you put it on a free list to be recycled, you can eliminate many of the mallocs needed when adding items to the list.

On the other hand, if you need random access into the structure, the array obviously beats the linked list.
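A sketch of the free-list recycling idea described above (the names are illustrative; deletion is allowed only from the head, as stated):

```c
#include <stdlib.h>

typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Linked list that recycles popped nodes through a free list, so
 * steady-state push/pop needs no malloc() at all. */
typedef struct {
    Node *head, *tail;
    Node *freelist;               /* recycled nodes */
} Queue;

static Node *get_node(Queue *q) {
    if (q->freelist) {            /* reuse before calling malloc */
        Node *n = q->freelist;
        q->freelist = n->next;
        return n;
    }
    return malloc(sizeof(Node));
}

int queue_push(Queue *q, int value) {
    Node *n = get_node(q);
    if (!n) return -1;
    n->value = value;
    n->next = NULL;
    if (q->tail) q->tail->next = n;
    else         q->head = n;
    q->tail = n;
    return 0;
}

int queue_pop(Queue *q, int *out) {
    Node *n = q->head;
    if (!n) return -1;            /* empty */
    *out = n->value;
    q->head = n->next;
    if (!q->head) q->tail = NULL;
    n->next = q->freelist;        /* recycle instead of free() */
    q->freelist = n;
    return 0;
}
```

Once the free list is warm, alternating pushes and pops never touches the allocator, which is exactly where the array-based version keeps paying.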

0
