Linked list versus dynamic array for stack implementation using vector class

I read about two ways to implement a stack: with a linked list and with a dynamic array. The main advantage of a linked list over a dynamic array was said to be that the linked list never needs to be resized, whereas a dynamic array has to be resized if too many elements are inserted, wasting a lot of time and memory.

This made me wonder whether this is still true for C++, since there is a vector class that automatically resizes itself whenever new elements are added?

+4
8 answers

It is difficult to compare the two, because their memory usage patterns are completely different.

Vector Resize

A vector resizes itself dynamically as necessary. It does this by allocating a new chunk of memory, moving (or copying) the data from the old chunk to the new one, and freeing the old one. In the typical case the new chunk is 1.5 times the size of the old one (contrary to popular belief, 2x seems to be fairly unusual in practice). This means that for a short time during a reallocation, it needs roughly 2.5 times as much memory as the data you are actually storing. The rest of the time, the chunk in use is at least 2/3rds full and at most completely full. If all sizes are equally likely, we can expect it to average about 5/6ths full. Looking at it from the other direction, we can expect about 1/6th, or roughly 17%, of the space to be "wasted" at any given time.

When we resize by a constant factor like this (rather than, for example, always adding a fixed-size chunk, such as growing in 4 KB increments), we get what is called amortized constant time insertion. In other words, as the array grows, resizing happens exponentially less often. The average number of times each element of the array has been copied tends toward a constant (usually around 3, but it depends on the growth factor used).
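
As a rough sketch of how this shows up in practice, the snippet below counts how many reallocations a std::vector performs as elements are pushed (the exact growth factor and the printed numbers are implementation-defined, so they will vary between standard libraries):

    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> v;
        std::size_t last_capacity = v.capacity();
        std::size_t reallocations = 0;

        for (int i = 0; i < 1000000; ++i) {
            v.push_back(i);
            if (v.capacity() != last_capacity) {
                // A reallocation happened: a new chunk was allocated and the
                // old elements were moved (or copied) into it.
                ++reallocations;
                last_capacity = v.capacity();
            }
        }

        // With geometric growth the number of reallocations grows only
        // logarithmically with the number of elements, which is why the
        // per-insert cost is amortized constant.
        std::cout << "size: " << v.size()
                  << ", capacity: " << v.capacity()
                  << ", reallocations: " << reallocations << '\n';
    }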

Linked Lists

With a linked list the situation is quite different. We never see resizing, so we do not pay extra time or memory for certain inserts. At the same time, we do see extra time and memory being used essentially all the time. In particular, each node in the linked list must contain a pointer to the next node. Depending on the size of the data in the node relative to the size of a pointer, this can be significant overhead. For example, suppose you need a stack of ints. In the typical case where an int is the same size as a pointer, that means 50% overhead, all the time. It is increasingly common for a pointer to be larger than an int; twice the size is quite common (64-bit pointer, 32-bit int). In that case you have ~67% overhead, that is, each node devotes twice as much space to the pointer as to the data being stored.

Unfortunately, that is often just the tip of the iceberg. In a typical linked list, each node is dynamically allocated individually. At least when you are storing small data items (such as ints), the memory allocated for a node may be (and usually will be) even larger than the amount you actually requested. So you ask for 12 bytes of memory to hold an int and a pointer, but the chunk of memory you get is likely to be rounded up to 16 or 32 bytes instead. Now you are looking at an overhead of at least 75% and quite possibly ~88%.
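
A quick way to see this per-node overhead is to compare the payload size with the size of a hand-rolled singly linked node. The struct below is a hypothetical illustration; the exact numbers depend on pointer width, padding, and allocator rounding:

    #include <iostream>

    // Hypothetical node for a singly linked stack of ints.
    struct Node {
        int value;   // the payload: 4 bytes on typical platforms
        Node* next;  // the link: 8 bytes on a 64-bit platform
    };

    int main() {
        // On a common 64-bit ABI this prints 4 and 16: padding plus the
        // pointer mean the payload is only a quarter of the node, before
        // any allocator rounding is even counted.
        std::cout << "payload: " << sizeof(int) << " bytes, "
                  << "node: " << sizeof(Node) << " bytes\n";
    }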

As far as speed goes, the situation is fairly similar: allocating and freeing memory dynamically is often rather slow. The heap manager typically keeps blocks of free memory and has to spend time searching through them to find the block best suited to the size you are asking for. Then it (typically) has to split that block into two pieces, one to satisfy your allocation and another of the remaining memory it can use to satisfy other allocations. Likewise, when you free memory, it typically goes back to that same list of free blocks and checks whether an adjacent block of memory is already free, so it can join the two back together.

Needing to allocate and manage a lot of memory blocks is expensive.

Cache Usage

Finally, with recent processors we run into another important factor: cache usage. In the case of a vector, we have all the data next to each other. Then, after the end of the part of the vector that is in use, we have some empty memory. This leads to excellent cache usage: the data we are using gets cached; the data we are not using has little or no effect on the cache.

With a linked list, the pointers (and the likely overhead in each node) are spread throughout the list. That is, right next to each piece of data we care about sits the pointer overhead and the empty space allocated for the node that we are not using. In short, the effective size of the cache is reduced by roughly the same factor as the overall overhead of each node in the list; that is, we might easily see only 1/8th of the cache holding data we care about, and 7/8ths of it devoted to storing pointers and/or pure garbage.

Summary

A linked list can work well when you have a relatively small number of nodes, each of which is individually quite large. If (as is more typical of a stack) you are dealing with a relatively large number of items, each of which is individually quite small, you are much less likely to see any saving in time or memory. Quite the contrary: for such cases, a linked list is much more likely to waste a great deal of both time and memory.

+9

Yes, what you are saying is true for C++. For this reason, the default container inside std::stack, which is the standard stack class in C++, is neither a vector nor a linked list, but a double-ended queue (a deque). This has nearly all the advantages of a vector, but it resizes much better.

Basically, a std::deque is internally a linked list of arrays, of sorts. That way, when it needs to resize, it simply adds another array.
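
Since the underlying container is a template parameter of std::stack, you can see (or override) the deque default directly; a minimal sketch:

    #include <deque>
    #include <list>
    #include <stack>
    #include <vector>

    int main() {
        std::stack<int> a;                    // defaults to std::deque<int>
        std::stack<int, std::vector<int>> b;  // contiguous storage instead
        std::stack<int, std::list<int>> c;    // linked-list storage (rarely useful)

        a.push(1);
        b.push(2);
        c.push(3);

        // All three expose the same push/pop/top interface; only the memory
        // behaviour of the underlying container differs.
        return (a.top() + b.top() + c.top() == 6) ? 0 : 1;
    }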

+3

First, the performance tradeoffs between linked lists and dynamic arrays are a lot more subtle than that.

The vector class in C++ is required to be implemented as a "dynamic array", meaning that it must have amortized constant cost for inserting elements into it. The way this is usually done is by growing the "capacity" of the array geometrically, that is, you double the capacity whenever you run out (or come close to running out). In the end, this means that a reallocation operation (allocating a new chunk of memory and copying the current contents into it) happens only on rare occasions. In practice, this means that the overhead of reallocations shows up on performance graphs only as small spikes at logarithmic intervals. This is what it means to have "amortized constant" cost, because once you neglect those small spikes, the cost of the insert operations is essentially constant (and, in this case, trivial).

In a linked-list implementation you do not have the overhead of reallocations; however, you do have the overhead of allocating each new element on the freestore (dynamic memory). So the overhead is a bit more regular (not lumped into occasional spikes), but it can be more significant than with a dynamic array, especially if the elements are rather cheap to copy (small in size and simple objects). In my opinion, linked lists are only recommended for objects that are genuinely expensive to copy (or move). But at the end of the day, this is something you need to test for any given situation.

Finally, it is important to point out that locality of reference is often the deciding factor for any application that makes extensive use of traversal over the elements. When using a dynamic array, the elements are packed together in memory one after the other, and traversing in order is very efficient because the CPU can pre-fetch memory ahead of the read/write operations. In a vanilla linked-list implementation, the jumps from one element to the next generally involve rather erratic jumps between wildly different memory locations, which effectively disables this "pre-fetching" behaviour. So, unless the individual elements of the list are very big and operations on them are typically very long to execute, this lack of pre-fetching will be the dominant performance problem when using a linked list.

As you can guess, I rarely use a linked list (std::list), since its useful applications are few and far between. Very often, for large and costly objects, it is preferable to simply use a vector of pointers (you get basically the same performance advantages (and disadvantages) as a linked list, but with less memory usage, since there are no per-node link pointers, and you keep random access if you need it).
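
A sketch of that vector-of-pointers idea, using std::unique_ptr for ownership; BigObject here is a made-up stand-in for a type that is expensive to copy or move:

    #include <memory>
    #include <string>
    #include <vector>

    // Hypothetical expensive-to-copy type.
    struct BigObject {
        std::string payload;
    };

    int main() {
        // The vector stores only pointers, so a reallocation moves small
        // handles rather than the big objects themselves, and random access
        // is still available.
        std::vector<std::unique_ptr<BigObject>> objects;

        auto obj = std::make_unique<BigObject>();
        obj->payload = "some large piece of data";
        objects.push_back(std::move(obj));

        return objects.front()->payload.empty() ? 1 : 0;
    }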

The main case I can think of where a linked list wins over a dynamic array (or a segmented dynamic array like std::deque) is when you need to frequently insert elements in the middle (not at either end). However, such situations usually arise when you are keeping a sorted (or somehow ordered) set of elements, in which case you would use a tree structure to store the elements (e.g., a binary search tree (BST)), not a linked list. And often such trees store their nodes (elements) using a semi-contiguous memory layout (e.g., a breadth-first layout) within a dynamic array or segmented dynamic array (e.g., a cache-oblivious dynamic array).

+2

Yes, this is true for C++ or any other language that has a dynamic array concept. The fact that C++ has a vector class does not change the theory. The vector in C++ actually performs the resizing internally, so this task is not the developer's responsibility. The actual cost does not magically disappear when using vector; it is simply offloaded to the standard library implementation.

+1

std::vector is implemented using a dynamic array, while std::list is implemented as a linked list. There are tradeoffs for using both data structures. Choose the one that best suits your needs.

  • As you pointed out, a dynamic array can take longer to add an element when it is full, because it has to grow. However, accessing it is faster, since all of its members are grouped together in memory. This tight grouping also usually makes it more cache-friendly.

  • Linked lists do not need to resize, but traversing them takes longer because the CPU must jump around in memory.

+1

I was wondering if this is true for C++, since there is a vector class that automatically resizes itself each time new elements are added.

Yes, it still holds, because resizing a vector is a potentially expensive operation. Internally, when the pre-allocated capacity of the vector is reached and you try to add new elements, a new allocation takes place and the old data is moved to the new memory location.
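
One practical consequence: if you know roughly how many elements you are going to push, calling reserve() up front pays the allocation cost once and avoids repeated reallocations; a minimal sketch:

    #include <vector>

    int main() {
        std::vector<int> v;
        v.reserve(1000);  // one allocation up front

        for (int i = 0; i < 1000; ++i) {
            // No reallocation happens here, because the size never exceeds
            // the capacity reserved above.
            v.push_back(i);
        }

        return v.size() == 1000 ? 0 : 1;
    }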

0

From the C++ documentation:

vector::push_back - add an element at the end

Adds a new element at the end of the vector after its current last element. The contents of val are copied (or moved) to the new element.

This effectively increases the container size by one, which causes an automatic reallocation of the allocated storage space if, and only if, the new vector size surpasses the current vector capacity.

0

http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Keynote-Bjarne-Stroustrup-Cpp11-Style Skip to 44:40. As described in the video by Bjarne himself, you should prefer std::vector over std::list whenever possible. Since std::vector stores all of its elements next to each other in memory, it gains the advantage of cache locality. This holds for adding and removing elements from std::vector as well as for searching. He states that std::list is 50-100x slower than std::vector.

If you really need a stack, you really should use std::stack instead of creating your own.
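
For example, a minimal use of the adapter rather than a hand-rolled stack:

    #include <iostream>
    #include <stack>

    int main() {
        std::stack<int> s;  // backed by std::deque<int> by default

        s.push(1);
        s.push(2);
        s.push(3);

        while (!s.empty()) {
            std::cout << s.top() << '\n';  // prints 3, 2, 1
            s.pop();
        }
    }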

0
