Will a custom allocator improve performance if many vector<T>s are created and destroyed?
In the code below, vectors of 10 ints each are created with a probability of 60%, or an existing vector is deleted with a probability of 40%. Thus, there will be many calls to new/malloc and delete. Since all these vectors are of type vector<int>, can a custom allocator help here to reduce the calls to new and delete and thereby increase performance? The idea is that the space of a deleted vector can be reused by a newly constructed one. What would such an allocator look like?
Note: this question is about allocators, i.e. about reducing the number of calls to new and delete.
```cpp
#include <iostream>
#include <vector>
#include <random>
using namespace std;

int main()
{
    // Random generator and distribution
    mt19937 gen(123456);
    uniform_real_distribution<> dis01(0., 1.);

    // Make or delete 10E6 vectors.
    vector<vector<int>> v; // the inner vectors will make many calls to new and delete
    v.reserve(10E5);       // assume some size.

    for(int i=0; i<10E6; ++i)
    {
        if(dis01(gen)<0.6) // if true: make new sub-vector
        {
            v.emplace_back(); // new sub-vector
            v.back().reserve(10);
            for(int k=0; k<10; ++k)
                v.back().emplace_back(k); // fill it with some numbers
        }
        else // else, delete the last entry if there is one.
        {
            if(!v.empty())
                v.pop_back();
        }
    }
    cout<<"v.size()= "<<v.size();
    return 0;
}
```

This is for C++11. Older standards need additional boilerplate implemented in the allocator [1].
This is proof-of-concept code. It works and solves the example, but it has several limitations. It still demonstrates how a custom allocator can improve performance in a scenario where many std::vectors are created and destroyed.
PoolAlloc.hh:

```cpp
#ifndef POOL_ALLOC_HH
#define POOL_ALLOC_HH

#include <cstddef>
#include <type_traits>
#include <vector>

template<typename T>
struct MemChunk
{
    std::size_t buf_size=0;
    T* buf=nullptr;
    T* top=nullptr;
    std::size_t used=0;
};

template<typename T>
class PoolAllocator
{
public:
    using value_type=T;

    PoolAllocator();
    explicit PoolAllocator(std::size_t);
    PoolAllocator(PoolAllocator const&) noexcept;
    template<typename U>
    PoolAllocator(PoolAllocator<U> const&) noexcept;
    PoolAllocator(PoolAllocator&&) noexcept;
    PoolAllocator& operator=(PoolAllocator const&)=delete;
    PoolAllocator& operator=(PoolAllocator&&)=delete;
    ~PoolAllocator();

    template<typename U>
    struct rebind
    {
        using other=PoolAllocator<U>;
    };

    T* allocate(std::size_t);
    void deallocate(T*, std::size_t) noexcept;

    template<typename U1, typename U2>
    friend bool operator==(PoolAllocator<U1> const&, PoolAllocator<U2> const&) noexcept;

    // Needed so the converting constructor can read PoolAllocator<U>'s members.
    template<typename U>
    friend class PoolAllocator;

private:
    std::vector<MemChunk<T>>* memory_=nullptr;
    int* ref_count_=nullptr;
    std::size_t default_buf_size_=0;
};

template<typename T>
PoolAllocator<T>::PoolAllocator():
    PoolAllocator{100000}
{}

template<typename T>
PoolAllocator<T>::PoolAllocator(std::size_t buf_size):
    memory_{new std::vector<MemChunk<T>>},
    ref_count_{new int(0)},
    default_buf_size_{buf_size}
{
    memory_->emplace_back();
    memory_->back().buf_size=buf_size;
    // NOTE: new T[] default-constructs the elements. That is fine for a
    // trivial T like int, but not correct for general T.
    memory_->back().buf=new T[buf_size];
    memory_->back().top=memory_->back().buf;
    ++(*ref_count_);
}

template<typename T>
PoolAllocator<T>::PoolAllocator(PoolAllocator const& src) noexcept:
    memory_{src.memory_},
    ref_count_{src.ref_count_},
    default_buf_size_{src.default_buf_size_}
{
    ++(*ref_count_);
}

template<typename T>
PoolAllocator<T>::PoolAllocator(PoolAllocator&& src) noexcept:
    memory_{src.memory_},
    ref_count_{src.ref_count_},
    default_buf_size_{src.default_buf_size_}
{
    src.memory_=nullptr;
    src.ref_count_=nullptr;
}

// NOTE: only well-formed for U==T; rebinding to a different type is one of
// the limitations of this proof of concept.
template<typename T>
template<typename U>
PoolAllocator<T>::PoolAllocator(PoolAllocator<U> const& src) noexcept:
    memory_{src.memory_},
    ref_count_{src.ref_count_},
    default_buf_size_{src.default_buf_size_}
{
    ++(*ref_count_);
}

template<typename T>
PoolAllocator<T>::~PoolAllocator()
{
    if(ref_count_!=nullptr)
    {
        --(*ref_count_);
        if(*ref_count_==0)
        {
            if(memory_!=nullptr)
            {
                for(auto& it : *memory_)
                    delete[] it.buf;
                delete memory_;
            }
            delete ref_count_;
        }
    }
}

template<typename T>
T* PoolAllocator<T>::allocate(std::size_t n)
{
    MemChunk<T>* mem_chunk=&memory_->back();
    if((mem_chunk->used+n)>mem_chunk->buf_size) // chunk is full: open a new one
    {
        default_buf_size_*=2;
        memory_->emplace_back();
        mem_chunk=&memory_->back();
        std::size_t buf_size=default_buf_size_;
        if(n>default_buf_size_)
            buf_size=n;
        mem_chunk->buf_size=buf_size;
        mem_chunk->buf=new T[mem_chunk->buf_size];
        mem_chunk->top=mem_chunk->buf;
    }
    T* r=mem_chunk->top;
    mem_chunk->top+=n;
    mem_chunk->used+=n;
    return r;
}

template<typename T>
void PoolAllocator<T>::deallocate(T* addr, std::size_t n) noexcept
{
    // Trivial scheme: memory is reclaimed only if it sits at the end of the
    // most recent chunk; everything else stays in use until destruction.
    MemChunk<T>* mem_chunk=&memory_->back();
    if(mem_chunk->used>=n and (mem_chunk->top-n)==addr)
    {
        mem_chunk->used-=n;
        mem_chunk->top-=n;
    }
}

template<typename U1, typename U2>
bool operator==(PoolAllocator<U1> const& lhs, PoolAllocator<U2> const& rhs) noexcept
{
    return (std::is_same<U1, U2>::value and lhs.memory_==rhs.memory_);
}

template<typename U1, typename U2>
bool operator!=(PoolAllocator<U1> const& lhs, PoolAllocator<U2> const& rhs) noexcept
{
    return !(lhs==rhs);
}

#endif // POOL_ALLOC_HH
```

Your example, modified to use it:
```cpp
#include <iostream>
#include <vector>
#include <random>
#include "PoolAlloc.hh"
using namespace std;

int main()
{
    // Random generator and distribution
    mt19937 gen(123456);
    uniform_real_distribution<> dis01(0., 1.);

    PoolAllocator<int> palloc{1000000};

    // Make or delete 10E6 vectors.
    vector<vector<int, PoolAllocator<int>>> v; // the inner vectors now draw from the pool
    v.reserve(10E5); // assume some size.

    for(int i=0; i<10E6; ++i)
    {
        if(dis01(gen)<0.6) // if true: make new sub-vector
        {
            v.emplace_back(palloc); // new sub-vector using the pool allocator
            v.back().reserve(10);
            for(int k=0; k<10; ++k)
                v.back().emplace_back(k); // fill it with some numbers
        }
        else // else, delete the last entry if there is one.
        {
            if(!v.empty())
                v.pop_back();
        }
    }
    cout<<"v.size()= "<<v.size();
    return 0;
}
```

The number of malloc calls drops from ~6e6 to 21. The total number of instructions drops from 3.7e9 to 2.5e9 (with -O3, measured with valgrind --tool=callgrind).
There are several implementation details that will affect performance in different usage situations.
Currently, several buffers are used: when one is full, another is created. This way the pool never reallocates an existing buffer, which would invalidate the pointers already handed out and lead you into a world of pain (see comments).
The biggest question is how to deal with freed memory. Currently, a trivial approach is used: freed memory is made available to subsequent allocations only if it sits at the end of the buffer. For your example this is sufficient, since you only ever free memory at the end of the buffer.
For more complex scenarios you will need a more sophisticated mechanism: some data structure is required to store the addresses and sizes of the free pieces of memory. Several designs are possible here, and their performance will depend on the specific situation in which they are used. I doubt there is one good catch-all solution.
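As a minimal sketch of such a mechanism (the `FreeList` type and its method names are made up for illustration), one could map each chunk size to a list of freed addresses, so a later allocation of the same size can reuse one:

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical sketch: track freed chunks by size so that a later
// allocation of exactly the same size can reuse one of them.
struct FreeList {
    std::map<std::size_t, std::vector<void*>> free_chunks;

    // Record a freed chunk of n elements for later reuse.
    void release(void* addr, std::size_t n) {
        free_chunks[n].push_back(addr);
    }

    // Return a previously freed chunk of exactly n elements, or nullptr
    // if none is available (the caller then allocates fresh memory).
    void* acquire(std::size_t n) {
        auto it = free_chunks.find(n);
        if (it == free_chunks.end() || it->second.empty())
            return nullptr;
        void* addr = it->second.back();
        it->second.pop_back();
        return addr;
    }
};
```

Exact-size matching works well when most allocations share a few sizes (like the 10-int vectors here); a general allocator would also need to handle partial matches and coalescing, which is where the complexity starts.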
You could gain some performance by writing an allocator that more efficiently reuses recently freed memory, especially if all the vectors have size 10. Of course, if that is the case, you would gain even more by using a fixed-size object. If the allocation sizes must be dynamic, then your problem becomes as abstract as general-purpose heap allocation, and you are unlikely to beat the standard allocator.
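To illustrate the fixed-size-object point, here is a sketch of the original example with the inner vector<int> replaced by std::array<int, 10> (the `simulate` helper is made up for illustration); the ints then live inline in the outer vector's storage, so "creating" and "deleting" an inner container involves no heap allocation at all:

```cpp
#include <array>
#include <random>
#include <vector>

// Sketch: if every inner container holds exactly 10 ints, a fixed-size
// std::array removes the per-container heap allocation entirely.
std::vector<std::array<int, 10>> simulate(int iterations) {
    std::mt19937 gen(123456);
    std::uniform_real_distribution<> dis01(0., 1.);

    std::vector<std::array<int, 10>> v;  // no inner new/delete at all
    v.reserve(iterations);
    for (int i = 0; i < iterations; ++i) {
        if (dis01(gen) < 0.6) {
            v.emplace_back();
            for (int k = 0; k < 10; ++k) v.back()[k] = k;  // fill with 0..9
        } else if (!v.empty()) {
            v.pop_back();
        }
    }
    return v;
}
```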
You will most likely not beat the standard library unless you can exploit information specific to your particular case that does not hold in the more general case.
The best solution would be to not delete the vector objects at all: leave them in the outer vector<vector<int>>, maintain an iterator/pointer to the logical "end" of that vector (decrement it instead of deleting), and, instead of emplacing a new element (constructing a vector), simply advance your iterator, check against .end(), and construct a new vector only if needed; otherwise reuse the old one. This assumes that your program does not rely on side effects of the constructor or destructor (vector has none, but you have not told us your actual use case).
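The reuse idea could be sketched like this (the `VectorPool` type and its method names are made up for illustration); "deleting" just moves the logical end back, and a later "create" reuses the retired vector together with its already-allocated capacity:

```cpp
#include <cstddef>
#include <vector>

// Sketch: instead of pop_back()/emplace_back(), keep a logical size and
// recycle the vectors sitting past the logical end.
class VectorPool {
    std::vector<std::vector<int>> storage_;
    std::size_t logical_size_ = 0;  // "end" of the live range

public:
    // "Create" a vector: reuse a retired one if available, construct otherwise.
    std::vector<int>& acquire() {
        if (logical_size_ == storage_.size())
            storage_.emplace_back();  // real allocation, first time only
        std::vector<int>& v = storage_[logical_size_++];
        v.clear();                    // clear() keeps the old capacity
        return v;
    }

    // "Delete" the last vector: shrink the logical end; nothing is destroyed.
    void release_last() {
        if (logical_size_ > 0) --logical_size_;
    }

    std::size_t size() const { return logical_size_; }
};
```

The key property is that clear() does not release a vector's capacity, so after the pool warms up, acquire() performs no heap allocation at all.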
As I understand from https://en.wikipedia.org/wiki/Allocator_(C%2B%2B) , C++ allocators reduce the allocation and deallocation requests made by a specific container. I assume this means that creating and deleting the containers themselves still requires calls to new and delete.
You might want to check out https://github.com/gperftools/gperftools . It is a replacement for malloc that claims improved performance for small-object allocation, especially in multi-threaded programs.
Custom allocators may solve some problems, but they are no silver bullet. The example does not give enough context to determine the best solution. I am going to offer something else, not because it is better, but because in some cases it may be.
Instead of

```cpp
v.reserve(10E5);
```

use

```cpp
v.resize(10E5, std::vector<int>(10));
```

But then you need an iterator to the next free slot in the vector, and all that good stuff.
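A minimal sketch of what that bookkeeping might look like (the `SlotVector` type and its member names are made up for illustration): all inner vectors are constructed up front, and a plain index plays the role of the free-slot iterator, so "creating" and "deleting" never touches the heap.

```cpp
#include <cstddef>
#include <vector>

// Sketch: pre-construct every slot via the resize() call above, then track
// the next unused slot with an index instead of growing/shrinking the vector.
struct SlotVector {
    std::vector<std::vector<int>> slots;
    std::size_t next_free = 0;  // index of the next unused slot

    explicit SlotVector(std::size_t n)
        : slots(n, std::vector<int>(10)) {}  // resize-style preconstruction

    std::vector<int>& create() { return slots[next_free++]; }  // claim a slot
    void destroy_last() { if (next_free > 0) --next_free; }    // release it
    std::size_t size() const { return next_free; }
};
```

A real version would also have to handle running out of pre-built slots, which this sketch omits.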
Have you tried an existing pool allocator?
http://theboostcpplibraries.com/boost.pool
I think that reusing the memory of the std::vector objects themselves would have to involve some kind of construction/placement pool.
As I see it, custom allocators offer an advantage over the standard allocator when you know exactly how the memory will be used. In general you are making a size/performance trade-off, and a custom allocator is what gives you control over that decision.
If, in your example, you can afford a page-sized chunk for each list, you can simply keep a free list of pages and hand them out for all future allocations. That has a lot of memory overhead if you really only have ten ints in each list, but it can be a big win if the lists are larger, and it can run without ever calling new or delete per int. This essentially creates a fixed-size memory pool for each list. When you are done with a list, you just return its page to the free list and reuse it for the next list.
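A hypothetical sketch of that page free list (the `PagePool` type, `kPageInts`, and the method names are all made up for illustration): finished pages go on the free list and are handed out again without calling new.

```cpp
#include <cstddef>
#include <vector>

// Sketch: each list gets one fixed-size "page" of ints; returned pages are
// recycled, so new is only called when the free list is empty.
class PagePool {
    static constexpr std::size_t kPageInts = 1024;  // page-sized chunk of ints
    std::vector<int*> free_pages_;

public:
    int* get_page() {
        if (!free_pages_.empty()) {
            int* p = free_pages_.back();  // reuse: no call to new
            free_pages_.pop_back();
            return p;
        }
        return new int[kPageInts];        // only the cold path allocates
    }

    void return_page(int* p) { free_pages_.push_back(p); }

    ~PagePool() {
        for (int* p : free_pages_) delete[] p;  // free checked-in pages
    }
};
```

Note that pages handed out but never returned are leaked in this sketch; a production version would track outstanding pages as well.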