Smart pointers and memoization with std::map

I am currently in the middle of a project where performance is vital. Below are some of my questions on this subject.

Question 1

My project uses a lot of boost::shared_ptr. I know that creating shared pointers at runtime with boost::make_shared is slow, since there is overhead from tracking the references. What I want to know is: if the shared pointers have already been created, would these two statements have the same performance, or would one be faster than the other? If raw pointers are faster and I already have shared pointers, what options do I have to call a method that the shared pointer points to?

    statement1: sharedptr->someMethod();  // here the pointer is a shared_ptr created by boost::make_shared
    statement2: regularptr->someMethod(); // here the pointer is a regular one created with new

Question 2

I have an instance method that is called frequently and creates a std::vector<std::string> on the stack each time. I decided to store that vector's pointer in a static std::map, i.e. std::map<std::string, std::vector<std::string>*>. If no vector exists in the map for a key (which could be the name of the method), a valid vector address is created and added to the map. So my question is: is it worth doing a map lookup for a vector address that is known to be valid, compared with simply creating the vector on the stack, e.g. std::vector<std::string> somevector? I would also like some idea of the performance of std::map::find.

Any ideas regarding this issue would be appreciated.

+6
4 answers

Answer to Q#1

If raw pointers are faster and I already have shared pointers, what options do I have to call a method that the shared pointer points to?

operator-> inside boost::shared_ptr has an assertion:

    typename boost::detail::sp_member_access< T >::type operator-> () const
    {
        BOOST_ASSERT( px != 0 );
        return px;
    }

So, first of all, make sure NDEBUG is defined (usually this happens automatically in release builds):

 #define NDEBUG 

I did an assembly comparison between dereferencing a boost::shared_ptr and a raw pointer:

    template<int tag,typename T>
    NOINLINE void test(const T &p)
    {
        volatile auto anti_opti = 0;
        ASM_MARKER<tag+0>();
        anti_opti = p->data;
        anti_opti = p->data;
        ASM_MARKER<tag+1>();
        (void)anti_opti;
    }

 test<1000>(new Foo); 

ASM test code when T is Foo* (don't be afraid, there's a diff below):

    _Z4testILi1000EP3FooEvRKT0_:
    .LFB4088:
        .cfi_startproc
        pushq %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        movq %rdi, %rbx
        subq $16, %rsp
        .cfi_def_cfa_offset 32
        movl $0, 12(%rsp)
        call _Z10ASM_MARKERILi1000EEvv
        movq (%rbx), %rax
        movl (%rax), %eax
        movl %eax, 12(%rsp)
        movl %eax, 12(%rsp)
        call _Z10ASM_MARKERILi1001EEvv
        movl 12(%rsp), %eax
        addq $16, %rsp
        .cfi_def_cfa_offset 16
        popq %rbx
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc

 test<2000>(boost::make_shared<Foo>()); 

ASM test code when T is boost::shared_ptr<Foo>:

    _Z4testILi2000EN5boost10shared_ptrI3FooEEEvRKT0_:
    .LFB4090:
        .cfi_startproc
        pushq %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        movq %rdi, %rbx
        subq $16, %rsp
        .cfi_def_cfa_offset 32
        movl $0, 12(%rsp)
        call _Z10ASM_MARKERILi2000EEvv
        movq (%rbx), %rax
        movl (%rax), %eax
        movl %eax, 12(%rsp)
        movl %eax, 12(%rsp)
        call _Z10ASM_MARKERILi2001EEvv
        movl 12(%rsp), %eax
        addq $16, %rsp
        .cfi_def_cfa_offset 16
        popq %rbx
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc

The output of diff -U 0 foo_p.asm shared_ptr_foo_p.asm is:

    --- foo_p.asm  Fri Apr 12 10:38:05 2013
    +++ shared_ptr_foo_p.asm  Fri Apr 12 10:37:52 2013
    @@ -1,2 +1,2 @@
    -_Z4testILi1000EP3FooEvRKT0_:
    -.LFB4088:
    +_Z4testILi2000EN5boost10shared_ptrI3FooEEEvRKT0_:
    +.LFB4090:
    @@ -11 +11 @@
    -call _Z10ASM_MARKERILi1000EEvv
    +call _Z10ASM_MARKERILi2000EEvv
    @@ -16 +16 @@
    -call _Z10ASM_MARKERILi1001EEvv
    +call _Z10ASM_MARKERILi2001EEvv

As you can see, the only differences are the function signature and the tag value of the non-type template argument; the rest of the code is IDENTICAL.


In general, though, shared_ptr is quite expensive: reference counting is synchronized between threads (usually with atomic operations). If you use boost::intrusive_ptr instead, you can implement your own increment/decrement without thread-synchronized reference counting.
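For illustration, here is a minimal sketch (mine, not from the original answer) of boost::intrusive_ptr with a plain, non-atomic counter; the RefCounted base and its hook functions are assumptions for the example:

    #include <boost/intrusive_ptr.hpp>

    struct RefCounted
    {
        int refs;                     // plain int: no atomic synchronization
        RefCounted() : refs(0) {}
        virtual ~RefCounted() {}
    };

    // Hooks that boost::intrusive_ptr finds via argument-dependent lookup.
    inline void intrusive_ptr_add_ref(RefCounted* p) { ++p->refs; }
    inline void intrusive_ptr_release(RefCounted* p)
    {
        if (--p->refs == 0)
            delete p;
    }

    struct Foo : RefCounted
    {
        int data;
    };

    int main()
    {
        boost::intrusive_ptr<Foo> p(new Foo);  // increments the embedded count
        p->data = 42;
    }                                          // decrements; count hits 0, Foo is deleted

This only holds as long as all owners live on one thread; the whole point is that nothing here is atomic.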

If you can afford to use unique_ptr or move semantics (via Boost.Move or C++11), there will be no reference counting at all, and it will be faster still.
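For instance, a minimal sketch of ownership transfer with std::unique_ptr (my own illustration, separate from the live demo below): there is no counter at all, so nothing is ever synchronized.

    #include <memory>
    #include <utility>

    struct Foo { int data; };

    std::unique_ptr<Foo> make_foo()
    {
        return std::unique_ptr<Foo>(new Foo());  // ownership begins here
    }

    int main()
    {
        std::unique_ptr<Foo> p = make_foo();     // moved out, not copied
        std::unique_ptr<Foo> q = std::move(p);   // explicit ownership transfer
        q->data = 42;
    }                                            // q deletes Foo; no ref count was ever touched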

LIVE DEMO WITH ASM OUTPUT

    #define NDEBUG
    #include <boost/make_shared.hpp>
    #include <boost/shared_ptr.hpp>

    #define NOINLINE __attribute__ ((noinline))

    template<int>
    NOINLINE void ASM_MARKER()
    {
        volatile auto anti_opti = 11;
        (void)anti_opti;
    }

    struct Foo
    {
        int data;
    };

    template<int tag,typename T>
    NOINLINE void test(const T &p)
    {
        volatile auto anti_opti = 0;
        ASM_MARKER<tag+0>();
        anti_opti = p->data;
        anti_opti = p->data;
        ASM_MARKER<tag+1>();
        (void)anti_opti;
    }

    int main()
    {
        {
            auto p = new Foo;
            test<1000>(p);
            delete p;
        }
        {
            test<2000>(boost::make_shared<Foo>());
        }
    }

Answer to Q#2

I have an instance method that is called frequently and creates a std::vector<std::string> on the stack each time.

As a general rule, try to reuse the vector's capacity to avoid costly reallocations. For example, it is better to replace:

    {
        for(/*...*/)
        {
            std::vector<value> temp;
            // do work on temp
        }
    }

with:

    {
        std::vector<value> temp;
        for(/*...*/)
        {
            // do work on temp
            temp.clear();
        }
    }

But judging by the type std::map<std::string,std::vector<std::string>*>, it seems you are trying to do some kind of memoization.

As already suggested, instead of std::map, which has O(ln(N)) lookup/insert, you could try boost::unordered_map / std::unordered_map, which has O(1) average and O(N) worst-case lookup/insert complexity, plus better locality/compactness (in terms of caching).
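As a hedged sketch of what that memoization could look like with std::unordered_map (the key scheme and the compute step are placeholders, not the asker's actual code):

    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    // Returns a cached vector for a key, computing it only on the first call.
    // Note: a function-local static is not safe for concurrent callers without a mutex.
    const std::vector<std::string>& cached_strings(const std::string& key)
    {
        static std::unordered_map<std::string, std::vector<std::string>> cache;
        auto it = cache.find(key);               // O(1) on average
        if (it == cache.end())
        {
            std::vector<std::string> value;
            // ... fill 'value' once, however the method would compute it ...
            it = cache.emplace(key, std::move(value)).first;
        }
        return it->second;
    }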

Also, consider trying Boost.Flyweight:

Flyweights are small-sized handle classes granting constant access to shared common data, thus allowing for the management of large amounts of entities within reasonable memory limits. Boost.Flyweight makes it easy to use this common programming idiom by providing the class template flyweight<T>, which acts as a drop-in replacement for const T.

+13

For Question 1:

Significant performance gains come from architectural and algorithmic design; low-level concerns matter only once the high-level design is sound. Now to your question: a raw pointer performs better than a shared_ptr, but the bookkeeping that shared_ptr hides from you is considerable, and doing it by hand increases the cost of maintaining the code in the long run. Avoid creating and destroying redundant objects in performance-critical applications; in such cases shared_ptr earns its keep by sharing common objects across threads and reducing the overhead of releasing resources. Yes, a shared pointer costs more than a raw pointer because of reference counting, allocation (object, counter, deleter), and so on. You can make shared_ptr faster by avoiding unnecessary copies of it: pass it by reference (shared_ptr const &). Also, if your resources are not shared between threads, do not use shared_ptr; a raw pointer will give better results in that case.
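To illustrate that last point (a sketch of mine, not the answerer's code): passing by const reference avoids the atomic increment and decrement that a by-value copy incurs.

    #include <boost/shared_ptr.hpp>

    struct Widget { void paint() const {} };

    // Copies the shared_ptr: atomic increment on entry, decrement on exit.
    void byValue(boost::shared_ptr<Widget> w)           { w->paint(); }

    // No copy, no reference-count traffic; the caller's reference keeps it alive.
    void byConstRef(boost::shared_ptr<Widget> const& w) { w->paint(); }

    int main()
    {
        boost::shared_ptr<Widget> w(new Widget);
        byValue(w);
        byConstRef(w);
    }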

Question 2

If you want a pool of reusable objects managed through shared_ptr, it is worth exploring the object pool design pattern: http://en.wikipedia.org/wiki/Object_pool_pattern
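A hedged sketch of that pattern (the names and structure are mine): the pool hands out shared_ptrs whose custom deleter returns the object to the pool instead of destroying it.

    #include <boost/shared_ptr.hpp>
    #include <vector>

    template<typename T>
    class ObjectPool
    {
        std::vector<T*> free_;
    public:
        // Hands out a recycled object if one exists, otherwise allocates.
        boost::shared_ptr<T> acquire()
        {
            T* obj;
            if (free_.empty())
            {
                obj = new T();
            }
            else
            {
                obj = free_.back();
                free_.pop_back();
            }
            ObjectPool* self = this;
            // Custom deleter: recycle instead of delete. The pool must
            // outlive every pointer it hands out.
            return boost::shared_ptr<T>(obj, [self](T* p) { self->free_.push_back(p); });
        }
        ~ObjectPool()
        {
            for (T* p : free_) delete p;
        }
    };

boost::shared_ptr accepts any callable as a deleter, so releasing the last reference silently recycles the object rather than freeing it.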

+4

Question 1:

I use smart pointers extensively in my project, but I would not want to use shared_ptr<T>. It requires a heap object that is allocated separately from T itself, so it doubles the number of memory allocations, and memory usage increases by an amount that depends on the runtime library's implementation. intrusive_ptr is more efficient, but there is one key problem that annoys me, and that is function calls:

 void Foo(intrusive_ptr<T> x) {...} 

every time you call Foo, the reference count of parameter x must be incremented with a relatively expensive atomic increment and then decremented on exit. But this is redundant, because you can usually assume that the caller already holds a reference to x and that the reference is valid for the duration of the call. There are ways the caller might not hold a reference, but it is not difficult to write your code so that the caller's reference is always valid.

Therefore, I prefer to use my own smart-pointer class, which is the same as intrusive_ptr except that it converts implicitly to and from T*. Then I always declare my methods to take plain pointers, avoiding unnecessary reference counting:

 void Foo(T* x) {...} 

This approach has proven effective in my project, but to be honest, I never measured the difference in performance that it makes.
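Since the answerer never showed the class itself, here is my own reconstruction of what it might look like; all the names (Ptr, addRef, release, Node) are guesses, and T is assumed to carry an embedded count:

    #include <utility>

    template<typename T>
    class Ptr
    {
        T* p_;
    public:
        Ptr(T* p = 0) : p_(p) { if (p_) p_->addRef(); }        // implicit from T*
        Ptr(const Ptr& o) : p_(o.p_) { if (p_) p_->addRef(); }
        ~Ptr() { if (p_) p_->release(); }
        Ptr& operator=(Ptr o) { std::swap(p_, o.p_); return *this; }
        operator T*() const { return p_; }                      // implicit to T*
        T* operator->() const { return p_; }
    };

    struct Node
    {
        int refs;                    // hypothetical embedded count
        Node() : refs(0) {}
        void addRef() { ++refs; }
        void release() { if (--refs == 0) delete this; }
    };

    void Foo(Node* x) { (void)x; }   // plain-pointer parameter: no count traffic

    int main()
    {
        Ptr<Node> p(new Node);
        Foo(p);                      // implicit conversion to Node*, no ref-count ops
    }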

Also, if possible, prefer to use auto_ptr (C++03) or unique_ptr (C++11).

Question 2:

I do not understand why you are considering std::map at all. First of all, a hash_map would be faster (as long as it is not the Dinkumware implementation in VC++ 2008/2010; the details are somewhere here), and secondly, if you need only one vector per method, why not use a static variable of type std::vector<std::string>?
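For example, a minimal sketch of that suggestion (the method name is a placeholder): a function-local static vector is constructed once and reused, at the cost of not being safe for concurrent callers.

    #include <string>
    #include <vector>

    void someMethod()
    {
        static std::vector<std::string> scratch;  // constructed once, then reused
        scratch.clear();                          // keeps capacity, drops contents
        // ... fill and use scratch as this method's temporary storage ...
    }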

If you have to look the vector up in a hash table every time the method is called, I suspect you will save little or no time compared with creating a new vector each time. Looking it up in a std::map would take even longer.

+2
source

Q1: just look at the implementation:

    T * operator-> () const // never throws
    {
        BOOST_ASSERT(px != 0);
        return px;
    }

Clearly it returns a member variable rather than computing anything on the fly, so dereferencing will be as fast as dereferencing a raw pointer (subject to the usual optimization vagaries: in an unoptimized build you can always expect performance to suck, so that case shouldn't be considered).

Q2: "is it worth doing a map lookup for the vector address and returning a valid address, compared with simply creating the vector on the stack, e.g. std::vector<std::string> somevector? I would also like some idea of the performance of std::map::find."

Whether it is worth it depends on how much data would have to be copied into the vector, and to a lesser extent on the number of nodes in the map, the length of the common prefixes in the keys being compared, etc. As always, if it matters, benchmark. In general, though, I would expect the answer to be yes if the vectors contain a significant amount of data (or that data is slow to regenerate). std::map is a balanced binary tree, so in general you can expect lookups in O(log2 N), where N is the current number of elements (i.e. size()).

You could also use a hash table, which gives O(1) and will be faster for a huge number of elements, but it is impossible to say where the threshold lies. Performance still depends on the cost of the hash function you apply to your keys, their length (some hash implementations, such as Microsoft's std::hash, incorporate at most 10 characters spaced along the string being hashed, so there is an upper bound on time spent but dramatically more collision potential), the hash table's collision handling (e.g. displacement lists probing alternative buckets, versus alternative hash functions, versus containers chained from buckets), and the collision proneness itself.
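If you do benchmark, a hedged sketch along these lines (mine; the key scheme and sizes are arbitrary) compares the two lookups directly:

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <map>
    #include <string>
    #include <unordered_map>
    #include <vector>

    template<typename Map>
    long long lookupMicros(const Map& m, const std::vector<std::string>& keys,
                           std::size_t& hits)
    {
        auto t0 = std::chrono::steady_clock::now();
        for (const auto& k : keys)
            hits += m.count(k);                   // the lookup being measured
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
    }

    int main()
    {
        std::map<std::string, int> ordered;
        std::unordered_map<std::string, int> hashed;
        std::vector<std::string> keys;
        for (int i = 0; i < 100000; ++i)
        {
            std::string k = "key" + std::to_string(i);
            ordered[k] = i;
            hashed[k] = i;
            keys.push_back(k);
        }
        std::size_t h1 = 0, h2 = 0;
        long long t1 = lookupMicros(ordered, keys, h1);  // O(log N) per lookup
        long long t2 = lookupMicros(hashed, keys, h2);   // ~O(1) per lookup
        std::printf("map: %lld us, unordered_map: %lld us (hits %zu and %zu)\n",
                    t1, t2, h1, h2);
    }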

+1
