Answer to Q#1
If regular pointers are faster and I already have shared pointers, what options do I have for calling a method of the object that a shared pointer points to?
operator-> in boost::shared_ptr contains an assertion:
typename boost::detail::sp_member_access< T >::type operator-> () const
{
    BOOST_ASSERT( px != 0 );
    return px;
}
So, first of all, make sure NDEBUG is defined before any Boost header is included, so that BOOST_ASSERT compiles to nothing (release build configurations usually pass -DNDEBUG automatically):
#define NDEBUG
I compared the assembly generated for dereferencing a boost::shared_ptr and a raw pointer:
template<int tag,typename T>
NOINLINE void test(const T &p)
{
    volatile auto anti_opti=0;
    ASM_MARKER<tag+0>();
    anti_opti = p->data;
    anti_opti = p->data;
    ASM_MARKER<tag+1>();
    (void)anti_opti;
}
test<1000>(new Foo);
Assembly of test when T is Foo* (don't worry, there is a diff below):
_Z4testILi1000EP3FooEvRKT0_:
.LFB4088:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movq %rdi, %rbx
subq $16, %rsp
.cfi_def_cfa_offset 32
movl $0, 12(%rsp)
call _Z10ASM_MARKERILi1000EEvv
movq (%rbx), %rax
movl (%rax), %eax
movl %eax, 12(%rsp)
movl %eax, 12(%rsp)
call _Z10ASM_MARKERILi1001EEvv
movl 12(%rsp), %eax
addq $16, %rsp
.cfi_def_cfa_offset 16
popq %rbx
.cfi_def_cfa_offset 8
ret
.cfi_endproc
test<2000>(boost::make_shared<Foo>());
Assembly of test when T is boost::shared_ptr<Foo>:
_Z4testILi2000EN5boost10shared_ptrI3FooEEEvRKT0_:
.LFB4090:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movq %rdi, %rbx
subq $16, %rsp
.cfi_def_cfa_offset 32
movl $0, 12(%rsp)
call _Z10ASM_MARKERILi2000EEvv
movq (%rbx), %rax
movl (%rax), %eax
movl %eax, 12(%rsp)
movl %eax, 12(%rsp)
call _Z10ASM_MARKERILi2001EEvv
movl 12(%rsp), %eax
addq $16, %rsp
.cfi_def_cfa_offset 16
popq %rbx
.cfi_def_cfa_offset 8
ret
.cfi_endproc
Output of diff -U 0 foo_p.asm shared_ptr_foo_p.asm:
--- foo_p.asm Fri Apr 12 10:38:05 2013
+++ shared_ptr_foo_p.asm Fri Apr 12 10:37:52 2013
@@ -1,2 +1,2 @@
-_Z4testILi1000EP3FooEvRKT0_:
-.LFB4088:
+_Z4testILi2000EN5boost10shared_ptrI3FooEEEvRKT0_:
+.LFB4090:
@@ -11 +11 @@
-call _Z10ASM_MARKERILi1000EEvv
+call _Z10ASM_MARKERILi2000EEvv
@@ -16 +16 @@
-call _Z10ASM_MARKERILi1001EEvv
+call _Z10ASM_MARKERILi2001EEvv
As you can see, the only differences are the function's mangled name and the value of the non-type template argument tag; the rest of the code is IDENTICAL.
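To answer the original question directly, here is a minimal sketch of the ways to call a member through a shared pointer. It uses std::shared_ptr so it compiles without Boost (boost::shared_ptr offers the same operator->, operator* and get()). The get_data() member and the use_* helpers are hypothetical names. The idea is that callees which only use the object can take a reference or raw pointer, so the shared_ptr is never copied and its reference count is never touched.

```cpp
#include <cassert>
#include <memory>

// Hypothetical Foo with a member function to call through the pointer.
struct Foo {
    int data = 42;
    int get_data() const { return data; }
};

// Callee that only uses the object: a reference involves no ownership,
// so no reference-count traffic happens here.
int use_ref(const Foo& f) { return f.get_data(); }

// Same idea with a raw pointer obtained via shared_ptr::get().
int use_raw(const Foo* f) { return f->get_data(); }

// Callee that needs the shared_ptr itself but not a share of ownership:
// passing by const reference avoids the increment/decrement of a copy.
int use_shared(const std::shared_ptr<Foo>& p) { return p->get_data(); }
```

For example, with `auto p = std::make_shared<Foo>();`, the calls `p->get_data()`, `(*p).get_data()`, `p.get()->get_data()`, `use_ref(*p)` and `use_raw(p.get())` all reach the same object, and `p.use_count()` stays at 1 throughout.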
In general, though, copying a shared_ptr is relatively expensive: its reference counting is synchronized between threads (usually with atomic operations). If you use boost::intrusive_ptr instead, you can implement your own reference-count increment/decrement without thread synchronization.
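A sketch of the intrusive approach, assuming the objects are only ever shared within a single thread. boost::intrusive_ptr manages the count through two free functions, intrusive_ptr_add_ref and intrusive_ptr_release, found by argument-dependent lookup; here they use a plain int, with no atomics. To keep the example compilable without Boost, IntrusivePtr is a hypothetical minimal stand-in that calls the same two hooks; with Boost you would use boost::intrusive_ptr<Foo> from <boost/intrusive_ptr.hpp> instead.

```cpp
#include <cassert>
#include <utility>

// Base class holding a plain (non-atomic) reference count.
// NOT thread-safe by design: share these objects within one thread only.
struct RefCounted {
    int ref_count = 0;
    virtual ~RefCounted() = default;
};

// Hook functions with the names boost::intrusive_ptr looks up via ADL.
inline void intrusive_ptr_add_ref(RefCounted* p) { ++p->ref_count; }
inline void intrusive_ptr_release(RefCounted* p) {
    if (--p->ref_count == 0) delete p;  // last owner frees the object
}

// Hypothetical minimal stand-in for boost::intrusive_ptr, so this sketch
// builds without Boost. It only calls the two hooks above.
template <typename T>
class IntrusivePtr {
public:
    explicit IntrusivePtr(T* p = nullptr) : p_(p) { if (p_) intrusive_ptr_add_ref(p_); }
    IntrusivePtr(const IntrusivePtr& o) : p_(o.p_) { if (p_) intrusive_ptr_add_ref(p_); }
    // Copy-and-swap: the by-value parameter releases the old pointer.
    IntrusivePtr& operator=(IntrusivePtr o) { std::swap(p_, o.p_); return *this; }
    ~IntrusivePtr() { if (p_) intrusive_ptr_release(p_); }
    T* operator->() const { return p_; }
    T& operator*() const { return *p_; }
    T* get() const { return p_; }
private:
    T* p_;
};

struct Foo : RefCounted { int data = 7; };
```

Copying an IntrusivePtr<Foo> only increments an ordinary int inside Foo, which is where the saving over shared_ptr's atomic counter comes from; the price is that the class is unsafe to share across threads.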
If you can afford to use unique_ptr or move semantics (via Boost.Move or C++11), there is no reference counting at all, so it will be even faster.
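A minimal sketch of the unique_ptr alternative, assuming single ownership is enough for your design (Owner and adopt() are hypothetical names). Transferring ownership with std::move copies one pointer and nulls the source; there is no counter to update, atomic or otherwise. Only C++11 features are used.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Foo { int data = 5; };

// Hypothetical owner that takes the object over by value; the caller
// must std::move its unique_ptr in, which makes the transfer explicit.
struct Owner {
    std::unique_ptr<Foo> foo;
    void adopt(std::unique_ptr<Foo> p) { foo = std::move(p); }
};
```

Usage: `std::unique_ptr<Foo> p(new Foo); Owner o; o.adopt(std::move(p));` leaves `p` empty and `o.foo` pointing at the original object.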
LIVE DEMO WITH ASM OUTPUT
#define NDEBUG
#include <boost/make_shared.hpp>
#include <boost/shared_ptr.hpp>
#define NOINLINE __attribute__ ((noinline))
// Empty, non-inlined marker functions make the region of interest easy to find in the assembly.
template<int>
NOINLINE void ASM_MARKER()
{
    volatile auto anti_opti = 11;
    (void)anti_opti;
}
struct Foo
{
    int data;
};
template<int tag,typename T>
NOINLINE void test(const T &p)
{
    volatile auto anti_opti=0;
    ASM_MARKER<tag+0>();
    anti_opti = p->data; // volatile store keeps the dereference from being optimized away
    anti_opti = p->data;
    ASM_MARKER<tag+1>();
    (void)anti_opti;
}
int main()
{
    {
        auto p = new Foo(); // value-initialize so data is not read uninitialized
        test<1000>(p);
        delete p;
    }
    {
        test<2000>(boost::make_shared<Foo>());
    }
}
Answer to Q#2
I have an instance method that is called very frequently and creates a std::vector on the stack each time.
As a general rule, try to reuse a vector's capacity to avoid costly reallocations. For example, it is better to replace:
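for (...)
{
    std::vector<value> temp;
    // ... fill and use temp ...
}
with:
std::vector<value> temp;
for (...)
{
    temp.clear(); // removes the elements but keeps the allocated capacity
    // ... fill and use temp ...
}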
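Because the question is about a frequently called instance method, the same idea can be applied across calls by keeping the buffer as a data member. A sketch with a hypothetical Tokenizer class whose process() method stands in for that method: clear() destroys the elements but leaves capacity() unchanged, so later calls of similar size do not reallocate the vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical class: process() plays the role of the hot instance method.
class Tokenizer {
public:
    // Splits `line` on spaces into scratch_ and returns the token count.
    std::size_t process(const std::string& line) {
        scratch_.clear();  // size becomes 0, capacity is kept
        std::string current;
        for (char ch : line) {
            if (ch == ' ') {
                if (!current.empty()) {
                    scratch_.push_back(current);
                    current.clear();
                }
            } else {
                current += ch;
            }
        }
        if (!current.empty()) scratch_.push_back(current);
        return scratch_.size();
    }

    std::size_t capacity() const { return scratch_.capacity(); }

private:
    std::vector<std::string> scratch_;  // reused across calls
};
```

The trade-off of a member buffer is that the method is no longer reentrant: two threads calling process() on the same object would share scratch_.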
But judging by the type std::map<std::string,std::vector<std::string>*>, it seems you are trying to do some kind of memoization.
As already suggested, instead of std::map, which has O(log N) lookup/insertion, you can try boost::unordered_map / std::unordered_map, which has O(1) average and O(N) worst-case lookup/insertion complexity and better locality/compactness (in terms of caching).
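A sketch of memoization with std::unordered_map, assuming the cached vectors can be stored by value instead of as std::vector<std::string>*. SplitCache and compute() are hypothetical; compute() just splits on whitespace to stand in for the real, expensive work. Storing values lets the map own them, so nothing has to be deleted manually. References to unordered_map elements stay valid across rehashing, which is why get() can safely return a reference.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical memoizing wrapper around an expensive computation.
class SplitCache {
public:
    const std::vector<std::string>& get(const std::string& key) {
        auto it = cache_.find(key);  // O(1) on average
        if (it != cache_.end()) return it->second;
        ++misses_;
        // Compute once, store by value, return a reference to the stored copy.
        return cache_.emplace(key, compute(key)).first->second;
    }

    std::size_t misses() const { return misses_; }

private:
    // Stand-in for the real work: split on whitespace.
    static std::vector<std::string> compute(const std::string& s) {
        std::istringstream in(s);
        std::vector<std::string> out;
        std::string word;
        while (in >> word) out.push_back(word);
        return out;
    }

    std::unordered_map<std::string, std::vector<std::string>> cache_;
    std::size_t misses_ = 0;
};
```

A second `get()` with the same key returns the stored vector without recomputing it, so `misses()` only grows for new keys.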
Also, consider trying Boost.Flyweight:
Flyweights are small-sized handle classes granting constant access to shared common data, thus allowing for the management of large amounts of entities within reasonable memory limits. Boost.Flyweight makes it easy to use this common programming idiom by providing the class template flyweight<T>, which acts as a drop-in replacement for const T.
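With Boost, the class template is boost::flyweights::flyweight<std::string> from <boost/flyweight.hpp>. To show the underlying idiom without a Boost dependency, here is a hand-rolled interning sketch; StringPool and Name are hypothetical names, and this is not how Boost.Flyweight is implemented internally. Equal values are stored once in a pool, and each handle is just a pointer to the shared copy, so copying and comparing handles is cheap.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical interning pool: each distinct string is stored exactly once.
class StringPool {
public:
    const std::string* intern(const std::string& s) {
        // insert() returns the existing element if an equal one is present.
        return &*pool_.insert(s).first;
    }
    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;  // element addresses survive rehashing
};

// Hypothetical flyweight handle: one pointer, compared by identity.
class Name {
public:
    Name(StringPool& pool, const std::string& s) : p_(pool.intern(s)) {}
    const std::string& get() const { return *p_; }
    bool operator==(const Name& o) const { return p_ == o.p_; }

private:
    const std::string* p_;
};
```

Creating many Name objects for the same few strings grows the pool only once per distinct value, which is the memory saving the Boost.Flyweight description refers to.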