std::pow and cache misses?

I was trying to optimize my numerical program and ran into something mysterious. I loop over code that performs thousands of floating-point operations, of which exactly one is a call to pow; nevertheless, that one call takes 5% of the time. This is not necessarily a critical problem, but it is odd, and I would like to understand what is happening.

When I profiled for cache misses, the VS.NET 2010RC profiler reported that almost all of the cache misses occur in std::pow. So what is going on? Is there a faster alternative? I tried powf, but it is only slightly faster, and it is still responsible for an abnormal number of cache misses.

Why would a basic function like pow cause cache misses?

Edit: This is not managed code. /Oi is enabled, but the compiler may choose to ignore it. Replacing pow(x,y) with exp(y*log(x)) gives similar performance; all the cache misses are then attributed to the log function.
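To make the substitution concrete, here is a minimal sketch of it, assuming x > 0 (the identity does not hold for zero or negative x, which std::pow handles separately). The helper name pow_via_exp_log is hypothetical and only for illustration.

```cpp
#include <cmath>

// Hypothetical helper: computes x^y as exp(y * log(x)).
// Assumption: x > 0. For x <= 0 std::log returns -inf or NaN,
// so this is not a general replacement for std::pow.
double pow_via_exp_log(double x, double y) {
    return std::exp(y * std::log(x));
}
```

Calling pow_via_exp_log(2.0, 10.0) should agree with std::pow(2.0, 10.0) up to rounding error. The two are not guaranteed to be bit-identical, so any comparison should use a tolerance.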

+7
c++ profiling caching cpu
4 answers

If you replace the std::pow call with another function, such as std::max(var, var), does it still take 5% of the time? Do you still get the cache misses?

My guess is no for the time and yes for the cache misses. Computing a power is slower than many other operations (which ones are you using?). Calling code that is not in the cache will cause a cache miss, no matter which function it is.
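A minimal sketch of that experiment, assuming a standalone loop is representative of your real code (it may well not be). The names time_loop, pow_op and max_op are hypothetical, and the inputs are made up for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>

struct LoopResult {
    double seconds;   // wall-clock time for the whole loop
    double checksum;  // sum of all results, so the calls cannot be optimized away
};

// Hypothetical harness: applies op(x, 1.5) n times over slowly varying inputs.
template <typename Op>
LoopResult time_loop(Op op, int n) {
    const auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        const double x = 1.0 + i * 1e-6;  // positive, changes on every iteration
        sum += op(x, 1.5);
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return {elapsed.count(), sum};
}

// The operation under suspicion and a cheap stand-in for comparison.
inline double pow_op(double x, double y) { return std::pow(x, y); }
inline double max_op(double x, double y) { return std::max(x, y); }
```

Comparing time_loop(pow_op, n).seconds with time_loop(max_op, n).seconds for a large n gives a rough idea of the relative cost. It says nothing about cache misses; for those you still need the profiler.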

+1

Yes, it's slow. As for why exactly, someone more confident on the details may want to explain.

Want to speed it up? See here: http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-cc/

+2

Can you give more information about x, as well as the context in which pow is evaluated?

What you are seeing may be the hardware prefetcher at work. Also, depending on the profiler, the attribution of "cost" to individual assembly instructions may not be accurate, and misattribution is even more likely around long-latency instructions such as those needed to evaluate pow.

In addition, I would use a real profiler such as VTune / PTU rather than the one that ships with any version of Visual Studio.

+2

If your code does a lot of number crunching, I would not be surprised that std::pow consumes 5% of the execution time. Many numerical operations are very fast, so a slower operation such as std::pow will take noticeably longer than the rest. (This also explains why you did not see a big improvement when switching to std::powf.)

The cache misses are somewhat more perplexing, and it is hard to offer an explanation without more data. One possibility is that your other code is so memory intensive that it uses up all of the available cache, in which case it is not surprising that std::pow takes the hit for the cache misses.

+1