What makes this function run much slower?

I was trying to run an experiment to see whether local variables in functions are stored on a stack.

So I wrote a small performance test:

    function test(fn, times){
        var i = times;
        var t = Date.now()
        while(i--){
            fn()
        }
        return Date.now() - t;
    }

    function straight(){
        var a = 1
        var b = 2
        var c = 3
        var d = 4
        var e = 5

        a = a * 5
        b = Math.pow(b, 10)
        c = Math.pow(c, 11)
        d = Math.pow(d, 12)
        e = Math.pow(e, 25)
    }

    function inversed(){
        var a = 1
        var b = 2
        var c = 3
        var d = 4
        var e = 5

        e = Math.pow(e, 25)
        d = Math.pow(d, 12)
        c = Math.pow(c, 11)
        b = Math.pow(b, 10)
        a = a * 5
    }

I expected the inversed function to run much faster. Instead, a surprising result came out.

Until I test one of the functions, the other runs about 10 times faster than it does after I have tested the second one.

Example:

    > test(straight, 10000000)
    30
    > test(straight, 10000000)
    32
    > test(inversed, 10000000)
    390
    > test(straight, 10000000)
    392
    > test(inversed, 10000000)
    390

The same thing happens when testing in the opposite order.

    > test(inversed, 10000000)
    25
    > test(straight, 10000000)
    392
    > test(inversed, 10000000)
    394

I tested it in both the Chrome browser and Node.js, and I have no idea why this happens. The effect persists until the current page is refreshed or the Node REPL is restarted.

What could be the source of such a significant (~12 times worse) performance drop?

PS. Since this only seems to reproduce in some environments, please state the environment you used to test it.

Mine were:

OS: Ubuntu 14.04
Node v0.10.37
Chrome 43.0.2357.134 (Official Build) (64-bit)

/ Edit
In Firefox 39, each test takes ~5500 ms regardless of order. It seems this only happens on specific engines.

/ Edit2
Inlining the function into the test function makes it always run in the same time.
Is it possible that there is an optimization that inlines the function parameter if it is always the same function?

+64
performance javascript v8
Jul 29 '15 at 10:46
3 answers

Once you call test with two different functions, the fn() call site inside it becomes megamorphic, and V8 is unable to inline at it.

Function calls (as opposed to method calls o.m(...)) in V8 are accompanied by a one-element inline cache rather than a true polymorphic inline cache.

Because V8 is unable to inline at the fn() call site, it cannot apply a variety of optimizations to your code. If you look at your code in IRHydra (I uploaded the compilation artifacts for your convenience), you will notice that the first optimized version of test (when it was specialized for fn = straight) has a completely empty main loop.


V8 simply inlined straight and removed all the code you hoped to benchmark, using the Dead Code Elimination (DCE) optimization. On an older version of V8, instead of DCE, V8 would just hoist the code out of the loop via LICM, because the code is completely loop invariant.

When straight is not inlined, V8 cannot apply these optimizations, hence the performance difference. A newer version of V8 would still apply DCE to straight and inversed themselves, turning them into empty functions, so the performance difference is not as big (around 2-3x). Older V8 was not aggressive enough with DCE, and that would show up as a bigger difference between the inlined and not-inlined cases, because the peak performance of the inlined case was solely the result of aggressive loop-invariant code motion (LICM).

On a related note, this shows why benchmarks should never be written like this: their results are of no use, because you end up measuring an empty loop.
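The point about unused results can be sketched as follows. This is only an illustration, assuming the function under test is changed to return a value; the names `straightSum`, `testWithSink`, and `sink` are hypothetical and do not appear in the question. Because the inputs are constants, an optimizer may still simplify the work in ways this sketch does not prevent, so it should not be read as a reliable harness.

```javascript
// Hypothetical variant of `straight` that returns its work, so the
// result is observable from outside the function.
function straightSum() {
  var a = 1, b = 2, c = 3, d = 4, e = 5;
  a = a * 5;
  b = Math.pow(b, 10);
  c = Math.pow(c, 11);
  d = Math.pow(d, 12);
  e = Math.pow(e, 25);
  return a + b + c + d + e;
}

// Hypothetical timing loop that adds every return value to `sink`
// and hands `sink` back to the caller, so the calls are not unused.
function testWithSink(fn, times) {
  var sink = 0;
  var i = times;
  var t = Date.now();
  while (i--) {
    sink += fn();
  }
  return { elapsed: Date.now() - t, sink: sink };
}

var result = testWithSink(straightSum, 1000);
console.log(result.elapsed, result.sink);
```

Returning `sink` together with the elapsed time means the caller can observe the computed values, which makes it harder for the engine to discard the loop body as dead code.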

If you are interested in polymorphism and its implications in V8, see my post "What's up with monomorphism" (the section "Not all caches are the same" talks about the caches associated with function calls). I also recommend going through one of my talks about the dangers of microbenchmarking, for example the most recent "Benchmarking JS" talk from GOTO Chicago 2015 (video). It might help you avoid common pitfalls.

+97
Jul 29 '15 at 13:01

You are misunderstanding the stack.

While the "real" stack indeed only has Push and Pop operations, this does not really apply to the kind of stack used for execution. Apart from Push and Pop, you can also access any variable at random, as long as you have its address. This means that the order of locals does not matter, even if the compiler does not reorder them for you. In pseudo-assembly, you seem to think that

    var x = 1;
    var y = 2;

    x = x + 1;
    y = y + 1;

translates to something like

    push 1       ; x
    push 2       ; y

    ; get y and save it
    pop tmp
    ; get x and put it in the accumulator
    pop a
    ; add 1 to the accumulator
    add a, 1
    ; store the accumulator back in x
    push a
    ; restore y
    push tmp
    ; ... and add 1 to y

In truth, the real code looks something like this:

    push 1  ; x
    push 2  ; y

    add [bp], 1
    add [bp+4], 1

If the thread stack really were a strict stack, this would indeed be impossible. In that case, the order of operations and locals would matter much more than it does now. Instead, by allowing random access to values on the stack, you save a lot of work for both the compiler and the CPU.
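The random-access idea can be modelled in a few lines of JavaScript. This is a toy sketch, not how any engine lays out memory: the array `stack` and the index `bp` are hypothetical stand-ins for the execution stack and the base pointer, and slots are addressed by index rather than by byte offset.

```javascript
// Model the execution stack as a plain array; `bp` is a hypothetical
// base pointer marking where the current function's locals start.
var stack = [];
var bp = stack.length;

stack.push(1); // x lives at stack[bp]
stack.push(2); // y lives at stack[bp + 1]

// Update both locals in place, without popping anything.
stack[bp] += 1;     // x = x + 1
stack[bp + 1] += 1; // y = y + 1

console.log(stack); // x is now 2, y is now 3
```

Neither update needs y to be popped before x can be reached, which is why the declaration order of the locals has no effect on the cost of using them.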

To answer your actual question, I suspect that neither of the functions actually does anything. You only modify locals, and your functions do not return anything, so it is perfectly legal for the compiler to drop the function bodies entirely, and possibly even the function calls. If that is the case, any performance difference you observe is probably just a measurement artifact, or something related to the inherent cost of calling a function or iterating.

+17
Jul 29 '15 at 11:22

Inlining the function into the test function makes it always run in the same time.
Is it possible that there is an optimization that inlines the function parameter if it is always the same function?

Yes, that seems to be exactly what you are observing. As already mentioned by @Luaan, the compiler probably throws away the bodies of your straight and inversed functions anyway, because they have no side effects and only manipulate some local variables.

When you call test(…, 10000000) for the first time, the optimizing compiler realizes after some iterations that the fn() being called is always the same, and inlines it, avoiding the costly function call. All it does now is decrement a variable 10 million times and test it against 0.

But when you then call test with a different fn, it has to de-optimize. It may apply some other optimizations again later, but now that it knows there are two different functions to be called, it can no longer inline them.
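One way to keep each call site tied to a single target is to write a separate timing loop per function, as in the sketch below. The names `timeStraight` and `timeInversed` are hypothetical, and whether a given engine actually inlines the calls or removes the bodies is an assumption that this code cannot guarantee.

```javascript
function straight() {
  var a = 1, b = 2, c = 3, d = 4, e = 5;
  a = a * 5;
  b = Math.pow(b, 10);
  c = Math.pow(c, 11);
  d = Math.pow(d, 12);
  e = Math.pow(e, 25);
}

function inversed() {
  var a = 1, b = 2, c = 3, d = 4, e = 5;
  e = Math.pow(e, 25);
  d = Math.pow(d, 12);
  c = Math.pow(c, 11);
  b = Math.pow(b, 10);
  a = a * 5;
}

// Hypothetical timing loop whose call site only ever sees `straight`.
function timeStraight(times) {
  var i = times;
  var t = Date.now();
  while (i--) {
    straight();
  }
  return Date.now() - t;
}

// Hypothetical timing loop whose call site only ever sees `inversed`.
function timeInversed(times) {
  var i = times;
  var t = Date.now();
  while (i--) {
    inversed();
  }
  return Date.now() - t;
}

console.log(timeStraight(100000), timeInversed(100000));
```

With this layout, the order in which the two timers are run should no longer change which function each call site has seen, although the loops may still end up measuring nothing but an empty loop.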

Since the only thing you are really measuring is that function call, this leads to the large differences in your results.

An experiment to see whether local variables in functions are stored on a stack

As for your actual question: no, individual variables are not stored on a stack (stack machine), but in registers (register machine). It does not matter in which order they are declared or used in your function.

They are, however, stored on the stack as part of so-called "stack frames". You will have one frame per function call, storing the variables of its execution context. In your case, the stack might look like this:

    [straight: a, b, c, d, e]
    [test: fn, times, i, t]
    …

+3
Jul 29 '15 at 12:58


