Why is a WebAssembly function almost 300 times slower than the same JS function?

Question

Finding the length of a line is 300 times slower

First I read the answer to the question Why is my WebAssembly function slower than the equivalent in JavaScript?

But it shed little light on this problem, and I have invested a lot of time that may well have been wasted.

I do not use globals, and I do not use any memory. I have two simple functions that find the length of a line segment, and I compare them to the same thing in plain old JavaScript. Each has 4 parameters and 3 more locals, and returns a float or a double.

In Chrome, Javascript is 40 times faster than webAssembly, and in Firefox wasm is almost 300 times slower than Javascript.

jsPerf test case.

I have added a test case to jsPerf: WebAssembly V Javascript math

What am I doing wrong?

Either

1. I have missed an obvious bug or bad practice, or I am suffering from coder stupidity.
2. WebAssembly is not for a 32-bit OS (Win 10 laptop, i7 CPU).
3. WebAssembly is far from being a finished technology.

Please let it be option 1.

I have read the WebAssembly use cases:

Re-use existing code by targeting WebAssembly, embedded in a larger JavaScript / HTML application. This could be anything from simple helper libraries to compute-oriented task offload.

I was hoping that I could replace some geometry libraries with WebAssembly to get some extra performance. I was hoping it would be awesome, like 10 or more times faster. But 300 times slower, WTF.

UPDATE

This is not a JS optimization problem.

To ensure that optimisation has as little effect as possible, I have tested using the following methods to reduce or eliminate any optimisation bias.

counter c += length(... to ensure all code is executed.
bigCount += c to ensure the whole function is executed. Not needed.
4 lines for each function to reduce inlining skew. Not needed.
all values are randomly generated doubles.
each function call returns a different result.
added a slower length calculation in JS using Math.hypot to prove the code is being run.
added an empty call that returns the first parameter in JS to see the overhead.

    // setup and associated functions
    const setOf = (count, callback) => {var a = [],i = 0; while (i < count) { a.push(callback(i ++)) } return a };
    const rand = (min = 1, max = min + (min = 0)) => Math.random() * (max - min) + min;
    const a = setOf(100009,i=>rand(-100000,100000));
    var bigCount = 0;

    function len(x,y,x1,y1){
        var nx = x1 - x;
        var ny = y1 - y;
        return Math.sqrt(nx * nx + ny * ny);
    }
    function lenSlow(x,y,x1,y1){
        var nx = x1 - x;
        var ny = y1 - y;
        return Math.hypot(nx,ny);
    }
    function lenEmpty(x,y,x1,y1){
        return x;
    }

    // Test functions in same scope as above. None is in global scope
    // Each function is copied 4 times and tests are performed randomly.
    // c += length(... to ensure all code is executed.
    // bigCount += c to ensure whole function is executed.
    // 4 lines for each function to reduce an inlining skew
    // all values are randomly generated doubles
    // each function call returns a different result.

    tests : [{
            func : function (){
                var i,c=0,a1,a2,a3,a4;
                for (i = 0; i < 10000; i += 1) {
                    a1 = a[i];
                    a2 = a[i+1];
                    a3 = a[i+2];
                    a4 = a[i+3];
                    c += length(a1,a2,a3,a4);
                    c += length(a2,a3,a4,a1);
                    c += length(a3,a4,a1,a2);
                    c += length(a4,a1,a2,a3);
                }
                bigCount = (bigCount + c) % 1000;
            },
            name : "length64",
        },{
            func : function (){
                var i,c=0,a1,a2,a3,a4;
                for (i = 0; i < 10000; i += 1) {
                    a1 = a[i];
                    a2 = a[i+1];
                    a3 = a[i+2];
                    a4 = a[i+3];
                    c += lengthF(a1,a2,a3,a4);
                    c += lengthF(a2,a3,a4,a1);
                    c += lengthF(a3,a4,a1,a2);
                    c += lengthF(a4,a1,a2,a3);
                }
                bigCount = (bigCount + c) % 1000;
            },
            name : "length32",
        },{
            func : function (){
                var i,c=0,a1,a2,a3,a4;
                for (i = 0; i < 10000; i += 1) {
                    a1 = a[i];
                    a2 = a[i+1];
                    a3 = a[i+2];
                    a4 = a[i+3];
                    c += len(a1,a2,a3,a4);
                    c += len(a2,a3,a4,a1);
                    c += len(a3,a4,a1,a2);
                    c += len(a4,a1,a2,a3);
                }
                bigCount = (bigCount + c) % 1000;
            },
            name : "length JS",
        },{
            func : function (){
                var i,c=0,a1,a2,a3,a4;
                for (i = 0; i < 10000; i += 1) {
                    a1 = a[i];
                    a2 = a[i+1];
                    a3 = a[i+2];
                    a4 = a[i+3];
                    c += lenSlow(a1,a2,a3,a4);
                    c += lenSlow(a2,a3,a4,a1);
                    c += lenSlow(a3,a4,a1,a2);
                    c += lenSlow(a4,a1,a2,a3);
                }
                bigCount = (bigCount + c) % 1000;
            },
            name : "Length JS Slow",
        },{
            func : function (){
                var i,c=0,a1,a2,a3,a4;
                for (i = 0; i < 10000; i += 1) {
                    a1 = a[i];
                    a2 = a[i+1];
                    a3 = a[i+2];
                    a4 = a[i+3];
                    c += lenEmpty(a1,a2,a3,a4);
                    c += lenEmpty(a2,a3,a4,a1);
                    c += lenEmpty(a3,a4,a1,a2);
                    c += lenEmpty(a4,a1,a2,a3);
                }
                bigCount = (bigCount + c) % 1000;
            },
            name : "Empty",
        }
    ],

Update Results.

Because the test now has a lot more overhead the results are closer, but the JS code is still more than an order of magnitude faster (about 18 times).

Note how slow the Math.hypot function is. If optimisations were in effect, that function would be close to the faster len function.

WebAssembly 13389 μs
Javascript 728 μs

    /*
    =======================================
    Performance test. : WebAssm V Javascript
    Use strict....... : true
    Data view........ : false
    Duplicates....... : 4
    Cycles........... : 147
    Samples per cycle : 100
    Tests per Sample. : undefined
    ---------------------------------------------
    Test : 'length64'
    Mean : 12736µs ±69µs (*) 3013 samples
    ---------------------------------------------
    Test : 'length32'
    Mean : 13389µs ±94µs (*) 2914 samples
    ---------------------------------------------
    Test : 'length JS'
    Mean : 728µs ±6µs (*) 2906 samples
    ---------------------------------------------
    Test : 'Length JS Slow'
    Mean : 23374µs ±191µs (*) 2939 samples   << This function use Math.hypot rather than Math.sqrt
    ---------------------------------------------
    Test : 'Empty'
    Mean : 79µs ±2µs (*) 2928 samples
    -All ----------------------------------------
    Mean : 10.097ms Totals time : 148431.200ms 14700 samples
    (*) Error rate approximation does not represent the variance.
    */
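As a quick check on these figures, the ratios can be worked out directly from the reported means. The sketch below only re-uses the numbers printed in the results. Subtracting the 'Empty' mean as a rough estimate of the shared loop overhead is an assumption of this sketch, not something the original test harness does, and the helper names are hypothetical.

```javascript
// Means in microseconds, copied from the results listing.
const means = {
    length64: 12736,
    length32: 13389,
    lengthJS: 728,
    lengthJSSlow: 23374,
    empty: 79,
};

// Raw ratio of a test's mean to the plain JS version.
const ratioToJS = (name) => means[name] / means.lengthJS;

// Same ratio after subtracting the "Empty" mean from both sides
// (assumption: "Empty" approximates the shared loop overhead).
const netRatioToJS = (name) =>
    (means[name] - means.empty) / (means.lengthJS - means.empty);

for (const name of ["length64", "length32", "lengthJSSlow"]) {
    console.log(
        name,
        "raw x" + ratioToJS(name).toFixed(1),
        "net x" + netRatioToJS(name).toFixed(1)
    );
}
```

On these numbers the two wasm tests come out between 17 and 19 times slower than the JS version before the baseline is subtracted.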

What is the point of WebAssembly if it does not optimise?

End of update

All things related to the problem.

Find the length of a line.

Custom language source

    // declare func the < indicates export name, the param with types and return type
    func <lengthF(float x, float y, float x1, float y1) float {
        float nx, ny, dist; // declare locals float is f32
        nx = x1 - x;
        ny = y1 - y;
        dist = sqrt(ny * ny + nx * nx);
        return dist;
    }
    // and as double
    func <length(double x, double y, double x1, double y1) double {
        double nx, ny, dist;
        nx = x1 - x;
        ny = y1 - y;
        dist = sqrt(ny * ny + nx * nx);
        return dist;
    }

The code compiles to Wat for proofreading

    (module
      (func (export "lengthF")
        (param f32 f32 f32 f32)
        (result f32)
        (local f32 f32 f32)
        get_local 2
        get_local 0
        f32.sub
        set_local 4
        get_local 3
        get_local 1
        f32.sub
        tee_local 5
        get_local 5
        f32.mul
        get_local 4
        get_local 4
        f32.mul
        f32.add
        f32.sqrt
      )
      (func (export "length")
        (param f64 f64 f64 f64)
        (result f64)
        (local f64 f64 f64)
        get_local 2
        get_local 0
        f64.sub
        set_local 4
        get_local 3
        get_local 1
        f64.sub
        tee_local 5
        get_local 5
        f64.mul
        get_local 4
        get_local 4
        f64.mul
        f64.add
        f64.sqrt
      )
    )

The compiled wasm is given as a hex string (note that it does not include the name section) and is loaded using WebAssembly.compile. The exported functions are then run against the JavaScript function len (in the snippet below).

    // hex of above without the name section
    const asm = '0061736d0100000001110260047d7d7d7d017d60047c7c7c7c017c0303020001071402076c656e677468460000066c656e67746800010a3b021c01037d2002200093210420032001932205200594200420049492910b1c01037c20022000a1210420032001a122052005a220042004a2a09f0b';
    const bin = new Uint8Array(asm.length >> 1);
    for(var i = 0; i < asm.length; i+= 2){ bin[i>>1] = parseInt(asm.substr(i,2),16) }
    var length,lengthF;

    WebAssembly.compile(bin).then(module => {
        const wasmInstance = new WebAssembly.Instance(module, {});
        lengthF = wasmInstance.exports.lengthF;
        length = wasmInstance.exports.length;
    });

    // test values are const (same result if from array or literals)
    const a1 = rand(-100000,100000);
    const a2 = rand(-100000,100000);
    const a3 = rand(-100000,100000);
    const a4 = rand(-100000,100000);

    // javascript version of function
    function len(x,y,x1,y1){
        var nx = x1 - x;
        var ny = y1 - y;
        return Math.sqrt(nx * nx + ny * ny);
    }

The test code is the same for all 3 functions and is run in strict mode.

    tests : [{
            func : function (){
                var i;
                for (i = 0; i < 100000; i += 1) {
                    length(a1,a2,a3,a4);
                }
            },
            name : "length64",
        },{
            func : function (){
                var i;
                for (i = 0; i < 100000; i += 1) {
                    lengthF(a1,a2,a3,a4);
                }
            },
            name : "length32",
        },{
            func : function (){
                var i;
                for (i = 0; i < 100000; i += 1) {
                    len(a1,a2,a3,a4);
                }
            },
            name : "lengthNative",
        }
    ]

Firefox test results

    /*
    =======================================
    Performance test. : WebAssm V Javascript
    Use strict....... : true
    Data view........ : false
    Duplicates....... : 4
    Cycles........... : 34
    Samples per cycle : 100
    Tests per Sample. : undefined
    ---------------------------------------------
    Test : 'length64'
    Mean : 26359µs ±128µs (*) 1128 samples
    ---------------------------------------------
    Test : 'length32'
    Mean : 27456µs ±109µs (*) 1144 samples
    ---------------------------------------------
    Test : 'lengthNative'
    Mean : 106µs ±2µs (*) 1128 samples
    -All ----------------------------------------
    Mean : 18.018ms Totals time : 61262.240ms 3400 samples
    (*) Error rate approximation does not represent the variance.
    */

+16

performance javascript webassembly

Blindman67 Jan 9 '18 at 17:49

2 answers

The JS engine can apply many dynamic optimizations to this example:

1. Perform all calculations with integers and only convert to double for the final call to Math.sqrt.
2. Inline the call to the len function.
3. Hoist the computation out of the loop, since it always computes the same thing.
4. Recognise that the loop is left empty and eliminate it entirely.
5. Recognise that the result is never returned from the test function, and hence remove the entire body of the test function.

All but (4) apply even if you add up the result of every call. With (5), the end result is an empty function either way.

With Wasm, an engine cannot perform most of these steps, because it cannot inline across language boundaries (at least no engine does that today, AFAICT). In addition, for Wasm it is assumed that the producing (offline) compiler has already performed the relevant optimisations, so a Wasm JIT tends to be less aggressive than one for JavaScript, where static optimisation is not possible.
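The steps listed in this answer can be pictured by applying them by hand to the question's first JS test. This is only an illustrative sketch: the intermediate functions are hand-written, not output from any engine, and the constant inputs and function names are made up for the example.

```javascript
// Plain JS version from the question.
function len(x, y, x1, y1) {
    var nx = x1 - x;
    var ny = y1 - y;
    return Math.sqrt(nx * nx + ny * ny);
}

// Constant inputs, as in the question's first test (illustrative values).
const a1 = 0, a2 = 0, a3 = 3, a4 = 4;

// The test as written: the result of every call is discarded.
function testAsWritten() {
    for (var i = 0; i < 100000; i += 1) {
        len(a1, a2, a3, a4);
    }
}

// After inlining len into the loop body.
function testInlined() {
    for (var i = 0; i < 100000; i += 1) {
        var nx = a3 - a1;
        var ny = a4 - a2;
        Math.sqrt(nx * nx + ny * ny);
    }
}

// After hoisting the loop-invariant computation: the loop is now empty.
function testHoisted() {
    var nx = a3 - a1;
    var ny = a4 - a2;
    Math.sqrt(nx * nx + ny * ny);
    for (var i = 0; i < 100000; i += 1) {}
}

// After removing the empty loop and the unused computation.
function testEliminated() {}

// All four variants have the same observable behaviour: they return undefined.
console.log(testAsWritten(), testInlined(), testHoisted(), testEliminated());
```

When the function called in the loop is a Wasm export instead of len, the first step (inlining) is unavailable according to the answer, so the later steps cannot follow from it.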

+4

Andreas Rossberg Jan 9 '18 at 19:41

Coline · Accepted Answer · 2018-01-10T05:53:23+0000

Andreas describes several good reasons why the JavaScript implementation was initially observed to be 300 times faster. However, your code has a number of other problems.

This is a classic "micro benchmark", i.e. the code you are testing is so small that the other overheads within your test loop are a significant factor. For example, there is an overhead in calling WebAssembly from JavaScript, which will factor into your results. What are you trying to measure? Raw processing speed? Or the overhead of the language boundary?
Your results vary wildly, from x300 to x2, due to small changes in your test code. Again, this is a micro benchmark issue. Others have seen the same thing when using this approach to measure performance; for example, this post claims wasm is 84 times faster, which is clearly wrong!
The current WebAssembly VM is very new, and an MVP. It will get faster. Your JavaScript VM has had 20 years to reach its current speed. The performance of the JS <=> wasm boundary is being worked on and optimised right now.
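To see how much the call boundary matters on a given engine, one option is to time the same loop against both the JS function and the Wasm export, varying the inputs and consuming every result so the loop cannot be optimised away. The sketch below is a rough illustration, not a rigorous benchmark: the hex string is the module from the question, while the names run, data and now are hypothetical, and the synchronous WebAssembly.Module constructor is used only because the module is tiny.

```javascript
// Hex of the module from the question (exports lengthF and length).
const hex = '0061736d0100000001110260047d7d7d7d017d60047c7c7c7c017c0303020001071402076c656e677468460000066c656e67746800010a3b021c01037d2002200093210420032001932205200594200420049492910b1c01037c20022000a1210420032001a122052005a220042004a2a09f0b';
const bytes = new Uint8Array(hex.length >> 1);
for (let i = 0; i < hex.length; i += 2) {
    bytes[i >> 1] = parseInt(hex.slice(i, i + 2), 16);
}

// Synchronous compile and instantiate; acceptable for a module this small.
const wasm = new WebAssembly.Instance(new WebAssembly.Module(bytes), {}).exports;

// Plain JS version from the question.
function len(x, y, x1, y1) {
    var nx = x1 - x;
    var ny = y1 - y;
    return Math.sqrt(nx * nx + ny * ny);
}

// Timer: performance.now() where available, Date.now() otherwise.
const now = typeof performance !== "undefined"
    ? () => performance.now()
    : () => Date.now();

// Random doubles so every call sees different arguments.
const N = 100000;
const data = Float64Array.from({ length: N + 3 }, () => Math.random() * 200000 - 100000);

// Time N calls of fn and return the elapsed time plus a checksum of the results.
function run(fn) {
    let sum = 0;
    const t0 = now();
    for (let i = 0; i < N; i += 1) {
        sum += fn(data[i], data[i + 1], data[i + 2], data[i + 3]);
    }
    return { ms: now() - t0, sum };
}

const js = run(len);
const w64 = run(wasm.length);
console.log("JS   :", js.ms.toFixed(3), "ms");
console.log("wasm :", w64.ms.toFixed(3), "ms");
```

Because each Wasm call does only a handful of floating-point operations, the timing here mostly reflects the cost of crossing from JS into Wasm, which is the point made above about what a micro benchmark really measures.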

For a more definitive answer, see the joint paper from the WebAssembly team, which outlines an expected runtime performance gain of around 30%.

Finally, to answer your question:

What is the point in WebAssembly if it does not optimize

I think you have a misconception about what WebAssembly will do for you. Based on the paper above, the runtime performance gains are quite modest. However, there are still a number of performance advantages:

Its compact binary format and low-level nature mean that the browser can load, parse and compile the code much faster than JavaScript. It is anticipated that WebAssembly can be compiled faster than your browser can download it.
WebAssembly has predictable runtime performance. With JavaScript, performance generally increases with each iteration as the code is further optimised. It can also decrease due to de-optimisation.

There are also a number of non-performance related benefits.

For a more realistic performance measurement, look at:

Its use within Figma
Results from using it with PDFKit

Both are practical, production codebases.
