Efficient computation of n picks k in Node.js

Question

Efficient computation of n picks k in Node.js

I have performance sensitive code on a Node.js server that needs to consider combinations. From this SO answer, I used this simple recursive function to calculate n by choosing k:

function choose(n, k) { if (k === 0) return 1; return (n * choose(n-1, k-1)) / k; }

Then, since we all know that iteration is almost always faster than recursion, I wrote this function based on the multiplicative formula :

 function choosei(n,k){ var result = 1; for(var i=1; i <= k; i++){ result *= (n+1-i)/i; } return result; }

I did some tests on my machine. Here are the results of only one of them:

 Recursive x 178,836 ops/sec ±7.03% (60 runs sampled) Iterative x 550,284 ops/sec ±5.10% (51 runs sampled) Fastest is Iterative

The results have consistently shown that the iterative method is really about 3-4 times faster than the recursive method in Node.js (at least on my machine).

This is probably fast enough for my needs, but is there a way to speed it up ? My code should call this function very often, sometimes with fairly large n and k values, so the faster the better.

EDIT

After running a few more tests with le_m and Mike solutions, it turns out that although both are significantly faster than the proposed iterative method, the Mike method using the Pascal triangle looks a little faster than the le_m log table method.

 Recursive x 189,036 ops/sec ±8.83% (58 runs sampled) Iterative x 538,655 ops/sec ±6.08% (51 runs sampled) LogLUT x 14,048,513 ops/sec ±9.03% (50 runs sampled) PascalsLUT x 26,538,429 ops/sec ±5.83% (62 runs sampled) Fastest is PascalsLUT

The logarithmic search method was approximately 26-28 times faster than the iterative method in my tests, and the method using the Pascal triangle was approximately 1.3-1.8 times faster than the logarithmic search method.

Note that I followed le_m's suggestion to pre-compute logarithms with higher precision using mathjs , and then converted them to regular JavaScript Number (which are always 64-bit double-precision floats ).

+6

performance javascript algorithm node.js

Nanowizard Jun 07 '16 at 12:54

source share

2 answers

The following algorithm has complexity at runtime O (1) given the linear search table of logarithmic factors with the complexity of the space O (n) .

Limiting n and k to the range [0, 1000] makes sense, since binomial(1000, 500) already dangerously close to Number.MAX_VALUE . Therefore, we need a lookup table of size 1000.

On a modern JavaScript engine, a compact array of n numbers is n * 8 bytes in size. Thus, a full lookup table will require 8 kilobytes of memory. If we limit our input to the range [0, 100], the table will occupy only 800 bytes.

 var logf = [0, 0, 0.6931471805599453, 1.791759469228055, 3.1780538303479458, 4.787491742782046, 6.579251212010101, 8.525161361065415, 10.60460290274525, 12.801827480081469, 15.104412573075516, 17.502307845873887, 9.987214495661885, 22.552163853123425, 25.19122118273868, 27.89927138384089, 30.671860106080672, 33.50507345013689, 36.39544520803305, 39.339884187199495, 42.335616460753485, 45.38013889847691, 48.47118135183523, 51.60667556776438, 54.78472939811232, 58.00360522298052, 61.261701761002, 64.55753862700634, 67.88974313718154, 71.25703896716801, 74.65823634883016, 78.0922235533153, 81.55795945611504, 85.05446701758152, 88.58082754219768, 92.1361756036871, 95.7196945421432, 99.33061245478743, 102.96819861451381, 106.63176026064346, 110.32063971475739, 114.0342117814617, 117.77188139974507, 121.53308151543864, 125.3172711493569, 129.12393363912722, 132.95257503561632, 136.80272263732635, 140.67392364823425, 144.5657439463449, 148.47776695177302, 152.40959258449735, 156.3608363030788, 160.3311282166309, 164.32011226319517, 168.32744544842765, 172.3527971391628, 176.39584840699735, 180.45629141754378, 184.53382886144948, 188.6281734236716, 192.7390472878449, 196.86618167289, 201.00931639928152, 205.1681994826412, 209.34258675253685, 213.53224149456327, 217.73693411395422, 221.95644181913033, 226.1905483237276, 230.43904356577696, 234.70172344281826, 238.97838956183432, 243.2688490029827, 247.57291409618688, 251.8904022097232, 256.22113555000954, 260.5649409718632, 264.9216497985528, 269.2910976510198, 273.6731242856937, 278.0675734403661, 282.4742926876304, 286.893133295427, 291.3239500942703, 295.76660135076065, 300.22094864701415, 304.6868567656687, 309.1641935801469, 313.65282994987905, 318.1526396202093, 322.66349912672615, 327.1852877037752, 331.7178871969285, 336.26118197919845, 340.815058870799, 345.37940706226686, 349.95411804077025, 354.5390855194408, 359.1342053695754, 363.73937555556347]; function binomial(n, k) { return Math.exp(logf[n] - logf[nk] - logf[k]); } console.log(binomial(5, 3));

Explanation

Starting with the original iterative algorithm, we first replace the product with the sum of the logarithms:

 function binomial(n, k) { var logresult = 0; for (var i = 1; i <= k; i++) { logresult += Math.log(n + 1 - i) - Math.log(i); } return Math.exp(logresult); }

Now our cycle summarizes more than k terms. If we rearrange the sum, we can easily see that we sum over the sequential logarithms of log(1) + log(2) + ... + log(k) , etc., which we can replace with sum_of_logs(k) , which virtually identical to log(n!) . Preliminary calculation of these values and their storage in our logf table logf leads to the above single-line algorithm.

Reference table calculation:

I recommend that you pre-calculate the correspondence table with higher accuracy and convert the resulting elements to 64-bit floating-point numbers. If you don't need this extra precision, or want to run this code on the client side, use this:

 var size = 1000, logf = new Array(size); logf[0] = 0; for (var i = 1; i <= size; ++i) logf[i] = logf[i-1] + Math.log(i);

Numerical Accuracy:

Using log factorials, we avoid the accuracy issues inherent in storing raw factorials.

We could even use the Stirling approximation for log(n!) Instead of the lookup table and still get 12 significant digits for the above calculations both at runtime and in O (1) spatial complexity:

 function logf(n) { return n === 0 ? 0 : (n + .5) * Math.log(n) - n + 0.9189385332046728 + 0.08333333333333333 / n - 0.002777777777777778 * Math.pow(n, -3); } function binomial(n , k) { return Math.exp(logf(n) - logf(n - k) - logf(k)); } console.log(binomial(1000, 500)); // 2.7028824094539536e+299

+8

le_m Jun 09 '16 at 2:42

source share

Mike 'Pomax' Kamermans · Accepted Answer · 2016-06-09T03:07:22+0000

Never calculate factorials; they grow too fast. Instead, calculate the desired result. In this case, you need binomial numbers that have an incredibly simple geometric construction: you can construct the triangular Pascal nickname as you need it and do it with simple arithmetic.

Start with [1] and [1,1]. The next line has [1] at the beginning, [1 + 1] in the middle and [1] at the end: [1,2,1]. Next line: [1] at the beginning, the sum of the first two terms at spot 2, the sum of the following two terms at the fifth point and [1] at the end: [1,3,3,1]. The next line: [1], then 1 + 3 = 4, then 3 + 3 = 6, then 3 + 1 = 4, then [1] at the end, and so on and so forth. As you can see, there are no factorials, logarithms and even multiplications: just super fast addition with pure integers. So simple, you can create a massive lookup table manually.

And you have to.

Never calculate in the code what you can calculate manually, but simply include as constants for immediate search; in this case, writing out a table where the number of elements is approximately equal to n = 20 is absolutely trivial, and then you can just use this as your “initial LUT” and maybe never even access the high lines.

But, if you need them, or even more, then because you cannot create an infinite lookup table, you compromise: you start with a predefined LUT and a function that can “fill in” some term that you need . not yet:

 module.exports = (function() { // step 1: a basic LUT with a few steps of Pascal triangle var binomials = [ [1], [1,1], [1,2,1], [1,3,3,1], [1,4,6,4,1], [1,5,10,10,5,1], [1,6,15,20,15,6,1], [1,7,21,35,35,21,7,1], [1,8,28,56,70,56,28,8,1], ... ]; // step 2: a function that builds out the LUT if it needs to. function binomial(n,k) { while(n >= binomials.length) { let s = binomials.length; let nextRow = []; nextRow[0] = 1; for(let i=1, prev=s-1; i<s; i++) { nextRow[i] = binomials[prev][i-1] + binomials[prev][i]; } nextRow[s] = 1; binomials.push(nextRow); } return binomials[n][k]; } return binomial; }());

Since this is an array of integers, the amount of memory is small. For a lot of work related to binomes, we really don’t even need more than two bytes per integer, which makes this table a brief search: we don’t need more than 2 bytes until you need binomes larger than n = 19, and the full search table is up to n = 19 occupies 380 bytes. This is nothing compared to the rest of your program. Even if we allow 32-bit values, we can get up to n = 35 in just 2380 bytes.

Thus, the search is fast: either O (constant) for previously calculated values, (n * (n + 1)) / 2 steps, if we don’t have LUT at all (in a large record O it will be O (n²), but notation Big O is almost never the right measure of complexity), and somewhere in between is for the terms we need that aren't in the LUT yet. Run some tests for your application that will tell you how big your initial LUT should be, just hard code that (seriously. These are constants, these are exactly the values that need to be hard-coded) and keep the generator around just in case.

However, remember that you are in the land of JavaScript, and you are limited to the numeric type of JavaScript: integers reach only 2 ^ 53 , except that the property is an integer (each n has a separate m=n+1 such that mn=1 ) is not guaranteed. This is unlikely to ever be a problem: once we reach this limit, we are dealing with binomial coefficients that you should never use.

Efficient computation of n picks k in Node.js

EDIT

More articles: