Algorithm Optimization: Logarithm by calculation or search table

I am trying to optimize an audio algorithm that should calculate two algorithms at each step, similar to the following. Now I read that there is no algorithm for logarithms that runs in polynomial time. My question would be if it made sense to do all the logarithms using the lookup table, since they are always the same, although the lack of a large amount of memory access?

for(int f=1;f<11000;f++){ for(int fk=1;fk<1000;fk++){ int k = ceil(12 * log2f((f - 0.5) / fk)); } } 

I am programming in C ++.

Thanks a lot!

+3
source share
4 answers

If you really need

 ceil(12 * log2(/* something */)) 

then there will be a very simple calculation of O (1), which will work using a table of 12 values.

  • Use frexp () to split the value into exponent and mantissa. (This is just a bit of manipulation, so it only takes a few cycles.)

  • Look at the mantissa in the pre-computed list of powers of the 12th root 2.0 (divided by 2), which you can do with up to four comparisons.

  • Result 12 * (indicator - 1) + least root index> = mantissa.

Edited to add:

In fact, even the best solution, because the powers of the 12th root of the two are reasonably distributed evenly. If you divide [0.5, 1.0) (the mantissa range returned by frexp) into 17 evenly distributed subbands, each subband falls into one of two possible return values. Therefore, if you associate each subrange with an index in the root vector, you only need to compare the target (within this range) with one root.

It's too late for me to write coherent English, so you have to settle for the code:

 int Ceil12TimesLog2(double x) { int exp; double mantissa = std::frexp(x, &exp) * 2; int idx = indexes[int((mantissa - 1.0) * 17)]; return 12 * (exp - 1) + (mantissa <= roots[idx] ? idx : idx + 1); } // Here are the reference tables. double roots[12] = { 0x1.0000000000000p+0, 0x1.0f38f92d97963p+0, 0x1.1f59ac3c7d6c0p+0, 0x1.306fe0a31b715p+0, 0x1.428a2f98d728bp+0, 0x1.55b8108f0ec5ep+0, 0x1.6a09e667f3bccp+0, 0x1.7f910d768cfb0p+0, 0x1.965fea53d6e3dp+0, 0x1.ae89f995ad3adp+0, 0x1.c823e074ec129p+0, 0x1.e3437e7101344p+0 }; int indexes[17] = { 0, 1, 2, 3, 4, 5, 5, 6, 7, 8, 8, 9, 9, 10, 10, 11, 11 }; 

I tried this and this led to the fact that the total calculation time was reduced from about 0.5 second (for log2f) to 0.15 second using a loop in OP. The total size of the reference tables is 164 bytes.

+4
source

Writing z for brevity and clarity instead of fk, your inner loop computes ceil(12 * log2f((f - 0.5) / z)) . Now 12 * log2f((f - 0.5) / z) = 12*log2f(f - 0.5) - 12*log2f(z) . Pre-calculate an array with 999 elements, allowing you to calculate all the values ​​only with log calculations of only 11998 instead of 10988001 of them:

 for (int z=1; z<1000; ++z) z12[z] = 12 * log2f(z); for (int f=1; f<11000; ++f) { w = 12 * log2f(f - 0.5); for (int z=1; z<1000; ++z) { int k = ceil(w - z12[z]); } } 
+2
source

I found:

  • Although you have 11Kx1K = 11M combinations (f, fk) ,
  • the number of different values ​​of k is only 294. (you just need 9 bits to represent it).

So, you can build a 2d array to store the display and load it into memory. He looks like

 LOOKUP[f][fk] = v, f in 1..11000, fk in 1..1000 -------------------- v1,1 v1,2 v1,3 ... v1,1000 v2,1 v2,2 v2,3 ... v2,1000 ... ... ... ... v11000,1 , ... v11000,1000 

Since each v is two bytes, you only need 11Kx1Kx2B = 22 MB memory to load this table. It's nothing.

+1
source

If the order of the cycle is reversed, then the number of repeated values ​​for k will be much higher. There are only 12977 cases in the set, where k has "run of one"; The longest run is 618.

This suggests that the reverse approach minimizes the number of calls to log2f - you need to calculate the index n, where k (z, f + n)! = K (z, f) (and loop n instances ...)

In any case, in the final product, I doubt the huge LUT. Even the approach to using tables of size 11000 + 1000 seems to me suboptimal. Instead, I would suggest that only with integers 11000 + 1000 does either linear or max exist. The 2nd power piecewise polynomial approximation of log2, consisting of 8-16 pieces.

If such an approach is found, then polynomial coefficients will be placed in the NEON or XXM registers and allow parallel implementation without access to memory.

0
source

All Articles