C++ exp LUT (lookup table)

In the C++ simulation I am writing, I traced the bottleneck in my program with Valgrind to cmath::exp. It currently consumes over 40% of my simulation time. I can constrain the input to a relatively small domain, but I would like to control the accuracy. I was thinking of switching from exp to a LUT (lookup table), but I am not quite sure how to do it the "right way" (tm). My concerns are:

  • Large lookup tables do not stay in cache, which slows down access.
  • What is the best way to convert the floating-point input to an integer index into the lookup table?
  • Does the answer to (2) depend on the slope of the input function?
  • Am I reinventing the wheel? Has this been done before?

What is the best way to implement a LUT for exp (or pull one in from a library)?

+7
3 answers

A very similar question was asked earlier. Here is my answer:

This approach was proposed by the author of that question, and I was able to implement it effectively (a small lookup table and minimal extra work after the lookup). It is in C#, but translating it to C++ should be simple, and I can help if you run into difficulties.

0
  • The optimal size of the lookup table is determined by the trade-off between speed, accuracy, and implementation complexity. You will have to profile; we cannot tell you the answer, because we do not know it.

  • Use lrint from <math.h> to convert a double to a long int. I am not sure whether it is also in <cmath>. (A small indexing sketch follows this list.)

  • I am not sure the slope has anything to do with converting floating-point numbers to integers. Could you tell us more about your concern?

  • Yes, you are reinventing the wheel. What you are doing has been done over and over by everyone who has ever implemented a math library. There is a lot of literature on the subject.
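
As a rough sketch of the indexing mentioned in the second point above, assuming purely for illustration a domain of [0, 8) and a 4096-entry uniform table (the names exp_lut, kLo, kHi, and kTableSize are invented for this sketch):

    #include <cmath>
    #include <vector>

    // Hypothetical uniform table of exp over an assumed domain [kLo, kHi).
    constexpr double kLo = 0.0, kHi = 8.0;   // assumed domain
    constexpr long   kTableSize = 4096;      // assumed size; profile to tune

    std::vector<double> build_exp_table()
    {
        std::vector<double> t(kTableSize);
        for (long i = 0; i < kTableSize; ++i)
            t[i] = std::exp(kLo + (kHi - kLo) * i / kTableSize);
        return t;
    }

    double exp_lut(double x, const std::vector<double>& t)
    {
        // std::lrint rounds to the nearest index; no interpolation is done here
        long i = std::lrint((x - kLo) / (kHi - kLo) * kTableSize);
        if (i < 0) i = 0;                    // clamp inputs outside the assumed domain
        if (i >= kTableSize) i = kTableSize - 1;
        return t[i];
    }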

A direct lookup table is far from optimal. You will want to use some kind of polynomial approximation, possibly piecewise, with coefficients selected from a lookup table. For a function as smooth and predictable as exp, a polynomial will give you much higher accuracy for the same amount of computational effort. The polynomials you need will depend on the trade-off between complexity and accuracy, and on whether you want to minimize the expected error, minimize the maximum error, or use some other loss function.
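
A minimal sketch of that idea, with arbitrary choices (32 segments, a cubic Taylor term, a reduced argument in [0, 1); none of these names or numbers come from the answer):

    #include <array>
    #include <cmath>

    // Sketch: piecewise approximation of exp on [0, 1) from a small table of
    // anchor values plus a cubic polynomial for the residual.
    constexpr int    kSegments = 32;              // arbitrary segment count
    constexpr double kStep     = 1.0 / kSegments;

    std::array<double, kSegments> make_anchors()
    {
        std::array<double, kSegments> a{};
        for (int i = 0; i < kSegments; ++i)
            a[i] = std::exp(i * kStep);           // exp at the left edge of each segment
        return a;
    }

    // exp(x) = exp(anchor) * exp(r), where r = x - anchor lies in [0, kStep)
    double exp01(double x, const std::array<double, kSegments>& anchors)
    {
        int    i = static_cast<int>(x * kSegments);   // segment index
        double r = x - i * kStep;                     // small residual
        // 3rd-order Taylor expansion of exp(r) around 0
        double poly = 1.0 + r * (1.0 + r * (0.5 + r / 6.0));
        return anchors[i] * poly;
    }

Minimax or least-squares coefficients per segment, stored in the same table, would trade the Taylor expansion's simplicity for a lower maximum or expected error, which is exactly the loss-function choice mentioned above.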

Limiting the domain of exp does not buy you much, since it is so easy to extend a limited-domain routine to the entire domain:

    // only works for x in [0, 1]
    double exp2_limited(double x);

    // works for all x, but does not handle overflow properly
    double exp2(double x)
    {
        double ipart = floor(x);                  // integer part of x
        // x - ipart (rather than fmod(x, 1.0)) keeps the argument in [0, 1)
        // even for negative x
        return scalbn(exp2_limited(x - ipart), (long) ipart);
    }
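
The natural exponential then reduces to the same routine; a hedged sketch, where my_exp2 stands in for the wrapper above (renamed here only to avoid clashing with the standard exp2):

    #include <cmath>

    // Hypothetical follow-up: with the range-reduced 2^x routine available as,
    // say, my_exp2, the natural exponential follows from e^x = 2^(x * log2(e)).
    double my_exp2(double x);                       // assumed: the wrapper above, renamed

    constexpr double kLog2E = 1.4426950408889634;   // log2(e)

    double my_exp(double x)
    {
        return my_exp2(x * kLog2E);
    }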

Summary:

  • You must know the required accuracy before you can create such a function.

  • You also need to know the loss function (that is, you have to choose one).

  • You need to profile before you know how fast it is.

  • Use polynomials.

+1

I have had this problem, and I took a few stack samples to diagnose it. What that does is show where the calls are coming from and what the argument values are. I found that when exp was called from certain places, the argument value was highly repeatable.

This suggested a memoization approach that made a huge difference.

All it needs is a simple wrapper function:

    double exp_cached(double arg, double* old_arg, double* old_result)
    {
        if (arg == *old_arg)
            return *old_result;
        *old_arg = arg;
        *old_result = exp(arg);
        return *old_result;
    }

and wherever exp(foo) is called, do:

    static double old_arg = -999999999, old_result;
    ...
    ... exp_cached(foo, &old_arg, &old_result) ...

That way, exp is not called when the argument at a given call site has the same value as it had on the previous call.
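
If threading the two pointers through every call site is awkward, one possible C++ restatement of the same single-entry cache is a small function object (a sketch, not the answerer's code; the name CachedExp is made up):

    #include <cmath>

    // Sketch: the same single-entry cache, wrapped in a function object so the
    // cached state lives with the call site that owns it.
    struct CachedExp {
        double old_arg    = -999999999.0;   // sentinel, unlikely to be a real input
        double old_result = 0.0;

        double operator()(double arg) {
            if (arg != old_arg) {
                old_arg    = arg;
                old_result = std::exp(arg);
            }
            return old_result;
        }
    };

    // Usage: give each hot call site its own cache.
    // static CachedExp cached_exp;
    // double y = cached_exp(foo);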

+1
