Analytically speeding up exp(A*x) in MATLAB

I need to repeatedly calculate f(x) = exp(A*x) for a small, variable column vector x and a huge, constant matrix A (many rows, only a few columns). In other words, x is small, but A*x is large. My problem sizes are such that the A*x multiplication takes about as long as the exp() part.

Besides Taylor expansion and pre-computing a table of exp(y) values (given that the range of values y = A*x is known), neither of which I could make significantly faster (while keeping accuracy) than what MATLAB does on its own, I am thinking about rewriting the problem analytically so that some values can be pre-computed.

For example, I found that exp(A*x)_i = exp(\sum_j A_{ij} x_j) = \prod_j exp(A_{ij} x_j) = \prod_j exp(A_{ij})^{x_j}

This would allow me to pre-compute exp(A) once, but the required looping would be as expensive as the original call to exp(), and the products (\prod) would have to be computed on top of that.
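A small numerical check of that identity (tiny random sizes; the element-wise power against a row vector relies on implicit expansion, R2016b or newer, use bsxfun on older releases):

    Asmall = randn(5, 3); xsmall = randn(3, 1);
    expA = exp(Asmall);                      % pre-computed once
    f1 = exp(Asmall * xsmall);
    f2 = prod(expA .^ (xsmall.'), 2);        % \prod_j exp(A_ij)^(x_j), product over each row
    max(abs(f1 - f2))                        % agrees to machine precision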

Is there another idea I could follow, or a MATLAB feature I might have missed?

Edit: some more details

A has a size of 26873856 by 81 (yes, it is huge), so x is 81 by 1. nnz(A) / numel(A) is 0.0012, and nnz(A*x) / numel(A*x) is 0.0075. I already represent A as a sparse matrix; however, exp() of a sparse matrix is no longer sparse. So I keep x non-sparse and compute exp(full(A*x)), which turned out to be just as fast/slow as full(exp(A*x)) (I think A*x is not sparse in either case, since x is not sparse). exp(full(A*sparse(x))) is a way to get a sparse A*x, but it is slower. Even slower options are exp(A*sparse(x)) (with the memory overhead of storing a non-sparse result in sparse format) and full(exp(A*sparse(x))) (which again yields a non-sparse result).

    sx = sparse(x);
    tic, for i = 1 : 10, exp(full(A*x)); end, toc
    tic, for i = 1 : 10, full(exp(A*x)); end, toc
    tic, for i = 1 : 10, exp(full(A*sx)); end, toc
    tic, for i = 1 : 10, exp(A*sx); end, toc
    tic, for i = 1 : 10, full(exp(A*sx)); end, toc

    Elapsed time is 1.485935 seconds.
    Elapsed time is 1.511304 seconds.
    Elapsed time is 2.060104 seconds.
    Elapsed time is 3.194711 seconds.
    Elapsed time is 4.534749 seconds.

Yes, I am computing the element-wise exp; I have updated the equation above to reflect this.

Another edit: I tried to be smart, with little success:

    tic, for i = 1 : 10, B = exp(A*x); end, toc
    tic, for i = 1 : 10, C = 1 + full(spfun(@(x) exp(x) - 1, A * sx)); end, toc
    tic, for i = 1 : 10, D = 1 + full(spfun(@(x) exp(x) - 1, A * x)); end, toc
    tic, for i = 1 : 10, E = 1 + full(spfun(@(x) exp(x) - 1, sparse(A * x))); end, toc
    tic, for i = 1 : 10, F = 1 + spfun(@(x) exp(x) - 1, A * sx); end, toc
    tic, for i = 1 : 10, G = 1 + spfun(@(x) exp(x) - 1, A * x); end, toc
    tic, for i = 1 : 10, H = 1 + spfun(@(x) exp(x) - 1, sparse(A * x)); end, toc

    Elapsed time is 1.490776 seconds.
    Elapsed time is 2.031305 seconds.
    Elapsed time is 2.743365 seconds.
    Elapsed time is 2.818630 seconds.
    Elapsed time is 2.176082 seconds.
    Elapsed time is 2.779800 seconds.
    Elapsed time is 2.900107 seconds.
1 answer

Computers do not really compute exponentials. You might think they do, but what they actually evaluate are highly accurate polynomial approximations.

Literature:

The last link looked pretty good. Perhaps this should have been the first.

Since you are working with images, you probably have a discrete number of intensity levels (often 256). That may make a reduced sampling, or a lookup, possible depending on the nature of A. One way to check this is to do something like the following for a fairly representative set of x values:

    y = A*x;
    cdfplot(y(:))

If you can pre-segment your images into "more interesting" and "less interesting" regions, the way an X-ray pipeline might clip everything "outside the human body" and pin it to zero to precondition the data, that could reduce the number of unique values. You could then repeat the exercise above for each unique "mode" within the data.

My approach would include:

  • Look at alternative exp(x) formulations that trade precision for speed.
  • Consider a lookup table if you have few enough distinct levels of y = A*x.
  • Consider combining a lookup table with interpolation if you have too many levels for a pure table lookup (see the sketch after this list).
  • Consider a separate lookup (or alternative formulation) per segmented mode. If you know a region is bone and you are looking for veins, perhaps that region deserves cheaper processing.
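A minimal MATLAB sketch of the lookup-plus-interpolation idea; yMin, yMax and nTable are assumptions that should be chosen from the observed range of y = A*x (for example via the cdfplot above), and whether this actually beats the built-in exp has to be measured:

    % Pre-compute once (range and resolution are assumptions; tune from cdfplot(A*x)):
    yMin = -20; yMax = 20; nTable = 2^16;
    yTable   = linspace(yMin, yMax, nTable);
    expTable = exp(yTable);

    % Per call: clamp into the tabulated range and interpolate instead of calling exp:
    y = full(A * x);
    y = min(max(y, yMin), yMax);
    f = interp1(yTable, expTable, y, 'linear');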

Now I have to ask myself why you would be evaluating exp(A*x) over many iterations, and my guess is that you are switching between the frequency/wavenumber domain and the time/space domain. You could also be working with probabilities, using exp(x) as a basis, and having Bayesian fun; but I don't know that exp(x) makes a good conjugate prior, so I will go with the Fourier interpretation.

Other options: consider using fft, fft2, or fftn on your matrices - they are fast and may do part of what you are looking for.

I am fairly sure a more substantial change of domain is possible along the following lines:

You may be able to combine the lookup with a calculation using the Woodbury matrix identity; I would have to think about it before being sure. (link) At one point I knew that everything that mattered (CFD, FEA, FFT) came down to matrix inversion, but I have since forgotten the specific details.

Now, if you live in MATLAB, you can use MATLAB Coder, which converts MATLAB code to C code. However clever the interpreter may be, a good C compiler can be much faster. The demonstration (hopefully not too ambitious) that I like is shown here: link, starting at about 13:49. It is really simple, but it shows the difference between a canonical interpreted language (Python) and a compiled version of the same code (Cython/C).
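A minimal sketch of that route; expAx.m is a hypothetical wrapper around the hot line, and Coder support for sparse inputs depends on the release (you may need full(A)):

    % expAx.m (hypothetical wrapper):
    %     function y = expAx(A, x)
    %         y = exp(full(A*x));
    %     end

    % Generate a MEX file with MATLAB Coder, then call the compiled version:
    codegen expAx -args {A, x}
    y = expAx_mex(A, x);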

I am sure that if I had some specifics and were asked, I could dig in more aggressively and give a more targeted, relevant answer.

If you cannot do this well enough on regular hardware, you might think about something like GPGPU. CUDA and its peers offer massively parallel operations that can bring a large speedup for the cost of a few video cards. You get thousands of "overglorified pipelines", each with multiple ALUs, and if the job is properly parallelizable (as this one looks), it can get a LOT faster.
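A minimal MATLAB sketch of that idea, assuming the Parallel Computing Toolbox, a CUDA-capable GPU, and enough device memory; whether sparse gpuArrays are supported depends on the MATLAB release:

    Ag = gpuArray(A);              % sparse gpuArray; mtimes and element-wise exp run on the GPU
    xg = gpuArray(x);
    yg = exp(full(Ag * xg));       % same expression as on the CPU, executed on the device
    y  = gather(yg);               % copy the result back to host memory only when needed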

EDIT:

I was thinking about Eureqa. One option I would consider, if I had some "big iron" available for development but not for production, would be to use their Eureqa product to find a fast enough, fairly accurate approximation.

If you compute a "quick and dirty" singular value decomposition of your matrix A, you will find that its dominant behavior is captured by at most 81 singular vectors. I would look at the singular values and see whether only a few of those 81 carry most of the information. If so, you can set the rest to zero and build a simpler transform.
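A sketch of that screening step, assuming svds can handle the sparse A; the 99% energy cutoff is an arbitrary assumption, and the dense U factor is very large for 26873856 rows, so this is for exploration rather than production:

    [U, S, V] = svds(A, size(A, 2));                 % all 81 singular triplets of the sparse A
    s = diag(S);
    k = find(cumsum(s.^2) / sum(s.^2) >= 0.99, 1);   % components carrying 99% of the energy
    % Truncated evaluation: A*x is approximated by U(:,1:k)*S(1:k,1:k)*V(:,1:k)'*x
    y = exp(U(:,1:k) * (S(1:k,1:k) * (V(:,1:k)' * x)));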

Now, if it were me, I would want to get the "A" out of the exponential. I wonder whether you could look at the 81x81 matrix of right singular vectors and at x, think a little about the linear algebra and the space your vectors are projected into, and build a function of the form

f(x) = B2 * exp(B1 * x)

such that B1 * x is much smaller than your current A * x.

