How to vectorize equations?

I am trying to implement the Softmax regression algorithm to solve the K-classifier problem after viewing Professor Andrew Ng's GLM lectures. I thought that I understood everything that he said until it occurred to me to write code to implement the cost function for Softmax regression, which looks like this:

Cost function for Softmax Regression with Weight Decay

The problem I am facing is trying to figure out a way to vectorize it. Again I thought that I understood how to go about vectorizing equations like this because I was able to do this for linear and logistic regression, but looking at this formula, I got stuck.

While I would like to find a vectorized solution for this (I understand that there is already a similar question: Vectorized implementation of Softmax regression ), I am more interested in whether any of you can tell me the way (your way) to methodically convert equations like this into vectorized forms. For example, for those of you who are experts or experienced veterans in ML, when you first read new algorithms in the literature and see them in the same notation for the above equation, how are you going to convert them into vectorized forms?

I understand that perhaps I am moving away from a student who asks Mozart: “How do you play the piano so well?” But my question is simply motivated by the desire to become better on this material and assume that not everyone was born, knowing how to vectorize the equations, and therefore someone there should develop their own system, and if so, please share! Thank you very much in advance!

Greetings

+8
math machine-learning regression glm
source share
2 answers

The help files that come with Octave have the following entry:

19.1 Basic insert:

In a very good first approximation, the goal in vectorization is to write code that avoids loops and uses the operations of whole arrays. As a trivial example, consider

for i = 1:n for j = 1:m c(i,j) = a(i,j) + b(i,j); endfor endfor 

compared to much simpler

  c = a + b; 

It is not just easier to write; also internally much easier to optimize. Octave delegates this operation to the main one, which, among other optimizations, can use a special vector of hardware instructions or, possibly, even complete additions parallel to each other. In the general case, if the code is vectorized, the implementation has more freedom with respect to the assumptions that it can make in order to speed up execution.

This is especially important for loops with "cheap" bodies. Often it is enough to vectorize only the innermost cycle to get an acceptable idea. The general rule is that the "order" of the vectorized body must be greater than or equal to the "order" of the closed loop.

As a less trivial example, instead of

  for i = 1:n-1 a(i) = b(i+1) - b(i); endfor 

records

  a = b(2:n) - b(1:n-1); 

This shows an important general concept of using arrays for instead of iterating over an index variable.  Index expressions. Also use boolean indexing generously. If the condition needs to be checked, this condition can also be written as a logical index. For example, instead of

  for i = 1:n if (a(i) > 5) a(i) -= 20 endif endfor 

records

  a(a>5) -= 20; 

which uses the fact that 'a> 5' creates a boolean index.

Use bitwise vector operators whenever possible to avoid a loop (operators such as '. *' And '. ^').  Arithmetic operations. For simple built-in functions, the “vectorization” function can do this automatically.

- Built-in function: vectorize (FUN) Create a vectorized version of the built-in FUN function, replacing all occurrences of ``, '/', etc. with '.', './', etc.

  This may be useful, for example, when using inline functions with numerical integration or optimization where a vector-valued function is expected. fcn = vectorize (inline ("x^2 - 1")) => fcn = f(x) = x.^2 - 1 quadv (fcn, 0, 3) => 6 See also:  inline,  formula,  argnames. 

Also use translation in these elementary statements as to avoid loops and unnecessary intermediate memory allocations.
 Broadcasting.

Use the built-in and library functions, if possible. Built-in and compiled functions are very fast. Even with the library function of m files, the likelihood that it is already optimized or will be optimized in a future release.

For example, even better than

  a = b(2:n) - b(1:n-1); 

is an

  a = diff (b); 

Most Octave functions are written using the arguments vector and array to the mind. If you find that you are writing a loop with a very simple operation, it is likely that such a function already exists. The following functions are often found in vectorized code:

  • Index manipulation

     * find * sub2ind * ind2sub * sort * unique * lookup * ifelse / merge 
  • Repetitions

     * repmat * repelems 
  • Vectorized Arithmetic

     * sum * prod * cumsum * cumprod * sumsq * diff * dot * cummax * cummin 
  • Large Array Shape

     * reshape * resize * permute * squeeze * deal 

Also look at these pages from the Stanford ML wiki for more detailed instructions with examples.

http://ufldl.stanford.edu/wiki/index.php/Vectorization

http://ufldl.stanford.edu/wiki/index.php/Logistic_Regression_Vectorization_Example

http://ufldl.stanford.edu/wiki/index.php/Neural_Network_Vectorization

+2
source share

This looks rather complicated for vectorization, since you are doing exponentials inside your sums. I assume that you raise e to arbitrary powers. What you can vectorize is the second term of the expression \ sum \ sum theta ^ 2, just make sure you use. * Operator in matlab enter description of the link here to the computer \ theta ^ 2

The same goes for the internal members of the relation of what is included in the logarithm. \ theta 'x ^ (i) is a vectorizable expression.

You can also use the memoization or dynamic programming method and try to reuse the calculation results e ^ \ theta 'x ^ (i).

As a rule, in my experience, the method of vectorization is to transfer work without vectorization. Then try to vectorize the most obvious parts of your calculation. At each step, adjust your function a little and always check if you get the same result as a non-vectorized calculation. In addition, having several test cases is very useful.

+1
source share

All Articles