Effective operator +

I have to calculate large sums of three-dimensional vectors, and comparing the use of a vector class with the overloaded operator + and operator * compared to summing the individual components shows a performance difference of about three times. I know to suggest that the difference should be related to the construction of objects in overloaded operators.

How can construction be avoided and productivity increased?

I am very puzzled because the following afaik is the standard way to do this, and I expect the compiler to optimize this. In real life, the sums will not be executed in a loop, but in rather large expressions (several tens of MB in the total pre-executable file), summing up different vectors, therefore the + operator is used below.

class Vector { double x,y,z; ... Vector& Vector::operator+=(const Vector &v) { x += vx; y += vy; z += vz; return *this; } Vector Vector::operator+(const Vector &v) { return Vector(*this) += v; // bad: construction and copy(?) } ... } // comparison double xx[N], yy[N], zz[N]; Vector vec[N]; // assume xx, yy, zz and vec are properly initialized Vector sum(0,0,0); for(int i = 0; i < N; ++i) { sum = sum + vec[i]; } // this is a factor 3 faster than the above loop double sumxx = 0; double sumyy = 0; double sumzz = 0; for(int i = 0; i < N; ++i) { sumxx = sumxx + xx[i]; sumyy = sumyy + yy[i]; sumzz = sumzz + zz[i]; } 

Any help is greatly appreciated.

EDIT: Thank you all for your great contribution, I now have the same performance. @Dima and especially @Xeo's answer did the trick. I would like to mark more than one answer "accepted". I will also check out some other suggestions.

+7
source share
10 answers

This article has really good reasoning regarding optimizing operators like + , - , * , / .
Add operator+ as a free function like this in terms of operator+= :

 Vector operator+(Vector lhs, Vector const& rhs){ return lhs += rhs; } 

A notice on how lhs Vector is a copy, not a link. This allows the compiler to do optimizations such as copying. The general rule that the article says is: if you need a copy, do it in the parameters so that the compiler can optimize. The article does not use this example, but operator= for the idiom of copy and swap.

+7
source

Why not replace

 sum = sum + vec[i]; 

from

 sum += vec[i]; 

... which should eliminate two calls to the copy constructor and one call to the assignment operator for each iteration.

But, as always, profile and know where the costs go, not guessing.

+4
source

You might be interested in expression patterns .

+4
source

I implemented most of the optimizations offered here and compared them to the performance of a function call, for example

 Vector::isSumOf( Vector v1, Vector v2) { x = v1.x + v2.x; ... } 

Repeating the execution of the same cycle with several billionth sums of vectors for each method in alternating order did not lead to the promised wins.

In the case of a member function dispatched by bbtrb, this method is 50% longer than calling the isSumOf() function.

The free, non-member operator + (Xeo) method needed to double the time (100% more) of the SumOf() function.

 (gcc 4.6.3 -O3) 

I am aware that this was not representative testing, but since I could not reproduce the performance gain using operators in general. I suggest avoiding them if possible.

+2
source

Usually the + operator looks like this:

 return Vector (x + vx, y + vy, z + vz); 

with an appropriately defined constructor. This allows the compiler to optimize the return values .

But if you are compiling for IA32, then SIMD is worth considering, along with changes in the algorithms, to take advantage of the nature of SIMD. Other processors may have SIMD style instructions.

+1
source

I think the performance difference is due to compiler optimizations. Adding array elements in a loop can be vectorized by the compiler. Modern processors have instructions for adding multiple numbers in a single clock cycle, such as SSE, SSE2, etc. This is similar to the likely explanation for factor 3 differences you see.

In other words, adding the corresponding elements from two arrays in a loop can, as a rule, be faster than adding the corresponding members of the class. If you represent the vector as an array inside your class, rather than x, y and z, you can get the same acceleration for your overloaded operators.

+1
source

Are the implementations of your Vector statement directly in the header file, or are they in a separate cpp file? In the header file, they are usually embedded in an optimized assembly. But if they are compiled into a different translation unit, then they often will not (depending on your build settings). If functions are not built-in, the compiler will not be able to perform the type of optimization you are looking for.

In such cases, pay attention to disassembly. Even if you donโ€™t know much about assembly code, it is usually pretty easy to understand what differs in simple cases like these.

+1
source

Actually, if you look at any real matrix code, the + operator and the + = operator do not do this.

Due to the need for copying, they introduce a pseudo-sentence into the expression and do only the real work when completing the task. Using lazy evaluations like this also allows you to delete NULL operations while evaluating an expression:

 class Matrix; class MatrixOp { public: virtual void DoOperation(Matrix& resultInHere) = 0; }; class Matrix { public: void operator=(MatrixOp* op) { // No copying has been done. // You have built an operation tree. // Now you are goign to evaluate the expression and put the // result into *this op->DoOperation(*this); } MatrixOp* operator+(Matrix& rhs) { return new MatrixOpPlus(*this,rhs);} MatrixOp* operator+(MatrixOp* rhs){ return new MatrixOpPlus(*this,rhs);} // etc }; 

Of course, this is much more complicated than what I described here in this simplified example. But if you use a library designed for matrix operations, then it will already be done for you.

+1
source

Your Vector implementation:

Inject operator+() as follows:

 Vector Vector::operator+(const Vector &v) { return Vector(x + vx, y + vy, z + vz); } 

and add an inline statement to your class definition (this avoids stack hits and popups of return addresses and method arguments for every method call if the compiler finds this useful).

Then add this constructor:

 Vector::Vector(const double &x, const double &y, const double &z) : x(x), y(y), z(z) { } 

which allows you to create a new vector very efficiently (for example, you would do operator+() in your sentence)!

In the code using your Vector:

You did it:

 for(int i = 0; i < N; ++i) { sum = sum + vec[i]; } 

Expand this type of loop! Performing only one operation (since it would be optimized for using SSE2 / 3 extensions or something similar) in a very large cycle is very inefficient. You should do something like this:

 //Unrolled loop: for(int i = 0; i <= N - 10; i += 10) { sum = sum + vec[i]; + vec[i+1]; + vec[i+2]; + vec[i+3]; + vec[i+4]; + vec[i+5]; + vec[i+6]; + vec[i+7]; + vec[i+8]; + vec[i+9]; } //Doing the "rest": for(int i = (N / 10) * 10; i < N; ++i) { sum = sum + vec[i]; } 

(Note that this code has not been verified and may contain a "off-one" -error or so ...)

0
source

Note that you are asking different things because the data does not fit the same in memory. When using the Vector array, the coordinates are alternated with "x1, y1, z1, x2, y2, z2, ...", and with double arrays "x1, x2, ..., y1, y2, ... z1, z2 ... " I believe this may affect compiler optimization or how caching handles it.

0
source

All Articles