I am working on an iOS project built with Apple LLVM 4.0, with optimizations enabled. I implemented two versions of the same function: one in C and one in NEON. I wanted to test them against each other. My idea was to call each of them the same number of times, then look at them in Time Profiler to see the relative time spent in each. My code initially looked like this:
```c
used_value = score_squareNEON(patch, image, current_pos);
used_value = score_squareC(patch, image, current_pos);
```
When I profiled this, the NEON function did not show up at all. Then I tried:
```c
for (int i = 0; i < successively_bigger_numbers; i++) {
    used_value = score_squareNEON(patch, image, current_pos);
    used_value = score_squareC(patch, image, current_pos);
}
```
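In hindsight, I suspect the NEON result is dead on every iteration, since `used_value` is overwritten before it is ever read. My guess (not verified against the disassembly) is that dead-store elimination reduces the loop to effectively:

```c
for (int i = 0; i < successively_bigger_numbers; i++) {
    /* The NEON call's result was overwritten without being read,
       so a call to a visibly side-effect-free function gets deleted. */
    used_value = score_squareC(patch, image, current_pos);
}
```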
Sure enough, there was still no contribution from the NEON code. Next was:
```c
used_value = score_squareNEON(patch, image, current_pos);
test = score_squareC(patch, image, current_pos);
```
where `test` is never read. Nothing. Then:
```c
test = score_squareNEON(patch, image, current_pos);
test = 0;
other_used_variable += test;
used_value = score_squareC(patch, image, current_pos);
```
Still nothing. What finally made both functions execute was:
```c
value = score_squareNEON(patch, image, current_pos);
test = score_squareC(patch, image, current_pos);
...
min = (value + test) / 2;
```
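I assume this works only because `min` is itself read later. If it were not, I would expect the same elimination to cascade backwards. By that logic, something like the following (my own made-up counterexample, not tested) should still remove both calls:

```c
value = score_squareNEON(patch, image, current_pos);
test = score_squareC(patch, image, current_pos);
/* If min is never read after this point, the store to min is dead,
   which makes value and test dead, which (for side-effect-free
   functions whose bodies the compiler can see) makes both calls
   removable. */
min = (value + test) / 2;
```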
One more important detail: these functions were defined in the same file from which I called them. When I moved the function definitions into another file, both of them were called in every one of the examples above.
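For reference, the layout that kept both calls alive looked roughly like this (file names and signatures are my own placeholders):

```c
/* scores.h -- declarations only */
int score_squareC(const unsigned char *patch,
                  const unsigned char *image,
                  int current_pos);
int score_squareNEON(const unsigned char *patch,
                     const unsigned char *image,
                     int current_pos);

/* scores.c holds the definitions. Because they live in a separate
   translation unit (and I am not using link-time optimization),
   the optimizer at the call site cannot see the bodies, so it has
   to assume the calls may have side effects and keep them. */
```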
First off, I have a newfound respect for compilers. Secondly, what exactly do I need to do to make sure a function is actually called? This has made me doubt everything I have timed before. What if, in an ordinary case like:
```c
timerStart();
functionCall();
timerEnd();
```
the function in the middle is optimized away entirely? Do I need to check for this every time, or is there a trick I can use? What are the rules that determine when a compiler may optimize away an entire function call?
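For example, would a harness along these lines be enough? Everything below is a made-up sketch: my understanding is that `__attribute__((noinline))` stops the call from being inlined and folded away at the call site, and storing the result into a `volatile` variable makes it observable, so the call cannot be treated as dead code.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the function under test. */
__attribute__((noinline)) static int functionCall(void) {
    int acc = 0;
    for (int i = 0; i < 1000000; i++) {
        acc += i * i;
    }
    return acc;
}

/* Writing to a volatile object is an observable side effect, so the
   value -- and therefore the call that produced it -- must be kept. */
static volatile int sink;

int main(void) {
    clock_t t0 = clock();
    int result = functionCall();
    clock_t t1 = clock();

    sink = result;
    printf("elapsed: %f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    return 0;
}
```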