Performance tuning for iPhone OpenGL ES

I have been trying for quite some time to optimize the frame rate of my game without success. I am using the latest iPhone SDK and testing on an iPhone 3G running OS 3.1.2.

I issue around 150 draw calls, rendering about 1900 triangles in total. All objects are textured with two texture layers using multitexturing, and most of the textures come from the same texture atlas, stored as a compressed PVRTC 2bpp texture. On my phone this runs at about 30 frames per second, which seems too low to me for only 1900 triangles.

I have tried many things to optimize performance, including batching objects together by transforming their vertices on the CPU and rendering them in a single draw call. This brings the count down to 8 draw calls (instead of 150), but performance is about the same (the frame rate drops to around 26 fps).
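For readers unfamiliar with this kind of batching, here is a minimal sketch of the idea, not the code from the game. It assumes position-only vertices and a column-major 4x4 model matrix; `Vec3`, `Mat4`, `transformPoint` and `appendToBatch` are hypothetical names chosen for illustration.

```cpp
#include <vector>

// Hypothetical position-only vertex, to keep the sketch short.
struct Vec3 { float x, y, z; };

// Column-major 4x4 matrix, the layout OpenGL ES uses for its matrices.
struct Mat4 { float m[16]; };

// Transform a point by an affine column-major matrix (w is assumed to be 1).
Vec3 transformPoint(const Mat4& a, const Vec3& p) {
    return {
        a.m[0] * p.x + a.m[4] * p.y + a.m[8]  * p.z + a.m[12],
        a.m[1] * p.x + a.m[5] * p.y + a.m[9]  * p.z + a.m[13],
        a.m[2] * p.x + a.m[6] * p.y + a.m[10] * p.z + a.m[14],
    };
}

// Append one object's vertices, pre-transformed by its model matrix,
// to a shared batch that can later be submitted with one draw call.
void appendToBatch(std::vector<Vec3>& batch, const Mat4& model,
                   const std::vector<Vec3>& objectVertices) {
    for (const Vec3& v : objectVertices) {
        batch.push_back(transformPoint(model, v));
    }
}
```

In a real renderer the normals would also have to be rotated, and the per-vertex CPU work this adds is one possible reason why fewer draw calls do not automatically mean a higher frame rate.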

I use 32-byte vertices stored in an interleaved array (12 bytes position, 12 bytes normal, 8 bytes UV). I render triangle lists, and the vertices are ordered in triangle-strip order.
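As an illustration of that layout, a 32-byte interleaved vertex could be declared roughly as follows. This is a sketch under the assumption that every component is a 32-bit float; the struct and field names are hypothetical, and the OpenGL ES 1.1 calls are shown only in comments so that the snippet compiles without the GL headers.

```cpp
#include <cstddef>

// Hypothetical interleaved vertex matching the sizes given in the question.
struct Vertex {
    float position[3];  // 12 bytes
    float normal[3];    // 12 bytes
    float uv[2];        //  8 bytes
};

// Compile-time checks that the compiler did not add padding.
static_assert(sizeof(Vertex) == 32, "vertex should be 32 bytes");
static_assert(offsetof(Vertex, normal) == 12, "normal follows position");
static_assert(offsetof(Vertex, uv) == 24, "uv follows normal");

// With a client-side array, the fixed-function pointers would use
// sizeof(Vertex) as the stride (base is a hypothetical const Vertex*):
//   glVertexPointer(3, GL_FLOAT, sizeof(Vertex), base->position);
//   glNormalPointer(GL_FLOAT, sizeof(Vertex), base->normal);
//   glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), base->uv);
```

Because there is only one UV set, both texture layers would presumably read the same coordinates, with `glTexCoordPointer` set once per texture unit after `glClientActiveTexture`.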

I did some profiling, but I really don't know how to interpret it.

Can someone tell me what else I could do to figure out the bottleneck and help me interpret the profiling data?

Thank you so much!

4 answers

You are probably CPU-bound. The Tiler/Renderer Utilization statistics in the OpenGL ES instrument show the GPU busy only 20-30% of the time while rendering at 20-30 frames per second, which suggests the GPU has enough headroom to run at 60 frames per second. There are a few things you could do to get more information out of Instruments and Shark:

By default, the Sampler instrument shows every sample from every thread, which means that helper threads created by the system frameworks dominate your view. To get a better picture of what the CPU is actually doing, make sure the detail view is displayed (the third button in the lower left corner) and change Sample Perspective to Running Sample Times to exclude samples where the thread is idle or blocked.

I do not see any samples from your application in the Shark trace. This may be because your code is fast enough that it does not appear anywhere in the list of hot functions, but it may also be because Shark cannot find the symbols for your application. You may need to configure the search paths in the preferences or manually point Shark at your application binary. In addition, Shark by default displays a list of functions sorted by how much CPU time is spent in each. It may be useful to change the view to something more like a regular call tree, so that you can see how your overall rendering loop spends its time. To do this, change the "View" setting in the lower right corner to "Tree (Top-Down)". (If you don't see your application's name or functions here, then Shark is definitely missing your symbols.)
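The GPU headroom estimate at the start of this answer can be checked with a little arithmetic. The sketch below assumes the utilization figure is the fraction of each frame during which the GPU is busy, and that the GPU work per frame would stay the same at a higher frame rate; the function names are hypothetical.

```cpp
// GPU busy time per frame in milliseconds, given the measured frame rate
// and the fraction of time the GPU is busy (0.0 to 1.0).
double gpuMillisPerFrame(double fps, double utilization) {
    return 1000.0 / fps * utilization;
}

// Time available per frame, in milliseconds, at a target frame rate.
double frameBudgetMillis(double targetFps) {
    return 1000.0 / targetFps;
}

// True if the measured GPU work would fit into one frame at targetFps.
bool fitsInBudget(double fps, double utilization, double targetFps) {
    return gpuMillisPerFrame(fps, utilization) < frameBudgetMillis(targetFps);
}
```

At 30 frames per second and 30% utilization the GPU is busy for about 10 ms per frame, and even at 20 frames per second and 30% it is about 15 ms. Both are below the roughly 16.7 ms available per frame at 60 frames per second, which is why the CPU looks like the more likely bottleneck.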


Unfortunately, I am not very good with OpenGL, but here are some things that stand out to me from the three traces:

1) Judging from the Sampler trace, could you have some kind of network connection running in the background?

2) The percentages shown seem low to me (although I don't know how to improve them).

3) Although 10% seems low, it looks like a good place to attack. It is almost as suspicious that memcpy takes so much time. ValidateState also accounts for a fairly large share and may be holding you back.

The tooling itself is sound. I think you are using the right tools to test performance; you just need to think more about what the results mean for your application.


Without the complete source, it is hard to say exactly what is happening. The Instruments trace shows about 20% Renderer utilization, which is rather low. This probably means that you are CPU-bound. However, if that were the case, I would expect to see more application-specific samples in your first trace.

My advice is to write your own minimal timer class. Something like this (C++):

    #include <stddef.h>    // NULL
    #include <sys/time.h>  // gettimeofday, timeval

    class Timer {
    public:
        // Start timing at construction.
        Timer() { gettimeofday(&m_time, NULL); }

        // Restart timing from now.
        void Reset() { gettimeofday(&m_time, NULL); }

        // Returns time since construction or Reset in microseconds.
        unsigned long GetTime() const {
            timeval now;
            gettimeofday(&now, NULL);
            unsigned long micros = (now.tv_sec - m_time.tv_sec) * 1000000 +
                                   (now.tv_usec - m_time.tv_usec);
            return micros;
        }

    protected:
        timeval m_time;  // wall-clock time of construction or last Reset
    };

Time individual sections of your code to find out exactly where your time is being spent.
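One possible way to apply the Timer class to sections of a frame is a small scope-based helper that adds its lifetime to a running total. This is only a sketch: `ScopedTimer`, `runFrame`, `updateGame` and `renderScene` are hypothetical names, and the Timer class is repeated in compact form so that the snippet compiles on its own.

```cpp
#include <cstddef>
#include <sys/time.h>

// Same Timer as above, repeated so the sketch is self-contained.
class Timer {
public:
    Timer() { gettimeofday(&m_time, NULL); }
    unsigned long GetTime() const {
        timeval now;
        gettimeofday(&now, NULL);
        return (now.tv_sec - m_time.tv_sec) * 1000000 +
               (now.tv_usec - m_time.tv_usec);
    }
protected:
    timeval m_time;
};

// Hypothetical helper: on destruction, adds the elapsed microseconds
// since construction to the counter it was given.
class ScopedTimer {
public:
    explicit ScopedTimer(unsigned long& totalMicros) : m_total(totalMicros) {}
    ~ScopedTimer() { m_total += m_timer.GetTime(); }
private:
    Timer m_timer;
    unsigned long& m_total;
};

// Hypothetical stand-ins for the game's own update and draw code.
void updateGame() {}
void renderScene() {}

// Time the two halves of a frame separately.
void runFrame(unsigned long& updateMicros, unsigned long& renderMicros) {
    { ScopedTimer t(updateMicros); updateGame(); }
    { ScopedTimer t(renderMicros); renderScene(); }
}
```

Printing the accumulated totals every few hundred frames, then resetting them, gives a rough split between update and render cost without attaching a profiler.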

Another quick fix is to disable the Thumb instruction set. This can improve your floating-point performance by 20% or more, at the cost of a larger executable.


If you are using glFlush or glFinish, remove all of them.
