Why is this OpenGL ES code slowing down on the iPhone?

Question

I modified the GLSprite example from the iPhone SDK a little while learning OpenGL ES, and it turned out to be quite slow, even in the simulator (and worse on the hardware). I must be doing something wrong, as there are only 400 textured triangles.

const GLfloat spriteVertices[] = {
    0.0f, 0.0f,
    100.0f, 0.0f,
    0.0f, 100.0f,
    100.0f, 100.0f
};

const GLshort spriteTexcoords[] = {
    0, 0,
    1, 0,
    0, 1,
    1, 1
};

- (void)setupView {
    glViewport(0, 0, backingWidth, backingHeight);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrthof(0.0f, backingWidth, backingHeight, 0.0f, -10.0f, 10.0f);
    glMatrixMode(GL_MODELVIEW);
    glClearColor(0.3f, 0.0f, 0.0f, 1.0f);

    glVertexPointer(2, GL_FLOAT, 0, spriteVertices);
    glEnableClientState(GL_VERTEX_ARRAY);
    glTexCoordPointer(2, GL_SHORT, 0, spriteTexcoords);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);

    // sprite data is preloaded. 512x512 rgba8888
    glGenTextures(1, &spriteTexture);
    glBindTexture(GL_TEXTURE_2D, spriteTexture);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, spriteData);
    free(spriteData);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glEnable(GL_TEXTURE_2D);
    glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
    glEnable(GL_BLEND);
}

- (void)drawView {
    ..
    glClear(GL_COLOR_BUFFER_BIT);
    glLoadIdentity();
    glTranslatef(tx-100, ty-100, 10);
    for (int i=0; i<200; i++) {
        glTranslatef(1, 1, 0);
        glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
    }
    ..
}

drawView is called every time you touch the screen or your finger moves on the screen, and tx, ty are set to the x, y coordinates where this touch occurred.

I also tried using a GLBuffer, with the translations pre-generated and only a single DrawArrays call, but it gave the same performance (~4 FPS).
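For readers who want to picture the pre-generated variant, here is a rough sketch in plain C of how the 200 translated quads could be baked into one vertex array. This is not the code that was actually tested; build_quad_batch and its parameters are hypothetical names, and each quad is written as two triangles so that one draw call can cover all of them.

```c
#include <stddef.h>

/* Hypothetical helper: writes `count` copies of a size x size quad into
   `out`, each shifted by (step, step) from the previous one. Every quad is
   emitted as two triangles (6 vertices, 12 floats). `out` must have room
   for count * 12 floats. Returns the number of vertices written. */
size_t build_quad_batch(float *out, size_t count, float size, float step)
{
    size_t vertices = 0;
    for (size_t i = 0; i < count; i++) {
        /* Mirrors the glTranslatef(1, 1, 0) issued before each draw. */
        float x0 = step * (float)(i + 1);
        float y0 = x0;
        float x1 = x0 + size;
        float y1 = y0 + size;
        const float quad[12] = {
            x0, y0,  x1, y0,  x0, y1,   /* first triangle  */
            x1, y0,  x1, y1,  x0, y1    /* second triangle */
        };
        for (size_t k = 0; k < 12; k++)
            out[vertices * 2 + k] = quad[k];
        vertices += 6;
    }
    return vertices;
}
```

The resulting array would be handed to glVertexPointer(2, GL_FLOAT, 0, out) together with a matching texture coordinate array, and drawn with one glDrawArrays(GL_TRIANGLES, 0, vertexCount). Note that this removes draw calls but not a single covered pixel, which would be consistent with the frame rate staying the same.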

=== EDIT ===

In the meantime I have changed this to use much smaller quads (size: 34x20) and much less overlap. There are ~400 quads → 800 triangles spread across the entire screen. The texture is a 512x512 atlas in RGBA_8888, and the texture coordinates are floats. The code is very ugly in terms of API efficiency: for each quad there are two MatrixMode changes along with two loads and two translations, and then a DrawArrays call for a triangle strip (quad). Now it gives ~45 FPS.

+6

iphone opengl-es

f3r3nc Jan 16 '09 at 10:46

5 answers

Your texture is 512 * 512 texels at 4 bytes per texel. That is one megabyte of data. If you render it 200 times per frame, you generate a bandwidth load of 200 megabytes per frame.

At roughly 4 frames per second, you consume 800 MB/s for texture reads alone. Frame buffer and Z-buffer accesses need bandwidth as well. Then there is the CPU, and do not underestimate the bandwidth requirements of the display either.

RAM on embedded systems (such as your iPhone) is not as fast as on a desktop PC. What you see here is a bandwidth starvation effect. The RAM simply cannot deliver the data any faster.

How to cure this problem:

Size your textures correctly. On average you should have 1 texel per pixel. This gives crisp textures. I know this is not always possible. Use common sense.
Use mipmaps. They take up 33% extra space, but allow the graphics chip to use a lower-resolution mipmap where possible.
Try smaller texture formats. Perhaps you can use the ARGB4444 format. This will double the rendering speed. Also consider compressed texture formats. Decompression does not cost performance, as it is done in hardware. In fact the opposite is true: because the texture is smaller in memory, the graphics chip can read the texture data faster.
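The storage figures behind these suggestions can be checked with a little arithmetic. The following is only a sketch in C with made-up function names; it computes how much memory a texture occupies and says nothing about how many of those bytes the GPU actually reads in a frame.

```c
#include <stddef.h>

/* Bytes occupied by a single mip level of a width x height texture. */
size_t level_bytes(size_t width, size_t height, size_t bytes_per_texel)
{
    return width * height * bytes_per_texel;
}

/* Bytes occupied by the full mip chain down to 1x1. Each level halves
   both dimensions, clamped at 1. */
size_t mip_chain_bytes(size_t width, size_t height, size_t bytes_per_texel)
{
    size_t total = 0;
    for (;;) {
        total += level_bytes(width, height, bytes_per_texel);
        if (width == 1 && height == 1)
            break;
        if (width > 1)
            width /= 2;
        if (height > 1)
            height /= 2;
    }
    return total;
}
```

For the 512x512 texture in the question, level_bytes(512, 512, 4) is 1,048,576 bytes (1 MiB), the same texture at 2 bytes per texel is 524,288 bytes, and mip_chain_bytes(512, 512, 4) is 1,398,100 bytes, which is where the roughly 33% overhead for mipmaps comes from.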

+3

Nils Pipenbrinck Jan 16 '09 at 11:04

I guess my first attempt was just a bad (or a very good) test. The iPhone has a PowerVR MBX Lite, which is a tile-based GPU. It divides the screen into smaller tiles and renders them in parallel. In the first case, the subdivision may have been overwhelmed by the very high overlap. Moreover, the quads could not be culled because they were all at the same depth, so all texture coordinates had to be calculated (this could easily be verified by changing the translation in the loop). Also, because of the overlap, the parallelism could not be exploited: some tiles sat doing nothing while the rest (about 1/3) did all the work.

So I think that although memory bandwidth can be a bottleneck, it is not the case in this example. The problem is more likely related to how the graphics hardware works and how the test was set up.
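A back-of-the-envelope comparison of the two test setups makes the overlap visible. The sketch below is plain C with hypothetical function names. It assumes a 320x480 screen (the value of backingWidth and backingHeight is not given in the question) and assumes that, with blending enabled, every covered pixel of every quad has to be processed. Quads that hang off the edge of the screen are clipped, so the numbers are upper bounds.

```c
/* Fragments generated per frame if every quad is rasterized in full,
   whether or not it overlaps other quads. */
unsigned long fragments_per_frame(unsigned long quads,
                                  unsigned long quad_w,
                                  unsigned long quad_h)
{
    return quads * quad_w * quad_h;
}

/* Overdraw factor: how many times the whole screen is filled per frame. */
double overdraw(unsigned long fragments,
                unsigned long screen_w,
                unsigned long screen_h)
{
    return (double)fragments / (double)(screen_w * screen_h);
}
```

With the numbers from the question, 200 quads of 100x100 give 2,000,000 fragments per frame, or about 13 full screens of blended pixels. The setup from the EDIT, 400 quads of 34x20, gives 272,000 fragments, or about 1.8 screens. Under these assumptions the first test does roughly seven times the per-pixel work even though it draws half as many quads.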

+2

f3r3nc Feb 03 '09 at 10:39

I am not familiar with the iPhone, but if it does not have dedicated hardware for handling floating-point numbers (I suspect it does not), then it would be faster to use integers where possible.

I'm currently developing for Android (which also uses OpenGL ES), and for example, my vertex array is int instead of float. I can’t say how big the difference is, but I think it’s worth a try.
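One way to feed integer vertex data to OpenGL ES 1.x is the 16.16 fixed-point format selected with GL_FIXED; whether that is the exact approach meant here is an assumption. A minimal conversion sketch in C follows, where fixed16_16 is a stand-in for GLfixed so that it compiles without the GL headers, and both function names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for GLfixed: signed 32-bit value with 16 integer bits and
   16 fractional bits. */
typedef int32_t fixed16_16;

/* Converts a float to 16.16 fixed point (truncates toward zero). */
fixed16_16 float_to_fixed(float value)
{
    return (fixed16_16)(value * 65536.0f);
}

/* Converts a 16.16 fixed-point value back to float. */
float fixed_to_float(fixed16_16 value)
{
    return (float)value / 65536.0f;
}
```

An array converted this way would be passed with glVertexPointer(2, GL_FIXED, 0, fixedVertices) instead of GL_FLOAT. For example, the coordinate 100.0f from the question becomes 6553600.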

0

Maciej Gryka Feb 03 '09 at 1:10

Apple is very tight-lipped about the specific hardware specs of the iPhone, which seems very odd to those of us who come from a console background. But people have been able to determine that the CPU is a 32-bit RISC ARM1176JZF. The good news is that it has a full floating-point unit, so we can continue to write math and physics code the way we do on most platforms.

http://gamesfromwithin.com/?p=239

0

user61805 Feb 03 '09 at 6:23

Bruce Miller · Accepted Answer · 2010-03-28T07:31:20+0000

(I know this is very late, but I could not resist. I will post anyway in case other people come here looking for advice.)

This has nothing to do with the size of the texture. I do not know why people upvoted Nils. He seems to have a fundamental misunderstanding of the OpenGL pipeline. He seems to think that for a given triangle, the entire texture is loaded and mapped onto that triangle. The opposite is true.

Once the triangle has been mapped into the viewport, it is rasterized. For every screen pixel that your triangle covers, the fragment shader is called. The default fragment shader (OpenGL ES 1.1, which you are using) will look up the texel that most closely maps (GL_NEAREST) to the pixel you are drawing. It may look up 4 texels, since you are using the higher-quality GL_LINEAR method to average the nearest texels. Still, if the pixel count in your triangle is, say, 100, then the most texture bytes you will need to read is 4 (lookups) * 100 (pixels) * 4 (bytes per color). That is far less than what Nils said. It's amazing that he can sound like he really knows what he's talking about.
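The estimate above is easy to turn into a small function. This is only a sketch in C with a hypothetical name; it ignores any texture cache, so it is an upper bound on the bytes fetched rather than a measurement.

```c
#include <stddef.h>

/* Upper bound on texture bytes read while shading `pixels` covered pixels,
   with `lookups_per_pixel` texel fetches each (1 for GL_NEAREST, 4 for
   GL_LINEAR) and `bytes_per_texel` bytes per fetch. */
size_t max_texel_bytes(size_t pixels,
                       size_t lookups_per_pixel,
                       size_t bytes_per_texel)
{
    return pixels * lookups_per_pixel * bytes_per_texel;
}
```

For the 100-pixel triangle this gives 1,600 bytes. For one 100x100 quad from the question (10,000 pixels) it gives 160,000 bytes, still well below the 1,048,576 bytes of the whole 512x512 RGBA texture.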

With regard to the tiled architecture, this is common in embedded OpenGL devices to preserve locality of reference. I believe that each tile is exposed to each drawing operation and quickly culls most of them. Then the tile decides what to draw on itself. This will be much slower when you have blending turned on, as you do. Since you are using large triangles that can overlap and blend with other tiles, the GPU needs to do a lot of extra work. If, instead of rendering the example square with alpha edges, you rendered the actual shape (instead of a square picture of the shape), then you could turn off blending for this part of the scene, and I bet that would speed things up tremendously.

If you want to try it, just turn off blending and see how much faster things get, even if they don't look right: glDisable(GL_BLEND);
