Simple GLSL Shader Awfully Slow

I am trying to implement a 2D outline shader in OpenGL ES 2.0 for iOS. It is insanely slow. As in 5 fps slow. I've tracked it down to the texture2D() calls. However, without those, any convolution shader is impossible. I tried using lowp instead of mediump, but then everything is just black, and while it does give another 5 fps, it's still unusable.

Here is my fragment shader.

  varying mediump vec4 colorVarying;
  varying mediump vec2 texCoord;

  uniform bool enableTexture;
  uniform sampler2D texture;
  uniform mediump float k;

  void main() {
      const mediump float step_w = 3.0/128.0;
      const mediump float step_h = 3.0/128.0;
      const mediump vec4 b = vec4(0.0, 0.0, 0.0, 1.0);
      const mediump vec4 one = vec4(1.0, 1.0, 1.0, 1.0);

      mediump vec2 offset[9];
      mediump float kernel[9];

      offset[0] = vec2(-step_w, step_h);
      offset[1] = vec2(-step_w, 0.0);
      offset[2] = vec2(-step_w, -step_h);
      offset[3] = vec2(0.0, step_h);
      offset[4] = vec2(0.0, 0.0);
      offset[5] = vec2(0.0, -step_h);
      offset[6] = vec2(step_w, step_h);
      offset[7] = vec2(step_w, 0.0);
      offset[8] = vec2(step_w, -step_h);

      kernel[0] = kernel[2] = kernel[6] = kernel[8] = 1.0/k;
      kernel[1] = kernel[3] = kernel[5] = kernel[7] = 2.0/k;
      kernel[4] = -16.0/k;

      if (enableTexture) {
          mediump vec4 sum = vec4(0.0);
          for (int i = 0; i < 9; i++) {
              mediump vec4 tmp = texture2D(texture, texCoord + offset[i]);
              sum += tmp * kernel[i];
          }

          gl_FragColor = (sum * b) + ((one - sum) * texture2D(texture, texCoord));
      } else {
          gl_FragColor = colorVarying;
      }
  }

This isn't optimized, and it isn't complete, but I need to get the performance up before continuing. I've tried replacing the texture2D() call in the loop with just a solid vec4, and it runs with no problem, despite everything else that's going on.

How can I optimize this? I know it's possible, because I've seen far more involved 3D effects running without problems. I can't see why this would cause any trouble at all.

+15
filter opengl-es convolution glsl
Sep 18 '12 at 3:27
2 answers

I did this very thing myself, and I see several things that could be optimized here.

First, I would remove the enableTexture conditional and instead split your shader into two programs, one for the true state of that and one for false. Branches are very expensive in iOS fragment shaders, particularly ones that have texture reads within them.
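As a rough sketch of what that split might look like (reusing the varyings from your shader, purely illustrative), the untextured case becomes its own trivial program, while the textured program keeps the convolution body with the if (enableTexture) test and the enableTexture uniform removed entirely:

  // Fragment shader for non-textured geometry: no branch, no sampler.
  varying mediump vec4 colorVarying;

  void main()
  {
      gl_FragColor = colorVarying;
  }

The textured program is then just the body of your if (enableTexture) block compiled as its own shader, and you select the appropriate program on the CPU before drawing each piece of geometry.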

Secondly, you have nine dependent texture reads here. These are texture reads where the texture coordinates are calculated within the fragment shader. Dependent texture reads are very expensive on the PowerVR GPUs in iOS devices, because they prevent the hardware from optimizing texture fetches using caching, etc. Since you are sampling at a fixed offset for the 8 surrounding pixels plus the central one, these calculations should be moved up into the vertex shader. This also means they don't have to be performed for every pixel, just once per vertex, and hardware interpolation handles the rest.

Thirdly, for() loops haven't been handled all that well by the iOS shader compiler to date, so I tend to avoid them where I can.

As I mentioned, I've done convolution shaders like this in my open source iOS GPUImage framework. For a general convolution filter, I use the following vertex shader:

  attribute vec4 position;
  attribute vec4 inputTextureCoordinate;

  uniform highp float texelWidth;
  uniform highp float texelHeight;

  varying vec2 textureCoordinate;
  varying vec2 leftTextureCoordinate;
  varying vec2 rightTextureCoordinate;

  varying vec2 topTextureCoordinate;
  varying vec2 topLeftTextureCoordinate;
  varying vec2 topRightTextureCoordinate;

  varying vec2 bottomTextureCoordinate;
  varying vec2 bottomLeftTextureCoordinate;
  varying vec2 bottomRightTextureCoordinate;

  void main()
  {
      gl_Position = position;

      vec2 widthStep = vec2(texelWidth, 0.0);
      vec2 heightStep = vec2(0.0, texelHeight);
      vec2 widthHeightStep = vec2(texelWidth, texelHeight);
      vec2 widthNegativeHeightStep = vec2(texelWidth, -texelHeight);

      textureCoordinate = inputTextureCoordinate.xy;
      leftTextureCoordinate = inputTextureCoordinate.xy - widthStep;
      rightTextureCoordinate = inputTextureCoordinate.xy + widthStep;

      topTextureCoordinate = inputTextureCoordinate.xy - heightStep;
      topLeftTextureCoordinate = inputTextureCoordinate.xy - widthHeightStep;
      topRightTextureCoordinate = inputTextureCoordinate.xy + widthNegativeHeightStep;

      bottomTextureCoordinate = inputTextureCoordinate.xy + heightStep;
      bottomLeftTextureCoordinate = inputTextureCoordinate.xy - widthNegativeHeightStep;
      bottomRightTextureCoordinate = inputTextureCoordinate.xy + widthHeightStep;
  }

and the following fragment shader:

  precision highp float;

  uniform sampler2D inputImageTexture;

  uniform mediump mat3 convolutionMatrix;

  varying vec2 textureCoordinate;
  varying vec2 leftTextureCoordinate;
  varying vec2 rightTextureCoordinate;

  varying vec2 topTextureCoordinate;
  varying vec2 topLeftTextureCoordinate;
  varying vec2 topRightTextureCoordinate;

  varying vec2 bottomTextureCoordinate;
  varying vec2 bottomLeftTextureCoordinate;
  varying vec2 bottomRightTextureCoordinate;

  void main()
  {
      mediump vec4 bottomColor = texture2D(inputImageTexture, bottomTextureCoordinate);
      mediump vec4 bottomLeftColor = texture2D(inputImageTexture, bottomLeftTextureCoordinate);
      mediump vec4 bottomRightColor = texture2D(inputImageTexture, bottomRightTextureCoordinate);
      mediump vec4 centerColor = texture2D(inputImageTexture, textureCoordinate);
      mediump vec4 leftColor = texture2D(inputImageTexture, leftTextureCoordinate);
      mediump vec4 rightColor = texture2D(inputImageTexture, rightTextureCoordinate);
      mediump vec4 topColor = texture2D(inputImageTexture, topTextureCoordinate);
      mediump vec4 topRightColor = texture2D(inputImageTexture, topRightTextureCoordinate);
      mediump vec4 topLeftColor = texture2D(inputImageTexture, topLeftTextureCoordinate);

      mediump vec4 resultColor = topLeftColor * convolutionMatrix[0][0] + topColor * convolutionMatrix[0][1] + topRightColor * convolutionMatrix[0][2];
      resultColor += leftColor * convolutionMatrix[1][0] + centerColor * convolutionMatrix[1][1] + rightColor * convolutionMatrix[1][2];
      resultColor += bottomLeftColor * convolutionMatrix[2][0] + bottomColor * convolutionMatrix[2][1] + bottomRightColor * convolutionMatrix[2][2];

      gl_FragColor = resultColor;
  }

The texelWidth and texelHeight uniforms are the inverse of the width and height of the input image, and the convolutionMatrix uniform specifies the weights for the various samples in your convolution.
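As an illustration of how the weights map over (my own example, not something defined by the framework), the kernel from your original shader would correspond to a convolutionMatrix with values along these lines, written here in GLSL form for brevity; in practice you would pass the equivalent values from the CPU with glUniformMatrix3fv:

  // Hypothetical weights matching the kernel in the question, scaled by 1/k:
  //   1/k    2/k   1/k
  //   2/k  -16/k   2/k
  //   1/k    2/k   1/k
  // The matrix is symmetric here, so row/column order doesn't matter.
  mediump mat3 exampleKernel = mat3( 1.0/k,   2.0/k,  1.0/k,
                                     2.0/k, -16.0/k,  2.0/k,
                                     1.0/k,   2.0/k,  1.0/k);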

On an iPhone 4, this runs in 4-8 ms for a 640x480 frame of video, which is good enough for 60 FPS rendering at that image size. If you just need to do something like edge detection, you can simplify the above: convert the image to luminance in a pre-pass, then only sample from one color channel. That's even faster, at about 2 ms per frame on the same device.
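As a rough sketch of that pre-pass (my own minimal version, not the exact shader from the framework), the luminance conversion is just a dot product against the standard Rec. 709 weights, after which the convolution pass can read a single channel instead of a full vec4:

  precision mediump float;

  varying vec2 textureCoordinate;
  uniform sampler2D inputImageTexture;

  // Standard Rec. 709 luminance weights.
  const vec3 W = vec3(0.2125, 0.7154, 0.0721);

  void main()
  {
      vec4 color = texture2D(inputImageTexture, textureCoordinate);
      float luminance = dot(color.rgb, W);

      // Write luminance into every channel so the next pass can sample
      // just one component (e.g. .r) instead of the whole color.
      gl_FragColor = vec4(vec3(luminance), color.a);
  }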

+38
Sep 18 '12 at 16:40

The only way I know of to reduce the time taken in this shader is to reduce the number of texture fetches. Since your shader samples the texture at equally spaced points around the center pixel and combines them linearly, you can reduce the number of fetches by taking advantage of GL_LINEAR filtering when sampling the texture.

Basically, instead of sampling at every texel, sample in between a pair of texels to directly get their linearly weighted sum.

Call the samples at the offsets (-stepw, -steph) and (-stepw, 0) x0 and x1 respectively. Then your sum is

sum = x0*k0 + x1*k1

Now, if you instead sample in between these two texels, at a distance of k1/(k0+k1) from x0 (and therefore k0/(k0+k1) from x1), then the GPU will perform the linear weighting for you during the fetch and give you

y = x0*k0/(k0+k1) + x1*k1/(k0+k1)

Thus the sum can be calculated as

sum = y*(k0 + k1)

with only one fetch!

If you repeat this for the other neighboring pixels, you end up doing 4 texture fetches for the 8 surrounding offsets, plus one more fetch for the center pixel: 5 fetches instead of 9.
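To make the idea concrete, here is a minimal sketch of folding one such pair into a single fetch, using the texture and texCoord names from your shader; k0, k1, offset0, offset1 and sum are hypothetical placeholders for one pair of taps and the running total:

  // Hypothetical fold of two taps into one bilinear fetch. Assumes the
  // texture uses GL_LINEAR filtering, the two taps are one texel apart
  // (so the in-between sample blends exactly those two texels), and
  // k0 and k1 have the same sign.
  mediump float pairWeight = k0 + k1;
  mediump vec2  pairOffset = offset0 + (offset1 - offset0) * (k1 / pairWeight);

  // One fetch now returns (x0*k0 + x1*k1) / (k0 + k1) ...
  mediump vec4 y = texture2D(texture, texCoord + pairOffset);

  // ... so scaling by (k0 + k1) recovers the original weighted sum.
  sum += y * pairWeight;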

This link explains it a lot better.

+6
Sep 18


