I did this very thing myself, and I see several things that could be optimized here.
First, I would remove the enableTexture conditional and instead split your shader into two programs, one for the case where this is true and one for when it is false. Branching is very expensive in iOS fragment shaders, particularly branches that include texture reads.
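As a purely illustrative sketch (I don't know what your true and false paths actually do, so the bodies and the constantColor uniform below are hypothetical), you would compile something like the following as two separate programs and choose between them on the CPU, rather than branching per fragment:

// Program used where enableTexture would have been true:
// the texture read happens unconditionally.
varying highp vec2 textureCoordinate;
uniform sampler2D inputImageTexture;

void main()
{
    gl_FragColor = texture2D(inputImageTexture, textureCoordinate);
}

and

// Program used where enableTexture would have been false:
// no texture read at all.
uniform lowp vec4 constantColor;

void main()
{
    gl_FragColor = constantColor;
}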
Second, you have nine dependent texture reads here. These are texture reads where the texture coordinates are calculated within the fragment shader. Dependent texture reads are very expensive on the PowerVR GPUs in iOS devices, because they prevent the hardware from optimizing texture fetches using caching, etc. Since you are sampling at a fixed offset for the eight surrounding pixels and the one central pixel, these coordinate calculations should be moved up into the vertex shader. This also means that they are performed only once per vertex rather than once per pixel, and the hardware interpolation of the varyings handles the rest.
Third, for() loops haven't been handled all that well by the iOS shader compiler to this point, so I tend to avoid them where I can.
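To be concrete about the pattern I mean (a hypothetical sketch, not your exact code; the kernelValues and texelOffsets names are made up for illustration), this is the kind of loop-driven convolution I avoid, and it is also why the fragment shader further down does its nine samples fully unrolled:

// Hypothetical loop-based 3x3 convolution of the sort the iOS shader
// compiler has handled poorly; every sample here is also a dependent
// texture read, since the coordinates are computed in the fragment shader.
varying highp vec2 textureCoordinate;
uniform sampler2D inputImageTexture;
uniform mediump float kernelValues[9];
uniform highp vec2 texelOffsets[9];

void main()
{
    mediump vec4 sum = vec4(0.0);
    for (int i = 0; i < 9; i++)
    {
        sum += texture2D(inputImageTexture, textureCoordinate + texelOffsets[i]) * kernelValues[i];
    }
    gl_FragColor = sum;
}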
As an example of how I've handled this, I built convolution shaders like this into my open-source GPUImage framework for iOS. For the general convolution filter, I use the following vertex shader:
attribute vec4 position;
attribute vec4 inputTextureCoordinate;

uniform highp float texelWidth;
uniform highp float texelHeight;

varying vec2 textureCoordinate;
varying vec2 leftTextureCoordinate;
varying vec2 rightTextureCoordinate;

varying vec2 topTextureCoordinate;
varying vec2 topLeftTextureCoordinate;
varying vec2 topRightTextureCoordinate;

varying vec2 bottomTextureCoordinate;
varying vec2 bottomLeftTextureCoordinate;
varying vec2 bottomRightTextureCoordinate;

void main()
{
    gl_Position = position;

    vec2 widthStep = vec2(texelWidth, 0.0);
    vec2 heightStep = vec2(0.0, texelHeight);
    vec2 widthHeightStep = vec2(texelWidth, texelHeight);
    vec2 widthNegativeHeightStep = vec2(texelWidth, -texelHeight);

    textureCoordinate = inputTextureCoordinate.xy;
    leftTextureCoordinate = inputTextureCoordinate.xy - widthStep;
    rightTextureCoordinate = inputTextureCoordinate.xy + widthStep;

    topTextureCoordinate = inputTextureCoordinate.xy - heightStep;
    topLeftTextureCoordinate = inputTextureCoordinate.xy - widthHeightStep;
    topRightTextureCoordinate = inputTextureCoordinate.xy + widthNegativeHeightStep;

    bottomTextureCoordinate = inputTextureCoordinate.xy + heightStep;
    bottomLeftTextureCoordinate = inputTextureCoordinate.xy - widthNegativeHeightStep;
    bottomRightTextureCoordinate = inputTextureCoordinate.xy + widthHeightStep;
}
and the following fragment shader:
precision highp float;

uniform sampler2D inputImageTexture;

uniform mediump mat3 convolutionMatrix;

varying vec2 textureCoordinate;
varying vec2 leftTextureCoordinate;
varying vec2 rightTextureCoordinate;

varying vec2 topTextureCoordinate;
varying vec2 topLeftTextureCoordinate;
varying vec2 topRightTextureCoordinate;

varying vec2 bottomTextureCoordinate;
varying vec2 bottomLeftTextureCoordinate;
varying vec2 bottomRightTextureCoordinate;

void main()
{
    mediump vec4 bottomColor = texture2D(inputImageTexture, bottomTextureCoordinate);
    mediump vec4 bottomLeftColor = texture2D(inputImageTexture, bottomLeftTextureCoordinate);
    mediump vec4 bottomRightColor = texture2D(inputImageTexture, bottomRightTextureCoordinate);
    mediump vec4 centerColor = texture2D(inputImageTexture, textureCoordinate);
    mediump vec4 leftColor = texture2D(inputImageTexture, leftTextureCoordinate);
    mediump vec4 rightColor = texture2D(inputImageTexture, rightTextureCoordinate);
    mediump vec4 topColor = texture2D(inputImageTexture, topTextureCoordinate);
    mediump vec4 topRightColor = texture2D(inputImageTexture, topRightTextureCoordinate);
    mediump vec4 topLeftColor = texture2D(inputImageTexture, topLeftTextureCoordinate);

    mediump vec4 resultColor = topLeftColor * convolutionMatrix[0][0] + topColor * convolutionMatrix[0][1] + topRightColor * convolutionMatrix[0][2];
    resultColor += leftColor * convolutionMatrix[1][0] + centerColor * convolutionMatrix[1][1] + rightColor * convolutionMatrix[1][2];
    resultColor += bottomLeftColor * convolutionMatrix[2][0] + bottomColor * convolutionMatrix[2][1] + bottomRightColor * convolutionMatrix[2][2];

    gl_FragColor = resultColor;
}
The texelWidth and texelHeight uniforms are the inverse of the width and height of the input image, and the convolutionMatrix uniform specifies the weights for the various samples in your convolution.
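As a sketch of how kernel weights map onto that mat3 (using a standard 3x3 Laplacian edge-detection kernel purely as an example, not something from your code): the first three values fill convolutionMatrix[0], which the fragment shader applies to the top row of samples, the next three fill [1] for the middle row, and the last three fill [2] for the bottom row.

// Example weights: a standard 3x3 Laplacian edge-detection kernel, written
// as a GLSL constant only to show the layout. convolutionMatrix[0] multiplies
// the top-left/top/top-right samples, [1] the left/center/right samples,
// and [2] the bottom row, matching the indexing in the fragment shader above.
const mediump mat3 laplacianKernel = mat3(-1.0, -1.0, -1.0,
                                          -1.0,  8.0, -1.0,
                                          -1.0, -1.0, -1.0);

In practice you would upload those nine weights to the convolutionMatrix uniform from your application code (with glUniformMatrix3fv, for instance) rather than hard-coding them in the shader.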
On an iPhone 4, this runs in 4-8 ms for a 640x480 frame of video, which is good enough for 60 frames per second at that image size. If you just need to do something like edge detection, you can simplify the above by converting the image to luminance in a preliminary pass and then only sampling from one color channel. That is even faster, at about 2 ms per frame on the same device.
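That luminance pre-pass is just a dot product against standard luminance weights; a minimal sketch of such a fragment shader (the Rec. 709-style constants are the usual choice here, not anything specific to your pipeline):

// Luminance pre-pass: collapse RGB to a single brightness value so the
// later convolution pass only needs to sample one color channel.
varying highp vec2 textureCoordinate;
uniform sampler2D inputImageTexture;

const mediump vec3 luminanceWeighting = vec3(0.2125, 0.7154, 0.0721);

void main()
{
    lowp vec4 textureColor = texture2D(inputImageTexture, textureCoordinate);
    mediump float luminance = dot(textureColor.rgb, luminanceWeighting);
    gl_FragColor = vec4(vec3(luminance), textureColor.a);
}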