Even using NEON vector operations through something like the Accelerate framework (which OpenCV doesn't currently use, unless I'm missing something), it will be hard to beat shaders when running a simple 3x3 convolution kernel. For example, I can run a Sobel edge detection kernel on a 640x480 video frame in 2.5 ms on an iPhone 4 using shaders, which is more than fast enough for real-time image processing. OpenGL ES shaders also have significant advantages for display, because you can keep everything on the GPU the whole way through and avoid expensive data transfers when drawing to the screen.
If you need an easy way to do this, my open source GPUImage framework has some very fast built-in convolutions, such as Sobel edge detection and image sharpening, and it lets you easily create your own 3x3 convolution kernels. It wraps all of the OpenGL ES for you, so you don't need to know anything about it (unless you want to write your own custom effects, but even then you only need to know a little GLSL).
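To give a sense of what "a little GLSL" means, a custom effect comes down to a fragment shader string handed to a filter. The sketch below is illustrative only: it assumes GPUImageFilter's initWithFragmentShaderFromString: initializer and the framework's textureCoordinate / inputImageTexture naming convention, so check the headers of the version you're using.

// Minimal custom effect sketch: invert the image's colors.
// Assumes GPUImage's `textureCoordinate` / `inputImageTexture` shader names
// and the initWithFragmentShaderFromString: initializer.
NSString *const kInvertFragmentShader =
    @"varying highp vec2 textureCoordinate;\n"
    @"uniform sampler2D inputImageTexture;\n"
    @"void main()\n"
    @"{\n"
    @"    lowp vec4 color = texture2D(inputImageTexture, textureCoordinate);\n"
    @"    gl_FragColor = vec4(1.0 - color.rgb, color.a);\n"
    @"}";

GPUImageFilter *invertFilter =
    [[GPUImageFilter alloc] initWithFragmentShaderFromString:kInvertFragmentShader];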
For example, to run only the X-component of the Sobel edge detection kernel, you can configure your convolution filter with the following code:
GPUImage3x3ConvolutionFilter *filter = [[GPUImage3x3ConvolutionFilter alloc] init];
[filter setConvolutionKernel:(GPUMatrix3x3){
    {-1.0f, 0.0f, 1.0f},
    {-2.0f, 0.0f, 2.0f},
    {-1.0f, 0.0f, 1.0f}
}];
Then you just attach it to a camera, image, or movie input on one side and to a display, raw data output, or movie recorder on the other, and the framework handles the rest. For the best performance, you can write your own custom implementation optimized for the specific kernel you want to run, as I did for GPUImageSobelEdgeDetectionFilter.
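To make the wiring concrete, here is a minimal sketch of a live camera pipeline feeding the filter set up above into an on-screen view. It assumes it runs inside a view controller, and the session preset and view frame are placeholder choices; the class and method names follow the framework's sample code.

// Live camera -> 3x3 convolution filter -> on-screen view (sketch).
// `filter` is the GPUImage3x3ConvolutionFilter configured above.
GPUImageVideoCamera *videoCamera =
    [[GPUImageVideoCamera alloc] initWithSessionPreset:AVCaptureSessionPreset640x480
                                        cameraPosition:AVCaptureDevicePositionBack];
videoCamera.outputImageOrientation = UIInterfaceOrientationPortrait;

GPUImageView *filteredView = [[GPUImageView alloc] initWithFrame:self.view.bounds];
[self.view addSubview:filteredView];

[videoCamera addTarget:filter];   // camera frames go into the convolution filter
[filter addTarget:filteredView];  // filtered frames are rendered to the screen

[videoCamera startCameraCapture];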