Efficient bicubic filtering code in GLSL?

I'm wondering if anyone has complete, working, and efficient code to do bicubic texture filtering in GLSL. There are these:

http://www.codeproject.com/Articles/236394/Bi-Cubic-and-Bi-Linear-Interpolation-with-GLSL or https://github.com/visionworkbench/visionworkbench/blob/master/src/vw/GPU/Shaders/Interp/interpolation-bicubic.glsl

but both do 16 texture reads where only 4 are required:

https://groups.google.com/forum/#!topic/comp.graphics.api.opengl/kqrujgJfTxo

However, the method above calls a missing "cubic()" function, and I don't know what it is supposed to do. It also takes an unexplained "texscale" parameter.

There is also the NVidia version:

http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter20.html

but I believe it uses CUDA, which is specific to NVidia cards. I need GLSL.

I could probably port the NVidia version to GLSL, but I thought I'd ask first whether anyone already has a complete, working GLSL bicubic shader.

+8
8 answers

I decided to take a minute to dig through my old Perforce activity and found the missing cubic() function; enjoy! :)

    // Unnormalized cubic B-spline weights for the fraction v; they sum to 6.
    vec4 cubic(float v)
    {
        vec4 n = vec4(1.0, 2.0, 3.0, 4.0) - v;
        vec4 s = n * n * n;
        float x = s.x;
        float y = s.y - 4.0 * s.x;
        float z = s.z - 4.0 * s.y + 6.0 * s.x;
        float w = 6.0 - x - y - z;
        return vec4(x, y, z, w);
    }

+7

The missing cubic() function in JAre's answer could look like this:

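    vec4 cubic(float x)
    {
        float x2 = x * x;
        float x3 = x2 * x;
        vec4 w;
        // Float literals throughout: older GLSL versions do not convert int to float implicitly.
        w.x =       -x3 + 3.0 * x2 - 3.0 * x + 1.0;
        w.y =  3.0 * x3 - 6.0 * x2           + 4.0;
        w.z = -3.0 * x3 + 3.0 * x2 + 3.0 * x + 1.0;
        w.w =        x3;
        return w / 6.0;
    }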

It returns four weights for a cubic B-spline.

It is all explained in NVidia's GPU Gems.

+6

(EDIT)

cubic() is a cubic spline function.

  • texscale is the sampling window size coefficient. You can start with a value of 1.0.

    vec4 filter(sampler2D texture, vec2 texcoord, vec2 texscale)
    {
        // Split the coordinate into its integer and fractional parts.
        float fx = fract(texcoord.x);
        float fy = fract(texcoord.y);
        texcoord.x -= fx;
        texcoord.y -= fy;

        vec4 xcubic = cubic(fx);
        vec4 ycubic = cubic(fy);

        vec4 c = vec4(texcoord.x - 0.5, texcoord.x + 1.5, texcoord.y - 0.5, texcoord.y + 1.5);
        vec4 s = vec4(xcubic.x + xcubic.y, xcubic.z + xcubic.w, ycubic.x + ycubic.y, ycubic.z + ycubic.w);
        vec4 offset = c + vec4(xcubic.y, xcubic.w, ycubic.y, ycubic.w) / s;

        // Four bilinear fetches, each covering a 2x2 block of texels.
        vec4 sample0 = texture2D(texture, vec2(offset.x, offset.z) * texscale);
        vec4 sample1 = texture2D(texture, vec2(offset.y, offset.z) * texscale);
        vec4 sample2 = texture2D(texture, vec2(offset.x, offset.w) * texscale);
        vec4 sample3 = texture2D(texture, vec2(offset.y, offset.w) * texscale);

        float sx = s.x / (s.x + s.y);
        float sy = s.z / (s.z + s.w);

        return mix(
            mix(sample3, sample2, sx),
            mix(sample1, sample0, sx), sy);
    }

+4

Wow. I recognize the code above (I can't comment with reputation < 50), as I came up with it in early 2011. The problem I was trying to solve was related to an old IBM T42 (sorry, the exact model number escapes me) and its ATI graphics stack. I developed the code on an NV card, and originally I used 16 texture fetches. That was kind of slow, but fast enough for my purposes. When someone reported that it did not work on his laptop, it became apparent that the hardware did not support enough texture fetches per fragment. I had to engineer a workaround, and the best I could come up with was to do it with a number of texture fetches that would work.

I thought about it like this: okay, so if I handle each quad (2x2) with the linear filter, the remaining problem is, can the rows and columns share the weights? That was the only problem on my mind when I set out to craft the code. Of course they can be shared; the weights are the same for each column and row; perfect!

Now I had four samples. The remaining problem was how to combine the samples correctly. That was the biggest obstacle to overcome. It took about 10 minutes with pencil and paper. With trembling hands I typed the code in, and it worked, nice. Then I uploaded the binaries to the guy who had promised to check it out on his T42 (?), and he reported that it worked. The end. :)

I can assure you that the equations check out and give results mathematically identical to computing the samples individually. FYI: on a CPU it is faster to do the horizontal and vertical scans separately. On a GPU, multiple passes are not such a great idea, especially when they are probably not feasible anyway in a typical use case.
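
The claim can be checked in one dimension. This is a sketch under assumed notation that is not from the answer: $w_0,\dots,w_3$ are the non-negative weights returned by cubic(), $T_k$ is the value of texel $k$, positions are measured so that texel $k$ sits at position $k$, and $L(p)$ is what a hardware linear fetch returns at position $p$.

```latex
\begin{align*}
w_0 T_{i-1} + w_1 T_i
  &= (w_0 + w_1)\bigl[(1 - t_0)\,T_{i-1} + t_0\,T_i\bigr]
   = g_0\,L(i - 1 + t_0),
  & t_0 &= \frac{w_1}{w_0 + w_1},\\
w_2 T_{i+1} + w_3 T_{i+2}
  &= (w_2 + w_3)\bigl[(1 - t_1)\,T_{i+1} + t_1\,T_{i+2}\bigr]
   = g_1\,L(i + 1 + t_1),
  & t_1 &= \frac{w_3}{w_2 + w_3},\\
\frac{\sum_{k=0}^{3} w_k\,T_{i-1+k}}{\sum_{k=0}^{3} w_k}
  &= \frac{g_0\,L(i - 1 + t_0) + g_1\,L(i + 1 + t_1)}{g_0 + g_1},
  & g_0 &= w_0 + w_1,\quad g_1 = w_2 + w_3 .
\end{align*}
```

The last line is a mix() of the two fetches with factor $g_0 / (g_0 + g_1)$, which is why unnormalized weights also work. The step relies on $t_0, t_1 \in [0, 1]$, which holds for the B-spline weights because they are never negative. Doing the same along the other axis gives 2 x 2 = 4 bilinear fetches in place of 16 point samples.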

Food for thought: it is possible to use a texture lookup for the cubic() function. Which is faster depends on the GPU, but generally speaking the sampler route is light on the ALU side, while just doing the arithmetic would balance things out. YMMV.
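
A minimal sketch of the lookup-table idea, assuming GLSL 1.30 or later. The uniform `cubicLUT` and the function name `cubicFromLUT` are hypothetical and not from any of the answers. The table is assumed to be a 1D RGBA float texture filled by the application, where texel i holds the four weights for v = i / (size - 1), with GL_LINEAR filtering and GL_CLAMP_TO_EDGE wrapping.

```glsl
// Hypothetical lookup table holding precomputed cubic() weights (assumption:
// uploaded by the application as described in the text above).
uniform sampler1D cubicLUT;

vec4 cubicFromLUT(float v)
{
    float lutSize = float(textureSize(cubicLUT, 0));
    // Map v in [0, 1] onto texel centers so that v = 0 and v = 1 land
    // exactly on the first and last table entries.
    float coord = (v * (lutSize - 1.0) + 0.5) / lutSize;
    return texture(cubicLUT, coord);
}
```

Between table entries the hardware interpolates the weights linearly, so the result only approximates the cubic polynomial; a larger table reduces the error.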

+4

I found this implementation, which can be used as a drop-in replacement for texture() (from http://www.java-gaming.org/index.php?topic=35123.0, with one typo fixed):

    // from http://www.java-gaming.org/index.php?topic=35123.0
    vec4 cubic(float v)
    {
        vec4 n = vec4(1.0, 2.0, 3.0, 4.0) - v;
        vec4 s = n * n * n;
        float x = s.x;
        float y = s.y - 4.0 * s.x;
        float z = s.z - 4.0 * s.y + 6.0 * s.x;
        float w = 6.0 - x - y - z;
        return vec4(x, y, z, w) * (1.0 / 6.0);
    }

    vec4 textureBicubic(sampler2D sampler, vec2 texCoords)
    {
        // textureSize() returns ivec2; convert explicitly.
        vec2 texSize = vec2(textureSize(sampler, 0));
        vec2 invTexSize = 1.0 / texSize;

        // Work in texel units, relative to texel centers.
        texCoords = texCoords * texSize - 0.5;

        vec2 fxy = fract(texCoords);
        texCoords -= fxy;

        vec4 xcubic = cubic(fxy.x);
        vec4 ycubic = cubic(fxy.y);

        vec4 c = texCoords.xxyy + vec2(-0.5, +1.5).xyxy;

        vec4 s = vec4(xcubic.xz + xcubic.yw, ycubic.xz + ycubic.yw);
        vec4 offset = c + vec4(xcubic.yw, ycubic.yw) / s;

        // Back to normalized texture coordinates.
        offset *= invTexSize.xxyy;

        vec4 sample0 = texture(sampler, offset.xz);
        vec4 sample1 = texture(sampler, offset.yz);
        vec4 sample2 = texture(sampler, offset.xw);
        vec4 sample3 = texture(sampler, offset.yw);

        float sx = s.x / (s.x + s.y);
        float sy = s.z / (s.z + s.w);

        return mix(
            mix(sample3, sample2, sx),
            mix(sample1, sample0, sx), sy);
    }
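
A minimal sketch of how the function might be called from a fragment shader, assuming desktop GLSL 3.30. The names `uImage`, `vUv`, and `fragColor` are hypothetical and not from the answer. Because the four-fetch trick leans on the hardware bilinear filter, the texture is assumed to be bound with GL_LINEAR minification and magnification filtering.

```glsl
#version 330 core

uniform sampler2D uImage;   // hypothetical: source texture, filtered with GL_LINEAR
in vec2 vUv;                // hypothetical: normalized texture coordinates
out vec4 fragColor;

// Forward declaration only. The body is assumed to be the textureBicubic()
// from the answer, pasted into this shader or linked in from another
// fragment shader object.
vec4 textureBicubic(sampler2D tex, vec2 uv);

void main()
{
    fragColor = textureBicubic(uImage, vUv);
}
```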

Example: Nearest, bilinear, bicubic:

[image: nearest, bilinear, and bicubic results]

ImageData of this image

 {{{0.698039, 0.996078, 0.262745}, {0., 0.266667, 1.}, {0.00392157, 0.25098, 0.996078}, {1., 0.65098, 0.}}, {{0.996078, 0.823529, 0.}, {0.498039, 0., 0.00392157}, {0.831373, 0.00392157, 0.00392157}, {0.956863, 0.972549, 0.00784314}}, {{0.909804, 0.00784314, 0.}, {0.87451, 0.996078, 0.0862745}, {0.196078, 0.992157, 0.760784}, {0.00392157, 0.00392157, 0.498039}}, {{1., 0.878431, 0.}, {0.588235, 0.00392157, 0.00392157}, {0.00392157, 0.0666667, 0.996078}, {0.996078, 0.517647, 0.}}} 

I tried to reproduce this (along with many other interpolation techniques)

[image]

but they use clamped padding, while I have repeating (wrapping) boundaries. Therefore the result is not exactly the same.

This bicubic business does not seem to be a proper interpolation, i.e. it does not take on the original values at the points where the data is defined.
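
A short worked case that supports this observation, assuming a hypothetical 1D signal with samples $f_k$. Exactly at a sample point the fractional part is zero, and the B-spline cubic() functions above return the weights $(1, 4, 1, 0)/6$:

```latex
\[
  \tilde f(i) \;=\; \tfrac{1}{6}\,f_{i-1} + \tfrac{4}{6}\,f_i + \tfrac{1}{6}\,f_{i+1}
\]
```

This equals $f_i$ only when $f_i$ happens to be the average of its two neighbors. In 2D the weights are the outer product of the row and column weights, so the center texel contributes only $(4/6)^2 = 4/9$ of the result. The cubic B-spline therefore smooths the data (an approximating spline) and does not interpolate it.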

+3

For anyone interested in GLSL code to do tri-cubic interpolation, ray-casting code using cubic interpolation can be found in the examples/glCubicRayCast folder at: http://www.dannyruijters.nl/cubicinterpolation/CI.zip

Edit: the cubic interpolation code is now available on GitHub: a CUDA version, a WebGL version, and a GLSL sample.

+2

I have been using @Maf's cubic spline recipe for over a year, and I recommend it if the cubic B-spline meets your needs.

But I recently realized that for my particular application it is important that the intensities match exactly at the sample points. So I switched to using the Catmull-Rom spline, which uses a slightly different recipe:

    // Catmull-Rom spline actually passes through control points
    vec4 cubic(float x) // cubic_catmullrom(float x)
    {
        const float s = 0.5; // potentially adjustable parameter
        float x2 = x * x;
        float x3 = x2 * x;
        vec4 w;
        // Float literals throughout: older GLSL versions do not convert int to float implicitly.
        w.x =        -s  * x3 +        2.0 * s  * x2 - s * x;
        w.y = (2.0 - s)  * x3 + (s - 3.0)       * x2 + 1.0;
        w.z = (s - 2.0)  * x3 + (3.0 - 2.0 * s) * x2 + s * x;
        w.w =         s  * x3 -              s  * x2;
        return w;
    }

I found these coefficients, as well as the coefficients for a number of other flavors of cubic spline, in these lecture notes: http://www.cs.cmu.edu/afs/cs/academic/class/15462-s10/www/lec-slides/lec06.pdf

+1

I think it is possible that the Catmull-Rom version could be done with 4 texture lookups by (a) arranging the input texture like a chessboard, with alternate slots stored as positives and negatives, and (b) making a corresponding modification to textureBicubic. This would rely on the contributions/weights w.x and w.w always being negative, and the contributions w.y and w.z always being positive. I have not double-checked whether this is true, or exactly how the modified textureBicubic would look.

... I have now confirmed that the w contributions do satisfy the +ve/-ve rules.
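
The sign pattern can also be read off by factoring the Catmull-Rom weights from the answer above. This is a sketch for $x \in [0, 1]$ and $s = 0.5$; the factorization is worked out here and is not from the lecture notes:

```latex
\begin{align*}
w_x &= -s\,x\,(1 - x)^2 \;\le\; 0, \\
w_y &= (1 - x)\bigl(1 + x - (2 - s)\,x^2\bigr) \;\ge\; 0, \\
w_z &= x\,\bigl(s + (3 - 2s)\,x - (2 - s)\,x^2\bigr) \;\ge\; 0, \\
w_w &= -s\,x^2\,(1 - x) \;\le\; 0 .
\end{align*}
```

For $s = 0.5$ both bracketed quadratics are concave in $x$ and at least $0.5$ at $x = 0$ and $x = 1$, so they stay positive across the interval. Strictly speaking the outer weights are non-positive, not negative: each one reaches zero at an endpoint.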

0