Before starting, consider the pseudocode of the algorithm as written in their paper:
procedure AdaptiveThreshold(in, out, w, h)
 1: for i = 0 to w do
 2:     sum ← 0
 3:     for j = 0 to h do
 4:         sum ← sum + in[i, j]
 5:         if i = 0 then
 6:             intImg[i, j] ← sum
 7:         else
 8:             intImg[i, j] ← intImg[i−1, j] + sum
 9:         end if
10:     end for
11: end for
12: for i = 0 to w do
13:     for j = 0 to h do
14:         x1 ← i − s/2 {border checking is not shown}
15:         x2 ← i + s/2
16:         y1 ← j − s/2
17:         y2 ← j + s/2
18:         count ← (x2−x1) × (y2−y1)
19:         sum ← intImg[x2, y2] − intImg[x2, y1−1] − intImg[x1−1, y2] + intImg[x1−1, y1−1]
20:         if (in[i, j] × count) ≤ (sum × (100−t)/100) then
21:             out[i, j] ← 0
22:         else
23:             out[i, j] ← 255
24:         end if
25:     end for
26: end for
intImg is the integral image of the input image to be thresholded, which is assumed to be grayscale.
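To make the integral image concrete, here is a minimal Python sketch of lines 1 to 11 of the pseudocode, plus a helper that reads a window sum as in line 19. The names integral_image and window_sum are my own, the image is assumed to be a list of lists indexed as img[i][j] to match in[i, j], I read "0 to w" as the indices 0 to w−1, and the border checks in window_sum are my addition since the paper does not show them.

```python
def integral_image(img):
    """Return the integral image (summed-area table) of img.

    Entry [i][j] holds the sum of img[a][b] for all a <= i and b <= j.
    """
    w = len(img)
    h = len(img[0])
    int_img = [[0] * h for _ in range(w)]
    for i in range(w):
        running = 0  # running sum along j for the current i
        for j in range(h):
            running += img[i][j]
            if i == 0:
                int_img[i][j] = running
            else:
                int_img[i][j] = int_img[i - 1][j] + running
    return int_img


def window_sum(int_img, x1, y1, x2, y2):
    """Sum of the pixels in the inclusive window [x1..x2] x [y1..y2].

    Terms that would index -1 are skipped (assumed border handling).
    """
    total = int_img[x2][y2]
    if y1 > 0:
        total -= int_img[x2][y1 - 1]
    if x1 > 0:
        total -= int_img[x1 - 1][y2]
    if x1 > 0 and y1 > 0:
        total += int_img[x1 - 1][y1 - 1]
    return total


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(ii[2][2])                    # sum of all nine pixels
print(window_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9
```

Once the table is built, any window sum costs at most four lookups, no matter how large s is.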
I have implemented this algorithm successfully, so let me address your doubts.
What is count? If it is the number of pixels in the window, why is it 2 * 2 = 4 instead of 3 * 3 = 9 according to the algorithm?
The paper makes an underlying assumption that it never states. The value of s needs to be odd, and the window should be:
x1 = i - floor(s/2)
x2 = i + floor(s/2)
y1 = j - floor(s/2)
y2 = j + floor(s/2)
count is indeed the total number of pixels in the window, but you also have to make sure that the window does not go out of bounds. What you describe should be a 3 x 3 window, so s = 3, not 2. Now, with s = 3, if we choose i = 0, j = 0, some of the x and y values become negative. We cannot have that, so the number of valid pixels in the 3 x 3 window centered at i = 0, j = 0 is 4, and therefore count = 4. For windows that lie entirely inside the image, count is 9.
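The border handling can be sketched in a few lines of Python. This is only my reading of the "border checking is not shown" comment: the function name clamped_window is hypothetical, clamping to the image edges is an assumption, and count is computed inclusively as (x2 − x1 + 1) × (y2 − y1 + 1) so that it matches the 9 and 4 pixel counts described above (line 18 of the pseudocode is written without the +1).

```python
def clamped_window(i, j, s, w, h):
    """Return (x1, y1, x2, y2, count) for the s x s window centered at (i, j).

    The window is clipped to an image of size w x h (assumed border handling),
    and count is the number of pixels that remain inside the image.
    """
    half = s // 2  # floor(s/2); s is assumed to be odd
    x1 = max(i - half, 0)
    x2 = min(i + half, w - 1)
    y1 = max(j - half, 0)
    y2 = min(j + half, h - 1)
    count = (x2 - x1 + 1) * (y2 - y1 + 1)
    return x1, y1, x2, y2, count


# Corner, edge, and interior pixels of a 5 x 5 image with s = 3.
print(clamped_window(0, 0, 3, 5, 5))  # corner: 2 x 2 valid pixels
print(clamped_window(0, 2, 3, 5, 5))  # edge: 2 x 3 valid pixels
print(clamped_window(2, 2, 3, 5, 5))  # interior: full 3 x 3 window
```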
Also, why is the original pixel value multiplied by count? The paper says the value is compared with the average of the surrounding pixels, so why isn't it:
in[i,j] <= (sum/count) * ((100 - t) / 100)
then?
The condition you are looking at is on line 20 of the algorithm:
20: (in[i, j]×count) ≤ (sum×(100−t)/100)
The reason we look at in[i,j] x count is that we treat in[i,j] as if it were the average intensity of the s x s window. If every pixel in the window had that intensity, the sum of all the intensities would be in[i,j] x count. The algorithm is quite clever. Basically, we compare this estimated total intensity ( in[i,j] x count ) with the actual total of the s x s window scaled by (100-t)/100 ( sum x ((100-t)/100) ). If the estimate is less than or equal to that, meaning the pixel is at least t percent below the window average, the output is set to black. Otherwise, the output is set to white. However, you have eloquently stated that it should be this instead:
in[i,j] <= (sum/count) * ((100 - t) / 100)
This is essentially the same as line 20, except that you divided both sides of the inequality by count. Since count is positive, the direction of the inequality does not change, so it is still the same condition. I would say your version states explicitly what I described above. Multiplying by count is certainly confusing, so what you wrote makes more sense.
So you are just looking at it from a different angle, and that is perfectly fine! To answer your question: what you wrote is correct and equivalent to the expression in the actual algorithm.
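If you want to convince yourself of the equivalence, here is a small Python check. The function names are hypothetical, and I use Fraction for the averaged form so that floating-point rounding cannot blur the comparison at the boundary. The multiplied form is additionally scaled by 100 on both sides, which is my own rearrangement and keeps everything in integers.

```python
from fractions import Fraction


def is_dark_multiplied(pixel, window_total, count, t):
    """Line 20 of the pseudocode, with both sides multiplied by 100."""
    return pixel * count * 100 <= window_total * (100 - t)


def is_dark_averaged(pixel, window_total, count, t):
    """The pixel compared against the window mean, in exact arithmetic."""
    mean = Fraction(window_total, count)
    return pixel <= mean * Fraction(100 - t, 100)


# Compare both forms over a grid of pixel values and window totals.
count, t = 9, 15
mismatches = 0
for pixel in range(256):
    for window_total in range(0, count * 255 + 1, 5):
        a = is_dark_multiplied(pixel, window_total, count, t)
        b = is_dark_averaged(pixel, window_total, count, t)
        if a != b:
            mismatches += 1
print(mismatches)  # number of cases where the two forms disagree
```

One practical point in favor of the multiplied form is that it needs no division at all, so the whole test can run on integers.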
Hope this helps!