You can think of colors as points in a three-dimensional space. What "average" means depends on which space you use: you will get different results if you average in RGB, HSL or something more "exotic".
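To make this concrete, here is a small sketch using Python's standard `colorsys` module (which works in HLS, i.e. HSL with the components reordered). It averages pure red and pure blue once component-wise in RGB and once component-wise in HLS. The naive HLS version averages hue linearly and ignores the fact that hue wraps around; that simplification is an assumption made only for illustration.

```python
import colorsys

def average_rgb(c1, c2):
    """Component-wise mean in RGB space (components in [0, 1])."""
    return tuple((a + b) / 2 for a, b in zip(c1, c2))

def average_hls(c1, c2):
    """Naive component-wise mean in HLS space, converted back to RGB.
    Hue is averaged linearly here, ignoring that it wraps around."""
    hls1 = colorsys.rgb_to_hls(*c1)
    hls2 = colorsys.rgb_to_hls(*c2)
    mean = tuple((a + b) / 2 for a, b in zip(hls1, hls2))
    return colorsys.hls_to_rgb(*mean)

red = (1.0, 0.0, 0.0)
blue = (0.0, 0.0, 1.0)

print(average_rgb(red, blue))  # (0.5, 0.0, 0.5): purple
print(average_hls(red, blue))  # approximately (0.0, 1.0, 0.0): green
```

Same two inputs, two very different "averages": purple in RGB, green in (naive) HLS.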
But if you are limited to 2 bits per color channel, none of this really matters much, and what you suggest is fine (except that, as noted in the comments, you need `&`, not `^`, to mask).
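As a sketch of the masking point, assume (hypothetically, since the exact layout is not given here) that a color is packed as six bits `RRGGBB`. Each channel is extracted with a shift followed by `& 0b11`; using `^ 0b11` instead flips the low bits rather than isolating them. The helper names `unpack`, `pack` and `average_packed` are made up for this example.

```python
def unpack(c):
    """Split a hypothetical 6-bit RRGGBB value into three 2-bit channels.
    `& 0b11` keeps only the low two bits; `^ 0b11` would flip them instead."""
    return ((c >> 4) & 0b11, (c >> 2) & 0b11, c & 0b11)

def pack(r, g, b):
    """Recombine three 2-bit channels into one RRGGBB value."""
    return (r << 4) | (g << 2) | b

def average_packed(c1, c2):
    """Average two packed colors channel by channel (right shift truncates)."""
    return pack(*((a + b) >> 1 for a, b in zip(unpack(c1), unpack(c2))))

c = 0b110110                # r=3, g=1, b=2
print(unpack(c))            # (3, 1, 2)
print((c >> 2) ^ 0b11)      # 14: XOR does not isolate the green bits
print(bin(average_packed(0b110110, 0b010011)))  # 0b100010
```

Unpacking, averaging each channel separately and repacking keeps carries from one channel spilling into the next.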
(By "average", I assume you mean adding the values for each color channel and dividing by 2 with a right shift. Note that if you do this repeatedly, for example averaging two images, then averaging the result with a third, then that result with a fourth, the result will drift toward black: the right shift rounds down, so every step is slightly biased toward lower values.)
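The downward bias is easy to see on a single channel. This sketch folds a hypothetical sequence of 2-bit channel values (one per image) with `(acc + v) >> 1` and, for comparison, with exact floating-point division. The function names are illustrative only.

```python
def chained_average_shift(values):
    """Fold values with (acc + v) >> 1, as when averaging image after image."""
    acc = values[0]
    for v in values[1:]:
        acc = (acc + v) >> 1  # right shift truncates, i.e. rounds down
    return acc

def chained_average_exact(values):
    """Same fold with exact floating-point division, for comparison."""
    acc = float(values[0])
    for v in values[1:]:
        acc = (acc + v) / 2
    return acc

channel = [2, 3, 3, 3, 3]  # one channel value taken from five images
print(chained_average_shift(channel))  # 2
print(chained_average_exact(channel))  # 2.9375
```

With exact division the value climbs toward 3, but with the shift each `(2 + 3) >> 1` lands back on 2, so the lost half is never recovered.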