I believe you have found a typo in the slides; it should probably be something like while (i + blockDim.x < n).
If you look at the source code of the "reduction" example in the CUDA SDK, the body of the final kernel, reduce6, looks like this:
    template <class T, unsigned int blockSize, bool nIsPow2>
    __global__ void reduce6(T *g_idata, T *g_odata, unsigned int n)
    {
        T *sdata = SharedMemory<T>();
Note the explicit check inside the while loop, which prevents reads past the end of g_idata. Your initial suspicion is correct: n is simply the number of elements in the g_idata array.
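To see why the check matters, here is a minimal host-side C++ sketch that emulates the per-thread accumulation loop of a reduce6-style kernel. It is not the SDK source: the function names threadPartialSum and emulatedReduce are hypothetical, threads are simulated with plain loops, and the shared-memory tree reduction is replaced by a simple sum of the per-thread results. It uses std::vector::at, which throws on an out-of-range index, so any read past the end of g_idata would be caught.

```cpp
#include <cassert>
#include <vector>

// Emulates what one thread accumulates before the shared-memory stage.
// tid, blockId, blockSize and gridDimX stand in for threadIdx.x,
// blockIdx.x, blockDim.x and gridDim.x (illustrative names only).
int threadPartialSum(const std::vector<int>& g_idata,
                     unsigned int tid, unsigned int blockId,
                     unsigned int blockSize, unsigned int gridDimX)
{
    const unsigned int n = static_cast<unsigned int>(g_idata.size());
    const unsigned int gridSize = blockSize * 2 * gridDimX;
    unsigned int i = blockId * blockSize * 2 + tid;
    int sum = 0;

    while (i < n) {
        sum += g_idata.at(i);              // first load is covered by i < n
        if (i + blockSize < n)             // second load needs its own check
            sum += g_idata.at(i + blockSize);
        i += gridSize;                     // grid-stride step
    }
    return sum;
}

// Sums the partial results of every simulated thread in every block.
int emulatedReduce(const std::vector<int>& g_idata,
                   unsigned int blockSize, unsigned int gridDimX)
{
    assert(blockSize > 0 && gridDimX > 0); // otherwise the stride would be 0
    int total = 0;
    for (unsigned int b = 0; b < gridDimX; ++b)
        for (unsigned int t = 0; t < blockSize; ++t)
            total += threadPartialSum(g_idata, t, b, blockSize, gridDimX);
    return total;
}
```

With the ten values 1 through 10 (so n is not a power of two), blockSize = 4 and one or two blocks, emulatedReduce returns 55 and every element is read exactly once. Dropping the inner check makes the thread with tid = 0 try to read index 12 on its second iteration, which is exactly the out-of-bounds access in question.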
Jared Hoberock