I once noticed that Windows does not keep a compute-intensive thread on one particular core; it keeps moving it between cores instead. So I assumed the work would finish faster if the thread kept access to the same data caches. Indeed, after setting the thread affinity mask to a single core (in the ppmd (de)compression thread), I saw a consistent 1% speedup. But then I tried to write a simple demo of this effect and more or less failed: it works as expected on my system (Q9450):
buflog = 21 bufsize = 2097152
(cache flush) first run = 6.938s
time with default affinity = 6.782s
time with first core only = 6.578s
speed gain is 3.01%
But the people I asked to run it could not reproduce the effect. Any suggestions?
#include <stdio.h>
P.S. I can post a link to a compiled version if anyone needs it.
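A minimal sketch of this kind of test follows. The workload is an assumption (pseudo-random read-modify-write traffic over a 2^buflog-byte buffer standing in for the ppmd thread), and the helper names `pin_to_first_allowed_cpu`, `run_workload` and `affinity_demo` are hypothetical. On Windows the thread is pinned with `SetThreadAffinityMask` to the lowest CPU in the mask returned by `GetProcessAffinityMask`; on Linux `sched_setaffinity` is used as a stand-in so the sketch still compiles there. Whatever it prints depends on the machine and is not a result of the original program.

```c
/* Sketch: time a cache-heavy loop with default affinity, then pinned to one CPU.
 * The workload and all helper names are hypothetical. */
#ifndef _WIN32
#define _GNU_SOURCE /* glibc: exposes sched_setaffinity and the CPU_* macros */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sched.h>
#endif

/* Pin the calling thread to the lowest-numbered CPU it may run on.
 * Returns 0 on success, -1 on failure. */
static int pin_to_first_allowed_cpu(void)
{
#ifdef _WIN32
    DWORD_PTR process_mask, system_mask;
    if (!GetProcessAffinityMask(GetCurrentProcess(), &process_mask, &system_mask))
        return -1;
    DWORD_PTR lowest = process_mask & (~process_mask + 1); /* isolate lowest set bit */
    return SetThreadAffinityMask(GetCurrentThread(), lowest) ? 0 : -1;
#else
    cpu_set_t allowed, one;
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof one, &one); /* pid 0 = calling thread */
        }
    }
    return -1;
#endif
}

/* Wall-clock time in seconds (C11 timespec_get). */
static double now_seconds(void)
{
    struct timespec ts = {0, 0};
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Hypothetical workload: pseudo-random read-modify-write over 2^buflog bytes. */
static uint32_t run_workload(unsigned char *buf, unsigned buflog, unsigned long iterations)
{
    assert(buflog >= 1 && buflog <= 24); /* index comes from 24 bits of the LCG state */
    uint32_t mask = ((uint32_t)1 << buflog) - 1;
    uint32_t x = 12345u, sum = 0;
    for (unsigned long i = 0; i < iterations; i++) {
        x = x * 1664525u + 1013904223u; /* 32-bit LCG step */
        uint32_t idx = (x >> 8) & mask;
        buf[idx] += (unsigned char)x;
        sum += buf[idx]; /* checksum keeps the compiler from dropping the loop */
    }
    return sum;
}

static double time_run(unsigned char *buf, unsigned buflog, unsigned long iterations, uint32_t *checksum)
{
    double t0 = now_seconds();
    *checksum = run_workload(buf, buflog, iterations);
    return now_seconds() - t0;
}

/* Runs a warm-up, a default-affinity run and a pinned run; returns 0 on success. */
static int affinity_demo(unsigned buflog, unsigned long iterations)
{
    size_t size = (size_t)1 << buflog;
    unsigned char *buf = calloc(size, 1);
    if (!buf)
        return -1;
    uint32_t c1, c2, c3;
    double first = time_run(buf, buflog, iterations, &c1); /* first touch of the buffer */
    double dflt = time_run(buf, buflog, iterations, &c2);
    if (pin_to_first_allowed_cpu() != 0) {
        free(buf);
        return -1;
    }
    double pinned = time_run(buf, buflog, iterations, &c3);
    printf("buflog = %u bufsize = %zu\n", buflog, size);
    printf("(cache flush) first run = %.3fs\n", first);
    printf("time with default affinity = %.3fs\n", dflt);
    printf("time with first allowed core only = %.3fs\n", pinned);
    if (dflt > 0)
        printf("speed gain is %.2f%%\n", (dflt - pinned) / dflt * 100.0);
    printf("checksum %u\n", (unsigned)(c1 ^ c2 ^ c3));
    free(buf);
    return 0;
}
```

Calling `affinity_demo(21, 200000000UL)` from `main` uses the same buffer size as the run above (bufsize = 2097152). Because the default-affinity run always executes before the pinned one, ordering effects such as frequency scaling are folded into the difference; alternating the two modes over several rounds would give a fairer comparison.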