(src >> start) & ((1 << len)-1)
is one way to express the extraction of the len bit, starting with start . (In this case, start is the LSB of the range you want. Your function requires an MSB input.) This is an expression from the Wikipedia article about the x86 Extension of the BMI1 instruction set .
Both ways to create a mask look risky if len is the full width of the type. (Angular case of extracting all bits). Shifts over the entire width of the type can either give zero or not change. (IIRC), it actually causes undefined behavior, but in practice this happens. For example, x86 masks the downward shift count to a range of 0-31 (for 32-bit offsets). With 32 bit ints:
If 1 <32 produces 1, then 1-1 = 0, so the result will be zero.
If ~ 0 <32 produces ~ 0 rather than 0, the mask will be zero.
I think from your examples you need bits [iStartPos : iStartPos - iNumOfBites] , where the bits are numbered from scratch.
The main thing that I changed in your function is the name of the function and variables and adding a comment.
bitResult - input to the function; do not use "result" in his name.iStartPos ok but a bit verboseiNumOfBites Computers have bits and bytes. If you are dealing with bites, you need a doctor (or dentist).
In addition, the return type must be unsigned .
// extract bits [msb : msb-len] from input into the low bits of the result unsigned BitExtract(unsigned input, int msb, int len) { return (input >> (msb-len + 1)) & ~(~0 << len); }
If your start position parameter was lsb rather than msb, the expression would be simpler and the code would be smaller and faster (unless that does the extra work for the caller). With LSB as a parameter, BitExtract has 7 instructions, versus 9 if it is MSB (on x86-64, gcc 5.2).
There is also a machine instruction (introduced with Intel Haswell and AMD Piledriver) that performs this operation. You will get slightly smaller and slightly faster code using it. It also uses the LSB convention, len position, not the MSB, so you get a shorter code with LSB as an argument.
Intel CPUs only know the version that will require first loading into the register, so when the values ββare compile-time constants, it does not save much compared to just switching and masking. eg. see this post about using it or pextr for RGB32 β RGB16 . And, of course, it doesn't matter if the MSB or LSB parameter is in the required range if start and len are compile time constants.
Only AMD implements a version of bextr that can have a control mask as an immediate constant, but unfortunately, it seems that gcc 5.2 does not use the immediate version for code that uses the internal one (even with -march=bdver2 (i.e. bulldozer v2 aka piledriver). ( generate bextr with the direct argument yourself in some cases using -march=bdver2 .)
I tested it on godbolt to find out what code you would get with or without bextr.
To perform a security check (msb-len+1)&0xff an additional instruction (a movzx ) is movzx to avoid starting the start byte in the length byte. I left it because it is nonsense to ask for a start bit outside the range 0-31, not to mention the range 0-255. Since this is not a failure, just return some other meaningless result, there are not many.
Anyway, bext saves quite a few instructions (if BMI2 shlx / shrx not available !! -march=native on godbolt is Haswell and therefore also includes BMI2.)