This is the fastest algorithm for rotating an arbitrary number of bits that I could come up with (I rotate an array of 8 bytes, you can easily convert it to transfer 16):
Use EEPROM to create a mask table for your shifts. The mask only increases the amount of 1 s to the right:
final static byte[] ROTL_MASK = { (byte) 0x00, //shift 0: 00000000 //this one is never used, we don't do shift 0. (byte) 0x01, //shift 1: 00000001 (byte) 0x03, //shift 2: 00000011 (byte) 0x07, //shift 3: 00000111 (byte) 0x0F, //shift 4: 00001111 (byte) 0x1F, //shift 5: 00011111 (byte) 0x3F, //shift 6: 00111111 (byte) 0x7F //shift 7: 01111111 };
Then you first use Util.arrayCopyNonAtomic to quickly replace bytes if the shift is greater than 8:
final static byte BITS = 8; //swap whole bytes: Util.arrayCopyNonAtomic(in, (short) (shift/BITS), out, (short) 0, (short) (8-(shift/BITS))); Util.arrayCopyNonAtomic(in, (short) 0, out, (short) (8-(shift/BITS)), (short) (shift/BITS)); shift %= BITS; //now we need to shift only up to 8 remaining bits if (shift > 0) { //apply masks byte mask = ROTL_MASK[shift]; byte comp = (byte) (8 - shift); //rotate using masks out[8] = in[0]; // out[8] is any auxiliary variable, careful with bounds! out[0] = (byte)((byte)(in[0] << shift) | (byte)((in[1] >> comp) & mask)); out[1] = (byte)((byte)(in[1] << shift) | (byte)((in[2] >> comp) & mask)); out[2] = (byte)((byte)(in[2] << shift) | (byte)((in[3] >> comp) & mask)); out[3] = (byte)((byte)(in[3] << shift) | (byte)((in[4] >> comp) & mask)); out[4] = (byte)((byte)(in[4] << shift) | (byte)((in[5] >> comp) & mask)); out[5] = (byte)((byte)(in[5] << shift) | (byte)((in[6] >> comp) & mask)); out[6] = (byte)((byte)(in[6] << shift) | (byte)((in[7] >> comp) & mask)); out[7] = (byte)((byte)(in[7] << shift) | (byte)((in[8] >> comp) & mask)); }
You can optionally remove the mask variable and use a direct link to the table instead.
Using this rather than naive bitwise rotation implementation turned out to be about 450-500% faster.
Miragepv
source share