This article, “A Simple In-Place In-Shuffle Algorithm,” describes how to transpose a 2 * N matrix and gives a hint on how to do this for other cases, so 3 * N is possible. This answer to another question shows that this is really possible.
Or use an algorithm that writes each value to its transposed location, then does the same for the value from that location, and so on, until the loop is connected. The flag processes values in a bit vector. And continue until this vector is 1s.
Both algorithms are not caching. Perhaps some clever use of the PREFETCH statement can improve this.
Edit:
C ++, RGB for single planes, not optimized:
#include <iostream>
My initial intention is to use a WIDTHHEIGHT-sized bit vector that gives the WIDTHHEIGHT / 8 overhead. But you can always sacrifice speed for space. The bit vector can be either WIDTH or HEIGHT, or any desired value of 0. The trick is to maintain a pointer to the cell in front of which all values are transposed. The bit vector for cells starting at this pointer. After this is all 1s, it moves to the next position, then all the steps of the algorithm are performed, except for the actual data movement. And bit-bit is ready to continue transposing. This option is O (N ^ 2) instead of O (N).
Edit2:
PREFITCH optimization is not difficult to implement: just calculate the indexes, call PREFETCH and put the indexes in the queue (ringbuffer), then get the indexes from the queue and move the data.
Edit3:
The idea of another algorithm that has size O (1), time O (N * log (N)), is cached and can be faster than the loop algorithm (for image sizes <1Gb):
- Divide the N * 3 matrix into several 3 * 3 char matrices and transfer them
- Divide the result into 3 * 3 char [3] matrices and transfer them
- Continue while matrix size is smaller than array size
- Now we have up to 3 * 2 * log3 (N) ordered parts. Join them.
- Connect pieces of the same size first. Very simple “cycles” of length 4 can be used.
- Combine fragments of unequal size with reverse (reverse (a), reverse (b))
Evgeny Kluev
source share