I don't want to tread on the excellent answers given above, but no one talks about why “r” matters what it has. The low-level answer provided by the Colin Percival Scrypt document is that it refers to a "memory bandwidth product." But what does that mean?
If you use Scrypt correctly, you should have a large block of memory, which is mainly located in main memory. The main memory takes time. When an iteration of a block transition cycle first selects an element from a large block to mix into a working buffer, it has to wait about 100 ns for the first data set. Then he must request another and wait for his arrival.
With r = 1, you will perform 4nr Salsa20 / 8 iterations and 2n readings with latency from the main memory.
This is not good, because it means that an attacker can gain an advantage over you by creating a system with a reduced latency in the main memory.
But if you increase r and proportionally decrease N, you can achieve the same memory requirements and perform the same amount of calculations as before, except that you used some random accesses for sequential access. The serial access extension allows either the CPU or the library to efficiently prefetch the next required data blocks. While initial latency still exists, reduced or eliminated latency for later blocks averages the initial delay to a minimum level. Thus, an attacker will benefit little from improving their memory technologies compared to yours.
However, there is a point of decreasing returns with increasing r, and this is due to the previously mentioned “memory bandwidth product”. What this product indicates is how many bytes of data can transit from main memory to the processor at any given time. This is the same idea as the highway - if after 10 minutes you go from point A to point B (latency), and the road delivers 10 cars per minute to point B from point A (bandwidth), the carriageway between points A and B contains 100 cars. Thus, the optimal value of r refers to the number of 64-byte pieces of data that you can request immediately to hide the delay of this initial request.
This improves the speed of the algorithm, allowing you to either increase N for more memory and calculations, or increase p for more calculations, if necessary.
There are other problems with increasing "r" too much, which I have not seen much:
- An increase in r with decreasing N decreases the number of pseudorandom jumps around the memory. Serial access is easier to optimize and may give an attacker a window. As Colin Percival noted on Twitter, a larger r could allow an attacker to use cheaper and slower storage technology, significantly reducing costs ( https://twitter.com/cperciva/status/661373931870228480 ).
- The size of the working buffer is 1024r bits, so the number of possible end products that will eventually be loaded into PBKDF2 to create the Scrypt output key is 2 ^ 1024r. The number of permutations (possible sequences) of transitions around a large block of memory is 2 ^ NlogN. This means that there are 2 ^ NlogN possible memory cycle products. If 1024r> NlogN, this seems to indicate that the working buffer is in a mixed state. Although I do not know this for sure and would like to see proof or refutation, correlations between the result of the working buffer and the sequence of jumps may be found, which can allow an attacker to reduce his memory requirements without significantly increasing the computational cost. Again, this is a number-based observation - it’s possible that everything is so well mixed in every round that it’s not a problem. r = 8 is significantly lower than this potential threshold for the standard N = 2 ^ 14 - for N = 2 ^ 14 this threshold will be r = 224.
To summarize all the recommendations:
- Choose r to be large enough to average the effects of memory latency on your device and nothing more. Keep in mind that the value recommended by Colin Percival, r = 8, seems to remain valid optimally as a whole for memory technology, and this does not seem to have changed significantly over 8 years; 16 may be a little better.
- Decide how much memory you want to use in the stream, bearing in mind that this also affects the computation time, and set N accordingly.
- Increase p arbitrarily high so that your use can tolerate (note: on my system and using my own implementation, p = 250 (4 threads) with N = 16384 and r = 8 takes ~ 5 seconds) and enable if you have dealing with value added memory.
- When setting up, prefer large size N and memory block size to increased p and calculation time. Scrypt's initial advantage comes from its large memory block size.
Test of my own implementation of Scrypt on Surface Pro 3 with i5-4300 (2 cores, 4 threads) using the constant 128Nr = 16 MB and p = 230; left axis - seconds, lower axis - r value, error bars - +/- 1 standard deviation:
