What are optimal scrypt work factors?

I'm using a Java scrypt library for password storage. It takes N, r and p values when I hash things, which its documentation calls the "CPU cost", "memory cost" and "parallelization cost" parameters. The only problem is that I don't actually know what they specifically mean, or what good values for them would be; perhaps they correspond somehow to the -t, -m and -M flags on Colin Percival's original application?

Does anyone have any suggestions here? The library itself suggests N = 16384, r = 8 and p = 1, but I have no idea whether that is strong or weak or what.
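For concreteness, here is a minimal sketch assuming the library in question is the Lambdaworks scrypt library (com.lambdaworks:scrypt), whose SCryptUtil helper takes exactly these three values; if your library differs, treat this as pseudocode and check its documentation:

    import com.lambdaworks.crypto.SCryptUtil;

    public class ScryptExample {
        public static void main(String[] args) {
            // N = 16384 (CPU/memory cost), r = 8 (block size), p = 1 (parallelization)
            String hash = SCryptUtil.scrypt("correct horse battery staple", 16384, 8, 1);
            System.out.println(hash); // e.g. $s0$e0801$<salt>$<key>

            // check() re-reads N, r and p from the stored string
            boolean ok = SCryptUtil.check("correct horse battery staple", hash);
            System.out.println(ok); // true
        }
    }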

+56
java cryptography scrypt
Jun 20 '12 at 18:57
3 answers

To start with:

cpercival mentioned in his slides from 2009 something around:

  • (N = 2^14, r = 8, p = 1) for 100 ms (interactive use), and
  • (N = 2^20, r = 8, p = 1) for 5 s (sensitive storage).

These values happen to be good enough for general use (a password DB for some web apps) even today (2012-09). Of course, the specifics depend on the application.

In addition, these values (mostly) mean:

  • N: general work factor; the iteration count.
  • r: block size in use for the underlying hash; fine-tunes the relative memory cost.
  • p: parallelization factor; fine-tunes the relative CPU cost.

r and p are meant to accommodate the potential issue that CPU speed and memory size and bandwidth do not increase as anticipated. Should CPU performance increase faster, you increase p; should a breakthrough in memory technology instead provide an order-of-magnitude improvement, you increase r. And N is there to keep up with the general doubling of performance per some span of time.
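To make "keeping up" concrete, here is an illustrative calibration sketch (my own, not from this answer), again assuming the Lambdaworks SCryptUtil API: it doubles N until a single hash takes roughly 100 ms on the deployment hardware. Timings are noisy, so a real benchmark should warm up the JIT and average several runs.

    import com.lambdaworks.crypto.SCryptUtil;

    public class ScryptCalibration {
        public static void main(String[] args) {
            int r = 8, p = 1;
            for (int logN = 10; logN <= 20; logN++) {
                int N = 1 << logN;
                long start = System.nanoTime();
                SCryptUtil.scrypt("benchmark-password", N, r, p);
                long ms = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("N = 2^%d: %d ms%n", logN, ms);
                if (ms >= 100) break; // ~100 ms target for interactive use
            }
        }
    }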

Important: all of the values change the result. (Update:) This is why all scrypt parameters are stored in the result string.
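As an illustration of that last point, the Lambdaworks library (for example) packs log2(N), r and p into the second field of the hash string. The exact $s0$ layout sketched below is an assumption based on that library's format; verify it against the library source before relying on it.

    public class ScryptParams {
        public static void main(String[] args) {
            // hypothetical stored hash; salt and key fields are made-up placeholders
            String stored = "$s0$e0801$c2FsdHNhbHQ$a2V5a2V5a2V5";
            String[] parts = stored.split("\\$");
            long params = Long.parseLong(parts[2], 16);
            int logN = (int) (params >> 16 & 0xffff);
            int r    = (int) (params >> 8 & 0xff);
            int p    = (int) (params & 0xff);
            System.out.printf("N = 2^%d, r = %d, p = %d%n", logN, r, p); // N = 2^14, r = 8, p = 1
        }
    }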

+60
Sep 25 '12 at 10:41

The memory required by scrypt is calculated as:

128 bytes × N_cost × r_blockSizeFactor

For the parameters you specify (N = 16384, r = 8, p = 1):

128 × 16384 × 8 = 16,777,216 bytes = 16 MB

You must take this into account when choosing parameters.
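A trivial sanity check of the formula in Java (the long literal guards against int overflow at large N):

    public class ScryptMemory {
        public static void main(String[] args) {
            long n = 16384, r = 8;
            long bytes = 128L * n * r;
            System.out.println(bytes);                         // 16777216
            System.out.println(bytes / (1024 * 1024) + " MB"); // 16 MB
        }
    }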

Bcrypt is "weaker" than scrypt (although still three orders of magnitude stronger than PBKDF2) because it requires only 4 KB of memory. You want to make it hard to parallelize cracking in hardware. For example, if a video card has 1.5 GB of on-board memory and you tuned scrypt to consume 1 GB of it:

128 × 16384 × 512 = 1,073,741,824 bytes = 1 GB

then an attacker could not parallelize it on their video card. But then your application/phone/server would also need to use 1 GB of RAM every time it computes a password.
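A rough back-of-the-envelope version of this parallelization argument (the 1.5 GB card is the hypothetical from above):

    public class GpuParallelism {
        public static void main(String[] args) {
            long gpuMemory = 1536L * 1024 * 1024;     // 1.5 GB of on-board memory
            long perHash   = 128L * 16384 * 512;      // 1 GB per hash (N=16384, r=512)
            System.out.println(gpuMemory / perHash);  // 1 -> no parallel speedup
            perHash = 128L * 16384 * 8;               // 16 MB per hash (r=8)
            System.out.println(gpuMemory / perHash);  // 96 hashes in parallel
        }
    }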

It helps me to think of the scrypt parameters as a rectangle, where:

  • the width is the amount of memory required (128 * N * r),
  • the height is the number of iterations performed,
  • and the resulting area is the overall hardness.

[diagram: the memory × iterations rectangle]

  • the cost (N) increases both memory usage and iterations.
  • the blockSizeFactor (r) increases memory usage.

The remaining parameter, parallelization (p), means that you have to do the whole thing 2, 3 or more times:

[diagram: p copies of the rectangle]

If you had more memory than CPU, you could compute the three separate paths in parallel, requiring triple the memory:

[diagram: three rectangles computed in parallel]

But in all real-world implementations it is computed in series, tripling the computation needed:

[diagram: three rectangles computed in series]

In practice, nobody has ever chosen a p factor other than p = 1.

What are the ideal factors?

  • As much RAM as you can spare (see the sizing sketch below)!
  • As much time as you can spare!
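As a sizing sketch for the first point, here is a hypothetical helper (mine, not from any library) that picks the largest power-of-two N fitting a given memory budget at a fixed r:

    public class ChooseN {
        // assumes memoryBudgetBytes >= 128 * r
        static int chooseN(long memoryBudgetBytes, int r) {
            int logN = 63 - Long.numberOfLeadingZeros(memoryBudgetBytes / (128L * r));
            return 1 << logN;
        }

        public static void main(String[] args) {
            // 16 MB budget at r = 8 -> N = 16384
            System.out.println(chooseN(16L * 1024 * 1024, 8));
        }
    }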

Bonus chart

Graphical version of the above:

[chart: iterations versus cost factor for various r values]

Notes:

  • the vertical axis is log scale,
  • the cost factor (horizontal) is itself log (iterations = 2^CostFactor),
  • the r = 8 curve is highlighted.

And a zoomed-in version of the above for the reasonable region:

[chart: zoomed-in view of the reasonable region]

+41
May 18 '15 at 16:44

I don't want to step on the excellent answers above, but nobody has addressed why "r" has the value it does. The low-level answer provided by Colin Percival's scrypt paper is that it relates to the "memory latency-bandwidth product". But what does that actually mean?

If you run scrypt correctly, you should have a large memory block that mostly sits in main memory. Main memory takes time to pull from. When an iteration of the block-jumping loop first selects an element from the large block to mix into the working buffer, it has to wait on the order of 100 ns for the first chunk of data to arrive. Then it has to request another chunk and wait for it to arrive.

For r = 1, you would be doing 4nr Salsa20/8 iterations and 2n latency-bound reads from main memory.

This is not good, because it means an attacker can gain an advantage over you by building a system with reduced latency to main memory.

But if you increase r and proportionally decrease N, you achieve the same memory requirement and do the same number of computations as before, except that you have traded some random accesses for sequential accesses. Extending sequential access lets either the CPU or the library efficiently prefetch the next required blocks of data. While the initial latency is still there, the reduced or eliminated latency for the later blocks averages the initial latency out to a minimal level. Thus an attacker would gain little from improving their memory technology over yours.

However, there is a point of diminishing returns with increasing r, and it relates to the "memory latency-bandwidth product" mentioned earlier. What this product indicates is how many bytes of data can be in transit from main memory to the processor at any given time. It is the same idea as a highway: if it takes 10 minutes to travel from point A to point B (latency), and the road delivers 10 cars per minute from point A to point B (bandwidth), then the roadway between points A and B holds 100 cars. So the optimal r relates to how many 64-byte chunks of data you can request at once, in order to hide the latency of that initial request.

This improves the speed of the algorithm, allowing you either to increase N for more memory and computation, or to increase p for more computation, as desired.
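Here is the highway arithmetic worked through with assumed hardware numbers (about 100 ns of main-memory latency and about 10 GB/s of sustained bandwidth; both are ballpark figures, not measurements):

    public class LatencyBandwidth {
        public static void main(String[] args) {
            double latencySeconds = 100e-9;   // ~100 ns to main memory (assumed)
            double bandwidth = 10e9;          // ~10 GB/s sustained (assumed)
            double bytesInFlight = latencySeconds * bandwidth;  // ~1000 bytes
            double chunks = bytesInFlight / 64;  // 64-byte chunks "on the road"
            System.out.printf("~%.0f chunks in flight%n", chunks); // ~16
            // A 128r-byte scrypt element is 2r 64-byte blocks, so ~16 chunks
            // in flight lines up with the recommended r = 8.
        }
    }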

There are other issues with increasing "r" too far, which I have not seen discussed much:

  • Increasing r while decreasing N reduces the number of pseudorandom jumps around memory. Sequential accesses are easier to optimize, and that could give an attacker a window. As Colin Percival noted on Twitter, a larger r could let an attacker use a cheaper, slower storage technology, reducing their costs considerably (https://twitter.com/cperciva/status/661373931870228480).
  • The working buffer is 1024r bits in size, so the number of possible end products of it, which are eventually fed into PBKDF2 to produce the scrypt output key, is 2^1024r. The number of permutations (possible sequences) of jumps around the large memory block is 2^(N log2 N), so there are 2^(N log2 N) possible products of the memory-jumping loop. If 1024r > N log2 N, that would seem to indicate the working buffer is under-mixed. While I don't know this for sure and would love to see a proof or disproof, correlations might be found between the working buffer's result and the sequence of jumps, which could allow an attacker to reduce their memory requirement without a greatly increased computational cost. Then again, this is just an observation based on counting; it may be that everything is so well mixed on every round that this isn't a problem. r = 8 is well below this potential threshold for the standard N = 2^14; for N = 2^14, the threshold would be r = 224 (a quick check of this arithmetic follows below).
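A quick check of the threshold arithmetic from the second bullet:

    public class MixingThreshold {
        public static void main(String[] args) {
            int logN = 14;
            long n = 1L << logN;                     // N = 16384
            long jumpBits = n * logN;                // N * log2(N) = 229376
            System.out.println(jumpBits / 1024);     // buffer saturates at r = 224
            System.out.println(8 < jumpBits / 1024); // true: r = 8 is well below it
        }
    }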

To summarize all the recommendations:

  • Choose r just large enough to average out the effects of memory latency on your device, and no more. Bear in mind that the value Colin Percival recommended, r = 8, still appears to be broadly near-optimal for memory technology, and this apparently has not changed much in 8 years; 16 may be slightly better.
  • Decide how large a memory block you want to use per thread, bearing in mind that this affects computation time as well, and set N accordingly.
  • Increase p arbitrarily high to whatever your usage can tolerate (note: on my system, with my own implementation, p = 250 (4 threads) with N = 16384 and r = 8 takes ~5 seconds), and enable threading if you can live with the added memory cost.
  • When tuning, prefer a large N and a large memory block size over an increased p and more computation time. Scrypt's primary benefit comes from its large memory block size.

A benchmark of my own scrypt implementation on a Surface Pro 3 with an i5-4300 (2 cores, 4 threads), using a constant 128Nr = 16 MB and p = 230; the left axis is seconds, the bottom axis is the r value, and the error bars are ±1 sample standard deviation:

[chart: scrypt r vs. p values at a constant large memory block size]

+10
Oct 23 '15 at 8:31


