In my previous post, I didn't pay close attention to the memory requirements of Argon2 when running my benchmarks. Instead, I just ran them until I got tired of waiting around. Further, I didn't do justice to either scrypt or Argon2 when showing the parallelization factor. So, I ran a lot more benchmarks on both, so you can more clearly see how the cost affects the time spent calculating the password hash, and how parallelization affects that time.
More scrypt Benchmarks and Clarification
So, let's run some more scrypt benchmarks, and take a closer look at what's going on. Recall that the cost parameters for scrypt are:
- N: The CPU and RAM cost
- r: The mixing loop memory block size
- p: The product factor
The recommended cost factors from Colin Percival are:
- N: 16384 (2^14)
- r: 8
- p: 1
To calculate the amount of memory used, we use the following equation:
Memory in bytes = (N * r * 128) + (r * p * 128)
So, according to the recommendation:
(2^14 * 8 * 128) + (8 * 1 * 128) = 16,778,240 bytes
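The calculation above can be sketched in a few lines of Python (the function name is mine, purely illustrative):

```python
def scrypt_memory_bytes(n, r, p):
    """Approximate scrypt memory usage: 128 * r bytes for each of the
    N elements in the large mixing array, plus 128 * r bytes for each
    of the p mixing blocks."""
    return (n * r * 128) + (r * p * 128)

# Colin Percival's recommended interactive parameters:
print(scrypt_memory_bytes(2**14, 8, 1))  # 16778240 bytes, about 16 MiB
```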
Or, about 16 megabytes. According to Anthony Ferrara, you should be using at least 16 MiB with scrypt; at 4 MiB or less, it is demonstrably weaker than bcrypt. So, when you're looking at the images below, you'll notice that the first memory result is in red text, showing that 8 MiB is the weak point with those cost factors, and the 16 MiB result is in green, showing that cost factors from there up are preferred. Anything between 8 MiB and 16 MiB is left in the default black, meaning you should probably be cautious with those cost factors: while you might be demonstrably stronger than bcrypt with those work loads, you're not at the developer's recommendation of 16 MiB.
So, knowing that, let's look at the results. Notice that when we increase our product factor, our execution time increases by that factor. Despite urban legend, this isn't a parallelization constant (well, it is, but it's not breaking the problem up into smaller ones; it's multiplying it). The idea is that once you've reached a reasonable memory cost, you can increase the execution time by creating more mixing loops with the same cost on each loop. So, instead of one mixing loop costing you 16 MiB, you can have two mixing loops each costing you 16 MiB. We didn't divide the problem, we multiplied it. As such, our execution time will double going from one mixing loop to two.
This seems strange, and indeed it is, but you should start with "p=1" at the memory cost you can afford, then increase the product factor if you can donate more time to the execution. In other words, the product factor is designed for hardware-limited scenarios. In general, you'll want to look at your execution time, and let the memory come in as an afterthought (provided it's at or above 16 MiB).
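To see this effect yourself, Python's standard-library hashlib.scrypt (available since Python 3.6 when built against OpenSSL 1.1+) lets you time the same memory cost at different product factors. The password, salt, and timings here are illustrative:

```python
import hashlib
import time

def timed_scrypt(p):
    """Derive a 32-byte key with scrypt at n=2^14, r=8 (~16 MiB per
    mixing loop), returning the key and the elapsed wall-clock time."""
    start = time.perf_counter()
    key = hashlib.scrypt(b"correct horse battery staple", salt=b"NaCl",
                         n=2**14, r=8, p=p,
                         maxmem=64 * 1024 * 1024, dklen=32)
    return key, time.perf_counter() - start

key1, t1 = timed_scrypt(1)
key2, t2 = timed_scrypt(2)
# p=2 runs two 16 MiB mixing loops instead of one, so expect roughly
# double the execution time of p=1 on the same hardware.
print(f"p=1: {t1:.3f}s  p=2: {t2:.3f}s")
```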
As in the last post, I have highlighted the green cells with interactive password sessions targeting half-of-a-second and red cells with symmetric key derivation targeting a full five seconds.
More Argon2 Benchmarks
When showing Argon2 in my last post, I did a poor job demonstrating its additional cost factors, and I didn't make it clear how the parallelization played a role when keeping the same cost factor. As a result, I ran additional benchmarks to show more clearly how CPU and RAM play with each other in your work load.
As a reminder, the cost parameters for Argon2 are as follows:
- n: The number of iterations on the CPU.
- m: The memory work load.
- p: The parallelization factor.
Unlike scrypt, where a single factor manipulates both the CPU and RAM cost, Argon2 separates them out. You deliberately have two knobs to play with: "n" for the CPU and "m" for the RAM. But one affects the other. If you are targeting a specific execution time, and you increase your memory factor by two, then your CPU work factor will be cut roughly in half. On the reverse, if you increase your CPU work factor by two, then your memory work factor will be cut roughly in half. So, adjusting one work factor affects the other.
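A rough way to see this trade-off: each iteration makes a pass over the memory, so total time is approximately n * m * c for some hardware-dependent constant c. For a fixed time budget, doubling m halves the n you can afford. This toy model (the function name and the constant are mine, purely illustrative, not a real Argon2 timing) makes the point:

```python
def iterations_for_budget(budget_s, m_kib, secs_per_kib_pass):
    """Toy model: each iteration makes one pass over m KiB of memory,
    so total time ~ n * m_kib * secs_per_kib_pass. Returns the largest
    whole iteration count fitting the budget, with a floor of 1."""
    return max(1, int(budget_s // (m_kib * secs_per_kib_pass)))

c = 1e-6  # made-up per-KiB pass time, for illustration only
print(iterations_for_budget(0.5, 16 * 1024, c))    # 16 MiB: 30 iterations
print(iterations_for_budget(0.5, 32 * 1024, c))    # 32 MiB: half as many
print(iterations_for_budget(0.5, 1024 * 1024, c))  # 1 GiB: a single iteration
```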
Why is this important? Well, let's consider setting your memory requirement into the gigabyte range. At 1 GiB, for an interactive login session of 0.5 seconds, you would need both cores working on the hash, and you would only get a single iteration. In other words, your work load is entirely memory-dependent, without any significant CPU cost. Maybe you're trying to thwart FPGAs or ASICs with the large memory requirement. However, what if an adversary has 1 GiB of on-die cache? Because you're entirely memory-dependent, with no CPU work load, you've catered to that adversary without imposing any significant hardware cost.
On the reverse, you could get CPU-heavy with 2048 iterations to hit your 0.5 seconds execution time, but then you would only be using 256 KiB of memory. You're likely not defeating the FPGAs and ASICs that Argon2 is designed to thwart, as you're almost entirely processor-driven.
So, what to do? It would probably be a good idea to target a balance: require a significant amount of memory, even if it doesn't break the on-die cache barrier, while also requiring a significant amount of processor work. Sticking with Colin's recommendation of 16 MiB (2^14 KiB) of memory and 32 iterations on 4 cores for interactive logins is probably a good balance. Then again, it will all depend on your hardware, what you can expect in customer execution time, load, and other variables.
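If you want to try those parameters yourself, the third-party argon2-cffi package (installed with "pip install argon2-cffi"; it is not in the standard library) exposes all three knobs. A minimal sketch, noting that argon2-cffi's memory_cost is measured in KiB:

```python
from argon2 import PasswordHasher  # third-party: pip install argon2-cffi

# time_cost = iterations (n), memory_cost in KiB (m = 2^14 KiB = 16 MiB),
# parallelism = lanes (p), matching the balanced interactive settings above.
ph = PasswordHasher(time_cost=32, memory_cost=2**14, parallelism=4)

digest = ph.hash("correct horse battery staple")
ph.verify(digest, "correct horse battery staple")  # raises on a wrong password
```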
However, here are additional timings of Argon2, just like with scrypt, so you can see how parallelization affects identical costs. Again, green cells are targeting 0.5 seconds for interactive logins, and red cells are targeting 5 seconds for symmetric key derivation.
Hopefully, this will help you make a more educated decision about your cost factors when deploying either scrypt or Argon2 as your password hash or symmetric key derivation function. Remember, that you have a few things to consider when picking your costs:
- Make it expensive for both CPU and memory.
- Target a realistic execution time for the situation.
- Guarantee that you can always meet these goals after deployment.
Also, don't deploy Argon2 into production quite yet. Let it bake for a while. If it still stands secure in 2020, then you're probably good to go. Otherwise, deploy scrypt, or one of the other functions mentioned in the prior post.