Image of the glider from the Game of Life by John Conway
Skip to content

Let's Talk Password Hashing

TL;DR

In order of preference, hash passwords with:

  1. scrypt
  2. bcrypt
  3. Argon2
  4. sha512crypt
  5. sha256crypt
  6. PBKDF2

Do not hash passwords with:

  1. MD5
  2. md5crypt
  3. UNIX crypt(3)
  4. SHA-1/2/3
  5. Skein
  6. BLAKE2
  7. Any general purpose hashing function.
  8. Any encryption algorithm.
  9. Your own design.
  10. Plaintext

Introduction

Something that comes up frequently in crypto circles, aside from the constant database leaks of accounts and passwords, are hashing passwords. Because of the phrase "hashing passwords", developers who may not know better will think of using generic one-way fixed-length collision-resistant cryptographic hashing functions, such as MD5, SHA-1, SHA-256, or SHA-512, without giving a second thought to the problem. Of course, using these functions is problematic, because they are fast. Turns out, we don't like fast hashing functions, because password crackers do like fast hashing functions. The faster they can do, the sooner they can recover the password.

The Problem

So, instead of using MD5, SHA-1, SHA-256, SHA-512, etc., the cryptographic community got together, and introduced specifically designed password hashing functions, where a custom work factor is included as a cost. Separately, key derivation functions were also designed for creating cryptographic keys, where a custom work factor was also included as a cost here. So, with password-based key derivation functions and specifically designed password hashing functions, we came up with some algorithms that you should be using instead.

The Solution

The most popular algorithms of this type would include, in my personal order of preference from most preferred to least preferred:

  1. scrypt (KDF)
  2. bcrypt
  3. Argon2 (KDF)
  4. sha512crypt
  5. sha256crypt
  6. PBKDF2 (KDF)

The only difference between a KDF and a password hashing function, is that the digest length can be arbitrary with KDFs, whereas password hashing functions will have a fixed length output.

For the longest time, I was not a fan of scrypt as a password hashing function. I think I've changed my mind. Even though scrypt is sensitive to the parameters picked, and it suffers from a time-memory trade-off (TMTO), it's still considered secure, provided you pick sane defaults. I also place bcrypt over Argon2, because Argon2 was just recently announced as the Password Hashing Contest winner. As with all cryptographic primitives, we need to time to analyze, attack, and pick apart the design. If after about 5 years, it still stands strong and secure, then it can be recommended as a solution for production. In the meantime, it's something certainly worth testing, but maybe not for production code. Finally, I prefer sha512crypt and sha256crypt over PBKDF2, mostly because they are included with every GNU/Linux distribution by default, they are based on the strong SHA-2 hashing function, which has had years and mountains of analysis, and unlike PBKDF2, you know exactly which hashing function is used. PBKDF2 could be using SHA-2 functions by default, or it could be using SHA-1. You'll need to check your library to be sure.

Different Strokes for Different Folks

Regardless, all of the above functions include cost parameters for manipulating how long it takes to calculate the hash from a password. It's less important exactly what the cost parameters are, and more important that you are targeting an appropriate time to work through the cost, and create the hash. This means you need to identify your threat model and your adversary.

The two common scenarios you'll find yourself in, are:

  1. Password storage
  2. Encryption keys

For password storage, your threat model is likely the password database getting leaked to the Internet, and password crackers around the world working on the hashes in the database to recover passwords. Thus, your adversary is malware, Anonymous, and password crackers. For encryption keys, your threat model is likely private encrypted keys getting compromised and side-channel attacks. Thus, your adversary is also malware, poor key exchanges, or untrusted networks. Knowing your threat model and your adversary changes how you approach the problem.

With password storage, you may be dealing with an interactive login, such as through a website. As such, you probably want the password hashing time to be quick, while still maintaining a work factor that would discourage large distributed attacks on your leaked database. Possibly, .5 seconds. This means if the database was leaked, the password cracker could do no more than 2 passwords per second. When you compare this to the millions of hashes per second a GPU could execute on Windows NTLM passwords, 2 passwords per second is extremely attractive. For encryption keys, you probably don't need to worry about interactive sessions, so taking 5 seconds to create the key from the password probably isn't a bad thing. So key crackers spending 5 seconds per guess trying to recover the password that created the encrypted private key is really nice.

bcrypt, sha256crypt, sha512crypt, & PBKDF2

So, knowing the work factors, what would it look like for the above algorithms? Below, I look at bcrypt, sha256crypt, sha512crypt, and PBKDF2 with their appropriate cost. I've highlighted the row green where a possible work factor could mean spending 0.5 seconds on hashing the password, and a red row where a possible work factor could mean spending 5 full seconds on creating a password-based encryption key.

Spreadsheet table showing various cost factors for bcrypt, sha256crypt, sha512crypt, and PBKDF2.

Notice that for bcrypt, this means for password hashing, a factor of 13 would provide a cost of about 0.5s to hash the password, where a factor of 16 would get me close to my cost of about 5 seconds for creating a password-based key. For sha256crypt, sha512crypt, and PBKDF2, that seems to be about 640,000 and 5,120,000 iterations respectively.

scrypt

When we move to scrypt, things get a touch more difficult. With bcrypt, sha256crypt, sha512crypt, and PBKDF2, our cost is entirely a CPU load factor. Unfortunately, while possibly problematic for fast GPU clusters, they still fall victim to algorithm-specific FPGAs and ASICs. In order to combat this, we need to also include a memory cost, seeing as though memory on these devices is expensive. However, having both a CPU and a RAM cost, means multiple knobs to tweak. So, Colin Percival, the designer of scrypt, decided to bundle both the CPU and the RAM cost three factors: "N", "r", and "p". The resulting memory usage is calculated as follows:

Memory in bytes = (N * r * 128) + (r * p * 128)

There are a lot of suggestions out there about what's "best practice". It seems that you should at least have the following cost factors with scrypt, which provides a 16 MiB memory load:

  • N: 16384 (214)
  • r: 8
  • p: 1

While you should be aware of the sensitivity of scrypt parameters, provided you are working with at least 16 MiB of RAM, you aren't any worse than other password hashing functions or KDFs. So, in the following tables, I increase the memory cost needed for the hash by tweaking the three parameters.

Update 2016-06-29: I've clarified these parameters in a follow-up post, which you should most definitely read at https://pthree.org/2016/06/29/further-investigation-into-scrypt-and-argon2-password-hashing/.

First table showing the cost factors of scrypt. Second table showing different cost factors of scrypt. Third table showing yet different cost factors of scrypt.

Because I only have access to a single-socket-quad core CPU in this testing machine, I wanted to limit my "p" cost to 1, 2, and 4, which is displayed in those tables. Further, I'm limited on RAM, and don't want to disrupt the rest of the applications and services running on the box, so I've limited my "r" cost to 4, 8, and 16 multiplied by 128 bytes (512 bytes, 1024 bytes, and 2048 bytes).

Interestingly enough, Colin Precival recommends 16 MiB (N=16384 (214), r=8, p=1) for interactive logins and 16 MiB (N=131072 (217), r=1, p=1) for symmetric key derivation. If I were targeting my 0.5s password hashing time, then I could improve that to 256 MiB (N=65536 (216), r=8, p=1), or 2 GiB (N=2097152 (221), r=8, p=1), if targeting just slightly more than 5 seconds for symmetric key derivation.

Argon2

Finally, we look at Argon2. Argon2 comes in two flavors- Argon2d and Argon2i; the first of which is data (d)ependent and the latter is data (i)independent. The former is supposed to be resistant against GPU cracking while the latter is supposed to be resistant against side-channel attacks. In other words, Argon2d would be suitable for password hashing, while Argon2i would be suitable for encryption key derivation. However, regardless of Argon2d or Argon2i, the cost parameters will perform the same, so we'll treat them as a single unit here.

Like scrypt, Argon2 has both a CPU and a RAM cost. However, both are handled separately. The CPU cost is handled through standard iterations, like with bcrypt or PBKDF2, and the RAM cost is handled through specifically ballooning the memory. When I started playing with it, I found that just manipulating the iterations felt very much like bcrypt, but I could affect the overall time it took to calculate the hash by just manipulating the memory also. When combining the two, I found that iterations affected the cost more than the RAM, but both had significant say in the calculation time, as you can see in the tables below. As with scrypt, it also has a parallelization cost, defining the number of threads you want working on the problem:

First table showing the cost factors of Argon2. Second table showing different cost factors of Argon2. Third table showing yet different cost factors of Argon2.

Note the RAM cost between 256 KiB and 16 MiB, in addition to the number of iterations and the processor count cost. As we balloon our RAM, we can bring our iteration cost down. As we require more threads to work on the hash, we can bring that iteration count down even further. Regardless, we are trying to target 0.5s for an interactive password login, and a full 5 seconds for password-based encryption key derivation.

Conclusion

So, what's the point? When hashing passwords, whether to store them on disk, or to create encryption keys, you should be using password-based cryptographic primitives that were specifically designed for this problem. You should not be using general purpose hashing functions of any type, because of their speed. Further, you should not be rolling out your own "key-stretching" algorithm, such as recursively hashing your password digest and additional output.

Just keep in mind- if the algorithm was specifically designed to handle passwords, and the cost is sufficient for your needs, threat model, and adversary, then you're doing just fine. Really, you can't go wrong with any of them. Just avoid any algorithm not specifically designed around passwords. The goal is security through obesity.

Best practice? In order of preference, use:

  1. scrypt
  2. bcrypt
  3. Argon2
  4. sha512crypt
  5. sha256crypt
  6. PBKDF2

Do not use:

  1. MD5
  2. md5crypt
  3. UNIX crypt(3)
  4. SHA-1/2/3
  5. Skein
  6. BLAKE2
  7. Any general purpose hashing function.
  8. Any encryption algorithm.
  9. Your own design.
  10. Plaintext

{ 2 } Comments

  1. Sandra | July 5, 2016 at 5:36 am | Permalink

    If cost is not an issue, what are your thoughts about using two hashing functions. First scrypt, and then either another pw hash func or an enc hash func, if the output from a pw hash func cannot be treated as a pw when hashing?

  2. Aaron Toponce | July 6, 2016 at 9:01 am | Permalink

    Never roll your own. This is the problem Ashley Madison ran into. Although they deployed bcrypt with an acceptable cost factor of 12, they subverted it through encrypted MD5 tokens, which allowed password crackers to quickly tear through the password hashes. If cost isn't a factor, then increase it. Don't roll your own.

Post a Comment

Your email is never published nor shared.