On the Internet, mostly in crypto circles, you'll see something like the following in a comment, forum post, on a mailing list, other otherwise:
Do not use fast hashes to store passwords on disk. Use bcrypt.
In most cases, however, the understanding of why to use bcrypt isn't entirely clear. You'll hear the standard answer "It's slow", without a real understand as to "how slow?" nor as to "why is it slow?". In this post, I want to explain why bcrypt is slow, some misconceptions about using fast hashes, and where the real strength of bcrypt lies (hint- it's not speed). Finally, I'll close with an alternative that many are starting to talk about as a possible replacement to bcrypt.
First, when people are talking about using bcrypt for password hashing, they are referring to the bcrypt cryptographic key derivation function, designed by Niels Provos and David Mazières. Bcrypt is designed to be intentionally slow and expensive. It was designed specifically with password storage in mind. The motivation is clear- if a password database of any kind is leaked to the Internet, it should be cost prohibitive for password crackers to make any sort of progress recovering the unknown passwords from the known hashes.
How does bcrypt work though? What is the algorithm? According to the paper, the core bcrypt function in pseudocode is as follows:
bcrypt(cost, salt, input) state = EksBlowfishSetup(cost, salt, input) ctext = "OrpheanBeholderScryDoubt" //three 64-bit blocks repeat (64) ctext = EncryptECB(state, ctext) //encrypt using standard Blowfish in ECB mode return Concatenate(cost, salt, ctext)
The first function, "EksBlowfishSetup(cost, salt, input)" in the algorithm is defined as follows:
EksBlowfishSetup(cost, salt, key) state = InitState() state = ExpandKey(state, salt, key) repeat (2^cost) // exponential cost by powers of 2 state = ExpandKey(state, 0, key) state = ExpandKey(state, 0, salt) return state
In the "EksBlowfishSetup", you'll notice the "repeat" step uses a binary exponential parameter. As the cost is increased, the time it will take to finish the algorithm will take exponentially longer. Bcrypt was designed with this cost parameter to adjust for Moore's law. As computing strength continues to improve, bcrypt should be flexible in its design to adjust for those advancements. This is why the cost parameter is baked into bcrypt, and why people call it "slow".
Finally, you'll notice the "ExpandKey(state, salt, key)" function in the algorithm. It is defined as follows:
ExpandKey(state, salt, key) for(n = 1..18) P_n key[32(n-1)..32n-1] XOR P_n //treat the key as cyclic ctext = Encrypt(salt[0..63]) P_1 = ctext[0..31] P_2 = ctext[32..63] for(n = 2..9) ctext = Encrypt(ctext XOR salt[64(n-1)..64n-1]) //encrypt using the current key schedule and treat the salt as cyclic P_2n-1) = ctext[0..31] P_2n = ctext[32..63] for(i = 1..4) for(n = 0..127) ctext = Encrypt(ctext XOR salt[64(n-1)..64n-1]) //as above S_i[2n] = ctext[0..31] S_i[2n+1] = ctext[32..63] return state
Because bcrypt was based on Blowfish, the "ExpandKey(state, 0, key)" function used in the "EksBlowfishSetup" function is the same as regular Blowfish key schedule since all XORs with the all-zero salt value are ineffectual. The bcrypt "ExpandKey(state, 0, salt)" function is similar, but uses the salt as a 128-bit key.
Also, to clarify, a 128-bit salt is also baked into the algorithm, as you can see. This is to prevent the building of lookup tables for bcrypt, such as rainbow tables. Salts do not slow down crackers, and it's assumed that salts will be leaked with the database. All salts provide is the protection against using a hash lookup table to find the originating plaintext. Because salts are baked into bcrypt, bcrypt lookup tables will never exist. This forces password crackers to brute force the hash.
Understanding password security
There are a few key security elements related to passwords that you must understand. They are the following:
- The unpredictability measurement, aka "entropy", of the password provided by the user.
- The speed at which brute forcing passwords can commence.
- The cryptographic strength of the function itself.
I ordered these for a specific reason- the most likely "weak link" in the chain of password security is password the user provides. History of leaked password databases have shown us that. If users understood real strength behind passwords, they would understand the basic concepts of entropy, even if they weren't familiar with the term itself. If entropy levels were high in all user's passwords, no matter what, then the success of recovering passwords from hashes via brute force would be ineffective. But, 70-80%, and better, of password databases are recovered, because of this simple concept not getting applied. The speed at which password crackers brute forced their way through the hashes in the database would no longer matter, because no amount of practical computing power would be able to work fast enough within the death of the Universe, to recover the user's password.
Sadly, this just isn't the case. People suck as picking passwords.
So, we need to compensate for users picking bad passwords, and bcrypt makes a great leap in this regard. Because of the cost parameter which is part of the algorithm, we can adjust the cost to make password hashing intentionally slow. And, as computing power increases, the cost parameter can continue to be adjusted to compensate. This is what most people understand, when they claim that "bcrypt is slow".
The argument is that cryptographic hashes are designed to be fast, fast, fast. And they're right. Cryptographic hash functions are designed to provide data integrity regardless of the size of the input. If I have a 4.7 GB CD image, I should be able to calculate its digest in reasonable time, so when I transfer the image to another computer, I can recalculate the digest, and compare that the two digests match, in reasonable time.
This would seem like a Bad Thing for password storage, because passwords are short (much shorter than 4.7 GB at least), so password crackers would be able to guess millions or billions of passwords per second using a fast cryptographic hash. You're right, for the most part. Ars Technica ran a story on password cracking with a 25-GPU cluster. It achieves a speed of 350 billion NTLM passwords per second, which means every conceivable Windows NTLM password can be recovered in less than 6 hours using this behemoth. It can work MD5 at 180 billion per second, or 63 billion per second with SHA-1.
At these speeds, the argument against using fast cryptographic hashes to store passwords is sounding pretty strong. Except, that Ars Technica article, and most bcrypt fanboys seem to overlook one thing- key stretching. Just because cryptographic hashes are fast, doesn't mean we can't intentionally slow them down. This is key stretching.
Key stretching is the idea that you reuse internal state for calculating a new key. For cryptographic hash functions, this is "iterations" or "rotations". The idea is taking the cryptographic digest of the input, and using this digest as the input for another hashing round. In pseudocode, you could think of it like this:
salt = random() // programmatically determine a salt randomly password = input() // get the password from the user key = '' cost = 5000 for ROUND in 1 to cost: do digest = SHA512(salt, password, key) key = digest done
If our 25-GPU cluster could work through 50 billion SHA-512 cryptographic hashes per second, by forcing 5,000 SHA-512 calculations before getting to the desired hash, our 25-GPU cluster can now only work through 10 million SHA-512 hashes per second. As the iterative count is increased, the time it takes to calculate the resulting digest increases. As such, we have created a "sha512crypt" that has a similar cost parameter as bcrypt. Now the question remains- does it hold up?
To see if this "key stretching" idea holds up, I wrote two Python scripts- one using SHA-512, and the other using bcrypt. In both cases, I increase the cost parameter from a reasonable starting point, and increased it well beyond a reasonable expectation.
Here is my Python code for "test-sha512.py":
password = b'password'
cost = 5000
key = ''
m = hashlib.sha512()
for i in xrange(cost):
key = m.digest()
And here is my Python code for "test-bcrypt.py":
cost = 6
password = b'password'
salt = bcrypt.hashpw(password,bcrypt.gensalt(cost))
hash = bcrypt.hashpw(password, salt)
In both cases, I incremented the cost, then timed re-running the script. Of course, Python is an interpreted language, so the absolute times would be much lower if this were implemented in C, or assembly. Further, this was done on my aging T61 laptop. Running it on my 8-core i7 workstation with triple-channel DDR3 would show improved times. It not the times that are critical. What is critical is seeing the exponential back-off as the cost is increased.
Here is a table detailing my findings. Notice that the bcrypt cost increments by a single digit. Because it's binary exponential back-off, the times increase by a power of 2 at each iteration. I also adjusted the sha512crypt cost a little to more closely match the bcrypt timings, even though it's not a strict doubling of each cost value.
In the Python bcrypt implementation, the default cost is "10". For most modern GNU/Linux operating systems, when storing the user password in /etc/shadow with sha512crypt (yes, I didn't come up with the name), the default cost is 5,000 iterations. In both these cases, the cost can be adjusted. In the case of the Python bcrypt module, it's just passing the function with a numerical argument. In the case of GNU/Linux, it's editing PAM by adding "rounds=" to a config file.
As such, sha512crypt can be just as slow as bcrypt. Remember, we are trying to adjust for increased computing power that password crackers will have access to. In both cases, bcrypt and sha512crypt address that requirement.
Bcrypt's additional strength
So, if sha512crypt can operate with a cost parameter similar to bcrypt, and can provide that exponential back-off that we are looking for to slow down password brute force searching, then what's the point of bcrypt? Are there any advantages to running it? It turns out, there is, and I suspect this is a consequence of the design, and not something that was intentionally added.
What we would like is to prevent password crackers from using non-PC hardware on attacking the password database. SHA-2 functions, such as SHA-512, work very well on GPUs. SHA-2 functions work well on specialized hardware such as ASICs and FPGAs. As such, while we could make things slow for CPU or GPU crackers, those password crackers with specialized hardware would still have an upper hand on attacking the password database. Further, by addressing GPU cracking, and making it intentionally slow there, we make like more difficult for CPUs, which means hurting the honest user when trying to login to your web application. In other words, if I adjusted sha512crypt for GPU crackers, such that only 1,000 passwords per second could be achievable on a GPU, that might be a full second, or more, for the user logging into your CPU server. This may or may not be desirable.
So, how does bcrypt attack this problem? Part of the algorithm requires a lookup table stored in RAM that is constantly modified during execution. This turns out to work out very well on a standard PC where the CPU has exclusive access to RAM. This turns out to work out fairly poorly on a GPU, where the cores share the on-board memory, and each core must compete on the bus for access to those registers. As a result, any additional cracking speed is greatly minimized on a GPU when compared to the CPU. In other words, GPU password cracking with bcrypt isn't entirely effective either.
For SHA-2 functions, like SHA-512 however, this is not the case. SHA-2 functions use only 32-bit logic and arithmetic operations, which GPUs excel at. By using a GPU over a CPU for our sha512crypt function, a password cracker can get a couple to many orders of magnitude of additional cracking power.
So, the reason to use bcrypt isn't because "it's slow". The reason to use bcrypt is because "it's ineffective on GPUs".
A better alternative- scrypt
Unfortunately for bcrypt, however, due to its low memory requirement, bcrypt can be implemented in a field programmable gate array (FPGA) or custom ASICs. Understand that bcrypt was designed in 1999, when such specialized hardware had low gate counts, and was few and far bewteen. 15 years later, times have drastically changed. Bitcoin ASICS with SHA-256 FPGAs on board are common place. Hardware AES is common in CPUs and embedded systems. The fact of the matter is, these FPGAs, with their onboard, and fast RAM are well suited to bring bcrypt password cracking well into "fast" territory.
An alternative would be a solution that not only requires address registers to be constantly modified during algorithm execution, but to also exponentially bloat the memory requirement for the increased cost. scrypt addresses this shortcoming in bcrypt. Scrypt is another password key derivation function that was initially designed in 2009 by Colin Percival as part of the Tarsnap online backup service.
Scrypt has all of the advantages that bcrypt provides- baked in salt, exponential cost parameter, and ineffectiveness on GPUs, while also adding an exponential RAM requiremnt per the cost. Because of this RAM requirement, it is no longer cost efficient to build FPGAs with the necessary RAM.
Security, standards, and implementations
Scrypt is only 5 years young. This gives bcrypt a good 10 year head start. In terms of security, this is preferred. We want cryptographic functions that have withstood the test of time with cryptographers constantly attacking and analyzing their functions, primitives, and implementations. The longer it remains "unbroken", the more "secure" we deem the functions to be.
Bcrypt continues to be attacked and analyzed, and is showing no serious sign of weakness, 15 years later. This is good for the security of bcrypt. Scrypt however has seen less scrutiny, mostly due to its young age. However, it has been 5 years, and like bcrypt, no serious signs of weakness have been shown. By comparison, the SHA-2 family of functions was created in 2001, and has been scrutinized much more than bcrypt and scrypt combined, and also is not showing any serious signs of weakness. So, from a security standpoint, the SHA-2, bcrypt, and scrypt functions all seem to be fairly secure.
When looking at governing body standards, NIST has no paper on bcrypt or scrypt. They do recommend using PBKDF2 (another key derivation function (which I haven't explained here, but love)) for password storage, and NIST has standardized on SHA-2 for data integrity. Personally, while I like the ideas of bcrypt and scrypt, I would recommend sticking with the NIST recommendations with high iteration counts, as shown above. Until we see more standards boy interest in bcrypt and scrypt, IMO, you are taking a risk using them for password storage (less so for bcrypt than scrypt at least).
Finally, because of the newness of scrypt, there are less tools for its use in programming languages than bcrypt, and even more so for SHA-2. Further, most programming languages don't include either bcrypt or scrypt in their "standard library" of modules or functions, while SHA-2 is more generally found. And for those implementations, some 3rd party libraries are more trust worthy than others. Because you're dealing with password storage, it's critical you get this right.
While I love the algorithms behind bcrypt and scrypt, I've always advocated for using high iterative counts on the SHA-2 or PBKDF2 functions. Even further is the advocating of teaching people how to understand the concepts behind their password entropy, and improving their own online security. That is the weakest link, and needs the most work, IMO.
So, if you ask me, I'll tell you to use either PBKDF2 or SHA-2 with high iterative counts. Or, if you must absolutely use bcrypt, then I'll recommend that you use scrypt instead.