Before we start delving into the more obscure attacks, it makes sense to get introduced to the most common ones. The dictionary attack is one of them. Previously we talked about the brute force attack, which is highly inefficient, exceptionally slow, and expensive to maintain. Here, we'll introduce a much more effective attack, one that can crack passwords of 15 characters and longer with ease.
The dictionary attack is another dumb search, except for one thing: it assumes that people choose passwords based on dictionary words, because adding mutations to the password requires more work, is more difficult to remember, and is more difficult to type. Because humans are largely lazy by default, we take the lazy approach to password creation: base it on a dictionary word, and be done with it. After all, no one is really going to hack my account. Right?
A couple of years ago, during the height of the Sony PlayStation 3 hacking saga, 77 million PlayStation Network accounts were leaked. These were accounts from all over the globe. Worse, SONY STORED 1 MILLION OF THOSE PASSWORDS IN PLAINTEXT! I'll let that sink in for a minute. These 1 million passwords were leaked to BitTorrent, so we can do some analysis on the passwords themselves, such as their length and difficulty. Troy Hunt did some amazing work analyzing the passwords, so this data should be credited to him, but let's look it over:
- 93% of the passwords were between 6 and 10 characters long.
- 50% of the passwords were less than 8 characters.
- Only 4% of the passwords used 3 or more of the following character sets: numbers, uppercase, lowercase, everything else.
- 45% of the passwords were lowercase only.
- 99% of the passwords did not contain non-alphanumeric characters.
- 65% of the passwords can be found in a dictionary.
- Within Sony, there were two separate sets of accounts: "Beauty" and "Delboca". Where the same email address appeared in both, 92% of those accounts used the same password in both.
- Comparing the Sony and Gawker hacks, where there was a common email address, 67% of those accounts used the same password.
- 82% of the passwords would fall victim to a Rainbow Table Attack (something we'll cover later).
In the cryptography circles I run in, these numbers are not very surprising. Finding 65% of the passwords in a dictionary is actually a bit low. I've seen the average sit closer to 70%, which is troubling. This means that a dictionary attack is extremely effective, no matter how long your password is. If it can be found in a dictionary, you'll fall victim.
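To make the idea concrete, here is a minimal sketch of a dictionary attack in Python. It assumes the leaked hash is an unsalted SHA1 digest stored as hexadecimal text. The function names, the tiny word list, and the "leaked" hash are all made up for illustration; real cracking tools are far more elaborate.

```python
import hashlib

def sha1_hex(text):
    """Return the SHA1 digest of text as a hexadecimal string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def dictionary_attack(target_hash, words):
    """Hash each candidate word; return the first one that matches, else None."""
    for word in words:
        candidate = word.strip()  # drop the trailing newline when reading a file
        if sha1_hex(candidate) == target_hash:
            return candidate
    return None

# Hypothetical leaked hash, and a tiny stand-in for a real word list.
leaked = sha1_hex("sunshine")
wordlist = ["password", "monkey", "sunshine", "dragon"]
print(dictionary_attack(leaked, wordlist))  # prints: sunshine
```

Because the function only iterates over its input, an open file handle on a word list with one word per line could be passed in place of the Python list.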
Creating a Dictionary
So, what exactly is a dictionary that can be used for this attack? Generally, it's nothing more than a word list, with one word on each line. Most Unix-like operating systems install a dictionary alongside a spell checking utility. It can be found in /usr/share/dict/words. On my Debian GNU/Linux system, I have about 100,000 words in the dictionary:
$ wc -l /usr/share/dict/words
99171 /usr/share/dict/words
But, I can install a much larger wordlist:
$ sudo aptitude install wamerican-insane
$ sudo select-default-wordlist
$ wc -l /usr/share/dict/words
650722 /usr/share/dict/words
Even though my word list has grown to more than 6 times its previous size, it still pales in comparison to some dictionaries you can download online. The Openwall word list contains 40 million entries, and is over 500 MB in size. It consists of words from more than 20 languages, and includes passwords generated with pwgen(1). It will cost you $27.95 USD for the download, however. There are plenty of other word lists all over the Internet. Spend some time searching, and you can build a decently sized word list on your own.
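If you collect several word lists, you'll want to combine them without duplicates. The following is only a sketch of one way to do that in Python. The function name and the sample words are hypothetical, and a list in the tens of millions of entries might call for an on-disk sort rather than an in-memory set.

```python
def merge_wordlists(*sources):
    """Combine several iterables of words into one sorted, deduplicated list."""
    seen = set()
    for source in sources:
        for line in source:
            word = line.strip()
            if word:  # skip blank lines
                seen.add(word)
    return sorted(seen)

# Stand-ins for the lines of two word list files.
system_words = ["apple\n", "banana\n", "cherry\n"]
leaked_words = ["banana\n", "letmein\n", "\n"]
print(merge_wordlists(system_words, leaked_words))
# prints: ['apple', 'banana', 'cherry', 'letmein']
```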
This opens the discussion for rainbow tables, something we'll cover later on. With a precomputed dictionary attack, I spend the time up front hashing every value in my dictionary, and store the results as key/value pairs, where the key is the hash and the value is the password. This can save considerable time for the password cracking utility when doing the lookup. It comes at a cost, however: I must first spend the time precomputing all the values in the dictionary. Once they are computed, I can reuse them over and over for my lookups as needed. Disk space is also a concern. For a SHA1 hash stored as hexadecimal text, you'll be adding 40 bytes to every entry. For the Openwall word list of 40 million entries, that is an extra 1.6 GB, which means your dictionary will grow from 500 MB to about 2 GB. Not a problem for today's storage, but certainly something you should be aware of.
Rainbow tables are a version of the precomputed dictionary attack that we'll look at later. The advantage of a rainbow table is a savings in disk space at the cost of somewhat longer lookup times. We still have precomputed hashes for dictionary words, but they don't occupy as much space.
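As a rough sketch of the precomputed approach, the table can be modeled in Python as an in-memory mapping from hash to password. The names here are my own, and a real table of 40 million entries would more likely live on disk. The size estimate simply adds 40 bytes of hexadecimal SHA1 text per entry to the size of the word list.

```python
import hashlib

def sha1_hex(text):
    """Return the SHA1 digest of text as a hexadecimal string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def precompute(words):
    """One-time cost: build a table mapping each hash back to its password."""
    return {sha1_hex(word): word for word in words}

def estimated_size_bytes(entries, wordlist_bytes, hash_hex_len=40):
    """Estimate table size: the word list plus one hex digest per entry."""
    return wordlist_bytes + entries * hash_hex_len

table = precompute(["password", "monkey", "sunshine"])

# Each lookup is now a single table access instead of a pass over the list.
print(table.get(sha1_hex("monkey")))       # prints: monkey
print(table.get(sha1_hex("not-in-list")))  # prints: None

# 40 million entries on top of a 500 MB word list, in GB.
print(estimated_size_bytes(40_000_000, 500_000_000) / 1e9)  # prints: 2.1
```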
Thwarting precomputed hashes can be accomplished by salting your password. I discuss password salts on my blog when discussing the shadowed password on Unix systems. A hashing function always produces the same output for a given input, so if the input changes, such as by adding a salt, the output changes too. Even though your password is the same, appending a salt to it makes the computed hash completely different. Even if I have your salt in my possession, precomputed dictionary attacks are of no use, because the salt for each account will likely be different. That means I would need to precompute a separate dictionary for every salt, a very costly task in both CPU time and disk space.
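Here is a small sketch of why salting defeats a precomputed table. It assumes the salt is simply appended to the password before a single round of SHA1, as described above. That is a simplification: the shadowed password schemes on Unix are more involved. The salts and names are invented for the example.

```python
import hashlib

def salted_sha1_hex(password, salt):
    """Hash the password with the salt appended to it."""
    return hashlib.sha1((password + salt).encode("utf-8")).hexdigest()

# A precomputed table built from unsalted hashes.
unsalted_table = {
    hashlib.sha1(word.encode("utf-8")).hexdigest(): word
    for word in ["password", "sunshine"]
}

# Two accounts with the same password but different (hypothetical) salts.
hash_a = salted_sha1_hex("sunshine", "a1b2c3d4")
hash_b = salted_sha1_hex("sunshine", "9f8e7d6c")

print(hash_a != hash_b)          # prints: True, same password but different hashes
print(hash_a in unsalted_table)  # prints: False, the precomputed table misses
```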
However, if I have the salt in my possession, I can still combine it with each word in a standard word list to compute the desired hash. If I find a matching hash, I have found your password in my dictionary, even if I needed the salt to help me get there.
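The salted variant of the attack can be sketched the same way. Again, this assumes the salt is appended to the password before one round of SHA1, and the names, salt, and word list are hypothetical. The only change from a plain dictionary attack is that the known salt is mixed into every candidate before hashing.

```python
import hashlib

def salted_sha1_hex(password, salt):
    """Hash the password with the salt appended to it."""
    return hashlib.sha1((password + salt).encode("utf-8")).hexdigest()

def salted_dictionary_attack(target_hash, salt, words):
    """Try every word combined with the known salt; return the match or None."""
    for word in words:
        candidate = word.strip()
        if salted_sha1_hex(candidate, salt) == target_hash:
            return candidate
    return None

# Hypothetical account record: the salt is stored next to the hash.
salt = "a1b2c3d4"
leaked = salted_sha1_hex("dragon", salt)
print(salted_dictionary_attack(leaked, salt, ["password", "dragon", "monkey"]))
# prints: dragon
```

The work has to be repeated for every account with a different salt, which is exactly the cost that salting imposes on the attacker.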
Because 65% to 70% of people use dictionary words for their passwords, the dictionary attack is extremely attractive to attackers who have offline password databases. Even with the Openwall word list of 40 million words, most CPUs can exhaust that word list in seconds, meaning 70% of the passwords will be found in very little time with very little effort. Further, because 67% of the population or more use the same password across multiple accounts, if we know something about the accounts we've just attacked, we can use that information to log in to their bank, Facebook, email, Twitter and other accounts. For the effort, dictionary attacks are very valuable, and a first pick for many attackers.