It's important to understand that most of the password attacks to offline databases where only hashes are stored are extensions of either the brute force attack or the dictionary attack, or a hybrid combination of both. There isn't really anything new outside of those two basic attacks. The combination attack is one such attack where we combine two dictionaries to create a much larger one. This larger one becomes the basis for creating passphrase dictionaries.
Suppose we have two (very small) dictionaries. The first dictionary is a list of sizes while the second dictionary is a list of animals:
tiny small medium large huge
cat dog monkey mouse elephant
In order to combine these two dictionaries, we use a standard cross product between the two dictionaries. This means that there will be a total list of 25 words in our combined dictionary:
tinycat tinydog tinymonkey tinymouse tinyelephant smallcat smalldog smallmonkey smallmouse smallelephant mediumcat meduimdog mediummonkey mediummouse mediumelephant largecat largedog largemonkey largemouse largeelephant hugecat hugedog hugemonkey hugemouse hugeelephant
We have begun to assemble some crude rudimentary phrases. They may not make a lot of sense, but building that dictionary was cheap. I could create a script, similar to to the following, to have it build me that list:
while read A; do
while read B; do
done < /tmp/dict2.txt
done < /tmp/dict1.txt > /tmp/comb-dict.txt
Personal Attack Building
Let's extend the combination attack a bit to target personal data. Suppose dictionary one is a list of male names and dictionary two is a list of four-digit years, starting from year 0 through year 2013. We can then create a combined dictionary that has a list of males and their birth (or death) years. For those passwords that don't pass simple dictionary attacks, maybe these hashes are passwords of when their kid was born, or when their dad died. I've shoulder surfed a number of different people, and watched as they type in their password, and you would be surprised how many passwords meet this exact criteria. Something like "Christian1995".
Now that I have you thinking, it should be obvious now that we can create all sorts of personalized dictionaries. How about a list of male names, a list of female names, a list of surnames, and a list of dates in MMDDYY format? Or how about a list of cities, states, universities, sports teams? Lists of "last 4 digits of your SSN", fully qualified domain names of popular websites, and common patterns on keyboards, such as "qwert", "asdf" and "zxcv", or "741852963" (I've seen this hundreds of times, where people just swipe their finger down their 10-key).
Even though the success rate of getting a personalized password out of one of these dictionaries might be low compared to the common dictionary attack, it's still far more efficient than the brute force, and it's so trivial to make these dictionaries, that I could have one server continuing to create combination dictionaries, while another works on the dictionary lists that are produced.
Extending the Attack
There are always sites that are forcing numbers on you, as well as upper case characters, and non-alphanumeric characters. Believe it or not, but just adding a "0" or a "1" at the beginning and end of these dictionaries can greatly improve my chances of discovering your password. Or, even just adding an exclamation point at the beginning or end might be all that's needed to extend a standard dictionary.
However, we can get a bit more creative with little cost. Rather than just concatenating the words together, we can insert the dash "-" and the underscore "_" between our concatenated words. These are common characters to use when forced to use non-alphanumeric characters in passwords that must be 8 characters or longer. Passwords such as "i-love-you", or "Alice_Bob" are common ways to satisfy the requirement, as well as keeping the password easy to remember. And, of course, we can append "!" or "0" to our password dictionary for the numerical requirement.
According to Wikipedia, the Oxford English Standard Dictionary, which is the Go To for all things English definitions, is the following size:
As of 30 November 2005, the Oxford English Dictionary contained approximately 301,100 main entries.
This is smaller than what most Unix-like operating systems will ship, but it's targeted enough, that most people if relying on dictionary words for their passwords, won't deviate much from. So, creating a full three-word passphrase dictionary would consist of only 90,661,210,000 entries. Assuming the average word length is 9 characters long, plus the newline character, this would be about a 900 GB file. Certainly nothing to laugh at, but with compression, we should be able to get this down do about 250 GB, give or take. Now let's generate two or three of those files with various combinations of word separation or prepending/appending numbers or non-alphanumeric characters, and we have a great attack vector starting point.
It's important to note that these files only need to be generated once, then stored for long-term use. The cost of generating these files initially will be high, of course. For only $11,000 USD, you can purchase a Backblaze storage pod with 180 TB of raw disk to store these behemoth files, and many combinations of them. It may be a large initial expense, but the potential value of a cracked password may be worth it in the end- something many attackers and their parent companies might just consider. Further, you may even find some of these dictionaries online.
We must admit, that if we have gotten to this point in the game, we are getting desperate for the password. This attack is certainly more effective than the brute force, and it won't take us long to exhaust these combined dictionaries. However, the rate at which we pull passwords out of them will certainly be much smaller than our success for finding 70% of the passwords with a plain dictionary. That doesn't mean it's not worth it. Even if we find only 10% of the passwords out of our hashed database, that's great success for the effort put into it.