Random Passphrases Work, Even If They're Built From Known Passwords

Just this morning, security researcher Troy Hunt released a ZIP containing 306 million passwords that he's collected over the years through his ';--have i been pwned? service. As an extension, he created a service where you can provide either a password or a SHA-1 hash to see if your password has been pwned.

Screenshot showing the entry text field to test your password from the new password checking service Troy Hunt has provided.

In 2009, the social network RockYou was breached, and 32 million usernames and passwords were released to the public Internet. No doubt those 32 million passwords are included in Troy Hunt's password dump. What's interesting, and the point of this post, is that individually, each password from the RockYou breach will fail his check.

Collage of six screenshots of individual RockYou passwords failing Troy Hunt's password check. They were: caitlin, peachy, keeley, doreen, ursulet, & juggalo.

However, what would happen if you took 6 random RockYou passwords and created a passphrase? Below is a screenshot demonstrating just that, using the above 6 randomly chosen RockYou passwords. Individually, they all fail. Combined, they pass.

Screenshot showing how creating the passphrase 'caitlinpeachykeeleydoreenursuletjuggalo' passes Troy Hunt's check.

Now, to be fair, I'm choosing these passwords from my personalized password generator. The list is the top 7,776 passwords from the 32 million RockYou dump. As such, you could use this list as a Diceware replacement with 5 dice. Regardless, each password is chosen at random from that list, and enough passwords are chosen to reach a 70-bit entropy target, which happens to be 6 passwords. Mandatory screenshot below:

Screenshot showing my password generator using the RockYou passwords as a source for a strong passphrase. One screenshot is hyphenating the passwords, the other is not.

The point of this post is that you can actually build a secure passphrase for sites and services using previously breached passwords as your word list source, in this case RockYou. The only conditions are that the word list is large enough to create a reasonable passphrase with few selections, and that the process picking the passwords for you is cryptographically random.
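
For illustration, here is a minimal Python sketch of the same idea, assuming a file "rockyou-top-7776.txt" (a hypothetical name) holds the top 7,776 RockYou passwords, one per line:

import math
import secrets

# Hypothetical word list: the top 7,776 RockYou passwords, one per line.
with open("rockyou-top-7776.txt") as f:
    words = [line.strip() for line in f if line.strip()]

bits_per_word = math.log2(len(words))    # ~12.9 bits for 7,776 entries
count = math.ceil(70 / bits_per_word)    # 6 selections for a 70-bit target

# secrets.choice() draws from the OS CSPRNG, satisfying the requirement
# that the selection process be cryptographically random.
passphrase = "".join(secrets.choice(words) for _ in range(count))
print(passphrase)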

Electronic Slot Machines and Pseudorandom Number Generators

TL;DR

An Austrian casino company used a predictable pseudorandom number generator, rather than a cryptographically secure one, and people are taking advantage of it, and cashing out big.

The Story

Wired reported on an amazing operation for beating electronic slot machines: hold your phone to the slot machine screen for a while as you play, leave the machine, then come back later and cash in big.

Unlike most slots cheats, he didn’t appear to tinker with any of the machines he targeted, all of which were older models manufactured by Aristocrat Leisure of Australia. Instead he’d simply play, pushing the buttons on a game like Star Drifter or Pelican Pete while furtively holding his iPhone close to the screen.

He’d walk away after a few minutes, then return a bit later to give the game a second chance. That’s when he’d get lucky. The man would parlay a $20 to $60 investment into as much as $1,300 before cashing out and moving on to another machine, where he’d start the cycle anew.

These machines were made by the Austrian company Novomatic, and when Novomatic engineers learned of the problem, the best explanation they could come up with after a deep investigation was that the random number generator in the machines was predictable:

Novomatic’s engineers could find no evidence that the machines in question had been tampered with, leading them to theorize that the cheaters had figured out how to predict the slots’ behavior. “Through targeted and prolonged observation of the individual game sequences as well as possibly recording individual games, it might be possible to allegedly identify a kind of ‘pattern’ in the game results,” the company admitted in a February 2011 notice to its customers.

The article, focused on a single incident in Missouri, mentions that the state vets the machines before they go into production:

Recognizing those patterns would require remarkable effort. Slot machine outcomes are controlled by programs called pseudorandom number generators that produce baffling results by design. Government regulators, such as the Missouri Gaming Commission, vet the integrity of each algorithm before casinos can deploy it.

On random number generators

I'll leave you to read the rest of the article. Suffice it to say, the Novomatic machines were using a pseudorandom number generator whose output becomes predictable after observing it for a period of time. This poses some questions that should immediately start popping up in your head:

  1. What is the vetting process by states to verify the quality of the pseudorandom number generators in slot machines?
  2. Who is on that vetting commission? Is it made up of mathematicians and cryptographers? Or just a board of executives and politicians?
  3. Why aren't casino manufacturers using cryptographically secure pseudorandom number generators?

For me, that third item is the most important. No doubt, as the Wired article states, older machines just cannot be fixed. They need to be taken out of production. So long as they occupy casinos, convenience stores, and gas stations, they'll be attacked, and the owner will lose money. So let's talk about random number generators for a second, and see what the gambling industry can do to address this problem.

You can categorize random number generators into four categories:

  1. Nonsecure pseudorandom
  2. Cryptographically secure pseudorandom
  3. Chaotic true random
  4. Quantum true random

What I would be willing to bet is that most electronic machines out there use the "nonsecure pseudorandom" type of random number generator, and Novomatic just happened to pick a very poor one. Again, there likely isn't anything they can do about existing machines in production now, but what can they do moving forward? They should start using cryptographically secure pseudorandom number generators (CSPRNGs).

In reality, this is trivial. There are plenty of CSPRNGs to choose from. CSPRNGs can be broken down further into three subcategories:

  1. Designs based on cryptographic primitives.
  2. Number theoretic designs.
  3. Special-purpose designs.

Let's look at each of these in turn.

Designs based on cryptographic primitives.

These are generators that use things like block ciphers, stream ciphers, or hashing functions for the generator. There are some NIST and FIPS standardized designs:

  • NIST SP 800-90A rev. 1 (PDF): CTR_DRBG (a block cipher, such as AES in CTR mode), HMAC_DRBG (hash-based message authentication code), and Hash_DRBG (based on cryptographically secure hashing functions such as SHA-256).
  • ANSI X9.31 Appendix A.2.4: This is based on AES, and obsoletes ANSI X9.17 Appendix C, which is based on 3DES. It requires a high-precision clock to initially seed the generator. It was eventually obsoleted by ANSI X9.62-1998 Annex A.4.
  • ANSI X9.62-2005 Annex D: This standard defines an HMAC_DRBG, similar to NIST SP 800-90A, using an HMAC as the cryptographic primitive. It obsoletes ANSI X9.62-1998 Annex A.4, and also requires a high-precision clock to initially seed the generator.

It's important that these designs are backtracking resistant, meaning that if you know the current state of the RNG, you cannot reconstruct previous states of the generator. The above standards are backtracking resistant.
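
To make the backtracking-resistance idea concrete, here is a toy hash-based generator in the same spirit as the designs above (an illustrative sketch only, not the NIST Hash_DRBG): every output block is a hash of the internal state, and the state is then ratcheted forward with another hash, so knowing the current state does not reveal previous outputs.

import hashlib
import os

class ToyHashDRBG:
    """Toy hash-based generator (illustration only, not NIST SP 800-90A)."""

    def __init__(self, seed):
        self.state = hashlib.sha256(b"init" + seed).digest()

    def generate(self, nbytes):
        out = b""
        while len(out) < nbytes:
            # The output block is a hash of the state, never the state itself.
            out += hashlib.sha256(b"out" + self.state).digest()
            # Ratchet the state forward; the old state is unrecoverable,
            # which is what gives backtracking resistance.
            self.state = hashlib.sha256(b"next" + self.state).digest()
        return out[:nbytes]

drbg = ToyHashDRBG(os.urandom(32))
print(drbg.generate(16).hex())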

Number theoretic designs

There are really only two current designs, based on either the factoring problem or the discrete logarithm problem (a toy sketch of the first follows the list):

  • Blum-Blum-Shub: This is a generator based on the fact that it is difficult to compute the prime factors of very large composites (on the order of 200 or more digits in length). Due to the size of the prime factors, this is a very slow algorithm, and not generally practical.
  • Blum-Micali: This is a generator based on the discrete logarithm problem: given two known integers "b" and "g", it is difficult to find "k" where "b^k = g". Like Blum-Blum-Shub, this generator is also very slow, and not generally practical.
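
As a toy illustration of Blum-Blum-Shub, here is a sketch with absurdly small parameters, chosen only to show the mechanics (real deployments use primes hundreds of digits long):

# x_{i+1} = x_i^2 mod n, where n = p*q and p, q are primes congruent to 3 mod 4.
# Each step leaks only the least significant bit of the state.
p, q = 499, 547        # toy primes, both 3 mod 4 -- far too small for real use
n = p * q
x = 12345              # seed, must be coprime to n

def bbs_bits(x, n, count):
    bits = []
    for _ in range(count):
        x = pow(x, 2, n)
        bits.append(x & 1)
    return bits, x

bits, x = bbs_bits(x, n, 32)
print("".join(str(b) for b in bits))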

Special-purpose designs

Thankfully, there are plenty of special-purpose designs from cryptographers, either stream ciphers that can be trivially repurposed as a CSPRNG, or deliberately designed CSPRNGs:

  • Yarrow: Created by cryptographer Bruce Schneier (deprecated by Fortuna)
  • Fortuna: Also created by Bruce Schneier, and obsoletes Yarrow.
  • ISAAC: Designed to address the problems in RC4.
  • ChaCha20: Designed by cryptographer Daniel Bernstein, our crypto Lord and Savior.
  • HC-256: The 256-bit alternative to HC-128, which is part of the eSTREAM portfolio.
  • eSTREAM portfolio: 7 algorithms (3 hardware, 4 software)
  • Random123 suite: Contains four highly parallelizable counter-based algorithms, only two of which are cryptographically secure.

The solution for slot machines

So now what? Slot machine manufacturers should be using cryptographically secure algorithms in their machines, full stop. To be cryptographically secure, the generator:

  • Must pass the next-bit test (you cannot predict the next bit any better than 50% probability).
  • Must withstand a state compromise (you cannot reconstruct past states of the generator based on the current state).

If those two properties are met in the generator, then the output will be indistinguishable from true random noise, and the generator will be unbiased, preventing an adversary, such as someone with a cellphone monitoring the slot machine, from getting the upper hand and prematurely cashing out.

However, the question should then be raised: how do you properly seed the CSPRNG before release, so it starts in an unpredictable state? Easy; you have two options here:

  • Seed the CSPRNG with a hardware true RNG (HWRNG), such as a USB HWRNG, or....
  • Build the machine such that it collects environmental noise as entropy

The first point is much easier to achieve than the second. Slot machines likely don't have a lot of interrupts built into the system-on-a-chip (SoC). So aside from a microphone, video camera, or antenna recording external events, you're going to be hard-pressed to get any sort of high-quality entropy into the generator. USB TRNGs are available all over the web, and they're cheap. When the firmware is ready to be deployed, read 512 bits out of the USB generator, hash it with SHA-256, and save the resulting hash on disk as an "entropy file".
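
As a sketch of that provisioning step (the device path and entropy-file location here are assumptions for illustration, not anything a particular manufacturer uses):

import hashlib

HWRNG = "/dev/hwrng"                     # assumed path of the USB TRNG's device node
ENTROPY_FILE = "/var/lib/slot/entropy"   # hypothetical location for the entropy file

# Read 512 bits from the hardware generator, hash down to 256 bits, and
# store the digest as the machine's initial entropy file.
with open(HWRNG, "rb") as rng:
    raw = rng.read(64)                   # 64 bytes = 512 bits

with open(ENTROPY_FILE, "wb") as f:
    f.write(hashlib.sha256(raw).digest())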

Then all that is left is when the slot machine boots up and shuts down:

  • On startup, read the "entropy file" saved from the previous shutdown, to seed the CSPRNG.
  • On shutdown, save 256-bits of data out of the generator to disk as an "entropy file".

This is how most operating systems have solved the problem with their built-in CSPRNGs. Provided that the very first "entropy file" was initially seeded with a USB true HWRNG, the state of every slot machine will always be different, and will always be unpredictable. Also, 256 bits is more than sufficient to make sure the initial state of the generator is unpredictable; physics proves it.

Of course, the SoC could have a HWRNG onboard, but then you run the risk of hardware failure, and the generator becoming predictable. This risk doesn't exist with software-based CSPRNGs, so provided you can always save the state of the generator on disk at shutdown, and read it on startup, you'll always have an unpredictable slot machine.

Webcam Random Number Generation

A couple weeks ago, I purchased a lava lamp for $5 at a thrift store. It was in brand spanking new condition, and worked like a charm. The only thing going through my head at the time? I can't wait to point my webcam at it, and start generating some random numbers! Okay, well that, and mood lighting for the wife.

Anyway, I wrote a quickie Python script which will capture a frame from the webcam, hash it with a keyed BLAKE2, and output the result to a FIFO file to be processed. The BLAKE2 digest of the frame also becomes the key for the next BLAKE2 instance, making this script very CBC-like in execution (the first iteration is keyed from /dev/urandom, and each digest keys the next iteration).

#!/usr/bin/python

# Create true random seeds (near as we can tell) with your webcam.
#
# This script will use your webcam pointed at a source of entropy, keyed with
# random data from the OS CSPRNG. You could point the camera at:
#
#   * Lava lamps
#   * Plasma globes
#   * Double pendulums
#   * Rayleigh-Benard convection
#   * Brownian motion
#
# Performance is ~ 2 KiB/s.
# Requires pyblake2: https://pypi.python.org/pypi/pyblake2
#
# Released to the public domain.

import os
import cv2
import pyblake2

cap = cv2.VideoCapture(0)
webcamfile = '/tmp/webcamfile.fifo'
key = os.urandom(64)

try:
    os.mkfifo(webcamfile)
except OSError, e:
    # Bail out if the FIFO cannot be created (e.g. it already exists),
    # otherwise the write loop below would fail with an undefined 'fifo'.
    print "Cannot create FIFO: {0}".format(e)
    raise SystemExit(1)
else:
    fifo = open(webcamfile, 'w+')

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Key BLAKE2 with the previous digest (the first pass is keyed from
    # /dev/urandom), rather than hashing the key as data.
    b2sum = pyblake2.blake2b(key=key)
    b2sum.update(frame)
    digest = b2sum.digest()
    key = digest

    fifo.write(digest)
    fifo.flush()

    cv2.imshow('webcamlamp', frame)
    k = cv2.waitKey(1) & 0xFF
    if k == 27:
        break

fifo.close()
os.remove(webcamfile)
cap.release()
cv2.destroyAllWindows()

As you'll notice in the code, you should point your webcam at a source of either chaotic randomness, like a lava lamp, or quantum randomness, like a plasma globe. Because the frame is whitened with a keyed BLAKE2, it could be considered a true random number generator, or you could use it as a seed for a cryptographically secure pseudorandom number generator, such as those shipped with modern operating systems. If you do use this as a TRNG, realize that it's slow: it only operates at about 2 KiB/s.

Here is a screenshot of the webcam itself looking at a USB desk plasma globe, which you can purchase from ThinkGeek for $10.

Webcam view of a plasma globe in operation.

The data is sent to a FIFO in /tmp/. If you don't do anything with the data, and let the buffer fill, the script will hang, until you read data out of the FIFO. As such, you could do something like this to reseed your CSPRNG (of course, it's not increasing the entropy estimate, just reseeding the generator):

$ cat /tmp/webcamfile.fifo > /dev/random

Lava lamps and plasma globes are only the beginning. Anything quantum or chaotic that can be visually observed also works. Things like:

  • Double pendulums
  • Brownian motion
  • Rayleigh-Benard convection
  • CCD noise from the webcam itself
  • A bouncing ball on a sinusoidal vibrating table

So, there you have it. Plasma globes and lava lamps providing sufficiently random data via a webcam, either to be used as a secret seed, or as a TRNG itself. If you know of other systems worth pointing a webcam at, or have suggestions for improving the Python code, let me know in the comments.

How To Always Encrypt Chromium Saved Passwords On GNU/Linux - No Matter What

One of the things that has always bothered me about the Chromium project (the project the Google Chrome browser is based on) is that passwords are encrypted if, and only if, your operating system provides an authentication API through your account login. For example, on Windows, this is accomplished through the "CryptProtectData" function. This function uses your existing account credentials when logging into your computer as a "master key" to encrypt the passwords on your hard drive. For Mac OS X, this is accomplished with Keychain, and for GNU/Linux users, with KWallet if you're running KDE, or GNOME Keyring if you're running GNOME.

In all those cases, your saved passwords will be encrypted before getting saved to disk. But, what if you're like me, and do not fall into any of those situations? Now, granted, GNU/Linux and BSD users (you're welcome) make up about 3% of the desktop installs.

Graph showing operating system market share.

Of that 3%, although I don't have any numbers, maybe 2/3 run GNOME or KDE. That leaves 1 out of every 100 users where Chromium is not encrypting passwords on disk by default. For me, who lands in that 1%, this is unacceptable. So, I wanted a solution.

Before I go any further, let me identify the threat and adversary. The threat is offline disk analysis. I'm going to assume that you're keeping your operating system up-to-date with the latest security patches, and that your machine is not infected with malware. Instead, I'm going to assume that after you are finished using your machine, upgrading the hardware, or a hard drive fails, that the disk is discarded. I'm further going to assume that you either can't or didn't digitally wipe or physically destroy the drive once decommissioned. So, the threat is someone getting a hold of that drive, or laptop, or computer, and imaging the drive for analysis. This means that our adversary is a global adversary- it could be anyone.

Now, the obvious solution would be to run an encrypted filesystem on that drive. dm-crypt with or without LUKS makes this possible. But, let's assume you're not running FDE. Any options? In my case, I run eCryptfs, and store the Chromium data there, symbolically linking to it from the default location.

By default, Chromium stores its passwords in ~/.config/chromium/Default/Login\ Data. This is an SQLite 3.x database, and as mentioned, the passwords are stored in plaintext. A simple solution is to create an eCryptfs private directory, and symlink the database to that location. However, Chromium also stores cookies, caches, and other data in ~/.config/chromium/ that might be worth encrypting as well. So, you can just symlink the entire ~/.config/chromium/ directory to the eCryptfs mount.
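
Before moving anything, you can see for yourself that those saved passwords are sitting there unencrypted. A quick sketch (the column names assume Chromium's usual "logins" table layout, and Chromium must not be running while you poke at the database):

import os
import sqlite3

db = os.path.expanduser("~/.config/chromium/Default/Login Data")
conn = sqlite3.connect(db)

# Without KWallet or GNOME Keyring, password_value holds the plaintext password.
for origin, user, password in conn.execute(
        "SELECT origin_url, username_value, password_value FROM logins"):
    print(origin, user, password)

conn.close()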

I'll assume you've already set up eCryptfs and have it mounted to ~/Private/. If not, run the "ecryptfs-setup-private" command and follow the prompts, then run "ecryptfs-mount-private" to get it mounted to ~/Private/.

Make sure Chromium is not running and move the ~/.config/chromium/ directory to ~/Private/. Then create the necessary symlink, so Chromium does not create a new profile:

$ mv ~/.config/chromium/ ~/Private/
$ ln -s ~/Private/chromium/ ~/.config/

At this point, all your Chromium data is now stored in your eCryptfs encrypted filesystem, and Chromium will follow the symlink, reading and writing passwords in the encrypted mount. This means that whether you're using KWallet, GNOME Keyring, or nothing at all, your passwords will always be encrypted on disk. Of course, in the SQLite 3.x database, the passwords are still in plaintext, but the database file is encrypted in eCryptfs, thus giving us the security we're looking for.

However, there is a caveat which needs to be mentioned. The entire security of the encryption rests solely on the entropy of your eCryptfs passphrase. If that passphrase does not have sufficient entropy to withstand a sophisticated attack from a well-funded organization (our global adversary), then all bets are off. Essentially, this eCryptfs solution is acting like a "master password", and all encryption strength rests on your ability to use a strong password, as defined by Shannon entropy. Current best practice to guard against an offline password cracking attack is to pick a password with at least 128 bits of entropy. You can use zxcvbn.js from Dropbox to estimate your passphrase entropy, which I have installed at http://ae7.st/ent/ (no, I'm not logging passphrases- save the page offline, pull your network cable and run it locally if you don't believe me).

Linux Kernel CSPRNG Performance

I'm hardly the first one to notice this, but I was having a discussion in ##crypto on Freenode about the Linux kernel CSPRNG performance. It was mentioned that the kernelspace CSPRNG was "horrendously slow". Personally, I found the performance sufficient for my needs, but I decided to entertain his definition. I'm glad I did; I wasn't disappointed.

Pull up a terminal, and run the following command, passing 1 GB of data from /dev/urandom to /dev/null:

$ dd if=/dev/urandom of=/dev/null bs=1M count=1024 iflag=fullblock  
1024+0 records in  
1024+0 records out  
1073741824 bytes (1.1 GB) copied, 80.1537 s, 13.4 MB/s
$ pv < /dev/urandom > /dev/null # cancel in a different terminal, unless you have "-S"
1.02GB 0:01:20 [13.3MB/s] [                   < =>                              ]

13.4 MBps of throughput for reading data directly out of the kernelspace CSPRNG. But, can we do better?

In the ##crypto channel, as should be the case across development mailing lists, forums, groups, and discussion channels, I recommend that developers not develop their own userspace CSPRNG. There are all sorts of pitfalls and traps waiting for you when you attempt it. Unless you know what you're doing, you could end up with a CSPRNG that isn't actually cryptographically secure (the "CS" in "CSPRNG").

However, what happens when I do actually run a userspace CSPRNG on the same machine? What can I expect out of performance? For example, I could implement AES-128 in CTR mode as a CSPRNG. In fact, we can do this with OpenSSL:

$ dd if=/dev/zero bs=10M count=1024 iflag=fullblock 2> /dev/null | openssl enc -aes-128-ctr -pass pass:"sHgEOKTB8bo/52eDszkHow==" -nosalt | dd of=/dev/null
20971520+0 records in
20971520+0 records out
10737418240 bytes (11 GB) copied, 15.3137 s, 701 MB/s
$ openssl enc -aes-128-ctr -pass pass:"sHgEOKTB8bo/52eDszkHow==" -nosalt < /dev/zero | pv > /dev/null
31.9GB 0:00:34 [ 953MB/s] [                                  < =>               ]

700-950 MBps (notice that dd(1) incurs a performance penalty). That's 52-70x the speed of reading the kernelspace CSPRNG directly. That's more than a full order of magnitude faster. However, this is on a box with AES-NI. What about disabling AES-NI on the same box? How badly does it damage performance, and how does it compare to reading the kernelspace CSPRNG? We can use OpenSSL speed(1SSL) to benchmark algorithms.

First, with AES-NI enabled:

$ openssl speed -elapsed -evp aes-128-ctr 2> /dev/null  
(...snip...)
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-ctr     468590.43k  1174849.02k  1873606.83k  2178642.60k  2244471.47k

And with AES-NI disabled:

$ OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-ctr 2> /dev/null  
(...snip...)  
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-ctr      74272.21k    83315.43k   340393.30k   390135.47k   391279.96k

In this case, we see about a 5x performance improvement when using the AES-NI instruction set as compared to when not using it. That's significant. And even with AES-NI disabled in userspace, we're still outperforming /dev/urandom by almost 30x.

Interestingly enough, even the OpenBSD CSPRNG (different hardware than previously tested), which uses ChaCha20, outperforms the Linux CSPRNG (although its userspace CSPRNG with openssl(1) doesn't outperform kernelspace):

% dd if=/dev/urandom of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 13.630 secs (78775541 bytes/sec)
% dd if=/dev/zero bs=1M count=1024 2> /dev/null | openssl enc -aes-128-ctr -pass pass:"sHgEOKTB8bo/52eDszkHow==" -nosalt | dd of=/dev/null 
2097152+0 records in
2097152+0 records out
1073741824 bytes transferred in 33.498 secs (32052998 bytes/sec)
% openssl speed -elapsed -evp aes-128-ctr 2> /dev/null
(...snip...)
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-ctr      41766.37k    46930.74k    49593.54k    50669.32k    50678.33k

Roughly 78 MBps for OpenBSD on an Intel Xeon CPU running at 2.80GHz. Basically, six times the speed of the Linux kernel CSPRNG on an Intel Xeon CPU running at 2.67GHz.

So why is the Linux CSPRNG so slow? And, what can we do about it? Well, first, the kernel is using SHA-1 for its cryptographic primitive. In very loose terms, the CSPRNG hashes the input pool with SHA-1, and spits out the output to /dev/urandom. Its output is also its input, so it's digesting its own output.

But that's not all it's doing. The first function adds data into the input pool without increasing the entropy estimate. Then, after adding those bytes, the input pool is mixed with a Skein-like mixing function. Then some math is done to credit the entropy estimator, and the system is polled for data to add to the input entropy pool: things like disk IO, CPU timings, interrupts, and user activity. Finally, we're ready to hash the data. This is done by extracting the data out of the input pool, and hashing it with SHA-1. But we don't want any recognizable output, so the output is left-rotated and folded in half. Then, and only then, is the data ready for consumption.
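
As a loose illustration of that extract step (a simplified Python sketch of the idea only, not the kernel's actual random.c code):

import hashlib

def toy_extract(pool):
    """Loose illustration: hash the pool with SHA-1, mix the digest back
    into the pool, and fold the digest in half so the raw hash is never
    exposed. Not the actual kernel code."""
    digest = hashlib.sha1(bytes(pool)).digest()        # 20 bytes
    # Mix the digest back into the pool so the pool state moves forward.
    for i, b in enumerate(digest):
        pool[i % len(pool)] ^= b
    # Fold the 20-byte digest in half, XORing the two halves together,
    # yielding 10 bytes of output per extraction.
    return bytes(x ^ y for x, y in zip(digest[:10], digest[10:]))

pool = bytearray(64)           # toy input pool (the real pools are larger)
print(toy_extract(pool).hex())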

W.T.F.

Unfortunately, the Linux kernel CSPRNG is not based on any sound theoretical security design. It's very much a hodge-podge home-brew design by developers who think they know what they're doing, when in reality, they don't. In 2013, a security audit and analysis was performed on the Linux kernel CSPRNG (PDF), and concluded that not only is it not robust, but it has some weaknesses:

In the literature, four security notions for a PRNG with input have been proposed: resilience (RES), forward security (FWD), backward security (BWD) and robustness (ROB), with the latter being the strongest notion among them.

(...snip...)

Distributions Used in Attacks based on the Entropy Estimator As shown in Section 5.4, LINUX uses an internal Entropy Estimator on each input that continuously refreshes the internal state of the PRNG. We show that this estimator can be fooled in two ways. First, it is possible to define a distribution of zero entropy that the estimator will estimate of high entropy, secondly, it is possible to define a distribution of arbitrary high entropy that the estimator will estimate of zero entropy. This is due to the estimator conception: as it considers the timings of the events to estimate their entropy, regular events (but with unpredictable data) will be estimated with zero entropy, whereas irregular events (but with predictable data) will be estimated with high entropy.

(...snip...)

As shown in Section 5.7, it is possible to build a distribution D0 of null entropy for which the estimated entropy is high (cf. Lemma 3) and a distribution D1 of high entropy for which the estimated entropy is null (cf. Lemma 4). It is then possible to mount attacks on both /dev/random and /dev/urandom, which show that these two generators are not robust.

(...snip...)

We have proposed a new property for PRNG with input, that captures how it should accumulate the entropy of the input data into the internal state. This property actually expresses the real expected behavior of a PRNG after a state compromise, where it is expected that the PRNG quickly recovers enough entropy. We gave a precise assessment of Linux PRNG /dev/random and /dev/urandom security. In particular, we prove that these PRNGs are not robust. These properties are due to the behavior of the entropy estimator and the mixing function used to refresh its internal state. As pointed by Barak and Halevi [BH05], who advise against using run-time entropy estimation, we have shown vulnerabilities on the entropy estimator due to its use when data is transferred between pools in Linux PRNG. We therefore recommend that the functions of a PRNG do not rely on such an estimator.

Finally, we proposed a construction that meets our new property in the standard model and we showed that it is noticeably more efficient than the Linux PRNGs. We therefore recommend to use this construction whenever a PRNG with input is used for cryptography.

TL;DR? The Linux CSPRNG does not meet the definitions of a secure CSPRNG per the PDF. It's not that it's theoretically broken, it's just not theoretically secure either. It's really nothing theoretically at all. This isn't great.

A replacement for random.c in the kernel would be to ditch the homebrew entropy collection, mixing, and output mangling, and instead stick with AES-128 in CTR mode. Of course, as per the PDF, the entropy collectors need serious work, but if AES-128-CTR were deployed as the CSPRNG instead of SHA-1, then the generator could take advantage of hardware AES performance, which, as I've shown, is exceptionally superior. It's frustrating, because the kernel already ships AES, so the code is already there. It's just not being utilized.
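
To sketch what such a generator looks like in userspace (an illustrative sketch using the Python "cryptography" package; the class name and the lack of periodic re-keying are simplifications, not a proposed kernel patch):

import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

class CtrGenerator:
    """Illustrative AES-128-CTR output generator. With AES-NI, the bulk of
    the work is the hardware-accelerated cipher, which is where the
    throughput advantage over a SHA-1 based extractor comes from."""

    def __init__(self, seed16):
        # A fixed zero nonce is acceptable here because the key is fresh and random.
        self._enc = Cipher(algorithms.AES(seed16), modes.CTR(b"\x00" * 16)).encryptor()

    def read(self, nbytes):
        # Encrypting zeros under CTR mode yields the raw keystream.
        return self._enc.update(b"\x00" * nbytes)

gen = CtrGenerator(os.urandom(16))
print(gen.read(32).hex())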

The Linux kernel could have 1 GBps in CSPRNG output, but is deliberately choosing not to. That's like having a V12 turbo-charged sleeper, without the turbo, and only firing on 3 of the 12 cylinders, with a duct taped muffler on the back.

Why does 1 GBps of performance matter? How about wiping hard drives or secure data removal in general? With 20 MBps, we can't even saturate a single drive in IOPS. With 1 GBps, we could saturate many simultaneously. As someone who wipes old employee workstations when they leave the company, backup servers with dozens of drives, or old decommissioned hardware, I see great benefit here.

Or, how about HTTPS web sites for a shared web hosting provider? I have seen HTTPS and SSH connections lag countless times waiting on the CSPRNG. Not that it's being intentionally blocked, but because the load is so intense on the server, it just can't generate enough cryptographic randomness to keep up with requests.

I'm sure there are plenty of other examples where end userspace applications could benefit with improved performance of the CSPRNG. And, as shown, it can't be that difficult to implement correctly. The real question is, of course, who will do the work and submit the patch?

Using Your Monitors As A Cryptographically Secure Pseudorandom Number Generator

File this under the "I'm bored and have nothing better to do" category. While coming into work this morning, I was curious if I could use my monitors as a cryptographically secure pseudorandom number generator (CSPRNG). I don't know what use this would have, if any, as your GNU/Linux operating system already ships a CSPRNG with /dev/urandom. So, in reality, there is really no need to write a userspace CSPRNG. But what the hell, let's give it a try anywho.

The "cryptographically secure" piece of this will come from the SHA-512 function. Basically, the idea is this:

  • Take a screenshot of your monitors.
  • Take the SHA-512 of that screenshot.
  • Resize the screenshot to 10% of its original size.
  • Take the SHA-512 of that resized file.
  • Take the SHA-512 of your previous two SHA-512 digests.
  • Take the last n-bits of that final digest as your random number.

Most GNU/Linux systems come with ImageMagick pre-installed, as well as the sha512sum(1) utility. So thankfully, we won't need to install any software. Here's a simple shell script that can achieve our goals:

#!/bin/sh
# Produces random numbers in the range of [0, 65535].
# Not licensed. Released to the public domain.

cd ~/Private # assuming you have an encrypted filesystem mounted here

TS1=$(date +%Y%m%d%H%M%S%N)
import -window root ${TS1}.png
sha512sum ${TS1}.png > /tmp/SHA512SUMS

TS2=$(date +%Y%m%d%H%M%S%N)
convert -scale 10% ${TS1}.png ${TS2}.png
sha512sum ${TS2}.png >> /tmp/SHA512SUMS

DIGEST=$(sha512sum /tmp/SHA512SUMS)
printf "%d\n" 0x$(printf "$DIGEST" | awk '{print substr($1, 125, 128)}')

shred ${TS1}.png ${TS2}.png
rm ${TS1}.png ${TS2}.png
shred /tmp/SHA512SUMS
rm /tmp/SHA512SUMS

Running it for 10 random numbers:

$ for i in {1..10}; do sh monitor-csprng.sh; done
15750
36480
64651
7942
2367
10905
53889
9346
52726
63570

A couple things to note:

  • This is slow, due to taking the screenshot, and resizing it.
  • The data on your monitors should be sufficiently random. Chats, social updates, etc. The security of this will depend entirely on the entropy of the initial screenshot.
  • You really should be saving your screenshots to an encrypted filesystem, such as eCryptfs.
  • We're using timestamps with nanosecond accuracy to provide some additional entropy for the final SHA-512 digest.
  • This is using the last 4 hexadecimal characters to be converted to decimal. In reality, it could be anything, including some convoluted dynamic search algorithm in the string.

It's worth noting that the entropy of the initial screenshot is critical, which is actually difficult to accurately measure. So, it may help to have a chat window or more open, with recent chat logs. Same could be said for social update "walls", with the most recent updates (Twitter, Facebook, Goodreads, etc.). Having a clock with seconds ticking in a status bar can also help (although not unpredictable, at least semi-unique). Tabs in browsers, running applications, etc. The more unpredictable your workspace in the screenshot, the better off you'll be. But, people in general suck at randomness, so I'm not advocating this as something you should rely on for a cryptographically secure random number generator.

If you wanted, you could add this to a terminal, giving you a sort of "disco rave" before taking the screenshot:

#!/bin/sh
# Disco lights in your terminal
# No license. Released to the public domain

while true; do
    printf "\e[38;5;$(($(od -d -N 2 -A n /dev/urandom)%$(tput colors)))m•\e[0m"
done

Get that running first, then take your screenshot. But then, if you're reading data off of /dev/urandom, you might as well do that for your random numbers anyway...

Your GnuPG Private Key

This post is inspired by a discussion in irc://irc.freenode.net/#gnupg about Keybase and a blog post by Filippo Valsorda.

I was curious just exactly how my private key is encrypted. Turns out, gpg(1) can tell you directly:

$ gpg --output /tmp/secret-key.gpg --export-secret-keys 0x22EEE0488086060F
$ gpg --list-packets /tmp/secret-key.gpg
:secret key packet:
	version 4, algo 17, created 1095486266, expires 0
	skey[0]: [1024 bits]
	skey[1]: [160 bits]
	skey[2]: [1023 bits]
	skey[3]: [1023 bits]
	iter+salt S2K, algo: 3, SHA1 protection, hash: 2, salt: ad8d24911a490591
	protect count: 65536 (96)
	protect IV:  01 1e 07 58 4a b6 68 a0
	encrypted stuff follows
	keyid: 22EEE0488086060F
(...snip...)

Notice the line "iter+salt S2K, algo: 3, SHA1 protection, hash: 2, salt: ad8d24911a490591". In there, you see "algo: 3" and "hash: 2". What do those identifiers reference? If you refer to RFC4880, you can learn what they are:

Symmetric Encryption Algorithms

  0. Plaintext or unencrypted data
  1. IDEA
  2. TripleDES (3DES)
  3. CAST5
  4. Blowfish
  5. Reserved
  6. Reserved
  7. AES-128
  8. AES-192
  9. AES-256
  10. Twofish

Cryptographic Hashing Algorithms

  1. MD5
  2. SHA-1
  3. RIPEMD-160
  4. Reserved
  5. Reserved
  6. Reserved
  7. Reserved
  8. SHA-256
  9. SHA-384
  10. SHA-512
  11. SHA-224

The defaults, shown in the packet dump above, are CAST5 ("algo: 3") and SHA-1 ("hash: 2"). So, your key is encrypted with the SHA-1 of your passphrase, which is used as the key for CAST5 to encrypt your private key. Thus, the whole security of your encrypted private key rests on the entropy of your passphrase, provided that sane defaults are chosen for the encryption and hashing algorithms, which they are.
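
To make that concrete, here is a simplified Python sketch of the iterated+salted S2K from RFC 4880 section 3.7.1.3, which is what turns your passphrase and the salt shown in the packet dump into the symmetric key. It's simplified in that it assumes the key fits inside one digest (true for CAST5/SHA-1 and AES-256/SHA-512), and the passphrase below is obviously illustrative:

import hashlib

def s2k_iterated_salted(passphrase, salt, count, hash_name="sha1", key_len=16):
    # Feed salt+passphrase into the hash over and over until `count` octets
    # have been hashed (or once in full, if count is smaller), then truncate
    # the digest to the cipher's key length.
    h = hashlib.new(hash_name)
    data = salt + passphrase
    remaining = max(count, len(data))
    while remaining > 0:
        chunk = data if remaining >= len(data) else data[:remaining]
        h.update(chunk)
        remaining -= len(chunk)
    return h.digest()[:key_len]

# Illustrative only: derive a 128-bit CAST5 key the way the default S2K would,
# using the salt and protect count from the packet dump above.
key = s2k_iterated_salted(b"correct horse battery staple",
                          bytes.fromhex("ad8d24911a490591"), 65536)
print(key.hex())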

CAST5 has been well analyzed, and it is not showing any practical or near-practical weaknesses. It is a sane default to choose for a symmetric encryption algorithm. However, CAST5 uses 64-bit blocks for encrypting and decrypting data, which may have some theoretical weaknesses. AES uses 128-bit blocks, and thus has a larger security margin. Because AES-256 is available as a symmetric encryption algorithm, there really is no reason not to use it, if only to feel more secure.

SHA-1 is showing near-practical attacks on collisions, but for use in keying a block cipher from a passphrase, it's still exceptionally secure. What would be needed to break SHA-1 in this regard is a pre-image attack. A pre-image attack is where you have the hash, but you do not know the input that created it, and you can recover such an input faster than brute force. SHA-1 has a wide security margin here, so there really is nothing practical to worry about. However, with SHA-512 available, there is also really no reason not to use a SHA-2 algorithm. In fact, aside from the increased security margin, SHA-512 is designed to work well on 64-bit platforms, but struggles on 32-bit ones. So this gives us an extra security margin, albeit negligible, over using something like SHA-256.

So, how can we change these? Turns out to be quite simple. All you need to do is specify the secret key symmetric encryption algorithm and hashing algorithm, then change your passphrase (retype the same passphrase if you don't want to change it):

$ gpg --s2k-cipher-algo AES256 --s2k-digest-algo SHA512 --edit-key 0x22EEE0488086060F
Secret key is available.

pub  1024D/0x22EEE0488086060F  created: 2004-09-18  expires: never       usage: SCA 
                               trust: unknown       validity: unknown
sub  1792g/0x7345917EE7D41E4B  created: 2004-09-18  expires: never       usage: E   
sub  2048R/0xCE7911B7FC04088F  created: 2005-07-04  expires: never       usage: S   
(...snip...)

gpg> passwd
(...snip...)
gpg> save

Now if we export our key, and look at the OpenPGP packets, we should see the new updates:

$ gpg --output /tmp/secret-key.gpg --export-secret-keys 0x22EEE0488086060F
$ gpg --list-packets /tmp/secret-key.gpg
:secret key packet:
	version 4, algo 17, created 1095486266, expires 0
	skey[0]: [1024 bits]
	skey[1]: [160 bits]
	skey[2]: [1023 bits]
	skey[3]: [1023 bits]
	iter+salt S2K, algo: 9, SHA1 protection, hash: 10, salt: 9c3dbf2880791f2e
	protect count: 65536 (96)
	protect IV:  db f5 e8 1c 98 03 99 7c 77 33 4e cd d3 3c 1f 4f
	encrypted stuff follows
	keyid: 22EEE0488086060F
(...snip...)

Now I have "algo: 9", which is AES256, and "hash: 10", which is SHA512, protecting my private key. I've gained a little extra security margin at the cost of retyping my passphrase. Not bad. Now, I'd like to pose you a question:

If I were to publish my encrypted GnuPG private key, would you think I'm crazy?

Let's look at this for a second. We know that the key is protected with AES-256. AES remains secure, after 15 years of intense scrutiny and analysis, and is showing no signs of wear. It's the most used symmetric encryption algorithm that protects HTTPS, SSH, VPN, OTR, Tor, and even your GnuPG encrypted emails. It protects social security numbers, credit card transactions, usernames and passwords, patient health records, hard drives, and on, and on, and on. AES is secure.

So, breaking the encrypted key will be at least as hard as breaking AES, which seems to be a long shot. You would be better off attacking my passphrase. Knowing that its hash is used as the key for AES, and that the hash is SHA-512, we can write a brute-force algorithm to generate passphrases, hash them with SHA-512, and attempt to decrypt the private key. After all, we have the salt and the IV right there in plaintext in the packets. Turns out, there are already projects for this.

This should be a breeze. Unless, of course, you have sufficient entropy behind your passphrase. What is sufficient entropy? Well, take some time searching "entropy" on my blog. You'll quickly learn that I recommend at least 80 bits. 100 bits would give you a paranoid security margin, even with the passphrase being hashed with a fast hashing algorithm like SHA-512.

So, if the private key is encrypted with AES-256, and the passphrase has sufficient entropy to withstand a sophisticated attack, and it is hashed with SHA-512, then I should have no concern posting my encrypted key into the open, right?

Without further ado, here is my encrypted GnuPG private key:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

- -----BEGIN PGP PRIVATE KEY BLOCK-----
Version: GnuPG v1

lQHpBEFLyzoRBACXCUta5CK+DCgnXn9wkqUumkcbenibGPBe3Y8IEY4BjkdbGdTN
tiGB+Tvo0hzn2qzy4mNPlOx/LWZWF2MdwF3WS77wwIskMb8W314zhE2RS0G318YY
X7zMGSF+7QiNXNsW/d0t1RonYOKIS96zKOtFQZrTr//V8+1rxEa4rvO5dwCgul0s
pt2BUDqwoy2Q/5UKgnmrzmsD/37/3g5zXykvTH2P6BlgTdfnVvpOLDT3CyWlAynz
u5hdmgYNT50I2w5TstY+uViYhAbMiyIT1HwBRcaQh8hUWkzDGyzJF7pS4pZeD0M9
u0P7Cejm2+ENdOX66ablWjP7GLJRcToGxnAZ6hgPpWLen8lHYaUK//g4JJx8UJ/n
wifeA/9xYWDi3ur/fFCKQZIPV9Ziw1oL58su948yWRn2WN7m74+bSldkXzkc4jRe
Q51FpGBHMswRIJKB6yG1FbfLum8ppGbvtz9NrMMZuirguTWetX8aJrjr0ddGjTsY
uZPfKoUiqDUXSFc3hmVgQQQ4MFdD3XYy6AQTyI1vstCS/Tdn7P4JAwqcPb8ogHkf
LmDb9egcmAOZfHczTs3TPB9Pg1SJqjvSz7nKDY87EVmeM46YBaCs1XScaOF4Gs+x
u0LNAxlfX3xOUIWRtCdBYXJvbiBUb3BvbmNlIDxhYXJvbi50b3BvbmNlQGdtYWls
LmNvbT6IWQQTEQIAGQUCQUvLOgQLBwMCAxUCAwMWAgECHgECF4AACgkQIu7gSICG
Bg/+cACeM0EeO7gE85/OSwMzxjvQAGB53jgAnik6qvFWyQtvp71KElbpZUsa0YNj
nQIoBEFLy0IQBwCjVGmY/PmOtRHtBIuANfg9zf8thGXZtFZWEgzHLGUgSfIjb0di
F24mwiVw2k3gzqBKuFBJ633F3AhwlTBnXS3tLWQgwSrm3BcCOOn+wJvwgUXa0iBn
gXhcq/7IL7HnKsYiG9EFMI8mAd10t7zdsA/dS2xUmFNexQvUdra/uRU+eeQbrXYe
iwfymw2RCZROl4/QXA9/a/aTyUhKgkj2vieo0jh394h7grPZxw0lgCclvTN/0jGq
dPkp56NMDb0eGlVzWeEiseD1JxXeYaSeToJP3zmx+nFoiBa+VVVeUhzYAwADBQb+
P373jEOwDu63py4FWdMPMYNWv1xFWLYI5hWaTKPSRwG/NZICRDF+QNztSmVOhW7Z
SFY/nTHq5yFp+QID63VMGv5Cunse9QXAoarecyV2hllwUq7l7wHujJhvvqyEgL9G
ah/drkZMGe8btYihz/M4g5i1P2DPr4CL/46eZxgjmjuVw7Nb0UsgUPgGizPCbnJ3
ye1ahxc1dOX80Guh0ZDRfR/ehZkk07wN2H6KRrxFCAmDaCR9KxwGYbpepND0t4HG
qCMji37lzYUrS4PV6yK0DGekqF98xgiIVBYFjF0jwQD+CQMKnD2/KIB5Hy5g4K9F
3JFIPxw+L+Gdc2MSYuJI3Y5kpZluUmYYYYFgOH64/J8egeYSSKWUIPqMhnwBWtbW
GQCNdztQyIIi6mB3YiDeK2AMnhRq+PwwwGG1iEYEGBECAAYFAkFLy0IACgkQIu7g
SICGBg+MhQCfZ7FNu4wMtdifkblGkN5Qqj+cYWYAoLiipdgnnhPTP2z7SgOsxiR4
YI4wnQPEBELI4BIBCADCobEk1f0sByuV4p2moEmZIXXEJhzTolO2mmBLBSmbjPMg
OBpAFTmWYXJxo8oZnNeeTOWN/hsHV9rmC/c/bZ7FGX4u/pN0l3qOSoSO6m6yvM/O
q4idyl17SfqDR9AUdEwJIA9zfSomzmzfO/q3rIAOPNkd71RQ4YMkHn8SfQLaojxl
+N5pPB29c6Cy+hltP6JRibAHgYCkxQGLK6ZLQ1LTwkDLNxCH8qfmHvg7M3qcXxl4
KOgtb8kNZ3PpAvek1GNL/eaF7nnd/u5uHuJM5V7czVHjpMbBfRGgLzR6Tu6v8qm7
nJzhSErxC7lK0hXLgGuHlk26bj9pzb+lfBk7GnkRAAYp/gkDCpw9vyiAeR8uYPUV
mINtxpejZeID6EaNZ8wTKNB1e3yt/As2Svkpcl1ivosbADhxCTFSJ4RtmEynFPCw
qgYbjuTBK+Fv2LSd7CYHJm4zZ/hfiCBgL4jI3ZY52qHFnBhgHtLwza/eLeaC0i4+
RWUAN9JEW1Ygpsh0ApRd1UvY8dGoLF24fy3C2jeh5Q3SVw2SnhUryLU7u6gTTLt4
yskT+/pzdNrb2n3dvLLUhQJ50SeTroa/swB/SXnjWtDSRGpvv3HAStHSENaZCIrV
e/u/WSNd99nklpYdPwh3UFO9OOayYErYiprEhLC7JjdVbqbu2aIfrw+nS/PS+tlm
wQTYH/DKA6wIeSZK4y9Byfrx3oq4DPk9pge7n3z3oz9pzd+xG2GMvExwKiCUlcF5
he5Peb4sTsJz0epMpwcBXFNYyVO++rZvBTABhgY5MQAcympoBAAcspo8AvBWx2xK
SaIyp5XDtT8jqey5Jjo3qmIQvdpo5oQub73JCMDSHi7KZ/IkgtggpgDqNh9bT5ZE
C7d3lprPFQJNQtFTO2K8NecRkgatn9Imomy1DidlnsCuZjqfamNVssWhm0uNu8SC
aJYBtC9kZXhBLAy0mFCMfJQ8ysDzlG3+Fu9wE9tCQvkh7y0ZKo9Ortqnwqp6S0I3
2xMBOg2EMHRP+ADF3q/i2wlgbH0MHsiX4oMxd0eIZV+rJIQSnO547UvvzAUBjXjZ
1l23wmg8FctYIE0UPsTo1QNArGO1sIOdv807UMolB9lq1pKAXkJP0GPRZLv9bLDj
kkKLEBELXCCQA0vDH+N/eGXRdJbWv5i0mOXmxK2JAZZw8rsJF3XZ22PABrC0NaCP
qHftcq20PPMCRZ/TfjQmwlot495KmLUJo2G6JasdcEilyPH2GVUwiqCM2/wOWk4U
xX+FmA40NYmU64hJBBgRAgAJBQJCyOASAhsCAAoJECLu4EiAhgYPt9kAn3r0Hhf9
aTFyowH3pgqIUiaMVQo5AKCFWzcU23YT0E2/LZNl6Yqzcs113Q==
=SHc5
- -----END PGP PRIVATE KEY BLOCK-----
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBCgAGBQJWToHpAAoJEM55Ebf8BAiPk7kH+weVf8kVJRjSaSWE+Aft76iA
Nzj1cVUfpWoT/K139i3TMiZ6PpAQtCRyEakdxfeSfXiOz83pqmKSL5ADCdlRoxuB
HtkoLW6thETOs70mDrrsEQgBZgMYPMsiKG1W/M3xppRGZxUM7/UEXhjHYiThe1Qd
Dkwot+hu5EttQpu0kKFmPrviPpJOk0gJ5SQrhlROWCS+aT9TyhbswMRpSyurDZ2H
LppGk8EtBeWTsTf9AhemX1GFu4iJPwIDfZtiWOLGGjQn4ROqb/RWqLG254O//Gw6
jtDRGHIGyYk+2NQ6/gAKWI9Sxaz5kUxqKSzDU9WuDCE3peIB9HXM+ynFVKLgsXE=
=Rnfg
-----END PGP SIGNATURE-----

Now that this is posted, I should expect everyone to steal my GnuPG identity, parading around as me, forging signatures, correct? No, it's not happening. I trust the crypto. I trust the math. I trust the software.

Why? Why am I doing this? Well, a recent discussion popped up on IRC about Keybase. Some don't like the fact that they are encouraging you to upload your encrypted private key to the server. Some claim that it is "insecure". However, didn't we just logically and reasonably conclude that AES and SHA-2 are protecting my best interests? So, if it's insecure, then it can't be due to the crypto. It must be something else. Let's look at all the risk factors:

Keybase servers are compromised and you use the web interface
If the server is compromised, then the attackers can modify the code, providing malicious JavaScript to the browser, so when you successfully decrypt your private key, it can be sent and stored elsewhere under their control. There is nothing you can do here. You are screwed. This is a very valid concern.

Keybase servers are compromised and you use the command line interface
The attackers only have access to your encrypted private key. Provided you never use the web interface, all encryption and decryption is handled out-of-band. This means that regardless of the Keybase server compromise, the attackers will never actually get to forge signatures using your GnuPG private key.

Your local client is compromised
There is nothing I can do for you here. Keybase isn't to blame, and not even the gpg(1) client can protect you. You have bigger problems on your hands than the attackers gaining access to your unencrypted GnuPG private key.

I think a lot of GnuPG users are paranoid about their key getting leaked, because they are unsure of exactly how it is stored. Hopefully this post lays those fears to rest. If there are still concerns about the leak of encrypted private keys, then it's probably due to a fear of not fully understanding the strength of passwords and their entropy requirements. However, if you meet the necessary password entropy requirements, and your key is encrypted with CAST5/SHA-1 or AES-256/SHA-512, there is nothing wrong with keeping a backup of your GnuPG key on Github, AWS, Dropbox, or other "cloud" hosting solutions.

Trust the math. Trust the software.

By the way, if you do successfully recover my private key passphrase (and I know you won't), I would be interested in hearing from you. Send me a signed email with my private key, and I'll make it financially worth your while. 🙂

Do XKCD Passwords Work?

You'll always see comments on web forums, social sites, blog posts, and emails about "XKCD passwords". This is of course referring to the XKCD comic by Randall Munroe describing what he thinks is the best password generator:

What no one has bothered asking is whether this actually works.

Lorrie Faith Cranor, director of the Carnegie Mellon Usable Privacy and Security Laboratory, a member of the Electronic Frontier Foundation Board of Directors, and Professor in the School of Computer Science and the Engineering and Public Policy Department at Carnegie Mellon University, did ask this question. In fact, she studied it to the point that she gave a TED talk on the subject. The transcript of her talk can be found here. Here are the relevant bits (emphasis mine):

Now another approach to better passwords, perhaps, is to use pass phrases instead of passwords. So this was an xkcd cartoon from a couple of years ago, and the cartoonist suggests that we should all use pass phrases, and if you look at the second row of this cartoon, you can see the cartoonist is suggesting that the pass phrase "correct horse battery staple" would be a very strong pass phrase and something really easy to remember. He says, in fact, you've already remembered it. And so we decided to do a research study to find out whether this was true or not. In fact, everybody who I talk to, who I mention I'm doing password research, they point out this cartoon. "Oh, have you seen it? That xkcd. Correct horse battery staple." So we did the research study to see what would actually happen.

So in our study, we used Mechanical Turk again, and we had the computer pick the random words in the pass phrase. Now the reason we did this is that humans are not very good at picking random words. If we asked a human to do it, they would pick things that were not very random. So we tried a few different conditions. In one condition, the computer picked from a dictionary of the very common words in the English language, and so you'd get pass phrases like "try there three come." And we looked at that, and we said, "Well, that doesn't really seem very memorable." So then we tried picking words that came from specific parts of speech, so how about noun-verb-adjective-noun. That comes up with something that's sort of sentence-like. So you can get a pass phrase like "plan builds sure power" or "end determines red drug." And these seemed a little bit more memorable, and maybe people would like those a little bit better. We wanted to compare them with passwords, and so we had the computer pick random passwords, and these were nice and short, but as you can see, they don't really look very memorable. And then we decided to try something called a pronounceable password. So here the computer picks random syllables and puts them together so you have something sort of pronounceable, like "tufritvi" and "vadasabi." That one kind of rolls off your tongue. So these were random passwords that were generated by our computer.

So what we found in this study was that, surprisingly, pass phrases were not actually all that good. People were not really better at remembering the pass phrases than these random passwords, and because the pass phrases are longer, they took longer to type and people made more errors while typing them in. So it's not really a clear win for pass phrases. Sorry, all of you xkcd fans. On the other hand, we did find that pronounceable passwords worked surprisingly well, and so we actually are doing some more research to see if we can make that approach work even better. So one of the problems with some of the studies that we've done is that because they're all done using Mechanical Turk, these are not people's real passwords. They're the passwords that they created or the computer created for them for our study. And we wanted to know whether people would actually behave the same way with their real passwords.

So, in her research, XKCD passwords really didn't work out that well. They are longer, so they take longer to type, which increases the chance for error, and people are no better at remembering an XKCD passphrase than they are a short string of random characters.

To me, this is unsurprising. If you look at the history of my blogging on passwords, you'll find that I continually advocate true random events to build your passwords, maximizing entropy. In my last post, I even blogged two shell functions that you can use to build XKCD passwords, and "monkey passwords" (monkeys generating passwords by banging away at a keyboard). Both target 80-bits of entropy in the generation. Check out the lengths:

$ gen-monkey-pass 9
cxqwtw63taxdr3zn	uaq4tbt43japmm2q	mptwrxhhb486yfuv
-cb73b9-kgzhmww3	s45t3x6r9smw-7yr	hjkgzkha-qup4gh4
34c5rg4ksw-aprvk	uug-2vq7pfze6dnp	s4qx4eazbnrd2pqe

$ gen-xkcd-pass 9
sorestdanklyAlbanyluckyRamonaFowler   (sorest dankly Albany lucky Ramona Fowler)
towsscareslaudedrobinawardsrenal      (tows scares lauded robin awards renal)
thinkhazelsvealjuggedagingscareen     (think hazels veal jugged agings careen)
tarotpapawsNolanpacketAvonwiped       (tarot papaws Nolan packet Avon wiped)
surgesakimbohardercruelArjunablinds   (surges akimbo harder cruel Arjuna blinds)
amountlopsedgemeaslyCannoninseam      (amount lops edge measly Cannon inseam)
EssexIzmirwizesPattygroutszodiac      (Essex Izmir wizes Patty grouts zodiac)
hoursmailedslamsvowedallowspar        (hours mailed slams vowed allow spar)
AfghanNigelnutriadillmoldertrolly     (Afghan Nigel nutria dill molder trolly)

XKCD passwords average 32 characters to achieve 80-bits of entropy, compared to 16 characters that "monkey passwords" produce. And, according to the research done by Lorrie, people won't necessarily recall XKCD passwords any easier than "monkey passwords". So, if that's the case, then what's the point? Why bother? Why not just create "monkey passwords", and use a password manager?

Exactly. It's 2015. There are password managers for your browser, all versions of every desktop operating system, command-line based utilities for servers, and even apps for your smartphone. There are plenty of "cloud" synchronization services to make sure each instance is up-to-date. At this point, your passwords should:

  • Contain at least 80-bits of entropy.
  • Be truly random generated (no influence from you).
  • Be unique for each and every account.
  • Be protected with two-factor authentication, where available.
  • Be stored in a password manager, that is easily accessible.

You'll remember the ones you type in frequently, and you'll memorize them quickly. The others are stored for safe keeping, should you need to recall them.

Password Generation in the Shell

No doubt, some people use password generators- not many, but some. Unfortunately, this means relying on 3rd party utilities, where the source code may not always be available. Personally, I would rather be in full control of the entire generation stack. I know how to make sure plenty of entropy is available in the generation, and I know which sources of entropy to draw on to maximize the entropy estimate. As such, I don't use tools like pwgen(1), apg(1), or anything else. I rely strictly on /dev/urandom, grep(1), and other tools guaranteed to be on every BSD and GNU/Linux operating system.

As such, the script below has been successfully tested in various shells on Debian GNU/Linux, PC-BSD, FreeBSD, OpenBSD, NetBSD, and SmartOS. If you encounter a shell or operating system this script does not work in, please let me know. Thanks to all those who helped me test it and offered suggestions for improvement.

So, with that said, here they are:

# No copyright. Released under the public domain.
# You really should have shuf(1) or shuffle(1) installed. Crazy fast.
shuff(){
    if [ $(command -v shuf) ]; then
        shuf -n "$1"
    elif [ $(command -v shuffle) ]; then
        shuffle -f /dev/stdin -p "$1"
    else
        awk 'BEGIN{
            "od -tu4 -N4 -A n /dev/urandom" | getline
            srand(0+$0)
        }
        {print rand()"\t"$0}' | sort -n | cut -f 2 | head -n "$1"
    fi
}
gen_monkey_pass(){
    I=0
    [ $(printf "$1" | grep -E '[0-9]+') ] && NUM="$1" || NUM="1"
    until [ "$I" -eq "$NUM" ]; do
        I=$((I+1))
        LC_CTYPE=C strings /dev/urandom | \
            grep -o '[a-hjkmnp-z2-9-]' | head -n 16 | paste -s -d \\0 /dev/stdin
    done | column
}
gen_xkcd_pass(){
    I=0
    [ $(printf "$1" | grep -E '[0-9]+') ] && NUM="$1" || NUM="1"
    [ $(uname) = "SunOS" ] && FILE="/usr/dict/words" || FILE="/usr/share/dict/words"
    DICT=$(LC_CTYPE=C grep -E '^[a-zA-Z]{3,6}$' "$FILE")
    until [ "$I" -eq "$NUM" ]; do
        I=$((I+1))
        WORDS=$(printf "$DICT" | shuff 6 | paste -s -d ' ' /dev/stdin)
        XKCD=$(printf "$WORDS" | sed 's/ //g')
        printf "$XKCD ($WORDS)" | awk '{x=$1;$1="";printf "%-36s %s\n", x, $0}'
    done | column
}

Nothing fancy about them. The first function, "shuff", is really just a helper for systems that might not have shuf(1) or shuffle(1) installed. It's used only in the "gen_xkcd_pass" function. The next function, "gen_monkey_pass", acts like monkeys banging on the typewriter. It reads /dev/urandom directly, keeps only the characters in our chosen set, takes them 16 at a time, and puts them in an orderly set of columns for output, as seen below. The character set is 32 characters in total, giving each character exactly 5 bits of entropy. So, at 16 characters, each password comes with exactly 80 bits of entropy. The character set was chosen to stay entirely lowercase plus digits, and remain unambiguous, so it's clear and easy to type, even though it may still be hard to remember. The function can take a numerical argument, for generating exactly that many passwords:

$ gen_monkey_pass 24
awdq2zwwfcdgzqpm	t54zqxus77zsu6j6	-2h6dkp93bjdb496
thm9m9nusqxuewny	qmsv2vqw-4-q4b4d	ttbhpnh4n7nue5g8
ytt6asky765avkpr	grwhsfmyz872zwk3	mzq-5ytdv8zawhy6
zb46qgnt62k74xwf	uydrsh2axaz5-ymx	6knh32qj4yk885ea
vky55q2ubgaucdnh	5dhk9t97pfja9phj	rhn2qg734p83wnxs
-q2hb833c-54z-9j	t33shcc55e3kqcd6	q6fwn3396h4ygvq4
232hr73rkymerpyg	u2pq-3ytcpc79nb9	7hqqwqujz4mxa-en
jj9vdj3jtpjhwcp6	mqc97ktz-78tb2bp	q7-6jug86kqhjfxn
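As a quick sanity check of the 80-bit claim, you can count the character set from the grep(1) pattern in the function above and do the arithmetic yourself:

$ printf '%s' 'abcdefghjkmnpqrstuvwxyz23456789-' | wc -c
32
$ awk 'BEGIN{printf "%.0f bits per character, %.0f bits per password\n", log(32)/log(2), 16*log(32)/log(2)}'
5 bits per character, 80 bits per password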

The last function, "gen_xkcd_pass", comes from the "correct horse battery staple" comic from XKCD. On every Unix system, there is a dictionary file installed at /usr/share/dict/words or /usr/dict/words. On Debian GNU/Linux, it contains 99,171 words (OpenBSD contains 234,979!). However, many of them have the apostrophe as a valid character. Taking out any punctuation and digits, we are left with just lowercase and uppercase characters for our words. Further, the word space is limited to words at least 3 characters and at most 6 characters in length. This leaves us with 19,198 words, or about 14.229 bits of entropy per word, which means generating at least 6 words to achieve an 80-bit entropy minimum. The password is shown with its words space-separated in parens to the right, to make it clear what exactly the password is, as shown below. Even if all 6 words have 6 characters (a 36-character password in total), the formatted line will never be longer than 80 characters wide, making it fit perfectly in an 80x24 terminal. It also takes a numerical argument, for generating exactly that many passwords:

$ gen_xkcd_pass 8
flyersepticspantearruinedwoo         (flyer septic span tear ruined woo)
boasgiltCurrywaivegalsAndean         (boas gilt Curry waive gals Andean)
selectpugjoggedlargeArabicbrood      (select pug jogged large Arabic brood)
titshubbubAswancartharmedtaxi        (tits hubbub Aswan cart harmed taxi)
Reaganmodestslowleessamefoster       (Reagan modest slow lees same foster)
tussleFresnoJensentheirsNohhollow    (tussle Fresno Jensen theirs Noh hollow)
Laredoriffplunkbarredhikersrearm     (Laredo riff plunk barred hikers rearm)
demostiffnukesvarlethakegilt         (demo stiff nukes varlet hake gilt)

Of course, as you can see, some fairly obscure words pop out as a result, such as "gilt" and "rearm". But then, you could think of it as expanding your vocabulary. If you install the "american-insane" dictionary, then you get about 650,722 words in your total set, bringing your per-word entropy north of 16 bits. This would allow you to cut the number of generated words down to 5 instead of 6, and still keep the 80-bit entropy minimum. But then, you also see far more obscure words than with the standard dictionary, and it will take a touch longer to randomize the file.
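If you want to check the math for your own system's dictionary, the per-word entropy and the number of words needed to clear 80 bits can be computed with the same filter the function uses (the word count will vary by distribution):

$ WORDS=$(LC_CTYPE=C grep -cE '^[a-zA-Z]{3,6}$' /usr/share/dict/words)
$ awk -v w="$WORDS" 'BEGIN{b=log(w)/log(2); n=int(80/b); if (n*b < 80) n++;
    printf "%d words, %.3f bits per word, %d words for 80 bits\n", w, b, n}'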

This script should be platform agnostic. If not, let me know what isn't exactly working in your shell or operating system, and why, and I'll try to address it.

The Kidekin TRNG Hardware Random Number Generator

Yesterday, I received my Kidekin TRNG hardware random number generator. I was eager to purchase this, because on the Tindie website, the first 2 people to purchase the RNG would get $50 off, making the device $30 total. I quickly ordered one. Hilariously enough, I received a letter from the supplier that I was their first customer! Hah!

Image of the Kidekin Digital TRNG

Upon opening the package, I noticed the size of the TRNG. It's roughly 10.5 cm from end-to-end which makes it somewhat awkward for a device sitting in your USB port on your laptop. It would work fine sitting in the back of a desktop or server, out of the way, but on my Thinkpad T61, it's a bit large to be sitting there 24/7 feeding my kernel CSPRNG.

Plugging the device in, the kernel actually sees two USB devices, not just one, and sets them up as /dev/ttyUSB0 and /dev/ttyUSB1. Curious. Downloading the software ZIP file from their webpage, and looking through it, the following UDEV rules are provided:


$ cat /etc/udev/rules.d/98-kidekin.rules 
#SYMLINK+= method works on more systems, if it does not on your system, please switch to the NAME= method.

#disable the unused port.
#SUBSYSTEM=="tty", ATTRS{interface}=="kidekin_trng", ATTRS{bInterfaceNumber}=="00", NAME="kidekin_dont_use", MODE="0000", ENV{ID_MM_DEVICE_IGNORE}="1", ENV{ID_MM_CANDIDATE}="0"
SUBSYSTEM=="tty", ATTRS{interface}=="kidekin_trng", ATTRS{bInterfaceNumber}=="00", SYMLINK+="kidekin_dont_use", MODE="0000", ENV{ID_MM_DEVICE_IGNORE}="1", ENV{ID_MM_CANDIDATE}="0"

#connect kidekin TRNG to /dev/random
#SUBSYSTEM=="tty", ATTRS{interface}=="kidekin_trng", ATTRS{bInterfaceNumber}=="01", NAME="kidekin_trng", MODE="0777", RUN+="/bin/stty raw -echo -crtscts -F /dev/kidekin_trng speed 3000000", ENV{ID_MM_DEVICE_IGNORE}="1", ENV{ID_MM_CANDIDATE}="0"
SUBSYSTEM=="tty", ATTRS{interface}=="kidekin_trng", ATTRS{bInterfaceNumber}=="01", SYMLINK+="kidekin_trng", MODE="0777", RUN+="/bin/stty raw -echo -crtscts -F /dev/kidekin_trng speed 3000000", ENV{ID_MM_DEVICE_IGNORE}="1", ENV{ID_MM_CANDIDATE}="0"
SUBSYSTEM=="tty", ATTRS{interface}=="kidekin_trng", ATTRS{bInterfaceNumber}=="01", RUN+="/etc/init.d/rng-tools restart"

This assumes a bit much, and is a bit of overkill IMO, so I simplified it, and set up the following:

SUBSYSTEM=="tty", ATTRS{interface}=="kidekin_trng", ATTRS{bInterfaceNumber}=="01", SYMLINK+="kidekin", MODE="0777", RUN+="/bin/stty raw -echo -crtscts -F /dev/kidekin speed 3000000", ENV{ID_MM_DEVICE_IGNORE}="1", ENV{ID_MM_CANDIDATE}="0"

This avoids setting up a "do not use" symlink for the unnecessary USB device, and changes the symlink of the usable USB device to /dev/kidekin. This also doesn't restart rngd(8), as I'll administer that on my own. At this point, I am ready for testing.

First and foremost, I wanted to test its throughput:

$ dd if=/dev/kidekin count=1G | pv -a > /dev/null
[ 282KiB/s]

The device held stable at 282 KBps, or roughly 2.2 Mbps. That works out to about 75.2 Kbps per dollar for my $30 purchase. Not bad.
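For the curious, the per-dollar figure is just a quick back-of-the-envelope calculation:

$ awk 'BEGIN{printf "%.1f Kbps per dollar\n", 282 * 8 / 30}'
75.2 Kbps per dollar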

The Kidekin is based on astable free running oscillators, or multivibrators. Unfortunately, a security proof does not accompany the device. So, while this may hold up to the suite of randomness tests, the output may not be cryptographically secure, and could also potentially be backdoored, as verifying the hardware is not easily doable. So, let's see if it at least holds up to the randomness tests. I created a 256 MB file, and ran the standard suites of tests:

$ dd if=/dev/kidekin of=entropy.kidekin bs=1M count=256 iflag=fullblock
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 928.326 s, 289 kB/s

At this point, I can start my testing. First, let's quantify the amount of entropy per byte, as well as some basic tests with ent(1):

$ ent entropy.kidekin
Entropy = 7.999999 bits per byte.

Optimum compression would reduce the size
of this 268435456 byte file by 0 percent.

Chi square distribution for 268435456 samples is 248.92, and randomly
would exceed this value 59.56 percent of the times.

Arithmetic mean value of data bytes is 127.4924 (127.5 = random).
Monte Carlo value for Pi is 3.141825693 (error 0.01 percent).
Serial correlation coefficient is -0.000003 (totally uncorrelated = 0.0).

Everything good so far. How about the FIPS 140-2 tests for randomness:

$ rngtest < entropy.kidekin
rngtest 2-unofficial-mt.14
Copyright (c) 2004 by Henrique de Moraes Holschuh
This is free software; see the source for copying conditions.  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

rngtest: starting FIPS tests...
rngtest: entropy source exhausted!
rngtest: bits received from input: 2147483648
rngtest: FIPS 140-2 successes: 107292
rngtest: FIPS 140-2 failures: 82
rngtest: FIPS 140-2(2001-10-10) Monobit: 14
rngtest: FIPS 140-2(2001-10-10) Poker: 13
rngtest: FIPS 140-2(2001-10-10) Runs: 26
rngtest: FIPS 140-2(2001-10-10) Long run: 30
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
rngtest: input channel speed: (min=317.891; avg=7386.982; max=19073.486)Mibits/s
rngtest: FIPS tests speed: (min=6.563; avg=109.376; max=114.901)Mibits/s
rngtest: Program run time: 19261018 microseconds
$ echo $?
1

Again, so far so good. Some failures are expected with random input of this size: rngtest evaluates 20,000-bit blocks, so this run covered 107,374 blocks, and 82 failures versus 107,292 successes is a failure rate of well under a tenth of a percent, right on par for these tests. Now the Dieharder battery of tests:

$ dieharder -a < entropy.kidekin
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
   rng_name    |rands/second|   Seed   |
        mt19937|  8.99e+07  | 722892634|
#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
   diehard_birthdays|   0|       100|     100|0.87388974|  PASSED  
      diehard_operm5|   0|   1000000|     100|0.25081726|  PASSED  
  diehard_rank_32x32|   0|     40000|     100|0.80329585|  PASSED  
    diehard_rank_6x8|   0|    100000|     100|0.87234234|  PASSED  
   diehard_bitstream|   0|   2097152|     100|0.27873738|  PASSED  
        diehard_opso|   0|   2097152|     100|0.05958924|  PASSED  
        diehard_oqso|   0|   2097152|     100|0.10540020|  PASSED  
         diehard_dna|   0|   2097152|     100|0.30006047|  PASSED  
diehard_count_1s_str|   0|    256000|     100|0.43809130|  PASSED  
diehard_count_1s_byt|   0|    256000|     100|0.29758303|  PASSED  
 diehard_parking_lot|   0|     12000|     100|0.78081639|  PASSED  
    diehard_2dsphere|   2|      8000|     100|0.58294587|  PASSED  
    diehard_3dsphere|   3|      4000|     100|0.04012616|  PASSED  
     diehard_squeeze|   0|    100000|     100|0.97651988|  PASSED  
        diehard_sums|   0|       100|     100|0.01875349|  PASSED  
        diehard_runs|   0|    100000|     100|0.17566659|  PASSED  
        diehard_runs|   0|    100000|     100|0.78887310|  PASSED  
       diehard_craps|   0|    200000|     100|0.16369886|  PASSED  
       diehard_craps|   0|    200000|     100|0.42148915|  PASSED  
 marsaglia_tsang_gcd|   0|  10000000|     100|0.27534860|  PASSED  
 marsaglia_tsang_gcd|   0|  10000000|     100|0.45190499|  PASSED  
         sts_monobit|   1|    100000|     100|0.88204376|  PASSED  
            sts_runs|   2|    100000|     100|0.15277754|  PASSED  
          sts_serial|   1|    100000|     100|0.71489026|  PASSED  
          sts_serial|   2|    100000|     100|0.85005457|  PASSED  
          sts_serial|   3|    100000|     100|0.77631916|  PASSED  
          sts_serial|   3|    100000|     100|0.81111751|  PASSED  
          sts_serial|   4|    100000|     100|0.72512842|  PASSED  
          sts_serial|   4|    100000|     100|0.68758000|  PASSED  
          sts_serial|   5|    100000|     100|0.69083583|  PASSED  
          sts_serial|   5|    100000|     100|0.09706031|  PASSED  
          sts_serial|   6|    100000|     100|0.52758972|  PASSED  
          sts_serial|   6|    100000|     100|0.27970465|  PASSED  
          sts_serial|   7|    100000|     100|0.07925569|  PASSED  
          sts_serial|   7|    100000|     100|0.25874891|  PASSED  
          sts_serial|   8|    100000|     100|0.33647659|  PASSED  
          sts_serial|   8|    100000|     100|0.80952471|  PASSED  
          sts_serial|   9|    100000|     100|0.99948911|   WEAK   
          sts_serial|   9|    100000|     100|0.32461849|  PASSED  
          sts_serial|  10|    100000|     100|0.69360795|  PASSED  
          sts_serial|  10|    100000|     100|0.96022345|  PASSED  
          sts_serial|  11|    100000|     100|0.91349333|  PASSED  
          sts_serial|  11|    100000|     100|0.95918606|  PASSED  
          sts_serial|  12|    100000|     100|0.69821905|  PASSED  
          sts_serial|  12|    100000|     100|0.57652285|  PASSED  
          sts_serial|  13|    100000|     100|0.28393582|  PASSED  
          sts_serial|  13|    100000|     100|0.45849491|  PASSED  
          sts_serial|  14|    100000|     100|0.30832853|  PASSED  
          sts_serial|  14|    100000|     100|0.89099315|  PASSED  
          sts_serial|  15|    100000|     100|0.87022105|  PASSED  
          sts_serial|  15|    100000|     100|0.06938123|  PASSED  
          sts_serial|  16|    100000|     100|0.79568629|  PASSED  
          sts_serial|  16|    100000|     100|0.53218489|  PASSED  
         rgb_bitdist|   1|    100000|     100|0.38552808|  PASSED  
         rgb_bitdist|   2|    100000|     100|0.79403454|  PASSED  
         rgb_bitdist|   3|    100000|     100|0.66811643|  PASSED  
         rgb_bitdist|   4|    100000|     100|0.84954470|  PASSED  
         rgb_bitdist|   5|    100000|     100|0.90198903|  PASSED  
         rgb_bitdist|   6|    100000|     100|0.98808244|  PASSED  
         rgb_bitdist|   7|    100000|     100|0.25730860|  PASSED  
         rgb_bitdist|   8|    100000|     100|0.43237015|  PASSED  
         rgb_bitdist|   9|    100000|     100|0.90916135|  PASSED  
         rgb_bitdist|  10|    100000|     100|0.81131338|  PASSED  
         rgb_bitdist|  11|    100000|     100|0.31361128|  PASSED  
         rgb_bitdist|  12|    100000|     100|0.40786889|  PASSED  
rgb_minimum_distance|   2|     10000|    1000|0.03358258|  PASSED  
rgb_minimum_distance|   3|     10000|    1000|0.99298827|  PASSED  
rgb_minimum_distance|   4|     10000|    1000|0.47721533|  PASSED  
rgb_minimum_distance|   5|     10000|    1000|0.86641982|  PASSED  
    rgb_permutations|   2|    100000|     100|0.10084049|  PASSED  
    rgb_permutations|   3|    100000|     100|0.99560585|   WEAK   
    rgb_permutations|   4|    100000|     100|0.42217190|  PASSED  
    rgb_permutations|   5|    100000|     100|0.95466090|  PASSED  
      rgb_lagged_sum|   0|   1000000|     100|0.64120688|  PASSED  
      rgb_lagged_sum|   1|   1000000|     100|0.22106106|  PASSED  
      rgb_lagged_sum|   2|   1000000|     100|0.41244281|  PASSED  
      rgb_lagged_sum|   3|   1000000|     100|0.98880097|  PASSED  
      rgb_lagged_sum|   4|   1000000|     100|0.78380177|  PASSED  
      rgb_lagged_sum|   5|   1000000|     100|0.25533777|  PASSED  
      rgb_lagged_sum|   6|   1000000|     100|0.78150371|  PASSED  
      rgb_lagged_sum|   7|   1000000|     100|0.53903267|  PASSED  
      rgb_lagged_sum|   8|   1000000|     100|0.04436257|  PASSED  
      rgb_lagged_sum|   9|   1000000|     100|0.77174302|  PASSED  
      rgb_lagged_sum|  10|   1000000|     100|0.54862612|  PASSED  
      rgb_lagged_sum|  11|   1000000|     100|0.48691334|  PASSED  
      rgb_lagged_sum|  12|   1000000|     100|0.06308057|  PASSED  
      rgb_lagged_sum|  13|   1000000|     100|0.42530804|  PASSED  
      rgb_lagged_sum|  14|   1000000|     100|0.86907366|  PASSED  
      rgb_lagged_sum|  15|   1000000|     100|0.66262930|  PASSED  
      rgb_lagged_sum|  16|   1000000|     100|0.85485044|  PASSED  
      rgb_lagged_sum|  17|   1000000|     100|0.39817394|  PASSED  
      rgb_lagged_sum|  18|   1000000|     100|0.90608610|  PASSED  
      rgb_lagged_sum|  19|   1000000|     100|0.94996515|  PASSED  
      rgb_lagged_sum|  20|   1000000|     100|0.78715690|  PASSED  
      rgb_lagged_sum|  21|   1000000|     100|0.93364519|  PASSED  
      rgb_lagged_sum|  22|   1000000|     100|0.84438533|  PASSED  
      rgb_lagged_sum|  23|   1000000|     100|0.77439531|  PASSED  
      rgb_lagged_sum|  24|   1000000|     100|0.12530311|  PASSED  
      rgb_lagged_sum|  25|   1000000|     100|0.79035917|  PASSED  
      rgb_lagged_sum|  26|   1000000|     100|0.93286961|  PASSED  
      rgb_lagged_sum|  27|   1000000|     100|0.32567247|  PASSED  
      rgb_lagged_sum|  28|   1000000|     100|0.39563718|  PASSED  
      rgb_lagged_sum|  29|   1000000|     100|0.15628693|  PASSED  
      rgb_lagged_sum|  30|   1000000|     100|0.69368810|  PASSED  
      rgb_lagged_sum|  31|   1000000|     100|0.00197963|   WEAK   
      rgb_lagged_sum|  32|   1000000|     100|0.23325783|  PASSED  
     rgb_kstest_test|   0|     10000|    1000|0.18940877|  PASSED  
     dab_bytedistrib|   0|  51200000|       1|0.57007834|  PASSED  
             dab_dct| 256|     50000|       1|0.76567665|  PASSED  
Preparing to run test 207.  ntuple = 0
        dab_filltree|  32|  15000000|       1|0.60537852|  PASSED  
        dab_filltree|  32|  15000000|       1|0.78894908|  PASSED  
Preparing to run test 208.  ntuple = 0
       dab_filltree2|   0|   5000000|       1|0.11775507|  PASSED  
       dab_filltree2|   1|   5000000|       1|0.34799105|  PASSED  
Preparing to run test 209.  ntuple = 0
        dab_monobit2|  12|  65000000|       1|0.69182598|  PASSED  

Finally, a visual check on the data, even though it's safe to assume that it's "true random" given the previous testing:

$ dd if=white.bmp of=entropy.kidekin bs=1 count=54 conv=notrunc
54+0 records in
54+0 records out
54 bytes (54 B) copied, 0.000547208 s, 98.7 kB/s
$ gimp entropy.kidekin # convert to grayscale, export as "entropy.png"
$ optipng entropy.png
** Processing: entropy.png
512x512 pixels, 8 bits/pixel, grayscale
Input IDAT size = 250107 bytes
Input file size = 250564 bytes

Trying:
  zc = 9  zm = 8  zs = 0  f = 0		IDAT size = 215319
  zc = 9  zm = 8  zs = 1  f = 0		IDAT size = 214467
  zc = 1  zm = 8  zs = 2  f = 0		IDAT size = 214467
  zc = 9  zm = 8  zs = 3  f = 0		IDAT size = 214467
                               
Selecting parameters:
  zc = 1  zm = 8  zs = 2  f = 0		IDAT size = 214467

Output IDAT size = 214467 bytes (35640 bytes decrease)
Output file size = 214564 bytes (36000 bytes = 14.37% decrease)

And the result is:

RNG visual output of the Kidekin TRNG

My conclusion of the Kidekin TRNG is positive. I love the throughput of the device, loved the price, and aside from the UDEV rule, it is plug-and-play. Unfortunately, the TRNG is a bit on the big side for a physical device, and because it doesn't come with a security proof, and the hardware design is closed, I would be skeptical about trusting it for your random numbers directly. Instead, I would recommend adding it to the Linux kernel's CSPRNG, and relying on /dev/urandom instead. This is trivial with rngd(8). But, overall, I am very pleased with the device, and wish I had actually purchased a second one.
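For reference, feeding the device into the kernel CSPRNG with rngd(8) should be as simple as pointing it at the symlink created by the udev rule above. This is a sketch; check your rng-tools version for the exact flags:

$ sudo rngd -r /dev/kidekin -o /dev/random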

Additional Testing Of The rtl-sdr Dongle As A HWRNG

A couple days ago, I put up a post about using the Realtek SDR dongles as a hardware true random number generator. I only tested the randomness of a 512 MB file. I thought this time, I would put a bit more stock into it. In this case, I let it run for a while, until the file was 1.8 GB in size. Interestingly enough, it stopped growing after that point. Not sure why. However, I ran more tests on that 1.8 GB file. Creating it took its time:

$ tail -f /run/rtl_entropy.fifo | dd of=random.img iflag=fullblock
3554130+0 records in
3554130+0 records out
1819714560 bytes (1.8 GB) copied, 3897.22 s, 467 kB/s

This filled up a bit faster than I had previously tested, going at a clip of about 3.7 Mbps.
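That clip is easy enough to double-check from the dd(1) numbers above:

$ awk 'BEGIN{printf "%.3f Mbps\n", 1819714560 * 8 / 3897.22 / 1000000}'
3.735 Mbps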

Now it was time for the testing:

$ ent random.img
Entropy = 8.000000 bits per byte.

Optimum compression would reduce the size
of this 1819714560 byte file by 0 percent.

Chi square distribution for 1819714560 samples is 246.86, and randomly
would exceed this value 63.11 percent of the times.

Arithmetic mean value of data bytes is 127.4990 (127.5 = random).
Monte Carlo value for Pi is 3.141611317 (error 0.00 percent).
Serial correlation coefficient is 0.000013 (totally uncorrelated = 0.0).

It passes with flying colors on entropy estimation, compression, chi-square distributions, arithmetic mean, the Monte Carlo estimation for Pi, and serial correlation. Testing further, I ran it through the FIPS 140-2 tests:

$ rngtest < random.img
rngtest 2-unofficial-mt.14
Copyright (c) 2004 by Henrique de Moraes Holschuh
This is free software; see the source for copying conditions.  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

rngtest: starting FIPS tests...
rngtest: entropy source exhausted!
rngtest: bits received from input: 14557716480
rngtest: FIPS 140-2 successes: 727288
rngtest: FIPS 140-2 failures: 597
rngtest: FIPS 140-2(2001-10-10) Monobit: 99
rngtest: FIPS 140-2(2001-10-10) Poker: 57
rngtest: FIPS 140-2(2001-10-10) Runs: 210
rngtest: FIPS 140-2(2001-10-10) Long run: 233
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
rngtest: input channel speed: (min=114.212; avg=6626.942; max=9536.743)Mibits/s
rngtest: FIPS tests speed: (min=61.133; avg=147.877; max=151.377)Mibits/s
rngtest: Program run time: 96034230 microseconds
$ echo $?
1

Finally, the beast of beasts, I ran it through every Dieharder test. This took some time to complete. Here is a listing of the tests that it went through:

$ dieharder -l
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
Installed dieharder tests:
 Test Number	                     Test Name	              Test Reliability
===============================================================================
  -d 0  	                  Diehard Birthdays Test	      Good
  -d 1  	                     Diehard OPERM5 Test	      Good
  -d 2  	          Diehard 32x32 Binary Rank Test	      Good
  -d 3  	            Diehard 6x8 Binary Rank Test	      Good
  -d 4  	                  Diehard Bitstream Test	      Good
  -d 5  	                            Diehard OPSO	   Suspect
  -d 6  	                       Diehard OQSO Test	   Suspect
  -d 7  	                        Diehard DNA Test	   Suspect
  -d 8  	      Diehard Count the 1s (stream) Test	      Good
  -d 9  	        Diehard Count the 1s Test (byte)	      Good
  -d 10  	                Diehard Parking Lot Test	      Good
  -d 11  	Diehard Minimum Distance (2d Circle) Test	      Good
  -d 12  	Diehard 3d Sphere (Minimum Distance) Test	      Good
  -d 13  	                    Diehard Squeeze Test	      Good
  -d 14  	                       Diehard Sums Test	Do Not Use
  -d 15  	                       Diehard Runs Test	      Good
  -d 16  	                      Diehard Craps Test	      Good
  -d 17  	            Marsaglia and Tsang GCD Test	      Good
  -d 100  	                        STS Monobit Test	      Good
  -d 101  	                           STS Runs Test	      Good
  -d 102  	           STS Serial Test (Generalized)	      Good
  -d 200  	               RGB Bit Distribution Test	      Good
  -d 201  	   RGB Generalized Minimum Distance Test	      Good
  -d 202  	                   RGB Permutations Test	      Good
  -d 203  	                     RGB Lagged Sum Test	      Good
  -d 204  	        RGB Kolmogorov-Smirnov Test Test	      Good
  -d 205  	                       Byte Distribution	      Good
  -d 206  	                                 DAB DCT	      Good
  -d 207  	                      DAB Fill Tree Test	      Good
  -d 208  	                    DAB Fill Tree 2 Test	      Good
  -d 209  	                      DAB Monobit 2 Test	      Good

So here are the results:

 $ dieharder -a < random.img
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
   rng_name    |rands/second|   Seed   |
        mt19937|  1.25e+08  | 169223456|
#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
   diehard_birthdays|   0|       100|     100|0.91937112|  PASSED  
      diehard_operm5|   0|   1000000|     100|0.77213572|  PASSED  
  diehard_rank_32x32|   0|     40000|     100|0.04709503|  PASSED  
    diehard_rank_6x8|   0|    100000|     100|0.93031877|  PASSED  
   diehard_bitstream|   0|   2097152|     100|0.12183977|  PASSED  
        diehard_opso|   0|   2097152|     100|0.96023625|  PASSED  
        diehard_oqso|   0|   2097152|     100|0.61237304|  PASSED  
         diehard_dna|   0|   2097152|     100|0.66045974|  PASSED  
diehard_count_1s_str|   0|    256000|     100|0.16999968|  PASSED  
diehard_count_1s_byt|   0|    256000|     100|0.00992823|  PASSED  
 diehard_parking_lot|   0|     12000|     100|0.69592283|  PASSED  
    diehard_2dsphere|   2|      8000|     100|0.95358410|  PASSED  
    diehard_3dsphere|   3|      4000|     100|0.89028448|  PASSED  
     diehard_squeeze|   0|    100000|     100|0.81631204|  PASSED  
        diehard_sums|   0|       100|     100|0.03559934|  PASSED  
        diehard_runs|   0|    100000|     100|0.75027140|  PASSED  
        diehard_runs|   0|    100000|     100|0.43076351|  PASSED  
       diehard_craps|   0|    200000|     100|0.57749359|  PASSED  
       diehard_craps|   0|    200000|     100|0.00599436|  PASSED  
 marsaglia_tsang_gcd|   0|  10000000|     100|0.60121369|  PASSED  
 marsaglia_tsang_gcd|   0|  10000000|     100|0.04254338|  PASSED  
         sts_monobit|   1|    100000|     100|0.94352358|  PASSED  
            sts_runs|   2|    100000|     100|0.77549833|  PASSED  
          sts_serial|   1|    100000|     100|0.46198961|  PASSED  
          sts_serial|   2|    100000|     100|0.46002706|  PASSED  
          sts_serial|   3|    100000|     100|0.73076110|  PASSED  
          sts_serial|   3|    100000|     100|0.90967100|  PASSED  
          sts_serial|   4|    100000|     100|0.32002297|  PASSED  
          sts_serial|   4|    100000|     100|0.07478887|  PASSED  
          sts_serial|   5|    100000|     100|0.27486408|  PASSED  
          sts_serial|   5|    100000|     100|0.57409336|  PASSED  
          sts_serial|   6|    100000|     100|0.05095556|  PASSED  
          sts_serial|   6|    100000|     100|0.06341272|  PASSED  
          sts_serial|   7|    100000|     100|0.00941089|  PASSED  
          sts_serial|   7|    100000|     100|0.53679805|  PASSED  
          sts_serial|   8|    100000|     100|0.00122125|   WEAK   
          sts_serial|   8|    100000|     100|0.16239101|  PASSED  
          sts_serial|   9|    100000|     100|0.24007712|  PASSED  
          sts_serial|   9|    100000|     100|0.02659941|  PASSED  
          sts_serial|  10|    100000|     100|0.64616186|  PASSED  
          sts_serial|  10|    100000|     100|0.78783799|  PASSED  
          sts_serial|  11|    100000|     100|0.77618602|  PASSED  
          sts_serial|  11|    100000|     100|0.33875893|  PASSED  
          sts_serial|  12|    100000|     100|0.50423715|  PASSED  
          sts_serial|  12|    100000|     100|0.77528158|  PASSED  
          sts_serial|  13|    100000|     100|0.57625144|  PASSED  
          sts_serial|  13|    100000|     100|0.73422196|  PASSED  
          sts_serial|  14|    100000|     100|0.40891605|  PASSED  
          sts_serial|  14|    100000|     100|0.48542772|  PASSED  
          sts_serial|  15|    100000|     100|0.67319390|  PASSED  
          sts_serial|  15|    100000|     100|0.74730027|  PASSED  
          sts_serial|  16|    100000|     100|0.67519158|  PASSED  
          sts_serial|  16|    100000|     100|0.73171087|  PASSED  
         rgb_bitdist|   1|    100000|     100|0.87216594|  PASSED  
         rgb_bitdist|   2|    100000|     100|0.18831902|  PASSED  
         rgb_bitdist|   3|    100000|     100|0.16757216|  PASSED  
         rgb_bitdist|   4|    100000|     100|0.05327115|  PASSED  
         rgb_bitdist|   5|    100000|     100|0.75278396|  PASSED  
         rgb_bitdist|   6|    100000|     100|0.64749144|  PASSED  
         rgb_bitdist|   7|    100000|     100|0.20311557|  PASSED  
         rgb_bitdist|   8|    100000|     100|0.39994123|  PASSED  
         rgb_bitdist|   9|    100000|     100|0.52805289|  PASSED  
         rgb_bitdist|  10|    100000|     100|0.96091722|  PASSED  
         rgb_bitdist|  11|    100000|     100|0.97794399|  PASSED  
         rgb_bitdist|  12|    100000|     100|0.75009561|  PASSED  
rgb_minimum_distance|   2|     10000|    1000|0.58923867|  PASSED  
rgb_minimum_distance|   3|     10000|    1000|0.54294743|  PASSED  
rgb_minimum_distance|   4|     10000|    1000|0.59446131|  PASSED  
rgb_minimum_distance|   5|     10000|    1000|0.00047025|   WEAK   
    rgb_permutations|   2|    100000|     100|0.89040191|  PASSED  
    rgb_permutations|   3|    100000|     100|0.47917416|  PASSED  
    rgb_permutations|   4|    100000|     100|0.30964668|  PASSED  
    rgb_permutations|   5|    100000|     100|0.70217495|  PASSED  
      rgb_lagged_sum|   0|   1000000|     100|0.12796648|  PASSED  
      rgb_lagged_sum|   1|   1000000|     100|0.15077254|  PASSED  
      rgb_lagged_sum|   2|   1000000|     100|0.31141471|  PASSED  
      rgb_lagged_sum|   3|   1000000|     100|0.94974697|  PASSED  
      rgb_lagged_sum|   4|   1000000|     100|0.99256987|  PASSED  
      rgb_lagged_sum|   5|   1000000|     100|0.67854004|  PASSED  
      rgb_lagged_sum|   6|   1000000|     100|0.08600877|  PASSED  
      rgb_lagged_sum|   7|   1000000|     100|0.91633363|  PASSED  
      rgb_lagged_sum|   8|   1000000|     100|0.06794590|  PASSED  
      rgb_lagged_sum|   9|   1000000|     100|0.59024027|  PASSED  
      rgb_lagged_sum|  10|   1000000|     100|0.59285975|  PASSED  
      rgb_lagged_sum|  11|   1000000|     100|0.87178336|  PASSED  
      rgb_lagged_sum|  12|   1000000|     100|0.63401541|  PASSED  
      rgb_lagged_sum|  13|   1000000|     100|0.47202172|  PASSED  
      rgb_lagged_sum|  14|   1000000|     100|0.34616699|  PASSED  
      rgb_lagged_sum|  15|   1000000|     100|0.97221211|  PASSED  
      rgb_lagged_sum|  16|   1000000|     100|0.95576739|  PASSED  
      rgb_lagged_sum|  17|   1000000|     100|0.32367098|  PASSED  
      rgb_lagged_sum|  18|   1000000|     100|0.92792046|  PASSED  
      rgb_lagged_sum|  19|   1000000|     100|0.58128429|  PASSED  
      rgb_lagged_sum|  20|   1000000|     100|0.78197001|  PASSED  
      rgb_lagged_sum|  21|   1000000|     100|0.86068846|  PASSED  
      rgb_lagged_sum|  22|   1000000|     100|0.22496908|  PASSED  
      rgb_lagged_sum|  23|   1000000|     100|0.52387665|  PASSED  
      rgb_lagged_sum|  24|   1000000|     100|0.52748770|  PASSED  
      rgb_lagged_sum|  25|   1000000|     100|0.96442902|  PASSED  
      rgb_lagged_sum|  26|   1000000|     100|0.51298847|  PASSED  
      rgb_lagged_sum|  27|   1000000|     100|0.99123470|  PASSED  
      rgb_lagged_sum|  28|   1000000|     100|0.69774674|  PASSED  
      rgb_lagged_sum|  29|   1000000|     100|0.83646714|  PASSED  
      rgb_lagged_sum|  30|   1000000|     100|0.98573851|  PASSED  
      rgb_lagged_sum|  31|   1000000|     100|0.23580471|  PASSED  
      rgb_lagged_sum|  32|   1000000|     100|0.19150884|  PASSED  
     rgb_kstest_test|   0|     10000|    1000|0.67771558|  PASSED  
     dab_bytedistrib|   0|  51200000|       1|0.07152541|  PASSED  
             dab_dct| 256|     50000|       1|0.53841656|  PASSED  
Preparing to run test 207.  ntuple = 0
        dab_filltree|  32|  15000000|       1|0.09092747|  PASSED  
        dab_filltree|  32|  15000000|       1|0.83382174|  PASSED  
Preparing to run test 208.  ntuple = 0
       dab_filltree2|   0|   5000000|       1|0.37363586|  PASSED  
       dab_filltree2|   1|   5000000|       1|0.26890999|  PASSED  
Preparing to run test 209.  ntuple = 0
        dab_monobit2|  12|  65000000|       1|0.80810458|  PASSED  

I don't have an image to look at to visually verify that there are no obvious patterns. At 1.8 GB, I feel it would be just a bit too unwieldy anyway. So, I'll need to trust the previous tests that the data really is random. After these 3 series of tests, I can only conclude that using a Realtek SDR as a HWRNG will generate data as close to "true random" as you can hope for.

Hardware RNG Through an rtl-sdr Dongle

An rtl-sdr dongle allows you to receive radio frequency signals on your computer through a software interface. You can listen to Amateur Radio, watch analog television, listen to FM radio broadcasts, and a number of other things. I have a friend who uses it to monitor power usage at his house. However, I have a different use- true random number generation.

The theory behind the RNG is to take advantage of radio frequency noise, such as atmospheric noise, which is caused by natural occurrences, from weak galactic radiation at the center of our Milky Way Galaxy to stronger local and remote lightning strikes. It's estimated that roughly 40 lightning strikes are hitting the Earth every second, which equates to about 3.5 million strikes per 24-hour period. Interestingly enough, this provides a great deal of entropy for a random number generator.

Check out Blitzortung. It is a community-run site, where volunteers can set up lightning monitoring stations and submit data to the server. Of course, it isn't an accurate picture of the entire globe, but you can at least get some idea of the scope of lightning strikes around the continents.

Lightning Map of the United States

Unfortunately, however, the rtl-sdr dongle won't get down to the frequencies necessary for sampling atmospheric noise: roughly 100 kHz to 10 MHz, and above 10 GHz. However, it can sample cosmic noise, man-made (urban and suburban) noise, solar noise, thermal noise, and other terrestrial noises that are well within the tuner frequency range of the dongle.

In order to take advantage of this, you obviously need an rtl-sdr dongle. They're quite cheap, about $15 or so, and plug in via USB with an external antenna. Of course, the larger the antenna, the more terrestrial noise you'll be able to observe. With a standard telescoping antenna, I can observe about 3 Mbps of true random data.

The other piece, however, will be compiling and installing the rtl-entropy software. This will provide a FIFO file for observing the random data. Reading the random data can be done as you would read any regular file:

$ sudo rtl_entropy -b -f 74M
$ tail -f /run/rtl_entropy.fifo | dd of=/dev/null
^C8999+10 records in
9004+0 records out
4610048 bytes (4.6 MB) copied, 13.294 s, 347 kB/s

That's roughly 2.8 Mbps. Not bad for $15. Notice that I passed the "-b" switch to detach the PID from the controlling TTY and background it. Further, I am not tuning to the default frequency of 70 MHz, which is part of Band I in the North America band plan for television broadcasting. Instead, I am tuning to 74 MHz, which is in the middle of a break in the band plan, where no television broadcasting should be transmitted. Of course, you'll need to make sure you are tuning to a frequency that is less likely to encounter malicious interference. Even though the rtl_entropy daemon has built-in debiasing and FIPS randomness testing, a malicious source could interfere with the output by transmitting on the frequency that you are listening to.

In order to guarantee that you have random data, you should send it through a battery of standardized tests for randomness. One popular set of tests is the FIPS 140-2 suite. Suppose I create a 512 MB file from my rtl-sdr dongle; I can test it as follows:

$ rngtest < random.img
rngtest 2-unofficial-mt.14
Copyright (c) 2004 by Henrique de Moraes Holschuh
This is free software; see the source for copying conditions.  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

rngtest: starting FIPS tests...
rngtest: entropy source exhausted!
rngtest: bits received from input: 83886080
rngtest: FIPS 140-2 successes: 4190
rngtest: FIPS 140-2 failures: 4
rngtest: FIPS 140-2(2001-10-10) Monobit: 0
rngtest: FIPS 140-2(2001-10-10) Poker: 1
rngtest: FIPS 140-2(2001-10-10) Runs: 1
rngtest: FIPS 140-2(2001-10-10) Long run: 2
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
rngtest: input channel speed: (min=174.986; avg=4379.165; max=4768.372)Mibits/s
rngtest: FIPS tests speed: (min=113.533; avg=147.777; max=150.185)Mibits/s
rngtest: Program run time: 560095 microseconds

It's expected to see some failures, but they should be outliers. There is also the Dieharder battery of randomness tests. This will take substantially longer to work through, but it can be done. Here are the first few lines:

$ dieharder -a < random.img 
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
   rng_name    |rands/second|   Seed   |
        mt19937|  1.30e+08  | 334923062|
#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
   diehard_birthdays|   0|       100|     100|0.98331589|  PASSED  
      diehard_operm5|   0|   1000000|     100|0.12201131|  PASSED  
  diehard_rank_32x32|   0|     40000|     100|0.69993313|  PASSED  
    diehard_rank_6x8|   0|    100000|     100|0.55365877|  PASSED  
   diehard_bitstream|   0|   2097152|     100|0.85077208|  PASSED  
        diehard_opso|   0|   2097152|     100|0.76171650|  PASSED  

The whole dieharder results of my 512 MB random file can be found here.

Last, but not least, it helps to observe the data visually. In this image, I created a plain white file in Gimp, that was 600x600 pixels in size. I then counted the number of bytes in that file, and generated an equally sized random binary data file. Finally, I added the bitmap header to the file, converted it to a PNG file, optimized it, and uploaded it here. The steps are as follows:

$ gimp # create 600x600px plain white file and save as 16-bit "white.bmp"
$ ls -l white.bmp | awk '{print $5}'
720138
$ tail -f /run/rtl_entropy.fifo| dd of=random.img bs=1 count=720138 iflag=fullblock
720138+0 records in
720138+0 records out
720138 bytes (720 kB) copied, 24.8033 s, 29.0 kB/s
$ dd if=white.bmp of=random.img bs=1 count=54 conv=notrunc
$ gimp random.img # export to PNG file

When viewing the output, there should be no obvious patterns in the output. As an example:

Visual representation of random

For more practical use, here is a quick application for generating 80-bit entropy unambiguous passwords:

$ for i in {1..10}; do
> strings /run/rtl_entropy.fifo | grep -o '[a-hjkmnp-z2-9.]' | head -n 16 | tr -d '\n'; echo
> done
8dfn42w6dagqnt4z
2vcsqu6sew.g6pp2
kv9nstj4gq39x5f.
wmpdpy7yz75xrhkh
.4ra2b38hmbf5jw5
7ngyk3c58k3eeq7c
8e4t8ts3ykhckdst
9g6yqqce.bxrrhpb
xwnw6mtk8njv76b2
xdmd89n68f.kcthp

Obviously, the practical uses here include Monte Carlo simulations, game theory, gambling, cryptography, and other areas where high quality randomness is needed. Unfortunately, I can't seem to get rngd(8) to add the /run/rtl_entropy.fifo file as a hardware device. So, I can't feed the Linux CSPRNG with the dongle, other than "dd if=/run/rtl_entropy.fifo of=/dev/random", which doesn't increase the entropy estimate, of course.

SHA512crypt Versus Bcrypt

On the Internet, mostly in crypto circles, you'll see something like the following in a comment, forum post, on a mailing list, or otherwise:

Do not use fast hashes to store passwords on disk. Use bcrypt.

In most cases, however, the understanding of why to use bcrypt isn't entirely clear. You'll hear the standard answer "It's slow", without a real understanding of "how slow?" or "why is it slow?". In this post, I want to explain why bcrypt is slow, some misconceptions about using fast hashes, and where the real strength of bcrypt lies (hint- it's not speed). Finally, I'll close with an alternative that many are starting to talk about as a possible replacement for bcrypt.

First, when people are talking about using bcrypt for password hashing, they are referring to the bcrypt cryptographic key derivation function, designed by Niels Provos and David Mazières. Bcrypt is designed to be intentionally slow and expensive. It was designed specifically with password storage in mind. The motivation is clear- if a password database of any kind is leaked to the Internet, it should be cost prohibitive for password crackers to make any sort of progress recovering the unknown passwords from the known hashes.

bcrypt algorithm
How does bcrypt work though? What is the algorithm? According to the paper, the core bcrypt function in pseudocode is as follows:

bcrypt(cost, salt, input)
    state = EksBlowfishSetup(cost, salt, input)
    ctext = "OrpheanBeholderScryDoubt" //three 64-bit blocks
    repeat (64)
        ctext = EncryptECB(state, ctext) //encrypt using standard Blowfish in ECB mode
    return Concatenate(cost, salt, ctext)

The first function, "EksBlowfishSetup(cost, salt, input)" in the algorithm is defined as follows:

EksBlowfishSetup(cost, salt, key)
    state = InitState()
    state = ExpandKey(state, salt, key)
    repeat (2^cost) // exponential cost by powers of 2
        state = ExpandKey(state, 0, key)
        state = ExpandKey(state, 0, salt)
    return state

In the "EksBlowfishSetup", you'll notice the "repeat" step uses a binary exponential parameter. As the cost is increased, the time it will take to finish the algorithm will take exponentially longer. Bcrypt was designed with this cost parameter to adjust for Moore's law. As computing strength continues to improve, bcrypt should be flexible in its design to adjust for those advancements. This is why the cost parameter is baked into bcrypt, and why people call it "slow".

Finally, you'll notice the "ExpandKey(state, salt, key)" function in the algorithm. It is defined as follows:

ExpandKey(state, salt, key)
    for(n = 1..18)
        P_n = key[32(n-1)..32n-1] XOR P_n //treat the key as cyclic
    ctext = Encrypt(salt[0..63])
    P_1 = ctext[0..31]
    P_2 = ctext[32..63]
    for(n = 2..9)
        ctext = Encrypt(ctext XOR salt[64(n-1)..64n-1]) //encrypt using the current key schedule and treat the salt as cyclic
        P_2n-1 = ctext[0..31]
        P_2n = ctext[32..63]
    for(i = 1..4)
        for(n = 0..127)
            ctext = Encrypt(ctext XOR salt[64(n-1)..64n-1]) //as above
            S_i[2n] = ctext[0..31]
            S_i[2n+1] = ctext[32..63]
    return state

Because bcrypt was based on Blowfish, the "ExpandKey(state, 0, key)" function used in the "EksBlowfishSetup" function is the same as regular Blowfish key schedule since all XORs with the all-zero salt value are ineffectual. The bcrypt "ExpandKey(state, 0, salt)" function is similar, but uses the salt as a 128-bit key.

Also, to clarify, a 128-bit salt is also baked into the algorithm, as you can see. This is to prevent the building of lookup tables for bcrypt, such as rainbow tables. Salts do not slow down crackers, and it's assumed that salts will be leaked with the database. All salts provide is the protection against using a hash lookup table to find the originating plaintext. Because salts are baked into bcrypt, bcrypt lookup tables will never exist. This forces password crackers to brute force the hash.

Understanding password security
There are a few key security elements related to passwords that you must understand. They are the following:

  1. The unpredictability measurement, aka "entropy", of the password provided by the user.
  2. The speed at which brute forcing passwords can commence.
  3. The cryptographic strength of the function itself.

I ordered these for a specific reason- the most likely "weak link" in the chain of password security is the password the user provides. The history of leaked password databases has shown us that. If users understood the real strength behind passwords, they would understand the basic concepts of entropy, even if they weren't familiar with the term itself. If entropy levels were high in every user's password, no matter what, then recovering passwords from hashes via brute force would be ineffective; the speed at which password crackers brute force their way through the hashes in a database would no longer matter, because no amount of practical computing power could work fast enough, before the death of the Universe, to recover the user's password. But 70-80%, and better, of the passwords in leaked databases are recovered, because this simple concept is not getting applied.

Sadly, this just isn't the case. People suck at picking passwords.

Key stretching
So, we need to compensate for users picking bad passwords, and bcrypt makes a great leap in this regard. Because of the cost parameter which is part of the algorithm, we can adjust the cost to make password hashing intentionally slow. And, as computing power increases, the cost parameter can continue to be adjusted to compensate. This is what most people understand, when they claim that "bcrypt is slow".

The argument is that cryptographic hashes are designed to be fast, fast, fast. And they're right. Cryptographic hash functions are designed to provide data integrity regardless of the size of the input. If I have a 4.7 GB CD image, I should be able to calculate its digest in reasonable time, so when I transfer the image to another computer, I can recalculate the digest, and compare that the two digests match, in reasonable time.

This would seem like a Bad Thing for password storage, because passwords are short (much shorter than 4.7 GB, at least), so password crackers would be able to guess millions or billions of passwords per second using a fast cryptographic hash. You're right, for the most part. Ars Technica ran a story on password cracking with a 25-GPU cluster. It achieves a speed of 350 billion NTLM passwords per second, which means every conceivable eight-character Windows NTLM password can be recovered in less than 6 hours using this behemoth. It can work MD5 at 180 billion per second, or SHA-1 at 63 billion per second.
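As a rough back-of-the-envelope check on that claim, assume eight-character passwords drawn from the 95 printable ASCII characters (my assumption, not a figure from the article) and the 350 billion guesses per second quoted above:

$ awk 'BEGIN{printf "%.1f hours\n", 95^8 / 350e9 / 3600}'
5.3 hours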

At these speeds, the argument against using fast cryptographic hashes to store passwords is sounding pretty strong. Except, that Ars Technica article, and most bcrypt fanboys seem to overlook one thing- key stretching. Just because cryptographic hashes are fast, doesn't mean we can't intentionally slow them down. This is key stretching.

Key stretching is the idea that you reuse internal state for calculating a new key. For cryptographic hash functions, this is "iterations" or "rotations". The idea is taking the cryptographic digest of the input, and using this digest as the input for another hashing round. In pseudocode, you could think of it like this:

salt = random() // programmatically determine a salt randomly
password = input() // get the password from the user
key = ''
cost = 5000

for ROUND in 1 to cost: do
    digest = SHA512(salt, password,  key)
    key = digest
done

If our 25-GPU cluster could work through 50 billion SHA-512 cryptographic hashes per second, by forcing 5,000 SHA-512 calculations before getting to the desired hash, our 25-GPU cluster can now only work through 10 million SHA-512 hashes per second. As the iterative count is increased, the time it takes to calculate the resulting digest increases. As such, we have created a "sha512crypt" that has a similar cost parameter as bcrypt. Now the question remains- does it hold up?

Practical examples
To see if this "key stretching" idea holds up, I wrote two Python scripts- one using SHA-512, and the other using bcrypt. In both cases, I increased the cost parameter from a reasonable starting point to well beyond a reasonable expectation.

Here is my Python code for "test-sha512.py":

#!/usr/bin/python
import hashlib
password = b'password'
cost = 5000
key = ''
m = hashlib.sha512()
for i in xrange(cost):
    m.update(key+password)
    key = m.digest()

And here is my Python code for "test-bcrypt.py":

#!/usr/bin/python
import bcrypt
cost = 6
password = b'password'
salt = bcrypt.hashpw(password,bcrypt.gensalt(cost))
hash = bcrypt.hashpw(password, salt)

In both cases, I incremented the cost, then timed re-running the script. Of course, Python is an interpreted language, so the absolute times would be much lower if this were implemented in C, or assembly. Further, this was done on my aging T61 laptop. Running it on my 8-core i7 workstation with triple-channel DDR3 would show improved times. It's not the absolute times that are critical. What is critical is seeing the exponential back-off as the cost is increased.
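For what it's worth, the timings below were collected by simply running each script under time(1) after bumping the cost, something along these lines:

$ time python test-sha512.py
$ time python test-bcrypt.py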

Here is a table detailing my findings; times are in seconds, except the last two rows, which are minutes:seconds. Notice that the bcrypt cost increments by one each row, and because it's binary exponential back-off, the times roughly double at each step. I also adjusted the sha512crypt iteration count a little to more closely match the bcrypt timings, even though it's not a strict doubling at each step.

         bcrypt                  sha512crypt
cost     time          iterations       time
6        0.032         5,000            0.036
7        0.045         10,000           0.047
8        0.064         20,000           0.064
9        0.114         40,000           0.105
10       0.209         80,000           0.191
11       0.384         160,000          0.368
12       0.745         320,000          0.676
13       1.451         640,000          1.346
14       2.899         1,280,000        2.696
15       5.807         2,560,000        5.347
16       11.497        5,500,000        11.322
17       22.948        11,000,000       22.546
18       45.839        22,000,000       45.252
19       1:31.95       44,000,000       1:30.14
20       3:07.27       88,000,000       3:07.52

In the Python bcrypt implementation, the default cost is "10". For most modern GNU/Linux operating systems, when storing the user password in /etc/shadow with sha512crypt (yes, I didn't come up with the name), the default cost is 5,000 iterations. In both cases, the cost can be adjusted. In the case of the Python bcrypt module, it's just a matter of passing a numerical argument to the function. In the case of GNU/Linux, it's editing PAM by adding "rounds=" to a config file, as shown below.
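As a hedged example of the latter, on a Debian-style system the pam_unix line in /etc/pam.d/common-password can carry the "rounds=" option; the exact line and a sensible value will vary by distribution and policy:

password  [success=1 default=ignore]  pam_unix.so obscure sha512 rounds=100000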

As such, sha512crypt can be just as slow as bcrypt. Remember, we are trying to adjust for increased computing power that password crackers will have access to. In both cases, bcrypt and sha512crypt address that requirement.

Bcrypt's additional strength
So, if sha512crypt can operate with a cost parameter similar to bcrypt, and can provide that exponential back-off that we are looking for to slow down password brute force searching, then what's the point of bcrypt? Are there any advantages to running it? It turns out, there is, and I suspect this is a consequence of the design, and not something that was intentionally added.

What we would like is to prevent password crackers from using non-PC hardware to attack the password database. SHA-2 functions, such as SHA-512, work very well on GPUs. SHA-2 functions also work well on specialized hardware such as ASICs and FPGAs. As such, while we could make things slow for CPU or GPU crackers, those password crackers with specialized hardware would still have an upper hand on attacking the password database. Further, by addressing GPU cracking, and making it intentionally slow there, we make life more difficult for CPUs, which means hurting the honest user trying to log in to your web application. In other words, if I adjusted sha512crypt for GPU crackers, such that only 1,000 passwords per second could be achieved on a GPU, that might be a full second, or more, for the user logging in on your server's CPU. This may or may not be desirable.

So, how does bcrypt attack this problem? Part of the algorithm requires a lookup table stored in RAM that is constantly modified during execution. This turns out to work very well on a standard PC, where the CPU has exclusive access to RAM. It turns out to work fairly poorly on a GPU, where the cores share the on-board memory, and each core must compete on the bus for access to that memory. As a result, any additional cracking speed is greatly minimized on a GPU when compared to the CPU. In other words, GPU password cracking with bcrypt isn't very effective either.

For SHA-2 functions, like SHA-512 however, this is not the case. SHA-2 functions use only 32-bit logic and arithmetic operations, which GPUs excel at. By using a GPU over a CPU for our sha512crypt function, a password cracker can get a couple to many orders of magnitude of additional cracking power.

So, the reason to use bcrypt isn't because "it's slow". The reason to use bcrypt is because "it's ineffective on GPUs".

A better alternative- scrypt
Unfortunately for bcrypt, however, due to its low memory requirement, bcrypt can be implemented in a field programmable gate array (FPGA) or custom ASICs. Understand that bcrypt was designed in 1999, when such specialized hardware had low gate counts, and was few and far between. 15 years later, times have drastically changed. Bitcoin ASICs and SHA-256 FPGAs are commonplace. Hardware AES is common in CPUs and embedded systems. The fact of the matter is, these FPGAs, with their fast onboard RAM, are well suited to bring bcrypt password cracking well into "fast" territory.

An alternative would be a solution that not only requires memory to be constantly modified during algorithm execution, but also exponentially bloats the memory requirement as the cost increases. scrypt addresses this shortcoming in bcrypt. Scrypt is another password key derivation function that was initially designed in 2009 by Colin Percival as part of the Tarsnap online backup service.

Scrypt has all of the advantages that bcrypt provides (baked-in salt, an exponential cost parameter, and ineffectiveness on GPUs), while also adding an exponentially growing RAM requirement as part of the cost. Because of this RAM requirement, it is no longer cost efficient to build FPGAs with the necessary RAM.

Security, standards, and implementations
Scrypt is only 5 years young. This gives bcrypt a good 10 year head start. In terms of security, this is preferred. We want cryptographic functions that have withstood the test of time with cryptographers constantly attacking and analyzing their functions, primitives, and implementations. The longer it remains "unbroken", the more "secure" we deem the functions to be.

Bcrypt continues to be attacked and analyzed, and is showing no serious sign of weakness, 15 years later. This is good for the security of bcrypt. Scrypt however has seen less scrutiny, mostly due to its young age. However, it has been 5 years, and like bcrypt, no serious signs of weakness have been shown. By comparison, the SHA-2 family of functions was created in 2001, and has been scrutinized much more than bcrypt and scrypt combined, and also is not showing any serious signs of weakness. So, from a security standpoint, the SHA-2, bcrypt, and scrypt functions all seem to be fairly secure.

When looking at governing body standards, NIST has no paper on bcrypt or scrypt. They do recommend using PBKDF2 (another key derivation function, which I haven't explained here, but love) for password storage, and NIST has standardized on SHA-2 for data integrity. Personally, while I like the ideas of bcrypt and scrypt, I would recommend sticking with the NIST recommendations with high iteration counts, as shown above. Until we see more standards body interest in bcrypt and scrypt, IMO, you are taking a risk using them for password storage (less so for bcrypt than scrypt, at least).

Finally, because of the newness of scrypt, there are fewer tools for its use in programming languages than for bcrypt, and fewer still compared to SHA-2. Further, most programming languages don't include either bcrypt or scrypt in their "standard library" of modules or functions, while SHA-2 is more generally found. And among the implementations that do exist, some 3rd party libraries are more trustworthy than others. Because you're dealing with password storage, it's critical you get this right.

Conclusion
While I love the algorithms behind bcrypt and scrypt, I've always advocated for using high iterative counts on the SHA-2 or PBKDF2 functions. Even further is the advocating of teaching people how to understand the concepts behind their password entropy, and improving their own online security. That is the weakest link, and needs the most work, IMO.

So, if you ask me, I'll tell you to use either PBKDF2 or SHA-2 with high iterative counts. Or, if you absolutely must use bcrypt, then I'll recommend that you use scrypt instead.

Use /dev/random Instead Of /dev/null

While writing a shell script the other day, I was redirecting some output to /dev/null, as normal, when something dawned on me. Why don't I redirect my output to /dev/random instead? After all, both Linux random devices are writable by everyone on the system:

$ ls -l /dev/*random
crw-rw-rw- 1 root root 1, 8 Nov 13 15:14 /dev/random
crw-rw-rw- 1 root root 1, 9 Nov 13 15:14 /dev/urandom

Knowing what I know about the Linux cryptographically secure pseudorandom number generator (CSPRNG), I know that any bits put into the CSPRNG input pool are hashed with the SHA1 cryptographic hash function, 512 bits at a time. This includes any data you redirect to it from the shell, as well as output from itself. When data is fed into the CSPRNG input pool, the RNG is reseeded.

To understand this concept of seeding an RNG, let's assume for a moment that the only source of input for the RNG is its own output. If this were the case, we would only need a starting value to "seed" the RNG, then let it run by hashing its own digests. In this scenario, each digest is chosen deterministically, and if we know the input that seeded the RNG, we can predict all of its future outputs.

Think of this scenario like a progress bar. For the SHA1 cryptographic hash, there are 2^160 possible unique digests. Theoretically, our RNG should be able to work through all 2^160 digests exactly once before starting over, provided there is enough time to do so (we now know this isn't the case: SHA1 has been weakened with respect to collision attacks, so the chain will repeat well before covering every digest). However, when you change the input by providing something other than the next digest in the queue, you change the next starting point of the RNG. It's as though you've "skipped" to a non-sequential location in your progress bar.
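
A toy sketch in Python may help illustrate the idea; this is not the kernel's actual algorithm, just a hash chain that shows why an unseeded generator is predictable and why outside input "skips" it to a new point:

import hashlib

# Start from a known seed and generate outputs by hashing the previous one.
state = hashlib.sha1(b"initial seed").digest()
for _ in range(3):
    state = hashlib.sha1(state).digest()
    print(state.hex())   # anyone who knows the seed can reproduce these

# "Reseeding": mixing in outside data moves the chain to an unrelated
# position, so later outputs no longer follow the predictable sequence.
state = hashlib.sha1(state + b"unpredictable outside data").digest()
print(state.hex())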

Now, consider constantly reseeding your RNG. This is what your Linux system is actually doing. It's constantly processing timing events from disk IO, network packets, keyboard presses, mouse movements, etc. All these inputs get collected into an "entropy pool", which is then hashed with SHA1, 512 bits at a time, as mentioned earlier. This input changes the sequential ordering of the digest outputs, making the result unpredictable and non-deterministic.

So, when working on the shell, by redirecting your output to /dev/random, you reseed the CSPRNG, meaning you have changed the digest output ordering to something different than it would have been had you not redirected those bits. In fact, the more data you send to the CSPRNG, the more you reseed it, forever altering the path it takes on its "progress bar".
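
If you want to do the same thing from a program rather than a shell redirect, a minimal sketch in Python looks like this (note that plain writes mix data into the pool, but only a privileged ioctl credits the kernel's entropy estimate):

# Any user can write to /dev/random; the bytes are stirred into the input pool.
with open("/dev/random", "wb") as pool:
    pool.write(b"output you would otherwise have sent to /dev/null\n")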

Now, you may ask why not have some userspace process running in the background that is always reseeding the CSPRNG? Sure, you can. In that case, I would recommend running Haveged on your system. Haveged will gather far more hardware timing events on the system than the default install will, and keep the entropy pool topped off at full. The CSPRNG will be constantly reseeded. However, for shell scripts, redirecting to /dev/random instead of /dev/null works.

My only concern with redirecting to /dev/random would be throughput. Doing a simple and crude benchmark comparing /dev/null to /dev/random, I get the following on my workstation:

$ for I in {1..5}; do dd if=CentOS-6.5-x86_64-bin-DVD1.iso of=/dev/random; done
8726528+0 records in
8726528+0 records out
4467982336 bytes (4.5 GB) copied, 81.3842 s, 54.9 MB/s
8726528+0 records in
8726528+0 records out
4467982336 bytes (4.5 GB) copied, 76.4597 s, 58.4 MB/s
8726528+0 records in
8726528+0 records out
4467982336 bytes (4.5 GB) copied, 74.6036 s, 59.9 MB/s
8726528+0 records in
8726528+0 records out
4467982336 bytes (4.5 GB) copied, 75.4946 s, 59.2 MB/s
8726528+0 records in
8726528+0 records out
4467982336 bytes (4.5 GB) copied, 74.375 s, 60.1 MB/s
$ for I in {1..5}; do dd if=CentOS-6.5-x86_64-bin-DVD1.iso of=/dev/null; done  
8726528+0 records in
8726528+0 records out
4467982336 bytes (4.5 GB) copied, 59.325 s, 75.3 MB/s
8726528+0 records in
8726528+0 records out
4467982336 bytes (4.5 GB) copied, 56.5847 s, 79.0 MB/s
8726528+0 records in
8726528+0 records out
4467982336 bytes (4.5 GB) copied, 54.4541 s, 82.1 MB/s
8726528+0 records in
8726528+0 records out
4467982336 bytes (4.5 GB) copied, 56.0187 s, 79.8 MB/s
8726528+0 records in
8726528+0 records out
4467982336 bytes (4.5 GB) copied, 57.0039 s, 78.4 MB/s

Seems I get slightly better throughput with /dev/null, which isn't surprising. So, unless you know you need the throughput, I would recommend sending your data to /dev/random over /dev/null.

Playing Card Ciphers

For the past couple of weeks, I've been focused heavily on hand ciphers for field agents. Although I'm certainly no expert on cryptography, aside from the One Time Pad (OTP), I've had a hard time finding any hand cipher that would be considered secure in the computer age. There's no doubt that field agents are very likely using computers with SSL and GPG, among other crypto tools, to communicate with each other. The romantic days of "Spy versus Spy", encrypting and decrypting notes by hand, doing dead drops in a tree stump in a park, and speaking 15 foreign languages fluently, are probably over. Regardless, I could not bring myself to believe that there were absolutely no secure hand ciphers left to study.

I was already familiar with Bruce Schneier's "Solitaire" card cipher, and always considered it a fairly creative use of a hand cipher. Despite its bias, it's still very secure, although slow to execute by hand and very error-prone. But this got me thinking: has anyone else attempted to create hand ciphers with playing cards, and if so, how far have they taken it?

After searching the Internet, I found five additional playing card ciphers, and adapted one mechanical cipher into a playing card cipher. Each comes with its own level of security. While I was at it, I created my own playing card cipher, and I'm still evaluating its security. That brings the total list to eight playing card ciphers:

As you can see, I've spent a great deal of time learning each of the algorithms, and have typed them up by hand on my own personal wiki. Quadibloc is the only card cipher, at the time of this writing, that I am still learning and working on. I'm hoping this can be a centralized repository for all things playing card ciphers. My goals for this personal project are:

  • Publish software implementations of each card cipher.
  • Publish online videos giving a tutorial of each card cipher.
  • Learn the strengths and weaknesses of each card cipher.

The big advantage of using playing cards, in my opinion, is the ability for the deck to maintain state while working through the algorithm. A standard deck of cards has a maximum key space of 52!, which is about 225 bits of entropy. This is larger than the effective strength of many SSL keys on the Internet protecting your bank login. So, provided the algorithm is non-linear, mixes the deck thoroughly on each round, and is not biased, it is possible that the algorithm could resist attacks from even the most well-funded adversaries.
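
You can check that figure yourself with a couple of lines of Python:

import math

# Number of orderings of a 52-card deck, and its size in bits.
keyspace = math.factorial(52)
print(keyspace)             # 80658175170943878571660636856403766975289505440883277824000000000000
print(math.log2(keyspace))  # ~225.58, i.e. about 225 bits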

There is still work to be done, and I doubt this will be of much value to the general cryptographic community. After all, the OTP can be mathematically proven to be unbreakable, and computer algorithms are fast and error-free. So, who in their right mind would want to learn hand ciphers with playing cards, when they come with no mathematical proof of unbreakability, and are slow and error-prone?

Me.