
The Linux Random Number Generator

There is a lot of misinformation out there about /dev/random and /dev/urandom with regards to the Linux kernel. Unfortunately, not even the random(4) manpage seems to get it right. So, rather than argue with everyone on the Internet, I'll post the inner workings of the Linux kernel with respect to randomness here.

If you're not interested in reading the full post, then this summary will do:

  • There are 3 separate and distinct entropy pools in the Linux kernel.
  • /dev/random and /dev/urandom both use the same CSPRNG.
  • Use /dev/urandom.

The Entropy Pools
There are three arrays of unsigned integers in the Linux kernel, called entropy pools, each of which maintains its own state: the input pool, the blocking pool, and the non-blocking pool. The input pool maintains a maximum internal state of 4096 bits. When looking at the value of "/proc/sys/kernel/random/entropy_avail", you are looking at the kernel's current entropy estimate for the input pool. The blocking and non-blocking pools each keep an internal state of just 1024 bits.
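If you want to watch that estimate change, it's just a file read away. Here is a minimal Python sketch (the path is exactly the one referenced above):

with open("/proc/sys/kernel/random/entropy_avail") as f:
    print(f.read().strip())   # the current estimate, between 0 and 4096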

Below is an image showing the architecture of the Linux kernel random number generators. Each of the three pools is shown in green. You'll notice that the input pool is fed with data from external events, such as the timing of interrupt requests, disk and network I/O, and human input from the keyboard and mouse. The data is hashed with SHA1 as part of the mixing functions in the cryptographically secure pseudorandom number generator (CSPRNG).


The blocking and non-blocking pools both feed from the same CSPRNG to produce their random output. There are three interfaces for getting data out of the CSPRNG: the /dev/random character device, which will block when the blocking entropy pool is exhausted; the /dev/urandom character device; and the in-kernel get_random_bytes() function, used by the kernel itself. The latter two will not block, even when entropy has been exhausted.

What's important to understand here is that both the blocking pool, from which /dev/random draws its data, and the non-blocking pool, from which /dev/urandom draws its data, get their data from the same CSPRNG.

Theoretic versus Computational Security
There are really just two types of security that you should be aware of: theoretic security, and computational (or practical) security.

Theoretic Security: This is security which can be proven through a logical mathematical proof. Two examples come to mind (there are probably others): the one-time pad, and Shamir's Secret Sharing. Both can be implemented in software, even though there may be some difficulty in the implementation.

Computational Security: This is security where algorithms are built on problems believed to be hard for classical computers, such as the discrete logarithm problem and the quadratic residuosity problem. It is known that it takes considerable energy, time, and effort to work through these problems when certain criteria are met, such as factoring a number that is the product of two primes with hundreds or thousands of digits. Because it's computationally difficult to solve these problems, we consider algorithms built on them computationally secure.

Probably every cryptographic algorithm that you are aware of, such as AES, RSA, SHA, and so forth, is computationally secure, not theoretically secure. This means that an adversary with unlimited time, resources, and processing power could break computationally secure algorithms. This is why bit sizes are ever increasing: as processing power and computing "tricks" improve, the size of the problem needs to increase.

Random Numbers
There are basically two ways you can generate random numbers: in hardware or in software. (If you are getting random numbers from a Geiger counter, then even though radioactive decay is found in nature, you are still using a hardware device to feed it into the computer.) In both cases, however, the sequence of numbers could be "true random" or pseudorandom.

First, I'm not going to jump into the long argument that is "true randomness". This quickly turns into a philosophical debate. Some will say that "true random" only exists in quantum nature, while others will explain that even quantum mechanics lives by mathematical equations and predictable output that we just haven't discovered yet. For me, I'm content with calling the output of some quantum events "true random". While all mathematical algorithms in software can only produce pseudorandom numbers, some algorithms have such a large cycle, and are so computationally secure, that their output is indistinguishable from true random data.

Further, even if you have a "true random" number from nature, how do you say a number by itself is random? It's usually more practical to call a sequence of numbers unpredictable, rather than "random". For cryptography, that's really the only thing that matters. Consider the following sequences of text: one was encrypted with AES, one is "true random" data fed from a hardware random number generator, and the other was pseudorandomly generated:


Looking at those three sequences, one has a very rigid structure: it is legitimate data encrypted with AES, while the other two strings are 16 bytes of data read from either /dev/urandom or /dev/hwrng. Examining the sequences of bits, is it computationally feasible to determine the underlying structure that was used to build the encrypted ciphertext, and get back to the original plaintext, without the key? From a theoretic perspective, the sequence of encrypted bits should appear indistinguishable from a truly unpredictable sequence of bits, such as those grabbed from the hardware random number generator.

I'll give you the solution:

$ echo "Random numberS" | aespipe | xxd -ps
Password: 12345678901234567890

However, suppose you have "true random" numbers; maybe you got them from radioactive decay, diode breakdown, or atmospheric noise. Regardless, you have a sequence of bits that is as "true random" as you can get. So, now what are you going to do with them? If you're going to study some information-theoretic security algorithm, you might be doing the world some good. However, if you're going to subject these numbers to a computationally secure, man-made, software algorithm, running on a deterministic computer, what was gained by having those "true random" numbers? The weakness will then lie in the software algorithm, the unpredictable internal state of the computer, and human factors; not necessarily the numbers from which it was drawn. Subjecting "true random" numbers to a theoretic algorithm is one thing; subjecting them to a man-made computer software algorithm that is at best computationally secure is another. In all honesty, you are not much better off with a CSPRNG giving you the numbers than you are with the Universe giving them to you, if all you need to do is generate a long-term GPG or SSL key.

Don't get me wrong. I'm not suggesting that having "true random" numbers is a bad thing. It isn't. Injecting "true random" numbers into the input entropy pool of the kernel reseeds the CSPRNG, putting it in a "true random" state, from which it can build your pseudorandom numbers. Having "true random" numbers is a good thing, and increasing the input pool entropy is a great thing. In fact, due to the nature of "reseeding" the CSPRNG, the Linux random number generator "self-heals". In other words, even if the internal state of the system was known at a fixed point in time, reseeding the CSPRNG with entropy fed by "true random" numbers changes the internal state of the random number generator. So, although all future numbers could have been determined while the state of the system was compromised, the reseed changes the internal state, and future output can no longer be predicted.

A self-healing CSPRNG is a great thing to have. And guess what? If you're constantly feeding the input pool with "true random" noise, then the output of /dev/urandom will be completely indistinguishable from "true random" data sequences, due to this reseeding.

/dev/random versus /dev/urandom
And now we get to the meat of the issue: which output interface to use for "the most demanding cryptographic computational long-term security"? First things first: both /dev/random and /dev/urandom are fed data from the same CSPRNG. Let me repeat that, with a bit more emphasis: Both /dev/random and /dev/urandom are fed data from the same CSPRNG. In fact, most who understand this will explain that you should use /dev/urandom, and NOT /dev/random. The problem is that /dev/random has the horrible misfeature of blocking when its entropy estimate is exhausted. So, if your application relies on /dev/random to get random numbers, there are pauses and delays until there is enough entropy in the input pool to reseed the CSPRNG, so you can get more data.

The point? Stop using /dev/random and start using /dev/urandom. And don't just take my word for it; here are some people who back me up:

Here, cryptographer Dr. Daniel Bernstein puts me, personally, in my place on a mailing list (I've learned the error of that line of logic since then):

Cryptographers are certainly not responsible for this superstitious
nonsense. Think about this for a moment: whoever wrote the /dev/random
manual page seems to simultaneously believe that

(1) we can't figure out how to deterministically expand one 256-bit
/dev/random output into an endless stream of unpredictable keys
(this is what we need from urandom), but

(2) we _can_ figure out how to use a single key to safely encrypt
many messages (this is what we need from SSL, PGP, etc.).

For a cryptographer this doesn't even pass the laugh test.

I'm not saying that /dev/urandom has a perfect API. It's disappointingly
common for vendors to deploy devices where the randomness pool has never
been initialized; BSD /dev/urandom catches this configuration bug by
blocking, but Linux /dev/urandom (unlike Linux /dev/random) spews
predictable data, causing (e.g.) the widespread RSA security failures
documented on http://factorable.net. But fixing this configuration bug
has nothing to do with the /dev/random superstitions.

Dr. Bernstein continues in a paper about the problems of having too much entropy, and why that could be a bad thing. However, he repeats the same thing as above: use /dev/urandom. Dr. Bernstein also created a cryptographic library called "NaCl", which relies on /dev/urandom for its randomness.

Cryptographer Dr. Thomas Pornin also explains the nature of /dev/urandom:

/dev/urandom yields data which is indistinguishable from true randomness, given existing technology. Getting "better" randomness than what /dev/urandom provides is meaningless, unless you are using one of the few "information theoretic" cryptographic algorithm, which is not your case (you would know it).

In an additional post, he mentions:

There are many applications which read /dev/random as a kind of ritual, as if it was "better" than /dev/urandom, probably on a karmic level. This is plain wrong, especially when the alea is to be used with classical cryptographic algorithms (e.g. to generate a SSH server public key). Do not do that. Instead, use /dev/urandom and you will live longer and happier. Even for one-time pad.

Even Bram Cohen agrees.

In terms of software packages, the Go language uses /dev/urandom as its source for a CSPRNG, where available. OpenSSL uses /dev/urandom when available. OpenSSH will use OpenSSL's random number source, if compiled with OpenSSL support, which uses /dev/urandom. OpenVPN will also use OpenSSL's random number source, unless compiled with PolarSSL. Guess what? /dev/urandom.

The only software package I can think of that does not use /dev/urandom, but instead deliberately uses /dev/random, is GnuPG, when generating keys. However, this is only to guarantee that there has been sufficient entropy in the system to guarantee "randomness". When the input entropy pool is exhausted, /dev/urandom will happily give away a non-blocking stream of random data, where /dev/random will block. Adding non-malicious entropy to your system is not a bad thing. As we mentioned earlier, reseeding the CSPRNG with entropy from random sources gives us a "self-healing" RNG. Thus, GnuPG ensures that there is enough entropy to reseed the CSPRNG, such that the state of your system has "healed", and is in a completely different state than before you started. However, if you just linked /dev/random to /dev/urandom and generated your GPG key, it would be no less secure, provided that the system you generated the key on is not compromised in any fashion.

The point? Use /dev/urandom.
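In practice, that usually means just using your language's wrapper around the kernel interface. In Python, for example, os.urandom() pulls from the kernel's non-blocking interface on Linux, so it never stalls your application:

import binascii, os

# 32 bytes (256 bits) of key material from the kernel CSPRNG
key = os.urandom(32)
print(binascii.hexlify(key))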

Cryptographically Secure Pseudorandom Number Generators
I know what you're thinking: a pseudorandom number generator could not possibly be as "good" as a true random number generator. In many cases, this is true. There are pseudorandom number generators which are not cryptographically (which means "computationally") secure. The Mersenne Twister comes to mind. It's "fast enough" and will provide pseudorandom numbers as long as you wish. It's the default pseudorandom number generator for a great many programming languages, such as Python, and rightfully so: it's a good choice. But it's not computationally strong. After observing the sequence for a period of time, enough information can be gathered to allow an attacker to predict the remaining sequence.

So, the question remains- can you have a computationally strong pseudorandom number generator? The answer is a big resounding YES! Wikipedia to the rescue:

  • The Yarrow algorithm (used in FreeBSD and Mac OS X for both /dev/random and /dev/urandom).
  • The Fortuna algorithm.
  • CryptGenRandom as used in Microsoft Windows.
  • ISAAC, which is based on the RC4 cipher.
  • arc4random, used in OpenBSD until 5.5, which is also based on the RC4 cipher.
  • AES-CTR DRBG (Deterministic Random Bit Generator), which is often used as a random number generator. AES in CTR mode encrypts an incrementing counter under a secret key, and uses the encrypted output as the sequence of random numbers. Provided the initial counter and key are kept secret, this is cryptographically secure. In reality, any block cipher in CTR mode will work here (see the sketch after this list).
  • The ANSI X9.17 standard, also adopted by FIPS, explains the use of Triple Data Encryption Algorithm (TDEA) keying option #2 as a CSPRNG.
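To make the AES-CTR DRBG bullet concrete, here is a minimal Python sketch using the PyCrypto API. The key and starting counter below are placeholders; a real generator would seed both from an entropy source and keep them secret:

from Crypto.Cipher import AES
from Crypto.Util import Counter

key = b"0123456789abcdef"                  # placeholder 128-bit secret key
ctr = Counter.new(128, initial_value=42)   # placeholder secret starting counter
drbg = AES.new(key, AES.MODE_CTR, counter=ctr)

# Encrypting zeros returns the raw keystream: our pseudorandom bytes.
random_bytes = drbg.encrypt(b"\x00" * 32)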

There are a couple of theoretic designs which are computationally strong, but due to the nature of the algorithm, slow and impractical:

  • Blum Blum Shub relies on the hard quadratic residuosity problem to get its security.
  • Blum Micali relies on the hard discrete logarithm problem to get its security.
  • Dual_EC_DRBG, which was later revealed to contain an NSA backdoor, is based on the decisional Diffie–Hellman assumption.

The CSPRNG in the Linux kernel uses SHA1 on the input pool for handing out our random numbers, and the result is also mixed back into the input entropy pool for further use. Even though theoretical attacks have been made on SHA1, the design is flexible enough that a different cryptographically secure hashing algorithm could be put in its place, such as SHA2, SHA3, or WHIRLPOOL.

Non-blocking /dev/random
If you absolutely must have a /dev/random device that doesn't block, either because some software applications rely on it, such as GnuPG, or because you are working on some information-theoretic algorithm, then there are some options available to you.

The first is the HAVEGE software daemon. This queries many more timing events on your motherboard than the standard kernel does, and can keep your input entropy pool full. In my testing, this can produce around 1 MBps of pseudorandom data out of /dev/random. The only drawback is that it hits the CPU fairly frequently, and can cause the load on the system to jump. However, I have found in practice that if you set the watermark to 512, which is typically considered enough bits of entropy for even the most stringent cryptographic keys, you'll never notice it running.
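For reference, a minimal sketch of setting that watermark by hand, assuming (verify this on your kernel) that the knob HAVEGE manages is the write_wakeup_threshold sysctl; this needs root:

# assumption: the watermark maps to this sysctl knob on your kernel
with open("/proc/sys/kernel/random/write_wakeup_threshold", "w") as f:
    f.write("512")   # wake entropy feeders when the estimate drops below 512 bits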

Other options include purchasing USB devices that use avalanche noise in electronic circuits. Using such a USB device, along with rngd(8), can keep /dev/random from blocking, provided that the device remains operational. One drawback with these devices is the possibility that the circuit breaks in some fashion, producing biased and highly predictable data. Some implementations, such as the Entropy Key (which is no longer in production), solve this by doing internal bias testing and whitening, at the expense of throughput. These USB devices can be purchased for around $50, and can give you up to 40 KBps of random data per device.

Lastly, some chip makers, such as Intel and Broadcom, are introducing a HWRNG (hardware random number generator) on the chip itself. All it takes to use these generators is loading a kernel module. Typically, a /dev/hwrng device will then show up, and you can again use rngd(8) to feed its output into the input entropy pool for /dev/random. Unfortunately, when the specifications are not available, or an audit has not been performed, it is unknown whether the NSA has compromised the integrity of these hardware generators. Merely passing randomness tests, such as DIEHARD and FIPS, is not enough to discover whether the random data is predictable by an attacker. However, throughput can be upwards of 500 MBps for these generators. If you need that sort of throughput, these are the generators for you.

Use /dev/urandom.

Tor Versus Road Warrior

Lately, I have been doing some research regarding Tor, and the technology behind it. Further, I wanted to compare it to other products such as Freenet and I2P. In the process, I stumbled upon this post comparing Tor to a proprietary product called "Road Warrior" from a company called "Cryptohippie". Initially, I tried commenting on the post, but it seems that the WordPress installation at the Daily Anarchist is broken, as every attempt to supply a comment led to a 404 error page. So, instead, I'll post the comment here:

Interesting read. I know the post is more than 3 years old, however I stumbled upon it looking up some alternatives to Tor. Not because I think Tor is bad, but because I'm interested in what the world is doing with the Edward Snowden revelations, and what's out there.

You make some interesting points, most of which are standing on shaky ground.

1. "Tor is slow". Well, yes and no. Sure, going through 3 random hops to get to your destination is going to add latencies to your round trip connection. That's somewhat expected though. However, with that said, I usually don't have any issue watching YouTube videos, downloading updates from my operating system vendor, or downloading PDFs and other multimedia from the Internet. There have also been updates since this post was published, where the algorithms Tor use have been greatly optimized, lowering latencies, and increasing bandwidth. Personally, I don't generally see any undue latency burdens compared to standard clear net connections. For me, the bandwidth and performance is acceptible.

2. "When someone owns something and generates income from it, they almost always take care of it, and usually work hard to improve it". Sure, that is, when it's financially convenient, which isn't always the case. Many times, companies shift focus, lose investments, or just plain go under, and the product goes with it. We could list example, after example, where commercial products failed, where Free and Open Source products not only survived, but thrived. Especially when it comes to crypto, proprietary and commercial products usually don't fare well. OpenVPN rules the VPN world. OpenSSH rules the SSH world. OpenSSL rules the secure web and email worlds. The standard crypto libraries are all Free and Open Source Software, not proprietary. Even the standard crypto protocols, which Road Warrior relies on, are open not encumbered by patents, such as AES, SHA1/SHA2, etc. Because this is a criticism of "free" software, however, you are aware that there are financial contributions by the EFF to develop the Tor product, just like there are financial contributions from IBM, Oracle, Red Hat, and even Microsoft, to develop the Linux kernel- another "free" software.

3. "Tor may include malicious nodes". Sure. This will be a problem when you use ANY service, and is not unique to Tor. When using any Internet service, there is a level of trust that must be maintained between the provider, and the user. As an example, HTTPS connections could be decrypted on the server, logged, and shared. However, with the 3-random relay node selection, it is actually pretty difficult for a node operator to compromise an entire Tor client connection. In fact, that's kind of the point with the "onion" layer encryption. Really, the only thing that is practical, is controlling both the entry and exit nodes, at which connection profiles could be built based on traffic patterns, IP addresses, times of connections, and sites visited. This is possible by controlling a large number of the relay nodes in existence. As such, entry guards were introduces to minimize this attack. Even DNS, if not properly tunneled through Tor, could give away details about what your doing with your Tor connection. Regardless, no service is 100% secure, and Tor is no exception.

4. "Tor is only for web browsing". Garbage. This couldn't be further from the truth. I use Tor for my email IMAPS/SMTPS connections, for IRC over SSL, XMPP, DNS, SSH, FTP, and so much more. If your application supports using a SOCKS proxy, or if you can setup a transparent proxy for all connections to go through, it just works. Tor is completely agnostic with regards to the protocol. I use email, IRC, XMPP, and web over Tor, all the time, with very little problems.

5. "Tor requires all the software on your computer that accesses the internet to be cooperative. Many programs, however, (whether created by shady marketers, governments, crooks, or just poorly written) are not cooperative, but bypass Tor and give away your network identity". Sure. Many software programs will not allow the ability to change network connections and take advantage of a SOCKS proxy, like your browser or chat clients. However, it's not difficult to build a transparent proxy, to force all software programs over Tor. This can be done at the router, a separate server, or on the local client.

6. "For most people, Tor is to hard to use regularly. This makes security errors and leaks much more likely." This hits the nail squarely on the head, but it's not Tor's fault, and the same thing could be said for Road Warrior. Crypto is just hard. There is a reason for the mantra "don't roll your own crypto". There is a reason many cryptographers have doctorate degrees in mathematics and/or computer science. Crytpo is just exceptionally difficult to grasp. Very little about it is actually intuitive. This is why PGP is not more widely implemented. This is why people don't run encrypted filesystems on their computer. This is why people share their entire lives on Facebook and Instagram. People want things to Just Work, with very little to no work on their part. Unless Tor is 100% transparent, the only people deploying Tor will be nerds and system administrators. Same will be said for PGP, I2P, Freenet, Bitmessage, etc., and even your own product, Road Warrior. Unless cryptography and perfect forward secrecy, among other technologies, are fully 100% baked into the product, people won't deploy it. It has to be just part of the normal way things are done, transparently, such as HTTPS, DNSsec, IPsec, etc. The only thing expected out of people, is the ability to use a browser or an email client. IE- a dumbed down client application. If you expect them to hook in an additional utility, it probably won't happen.

Now, some of my own personal criticisms of Road Warrior.

1. First and foremost, Road Warrior is closed source, proprietary software. This isn't looked upon favorably in the cryptographic community, especially for a cryptographic tool. I might believe that you're not intentionally being malicious with your Road Warrior product, or haven't implemented back doors for the NSA, but how do I know you've used encryption standards and best practices? How can the community evaluate that there are no serious bugs or security vulnerabilities, without access to the source code? How do I know you're addressing CVEs? Just trusting some company, because they say so, is a big leap. Has Road Warrior been independently audited for security? If so, I'm not finding the results anywhere on the site. If the product is to remain proprietary, closed source, then at least provide your customers with the full results of a security audit.

2. All traffic goes through Cryptohippie's servers. With Tor, traffic is split out among a fully distributed, decentralized, voluntary network. Every relay is donated. My traffic might go through Denver on one connection, and Amsterdam on the other. Not so with Road Warrior. While the servers might be geographically separated, customers can't run their own server. As such, the IP list is very static, and makes Cryptohippie a very large single point of failure. Even further, how do I know you're not examining my packets? It's one thing to trust that some random Tor server on the network is not malicious. It's another to trust a company, who is in control of all of the anonymizing servers. Cryptohippie may not be malicious, but what about a disgruntled Cryptohippie employee? Cryptohippie even has an email service. This is a lot of eggs in one basket for a crypto product.

3. Because Cryptohippie is in control of the full stack, Cryptohippie would be aware of a single client and the destination servers they visit. Child porn? Illegal drug trade? Whistleblowing to journalists? Posting to a forum about overthrowing a government? Cryptohippie would know about each of these. I'm sure you're aware, as I am from administering a Tor exit node, that the FBI will make requests regarding illegal content moving across a public IP address. With Tor, the administrator of the exit node is not in control of the content entering and leaving the server. While exit nodes have been seized by governments in the past, no system administrator has been charged for the data communicated over the IP address in question. That is why the hardware is seized: for the possibility of logs or stored data. Requesting client data, such as the originating IP address requesting the data, is not possible, as it's not known. So, with Tor, it's ineffective for the FBI to request that an exit node operator hand over Tor client data. With Cryptohippie, because that company is fully in charge of the entire network, including accounting details, warrants to release customer information are not only effective, they're easy. If Cryptohippie wishes to remain in business, it must comply with local laws, which includes turning over data to law enforcement.

4. Finally, this may seem petty, but where do your clients go for support? No bug tracker or community forum is present, unless it's behind the login. Looking around your site, I'm seeing copyrights that haven't been updated since 2007-2008, which doesn't instill a lot of confidence in me as a potential client of yours. If your basic website doesn't have an updated copyright, how do I know your Road Warrior software does? And if the copyright hasn't been updated in your Road Warrior product, then how do I know you've been addressing bugs and vulnerabilities, introducing new features, or even generally maintaining the software?

I'm sure Cryptohippie is a fine company, and Road Warrior is a fine piece of software. But this blog post glosses over the issues, and is a bit sensationalist, if not a little bit incorrect.

Cryptographically Secure Passphrases In d-note

A couple nights ago, while coming home from work, I started thinking about the button you press in the d-note web application for generating passphrases used to encrypt your note. Each passphrase is a 22-character base-64 passphrase. Initially, I was using the following JavaScript code:

function make_key() {
    var text = "";
    // URL-safe base-64 alphabet (the original value was lost in formatting;
    // this is the likely 64-character set)
    var possible = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    for(i=22; i--;) {
        text += possible.charAt(Math.floor(Math.random() * possible.length));
    }
    return text;
}

Simple, functional, works. However, using Math.random() for each character generation isn't cryptographically strong. The d-note web application is known for going over the top on its security engineering, and I knew I could do better. So, I did a little digging, and learned about the "Web Crypto API" that many modern browsers support, despite the specification being a "working draft". I figured I could use that for my code. As such, the code morphed into the following:

function make_key() {
    var text = "";
    var possible = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    var random_array = new Uint32Array(22);

    // Make some attempt at preferring a strong CSPRNG first
    if (window.crypto && window.crypto.getRandomValues) {
        // Desktop Chrome 11.0, Firefox 21.0, Opera 15.0, Safari 3.1
        // Mobile Chrome 23, Firefox 21.0, iOS 6
        window.crypto.getRandomValues(random_array);
    }
    else if (window.msCrypto && window.msCrypto.getRandomValues) {
        // IE 11
        window.msCrypto.getRandomValues(random_array);
    }
    else {
        // Android browser, IE Mobile, Opera Mobile, older desktop browsers
        for(i=22; i--;) {
            random_array[i] = Math.floor(Math.random() * Math.pow(2, 32));
        }
    }

    for(i=22; i--;) {
        text += possible.charAt(Math.floor(random_array[i] % possible.length));
    }

    return text;
}

The Web Crypto API ensures that the browser is using a cryptographically secure non-blocking PRNG from the operating system, such as the /dev/urandom device that ships with the Linux kernel. While this works, it means that browsers that don't support the Web Crypto API are stuck with non-cryptographic passphrases. This certainly wouldn't do, so I set out to fix it.

Enter Blum Blum Shub (this sounds like something out of a Dr. Seuss book). Blum Blum Shub is a cryptographically secure PRNG, provided a few criteria are met:

  • The primes 'p' and 'q' can be static or chosen pseudorandomly.
  • The primes 'p' and 'q' should be congruent to 3 modulo 4.
  • The seed 'x0' should be chosen pseudorandomly to start the process.
  • The seed 'x0' should not be '0' or '1'.
  • The seed 'x0' should be coprime to the product of the primes 'p' and 'q'.
  • The GCD of phi(p-1) and phi(q-1) should be small (subjective?), where phi is Euler's totient function.

Unfortunately, as the product of the primes 'p' and 'q' grows, the Blum Blum Shub algorithm slows to a crawl. Fortunately and unfortunately, the maximum exact integer JavaScript can store is 53 bits, or 2^53. This means the product of 'p' and 'q' cannot exceed that value, which leaves us with integer spaces of 2^26 and 2^27 for the primes. However, this is okay, as even 'p = 11' and 'q = 19' generate a cycle that is long enough for our 22-character passphrase.
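To see the recurrence itself in action, here is a toy run in Python with those small primes (the seed s=3 is picked by hand to satisfy the criteria above; a real generator would keep only the low bits of each value):

p, q = 11, 19              # both congruent to 3 mod 4
M = p * q                  # 209, well under the 2^53 ceiling
x = pow(3, 2, M)           # x0 = s^2 mod M, with s=3 coprime to 209
for _ in range(6):
    x = x * x % M          # x_{n+1} = x_n^2 mod M
    print(x)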

Thus, the code had changed to:

    else {
        // Android browser, IE Mobile, Opera Mobile, other browsers
        random_array = bbs(22);
    }

This new "bbs(n)" function is the bread and butter for generating cryptographically secure pseudorandom numbers. The function takes an integer as an argument, to know how many numbers need to be generated, and returns an unsigned 32-bit integer array with that many random numbers. The code is as follows:

function gcd(x, y) {
    if(!y) return x;
    return gcd(y, x%y);
}

function seed() {
    var s = 2*Math.floor(Math.random() * Math.pow(2,31))-1; // odd
    if(s < 2) {
        return seed();
    } else {
        return s;
    }
}

function bbs(n) {
    // Blum Blum Shub cryptographically secure PRNG
    // See
    var a = new Uint32Array(n);
    // Max int = 2^53 == (2^26)*(2^27) -> (2^p1)*(2^p2)
    var p1 = Math.floor(Math.random()*2)+25; // first power, 25 or 26
    var p2 = 51-p1; // second power
    var p = random_prime(2*Math.floor(Math.random() * Math.pow(2,p1))-1);
    var q = random_prime(2*Math.floor(Math.random() * Math.pow(2,p2))-1);
    var s = seed();

    // Ensure each quadratic residue has one square root which is also a
    // quadratic residue. Also, gcd(totient(p-1),totient(q-1)) should be small
    // to ensure a large cycle length.
    while(p%4 != 3 || q%4 != 3 || gcd(totient(p-1),totient(q-1)) >= 5) {
        p = random_prime(2*Math.floor(Math.random() * Math.pow(2,p1))-1);
        q = random_prime(2*Math.floor(Math.random() * Math.pow(2,p2))-1);
    }

    // s should be coprime to p*q
    while(gcd(p*q, s) != 1) {
        s = seed();
    }

    for(i=n; i--;) {
        s = Math.pow(s,2)%(p*q);
        a[i] = s;
    }

    return a;
}

The key to this function is generating primes 'p' and 'q' that meet the requirements outlined earlier in the post. The primes can be static, with the seed pseudorandomly generated, or the primes can also be pseudorandomly generated. I went with the latter. So, the only trick to getting these values is testing for primality.

The standard way to test for primality is trial division. Unfortunately, it's slow. Sure, there are a few optimization techniques you can use to speed things up, but by and large, it's not an efficient way of attacking the problem. Instead, I applied the Miller-Rabin test for composites. Miller-Rabin uses "witnesses" that "testify" about the compositeness of a positive integer. The more witnesses testify that your integer is composite, the more likely it is composite; we would then say that we have a "strong composite" integer. If there are zero witnesses for a positive integer, we can only say that the integer "may or may not be prime". However, with enough witnesses, we can actually say deterministically whether a number is prime.

I'll stop going over the algorithm there, and let you read up on the Wikipedia article about it. Suffice it to say, the Miller-Rabin primality test runs in O(k*log(n)^3), where 'k' is the accuracy that is sought. This is good news for Blum Blum Shub, as it's slow enough already. Code below:

function modProd(a,b,n){
    // multiply a*b mod n one decimal digit of b at a time,
    // to stay under JavaScript's 2^53 integer limit
    if(b==0) return 0;
    if(b==1) return a%n;
    return (modProd(a,(b-b%10)/10,n)*10+(b%10)*a)%n;
}

function modPow(a,b,n){
    if(b==0) return 1;
    if(b==1) return a%n;
    if(b%2==0){
        var c=modPow(a,b/2,n);
        return modProd(c,c,n);
    }
    return modProd(a,modPow(a,b-1,n),n);
}

function isPrime(n){
    // Miller-Rabin primality test taken from
    // O(k*log(n)^3) worst case, given k-accuracy
    if(n==2||n==3||n==5) return true;
    if(n%2==0||n%3==0||n%5==0) return false;
    if(n<25) return true;
    // write n-1 as b*2^s with b odd
    for(var a=[2,3,5,7,11,13,17,19],b=n-1,d,t,i,x;b%2==0;b/=2);
    for(i=0;i<a.length;i++) {
        x=modPow(a[i],b,n);
        if(x==1||x==n-1) continue;
        // square repeatedly, looking for -1; if we never see it, n is composite
        for(t=true,d=b;t&&d<n-1;d*=2) {
            x=modProd(x,x,n); if(x==n-1) t=false;
        }
        if(t) return false;
    }
    return true;
}

function random_prime(n) {
    // walk down the odd numbers until a prime is found
    while(!isPrime(n)) n -= 2;
    return n;
}

The primes 'p' and 'q' must then go through some rigorous tests to ensure that the algorithm has a long cycle, and that the seed doesn't fall into a trap where it collapses to 0 and the algorithm cannot recover. Part of these tests uses Euler's totient function. The totient function does nothing more than count what are called "totatives". A "totative" is a number that is coprime to a given number 'n'; in other words, the totative and 'n' do not share any common divisors other than 1. Naively, every positive integer from 1 through 'n' must be tested. Unfortunately, this is slow. However, there is a proof using Euler's product formula which shows that we can arrive at the same result in O(sqrt(n)/3) time. Again, because Blum Blum Shub is slow, we need to keep our processor time down.

function totient(n) {
    // compute Euler's totient function
    // O(sqrt(n)/3) worst case
    // Taken from:
    if(n < 2) return n;
    var phi = n;
    if (n % 2 == 0) {
        phi /= 2;
        n /= 2;
        while(n % 2 == 0) n /= 2;
    }
    if (n % 3 == 0) {
        phi -= phi/3;
        n /= 3;
        while(n % 3 == 0) n /= 3;
    }
    // with 2 and 3 handled, test only candidates of the form 6k-1 and 6k+1
    for(p = 5; p * p <= n;) {
        if(n % p == 0) {
            phi -= phi/p;
            n /= p;
            while(n % p == 0) n /= p;
        }
        p += 2;
        if(p * p > n) break;
        if(n % p == 0) {
            phi -= phi/p;
            n /= p;
            while(n % p == 0) n /= p;
        }
        p += 4;
    }
    if(n > 1) phi -= phi/n;
    return phi;
}

Putting everything together, the Blum Blum Shub algorithm on my workstation can produce approximately 1 million random numbers per second in a 53-bit space. We only need 22 numbers, so that should be sufficient, even for weaker devices. I was able to test the code successfully on a number of browsers that do not support the Web Crypto API, and the key generation is near instantaneous, even on my phone. Interacting with the console, here is the output of one call to the algorithm:

> bbs(22);
[3143943020, 475278844, 386457630, 124718623, 280175014, 2600881459,
127152064, 749398749, 2269393658, 692609408, 1218408987, 1523732228,
1265360812, 1641372390, 2500929554, 2223592103, 2462017186, 310616491,
426752821, 2973180471, 2248877527, 574751875]

Those numbers are cryptographically strong, because they are the result of repeated squaring modulo the product of two secret primes, which means using the sequence to get back to the primes is at least as difficult as the factoring problem. It also turns out that predicting the sequence may be at least as difficult as computing modular square roots, which is itself as difficult as the factoring problem.

So, even though most users visiting a site hosting d-note will likely be using the Web Crypto API, there may be some browsers that are not. For them, should they choose to have the browser generate the passphrase for encrypting their note, they can rest assured that the result is cryptographically strong.

Officially Announcing d-note Version 1.0

I've been looking forward to this post. Finally, on my birthday, it's here. My Python Flask web application for encrypted self-destructing notes is stable, and ready for production use.

Around 2011 or so, I started thinking about a way that I could send data privately and securely to friends, family and coworkers, without requiring them to spend a great deal of time setting up PGP keys, knowing a great deal about encryption and security, and so on. Basically, I wanted a way to send them an encrypted note that could be transparently decrypted with very little work, or at most, by providing a passphrase. I knew this would need to be a web application, but I wasn't sure of the implementation details. Eventually, that got worked out, and d-note was born early January of this year.

Shortly after that release, I started focusing on a script that would generate random ASCII art with OpenPGP keys. There will be some additional news about that project I'm excited about, but will announce at a later date. As a result of that OpenPGP project, d-note development took the back seat for a bit. However, a recent pull request from Alan Dawson brought focus back to the code.

Changes And Increased Security
The changes that Alan Dawson introduced were switching away from Blowfish in ECB mode to AES-128 in CBC mode with HMAC-SHA1. This is something that I initially wanted to support, but didn't for two reasons:

  • Without PBKDF2 in the picture, AES keys must be exactly 16, 24 or 32 bytes in size. This prevented users from entering personal passphrases to encrypt the note, rather than using the server-side key.
  • Also, without PBKDF2 in the picture, Blowfish was attractive because its key setup is intentionally slow, so brute forcing the password out of the encrypted file would be more difficult.

However, Blowfish uses a 64-bit block size for its internal operations, while AES uses 128-bit blocks. This might be of little consequence for a standard plaintext encrypted note, but for encrypting files, which I would like to support in the long term, it could mean the difference between repeated ciphertext blocks with Blowfish and none with AES. See the rough arithmetic below.
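The back-of-the-envelope reasoning (the usual birthday bound for block ciphers, nothing specific to d-note):

# With b-bit blocks under one key, expect a repeated ciphertext block
# after roughly 2^(b/2) blocks.
for name, bits in (("Blowfish", 64), ("AES", 128)):
    blocks = 2 ** (bits // 2)
    print(name, blocks * (bits // 8) / 2.0 ** 30, "GiB until likely repeats")

For Blowfish, that works out to about 32 GiB of data under a single key; for AES, it is astronomically larger.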

Since the changes introduced by Alan, I've increased the security of the application to AES-256 in CTR mode with HMAC-SHA512. The application relies on the Python Cryptography Toolkit (PyCrypto) version 2.6. Version 2.7 introduces new authenticated block cipher modes, such as GCM. With the ability to use GCM, there will no longer be a need to create an HMAC-SHA512 tag. However, switching modes will break previously encrypted notes that have not yet been decrypted, so this will need to be handled with care. Also, when SHA3 is standardized, I would like to switch to it if it is introduced into PyCrypto, rather than use SHA512, even though there have been no serious security concerns over SHA2.

The Encryption Process Visualized Without A Custom Passphrase
I wanted to at least show what the encryption process looks like from top-to-bottom visually, so you wouldn't need to piece together the code and figure it out.

First, the application creates 3 static salts at random. Each salt is 16 bytes in size, and should be different from the other two, although this isn't a requirement. The salts remain static for as long as you wish; they are the only random data that is not deleted, but instead saved on the server. Changing the salts means any previously encrypted notes still on the server will no longer be decryptable. As such, once generated, they should be left alone. However, if no encrypted notes remain on the server, the web server could be stopped, the salts changed, and the web server started back up. The only reason to do this is if you feel the salts have been compromised in some way.


When a browser renders the main index.html page, we need to create a unique and random URL to post to. As such, a one-way function is used to generate three random keys, from which we'll build everything else. First, we generate a random 16-byte nonce. This nonce is the key that starts the one-way function building everything needed for encrypting and decrypting the note.


All of the security of this application rests on the ability to generate a cryptographically secure random nonce on every page view. We've taken the steps necessary to ensure that not only is that choice cryptographically strong, but that all the building blocks derived from it follow industry best practices, and are over-engineered at that. Our one-way function starts by feeding the salts and our nonce into PBKDF2. PBKDF2 is a password-based key derivation function that can derive a pseudorandom key of any size, given a secret and a salt. As such, we use our three salts and the nonce to generate a 16-byte file key, a 32-byte AES key, and a 64-byte HMAC key. Notice that we use each salt only once.
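A minimal sketch of that derivation with PyCrypto's PBKDF2 (the salt variable names are mine, for illustration; d-note stores three static 16-byte salts on the server, as described above):

from Crypto import Random
from Crypto.Protocol.KDF import PBKDF2

# stand-ins for the three static server-side salts
file_salt, aes_salt, mac_salt = (Random.new().read(16) for _ in range(3))

nonce = Random.new().read(16)                  # fresh on every page view
file_key = PBKDF2(nonce, file_salt, dkLen=16)  # names the ciphertext file
aes_key  = PBKDF2(nonce, aes_salt,  dkLen=32)  # AES-256 key
mac_key  = PBKDF2(nonce, mac_salt,  dkLen=64)  # HMAC-SHA512 key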


Now that we have a file key, we can base-64 encode it, and that becomes the file name we save our encrypted data to. This is different from the initial release, where the URL was also the file name. The URL can produce the file name, but the reverse is not true.


Finally, our nonce is base-64 encoded, and becomes the URL that we post to, and the URL that we give to the recipient.
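Continuing the sketch above, these last two steps are just URL-safe base-64 encodings (again, the variable names are mine):

import base64

file_name = base64.urlsafe_b64encode(file_key).rstrip(b"=")  # on-disk name
url_token = base64.urlsafe_b64encode(nonce).rstrip(b"=")     # URL for the recipient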


At this point, the user copies their data into the form. The user has the option of either using a personal passphrase to encrypt the note, or letting the server use its key. It's important to note that, thanks to the nonce above, the same AES key is never used to encrypt multiple notes, unless the same passphrase is chosen for multiple notes. Because this is symmetric encryption server-side, we don't want to run the risk of someone deriving the AES key and decrypting multiple files because they were encrypted with the same key. So, the random nonce generation at each index.html view is critical.

When the note is posted to the server, it is first compressed using the ZLIB compression library. This strips structure out of the plaintext, and increases the overall entropy density of the note before it is finally encrypted and saved to disk. It should be mentioned that the note is never stored to disk until it is fully encrypted.


We are now ready to ship the compressed plaintext off to AES for encryption. First, we need to generate a random 12-byte initial value for our counter. We need to do this because we will be encrypting the note with AES-256 in CTR mode, and I would like to protect the end users from backups. CTR mode uses a counter, typically starting at "1", to ensure that identical plaintext blocks are not encrypted to identical ciphertext blocks. However, if the same AES key and the same counter sequence are shared between two plaintexts, then an XOR of the two ciphertexts reveals the XOR of the two plaintexts. As such, for every encryption, we use that random 12-byte initial value to practically guarantee that the starting point of the AES counter is always different, even if the same AES key is used for multiple plaintexts. The initial value is 12 bytes in size, rather than 16, to ensure that we still have plenty of counting space during the encryption process.

We also need the pseudorandom 32-byte AES key that we generated from our earlier PBKDF2 function. Using this 32-byte key, our 12-byte initial value, and the compressed plaintext, we encrypt the data. This encryption process gives us a preliminary ciphertext. It is preliminary in that we have additional work to do before we are ready to store it on disk.


Because we used an initial value to start AES-256 in CTR mode, we will also require that initial value for decryption, so we need to store it with the preliminary ciphertext. The initial value is therefore prepended to the ciphertext. Because the initial value is random data, and the ciphertext should appear as random data, prepending the one to the other should not reveal any data boundaries or leak any information about the stored contents. Prepending the initial value gives us an intermediate ciphertext 12 bytes larger.


The final step in the encryption process is to ensure data integrity and provide authentication, using HMAC-SHA512. The reason for this is that non-authenticated block cipher modes can suffer from practical malleability attacks on the ciphertext, which can reveal the plaintext. Thus, the intermediate ciphertext is tagged with HMAC-SHA512. This is known as "encrypt-then-MAC" (EtM), and is the preferred way of applying MAC tags.

HMAC allows you to choose among a number of different cryptographic hashing algorithms. In our case, SHA512 is used, because we can. HMAC requires both a key and a message. In our case, because the user is not providing a passphrase to the application, the key is the 64-byte HMAC pseudorandom key that we generated with PBKDF2 earlier. The intermediate ciphertext and this 64-byte key are fed to the HMAC-SHA512 function to generate a 64-byte SHA512 tag.


This 64-byte SHA512 tag is prepended to our intermediate ciphertext to produce our final ciphertext document, which is finally written to disk in binary form. Initially, I had base-64 encoded the ciphertext, as is standard practice, but this just increases the used disk space by an additional 30%, and adds additional code, with no real obvious benefit other than being able to view the ciphertext in a text editor or on your terminal. So, the raw binary is stored instead.
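Putting the pieces together, here is a hedged sketch of the whole pipeline in PyCrypto. It follows the description above, but it is my reconstruction for illustration, not the d-note source itself:

import hmac, hashlib, zlib
from Crypto import Random
from Crypto.Cipher import AES
from Crypto.Util import Counter

def encrypt_note(plaintext, aes_key, mac_key):
    compressed = zlib.compress(plaintext)
    iv = Random.new().read(12)                         # random initial value
    ctr = Counter.new(32, prefix=iv, initial_value=1)  # 4 bytes of counting space
    ct = AES.new(aes_key, AES.MODE_CTR, counter=ctr).encrypt(compressed)
    body = iv + ct                                     # prepend the initial value
    tag = hmac.new(mac_key, body, hashlib.sha512).digest()
    return tag + body                                  # 64-byte tag comes first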


The Encryption Process Visualized With A Custom Passphrase
In the process of encrypting the note with a user-supplied passphrase, rather than using our cryptographic nonce to generate the AES key and the HMAC key, the passphrase is used in its place. In other words, we still use PBKDF2 to generate our AES key and our HMAC key, combined with the appropriate salt. Everything else about the encryption process is the same.


To know whether or not a user supplied a passphrase to encrypt the note, an empty file with ".key" as an extension is created. The user-supplied password is not stored on disk; the empty file exists only to tell the application not to derive the AES key with PBKDF2 from the nonce, but to ask the client for the passphrase instead. However, if a duress key for Big Brother is supplied, then that key is indeed stored on disk. That key will not decrypt the note, but will instead return random sentences from The Zen of Python, as well as destroy the original note.

To show the effectiveness of the 12-byte initial value discussed earlier, I'll encrypt the text "Hello Central, give me Doctor Jazz!" twice, both times with the same passphrase "Oliver". If the AES counter started with the same value in both instances, then the two ciphertexts would be identical, which is exactly what we want to avoid. By starting with a random initial value for our counter, the ciphertexts should be different. Let's take a look.

Encrypting it twice, I ended up with two files "qZ-cloUfLnFYVYZImgYWmQ" and "hxlUK9GxbNSMgJZDUgEsMw". Let's look at their content:

$ ls data
hashcash.db  qZ-cloUfLnFYVYZImgYWmQ  qZ-cloUfLnFYVYZImgYWmQ.key hxlUK9GxbNSMgJZDUgEsMw  hxlUK9GxbNSMgJZDUgEsMw.key
$ cat data/qZ-cloUfLnFYVYZImgYWmQ | base64 -
$ cat data/hxlUK9GxbNSMgJZDUgEsMw | base64 -

This should not be surprising, knowing how CTR mode works with block ciphers:


As long as that IV is different, the ciphertexts will be different, and one should not be able to extract the key that encrypted the plaintexts, even if the key is reused across multiple plaintexts. Let's see what the IV was for each, using Python:

>>> msg = """CgHOJ4NTTMSNWOVbC1SduXg7jqij92o3ZHhOcCumu8DKmDKI3cyZ5Ne9dB6w7TrnxMaR8EyDf5w9
... eMpMM+VLkztyfAieUWFYVFvFqJ0cY7+fgX1tzyaLs71rHywjydE9bobAfof0Bqo6j87sTJEgJTu+
... BFXko1w="""
>>> msg = msg.decode('base64')
>>> data = msg[64:] # remember, the first 64-bytes are the SHA512 tag
>>> iv = data[:12]
>>> long(iv.encode('hex'),16)
>>> msg = """D/fLA+UgghHQMGadJxmATifaL+JTXybRNROUDvSBzgTV6EWX7Dau9Z9zI1KpEuMbDeFNp4oZk4SN
... hlVGm2k8vWwKJWhys78U6FFd1jGmKWDC61VKH7+zXEGhNf/fo/igEEEG+Jge1awQ9A0cJbsLSmoh
... zgj6XH8="""
>>> msg = msg.decode('base64')
>>> data = msg[64:]
>>> iv = data[:12]
>>> long(iv.encode('hex'),16)

Not much really needs to be said about decrypting the encrypted notes, other than that, because we are using authenticated encryption with HMAC-SHA512, when a client supplies the URL to decrypt a note, a SHA512 tag is generated dynamically over the encrypted file and then compared to the SHA512 tag actually stored in the encrypted note. If the tags match, the plaintext is returned. If the tags do not match, something went wrong, and the client is redirected to a standard 404 error. Other than that, the code for decrypting the notes should be self-explanatory.
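For completeness, a sketch of that verification path, mirroring the encryption sketch earlier (hmac.compare_digest() is my addition for a constant-time comparison, and needs Python 2.7.7 or newer):

import hmac, hashlib, zlib
from Crypto.Cipher import AES
from Crypto.Util import Counter

def decrypt_note(blob, aes_key, mac_key):
    tag, body = blob[:64], blob[64:]               # 64-byte SHA512 tag first
    expected = hmac.new(mac_key, body, hashlib.sha512).digest()
    if not hmac.compare_digest(tag, expected):
        return None                                # the caller serves a 404
    iv, ct = body[:12], body[12:]                  # then the 12-byte initial value
    ctr = Counter.new(32, prefix=iv, initial_value=1)
    plain = AES.new(aes_key, AES.MODE_CTR, counter=ctr).decrypt(ct)
    return zlib.decompress(plain)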

I hope you enjoy the self-destructing encrypted notes web application! If there are any issues or concerns, please be sure to let me know.

Going Google Free

On June 25, 2004 I received the following email:

To: Aaron Toponce
From: Gmail Team
Subject: Gmail is different. Here's what you need to know.
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

First off, welcome. And thanks for agreeing to help us test Gmail. By now you probably know the key ways in which Gmail differs from traditional webmail services. Searching instead of filing. A free gigabyte of storage. Messages displayed in context as conversations.

So what else is new?

Gmail has many other special features that will become apparent as you use your account. You’ll find answers to most of your questions in our searchable help section, which includes a Getting Started guide. You'll find information there on such topics as:

  • How to use address auto-complete
  • Setting up filters for incoming mail
  • Using advanced search options

You may also have noticed some text ads or related links to the right of this message. They're placed there in the same way that ads are placed alongside Google search results and, through our AdSense program, on content pages across the web. The matching of ads to content in your Gmail messages is performed entirely by computers; never by people. Because the ads and links are matched to information that is of interest to you, we hope you'll find them relevant and useful.

You're one of the very first people to use Gmail. Your input will help determine how it evolves, so we encourage you to send your feedback, suggestions and questions to us. But mostly, we hope you'll enjoy experimenting with Google's approach to email.

Speedy Delivery,

The Gmail Team

p.s. You can sign in to your account any time by visiting

On June 25, 2014, I'll be closing my Google account for good. It seems fitting to end it on that date, exactly 10 years after first receiving it. I've already migrated all my documents, my contacts and my calendars to my personal Owncloud instance. When Google shut down Reader, I moved to TTRSS. Last week, I replaced the ROM on my phone with Cyanogenmod, and have not installed or enabled any Google accounts or applications, other than using K-9 for my email. I've already migrated my chat to my own personal XMPP server. The only thing that remains is email.

I will miss the community at Google+. I've made a great many friends there, and have had some amazing discussions. Sadly, it's time to say goodbye. You can find me on Twitter @AaronToponce, and I've usually cross-posted my articles from Plus on Twitter also. We can still have the amazing discussions there. Or, you can find me on IRC under the nick 'eightyeight' on the Freenode, OFTC and XMission IRC networks.

There are many reasons why I'm leaving the Google-verse. They include:

  • Cooperation with NSA spying.
  • Amassing vast amounts of data collection from hundreds of products.
  • Support for proprietary software.
  • A large monopolistic monoculture.

I appreciate that they will allow me to export my data, and delete it off of their servers. Their Takeout service isn't entirely evil. And their Google Summer of Code is to be commended. I appreciate that they have built Android, and submit back to the Linux kernel, and I love that they financially support Free Software conferences. However, despite all the good they have done, their toes are touching the "Do No Evil" line, and it's creepy.

I am going to self-host my data. I want as much control over my data as humanly possible. Using Google's services, means losing some of that control, and I just can't tolerate it any longer. This means work and some growing pains, but I'm ready for the trials.

Goodbye Google.

OpenPGP Key Random Art, Now With ANSI Color Support

I just recently committed ANSI color support to my OpenPGP key random art Python script. The idea is to create a "heat map" of which squares the drunken bishop has traversed during his dizzying travels. So not only can you see what your key "looks" like, but now you can sense what your key "feels" like.

The idea is simple: squares that have not been walked on get no color. Squares on the floor with fewer visits will be "cold", colored blue, while squares that have been visited many times will be "hot", colored red, approaching white-hot for even higher frequencies. There is a direct one-to-one relationship between a square's coin count and its color. The point isn't necessarily to introduce additional entropy or to reduce collisions, but merely entertainment value. For PGP key signing parties, standard plain text output should remain the default, preventing anyone from being pigeonholed into specific software (browsers, PDF readers, etc).

Right now, the Python script is under very active development, so many things are missing, such as switches to turn on foreground or background colors, or to enable and disable ANSI entirely. For the time being, ANSI is the default, and the script falls back to plain text if the TTY or STDIN does not support ANSI escape sequences. The next step is to start working on command line switches; I haven't decided whether ANSI color should remain the default or not.

The necessary screenshots, showing the heat map for a key chosen at random from my keyring, both for foreground and background colors, are below. Note: the switches to differentiate between foreground and background display are missing from the commands; I changed the code manually between the screenshots. Also, I might adjust the heat map, adding one or two extra pink-to-white colors, bringing red closer to blue, so a larger variety of color can be displayed in the ANSI output.

Screenshot showing ANSI foreground color for an OpenPGP key.

Screenshot showing ANSI background color for an OpenPGP key.

Gunnar Glasses Review

I've owned a pair of GUNNAR Optiks for a couple of months now, and figured it is probably a good time for a review. For those unaware, GUNNAR Optiks are specially designed eyewear for extended computer use. The claims on the main site are:

  • Eyes are more moist.
  • Headaches are reduced.
  • Blurry vision is reduced.
  • Eyes are no longer fatigued.

They claim these points are met by the curvature of the eyewear and the unique yellow color of the lens. I'll review each individually.

The curvature of the glasses brings the lens in more closely to the face. GUNNAR's claim is that your eye produces natural moisture that humidifies the immediate air surrounding the eye. By bringing the lens in more closely to the face, that humidity is trapped in the area surrounding the eye, preventing the eye from drying out. As someone who does not wear glasses regularly, I was skeptical of this claim. However, after wearing them for 8 hours per day, every day, I can most certainly feel that my eyes are much more moist than without. People who regularly wear glasses are probably already aware of this. In fact, I've been fighting allergies this season, and my eyes have been abnormally moist. Wearing the GUNNARS has actually complicated the problem, to where I am constantly wiping my eyes to remove the excessive tearing.

The yellow tint of the lens claims to neutralize the high-intensity blues that artificial light provides, such as the fluorescent and LED bulbs common in computer monitors. This actually didn't surprise me all that much, as the lens of your eye yellows as you get older due to UV radiation from the sun. On the color wheel, yellow is complementary to blue, and complementary colors of light, when combined correctly, produce white light. So, using a yellow tint to neutralize blue light, providing a more balanced color spectrum, is something photographers have known about for a long time.

Due to the yellow tinting of the glasses, the claim is that headaches, fatigued eyes and blurry vision are reduced or eliminated. I was highly skeptical of these claims, but after wearing them for a full day, I have noticed how much more "comfortable" my eyes feel at the end of the day. Indeed, I do not have the typical eye fatigue that I normally have when driving home. I've never encountered headaches working in front of a computer all day, but I do get blurry vision, and the GUNNARS have most certainly reduced it. Even my wife has noticed how much less grumpy and fatigued I am when coming home after a full day of work.

After wearing the GUNNARS for the past couple of months, I've concluded that they do indeed hold up to the claims they make. There is something there to the science behind the eyewear. In fact, I've come to prefer wearing them when watching movies and television as well. Even when sitting in front of this 17" monitor writing this post without wearing the glasses, I am beginning to notice eye strain. I do feel like I look a bit kooky when walking around the office with them on my face, but they work, and they work well.

All isn't positive, however. The GUNNARS I purchased were from a friend who was getting migraines as a result of wearing them. The pair is non-prescription, so other than with typical sunglasses or ski goggles, I don't understand how changing the tint of what the eye sees would cause a migraine, but it was certainly the case for him. For myself, my life has been improved, not degraded, by wearing the GUNNARS.

If someone were to ask me whether I recommend wearing GUNNARS for gaming or general computing, my answer would certainly be in the affirmative.

Analysis of RIPEMD-160

Recently on Hacker News, I noticed a table showing the "Life cycles of popular cryptographic hashes" by Valerie Aurora (in this post, I've greatly compressed her HTML for faster page delivery).

Life cycles of popular cryptographic hashes (the "Breakout" chart)
[Table: one row per hash function (the rows surviving extraction here include RIPEMD-128, the SHA-2 family, and SHA-3/Keccak), one column per year from 1990 through 2012, with each cell color-coded according to the key: Unbroken, Weakened, Broken, Deprecated. The color coding does not survive in plain text; see Valerie Aurora's original chart for the full picture.]
[1] Note that 128-bit hashes are at best 2^64 complexity to break; using a 128-bit hash is irresponsible based on sheer digest length.
[2] In 2007, the NIST launched the SHA-3 competition because "Although there is no specific reason to believe that a practical attack on any of the SHA-2 family of hash functions is imminent, a successful collision attack on an algorithm in the SHA-2 family could have catastrophic effects for digital signatures." One year later the first strength reduction was published.
The Hash Function Lounge has an excellent list of references for most of the dates. Wikipedia now has references to the rest.

I find this table a great resource, and I'm glad she put it online. However, I have one small issue with the table (other than it being out of date and covering few functions), and that's her calling RIPEMD-160 "deprecated". My first question would be: deprecated by whom, exactly? RIPEMD-160 isn't a FIPS standardized cryptographic hash function, so it couldn't be deprecated by NIST. RIPEMD-160 was actually developed in Belgium, and as far as I can tell, the Belgian NBN (Bureau for Standardisation) hasn't deprecated it either. It is standardized by CRYPTREC in Japan, and has not been officially deprecated there either, as far as I can tell, although it is on their "monitored list".

It could be considered deprecated if there were known attacks that greatly weaken the algorithm, or if known collisions existed, as with MD5. However, RIPEMD-160 has no known weaknesses nor collisions. The simplified versions of RIPEMD do have problems, however, and should be avoided. But as it stands, RIPEMD-160 is still considered "strong" and "cryptographically secure". It was first published in 1996, almost twenty years ago, and in my opinion, that's impressive. Compare SHA1, another 160-bit digest, first published in 1995: the first attack against SHA1 appeared just 8 years later, in 2003, and attacks have been pouring out since.

In fact, maybe Valerie is calling RIPEMD-160 "deprecated" because it's old, and there are plenty of other functions with larger digests, such as the newly proposed FIPS standard SHA3/Keccak. The only problem with that is that while we may call those secure now, they too could fall victim to various attacks and be severely weakened, while RIPEMD-160 could still be considered strong and cryptographically secure.

Granted, RIPEMD-160 has not received the attention from the cryptographic community that SHA1 has, so it likely has not gotten the same mathematical scrutiny. I understand that. However, RIPEMD-160 is part of the OpenPGP standard, and available in many cryptographic libraries for many different programming languages. While it's not the running favorite of cryptographers and developers, it also hasn't been overlooked.

The only concern I would have with RIPEMD-160 is the 160-bit output digest size. Is that large enough to withstand a sophisticated brute force search? To answer that question, we can look at the Bitcoin distributed network, likely the largest distributed computing network in the world. At the time of this post, the Bitcoin network is calculating approximately 60 million billion SHA256 hashes per second, largely using specialized hardware called "ASICs". 60 million billion is 60 quadrillion, or 6x10^16. The 160-bit digest space holds 2^160, or approximately 1.5x10^48, values. This means it would take the Bitcoin network approximately 2.4x10^31 seconds, or about 7.7x10^23 years, to completely exhaust the RIPEMD-160 digest space. So even at that amazing pace, brute forcing a preimage for a 160-bit digest is unreasonable. A generic birthday attack is much cheaper, needing only on the order of 2^80, or about 1.2x10^24, hash computations for a 50% chance of finding some collision- less than a year at that pace on paper, ignoring the memory and coordination such a search would require. But that generic bound applies equally to every 160-bit digest, SHA1 included, and is why newer designs use larger digests; it is not a weakness of RIPEMD-160 itself. RIPEMD-160 still stands strong.
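
As a back-of-the-envelope check of those figures, here is a short Python sketch. The hash rate is the estimate above; nothing here is measured.

# Sanity-check the figures above: a 6x10^16 hash/s search of a 160-bit space.
from math import log, sqrt

rate = 6e16                    # estimated Bitcoin network rate, hashes/second
space = 2.0**160               # ~1.46x10^48 possible 160-bit digests
year = 60 * 60 * 24 * 365.25   # seconds per year

print(space / rate)            # ~2.4x10^31 seconds to exhaust the space
print(space / rate / year)     # ~7.7x10^23 years

# Generic birthday bound: hashes needed for a 50% chance of some collision.
birthday = sqrt(2 * log(2) * space)   # ~1.4x10^24 hashes, about 2^80
print(birthday / rate / year)         # roughly three quarters of a year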

Now, I'm an amateur cryptographer, and a lousy one at that. But my Internet searching has left me empty-handed in trying to find any resource that successfully attacks, weakens, or even criticizes RIPEMD-160, let alone calls it "deprecated". From everything I can tell, it has withstood the test of time, and it's still going very, very strong.

SHA3 (Keccak) in Linux

For a long time, I've been waiting to use the newly accepted SHA3 in Linux for file integrity and other uses. Like md5sum(1), sha1sum(1), sha224sum(1), sha256sum(1), sha384sum(1), and sha512sum(1), I was hoping a similar "sha3-224sum(1)" and friends would be developed and make their way into GNU coreutils. Unfortunately, I kept waiting and waiting, until eventually I just stopped worrying about it. Well, to my surprise, it appears that the rhash package ships SHA3 as accepted by NIST (it does a number of other hashes as well).

Keccak was chosen by NIST as the SHA3 winner due to its performance, security, and construction. Keccak uses a sponge construction for creating the cryptographic hash, which truly sets it apart from SHA1 and SHA2; any successful attack against SHA1 or SHA2 will likely be ineffective against SHA3. SHA3 claims 12.5 cycles per byte on an Intel Core 2 CPU in a software implementation. Unfortunately, the SHA3 code in the rhash package still needs some optimization, as SHA2, which requires more cycles per byte due to its construction, can nonetheless calculate a SHA2-256 hash faster than rhash calculates a SHA3-256 hash. SHA3 support was added to rhash in September 2013.

The implementation of SHA3 in rhash uses the original Keccak function as submitted to and selected by NIST. This means it does not append the two domain-separation bits "01" to the message before padding. It should be noted that SHA3 is only a FIPS draft as of the time of this blog post. As such, outputs could change until the standard is formalized.

Below are examples of hashing:

$ echo -n "" | rhash --sha3-224 -
f71837502ba8e10837bdd8d365adb85591895602fc552b48b7390abd  (stdin)
$ echo -n "" | rhash --sha3-256 -
c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470  (stdin)
$ echo -n "" | rhash --sha3-384 -
2c23146a63a29acf99e73b88f8c24eaa7dc60aa771780ccc006afbfa8fe2479b2dd2b21362337441ac12b515911957ff  (stdin)
$ echo -n "" | rhash --sha3-512 -
0eab42de4c3ceb9235fc91acffe746b29c29a8c366b7c60e4e67c466f36a4304c00fa9caf9d87976ba469bcbe06713b435f091ef2769fb160cdab33d3670680e  (stdin)
$ echo -n "The quick brown fox jumps over the lazy dog." | rhash --sha3-224 -
c59d4eaeac728671c635ff645014e2afa935bebffdb5fbd207ffdeab  (stdin)
$ echo -n "The quick brown fox jumps over the lazy dog." | rhash --sha3-256 -
578951e24efd62a3d63a86f7cd19aaa53c898fe287d2552133220370240b572d  (stdin)
$ echo -n "The quick brown fox jumps over the lazy dog." | rhash --sha3-384 -
9ad8e17325408eddb6edee6147f13856ad819bb7532668b605a24a2d958f88bd5c169e56dc4b2f89ffd325f6006d820b  (stdin)
$ echo -n "The quick brown fox jumps over the lazy dog." | rhash --sha3-512 -
ab7192d2b11f51c7dd744e7b3441febf397ca07bf812cceae122ca4ded6387889064f8db9230f173f6d1ab6e24b6e50f065b039f799f5592360a6558eb52d760  (stdin)
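
As an aside, the finalized padding changes these outputs. Here is a quick sketch using Python's hashlib (assuming a Python version that ships SHA-3 support); because it implements the FIPS 202 "01" domain-separation padding, its digest of the empty string differs from the original-Keccak value rhash printed above:

import hashlib

# FIPS 202 SHA3-256 of the empty string. Compare with rhash's original-Keccak
# output above (c5d24601...); the "01" domain-separation bits change the digest.
print(hashlib.sha3_256(b"").hexdigest())
# a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a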

In my limited testing, the SHA3 implementation in rhash(1) is not quite up to par, and could use some additional performance improvements. I'm sure these will be committed over time. Still, it's hardly a poor performer, and I've been very happy with the results so far.

The Core Problem With Ubuntu Releases - Little QA

The background to this post is a bug that was introduced just over a week ago, THE DAY BEFORE THE 14.04 RELEASE. The bug: you can bypass the lock screen by just holding down your <Enter> key, letting Unity crash, then restarting without locking the desktop. That's a pretty big bug. What's interesting about this bug, though, isn't the bug itself, but a bigger problem within the Ubuntu and Canonical development teams: the release date is more important than the quality of the release.

I blogged about this six years ago when I was an Ubuntu Member, and my blog was on the Ubuntu Planet (it's weird to think that I've had this blog now for almost 10 years. Heh). I mentioned that I was moving my server away from Ubuntu 8.04 LTS to Debian 5.0. My rationale was clear: Debian is much more interested in making sure that "Debian stable" is actually rock-solid stable. If that means that it takes three years before the next release is ready, then so be it. Stable is stable. Ubuntu could learn a lot from this.

So, what does Debian do so differently than Ubuntu, and how can Ubuntu learn from this? Let's take a look at how packages transition from the "unstable" release to the "testing" release in Debian, and how "testing" becomes "stable".

  1. After the package has been in unstable for a given length of time, it can qualify for migration to testing. This depends on each package, and the urgency of the migration.
  2. The package can only enter testing if no new release critical bugs exist. This means that the package must have fewer release critical bugs than the current package in testing.
  3. All dependencies needed for the package must be satisfiable in the current testing repository. If not, those packages must be brought in at the time the current package is, and they must meet the same criteria related to time in unstable and the number of release critical bugs.
  4. The package migrating to testing must not break any other packages currently in testing.
  5. It must be compiled for all release architectures it claims to support, and all architecture specific packages must be brought in as well, meeting the same criteria as mentioned.

Point #1 gives the package enough time for people in the community to report any immediate or obvious bugs. If nothing is reported, then it's possible that the new package version is an improvement over what is currently in testing, and as such, the package becomes a migration candidate. This doesn't mean the package migrates. Instead, it must now be evaluated in terms of quality, and this is handled by evaluating its release bug count.

Point #2 is all about release bug count. If the release bug count is equal to or higher than what is currently in testing, then the package is not an improvement, and is not going to get migrated into testing. If the release bug count is fewer than what is in testing, then migrating the package from unstable into testing improves the quality of the testing release.

Point #3 makes sure that dependencies are met with the package migration from unstable into testing. All dependencies must meet the criteria of points #1 and #2 to be candidates for migration.

Point #4 ensures that current packages in testing do not break when the new package is migrated. Not only the current package dependencies, but also other packages that might interact with it.

Point #5 brings in CPU architecture support. Debian is a universal operating system, handling many different CPU architectures. Not all CPU architectures have the same packages. But if the package was previously compiled for a certain architecture, then it must continue to be compiled for that architecture- i.e., no orphaned packages. Points 1-4 must be met for every architecture the package claims to support.

If points 1-5 are successfully met, then the package can be migrated from unstable into testing, improving the quality of the testing release. You can track the release critical bug status of the current testing release online. As of this writing, there are currently 384 release critical bugs that will prevent testing from becoming the new stable release (the green line in the graph below). When the release critical bugs near zero, then and only then is a new release of Debian ready for general consumption.

Debian release critical bugs graph.

I know I've highlighted how the Debian release works, but how about other operating system vendors?

  • Windows releases when it's ready.
  • Mac OS X releases when it's ready.
  • RHEL releases when it's ready.
  • CentOS releases when it's ready (following RHEL).
  • SLES releases when it's ready.
  • Slackware releases when it's ready.
  • FreeBSD releases when it's ready.
  • NetBSD releases when it's ready.
  • Solaris releases when it's ready.

In fact, of all the operating system vendors I can find, only Ubuntu and their derivatives (Linux Mint, *buntu, etc.) and OpenBSD stick to a rigid schedule of releasing every six months. And other than Ubuntu, none of those operating system vendors see large scale mission critical server deployment. Debian does. RHEL and CentOS do. Windows certainly does. Even FreeBSD sees enormous success in the datacenter. A large part of that, I would bet, is the fact that stable means stable. "Mission critical" just is not synonymous with Ubuntu servers.

According to the Ubuntu bug tracker, as of the time of this writing, almost 2 weeks after 14.04 LTS released, there are:

  • 88 New bugs
  • 301 Open bugs
  • 19 In-progress bugs
  • 10 Critical bugs
  • 78 High importance bugs
  • 4 Incomplete bugs (can expire)

Here's a beauty, reported in December 2013, against 13.10: Reinstallation wipes out all/other partitions. People are losing their Windows partitions. That's lost data. This isn't a laughing matter. And 14.04 STILL released? No way in hell would any other vendor release their OS with a bug of that size. Let's hope it's fixed before 14.10 releases.

Even though I'm critical of Ubuntu's spyware on the desktop, their CLA, and other business practices and development decisions, I would love to see them do things right. Unfortunately, releasing every 6 months regardless of quality is not doing things right. It doesn't inspire much trust in the system administrators and executives who would run it in their production environments.

The Drunken Bishop For OpenPGP Keys

Almost a year ago, I blogged about the drunken bishop algorithm for OpenSSH key random art. Towards the end of the post, I mentioned that I would be building an OpenPGP implementation. I started doing so in Python, but eventually got sidetracked with other things. I hosted the Scale 12x PGP keysigning party, and after the fact wished that I had finished the OpenPGP drunken bishop implementation. Well, I'll be hosting another PGP keysigning party at OpenWest 2014 next month, and didn't want to miss another opportunity to introduce the drunken bishop to attendees.

The OpenPGP implementation follows ALMOST the exact algorithm. The only differences are these:

  1. OpenSSH key fingerprints are MD5 checksums while OpenPGP key fingerprints are SHA1 checksums. As such, a larger board is introduced.
  2. The algorithm for OpenSSH key fingerprints reads the fingerprint in little endian. I felt that this unnecessarily complicates the codebase, so I read the fingerprint in big endian. Turns out, this wasn't that big of a deal code-wise, and the reason for the little-endian reading is to "mix" the key, preventing fingerprints that start the same from looking the same.
  3. For the ASCII art output, the PDF paper mentions that as the bishop visits a square more often, the ASCII character should increase in weight gradually. Unfortunately, it appears this wasn't strictly observed. I've changed the characters that are shown in the art to more accurately reflect the frequency of visits to each square. The starting 'S' and the ending 'E' remain the same.
  4. I added an extra label at the bottom of the random art to depict the OpenPGP key ID for that art.

Because it's written in Python, it is slower than its C counterpart for OpenSSH. If I were motivated enough to care, I'd write it in C. But it's plenty fast for what it's doing, and I'm sure there are optimizations I can make to speed up the Python script.
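
For the curious, the heart of the walk is small. Below is a minimal sketch of the idea in Python- not the actual keyart script. The 19x11 board size is a hypothetical stand-in for the "larger board" mentioned above, and the bit pairs are read big endian, per difference #2:

COLS, ROWS = 19, 11  # hypothetical board, larger than OpenSSH's 17x9

def drunken_bishop(fpr_hex):
    """Walk the bishop over a SHA1 fingerprint, returning the coin counts."""
    board = [[0] * COLS for _ in range(ROWS)]
    x, y = COLS // 2, ROWS // 2               # the bishop starts in the center
    for byte in bytes.fromhex(fpr_hex):
        for shift in (6, 4, 2, 0):            # big endian: high bit-pair first
            step = (byte >> shift) & 0b11
            x += -1 if step in (0, 2) else 1  # 00/10 step left, 01/11 right
            y += -1 if step in (0, 1) else 1  # 00/01 step up,   10/11 down
            x = min(max(x, 0), COLS - 1)      # drunk, but stays on the board
            y = min(max(y, 0), ROWS - 1)
            board[y][x] += 1                  # drop a coin on this square
    return board

Rendering is then just a matter of mapping each square's coin count to a character of increasing weight, per difference #3.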

I took 9 random keys from my GnuPG public keyring, and am showing them here as a screenshot, rather than trusting the horrible rendering of web browsers:

Terminal screenshot showing 9 OpenPGP random art images

As with OpenSSH, these random art images exist to make verifying OpenPGP keys a bit easier. No doubt collisions exist, where more than one key produces the same artwork. When in doubt, verify the hex fingerprint. However, this may be a good alternate method for speeding up keysigning parties: have each individual verify their own key ahead of time, then only speak up if they feel their key is in error. This way, parties can go straight to identification checking, eliminating 1/2 to 2/3 of the party. During identification checking, each individual can verify the key random art, fingerprint, etc. while checking photo identification.

A possible handout format for the random art images could look something like this:

+----[DSA 1024]-----+	
|l^  . ^^?^.        |	1) ID: 22EEE0488086060F
|l. . .l:.?         |	UID: Aaron Toponce 
|^ E ...ll          |	Fingerprint: E041 3539 273A 6534
|^. .  ^:.          |	             A3E1 9259 22EE E048
|^ . .  ..          |	Size/Type: 1024/DSA
| . .   . S         |	
|  . .   . .        |	Matches?
|   .     .         |	Fingerprint: ____ ID: ____ Signed: ____
|                   |	
|                   |	
|                   |	

Of course, this is less efficient on paper usage than the traditional table, but it could be improved.

The source code for generating the OpenPGP random art is available online. The script requires an OpenPGP public key file as an argument, like so:

$ ./keyart 8086060F.pgp

Hopefully, PGP keysigning party organizers find this useful in streamlining the keysigning process. If not, at least it could be entertaining.

Time Based One Time Passwords - How It Works


With all the news about Heartbleed, passwords, and two-factor authentication, I figured I would blog about exactly how two-factor authentication can work- in this case, TOTP, or Time-based One-Time Passwords, as defined by The Initiative for Open Authentication (OATH). TOTP is defined in RFC 6238, and is an open standard, which means anyone can implement it with no worries about royalty payments or copyright infringement. In fact, TOTP is actually just an extension of HOTP (HMAC-based One-Time Passwords), which is defined in RFC 4226. I'll describe HOTP a little bit in this post, but focus primarily on TOTP.

What is two-factor authentication?

First, let's describe the problem. The point of two-factor authentication is to prevent attackers from getting access to your account. Two-factor authentication requires that two tokens be provided as proof of ownership of the account. The first token is something that we're all familiar with: a username and password. The second token is a bit more elusive, however. It should be something you have, and only you. No one else should be able to come into possession of the same token. The same should be true of usernames and passwords, but we've seen how easily broken a single-factor authentication system is.

Think of two-factor authentication as layers on an onion. The first layer is your username and password. After peeling away that layer, the second layer is a secret token. Three-factor authentication is also a thing, where you can provide something that you are, such as a fingerprint or retinal scan. That would be the third layer in our onion. You get the idea.

However, the second token should not be public knowledge; it must be kept secret. As such, a shared secret is generated between the client and the server. For both HOTP and TOTP, this is just a random key, typically presented to the user base-32 encoded. This secret, along with a changing message, is turned into an HMAC-SHA1 cryptographic hash (HMAC is defined in RFC 2104; also described at Wikipedia).

Importance of Time

Unfortunately, a static "shared secret" alone is a fairly lame form of authentication. First, the user could memorize the secret, no longer making it something the user has. Second, man-in-the-middle attacks against static secrets are extremely effective. So, we need to prevent the user from memorizing the shared secret, and we need to make man-in-the-middle attacks exceptionally difficult. As such, we turn the value derived from our shared secret into a moving target.

HOTP, as already mentioned, is the base from which TOTP comes. HOTP uses the same algorithm described below in this post, except that rather than using time as the moving factor, an 8-byte counter is used. We need a moving target, because if the token were static, it would be no different than just a second password. Instead, we need the attacker to be constantly guessing what the token could be. So, with HOTP, a token is valid until used; once used, the counter is incremented, and the HMAC-SHA1 string is recalculated.

TOTP uses the UNIX epoch as its time scale, in seconds. This means that for TOTP, time starts on January 1, 1970 at 00:00:00 UTC, and we count the number of seconds that have elapsed since then. By default, we only look at 30-second intervals: at the top of the minute (zero seconds past the minute), the TOTP code is refreshed, and again 30 seconds after the minute. Because TOTP uses a shared secret between the client and the server, it's important that both clocks are synchronized. However, RFC 6238 does allow for some clock skew and drift. As such, when a code is entered, the server will typically allow for a 30-second window on either side of the code, meaning a code is actually valid for a maximum of 90 seconds. If using NTP on the server and a mobile phone for the client, then clock drift isn't a concern, as both will be continuously adjusted throughout the day to maintain accurate time. If NTP or GSM/CDMA broadcasts are not adjusting the clock, then it should be monitored and adjusted as needed.
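
In code, the moving factor is just integer division of the epoch time. A two-line Python sketch:

import time

T = int(time.time()) // 30   # the 30-second interval count (TOTP's counter)
# T increments at :00 and :30 of every minute; accepting T-1, T, and T+1
# gives the 90-second window described above.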

Computing the HMAC-SHA-1

Hash-based message authentication codes (HMAC) require two arguments to calculate the hash. First, they require a secret key, and second, they require a message. For TOTP (and HOTP), the secret key is our shared secret between the client and the server. This secret never changes, and is the foundation from which our HMAC is calculated. The message is what changes. For TOTP, it is the time in seconds since the UNIX epoch, rounded down to the nearest 30 seconds; for HOTP, it is the 8-byte counter. This moving target changes our cryptographic hash output. You can see this with OpenSSL on your machine (for illustration, the time is fed in here as an ASCII string):

$ KEY=$(< /dev/random tr -dc 'A-Z0-9' | head -c 16; echo)
$ echo $KEY
$ echo -n '1397552400' | openssl sha1 -hmac "$KEY"
(stdin)= f7702ad6254a06f33f7dcb952000cbffa8b3c72e
$ echo -n '1397552430' | openssl sha1 -hmac "$KEY" # increment the time by 30 seconds
(stdin)= 70a6492f088785444fc664e1a66189c6f33c2ba4

Suppose that our HMAC-SHA1 string is "0215a7d8c15b492e21116482b6d34fc4e1a9f6ba". We'll use this image of our HMAC-SHA-1 to help us identify a bit more clearly exactly what is happening with our token:

Image clarifying the HMAC-SHA-1 string, by using 20 divisions of 1 byte each (two characters per division).

Dynamic Truncation

Unfortunately, requiring the user to enter a 40-character hexadecimal string within 30 seconds is a bit unrealistic. So we need a way to convert this string into something more manageable, while still remaining secure. As such, we do something called dynamic truncation. Each character occupies 4 bits (16 possible values). So, we look at the lower 4 bits of the hash (the last character of our string) to determine a starting point from which we'll do the truncating. In our example, the last 4 bits are the character 'a':

Same 20-division image, highlighting the last character of the HMAC-SHA-1 string- character 'a'.

The hexadecimal character "a" has the same numerical value as the decimal "10". So, we will read the next 31 bits, starting at byte offset 10. If you think of your HMAC-SHA1 output as 20 individual 1-byte strings, we start reading at the tenth byte, counting the first byte as offset "0":

Image clarifying the HMAC-SHA-1 string, by using 20 divisions of 1 byte each (two characters per division).

As such, our dynamically truncated string is "6482b6d3", read from our original HMAC-SHA1 hash of "0215a7d8c15b492e21116482b6d34fc4e1a9f6ba" (with the most significant bit masked to zero, leaving 31 bits):

Same 20-division image, highlighting the first 31-bits of the HMAC-SHA-1 string, starting with the 10th offset.

The last thing left to do, is to take our hexadecimal numerical value, and convert it to decimal. We can do this easily enough in our command line prompt:

$ echo "ibase=16; 6482B6D3" | bc
1686288083

All we need now are the last 6 digits of the decimal value, zero-padded if necessary. This is easily accomplished by taking the value modulo 1,000,000. We end up with "288083" as our TOTP code:

TOTP: 288083

The user then types this code into the form requesting the token. The server does the exact same calculation, and verifies if the two codes match (actually, the server does this for the previous 30 seconds, the current 30 seconds, and the next 30 seconds for clock drift). If the code provided by the user matches the code calculated by the server, the token is valid, and the user is authenticated.
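
Putting the whole walkthrough together, here is a compact Python sketch of the algorithm. Note that RFC 6238 packs the 30-second interval count as an 8-byte big-endian counter (unlike the illustrative ASCII string fed to OpenSSL above), and the secret "JBSWY3DPEHPK3PXP" is just a demonstration value:

import base64, hashlib, hmac, struct, time

def totp_at(secret_b32, counter, digits=6):
    key = base64.b32decode(secret_b32, casefold=True)
    msg = struct.pack(">Q", counter)              # 8-byte big-endian counter
    mac = hmac.new(key, msg, hashlib.sha1).digest()
    offset = mac[-1] & 0x0f                       # low 4 bits of the last byte
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7fffffff  # 31 bits
    return str(value % 10**digits).zfill(digits)  # last 6 digits, zero-padded

def totp(secret_b32, step=30):
    return totp_at(secret_b32, int(time.time()) // step)

def verify(secret_b32, code, window=1, step=30):
    """Accept the previous, current, and next interval to absorb clock drift."""
    t = int(time.time()) // step
    return any(hmac.compare_digest(code, totp_at(secret_b32, t + i))
               for i in range(-window, window + 1))

print(totp("JBSWY3DPEHPK3PXP"))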


TOTP, and alternatively HOTP, is a great way to do two-factor authentication. It's based on open standards, and the calculations for the tokens are done entirely in software, without requiring a proprietary hardware or software solution such as those provided by RSA SecurID and Verisign. Also, the calculations are done entirely offline; there is no need to communicate with an external server for authentication handling at all. No calling home to the mothership to report someone has just logged in. It's platform independent, and is already implemented in a number of programming language libraries. You can take this further by generating QR codes for mobile devices to scan, making it as trivial as installing a mobile application, scanning the code, and using the tokens as needed.

Two Factor Authentication with OpenSSH

With all the news about Heartbleed, passwords and two-factor authentication, I figured I would finally get two-factor authentication working with my SSH servers. I've known about it in the past, but haven't done anything about it. Now is the time.

To get two-factor authentication working with your OpenSSH server, you need to install the "libpam-google-authenticator" PAM module on your system. Don't let the package name fool you, however. It is developed by Google, but it does not "phone home" to Google servers at all. It also does not require a Google account. Further, the PAM module is Free and Open Source software, licensed under the Apache 2.0 license.

To install the module on Debian-based systems, run the following:

$ sudo aptitude install libpam-google-authenticator

Once installed, run the google-authenticator(1) command, and answer the resulting questions. The questions offer a balance between increased security and convenience of use. You have the opportunity to create an HMAC-based One-Time Password (HOTP), as specified in RFC 4226, or a Time-based One-Time Password (TOTP), as specified in RFC 6238. The difference is that with HOTP, each code is derived from an incrementing counter, whereas with TOTP, each code is derived from the current time. If running NTP on your SSH server, and you're using this with a phone, then TOTP is probably the better bet, as time will be very closely synchronized and unlikely to fall out of sync.

The command will create an ANSI QR code that you can scan on your phone to setup the codes. Further, it will print 5 backup codes in the event that you lose your phone, or don't have it, but need to login. Print those backup codes, and store them in your wallet. Just in case you need the codes, they are stored in your ~/.google_authenticator file:

$ cat ~/.google_authenticator 

Now you just need to configure OpenSSH to use OTP as part of the authentication process. There are two configuration files to edit. First, edit the /etc/pam.d/sshd config file, and put "auth required pam_google_authenticator.so" at the bottom of the file:

$ sudo vim /etc/pam.d/sshd
(move to the bottom of the file)
auth required pam_google_authenticator.so

Now change the /etc/ssh/sshd_config file, to allow a challenge and response as part of the authentication process:

$ sudo vim /etc/ssh/sshd_config
ChallengeResponseAuthentication yes

Restart your OpenSSH server, and you should be good to go:

$ sudo service ssh restart

For an OTP application that you can install on your Android phone, I would personally recommend FreeOTP by Red Hat. It's Free and Open Source software, unlike the Google Authenticator app, and it has increased security in that codes are not displayed by default. Instead, you must tap the refresh button for that site to display the code. Also, I like the display options better with FreeOTP than with Authenticator, but that's personal choice. There currently is not an iOS version of the application in the Apple App Store, but my understanding is that one will make it there soon.

Anyway, happy two-factor authentication on your OpenSSH server!

Heartbleed And Your Passwords

Recently it was discovered that OpenSSL contained a pretty massive security hole that allowed simple TLS clients to retrieve plain text information from a TLS-protected server via the TLS heartbeat extension. The advisory is CVE-2014-0160. This has to be one of the most dangerous security vulnerabilities to hit the Internet in a decade. More information can be found at the dedicated Heartbleed site (which, ironically enough, is using a self-signed certificate as of this writing).

I don't wish to cover all the extensive details of this vulnerability; they are many, and they run deep. However, I do want to address it from the standpoint of passwords, which is something I can handle on this blog. First, let's review the Heartbleed vulnerability:

  1. FACT: This vulnerability was introduced into OpenSSL on December 31, 2011.
  2. FACT: This vulnerability was fixed on April 5, 2014.
  3. FACT: Without the TLS heartbeat bounds check, data is leaked in 64KB chunks.
  4. FACT: A TLS client can expose the server's SSL private key, to decrypt future communication, without any special hardware or software.
  5. FACT: This is not a man-in-the-middle (MITM) attack.
  6. FACT: Armed with this exploit, an attacker can reveal usernames and passwords without anyone knowing.

To demonstrate this, Ronald Prins posted a proof on Twitter, and demonstrates it on his blog, showing a censored example.

What does this mean? It means that if you logged into a server using SSL during the past two years, your username and password could already be compromised. This includes your email account, your bank account, your social media accounts, and any others. Of course, if the service takes advantage of two-factor authentication, then that specific account is likely safe. However, if you share passwords between accounts, additional accounts may not be.

I really wish I was joking.

My advice? Change the passwords on any accounts you think you may have used over the past two years. But before you do so, you need to know whether the service is protected against Heartbleed; use one of the online Heartbleed testers before logging in with your account credentials.

Protect Against Bit Rot With Parchive


Yes, this post was created on April 1. No, it's not an April Fool's joke.

So, I need to begin with post with a story. In 2007, I adopted my daughter, and my wife decided that she wanted to stay home rather than work. In 2008, she quit her job. She was running a website on my web server, which only had a 100G IDE PATA drive. I was running low on space, so I asked if I could archive her site, and move it to my ext4 Linux MD RAID-10 array. She was fine with that, so off it went. I archived it using GNU tar(1), and compressed the archive with LZMA. Before taking her site offline, I verified that I could restore her site with my compressed archive.

From 2008 to 2012, it just sat on my ext4 RAID array. In 2012, I built a larger ZFS RAID-10, and moved her compressed archive there. Just recently, my wife has decided that she may want to go back to work. As such, she wants her site back online. No problem, I thought. I prepared Apache, got the filesystem in order, and decompressed the archive:

$ unlzma site-backup.tar.lzma
unlzma: Decoder error


$ tar -xf site-backup.tar.lzma
xz: (stdin): Compressed data is corrupt
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

Oh no. Not now. Please, not now. My wife has put hundreds of hours, and many, many years, into this site. PDFs, images, other media, etc. Tons and tons of content. And the archive is corrupt?! Needless to say, my wife is NOT happy, and neither am I.

I'm doing my best to restore the data, but it seems that all hope is lost. From the GNU tar(1) documentation:

Compressed archives are easily corrupted, because compressed files have little redundancy. The adaptive nature of the compression scheme means that the compression tables are implicitly spread all over the archive. If you lose a few blocks, the dynamic construction of the compression tables becomes unsynchronized, and there is little chance that you could recover later in the archive.

I admit to not knowing a lot about the internal workings of compression algorithms. It's something I've been meaning to understand more fully. However, best I can tell, this archive experienced bit rot during the 4 years it sat on my ext4 RAID array, and as a result, the archive is corrupt. Had I known about Parchive, and that ext4 doesn't offer any sort of self-healing against bit rot, I would have been more careful storing that compressed archive.

Because most general purpose filesystems, unlike Btrfs or ZFS, do not protect against silent data errors, there is no way to fix a file that suffers from bit rot, outside of restoring from a snapshot or backup, unless the file format itself has some sort of redundancy. Unfortunately, I learned that compressed files have little to no redundancy. It makes sense, as compression algorithms are designed to remove redundant data, using either lossy or lossless techniques. However, I would be willing to grow a compressed file by, say, 15%, if it meant I could suffer some damage to it and still get my data back.


Parchive is a Reed-Solomon error correcting utility for general files. It does not handle Unicode, and it does not work on directories. Its initial use was to maintain the integrity of Usenet posts against bit rot on Usenet servers. It has since been used by anyone interested in maintaining data integrity for any sort of regular file on the filesystem.

It works by creating "parity files" from a source file. Should the source file suffer some damage, up to a certain point, the parity files can rebuild the data and restore the source file, much like parity in a RAID array when a disk is missing.
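
To build some intuition for the parity analogy, here is a toy sketch in Python- plain XOR parity, not the Reed-Solomon codes Parchive actually uses. With one XOR parity block over equal-sized data blocks, any single lost block can be rebuilt:

from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, like RAID parity."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

blocks = [b"some", b"file", b"data"]      # three equal-sized data blocks
parity = xor_blocks(blocks)               # one parity block protects them

lost = blocks.pop(1)                      # simulate losing the middle block
rebuilt = xor_blocks(blocks + [parity])   # XOR of the survivors and parity
assert rebuilt == lost                    # the lost block is restored

Reed-Solomon generalizes this idea, producing many recovery blocks so that larger and scattered damage is repairable. To see Parchive do it in action, let's create a 100M file full of random data, and get the SHA256 of that file: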

$ dd if=/dev/urandom of=file.img bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 7.76976 s, 13.5 MB/s
$ sha256sum file.img > SUM
$ cat SUM
5007501cb7f7c9749a331d1b1eb9334c91950268871ed11e36ea8cdc5a8012a2  file.img

Should this file suffer any sort of corruption, the resulting SHA256 hash will likely change. As such, we can protect the file against some mild corruption with Parchive (removing newlines and cleaning up the output a bit):

$ sudo aptitude install par2
$ par2 create file.img.par2 file.img
Block size: 52440
Source file count: 1
Source block count: 2000
Redundancy: 5%
Recovery block count: 100
Recovery file count: 7
Opening: file.img
Computing Reed Solomon matrix.
Constructing: done.
Wrote 5244000 bytes to disk
Writing recovery packets
Writing verification packets

As shown in the output, the redundancy for the file is 5%. This means we can suffer up to 5% damage across the source file, or the parity files, and still reconstruct the data. This 5% redundancy is the default behavior, and can be changed by passing the "-r" switch.

If we do a listing, we will see the following files:

$ ls -l file.img*
-rw-rw-r-- 1 atoponce atoponce 104857600 Apr  1 08:06 file.img
-rw-rw-r-- 1 atoponce atoponce     40400 Apr  1 08:08 file.img.par2
-rw-rw-r-- 1 atoponce atoponce     92908 Apr  1 08:08 file.img.vol000+01.par2
-rw-rw-r-- 1 atoponce atoponce    185716 Apr  1 08:08 file.img.vol001+02.par2
-rw-rw-r-- 1 atoponce atoponce    331032 Apr  1 08:08 file.img.vol003+04.par2
-rw-rw-r-- 1 atoponce atoponce    581364 Apr  1 08:08 file.img.vol007+08.par2
-rw-rw-r-- 1 atoponce atoponce   1041728 Apr  1 08:08 file.img.vol015+16.par2
-rw-rw-r-- 1 atoponce atoponce   1922156 Apr  1 08:08 file.img.vol031+32.par2
-rw-rw-r-- 1 atoponce atoponce   2184696 Apr  1 08:08 file.img.vol063+37.par2

Let's go ahead and corrupt our "file.img" file, and verify that the SHA256 hash no longer matches:

$ dd seek=5k if=/dev/zero of=file.img bs=1k count=128 conv=notrunc
128+0 records in
128+0 records out
131072 bytes (131 kB) copied, 0.00202105 s, 64.9 MB/s
$ sha256sum -c SUM
file.img: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match

This should be expected. We've corrupted our file through simulated bit rot. Of course, we could have corrupted up to 5% of the file, or about 5 MB of data; instead, we only wrote 128 KB of bad data.

Now that we know our file is corrupt, let's see if we can recover the data. Parchive to the rescue (again, cleaning up the output):

$ par2 repair -q file.img.par2 
Loading "file.img.par2".
Loading "file.img.vol031+32.par2".
Loading "file.img.vol063+37.par2".
Loading "file.img.vol000+01.par2".
Loading "file.img.vol001+02.par2".
Loading "file.img.vol003+04.par2".
Loading "file.img.vol007+08.par2".
Loading "file.img.vol015+16.par2".
Target: "file.img" - damaged. Found 1996 of 2000 data blocks.
Repair is required.
Repair is possible.
Verifying repaired files:
Target: "file.img" - found.
Repair complete.

Does the resulting SHA256 hash now match?

$ sha256sum -c SUM 
file.img: OK

Perfect! Parchive fixed my "file.img" by loading the parity files, recalculating the Reed-Solomon error correction, and using the resulting math to repair the data. I could have also damaged some of the parity files to demonstrate that the source file can still be repaired, but this should be sufficient to demonstrate my point.


Hindsight is always 20/20. I know that. However, had I known about Parchive in 2008, I would have most certainly used it on all my compressed archives, including my wife's backup (in reality, a backup is not a backup unless you have more than one copy). Thankfully, the rest of the compressed archives did not suffer bit rot like my wife's backup did, and thankfully, all of my backups are now on a 2-node GlusterFS replicated ZFS data store. As such, I have 2 copies of everything, and I have self-healing against bit rot with ZFS.

This is the first time in my life that bit rot has reared its ugly head, and it caused a great deal of pain. Parchive can fix that for those who have sensitive data whose integrity MUST be maintained on filesystems that do not protect against bit rot. Of course, I should have also kept multiple copies of the archive to have a true backup. Lesson learned. Again.

Parchive is no substitute for a backup. But it is protection against bit rot.