Image of the glider from the Game of Life by John Conway
Skip to content

Linux Kernel CSPRNG Performance

I'm hardly the first one to notice this, but I was having a discussion in ##crypto on Freenode about the Linux kernel CSPRNG performance. It was mentioned that the kernelspace CSPRNG was "horrendously slow". Personally, I found the performance sufficient for me needs, but I decided to entertain his definition. I'm glad I did; I wasn't disappointed.

Pull up a terminal, and run the following command, passing 10GB of data from /dev/urandom to /dev/null:

$ dd if=/dev/urandom of=/dev/null bs=1M count=1024 iflag=fullblock  
1024+0 records in  
1024+0 records out  
1073741824 bytes (1.1 GB) copied, 80.1537 s, 13.4 MB/s
$ pv < /dev/urandom > /dev/null # cancel in a different terminal, unless you have "-S"
1.02GB 0:01:20 [13.3MB/s] [                   < =>                              ]

13.4 MBps of throughput for reading data directly out of the kernelspace CSPRNG. But, can we do better?

In the ##crypto channel, and as should be across development mailing lists, forums, groups, and discussion channels, I recommend that developers should not generally develop their own userspace CSPRNG. There are all sorts of pitfalls and traps waiting for you when you attempt it. Unless you know what you're doing, you could end up with a CSPRNG that isn't actually cryptographically secure (the "CS" in "CSPRNG").

However, what happens when I do actually run a userspace CSPRNG on the same machine? What can I expect out of performance? For example, I could implement AES-128 in CTR mode as a CSPRNG. In fact, we can do this with OpenSSL:

$ dd if=/dev/zero bs=10M count=1024 iflag=fullblock 2> /dev/null | openssl enc -aes-128-ctr -pass pass:"sHgEOKTB8bo/52eDszkHow==" -nosalt | dd of=/dev/null
20971520+0 records in
20971520+0 records out
10737418240 bytes (11 GB) copied, 15.3137 s, 701 MB/s
$ openssl enc -aes-128-ctr -pass pass:"sHgEOKTB8bo/52eDszkHow==" -nosalt < /dev/zero | pv > /dev/null
31.9GB 0:00:34 [ 953MB/s] [                                  < =>               ]

700-950 MBps (notice that dd(1) incurs a performance penalty). That's 52-70x the speed of reading the kernelspace CSPRNG directly. That's more than a full order of magnitude faster. However, this is on a box with AES-NI. What about disabling AES-NI on the same box? How badly does it damage performance, and how does it compare to reading the kernelspace CSPRNG? We can use OpenSSL speed(1SSL) to benchmark algorithms.

First, with AES-NI enabled:

$ openssl speed -elapsed -evp aes-128-ctr 2> /dev/null  
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-ctr     468590.43k  1174849.02k  1873606.83k  2178642.60k  2244471.47k

And with AES-NI disabled:

$ OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-ctr 2> /dev/null  
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-ctr      74272.21k    83315.43k   340393.30k   390135.47k   391279.96k

In this case, we see about a 5x performance improvement when using the AES-NI instruction set as compared to when not using it. That's significant. And even with AES-NI disabled in userspace, we're still outperforming /dev/urandom by almost 30x.

Interestingly enough, even the OpenBSD CSPRNG (different hardware than previously tested), which uses ChaCha20, outperforms the Linux CSPRNG (although its userspace CSPRNG with openssl(1) doesn't outperform kernelspace):

% dd if=/dev/urandom of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 13.630 secs (78775541 bytes/sec)
% dd if=/dev/zero bs=1M count=1024 2> /dev/null | openssl enc -aes-128-ctr -pass pass:"sHgEOKTB8bo/52eDszkHow==" -nosalt | dd of=/dev/null 
2097152+0 records in
2097152+0 records out
1073741824 bytes transferred in 33.498 secs (32052998 bytes/sec)
% openssl speed -elapsed -evp aes-128-ctr 2> /dev/null
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-ctr      41766.37k    46930.74k    49593.54k    50669.32k    50678.33k

Roughly 78 MBps for OpenBSD on an Intel Xeon CPU running at 2.80GHz. Basically, six times the speed of the Linux kernel CSPRNG on an Intel Xeon CPU running at 2.67GHz.

So why is the Linux CSPRNG so slow? And, what can we do about it? Well, first, the kernel is using SHA-1 for its cryptographic primitive. In very loose terms, the CSPRNG hashes the input pool with SHA-1, and spits out the output to /dev/urandom. It's output is also its input, so its digesting its own output.

But, that's not all it's doing actually. The first function actually adds data into the input pool without increasing the entropy estimate. Then, after adding those bytes, the input pool is mixed with a Skein-like mixing function. Then some math is done to credit the entropy estimator, and the system is polled for data to add to the input entropy pool. Things like disk IO, CPU timings, interrupts, and user activity. Finally, we're ready to hash the data. This is done by extracting the data out of the input pool, and hashing it with SHA-1. But, we don't want any recognizable output, so the output is left-rotated and folded in half. Then, and only then, is the data ready for consumption.


Unfortunately, the Linux kernel CSPRNG is not based on any sound theoretical security design. It's very much a hodge-podge home-brew design by developers who think they know what they're doing, when in reality, they don't. In 2013, a security audit and analysis was performed on the Linux kernel CSPRNG (PDF), and concluded that not only is it not robust, but it has some weaknesses:

In the literature, four security notions for a PRNG with input have been proposed: resilience (RES), forward security (FWD), backward security (BWD) and robustness (ROB), with the latter being the strongest notion among them.


Distributions Used in Attacks based on the Entropy Estimator As shown in Section 5.4, LINUX uses an internal Entropy Estimator on each input that continuously refreshes the internal state of the PRNG. We show that this estimator can be fooled in two ways. First, it is possible to define a distribution of zero entropy that the estimator will estimate of high entropy, secondly, it is possible to define a distribution of arbitrary high entropy that the estimator will estimate of zero entropy. This is due to the estimator conception: as it considers the timings of the events to estimate their entropy, regular events (but with unpredictable data) will be estimated with zero entropy, whereas irregular events (but with predictable data) will be estimated with high entropy.


As shown in Section 5.7, it is possible to build a distribution D0 of null entropy for which the estimated entropy is high (cf. Lemma 3) and a distribution D1 of high entropy for which the estimated entropy is null (cf. Lemma 4). It is then possible to mount attacks on both /dev/random and /dev/urandom, which show that these two generators are not robust.


We have proposed a new property for PRNG with input, that captures how it should accumulate the entropy of the input data into the internal state. This property actually expresses the real expected behavior of a PRNG after a state compromise, where it is expected that the PRNG quickly recovers enough entropy. We gave a precise assessment of Linux PRNG /dev/random and /dev/urandom security. In particular, we prove that these PRNGs are not robust. These properties are due to the behavior of the entropy estimator and the mixing function used to refresh its internal state. As pointed by Barak and Halevi [BH05], who advise against using run-time entropy estimation, we have shown vulnerabilities on the entropy estimator due to its use when data is transferred between pools in Linux PRNG. We therefore recommend that the functions of a PRNG do not rely on such an estimator.

Finally, we proposed a construction that meets our new property in the standard model and we showed that it is noticeably more efficient than the Linux PRNGs. We therefore recommend to use this construction whenever a PRNG with input is used for cryptography.

TL;DR? The Linux CSPRNG does not meet the definitions of a secure CSPRNG per the PDF. It's not that it's theoretically broken, it's just not theoretically secure either. It's really nothing theoretically at all. This isn't great.

A replacement for random.c in the kernel would be to ditch the homebrew entropy collection, mixing, and output mangling, and instead, stick with AES-128 in CTR mode. Of course, as per the PDF, the entropy collectors need serious work, but if AES-128-CTR was deployed as the CSPRNG instead of SHA-1, then the generator could take advantage of hardware AES performance, which as I've shown, is exceptionally superior. It's frustrating, because the kernel already ships AES, so the code is already there. It's just not being utilized.

The Linux kernel could have 1 GBps in CSPRNG output, but is deliberately choosing not to. That's like having a V12 turbo-charged sleeper, without the turbo, and only firing on 3 of the 12 cylinders, with a duct taped muffler on the back.

Why does 1 GBps of performance matter? How about wiping hard drives or secure data removal in general? With 20 MBps, we can't even saturate a single drive in IOPS. With 1 GBps, we could saturate many simultaneously. As someone who wipes old employee workstations when they leave the company, backup servers with dozens of drives, or old decommissioned hardware, I see great benefit here.

Or, how about HTTPS web sites for a shared web hosting provider? I have seen countless times HTTPS and SSH connections lag due to waiting on the CSPRNG. Not that it's being intentionally blocked, but because the load is so intense on the server, it just can't generate enough cryptographic randomness to keep up with requests.

I'm sure there are plenty of other examples where end userspace applications could benefit with improved performance of the CSPRNG. And, as shown, it can't be that difficult to implement correctly. The real question is, of course, who will do the work and submit the patch?

Cryptographic Hashing, Part I- Introduction


Lately, I've been seeing some discussion online about cryptographic hashing functions, along with some confusion between a cryptographic digest, a cryptographic signature, and a message authentication codes. At least in that last post, I think I did well defining and clarifying the differences between those terms, but I also feel like I could take this discussion a lot further. So, I decided to dedicate a series to generic cryptographic hashing functions, which will include building compression frameworks with security proofs, specific implementations of cryptographic hashing functions, and some implementations of these functions. So, without further ado, let's get started.


When we talk about a hashing function (cryptographic or otherwise), we are referring to any function that can take an arbitrary length of data, and compress it into a fixed-length digest. Typically, this digest is called a "fingerprint", a "checksum", or a "hash". The goal, is that any time we input the same data, our function outputs the same digest. Further, it's important that not only can I produce that digest, but anyone can produce the same digest. This gets us prepared for the Random Oracle, but we still have some ground to cover first.

Because our hashing function has a fixed length output, say 128-bits, then an ideal function would map every input to one of those outputs. In other words, our function maps an element in the domain (our data to be hashed) to exactly one element in the range (our actual hash). So, if our function produces 128-bit digests, then there are a total of 2^128 digests in the range. This means, that we have at least a one-to-one mapping of elements in the domain to elements in the range. Again, speaking about an ideal hashing function.

However, we know that there are many more inputs than just 2^128; there are infinitely many, actually. But think about it for a second. Take the number zero, and send it through our hashing function. Increment that number by 1, then hash that number. Continue in this manner, assuming infinite computing resources and infinite time, until you've hashed every number between 0 and 2^128. Ideally, you've produced exactly 2^128 unique digests. But, what happens when you now want to hash 2^128+1? Now we have what is called a collision. In other words, two distinct inputs was hashed to the same output. To put it formally:

Definition: A collision is when two distinct pieces of data hash to the same digest, checksum, or fingerprint.
Theorem: For any fixed-length hashing function, there are infinitely many collisions.
Proof: This can be proven using the pigeon-hole principle. Given a fixed-length hashing function of n-bits of output, hashing n+1 inputs from the domain will produce a collision in the range. As n tends to infinity, the collisions tend to infinity. Q.E.D.

I don't think I need to tell you how much larger infinity is to 128-bits. As a result, collisions are overwhelming. In fact, would you like to see a collision in practice? Below are 2 different hexadecimal strings. The differences are very subtle, but they indeed distinct (emphasized in bold red). Here, we'll take the two strings, and hash them with the known MD5 algorithm. Then, just to show I'm not cheating, we'll hash the same strings with SHA-1. While we produce a collision in MD5, we have distinct digests with SHA-1. Go ahead, and verify that you get the same results.

$ INPUT1=d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89\
$ INPUT2=d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89\
$ printf "$INPUT1" | xxd -r -p | md5sum
79054025255fb1a26e4bc422aef54eb4  -
$ printf "$INPUT2" | xxd -r -p | md5sum
79054025255fb1a26e4bc422aef54eb4  -
$ printf "$INPUT1" | xxd -r -p | sha1sum
a34473cf767c6108a5751a20971f1fdfba97690a  -
$ printf "$INPUT2" | xxd -r -p | sha1sum 
4283dd2d70af1ad3c2d5fdc917330bf502035658  -

Crazy, right? With an ideal hashing function, it should be at least as difficult as a brute force search to find these collisions, and it should take searching an entire 128-bit domain to find a collision. Unfortunately, however, finding blind collisions with a brute force search turns out to be much faster, thanks to the Birthday Paradox. The Birthday Paradox says the following:

In a room of just 23 people, there is a 50% probability that at least two of them share the same birthday. In a room of just 75 people, there is a 99.9% probability that at least two of them share the same birthday.

Wait, what? Uhm, last I checked, there are 366 days days in a year, assuming leap year. Soooo, if there are 23 people in a room, then there should be a 23/366, or about a 6% probability that two people share the same birthday. Unfortunately, this isn't how it works. There may be a 6% chance someone shares your birthday, but there is a 50% chance two arbitrary people share the same birthday. Now do you see the problem? Not only must you compare your birthday to everyone, but so must everyone else. This is a case of permutations. So, with 23 people in the room, there are actually 253 possible comparisons that must be made (23*22/2). The math gets a little hairy, and to be honest, it's a bit outside the scope of this post, and this series (it's going to be long enough as it is). Refer to the Wikipedia article if you want to work through the theory and the proof.

We can use this Birthday Paradox to work out an attack on finding two distinct inputs that produce an identical digest. This is called the Birthday Attack, and it's the primary driver in finding collisions. The attack basically says something like this:

To find a collision in a n-bit range with approximately 50% probability, you need to only search the the square root of 'n' of elements in the domain.

So, for a 128-bit digest (2^128 possible distinct outputs), using the Birthday Attack, I only need to search 2^64 possible inputs to have approximately a 50% probability that I have found a collision. If you don't think 2^64 is very small, the bitcoin network is currently mining 2^64 SHA-256 digests about every 20 seconds.

Blind, preimage, and second preimage collisions

Armed with this knowledge, we can now formalize some definitions of collision attacks. This might be confusing, so I'll define it first, then give some examples.

Collision attack:
A blind search, where two distinct inputs produce the same digest.
Preimage attack:
A search to find an input that matches a defined digest.
Second preimage attack:
A search to find a second file that matches the digest of a defined file.

Let's break these down individually. A collision search is literally a blind search, without any respect to inputs or outputs. You don't know what the inputs will be nor do you know what their outputs will be. You only know that you have found two distinct inputs that collide to the same output, all of which is entirely arbitrary.

A preimage attack is where you have a digest in your possession, but you would like to find an input that matches it. In this case, while the input is completely arbitrary, the output is static. For example, suppose you have the 256-bit hexadecimal digest "ec58d903a9f9dcc9d783da72401b1c94fc8fb9d9623d7141b8b90997382088f9". A preimage attack would be successfully finding the input that produced it. In this case, it was "Cz3eJlm4I2I2rHt8hioZ7evonLyukwlz".

A second preimage attack means having both the input "Cz3eJlm4I2I2rHt8hioZ7evonLyukwlz" and its 256-bit hexadecimal digest "ec58d903a9f9dcc9d783da72401b1c94fc8fb9d9623d7141b8b90997382088f9", and finding a second input that produces that same digest.

Usually, when breaking cryptographic hash functions, the first thing to break is the compression function, which I'll cover in later posts. Once the compression function is broken, the next step is to break searching for blind collisions. This is generally done by analyzing the weaknesses in mathematics, find bias in the output, observe the quality of the avalanche effect, and so forth. You eventually learn where the hashing function is weak, and where you can take "shortcuts" to get to your goal. Eventually, the algorithm is broken to the point that finding blind collisions is practical. MD5 is broken in this regard.

After breaking the compressing function, and weakening the algorithm to the point of practical collision attacks, preimage attacks become the next focus of analysis. However, when the compression function is broken, such as in the case of SHA-1, it's a strong sign to start moving away from the algorithm, long before you find collisions. So, analysis tends to slow down after collisions have been found, because no one should be using the function anymore. This also means continuing to find second preimage collisions gets even less attention.

Avalanche Effect

The final property of cryptographic hashing functions that needs to be addressed is the "avalanche effect". It is absolutely critical in cryptographic hashing functions that even though inputs may be sequential, their outputs do not show that to be the case. For example, consider the SHA-256 of the first 10 digits:

$ for I in {1..10}; do printf "$I: "; printf "$I" | sha256sum -; done
1: 6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b  -
2: d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35  -
3: 4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce  -
4: 4b227777d4dd1fc61c6f884f48641d02b4d121d3fd328cb08b5531fcacdabf8a  -
5: ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d  -
6: e7f6c011776e8db7cd330b54174fd76f7d0216b612387a5ffcfb81e6f0919683  -
7: 7902699be42c8a8e46fbbb4501726517e86b22c56a189f7625a6da49081b2451  -
8: 2c624232cdd221771294dfbb310aca000a0df6ac8b66b696d90ef06fdefb64a3  -
9: 19581e27de7ced00ff1ce50b2047e7a567c76b1cbaebabe5ef03f7c3017bb5b7  -
10: 4a44dc15364204a80fe80e9039455cc1608281820fe2b24f1e5233ade6af1dd5  -

Notice that there is no clear indication on sequential digests. For all practical purposes, they are truly randomized output, despite the sequential input (merely flipping a single bit on each input from the previous). However, can we formally define the avalanche effect? What would be ideal is that with each bit change on the input, every bit in the digest output has as close to a 50% chance of being flipped as theoretically possible.

I'll talk more about "rounds" in future posts when I talk about specific implementations and designs. Suffice it to say that a cryptographic hashing function will iterate through the compression functions a certain number of times, before outputing the state. On each round, the bits in the output should each have a 50% chance of being flipped. So, on each output of each round iteration, close to half of the bits have been flipped in some pseudorandom manner. After a certain number of rounds, the final output should be indistinguishable to true random noise.

So, how about this as a formal definition:

When a single input bit is flipped, each output bit should change with a 50% probability.

Of course, the cryptographic strength doesn't rest solely on the avalanche effect. There are mathematical properties that determine that. But, the output should be completely unpredictable. You could apply the "next bit test", in that there is no algorithm you could produce that would determine the next state of the next bit, without actually compromising the state of the machine (this is a test held to cryptographically secure pseudorandom number generators).

Unfortunately, all we have to test the avalanche effect is standard randomness tests, such as the chi-square distribution, Monte Carlo for Pi, and the Birthday Paradox, among others. This doesn't say anything about the cryptographic strength of the hashing function, but says a lot about randomness properties (non-cryptographic hashing functions can also exhibit strong randomness qualities).

There are a couple software utilities we can use to test and analyze cryptographic hashing functions. First, we have standard randomness tests, such as Dieharder and the FIPS 140-2 suite. But, for something more specific on analyzing cryptographic primitives, I would recommand Cryptol. On the one side, this isn't an out-of-the-box software solution for just running a battery of tests and analysis. It is actually a domain-specific language that will require a bit of a learning curve. On the other hand, it's Free Software, and you'll probably learn more about cryptanalysis with this tool, than just playing with randomness tests.


This was just a primer post to get you thinking about cryptographic hashes, specifically thinking about their output, and the task of finding collisions. The rest of the posts in the series will cover specific functions such as MD5, SHA-1, -2, and -3, as well as some others. We'll talk about hashing constructions, and where you'll find cryptographic functions in practice (I think you'll be surprised). I may even throw in a post or two about random oracles, and how we want cryptographic hashing functions to not only imitate them, but be proven secure under the "Random Oracle Model".

Regardless, this post will get you started, and hopefully excited for what is to come.

Manual Authenticated File Encryption With OpenSSL

One thing that bothers me about OpenSSL is the lack of commandline support for AEAD ciphers, specifically AES in CCM and GCM block modes. Why does this matter? Suppose you want to save an encrypted file to disk, without GnuPG, because you don't want to get into key management. Further, suppose you want to send this data to a recipient or store it on a server outside of your full control. The authenticated encryption is important, otherwise the ciphertext is malleable and vulnerable to bit flipping.

So, when you get to the shell, you may try using AES in GCM mode with OpenSSL's "enc(1)" command, only to be left wanting. Here, we generate a key from /dev/urandom, convert it to hexadecimal, and provide the key as an argument on the command line.

$ LC_CTYPE=C tr -cd 'abcdefghjkmnpqrstuvwxyz23456789-' < /dev/urandom | head -c 20; echo
$ echo sec2tk24ppprcze33ucs | xxd -p
$ openssl enc -aes-256-gcm -k 73656332746b323470707072637a6533337563730a -out file.txt.aes -in file.txt
AEAD ciphers not supported by the enc utility
$ echo $?

So, rather than using GCM, however, we can build the authentication tag manually with HMAC-SHA-512, which OpenSSL does support. This means using a non-authenticated block cipher mode, such as CTR, as a first step, then authenticating the ciphertext manually as a second step.

Using our same password from the previous example, we'll do this in two steps now:

$ openssl enc -aes-256-ctr -k 73656332746b323470707072637a6533337563730a -out file.txt.aes -in file.txt
$ openssl dgst -sha512 -binary -mac HMAC -macopt hexkey:73656332746b323470707072637a6533337563730a -out file.txt.aes.mac file.txt.aes

Now you have three files- your plaintext file, your AES encrypted ciphertext file, and your HMAC-SHA-512 authentication file:

$ ls -l file.txt*
-rw-rw-r--. 1 aaron aaron 1050 Feb 27 10:26 file.txt
-rw-rw-r--. 1 aaron aaron 1066 Feb 27 10:27 file.txt.aes
-rw-rw-r--. 1 aaron aaron   64 Feb 27 10:28 file.txt.aes.mac

When sending or remotely storing the "file.txt.aes" file, you'll want to also make sure the "file.txt.aes.mac" authentication file is accompanied with it. Unfortunately, the OpenSSL dgst(1) command does not support verifying message authentication codes, so you'll have to script this manually. So, you'll need to generate a second file, maybe "file.txt.tmp.mac", then compare the two. If they match, you can decrypt the "file.txt.aes" ciphertext file. If not, discard the data.

This isn't elegant, and I wish enc(1) supported AEAD, but as it stands, it doesn't. So, you'll have to stick with doing things manually. However, this is something simple enough to script, and provides both data confidentiality and authenticity, which should be the goal of every ciphertext.

Digest Algorithms in Google Spreadsheets

I can't imagine there are a lot of uses for using digest algorithms in spreadsheets, but I came up with one, and I really wished I had access to them. Seeing as though most spreadsheet applications don't ship one, I figured I would create my own.

Mostly, I use Google for my document processing and spreadsheet use, and I had a spreadsheet of Louis L'Amour books. My grandfather gave me his entire selection of Louis L'Amour books last year, and I made a goal to read them all during 2016. I grew up listening to them on audiotape in the truck when I was on the road with him laying carpet, heading up to the family cabin in Idaho, and other things, so, I have memories of many of the stories. It will be fun to read them.

So, what do digest algorithms have to do with Louis L'Amour novels? Well, after the spreadsheet was created to track what I've read and what I have left, as well as a pace (I have to be reading at least 60 pages every day), I wanted to start reading the books in a random order. Sure, I'll read the Sackett, Hopalong Cassidy, Talon & Chantry, and Kilkenny series first, but when I'm finished with the series, I want to read the novels in random order. Why? Because I don't want to get caught up in published year watching him change as a writer, or go in alphabetical order, because that's boring. Randomness is exciting!

Now I could have used the =RAND() function in the spreadsheet, but when I sort the columns, the numbers change. So, I need to copy and paste their values, then sort the columns. Besides, is RAND() even cryptographically secure (indistinguishable from true random noise)? Even better, I could just get ASCII data off of /dev/urandom and paste those results into the column, then sort off of that. But that requires using an external tool. However, I could also use a digest algorithm to calculate the digest of the book title, then sort by the digest. Because digest algorithms aren't part of the Google Spreadsheet default functions, my OCD kicked in, and I had to create one.

Here is what I came up with. You'll notice that MD2(), MD5(), and SHA1() are created, even if they're not cryptographically secure for today's modern cryptographic applications. However, in this specific use case, such as sorting columns, they are fine. Also, notice that SHA256(), SHA384(), and SHA512() exist, but not SHA224(). This is because "Utilities.DigestAlgorithm" does not export a "SHA_224" algorithm, which in my opinion, is just odd. MD4 is also not available, nor any of the SHA-3 functions. Regardless, all the digest algorithms supported by the API are available.

// Cryptographic hash functions for use in Google Spreadsheets
// Use with =MD5(string)
// A string or cell (or concatenation of cells) can be provided
// Output in hex
// Released to the public domain

function MD2(s) {
  var hexstr = '';
  var digest = Utilities.computeDigest(Utilities.DigestAlgorithm.MD2, s);
  for (i = 0; i < digest.length; i++) {
    var val = (digest[i]+256) % 256;
    hexstr += ('0'+val.toString(16)).slice(-2);
  return hexstr;
function MD5(s) {
  var hexstr = '';
  var digest = Utilities.computeDigest(Utilities.DigestAlgorithm.MD5, s);
  for (i = 0; i < digest.length; i++) {
    var val = (digest[i]+256) % 256;
    hexstr += ('0'+val.toString(16)).slice(-2);
  return hexstr;
function SHA1(s) {
  var hexstr = '';
  var digest = Utilities.computeDigest(Utilities.DigestAlgorithm.SHA_1, s)
  for (i = 0; i < digest.length; i++) {
    var val = (digest[i]+256) % 256;
    hexstr += ('0'+val.toString(16)).slice(-2);
  return hexstr;
function SHA256(s) {
  var hexstr = '';
  var digest = Utilities.computeDigest(Utilities.DigestAlgorithm.SHA_256, s);
  for (i = 0; i < digest.length; i++) {
    var val = (digest[i]+256) % 256;
    hexstr += ('0'+val.toString(16)).slice(-2);
  return hexstr;
function SHA384(s) {
  var hexstr = '';
  var digest = Utilities.computeDigest(Utilities.DigestAlgorithm.SHA_384, s);
  for (i = 0; i < digest.length; i++) {
    var val = (digest[i]+256) % 256;
    hexstr += ('0'+val.toString(16)).slice(-2);
  return hexstr;
function SHA512(s) {
  var hexstr = '';
  var digest = Utilities.computeDigest(Utilities.DigestAlgorithm.SHA_512, s);
  for (i = 0; i < digest.length; i++) {
    var val = (digest[i]+256) % 256;
    hexstr += ('0'+val.toString(16)).slice(-2);
  return hexstr;

So, how do we apply it? Because this is a script which is applied to your spreadsheet, you need to add it manually every time you create a new spreadsheet document. Supposedly, you can permanently add it through the Chrome Web Store (I created a project that is currently pending review), but for the time being, copying/pasting works.

Navigate to "Tools > Script Editor", remove the default function Google provides, and add the code above. Save it as a project, then it will be available to your spreadsheet. Now you can use it.

For example,

To calculate the MD5 of cell "A1":

To calculate the SHA1 of cells "A1" through "D1":

Retrieve the 12 left-most characters from a SHA512 digest of cells "A1" through "D1":
    =LEFT(SHA512(CONCATENATE(A1:D1)), 12)

I'm sure there are a few uses for digest algorithms in spreadsheets, they're just not very common; at least web searching for their use gives me very scant results. If you find these helpful, even if there are other solutions, I would be interested in how you used them in the comments below.

Oh, and here's my Louis L'Amour spreadsheet tracking my reading progress. 🙂

My Strange Tweets

You may have noticed some tweets from me that look.... strange. Probably something like these:

First, let me provide some background. When Twitter was announced, a couple Free Software developers got together to create a self-hosted Free Software alternative. They called that alternative "Identica", because it was hosted in Canada, and a way to establish your social identity. It made sense, and the Free Software and Open Source ecosystem ate it up. Within no time, it was a thriving online social network, involving mostly those from the Free Software and Open Source world, with all sorts of very influential developers and people creating accounts.

One account that seemed to catch the eye of many was @key. It posted what appeared to be MD5 checksums every 2 hours, regularly and consistently. Plenty of people were following the account, yet it wasn't following anyone. People replied to the tweets, asking what it was posting, who it was, why it was doing what it was doing, if it was a government account, etc. No one could figure it out, and if there were MD5 checksums, no one could reproduce them. It was a social enigma, and it kept people enthralled and engaged.

I thought this was exceptionally creative, and I was quite jealous that I didn't think of it first. The best I figured was that it was posting the timestamp of the tweet with a custom salt. At least, that is what I would have done. It couldn't be an MD5 of random data, otherwise, why not just post the random data? Or is that exactly what it is? So, instead, I decided to play with the Identica API and roll my own, using my own account. I had already setup the "Identica-Twitter bridge", so anything I posted to my Identica account would get posted to Twitter automatically.

But, I have to be different. So rather than a random digest that no one could figure out (I'm sure it's a timestamp), I wanted something a little more transparent. I started with taking the SHA-1 of the Unix epoch (the number of seconds since Jan 1, 1970 00:00.00) at 13:37 local time, because it's leet. This was easily accomplished with a bit of shell code:

$ EPOCH=$(date --date="today 13:37" +%s); printf "$EPOCH: "; printf "$EPOCH" | sha1sum - | cut -d ' ' -f 1

This was the first tweet:

Later however, I wanted something even more creative. I go by the online nick "eightyeight" on IRC, because I play the piano. However, some Asian cultures see the number "8" as lucky. With "Chinese" fortune cookies, I figured I would "encrypt" a fortune at 08:08 local time. Again, I decided to do this with a bit of shell code:

$ fortune -s -n 70 | gzip -c | base64 | rot13 | paste -sd ''

The first tweet to hit that was (testing the API, so this one actually wasn't on 08:08):

However, Identica started going downhill. First, we had big challenges fighting bot spam. Despite repeated bug reports and discussion on the network, very little change was happening in the code to combat the spam (for future reference, just use Hashcash tokens as a proof-of-work for form submissions). Then getting venture capital, and attempting to appeal to the mass market, things started changing. First it rebranded itself as "Status.Net", then we lost threaded replies. The API was no longer Twitter compatible (at least some things were different), and branding got real weird. Then it rebranded itself again under a completely new code rewrite as "", and that is the status today. At this last rebranding, the API was no longer functional, and my scripts stopped. I didn't want to work with the Twitter API, so I didn't bother setting it up again.

It wasn't until some time ago I decided to resurrect my cryptic tweets. However, I made some changes. Instead of using SHA-1, I decided to use RIPEMD-160. Although it hasn't had the mountains of analysis SHA-1 has had, RIPEMD-160 is still considered secure, although with its 160-bit digest size, the security margin might be a bit too slim for some. However, I stuck with the same Unix epoch timestamp automated at 13:37 local time.

Then, after developing my own playing card cipher, and refining it with the help of @timshadel, I decided to actually attempt a legitimate (if still insecure) cipher with Talon. It's still a fortune (BOFH style) and it's still published at 08:08 local time for the same reasons. If you want a crack at decrypting it, check out my playing card cipher repository at There should be a new one every day, but it may be possible that the fortune is 1 character too long, and as a result, it doesn't get posted (I've accounted for this, but I'm sure I've missed something).

What's the point? Nothing more than just a bit of fun. It's probably not something you're interested in seeing on your timeline, and I don't blame you. Granted, there will be one of each every day. If you don't have a busy timeline, I guess it could get a bit old. But, I don't plan on stopping, nor using a separate account.

Checksums, Digital Signatures, and Message Authentication Codes, OH MY!

I recently submitted a bug to the Vim project about its Blowfish encryption not using authentication. Bram Moolenaar, the lead developer of Vim, responded about using checksums and digital signatures. I hope he doesn't mind me using him as an example here, but I want to quote the relevant bits (emphasis mine):

The encryption is meant to avoid other people, who don't have the key, from reading the text. It does not have the goal of protecting manipulation of the text, that is something else. You could add a checksum even when not using encryption. I believe it's called signing.

Unfortunately, Bram is confusing checksums, digital signatures, and message authentication codes, all rolled up into one. I don't blame him. This is a topic that is not well understood by those not intimately familiar with cryptography. In a nutshell, each provide data integrity at the core. Where they differ is whether or not you're using encryption keys, and whether or not those encryption keys are symmetric or asymmetric. So, in this post, I would like to break it down.


Checksums do not require any sort of encryption key. They are simply digests, or "fingerprints" that represent some data. When you download a piece of software from the Internet, there may be a file with an MD5, SHA-1, or SHA-256 hash of the file. This is the software vendor providing a way for you to verify that you got all the correct bits when the download completes.

For example, suppose you wish to download the latest Debian 8.3.0 amd64 ISO from Notice that there are the following files: MD5SUMS, SHA1SUMS, SHA256SUMS, & SHA512SUMS. Part of the SHA256SUMS file looks like this:

$ head SHA256SUMS
1dae8556e57bb04bf380b2dbf64f3e6c61f9c28cbb6518aabae95a003c89739a  debian-8.3.0-amd64-CD-1.iso
89facfbb5039e49d4e3eeff1cca6ab55e9121ff46affeb46ed510c11731acf41  debian-8.3.0-amd64-CD-10.iso
7f6bc807d3636975374b937c2724353f7468ecd7a61e60f2a8b71f92eeefe629  debian-8.3.0-amd64-CD-11.iso
bd99b7c274ea400b50960ab9e46dd23bad76f87574d2ceee1e8e43859fbd045b  debian-8.3.0-amd64-CD-12.iso
e85679304a509593526cffa77ff0d675329565eb4430444ee2c0d2cdd87842a8  debian-8.3.0-amd64-CD-13.iso
69f727bceb0460957bbd5023fe79749c6bf9f0e3a1b89945e6c63c6b3f04f509  debian-8.3.0-amd64-CD-14.iso
d1dab389f8cb794013986d2da8a6dc72c0be8bc932fcc6d7291cb09b418724d5  debian-8.3.0-amd64-CD-15.iso
913b5d89322b500a02f699d44778901cb59aae909f09bff64963115143c2a6ca  debian-8.3.0-amd64-CD-16.iso
0638aca6f59a8f5bec6d1cd4d272cea01758c2b2d6ec1412048ecb78ef684a77  debian-8.3.0-amd64-CD-17.iso
6f17742fbc82828f04da39f66647e958b0ac667cb4d2a40c9888c749680f1eb8  debian-8.3.0-amd64-CD-18.iso

So, when downloading "debian-8.3.0-amd64-CD-1.iso", I can use the sha256sum(1) command to verify the file:

$ sha256sum debian-8.3.0-amd64-CD-1.iso
1dae8556e57bb04bf380b2dbf64f3e6c61f9c28cbb6518aabae95a003c89739a  debian-8.3.0-amd64-CD-1.iso

The digest matches, so the download was successful and all the correct bits exist. Another way would be to download the SHA256SUMS file, and use the "-c" switch for the utility to verify the checksum automatically, rather than you eyeballing it:

$ sha256sum -c SHA256SUMS 
debian-8.3.0-amd64-CD-1.iso: OK

The important thing to understand about checksums, is they are completely and totally anonymous. There is no secret shared between the server where I downloaded the software and myself, and there is no identity attached to the checksum. This means that anyone can change the original file and recalculate the checksum. So, if transferring data over the Internet, there is nothing preventing a man-in-the-middle attack from replacing the bits you're downloading with something else, while also replacing the checksum.

In other words, checksums provide data integrity, but they do not offer any sort of authentication. However, there are a number of checksum hashing functions, both cryptographically secure and not, such as CRC, MurmurHash, MD5, SHA-1, SHA-2, SHA-3, and so forth. For non-authenticated data integrity, a cryptographically secure hash function isn't always desirable, which is why non-cryptographic hash functions exist.

Digital Signatures

Digital signatures are a form of checksum, in that they provide data integrity, but they require asymmetric encryption to also provide authenticity. Digital signatures are away to attach an identity to the checksum. This implies a level of trust between you and the 3rd party, such as Debian with our example above. If you have met with the 3rd party, or dealt with them enough to establish some level of trust, then you can install the 3rd party's public key into your system. Then, when they provide data attached with a digital signature, you can verify that the data did in fact come from the 3rd party, and no other source.

Notice that a man-in-the-middle attack is no longer valid here, if I already have the 3rd party's public key installed on my system. Going back to our example with Debian, I have the Debian signing public key already installed. So, I can now download the MD5SUMS.sign, SHA1SUMS.sign, SHA256SUMS.sign, or SHA512SUMS.sign file, along with the checksums file I already downloaded, and verify that the checksums are those intended by Debian:

$ gpg --verify SHA256SUMS.sign                   
gpg: assuming signed data in `SHA256SUMS'
gpg: Signature made Sun 24 Jan 2016 11:08:33 AM MST using RSA key ID 6294BE9B
gpg: Good signature from "Debian CD signing key <>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: DF9B 9C49 EAA9 2984 3258  9D76 DA87 E80D 6294 BE9B

If we look at the contents of the SHA256SUMS.sign file, we get the following:

Version: GnuPG v1


The details of a PGP signature aren't important for this blog post. Suffice it to say that it requires the signer's private key to produce the signature, and the signer's public key to verify it. The sender will hash the message with their private key, and append the signature to the message. When the recipient receives the message, they will separate the signature from the message, hash the message, and verify that the two signatures match using the sender's public key. If they do match, the recipient knows that only the sender signed the data, and no one else.

Because the private key is required to create the signature, and because only the 3rd party should have access to the private key, this means that a man-in-the-middle attack is no longer effective. A miscreant should not be able remove the signature, apply a new signature, and have the recipient still verify the signature as "good" from Debian, unless that miscreant also had access to Debian's private signing key.

To make an example of an existing software vendor, Arch Linux was under heat about this. A core developer strongly disagreed about digitally signed packages. They would provide their software from their repositories with MD5 checksums only. The packages were not digitally signed.

So, when your local Arch Linux installation would request packages from the Arch Linux software repository, unless served over HTTPS, a man-in-the-middle could interject their own bits with their own MD5 checksum. Your pacman(8) package manager would verify that the MD5 is valid, and proceed to install the software with root privileges, because that is what you told it to do. By also digitally signing the package with an Arch Linux signing key, this attack is no longer possible.

Eventually, Arch Linux fixed the vulnerability, and closed a very large security hole, by digitally signing their packages.

As with checksums, digital signatures really should be using a cryptographically secure hashing function as part of the protocol. This can include RIPEMD160, SHA-2, SHA-3, BLAKE2, Skein, and others. MD5 and SHA-1 are no longer considered cryptographically secure, and should not be used with digital signatures (thus why SHA-256 SSL certificates instead of SHA-1).

Message Authentication Codes

Finally, message authentication codes (also called "MAC tags") are another way to provide data integrity with authentication, but this time using symmetric encryption. Where digital signatures imply a physical identity behind the authentication, MAC tags provide anonymous authentication. Generally, symmetric keys don't have identities associated with them. They're usually short-lived and shared via complex key exchanges, such as the Diffie-Hellman key exchange.

A MAC tag is keyed, meaning a shared secret is used when calculating the digest. There are a number of different implementations of MACs, such as CBC-MAC, HMAC, UMAC, & Poly1305, among others. The differences between each of those isn't important for this post. What is important, is how they are calculated, and how they are used with encryption.

The sender of some message will apply a cryptographic hashing function to the message, and append (or prepend) the resulting digest to the message, and send the full payload off. Because both the ciphertext and the MAC were calculated with a shared secret key, a man-in-the-middle cannot strip the MAC tag and apply their own without knowing the shared secret. Because, when the recipient receives the payload, they will strip off the MAC tag, rehash the message with the same keyed hashing function, and see if the two MAC tags match. If they do match, the message can be acted upon. If they do not match, something happened to the data in transit, and the payload can be safely ignored.

There are three main ways to apply MAC tags to messages: encrypt-then-MAC, MAC-then-encrypt, and encrypt-and-MAC. The first, encrypt-then-MAC, is considered "best practice" for message authentication. First the message is encrypted, then the ciphertext is authenticated, and the resulting MAC is appended to the ciphertext. This provides both ciphertext and plaintext integrity. The big advantage to this approach, is that if the MAC tag does not match the newly calculated MAC tag during verification, the ciphertext does not need to be decrypted. This is the default approach with IPsec and modern versions of OpenSSH. RFC 7366 standardizes this for TLS (yet to be implemented by OpenSSL last I checked). Also an ISO/IEC 19772:2009 standard.

Encrypt-then-MAC flowchart.

The next approach, MAC-then-encrypt, means authenticating the plaintext, appending the resulting MAC tag to the plaintext, and then encrypting the full plaintext and MAC tag payload. While this approach offers plaintext data integrity, it does not offer ciphertext integrity. As such, the ciphertext must be decrypted before the MAC tag can be verified. This is the default behavior in older versions of OpenSSH.

MAC-the-encrypt flowchart

Finally, encrypt-and-MAC, means authenticating the plaintext first, then encrypting the plaintext. The resulting MAC tag is appended to the ciphertext. Again, like MAC-then-encrypt, this approach offers plaintext data integrity, but it does not offer ciphertext integrity. So, you must detach the MAC tag first, then decrypt the ciphertext, then verify if the MAC tag is valid. This is the default behavior with OpenSSL.


As I understand it, there are no known vulnerabilities with MAC-then-encrypt and encrypt-and-MAC MACs. However, by having both ciphertext and plaintext integrity with encrypt-then-MAC, as well as not needed to decrypt the ciphertext on failure, is why encrypt-then-MAC is the preferred way to handle message authentication.

As with digital signatures, MACs should be calculated with cryptographically secure hashing functions, such as RIPEMD160, SHA-2, SHA-3, BLAKE2, Skein, etc. MD5 and SHA-1 would not qualify (although we could get into a discussion about HMAC-MD5 and HMAC-SHA1, but we won't).


No doubt, it's confusing to separate checksums from digital signatures from message authentication codes. Things get even a bit more hairy with blind signatures (used primarily in digital currencies) and Merkle trees (used primarily in peer-to-peer networks and copy-on-write filesystems), but they're special cases of the primary three functions discussed above. However, if you can get checksums, digital signatures, and message authentication codes cleared up, then you're that much closer to implementing cryptographic protocols correctly.

Bitcoin Mining Rate and Waste

Recently, the Bitcoin mining rate surpassed 1 exahash per second, or 1 quintillion SHA-256 hashes per second.

Bitcoin mining graph showing gigahashes per second over time.

If we do some quick math, we can determine the following:

  • If SHA-1 collisions can be found in 2^65.3 hashes, that's one SHA-1 collision found every 45 seconds.
  • Every combination of bits can be flipped in an 84-bit keyspace every year.
  • If mining is done strictly with ASICs and each ASIC can produce 1 trillion hashes per second, that's 1,000,000 ASICs.
  • If each ASIC above consumes 650 Watts of power, that's 650 Megawatts of power consumed.
  • At 650,000 kWh per ASIC, that's 1.3 million pounds of CO2 released into the atmosphere every hour if using fossil fuels.
  • Current global rate is about 160 Bitcoins mined per hour.
  • At $0.15 USD per kWh, that's $609 spent on electricity per Bitcoin mined. Bitcoin is currently trading at $376/BTC.

That's "back of the envelope" calculations, with some big assumptions made about the mining operation (how it's powered, who is powering it, etc.).

Of course, not all mining is using fossil fuels, not all miners are using ASICs, not all ASICs can do 1 trillion hashes per second (some more, some less), not all ASICs are consuming that wattage per rate, and the cost of electricity was strictly a U.S. figure. Of course, if you're using a GPU, or worse, a CPU for your mining, then you expending more electricity per the rate than ASICs are. That may help balance out some of the miners who are using renewable energy, such as solar power for their mining. Many Chinese and Russian mining data centers certainly have less overhead costs on electricity. You get the point- we've made some big assumptions, to come to some very rough "ballpark" figures. I don't think we're too far off.

So, according to those numbers, unless using renewable energy, cheaper electricity, or Bitcoin trading goes north of $610 USB/BTC mining for Bitcoin is a net loss. This comes at the expense of 1.3 million pounds of CO2 released into the atmosphere every hour. I would argue that Bitcoin is the worst idea to come out of Computer Science in the history of mankind.

Using Your Monitors As A Cryptographically Secure Pseudorandom Number Generator

File this under the "I'm bored and have nothing better to do" category. While coming into work this morning, I was curious if I could use my monitors as a cryptographically secure pseudorandom number generator (CSPRNG). I don't know what use this would have, if any, as your GNU/Linux operating system already ships a CSPRNG with /dev/urandom. So, in reality, there is really no need to write a userspace CSPRNG. But what the hell, let's give it a try anywho.

The "cryptographically secure" piece of this will come from the SHA-512 function. Basically, the idea is this:

  • Take a screenshot of your monitors.
  • Take the SHA-512 of that screenshot.
  • Resize the screenshot to 10% it's original size.
  • Take the SHA-512 of that resized file.
  • Take the SHA-512 of your previous two SHA-512 digests.
  • Take the last n-bits of that final digest as your random number.

Most GNU/Linux systems come with ImageMagick pre-installed, as well as the "sha512sum(1)" function. So thankfully, we won't need to install any software. So, here's a simple shell script that can achieve our goals:

# Produces random numbers in the range of [0, 65535].
# Not licensed. Released to the public domain.

cd ~/Private # assuming you have an encrypted filesystem mounted here

TS1=$(date +%Y%m%d%H%M%S%N)
import -window root ${TS1}.png
sha512sum ${TS1}.png > /tmp/SHA512SUMS

TS2=$(date +%Y%m%d%H%M%S%N)
convert -scale 10% ${TS1}.png ${TS2}.png
sha512sum ${TS2}.png >> /tmp/SHA512SUMS

DIGEST=$(sha512sum /tmp/SHA512SUMS)
printf "%d\n" 0x$(printf "$DIGEST" | awk '{print substr($1, 125, 128)}')

shred ${TS1}.png ${TS2}.png
rm ${TS1}.png ${TS2}.png
shred /tmp/SHA512SUMS
rm /tmp/SHA512SUMS

Running it for 10 random numbers:

$ for i in {1..10}; do sh; done

A couple things to note:

  • This is slow, due to taking the screenshot, and resizing it.
  • The data on your monitors should be sufficiently random. Chats, social updates, etc. The security of this will depend entirely on the entropy of the initial screenshot.
  • You really should be saving your screenshots to an encrypted filesystem, such as eCryptfs.
  • We're using timestamps with nanosecond accuracy to provide some additional entropy for the final SHA-512 digest.
  • This is using the last 4 hexadecimal characters to be converted to decimal. In reality, it could be anything, including some convoluted dynamic search algorithm in the string.

It's worth noting that the entropy of the initial screenshot is critical, which is actually difficult to accurately measure. So, it may help to have a chat window or more open, with recent chat logs. Same could be said for social update "walls", with the most recent updates (Twitter, Facebook, Goodreads, etc.). Having a clock with seconds ticking in a status bar can also help (although not unpredictable, at least semi-unique). Tabs in browsers, running applications, etc. The more unpredictable your workspace in the screenshot, the better off you'll be. But, people in general suck at randomness, so I'm not advocating this as something you should rely on for a cryptographically secure random number generator.

If you wanted, you could add this to a terminal, giving you a sort of "disco rave" before taking the screenshot:

# Disco lights in your terminal
# No license. Released to the public domain

while true; do
    printf "\e[38;5;$(($(od -d -N 2 -A n /dev/urandom)%$(tput colors)))m•\e[0m"

Get that running first, then take your screenshot. But then, if you're reading data off of /dev/urandom, you might as well do that for your random numbers anyway...

Disable Pocket From Iceweasel

I'm not sure who I should be more disappointed in- Mozilla or Debian. Iceweasel 43 recently arrived in Debian unstable, and with it, Pocket. For those who are not familiar, Pocket is a 3rd party service that allows users to save sites they want to read or visit for later. Provided the extension is installed, this allows users to sync pages they want to read for later, across devices and platforms.

But here's the catch: it's a proprietary non-free service-as-a-software-substitue (SaSS).

Thankfully, you can disable it, and it really isn't that difficult. Open up about:config in a new tab, and type "pocket" into the search filter. From there, set "browser.pocket.api" and "" to "localhost", and set "browser.pocket.enabled" to "false", then restart your browser.

Screenshot showing about:config with the above settings in place.

It really bothers me that Mozilla has enabled this sort of integration into their browser. Not only Pocket, but other proprietary or privacy invasive plugins and extensions also, such as "sponsored tiles" (which is finally removed), "encrypted media extensions", and "Hello" (which I haven't figured out how to disable). These sorts of things should be separate extensions or plugins that the user can install at their whim. Shipping it by default takes away freedom and choice, and it's turning the browser into a proprietary non-free software application.

What ultimately bothers me about this, is that Mozilla already has bookmark synchronization support, and their sync server is Free Software, allowing you to roll your own. Pocket doesn't offer anything that Mozilla Sync doesn't. I already have a "TOREAD" bookmark folder, where I can put pages I want to read later. And it's synched across all of my devices.

Mozilla pushing the 3rd party proprietary Pocket, and Debian shipping it in Iceweasel (thankfully, a bug is submitted) is a great disservice to users and a threat to software freedom.

Hopefully, Pocket goes the way of sponsored tiles, and gets removed.

Encrypted Account Passwords with Vim and GnuPG

Screenshot of terminal showing hidden password with Vim syntax highlighting


I've been a long-time KeepassX user, and to be honest, I don't see that changing any time soon. I currently have my password database on an SSH-accessible server, of which I use kpcli as the main client for accessing the db. I use Keepass2Android with SFTP on my phone to get read-only access to the db, and I use sshfs mounts on my workstations with KeepassX for read-only GUI access. It works great, and allows me to securely access my password databases from any client, mobile or otherwise.

However, I recently stumbled on this post on how to use Vim with GnuPG to create an encrypted file of passwords: I've heard about a GnuPG plugin for Vim for years now, and know friends that use it. I've even recommended that others use it as a simplistic means of keeping an encrypted password database, instead of relying on 3rd-party tools. However, I've never really used it myself. Well, after reading that post, I decided to give it a try.

Defining a specification

Ultimately, everything in that post I'm carrying over here, with only a couple modifications. First, fields should end with a colon, which include the comments. Comments could just be just a single line, or multi-line, but it's still a field just as much as "user" or "pass". Further, there should be a little flexibility in the field keywords, such as "user" or "username". Additionally, because I exported my Keepass db to an XML file, then used a Python script to convert it into this syntax, I also carried over some additional fields. So, I've defined my database with the following possible fields:

  • comment|comments
  • expire|expires
  • pass|password
  • tag|tags
  • type
  • url
  • user|username

Notice that I did not define a "title" as would be the case in the Keepass XML. The entry itself is the title, so I find this redundant. Also, you'll noticed I defined an additional "type" field. While not explicitly defined in the Keepass XML, it is implicitly defined with icons for entries. This could be useful for defining "ssh" vs "mysql" vs "ldap" vs "http" authentications when doing searching in the file.

So, an invalid example on is:

Super Ecommerce{{{
    user:   foobar
    pass:   g0d
        birthday:   1/1/1911
        first car:  delorean

This is invalid due to the "Comments" field. Fixed would be:

Super Ecommerce{{{
    user:   foobar
    pass:   g0d
        birthday:   1/1/1911
        first car:  delorean

Another valid entry could be:

Example {{{
    username: aarontoponce
    password: toomanysecrets
    type: http
    tags: internet,social,2fa
    comments: {{{
        backup codes: vbrd83ezn2rjeyj, p89r4zdpjmyys2k, rdh6e7ubz8vh82g, er4ug6vp25xsgn5
        2fa-key: "3udw mkmm uszh cw2a 5agm 7c3p 5x32 tyqz"

Notice that I have not defined file comments, such as those found in configuration files or source code. There is a comment section per entry, so that seems to be the fitting place for any and all comments.

I really liked the post, and how thought out the whole thing was, including automatically closing the PGP file after an inactivity timeout, automatically folding entries to prevent shoulder surfing, and clearing the clipboard when Vim closes. However, one oversight that bothered me, was not concealing the actual password when the entry is expanded. Thankfully, Vim supports syntax highlighting. So, we just need to define a filetype for GnuPG encrypted accounts, and define syntax rules.

Vim syntax highlighting

EDITED TO ADD: I tried getting the Vim syntax working in this post, but WordPress is clobbering it. So, you'll need to get it from pastebin instead. Sorry.

To get this working, we need a syntax file. I don't know if one exists already for this syntax structure, but it isn't too difficult to define one. Let's look at what I've defined in this pastebin, then I'll go over it line-by-line.

The first four lines in the syntax file define are just comments. Next is just a simple if-statement checking if syntax highlighting is enabled. If so, use it. The first interesting line is the following:

let b:current_syntax = "gpgpass"

This defines our syntax. Whenever we load a file with syntax highlighting enabled, and we set the "filetype" to "gpgpass", this syntax will be applied.

syntax case ignore

This just allows us to have "comment" or "Comment" or "COMMENT" or any variations on the letter case, while still matching and proving a highlight for the match.

After that, we get into the meat of it. This "syntax match" section allows me to conceal the passwords on the terminal to prevent shoulder surfing, even when the entry is expanded. this is done with setting the background terminal color to "red" and the foreground text color also to "red". Thus, we have red text on a red background. The text is still yankable and copyable, even with the mouse cursor, it's just not visible on screen.

The actual concealment is done with the regular expression. An atom is created to match "pass:" or "password:" surrounded by whitespace as the first word on the line. However, I don't want to conceal the actual text "pass:", just the password itself. So, the regular expression "\@<=" says to ignore our atom in the match, and only match "\S\+" for concealing. The concealment is achieved with red foreground text on a red background with:

highlight gpgpassPasswords ctermbg=red ctermfg=red

The rest of the syntax matching in that pastebin is for identifying our fields, and highlighting them as a "Keyword" using regular expressions. All field names will be highlighted the same color based on your colorscheme, as they are all defined the same. Thus, aside from the hidden password, there is uniformity and elegance in the presentation of the syntax.

Using the syntax in Vim

This syntax file won't do us much good if it isn't installed and Vim isn't configured to use it. We could save it system-wide to "/usr/share/vim/vim74/syntax/gpgpass.vim", or just keep it in our home directory at "~/.vim/syntax/gpgpass.vim". Whatever works.

Now that the syntax file is installed, we need to call it when editing or viewing GnuPG password files. We can use the vimrc from, with one addition- we're going to add "set filetype=gpgpass" under the "SetGPGOptions()" function. Now, I understand that you may edit encrypted files that are not GnuPG password files. So, you're going to get syntax highlighting in those cases. Or, you could enable the modeline and set a modeline in the password file. The problem with the modeline, is its long history of vulnerabilities. Most distributions, including Debain, disable it, and for good reason too. So, I'd rather have it set here, and unset the "filetype" if it's bothering me.

Here's the relevant config:

if has("autocmd")
    " GnuPG Extensions "

    " Tell the GnuPG plugin to armor new files.
    let g:GPGPreferArmor=1

    " Tell the GnuPG plugin to sign new files.
    let g:GPGPreferSign=1

    augroup GnuPGExtra
    " Set extra file options.
	autocmd BufReadCmd,FileReadCmd *.\(gpg\|asc\|pgp\) call SetGPGOptions()
    " Automatically close unmodified files after inactivity.
	autocmd CursorHold *.\(gpg\|asc\|pgp\) quit
    augroup END

    function SetGPGOptions()
    " Set the filetype for syntax highlighting.
	set filetype=gpgpass
    " Set updatetime to 1 minute.
	set updatetime=60000
    " Fold at markers.
	set foldmethod=marker
    " Automatically close all folds.
	set foldclose=all
    " Only open folds with insert commands.
	set foldopen=insert
endif " has ("autocmd")


What I like about this setup is the portability and simplicity. I am in a terminal on a GNU/Linux box most of my waking hours. It makes sense to use tools that I already enjoy, without needing to rely on 3rd party tools. This also closes the gap of potential bugs with 3rd party password managers leaking my passwords. I'm not saying that Vim and GnuPG won't be vulnerable, of course, but I do place more trust in these tools than the Keepass ones, to be honest.

As of right now, however, I am still a Keepass user. But, I wanted to put this together, and try it out for size, and see how the shoe fits. As such, I've exported my KeepassX database, encrypted it with GnuPG, configured Vim, and I'm off to the races. I'll give this a go for a few months, and see how I like it. I know it's going to pose issues for mp on my phone, even with ConnectBot and SSH keys. But, maybe I don't need it on my phone anyway. Time will tell.

Oh, and I can still view the database as read-only and still enjoy the syntax highlighting benefits by using "view /path/to/passwords.gpg" instead of "vim /path/to/passwords.gpg".

Multiple Encryption

I hang out in ##crypto in Freenode, and every now and then, someone will ask about the security of multiple encryption, usually with the context that AES could be broken in the near future. When talking about multiple encryption, they are usually referring to cascade encryption which has the form of:

CT = Alg_B(Alg_A(M, key_A), key_B)

The discussion revolves around the differences between "Alg_A" and "Alg_B". Such as using AES for "Alg_A" and Camellia for "Alg_B". Also, the discussion will include whether or not "key_A" and "key_B" should be the same key, or different.

Cascade encryption is more efficient in storage space than some alternatives, such as this one suggested by Bruce Schneier:

CT = Alg_A(OTP, key_A) || Alg_B(XOR(M, OTP), key_B),  where OTP is a true one-time pad

I'm not going to go into the theoretical concerns with multiple encryption. However, I would like to cover some practical considerations:

  1. Multiple key security.
  2. Long-term storage.
  3. Complexity.
  4. Host security.

Multiple key security

It should come as no surprise that when dealing with multiple encryption, that you are going to be dealing with multiple keys, if you choose to keep "key_A" and "key_B" separate. Probably the most difficult aspect of encryption implementations, is keeping the secret key secret. For example, key exchanges between machines over the scary Internet has been notoriously difficult to get correct. Current best practice is implementing authenticated ephemeral elliptic-curve Diffie-Hellman (ECDHE) when communicating secret symmetric keys between machines. So, not only do you need to communicate one key, but multiple keys when encrypting and decrypting data.

If the multiple-encrypted data is to be stored on disk, then keys will need to be retrieved for later. How are these stored? This isn't an easy question to answer. If you store them in a password manager, they are likely just getting single-encrypted, probably with AES. So, the security of your ciphertext rests on the security of your stored keys, likely protected by the very algorithm you are trying to safe-guard.

Now, you could use the same key for every encryption layer. But, this poses a theoretical concern (which I promised I wouldn't cover- sorry). If the same key is used for every layer, then if an attacker can recover the key through cryptanalysis of the first encryption layer, then the attacker could possibly decrypt all remaining layers. Obviously, you don't want to use ciphers where the decryption process is exactly the same as the encryption process. Otherwise, the second encryption process on the ciphertext would decrypt the first encryption! While not probable, this last scenario could even occur with different algorithms, such as AES and Camellia. So, it seems at least at a cursory glance, that using the same key for all encryption layers probably is not a wise idea. So, we're back to key management, which is the bane of cryptographers everywhere.

Long-term storage

In my opinion, a larger problem is that of storage. It's one thing to get multiple encryption correct and safe on the wire, it's another to place value on long-term data storage. Think about it for a second- what is the longest you have kept data on the same drive? In personal scenarios, I have some friends that have had personal backups for up to five years. To me, this is impressive. It's likely more common that data switches drives every couple of years. RAID arrays die, hardware is replaced, higher drive capacity is demanded, or even bit rot creeps in, destroying data (such as on magnetic or optical mediums). When push comes to shove, the encrypted data is just going to move from drive-to-drive. But, ask yourself this next question- what is the oldest data you have in your possession right now?

Let's be realistic here for a second. I would be hard-pressed to find data I stored back in 2000, 15 years ago. I could find some photos in photo albums, on mugs, and on Christmas cards, but I'm not 100% confident I could get the digital original. Despite my best efforts, accidents happen, mistakes are made, and data is just lost. I don't think I'm alone here. I've even worked for companies, with large budgets, that had a hard time recovering data that is 10+ years old. For one, it's expensive to hold on to data indefinitely, but a great amount of data also becomes less and less valuable as time progresses. Yes, I still use my same email account from 2004- Google has done a great job of keeping all of my emails these past 11 years, and I would expect other data service providers to do the same. But, how many of you have kept an email address for 10+ years? Or even the data, for that matter? (This blog is actually 11 years old as well- kudos to me on keeping the data going this long).

My point is, hardware fails and is changed. Your personal value on data also changes, and accidents happen. So you're concerned about AES being broken in 20 years, or even sooner. Do you think by that time you'll still place value on that encrypted data? Do you think you'll even still have access to it, or can find it? And, if so, will it really be that difficult to decrypt the AES data, and encrypt it with the current best practice encryption algorithm?


This is probably the problem you should be concerned with the most. As a collective group, we as developers have a hard time getting single encryption correct, let alone multiple encryption. This deeply enters the theoretical realm, which I promised I wouldn't blog about. But, you do have a practical concern as well- order of operations and correct implementations.

First, order of operations. It's one thing to do "double encryption", where only two algorithms are chosen and used. If you can't recall if you used AES first or second, it's a 50/50 shot at getting the order correct (provided you know which key belongs to which algorithm, otherwise it's a one-in-four chance). Imagine however, using three encryption layers, and lining the keys up correctly. Imagine the complexity of four layers, or more. Ugh. Seems like you certainly don't want to go higher than two layers.

Second, look at implementations. AES is AES. It shouldn't matter what algorithm does the calculations. But, implementations like to put "magic bytes" at the beginning of ciphertexts (OpenSSL, OpenPGP, etc.). This data is only valuable for that implementation, and even worst, for a specific subset of versions. Just imagine encrypting a file with OpenSSL version 1.0 now, and needing to decrypt it in 10 years. Will OpenSSL version X be able to read those magic bytes, and correctly decrypt the file? Or will it error out, unable to decrypt the data because the data structure of the magic bytes changed in that 10 year time frame?

So, it seems best to encrypt it with some programming language library, where you can control exactly what data is stored. But, as everyone will tell you while frothing at the mouth, "don't roll your own crypto". Technically, you aren't if you "import aes" and use the "aes" module provided by that language correctly. It just remains to be seen if you implemented it correctly to thwart an attacker. Crypto is hard and full of sharp edges. It's very difficult to get things right, without getting cut. Regardless, while the "aes" module might be available in 10 years, what about the "camellia" module, or whatever algorithm you chose for the second layer? Is it still in development, or was it abandoned due to either being broken, or lack of development? Can you find that module, so you can decrypt your data?

Host Security

In a more practical real-world, everyday person scenario, how secure is the host that is doing the multiple encryption? Do others have physical access to the machine? Is it free of viruses, malware, and other badware? Does the system run an encrypted filesystem? Where and how are backups stored? Who has access to those backups? So many more questions can be asked that judge the quality of the security level of the host storing or processing the data.

Viruses and malware would probably be my number one concern if the data was so valuable, as to be multiple-encrypted. So, I would probably encrypt the plaintext on one machine, encrypt the ciphertext on a second machine, and store it on a third machine, preferably air-gapped. Thus, if a virus exists on one machine, hopefully it doesn't exist on another, and hopefully it doesn't attach itself to my encrypted data, and hopefully the badware didn't report my plaintext to a botnet pre-encryption.

Physical host security is hard. People have crappy passwords protecting their workstations. Physical access can get the attacker root regardless. Systems are infected with badware all the time, just by visiting websites! So there is hardly a guarantee that your data is safe, even though it was encrypted multiple times with different keys and algorithms.

A Couple Thoughts

It hardly seems worth the effort to encrypt your data multiple times with different algorithms and different keys, provided the overhead necessary in managing everything (hardware and software). Further, in reality, modern encryption algorithms aren't usually broken. For example, DES as an algorithm, isn't broken- it just requires a small key space. So, encrypting your data multiple times is solving a problem that for the most part, just doesn't exist.

That's not to say that AES will remain secure in 10, 20, or 40 years. I'm not that naive. But, as a user, you do have the ability to switch algorithms when AES does break. So, decrypt your AES ciphertext, and encrypt it with SevenFish (sorry Bruce- bad joke). Keep it encrypted with SevenFish until that breaks, and then decrypt it, and encrypt it with whatever the new modern cipher is at the time (if you still have the data, it's still valuable to you, and all implementations can still work with the ciphertext).


In my opinion, don't worry about multiple encryption. Generate a GnuPG key pair, encrypt your data once, and be done with it.

Getting Root On The Nexus 6 With Android 6

This probably the 40th millionth time, since owning this phone, that I've needed to root my device. Because I keep doing it over and over, while also referring to past commands and notes, it's high time I blogged the steps. If I can benefit myself from my own blog post, then chances are someone else can. So, with that said, here's what we're going to do:

  1. Grab the latest Nexus factory images from Google.
  2. Update the phone by flashing all the images (without wiping user data).
  3. Flash the recovery with the latest TWRP image.
  4. Get root on the device with Chainfire's "system-less root" SuperSU package.
  5. Enable USB tethering and the wireless hotspot functionality.

Before beginning, I should mention that if the title isn't immediately clear, this post is specific to the Motorola Nexus 6, which is the phone I currently own. It's probably generic enough, however, to be applied to a few Nexus devices. Minus getting the factory Nexus images from Google, this might even be generic enough for non-Nexus devices, but you're on your own there. Proceed at your own risk. With that said, it's fairly hard to brick an Android phone these days.

Also, you need to make sure you have an unlocked bootloader. Google ships with the bootloader locked by default. Unlocking it, will wipe your user partition, meaning you will lose any and all user data (images, videos, text messages, application data, etc.). I'm going to assume that you've already unlocked the bootloader, and are ready to proceed.


If you don't want to read the post, and know what you're doing, here's the short of it:

$ tar -xf shamu-mmb29k-factory-9a76896b.tgz
$ cd shamu-mmb29k
$ adb reboot bootloader
$ fastboot flash bootloader bootloader-shamu-moto-apq8084-71.15.img
$ fastboot reboot-bootloader
$ fastboot flash radio radio-shamu-d4.01-9625-05.32+fsg-9625-02.109.img
$ fastboot reboot-bootloader
$ fastboot update
$ fastboot flash recovery twrp-
$ fastboot reboot recovery
(reboot normally)
$ adb push /sdcard/
$ adb reboot recovery
(install /sdcard/ from TWRP)
(do not install TWRP root)
(reboot normally)
(install build.prop editor from Google Play)
(set "net.tethering.noprovisioning" to "true")

Otherwise ...

Getting the Google Nexus factory images

Navigate to and grab the version you are looking for. For example, I recently wanted to flash 6.0.1, so I grabbed the "MMB29K" image. Before flashing, I find it critical to verify the checksums. They are "27dde1258ccbcbdd3451d7751ab0259d" for MD5 and "9a76896bed0a0145dc71ff14c55f0a590b83525d" for SHA-1. So, after downloading, I pulled up a terminal, and verified them:

$ md5sum shamu-mmb29k-factory-9a76896b.tgz
27dde1258ccbcbdd3451d7751ab0259d  shamu-mmb29k-factory-9a76896b.tgz
$ sha1sum shamu-mmb29k-factory-9a76896b.tgz 
9a76896bed0a0145dc71ff14c55f0a590b83525d  shamu-mmb29k-factory-9a76896b.tgz

After examination, it's clear these checksums match, so I'm ready to flash.

Flashing the images

This step does not require root on your device. I'll need to connect my phone to my computer via USB, and verify that I can talk to it via adb(1). This means installing the Debian "android-tools-adb" and "android-tools-fastboot" packages if they're not already. After installed, I should be able to verify that I can talk to the phone:

$ sudo apt-get install android-tools-adb android-tools-fastboot
$ adb devices
List of devices attached 
[serial number]      device

If your device is visible, we are ready to rock-n-roll. First, extract the tarball, and enter the directory:

$ tar -xf shamu-mmb29k-factory-9a76896b.tgz
$ cd shamu-mmb29k
$ ls -lh
total 2.3G
-rw-r--r-- 1 atoponce atoponce  124 Jan  1  2009 android-info.txt
-rw-r--r-- 1 atoponce atoponce 8.1M Jan  1  2009 boot.img
-rw-r----- 1 atoponce atoponce  11M Nov 18 16:59 bootloader-shamu-moto-apq8084-71.15.img
-rw-r--r-- 1 atoponce atoponce 6.2M Jan  1  2009 cache.img
-rw-r----- 1 atoponce atoponce  985 Nov 18 16:59 flash-all.bat
-rwxr-x--x 1 atoponce atoponce  856 Nov 18 16:59*
-rwxr-x--x 1 atoponce atoponce  814 Nov 18 16:59*
-rw-r----- 1 atoponce atoponce 964M Nov 18 16:59
-rw-r----- 1 atoponce atoponce 113M Nov 18 16:59 radio-shamu-d4.01-9625-05.32+fsg-9625-02.109.img
-rw-r--r-- 1 atoponce atoponce 8.8M Jan  1  2009 recovery.img
-rw-r--r-- 1 atoponce atoponce 2.0G Jan  1  2009 system.img
-rw-r--r-- 1 atoponce atoponce 136M Jan  1  2009 userdata.img

Notice a couple of things- first, there are shell scripts "" and "" for Unix-like systems. Also, notice the "bootloader-shamu-moto-apq8084-71.15.img" & "radio-shamu-d4.01-9625-05.32+fsg-9625-02.109.img" raw images, as well as the "". These are the only files we're going to concern ourselves with when flashing the phone.

However, we want to be careful that we don't flash "userdata.img". This will format your user partition and all user data will be wiped (see above). What we're going to do, is basically the same execution as the "" shell script. However, we're going to make just one small modification. Further, we need our phone already booted into the bootloader. As such, here's what we're going to do:

$ adb reboot bootloader
$ fastboot flash bootloader bootloader-shamu-moto-apq8084-71.15.img
$ fastboot reboot-bootloader
$ fastboot flash radio radio-shamu-d4.01-9625-05.32+fsg-9625-02.109.img
$ fastboot reboot-bootloader
$ fastboot update

Notice that I removed -w from that last command (if you looked in the "" shell script). That option wipes user data, which would be necessary if we wanted to return the phone back to factory state. We don't- we're just upgrading. Also, I don't see the need for "sleep 5". Just wait for the phone to successfully reboot before running the next command.

At this point, the phone is successfully updated. If you were to reboot the phone, it would be perfectly operational as if you did an OTA update, or purchased it from the store. However, we want root, so we have a few more steps to accomplish.

Getting and flashing TWRP

This step also does not require root on your phone. I prefer TWRP for my recovery on Android. It's touch-based, which sets the UI apart from the other recoveries, and it's Free Software, unlike ClockworkMod. Both of these are big wins for me. Grab the latest image at I downloaded twrp- Unfortunately, I couldn't find any checksums to check to verify the download. So, I installed it anyway, knowing I could flash the stock "recovery.img" if something goes wrong. So far, things have been great, so I calculated the checksums for you:

$ md5sum twrp- 
f040c3a26f71dfce2f04339f62e162b8  twrp-
$ sha1sum twrp-
40017e584879fad2be4043c397067fe4d2d76c88  twrp-
$ sha256sum twrp-
ebe5af833e8b626e478b11feb99a566445d5686671dcbade17fe39c5ce8517c7  twrp-

If those checkout, you should be safe in flashing. Currently, the phone should already be booted into the bootloader. If not, make sure it is. Once in the bootloader, we can flash TWRP then reboot normally:

$ fastboot flash recovery twrp-

Now, it's critical that we don't normally reboot the phone. If we do, recovery will be overwritten, and we'll have to reflash. So, while your phone is still booted into the bootloader, reboot it into recovery. You can do this by pressing the volume up/down arrows, until rebooting into recovery is available, and pressing the power button. This should boot you into TWRP. Now that you're there, you can reboot the phone normally.

It is possible that while booting, your phone will notify you that the system cannot be verified. One of two things will happen: either the boot will pause, and not go further, or will boot without despite the warning. If you flashed these exact versions, my phone boots without the warning at all. However, don't panic if you see it. Remember, you have the factory images. Just reflash the recovery.img, and you will be just fine.

More info can be found at

Getting and flashing SuperSU (getting root)

At this point, the phone should be booted into its regular state. We are now ready to root the phone. This means getting the latest SuperSU package, and installing it through TWRP. However, I need to throw out another caution. We'll be installing a beta version of SuperSU to do something called "system-less root". This means that the package will only be modifying the bootloader image to get root, and will not be touching the system partition. This is both good, and bad. It's good in that we only need to reflash the bootloader to remove root. It's bad in that this is experimental software, and really not ready for production. Further, unlike TWRP, SuperSU is proprietary software, which sucks. It does make me a bit nervous, to be honest, to rely on non-free closed-source proprietary software, on such a critical piece of my life. Proceed at your own risk.

As of this writing, you'll need to get the SuperSU package from the XDA forums at;=3. I grabbed version "". There may be updates since this post was published.

Unfortunately, again, I did not see any published checksums. So, I've installed it, with the knowledge of how to reflash my bootloader should I encounter problems.

$ md5sum 
$ sha1sum 
$ sha256sum 

Provided these checksums match, we're good to go. We need to push the ZIP to our phone with the Android debugger, and reboot into the TWRP recovery:

$ adb push /sdcard/
$ adb reboot recovery

From the TWRP interface, tap "Install" and install the /sdcard/ package. When it finishes, tap "Reboot". TWRP will ask if you would like to install the root provided by the image. You do NOT want to install this root- you just flashed one.

The phone should boot normally.

Enable USB tethering and the wireless hotspot

This step requires root. Finally, we want to enable the hotspot and tethering. Google is bending to wireless carriers, forcing the user to prove that they are subscribing to a cellular service that allows them to use USB tethering or the wireless hotspot. Personally, I find this dirty, and unfortunate. Even worse, is the fact that cellular providers think they can get away by charging double for using your own data. Data is data; it shouldn't matter if it comes from your phone, or your laptop connected to your phone. If they want to charge for overages on caps, whatever. But charging double, just because you connected your phone via USB? Or setting up a hotspot in your grandma's house, because she doesn't have WiFi but you have cellular coverage? Please. This is clearly grandfathered from the days of feature phones, where you couldn't tether or hotspot. So, you purchased a USB dongle to enable the hotspot. Even then, it was dirty, but it's clear that this is a byproduct of days gone by.

To enable tethering and the hotspot, you just need to add one line to /system/build.prop config file. Unfortunately, /system/ is mounted read-only. So, you'll have to remount it as read-write and edit the file. However, every attempt I have made at modifying it has ended up with an empty file- IE: losing all its contents. So, rather than editing it manually, there is an app for that.

Install Add "net.tethering.noprovisioning" and set the property to "true", then reboot your phone. At that point, you should be able to USB tether and setup a wireless hotspot.


This wasn't for the faint of heart or for someone who doesn't care about gaining the necessary control over their Android phone that root would give them (setting up firewalls, ad blockers, tethering/hotspot, etc.). However, as mentioned earlier, it's getting fairly difficult to hard brick and Android phone these days. Even better, the steps are getting somewhat standardized. IE: flash factory images, flash custom recovery, install SuperSU, & optionally enable tethering/hotspot.

Your GnuPG Private Key

This post is inspired by a discussion in irc:// about Keybase and a blog post by Filippo Valsorda.

I was curious just exactly how my private key is encrypted. Turns out, gpg(1) can tell you directly:

$ gpg --output /tmp/secret-key.gpg --export-secret-keys 0x22EEE0488086060F
$ gpg --list-packets /tmp/secret-key.gpg
:secret key packet:
	version 4, algo 17, created 1095486266, expires 0
	skey[0]: [1024 bits]
	skey[1]: [160 bits]
	skey[2]: [1023 bits]
	skey[3]: [1023 bits]
	iter+salt S2K, algo: 3, SHA1 protection, hash: 2, salt: ad8d24911a490591
	protect count: 65536 (96)
	protect IV:  01 1e 07 58 4a b6 68 a0
	encrypted stuff follows
	keyid: 22EEE0488086060F

Notice the line "iter+salt S2K, algo: 3, SHA1 protection, hash: 2, salt: ad8d24911a490591". In there, you see "algo: 3" and "hash: 2". What do those identifiers reference? If you refer to RFC4880, you can learn what they are:

Symmetric Encryption Algorithms

  1. Plaintext or unencrypted data
  2. IDEA
  3. 3DES
  4. CAST5
  5. Blowfish
  6. Reserved
  7. Reserved
  8. AES-128
  9. AES-192
  10. AES-256
  11. Twofish

Cryptographic Hashing Algorithms

  1. MD5
  2. SHA-1
  3. RIPEMD-160
  4. Reserved
  5. Reserved
  6. Reserved
  7. Reserved
  8. SHA-256
  9. SHA-384
  10. SHA-512
  11. SHA-224

I emphasized the defaults, which are CAST5 and SHA-1. So, your key is encrypted with the SHA-1 of your passphrase, which is used as the key for CAST5 to encrypt your private key. Thus, the whole security of your encrypted private key rests on the entropy of your passphrase, provided that sane defaults are chosen for the encryption and hashing algorithms, which they are.

CAST5 has been well analyzed and it is not showing any practical or near practical weaknesses. It is a sane default to chose for a symmetric encryption algorithm. However, CAST5 uses 64-bit blocks for encrypting and decrypting data, which may have some theoretical weaknesses. AES uses 128-bit blocks, and thus has a larger security margin. Because AES-256 is available as a symmetric encryption algorithm, there really is no reason to not use it, aside from feeling more secure.

SHA-1 is showing near practical attacks on blind collisions, but for use with keying a block cipher from a passphrase, it's still exceptionally secure. What is needed to break SHA-1 in this regard, is a pre-image attack. A pre-image attack is where you have the hash, but you do not know the input that created it. This is not brute force. This is able to break the algorithm in such a way, that provided with any hash, you can reliably produce its input. SHA-1 has a wide security margin here, so there really is nothing practical to worry about. However, with SHA-512 available, there is also really no reason why not to use a SHA-2 algorithm. In fact, aside from the increase security margin, SHA-512 is designed to work well on 64-bit platforms, but struggle with 32-bit. So this gives us an increased security margin, albeit negligible, against using something like SHA-256.

So, how can we change these? Turns out to be quite simple. All you need to do is specify the secret key symmetric encryption algorithm and hashing algorithm, then change your password (retype it with the same password if you don't want to change it):

$ gpg --s2k-cipher-algo AES256 --s2k-digest-algo SHA512 --edit-key 0x22EEE0488086060F
Secret key is available.

pub  1024D/0x22EEE0488086060F  created: 2004-09-18  expires: never       usage: SCA 
                               trust: unknown       validity: unknown
sub  1792g/0x7345917EE7D41E4B  created: 2004-09-18  expires: never       usage: E   
sub  2048R/0xCE7911B7FC04088F  created: 2005-07-04  expires: never       usage: S   

gpg> passwd
gpg> save

Now if we export our key, and look at the OpenPGP packets, we should see the new updates:

$ gpg --output /tmp/secret-key.gpg --export-secret-keys 0x22EEE0488086060F
$ gpg --list-packets /tmp/secret-key.gpg
:secret key packet:
	version 4, algo 17, created 1095486266, expires 0
	skey[0]: [1024 bits]
	skey[1]: [160 bits]
	skey[2]: [1023 bits]
	skey[3]: [1023 bits]
	iter+salt S2K, algo: 9, SHA1 protection, hash: 10, salt: 9c3dbf2880791f2e
	protect count: 65536 (96)
	protect IV:  db f5 e8 1c 98 03 99 7c 77 33 4e cd d3 3c 1f 4f
	encrypted stuff follows
	keyid: 22EEE0488086060F

Now I have "algo: 9" which is "AES256" and "hash: 10" which is "SHA512" protecting my private key. I've gained a little bit extra security margin at the cost of retyping my passphrase. Not bad. Now, I'd like to pose you a question:

If I were to publish my encrypted GnuPG private key, think I'm crazy?

Let's look at this for a second. We know that the key is protected with AES-256. AES remains secure, after 15 years of intense scrutiny and analysis, and is showing no signs of wear. It's the most used symmetric encryption algorithm that protects HTTPS, SSH, VPN, OTR, Tor, and even your GnuPG encrypted emails. It protects social security numbers, credit card transactions, usernames and passwords, patient health records, hard drives, and on, and on, and on. AES is secure.

So, breaking the encrypted key will be at least as hard as breaking AES, which seems to be a long shot. So, you would be better off attacking my passphrase. Knowing that it was used as the key for AES, and that it is hashed with SHA-512, we can write a brute force algorithm to generate passphrases, hash them with SHA-512, and attempt at decrypting the AES key. After all, we have the salt and the IV right there in plaintext in the packets. Turns out, there are already projects for this.

This should be a breeze. Unless, of course, you have sufficient entropy behind your passphrase. What is sufficient entropy? Well, take some time searching "entropy" on my blog. You'll quickly learn that I recommend at least 80-bits. 100-bits would give you a paranoid security margin, even with the password being hashing with a fast hashing algorithm like SHA-512.

So, if the private key is encrypted with AES-256, and the passphrase has sufficient entropy to withstand a sophisticated attack, and it is hashed with SHA-512, then I should have no concern posting my encrypted key into the open, right?

Without further ado, here is my encrypted GnuPG private key:

Hash: SHA512

Version: GnuPG v1

Version: GnuPG v1


Now that this is posted, I should expect everyone to steal my GnuPG identity, parading around as me, forging signatures, correct? No, it's not happening. I trust the crypto. I trust the math. I trust the software.

Why? Why am I doing this? Well, a recent discussion popped up on IRC about Keybase. Some don't like the fact that they are encouraging you to upload your encrypted private key to the server. Some claim that it is "insecure". However, didn't we just logically and reasonably conclude that AES and SHA-2 are protecting my best interests? So, if it's insecure, then it can't be due to the crypto. It must be something else. Let's look at all the risk factors:

Keybase servers are compromised and you use the web interface
If the server is compromised, then the attackers can modify the code, providing malicious JavaScript to the browser, so when you successfully decrypt your private key, it can be sent and stored elsewhere under their control. There is nothing you can do here. You are screwed. This is a very valid concern.

Keybase servers are compromised and you use the command line interface
The attackers only have access to your encrypted private key. Provided you never use the web interface, all encryption and decryption as handled out-of-band. This means that regardless of the Keybase server compromise, the attackers will never actually get to forge signatures using your GnuPG private key.

Your local client is compromised
There is nothing I can do for you here. Keybase isn't to blame, and not even the gpg(1) client can protect you. You have bigger problems on your hands than the attackers gaining access to your unencrypted GnuPG private key.

I think a lot of GnuPG users are paranoid about their key getting leaked, because they are unsure of exactly how it is stored. Hopefully this post lays those fears to rest. If there are still concerns about the leak of encrypted private keys, then it's probably due to a fear of not fully understanding the strength of passwords and their entropy requirements. However, if you meet the necessary password entropy requirements, and your key is encrypted with CAST5/SHA-1 or AES-256/SHA-512, there is nothing wrong with keeping a backup of your GnuPG key on Github, AWS, Dropbox, or other "cloud" hosting solutions.

Trust the math. Trust the software.

By the way, if you do successfully recover my private key passphrase (and I know you won't), I would be interested in hearing from you. Send me a signed email with my private key, and I'll make it financially worth your while. 🙂

Now Using miniLock

I have been a long proponent of OpenPGP keys for a way to communicate securely. I have used my personal key for signing emails since ~ 2005. I have used my key at dozens and dozens of keysigning parties. I have used my key to store account passwords and credentials with vim(1), Python, and so many other tools. I have used my key to encrypt files to myself. And so much more. There is only one problem.

No one is using my key to send me encrypted data.

Sure, when attending keysigning parties, it's an encryption orgy. Attendees will sign keys, then send encrypted copies to recipients. And, of course, people will send encrypted emails before and after the party, for various reasons. But, when the party dies down, and people get back to their regular lives, and very few actually send encrypted data with OpenPGP keys. Realistically, it's rare. Let's be honest. Sure, there's the one-off, either as a corporation (XMission uses OpenPGP extensively internally for password storage) or individuals (sending encrypted tax forms to a spouse or accountant), but by large, it's rarely used.

I have a good idea of why, and it's nothing ground breaking- OpenPGP is hard. It's hard to create keys. It's hard to manage the keys. It's hard to grasp the necessary concepts of public keys, private keys, encryption, decryption, signatures, verification, the Web of Trust, user identities, key signing parties, revocation certificates, and so much more.

OpenPGP is just hard. Very hard.

Well, in an effort to encourage more people, such as my family and friends that would not use OpenPGP, to encrypt sensitive data, I've jumped on board with miniLock. What is miniLock? Currently, it's a Free Software browser application for the Google Chrome/Chromimum browser (not an extension). It uses ECC, bcrypt, BLAKE2, zxcvbn, and a number of other tools that you really don't need to worry about, unless you want to audit the project. All you need is an email and a password. The keys are deterministically generated based on that.

Think about this for a second. You don't need a public and private keyring to store your keys. You don't need to upload them to a key server. You don't need to attend keysigning parties, worry about the Web of Trust, or any of that other stuff that makes OpenPGP the nightmare it is.

All you need is an email and a password.

Unfortunately, this does have one big drawback- your email or password can't change, without changing your keys. However, the miniLock keys are cheap- IE: you can change them any time, or create as many as you want. You only need to distribute your miniLock ID. In fact, the miniLock ID is the entire public key. So, they don't even need to be long term. Generate a one-time session miniLock ID for some file that you need to send to your accountant during tax season, and call it good.

However, I prefer long-term keys, so as such, I created 3 IDs, one for each email account that I use. If you want to send me encrypted data, without the hassle of OpenPGP, feel free to use the correct miniLock ID for the paired email address.

Email miniLock ID mWdv6o7TxCEFq1uN6Q6xiWiBwMc7wzyzCfMa6tVoEPJ5S qU7DJqG7UzEWYT316wGQHTo2abUZQk6PG8B6fMwZVC9MN 22vDEVchYhUbGY9Wi6EdhsS47EUeLKQAVEVat56HK8Riry

Don't misunderstand me. If you have an OpenPGP key, and would prefer to use that instead, by all means do so. However, if you don't want to setup OpenPGP, and deal with the necessary overhead, I can now decrypt data with miniLock. Maybe that will a better alternative for you instead.

Do XKCD Passwords Work?

You'll always see comments on web forums, social sites, blog posts, and emails about "XKCD passwords". This is of course referring to the XKCD comic by Randall Munroe describing what he thinks is the best password generator:

What no one has bothered asking, is if this actually works.

Lorrie Faith Cranor, director of the Carnegie Mellon Usable Privacy and Security Laboratory at Carnegie Mellon University, a member of the Electronic Frontier Foundation Board of Directors, and Professor in the School of Computer Science and the Engineering and Public Policy Department at Carnegie Mellon University, did ask this question. In fact, she studied to the point, that she gave a TED talk on the subject. The transcript of her talk can be found here. Here are the relevant bits (emphasis mine):

Now another approach to better passwords, perhaps, is to use pass phrases instead of passwords. So this was an xkcd cartoon from a couple of years ago, and the cartoonist suggests that we should all use pass phrases, and if you look at the second row of this cartoon, you can see the cartoonist is suggesting that the pass phrase "correct horse battery staple" would be a very strong pass phrase and something really easy to remember. He says, in fact, you've already remembered it. And so we decided to do a research study to find out whether this was true or not. In fact, everybody who I talk to, who I mention I'm doing password research, they point out this cartoon. "Oh, have you seen it? That xkcd. Correct horse battery staple." So we did the research study to see what would actually happen.

So in our study, we used Mechanical Turk again, and we had the computer pick the random words in the pass phrase. Now the reason we did this is that humans are not very good at picking random words. If we asked a human to do it, they would pick things that were not very random. So we tried a few different conditions. In one condition, the computer picked from a dictionary of the very common words in the English language, and so you'd get pass phrases like "try there three come." And we looked at that, and we said, "Well, that doesn't really seem very memorable." So then we tried picking words that came from specific parts of speech, so how about noun-verb-adjective-noun. That comes up with something that's sort of sentence-like. So you can get a pass phrase like "plan builds sure power" or "end determines red drug." And these seemed a little bit more memorable, and maybe people would like those a little bit better. We wanted to compare them with passwords, and so we had the computer pick random passwords, and these were nice and short, but as you can see, they don't really look very memorable. And then we decided to try something called a pronounceable password. So here the computer picks random syllables and puts them together so you have something sort of pronounceable, like "tufritvi" and "vadasabi." That one kind of rolls off your tongue. So these were random passwords that were generated by our computer.

So what we found in this study was that, surprisingly, pass phrases were not actually all that good. People were not really better at remembering the pass phrases than these random passwords, and because the pass phrases are longer, they took longer to type and people made more errors while typing them in. So it's not really a clear win for pass phrases. Sorry, all of you xkcd fans. On the other hand, we did find that pronounceable passwords worked surprisingly well, and so we actually are doing some more research to see if we can make that approach work even better. So one of the problems with some of the studies that we've done is that because they're all done using Mechanical Turk, these are not people's real passwords. They're the passwords that they created or the computer created for them for our study. And we wanted to know whether people would actually behave the same way with their real passwords.

So, in her research, XKCD passwords really didn't work out that well. They are longer in length, so they take longer to type, which increases the chance for error, and people are no better at remembering on XKCD passphrase, than they are a short string of random characters.

To me, this is unsurprising. If you look at the history of my blogging on passwords, you'll find that I continually advocate true random events to build your passwords, maximizing entropy. In my last post, I even blogged two shell functions that you can use to build XKCD passwords, and "monkey passwords" (monkeys generating passwords by banging away at a keyboard). Both target 80-bits of entropy in the generation. Check out the lengths:

$ gen-monkey-pass 9
cxqwtw63taxdr3zn	uaq4tbt43japmm2q	mptwrxhhb486yfuv
-cb73b9-kgzhmww3	s45t3x6r9smw-7yr	hjkgzkha-qup4gh4
34c5rg4ksw-aprvk	uug-2vq7pfze6dnp	s4qx4eazbnrd2pqe

$ gen-xkcd-pass 9
sorestdanklyAlbanyluckyRamonaFowler   (sorest dankly Albany lucky Ramona Fowler)
towsscareslaudedrobinawardsrenal      (tows scares lauded robin awards renal)
thinkhazelsvealjuggedagingscareen     (think hazels veal jugged agings careen)
tarotpapawsNolanpacketAvonwiped       (tarot papaws Nolan packet Avon wiped)
surgesakimbohardercruelArjunablinds   (surges akimbo harder cruel Arjuna blinds)
amountlopsedgemeaslyCannoninseam      (amount lops edge measly Cannon inseam)
EssexIzmirwizesPattygroutszodiac      (Essex Izmir wizes Patty grouts zodiac)
hoursmailedslamsvowedallowspar        (hours mailed slams vowed allow spar)
AfghanNigelnutriadillmoldertrolly     (Afghan Nigel nutria dill molder trolly)

XKCD passwords average 32 characters to achieve 80-bits of entropy, compared to 16 characters that "monkey passwords" produce. And, according to the research done by Lorrie, people won't necessarily recall XKCD passwords any easier than "monkey passwords". So, if that's the case, then what's the point? Why bother? Why not just create "monkey passwords", and use a password manager?

Exactly. It's 2015. There are password managers for your browser, all versions of every desktop operating system, command-line based utilities for servers, and even apps for your smartphone. There are plenty of "cloud" synchronization services to make sure each instance is up-to-date. At this point, your passwords should:

  • Contain at least 80-bits of entropy.
  • Be truly random generated (no influence from you).
  • Be unique for each and every account.
  • Be protected with two-factor authentication, where available.
  • Be stored in a password manager, that is easily accessible.

You'll remember the ones you type in frequently, and you'll memorize them quickly. The others are stored for safe keeping, should you need to recall them.