
Breaking HMAC

Okay. The title might be clickbait, just a little, but after you finish reading this post, I think you'll be a bit more careful picking your HMAC keys. After learning this, I know I will be. However, HMAC is not broken. It just has an interesting ... property that's worth knowing about.

First off, let's remind ourselves what HMAC is. HMAC, or Hash-based Message Authentication Code, is a construction for authenticating messages with a shared secret key. Typically, the two parties securely establish symmetric keys through an asymmetric key agreement protocol, such as Diffie-Hellman. These keys are used to encrypt messages as well as authenticate data. HMAC tags prevent chosen plaintext attacks, where the attacker inserts malicious data into the payload (send $1,000,000 to my account), and they prevent adaptive chosen ciphertext attacks, where the attacker sends crafted ciphertexts to the server and learns what is being protected (is "password" in the payload?).

Authenticated messages are absolutely essential to modern cryptographic software. If you're writing cryptographic software, and you're not authenticating your ciphertext, you're doing it wrong. It doesn't matter if the data is at rest or in motion. Authenticate your ciphertext. This is where HMAC fits in.

So, best practice, when not using native AEAD ciphers such as AES-GCM or ChaCha20-Poly1305, is to encrypt the plaintext, then authenticate the ciphertext with HMAC, and prepend or append the digest (called a "MAC tag" or just "tag") to the ciphertext.

Something like this pseudocode:

ciphertext = AES-256-CTR(nonce, plaintext, key1)
tag = HMAC-SHA-512(ciphertext, key2)
ciphertext = tag || ciphertext
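
In real Python, with the third-party PyCryptodome package standing in for the AES-256-CTR primitive (an assumption on my part- any CTR implementation will do), the encrypt-then-MAC step might look something like this sketch:

import os
import hmac
import hashlib
from Crypto.Cipher import AES  # pip install pycryptodome

key1, key2 = os.urandom(32), os.urandom(32)  # one key to encrypt, one to authenticate
nonce = os.urandom(8)

cipher = AES.new(key1, AES.MODE_CTR, nonce=nonce)
ciphertext = cipher.encrypt(b'attack at dawn')
tag = hmac.new(key2, ciphertext, hashlib.sha512).digest()  # MAC the ciphertext, not the plaintext
bundle = tag + ciphertext  # tag || ciphertext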

Then we ship off the newly bundled ciphertext and MAC tag. When it arrives at our destination, the recipient can verify that the ciphertext is what it should be, before decrypting it, by checking the MAC tag, which is the whole point:

tag1 = ciphertext[:64]
data = ciphertext[64:]
tag2 = HMAC-SHA-512(data, key2)
hmac_check = 0

for char1, char2 in zip(tag1, tag2):
    hmac_check |= ord(char1) ^ ord(char2)

if hmac_check == 0:
    plaintext = AES-256-CTR(nonce, data, key1)
else:
    return False
return True

Notice that we're doing a constant time comparison of the shipped tag and the calculated tag. The last thing we want is to introduce a timing attack by doing "if tag1 != tag2". However, after the constant time comparison, if tag1 and tag2 match, we can decrypt the data and retrieve the plaintext.
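
For what it's worth, Python ships a constant time comparison function in its hmac module, so you don't need to hand-roll the XOR loop. A minimal sketch of the same verification step, assuming the 64-byte raw tag layout from the pseudocode above:

import hmac
import hashlib

def verify_tag(bundle, key2):
    # Split the 64-byte HMAC-SHA-512 tag off the front of the bundle.
    tag1, data = bundle[:64], bundle[64:]
    tag2 = hmac.new(key2, data, hashlib.sha512).digest()
    # Constant time comparison, equivalent to the XOR loop above.
    return hmac.compare_digest(tag1, tag2)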

So, now that you have the background on HMAC, let's look at some examples with Python, and see where HMAC breaks. After all, that's the clickbait title, no?

>>> import os
>>> import hmac
>>> import hashlib
>>> key = os.urandom(16)
>>> msg = os.urandom(256)
>>> hmac.new(key, msg).hexdigest()
'd5a94f051b1e6ff67065b6f4c3a60130'

In this example, the default HMAC is HMAC-MD5. HMAC-MD5 is still considered cryptographically secure, even though vanilla MD5 is broken. Regardless, it'll suffice for this example, and we'll look at SHA-1 and SHA-2 also.

In RFC 2104, where HMAC is standardized, section 3 has this odd little tidbit (emphasis mine):

The key for HMAC can be of any length (keys longer than B bytes are *first hashed using H*). However, less than L bytes is strongly discouraged as it would decrease the security strength of the function. Keys longer than L bytes are acceptable but the extra length would not significantly increase the function strength. (A longer key may be advisable if the randomness of the key is considered weak.)

The "B bytes" length is the block size of the underlying HMAC operations. If the key is shorter than this block size, zeroes need to be appended to the key. If the key is longer than this block size, then it is hashed with the HMAC cryptographic hash.

The block size is 64 bytes for the following HMACs:

  • HMAC-MD5
  • HMAC-RIPEMD128
  • HMAC-RIPEMD160
  • HMAC-SHA1

In other words, HMAC wants the key to be exactly one block in length. If it's longer, it's hashed first, and the digest is then padded with zeros to fit exactly into one block.

Can we test this?

>>> key = os.urandom(65) # longer than one block
>>> msg = os.urandom(256)
>>> hmac.new(key, msg).hexdigest()
'f887a4146e94ed47405c97931798885d'
>>> hmac.new(hashlib.md5(key).digest(),msg).hexdigest()
'f887a4146e94ed47405c97931798885d'

We have a collision. In other words:

For:
* 'H' a cryptographic hash
* 'k' a private key
* 'm' a message
* 'B' the HMAC block size

    HMAC(k, m) == HMAC(H(k), m)

for all 'k' where len(k) > B

Does this work with HMAC-SHA1?

>>> key = os.urandom(65)
>>> msg = os.urandom(256)
>>> hmac.new(key, msg, hashlib.sha1).hexdigest()
'1070312944223b36928382d7a53ca54f7204ad4a'
>>> hmac.new(hashlib.sha1(key).digest(), msg, hashlib.sha1).hexdigest()
'1070312944223b36928382d7a53ca54f7204ad4a'

How about all the SHA-2 functions (SHA-224, SHA-256, SHA-384, & SHA-512)?

SHA-224:

>>> key = os.urandom(65)
>>> msg = os.urandom(256)
>>> hmac.new(key, msg, hashlib.sha224).hexdigest()
'9ea8c3f667e55e6e9c5d63c5dd1b569ca69e2cc69f5e3fa3f87e94ba'
>>> hmac.new(hashlib.sha224(key).digest(), msg, hashlib.sha224).hexdigest()
'9ea8c3f667e55e6e9c5d63c5dd1b569ca69e2cc69f5e3fa3f87e94ba'

SHA-256:

>>> key = os.urandom(65)
>>> msg = os.urandom(256)
>>> hmac.new(key, msg, hashlib.sha256).hexdigest()
'2aa02e678fcfe7ecaa1475efb70fe284fe91cc81e5a9c543433b70f5f5112c4b'
>>> hmac.new(hashlib.sha256(key).digest(), msg, hashlib.sha256).hexdigest()
'2aa02e678fcfe7ecaa1475efb70fe284fe91cc81e5a9c543433b70f5f5112c4b'

SHA-384:

>>> key = os.urandom(65)
>>> msg = os.urandom(256)
>>> hmac.new(key, msg, hashlib.sha384).hexdigest()
'0941e6502233a72d01beeec729eaa7db2469f8ce96339cd5b3b2c9a4684501e6a7025fac6c9c20a511c48df76b453ec3'
>>> hmac.new(hashlib.sha384(key).digest(), msg, hashlib.sha384).hexdigest()
'5380905fb89fee68836be076ebfccff600e3b89c6554840fe61fed01b049d6a6a77423d2f5f4be1afb9d1c6f63b8b7fc'

SHA-512:

>>> key = os.urandom(65)
>>> msg = os.urandom(256)
>>> hmac.new(key, msg, hashlib.sha512).hexdigest()
'36063bdc2d02ce8ea4b01b40ba040094c640959e0cc5716f7a75f119cbc348aa93d555f86bfcdaee5dad4ec2e5d53ed4362f9df0720ec0e1272288d49a912f7e'
>>> hmac.new(hashlib.sha512(key).digest(), msg, hashlib.sha512).hexdigest()
'8bfe675e3ca35a8680243d3747b5d3ce7ded1731e1a307cf5d1b00ae9243395ab94039f22585b417d7cbdf09f3d8dcf39c85ce147ff77c901c1a21f8de981b6a'

So it appears that the block size is indeed 64 bytes for MD5, SHA-1, SHA-224, and SHA-256, but for SHA-384 and SHA-512, it doesn't appear to be working. That is because the block size has changed to 128 bytes for these two functions. So, if our key is 129 bytes, we should be able to replicate collisions:

SHA-384 with a 129-byte key:

>>> key = os.urandom(129)
>>> msg = os.urandom(256)
>>> hmac.new(key, msg, hashlib.sha384).hexdigest()
'bda2b586637e3bd73a27919601d7d5a1c1743f1f9f5cb72a0aa874f832046f4bc396ff8e307f9318dc404c4b432ca491'
>>> hmac.new(hashlib.sha384(key).digest(), msg, hashlib.sha384).hexdigest()
'bda2b586637e3bd73a27919601d7d5a1c1743f1f9f5cb72a0aa874f832046f4bc396ff8e307f9318dc404c4b432ca491'

SHA-512 with a 129-byte key:

>>> key = os.urandom(129)
>>> msg = os.urandom(256)
>>> hmac.new(key, msg, hashlib.sha512).hexdigest()
'd0153f8bb6a549539abbcff8ee5ac7592c48c082bbbb7b3cc95dcb2166f162e5c59bb7bb3316e65d1481bd8697e8d3bc91deb46ad44845b972c57766f45c54bd'
>>> hmac.new(hashlib.sha512(key).digest(), msg, hashlib.sha512).hexdigest()
'd0153f8bb6a549539abbcff8ee5ac7592c48c082bbbb7b3cc95dcb2166f162e5c59bb7bb3316e65d1481bd8697e8d3bc91deb46ad44845b972c57766f45c54bd'

This isn't a poor Python implementation of HMAC. Try it in your favorite language, and you should be able to replicate the collisions. This is a "bug" in HMAC itself: if the key is longer than the block size, it's hashed with the HMAC's underlying cryptographic hash, then padded with zeros to fill a single block.
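
If you want to convince yourself in one pass, here's a short loop over several hash functions, using each function's reported block size. Nothing here is specific to Python's implementation- any compliant HMAC behaves the same:

import os
import hmac
import hashlib

for h in (hashlib.md5, hashlib.sha1, hashlib.sha256, hashlib.sha512):
    block = h().block_size       # 64 for MD5/SHA-1/SHA-256, 128 for SHA-512
    key = os.urandom(block + 1)  # one byte longer than a block
    msg = os.urandom(256)
    tag1 = hmac.new(key, msg, h).hexdigest()
    tag2 = hmac.new(h(key).digest(), msg, h).hexdigest()
    assert tag1 == tag2          # HMAC(k, m) == HMAC(H(k), m) when len(k) > B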

So what does this mean? It means that when choosing your HMAC keys, you should stay within one block size: 64 bytes or less for MD5, RIPEMD-128/160, SHA-1, SHA-224, and SHA-256, and 128 bytes or less for SHA-384 and SHA-512. If you do this, you'll be fine.

Then again, you should probably be using NaCl or libsodium rather than piecing these cryptographic primitives together yourself. These sorts of pitfalls are already handled for you.

This post is the result of a Twitter discussion between Scott Arciszewski, myself, and some others.

UPDATE: I'm not aware of any actual security implications of two distinct keys producing the same HMAC digest. Ultimately, what are you trying to get out of HMAC? It's used as an authentication and integrity mechanism. So what if a long key and the hash of that key produce the same HMAC digest? At 64 bytes, or 512 bits, the amount of work required to guess the right key is anything but practical. Still, this is interesting.

Further Investigation Into Scrypt and Argon2 Password Hashing

Introduction

In my previous post, I didn't pay close attention to the memory requirements of Argon2 when running my benchmarks. Instead, I just ran them until I got tired of waiting around. Further, I really didn't do justice to either scrypt or Argon2 when showing the parallelization factor. So, as a result, I ran a lot more benchmarks on both, so you can more clearly see how the cost affects the time calculating the password hash, and how parallelization can affect that time.

More scrypt Benchmarks and Clarification

So, let's run some more scrypt benchmarks, and take a closer look at what's going on. Recall that the cost parameters for scrypt are:

  • N: The CPU and RAM cost
  • r: The mixing loop memory block size
  • p: The product factor

The recommended cost factors from Colin Percival are:

  • N: 16384 (2^14)
  • r: 8
  • p: 1

To calculate the amount of memory used, we use the following equation:

Memory in bytes = (N * r * 128) + (r * p * 128)

So, according to the recommendation:

(2^14 * 8 * 128) + (8 * 1 * 128) = 16,778,240 bytes

Or, about 16 megabytes. According to Anthony Ferrara, you should be using at least 16 MiB with scrypt. At 4 MiB or less, it is demonstrably weaker than bcrypt. So, when you're looking at the images below, you'll notice that the first memory result is in red text, to show that 8 MiB is the weak point with those cost factors, and the 16 MiB result is green, showing that cost factors from there up are preferred. As a result, anything between 8 MiB and 16 MiB is default black, in that you should probably be cautious using those cost factors. While you might be demonstrably stronger than bcrypt with these workloads, you're not at the developer's recommendation of 16 MiB.
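
The memory equation is easy to mechanize. A trivial helper, transcribing the formula above:

def scrypt_memory(n, r, p):
    # Memory in bytes = (N * r * 128) + (r * p * 128)
    return (n * r * 128) + (r * p * 128)

print(scrypt_memory(2**14, 8, 1))  # 16778240 bytes, about 16 MiB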

So, knowing that, let's look at the results. Notice that when we increase our product factor, our execution time increases by that factor. Despite urban legend, this isn't a parallelization constant (well, it is, but it's not breaking up the problem into smaller ones- it's multiplying it). The idea is that once you've reached a reasonable memory cost, you can increase the execution time by creating more mixing loops with the same cost on each loop. So, instead of one mixing loop costing you 16 MiB, you have two mixing loops each costing you 16 MiB. We didn't divide the problem, we multiplied it. As such, our execution time will double from one mixing loop to two mixing loops.

This seems strange, and indeed it is, but you should start with "p=1" at the memory cost you can afford, then increase the product factor if you can donate more time to the execution. In other words, the product factor is designed for hardware-limited scenarios. In general, you'll want to look at your execution time, and let the memory come in as an afterthought (provided it's at or above 16 MiB).
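
To see the multiplication effect yourself, Python 3.6 and newer expose scrypt in hashlib (assuming OpenSSL 1.1 or newer underneath). A rough timing sketch- note that maxmem must be raised above the roughly 16 MiB these parameters need:

import os
import time
import hashlib

password, salt = b'correct horse', os.urandom(16)

for p in (1, 2, 4):
    start = time.perf_counter()
    hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=p,
                   maxmem=64 * 1024 * 1024, dklen=64)
    # Execution time should roughly double with each doubling of p.
    print(p, round(time.perf_counter() - start, 3))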

As in the last post, I have highlighted in green the cells where interactive password sessions target half of a second, and in red the cells where symmetric key derivation targets a full five seconds.


Scrypt table showing memory block sizes of 1 and 2 with product multipliers of 1, 2, and 4.

Scrypt table showing memory block sizes of 4 and 6 with product multipliers of 1, 2, and 4.

Scrypt table showing memory block sizes of 8 and 10 with product multipliers of 1, 2, and 4.

Scrypt table showing memory block sizes of 12 and 14 with product multipliers of 1, 2, and 4.

Scrypt table showing memory block sizes of 16 and 18 with product multipliers of 1, 2, and 4.

More Argon2 Benchmarks

When showing Argon2 in my last post, I did a poor job demonstrating several additional cost factors, and I didn't make it clear how the parallelization played a role when keeping the same cost factor. As a result, I ran additional benchmarks to make it more clear exactly how CPU and RAM play with each other in your work load.

As a reminder, the cost parameters for Argon2 are as follows:

  • n: The number of iterations on the CPU.
  • m: The memory work load.
  • p: The parallelization factor.

Unlike scrypt, where a single factor manipulates both the CPU and RAM cost, Argon2 separates them out. You deliberately have two knobs to play with- "n" for the CPU and "m" for the RAM. But, one affects the other. If you are targeting a specific execution time, and you increase your memory factor by two, then your CPU work factor will be decreased by half. On the reverse, if you increase your CPU work factor by two, then your memory work factor will be decreased by half. So, affecting one work factor affects the other.

Why is this important? Well, let's consider setting your memory requirement into the gigabyte range. At 1 GiB, for an interactive login session of .5 seconds, you would need both cores working on the hash, and you would only get a single iteration. In other words, your work is entirely memory dependent, without any significant CPU cost. Maybe you're trying to thwart FPGAs or ASICs with the large memory requirement. However, is it possible that an adversary has 1 GiB of on-die cache? If so, because you're entirely memory-dependent, with no CPU work load, you've catered to the adversary without significant hardware cost.

On the reverse, you could get CPU heavy with 2048 iterations to hit your .5 seconds execution time, but then you would only be using 256 KiB of memory. You're likely not defeating the FPGAs and ASICs that Argon2 is designed for, as you're almost entirely processor-driven.

So, what to do? It would probably be a good idea to target a balance- require a significant amount of memory, even if it doesn't break the on-die cache barrier, while also requiring a significant amount of processor work. Sticking with Colin's recommendation of 16 MiB (2^14 KiB) of memory and 32 iterations on 4 cores for interactive logins is probably a good balance. Then again, it will all depend on your hardware, what you can expect in customer execution time, load, and other variables.
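
As a concrete sketch of that balance, assuming the third-party argon2-cffi package (not part of the standard library), it would look something like this; note that memory_cost is given in KiB:

from argon2 import PasswordHasher  # pip install argon2-cffi

# 32 iterations, 16 MiB (2^14 KiB) of memory, 4 lanes.
ph = PasswordHasher(time_cost=32, memory_cost=2**14, parallelism=4)
encoded = ph.hash('correct horse battery staple')
ph.verify(encoded, 'correct horse battery staple')  # raises an exception on mismatch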

However, here are additional timings of Argon2, just like with scrypt, so you can see how parallelization affects identical costs. Again, green cells are targeting .5 seconds for interactive logins, and red cells are targeting 5 seconds for symmetric key derivation.


Argon2 table showing memory requirements of 256 KiB and 512 KiB with parallel factors of 1, 2, and 4.

Argon2 table showing memory requirements of 1 MiB and 2 MiB with parallel factors of 1, 2, and 4.

Argon2 table showing memory requirements of 4 MiB and 8 MiB with parallel factors of 1, 2, and 4.

Argon2 table showing memory requirements of 16 MiB and 32 MiB with parallel factors of 1, 2, and 4.

Argon2 table showing memory requirements of 64 MiB and 128 MiB with parallel factors of 1, 2, and 4.

Argon2 table showing memory requirements of 256 MiB and 512 MiB with parallel factors of 1, 2, and 4.

Argon2 table showing memory requirements of 1 GiB and 2 GiB with parallel factors of 1, 2, and 4.

Conclusion

Hopefully, this will help you make a more educated decision about your cost factors when deploying either scrypt or Argon2 as your password hash or symmetric key derivation function. Remember, that you have a few things to consider when picking your costs:

  • Make it expensive for both CPU and memory.
  • Target a realistic execution time for the situation.
  • Guarantee that you can always meet these goals after deployment.

Also, don't deploy Argon2 into production quite yet. Let it bake for a while. If it still stands secure in 2020, then you're probably good to go. Otherwise, deploy scrypt, or one of the other functions mentioned in the prior post.

Let's Talk Password Hashing

TL;DR

In order of preference, hash passwords with:

  1. scrypt
  2. bcrypt
  3. Argon2
  4. sha512crypt
  5. sha256crypt
  6. PBKDF2

Do not hash passwords with:

  1. MD5
  2. md5crypt
  3. UNIX crypt(3)
  4. SHA-1/2/3
  5. Skein
  6. BLAKE2
  7. Any general purpose hashing function.
  8. Any encryption algorithm.
  9. Your own design.
  10. Plaintext

Introduction

Something that comes up frequently in crypto circles, aside from the constant database leaks of accounts and passwords, is hashing passwords. Because of the phrase "hashing passwords", developers who may not know better will think of using generic one-way, fixed-length, collision-resistant cryptographic hashing functions, such as MD5, SHA-1, SHA-256, or SHA-512, without giving a second thought to the problem. Of course, using these functions is problematic, because they are fast. Turns out, we don't like fast hashing functions, because password crackers do like fast hashing functions. The faster they can hash, the sooner they can recover the password.

The Problem

So, instead of using MD5, SHA-1, SHA-256, SHA-512, etc., the cryptographic community got together, and introduced specifically designed password hashing functions, where a custom work factor is included as a cost. Separately, key derivation functions were also designed for creating cryptographic keys, where a custom work factor was also included as a cost here. So, with password-based key derivation functions and specifically designed password hashing functions, we came up with some algorithms that you should be using instead.

The Solution

The most popular algorithms of this type would include, in my personal order of preference from most preferred to least preferred:

  1. scrypt (KDF)
  2. bcrypt
  3. Argon2 (KDF)
  4. sha512crypt
  5. sha256crypt
  6. PBKDF2 (KDF)

The only difference between a KDF and a password hashing function is that the digest length can be arbitrary with KDFs, whereas password hashing functions have a fixed-length output.

For the longest time, I was not a fan of scrypt as a password hashing function. I think I've changed my mind. Even though scrypt is sensitive to the parameters picked, and it suffers from a time-memory trade-off (TMTO), it's still considered secure, provided you pick sane defaults. I also place bcrypt over Argon2, because Argon2 was only recently announced as the Password Hashing Competition winner. As with all cryptographic primitives, we need time to analyze, attack, and pick apart the design. If after about 5 years it still stands strong and secure, then it can be recommended as a solution for production. In the meantime, it's something certainly worth testing, but maybe not for production code. Finally, I prefer sha512crypt and sha256crypt over PBKDF2, mostly because they are included with every GNU/Linux distribution by default, they are based on the strong SHA-2 hashing function, which has had years and mountains of analysis, and unlike PBKDF2, you know exactly which hashing function is used. PBKDF2 could be using SHA-2 functions by default, or it could be using SHA-1. You'll need to check your library to be sure.
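
Python's standard library is one place where you can pin the hash explicitly. A quick sketch with hashlib.pbkdf2_hmac, where the underlying function is spelled out rather than left to a library default (the iteration count here is illustrative- tune it to your own timing target):

import os
import hashlib

salt = os.urandom(16)
# The first argument pins the underlying hash; nothing is left to a default.
dk = hashlib.pbkdf2_hmac('sha512', b'correct horse', salt, 500000, dklen=64)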

Different Strokes for Different Folks

Regardless, all of the above functions include cost parameters for manipulating how long it takes to calculate the hash from a password. It's less important exactly what the cost parameters are, and more important that you are targeting an appropriate time to work through the cost, and create the hash. This means you need to identify your threat model and your adversary.

The two common scenarios you'll find yourself in, are:

  1. Password storage
  2. Encryption keys

For password storage, your threat model is likely the password database getting leaked to the Internet, and password crackers around the world working on the hashes in the database to recover passwords. Thus, your adversary is malware, Anonymous, and password crackers. For encryption keys, your threat model is likely private encrypted keys getting compromised and side-channel attacks. Thus, your adversary is also malware, poor key exchanges, or untrusted networks. Knowing your threat model and your adversary changes how you approach the problem.

With password storage, you may be dealing with an interactive login, such as through a website. As such, you probably want the password hashing time to be quick, while still maintaining a work factor that would discourage large distributed attacks on your leaked database. Possibly, 0.5 seconds. This means if the database was leaked, the password cracker could do no more than 2 guesses per second. When you compare this to the millions of hashes per second a GPU could execute on Windows NTLM passwords, 2 guesses per second is extremely attractive. For encryption keys, you probably don't need to worry about interactive sessions, so taking 5 seconds to create the key from the password probably isn't a bad thing. So key crackers spending 5 seconds per guess trying to recover the password that created the encrypted private key is really nice.

bcrypt, sha256crypt, sha512crypt, & PBKDF2

So, knowing the work factors, what would it look like for the above algorithms? Below, I look at bcrypt, sha256crypt, sha512crypt, and PBKDF2 with their appropriate cost. I've highlighted the row green where a possible work factor could mean spending 0.5 seconds on hashing the password, and a red row where a possible work factor could mean spending 5 full seconds on creating a password-based encryption key.

Spreadsheet table showing various cost factors for bcrypt, sha256crypt, sha512crypt, and PBKDF2.

Notice that for bcrypt, a factor of 13 would provide a cost of about 0.5s to hash the password, where a factor of 16 would get me close to my cost of about 5 seconds for creating a password-based key. For sha256crypt, sha512crypt, and PBKDF2, that seems to be about 640,000 and 5,120,000 iterations, respectively.
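
If you want to find these factors on your own hardware, a rough calibration loop with the third-party bcrypt package (an assumption- any bcrypt binding will do) might look like:

import time
import bcrypt  # pip install bcrypt

for rounds in range(10, 17):
    start = time.perf_counter()
    bcrypt.hashpw(b'correct horse', bcrypt.gensalt(rounds=rounds))
    # Pick the factor closest to ~0.5s for logins, ~5s for key derivation.
    print(rounds, round(time.perf_counter() - start, 3))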

scrypt

When we move to scrypt, things get a touch more difficult. With bcrypt, sha256crypt, sha512crypt, and PBKDF2, our cost is entirely a CPU load factor. Unfortunately, while possibly problematic for fast GPU clusters, they still fall victim to algorithm-specific FPGAs and ASICs. In order to combat this, we need to also include a memory cost, seeing as memory on these devices is expensive. However, having both a CPU and a RAM cost means multiple knobs to tweak. So, Colin Percival, the designer of scrypt, decided to bundle both the CPU and the RAM cost into three factors: "N", "r", and "p". The resulting memory usage is calculated as follows:

Memory in bytes = (N * r * 128) + (r * p * 128)

There are a lot of suggestions out there about what's "best practice". It seems that you should at least have the following cost factors with scrypt, which provides a 16 MiB memory load:

  • N: 16384 (2^14)
  • r: 8
  • p: 1

While you should be aware of the sensitivity of scrypt parameters, provided you are working with at least 16 MiB of RAM, you aren't any worse off than with other password hashing functions or KDFs. So, in the following tables, I increase the memory cost needed for the hash by tweaking the three parameters.

Update 2016-06-29: I've clarified these parameters in a follow-up post, which you should most definitely read at https://pthree.org/2016/06/29/further-investigation-into-scrypt-and-argon2-password-hashing/.

First table showing the cost factors of scrypt. Second table showing different cost factors of scrypt. Third table showing yet different cost factors of scrypt.

Because I only have access to a single-socket-quad core CPU in this testing machine, I wanted to limit my "p" cost to 1, 2, and 4, which is displayed in those tables. Further, I'm limited on RAM, and don't want to disrupt the rest of the applications and services running on the box, so I've limited my "r" cost to 4, 8, and 16 multiplied by 128 bytes (512 bytes, 1024 bytes, and 2048 bytes).

Interestingly enough, Colin Percival recommends 16 MiB (N=16384 (2^14), r=8, p=1) for interactive logins and 16 MiB (N=131072 (2^17), r=1, p=1) for symmetric key derivation. If I were targeting my 0.5s password hashing time, then I could improve that to 256 MiB (N=65536 (2^16), r=8, p=1), or 2 GiB (N=2097152 (2^21), r=8, p=1) if targeting just slightly more than 5 seconds for symmetric key derivation.

Argon2

Finally, we look at Argon2. Argon2 comes in two flavors- Argon2d and Argon2i; the first of which is data (d)ependent and the latter is data (i)ndependent. The former is supposed to be resistant against GPU cracking, while the latter is supposed to be resistant against side-channel attacks. In other words, Argon2d would be suitable for password hashing, while Argon2i would be suitable for encryption key derivation. However, regardless of Argon2d or Argon2i, the cost parameters perform the same, so we'll treat them as a single unit here.

Like scrypt, Argon2 has both a CPU and a RAM cost. However, both are handled separately. The CPU cost is handled through standard iterations, like with bcrypt or PBKDF2, and the RAM cost is handled through specifically ballooning the memory. When I started playing with it, I found that just manipulating the iterations felt very much like bcrypt, but I could affect the overall time it took to calculate the hash by just manipulating the memory also. When combining the two, I found that iterations affected the cost more than the RAM, but both had significant say in the calculation time, as you can see in the tables below. As with scrypt, it also has a parallelization cost, defining the number of threads you want working on the problem:

First table showing the cost factors of Argon2. Second table showing different cost factors of Argon2. Third table showing yet different cost factors of Argon2.

Note the RAM cost between 256 KiB and 16 MiB, in addition to the number of iterations and the processor count cost. As we balloon our RAM, we can bring our iteration cost down. As we require more threads to work on the hash, we can bring that iteration count down even further. Regardless, we are trying to target 0.5s for an interactive password login, and a full 5 seconds for password-based encryption key derivation.

Conclusion

So, what's the point? When hashing passwords, whether to store them on disk, or to create encryption keys, you should be using password-based cryptographic primitives that were specifically designed for this problem. You should not be using general purpose hashing functions of any type, because of their speed. Further, you should not be rolling out your own "key-stretching" algorithm, such as recursively hashing your password digest and additional output.

Just keep in mind- if the algorithm was specifically designed to handle passwords, and the cost is sufficient for your needs, threat model, and adversary, then you're doing just fine. Really, you can't go wrong with any of them. Just avoid any algorithm not specifically designed around passwords. The goal is security through obesity.

Best practice? In order of preference, use:

  1. scrypt
  2. bcrypt
  3. Argon2
  4. sha512crypt
  5. sha256crypt
  6. PBKDF2

Do not use:

  1. MD5
  2. md5crypt
  3. UNIX crypt(3)
  4. SHA-1/2/3
  5. Skein
  6. BLAKE2
  7. Any general purpose hashing function.
  8. Any encryption algorithm.
  9. Your own design.
  10. Plaintext

The Physics of Brute Force

Introduction

Recently, MyDataAngel launched a Kickstarter project to sell a proprietary encryption algorithm and software with 512-bit and 768-bit symmetric keys. The motivation was that 128-bit and 256-bit symmetric keys just aren't strong enough, especially when AES and OpenSSL are older than your car (a common criticism they would mention in their vlogs). Back in 2009, Bruce Schneier blogged about Crypteto having a 49,152-bit symmetric key. As such, their crypto is 100% stronger, because their key is 100% bigger (than 4096-bit keys?). Meganet, which apparently still exists, has a 1 million-bit symmetric key!

It's hard to take these encryption products seriously when there are no published papers on existing primitives, no security or cryptography experts on the team, and they're selling products with ridiculous key lengths (to be fair, 512-bit and 768-bit symmetric keys aren't really that ridiculous). Never mind that the proprietary encryption algorithm is not peer-reviewed nor freely available to the public. Anyone can create a symmetric encryption algorithm that they themselves cannot break. The trick is releasing your algorithm for peer review, letting existing cryptography experts analyze the design, and still coming out on top with a strong algorithm (it wouldn't hurt if you analyzed existing algorithms and published papers yourself).

So with that, I want to talk a bit about the length of symmetric keys, and what it takes to brute force them. Bruce Schneier addressed this in his "Applied Cryptography" book through the laws of thermodynamics. Unfortunately, he got some of the constants wrong. Although the conclusion is basically the same, I'm going to give you the same argument, with updated constants, and we'll see if we come to the same conclusion.

Counting Bits

Suppose you want to see how many bits you can flip in one day by counting in binary every second. Of course, when you start counting, you would start with "0", and your first second would flip your first bit to "1". Your second second would flip your second bit to "1" while also flipping your first bit back to "0". Your third second would flip the first bit back to "1", and so forth. Here is a simple GIF (pronounced with a hard "G") counting from 0 to 127, flipping bits each second.

Bit counter animation counting from 0 to 127.

By the end of a 24-hour period, I would have hit 86,400 seconds, which is represented as a 17-bit number. In other words, every 24 hours, flipping 1 bit per second, I can flip every combination of bits in a 16-bit number.

Binary representation of 86400

By the end of a single year, we end up with a 25-bit number, which means flipping a single bit every second can flip every combination of 24-bits every year.

Binary representation of 31536000

So, the obvious question is then this- what is the largest combination of bits that I can flip through to exhaustion? More importantly, how many computers would I need to do this work (what is this going to cost)?
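
Those bit widths fall straight out of a base-2 logarithm, if you want to check the arithmetic:

import math

print(math.log2(86400))     # ~16.4, so a day of seconds fits in a 17-bit number
print(math.log2(31536000))  # ~24.9, so a year of seconds fits in a 25-bit number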

Some Basic Physics

One of the consequences of the second law of thermodynamics is that it requires energy to do a certain amount of work. This could be anything from lifting a box over your head, to walking, to even getting out of bed in the morning. This also includes computers and hard drives. When the computer wishes to store data on disk, energy is needed to do that work. This is expressed with the equation:

Energy = kT

Where "k" is Boltzmann's constant of 1.38064852×10−16 ergs per Kelvin, and "T" is the temperature of the system. I'm going to use ergs as our unit, as we are speaking about work, and an "erg" is a unit of energy. Of course, a "Kelvin" is a unit of temperature, where 0 Kelvin is defined as a system devoid of energy; also known as "absolute zero".

It would make the most sense to get our computer as cool as possible, to maximize our output while also minimizing our energy requirements. The current background radiation of outer space is about 2.72548 Kelvin. To run a computer cooler than that would require a heat pump, which means adding more energy to the system than is needed for our computation. So, we'll run this ideal computer at 2.72548 Kelvin.

As a result, this means that to flip a single bit with our ideal computer, it requires:

Energy = (1.38064852×10^-16 ergs per Kelvin) * (2.72548 Kelvin) = 3.762929928×10^-16 ergs

Some Energy Sources

The Sun

Now that we know our energy requirement, let's start looking at some energy sources. The total energy output from our star is about 1.2×10^34 Joules per year. Because one Joule is the same as 1×10^7 ergs, the total annual energy output of the Sun is about 1.2×10^41 ergs. So, doing some basic math:

Bits flipped = (1.2×10^41 ergs) / (3.762929928×10^-16 ergs per bit) = 3.189004374×10^56 bits

3.189004374×10^56 bits means I can flip every combination of bits in a 187-bit number, if I could harness 100% of the solar energy output from the Sun each year. Unfortunately, our Sun is a weak star.

A Supernova

A supernova is calculated to release something around 10^44 Joules, or 10^51 ergs, of energy. Doing that math:

Bits flipped = (10^51 ergs) / (3.762929928×10^-16 ergs per bit) = 2.657503608×10^66 bits

2.657503608×10^66 bits is approximately 2^220. Imagine flipping every bit in a 220-bit number in an orgy of computation.

A Hypernova

A hypernova is calculated to release something around 10^46 Joules, or 10^53 ergs, of energy. Doing that math:

Bits flipped = (10^53 ergs) / (3.762929928×10^-16 ergs per bit) = 2.657503608×10^68 bits

2.657503608×10^68 bits is approximately 2^227. This is a computation orgy turned up to 11.
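
All three of those figures come from the same division. Here's the arithmetic in a few lines of Python, using the constants from above:

import math

k = 1.38064852e-16   # Boltzmann's constant, ergs per Kelvin
t = 2.72548          # cosmic background temperature, Kelvin
erg_per_bit = k * t  # ~3.7629e-16 ergs to flip one bit

for name, ergs in (('Sun, annual', 1.2e41),
                   ('supernova', 1e51),
                   ('hypernova', 1e53)):
    bits = ergs / erg_per_bit
    print('%s: %.3e bit flips, ~2^%d' % (name, bits, math.log2(bits)))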

Of course, in all 3 cases, I would have to harness 100% of that energy into my ideal computer, to flip every combination of these bits. Never mind finding transportation to get me to that hypernova, the time taken in that travel (how many millions of light years away is it?), and the cost of the equipment to harness the released energy.

Bitcoin Mining

As a comparative study, Bitcoin mining has almost surpassed 2 quintillion SHA-256 hashes per second. If you don't think this is significant, it is. That's processing all of a 60-bit number (all 2^60 values) every second, or an 85-bit number (all 2^85 values) every year. This is hard evidence, right now, of a large scale 256-bit brute force computing project, and it's barely flipping all the bits in an 85-bit number every year. The hash rate would have to double (4 quintillion SHA-256 hashes every second) to surpass flipping all the bits in an 86-bit number every year.

Further, we do not have any evidence of any clustered supercomputing project that comes close to that processing rate. It can be argued that the rate of Bitcoin mining is the upper limit of what any group of well-funded organizations could afford (I think it's fair to argue several well-funded organizations are likely Bitcoin mining). To produce a valid conspiracy theory to counteract that claim, you would need to show evidence of organizations with their own semiconductor chip manufacturing that has outpaced ARM, AMD, Intel, and every other chip maker on the market by several orders of magnitude.

Regardless, we showed the amount of energy needed to flip every bit in a 256-bit number, and the laws of thermodynamics strongly imply that it's just not physically possible.

Asymmetric Cryptography

Things change when dealing with asymmetric cryptography. Now, instead of creating a secret 256-bit number, you're using mathematics, such as prime number factorization or elliptic curve equations. This changes things dramatically when dealing with key lengths, because even though we assume some mathematical problems are easy to calculate but hard to reverse, we need to deal with exceptionally large numbers to give us the security margins necessary to prove that hardness.

As such, it becomes less of a concern about energy, and more a concern about time. Of course, key length is important up to a point. We just showed with the second law of thermodynamics that brute forcing your way from 0 to 2^256 is just physically impossible. However, finding the prime factors of that 256-bit number is a much easier task, does not require as much energy, and can be done by calculating no more than half of the square root amount of numbers (in this case, 2^127, assuming we're only testing prime numbers).

As such, we need to deal with prime factors that are difficult to find. It turns out that it's not enough to just have a 512-bit private key to prevent the Bad Guys from finding your prime factors. This is largely because there are efficient algorithms for calculating and testing prime numbers. So, it must also be expensive to calculate and find those primes. Currently, best practice seems to be generating two 1024-bit prime factors to produce a 2048-bit private RSA key.

Table showing recommendations of key length from various authorities

Fixed-length Collision-resistant Hashing

Fixed-length collision-resistant hashing puts a different twist on brute force searching. The largest problem comes from the Birthday Attack. This states that if you have approximately sqrt(2 × 365 × ln 2) people in the room (about 23 people), the chances that any two people share the same birthday is 50%. Notice that this comes from any two people in the room. You haven't singled out one person and asked whether the other 22 people share that person's birthday. This isn't a targeted search; it's a blind search. You ask the first person their birthday, and compare it with the other 22 people in the room. Then you ask the second person their birthday, and compare it with the remaining 21 people. And so on and so forth. After working through all 23 people, comparing everyone's birthday to everyone else's, the odds you found a match between two random people is 50%.

Why is this important? Suppose you are processing data with SHA-1 (160-bit output). You only need to calculate about 2^80 SHA-1 hashes before the odds of finding a duplicate among the hashes you've already calculated reach 50%. As we just learned with Bitcoin, this is practical within one year with a large orchestrated effort. Turns out, SHA-1 is weaker than that (we only need to calculate 2^64 hashes for a 50% probability), which is why the cryptographic community has been pushing so hard to get everyone and everything away from SHA-1.
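
The birthday bound generalizes directly: for an n-bit digest, expect a collision with 50% probability after roughly the square root of the output space, or about 2^(n/2) hashes. A quick sketch:

import math

def birthday_bound(bits):
    # Hashes needed for a ~50% chance of any collision.
    return math.sqrt(2 * math.log(2) * 2**bits)

print('%.2e' % birthday_bound(160))  # SHA-1: ~1.2 x 2^80
print('%.2e' % birthday_bound(256))  # SHA-256: ~1.2 x 2^128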

Now you may understand why 384-bit and 512-bit (and more, up to 1024-bit) cryptographically secure fixed-length collision-resistant hashing functions exist. The Birthday Attack makes mincemeat of half our security margin, so the digest needs to be twice as long as the security level you're targeting.

Conclusion

As clearly demonstrated, the second law of thermodynamics provides a clear upper bound on what can be found with brute force searches. Of course, brute force searches are the least effective way to find the private keys you're looking for, and indeed, there are more efficient ways to get to the data. However, if you provide a proprietary encryption algorithm with a closed-source implementation that uses ridiculously long private keys, then it seems clear that you don't understand the physics behind brute force. If you can't grasp the simple concept of these upper bounds, why would I want to trust you and your product in other areas of security and data confidentiality?

Quantum computing does give us some far more efficient algorithms than classical computing can achieve, but even then, 256-bit keys still remain outside the practical realm of mythical quantum computing when brute force searching.

As I've stated many times before- trust the math.

Webcam Random Number Generation

A couple weeks ago, I purchased a lava lamp for $5 at a thrift store. It was in brand spanking new condition, and worked like a charm. The only thing going through my head at the time? I can't wait to point my webcam at it, and start generating some random numbers! Okay, well that, and mood lighting for the wife.

Anyway, I wrote a quickie Python script which will capture a frame from the webcam, hash it with a keyed BLAKE2, and output the result to a FIFO file to be processed. The BLAKE2 digest of the frame also becomes the key for the next BLAKE2 instance, making this script very CBC-like in execution (the first function is keyed from /dev/urandom, and each digest keys the next iteration).

#!/usr/bin/python

# Create true random seeds (near as we can tell) with your webcam.
#
# This script will use your webcam pointed at a source of entropy, keyed with
# random data from the OS CSPRNG. You could point the camera at:
#
#   * Lava lamps
#   * Plasma globes
#   * Double pendulums
#   * Rayleigh-Benard convection
#   * Brownian motion
#
# Performance is ~ 2 KiB/s.
# Requires pyblake2: https://pypi.python.org/pypi/pyblake2
#
# Released to the public domain.

import os
import cv2
import pyblake2

cap = cv2.VideoCapture(0)
webcamfile = '/tmp/webcamfile.fifo'
key = os.urandom(64)

try:
    os.mkfifo(webcamfile)
except OSError, e:
    print "Cannot create FIFO: {0}".format(e)
    raise SystemExit(1)
else:
    fifo = open(webcamfile, 'w+')

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Key BLAKE2b with the previous digest (CBC-like chaining), then
    # hash the raw frame buffer from the webcam.
    b2sum = pyblake2.blake2b(key)
    b2sum.update(frame)
    digest = b2sum.digest()
    key = digest

    fifo.write(digest)
    fifo.flush()

    cv2.imshow('webcamlamp', frame)
    k = cv2.waitKey(1) & 0xFF
    if k == 27:
        break

fifo.close()
os.remove(webcamfile)
cap.release()
cv2.destroyAllWindows()

As you'll notice in the code, you should point your webcam at a source of either chaotic randomness, like a lava lamp, or quantum randomness, like a plasma globe. Because the frame is whitened with a keyed BLAKE2, it can be treated as a true random number generator, or you can use it as a seed for a cryptographically secure pseudorandom number generator, such as those shipped with modern operating systems. If you do use this as a TRNG, realize that it's slow- it only operates at about 2 KiB/s.

Here is a screenshot of the webcam itself looking at a USB desk plasma globe, which you can purchase from ThinkGeek for $10.

Webcam view of a plasma globe in operation.

The data is sent to a FIFO in /tmp/. If you don't read from the FIFO and let the buffer fill, the script will block until you read data out of it. As such, you could do something like this to reseed your CSPRNG (of course, it's not increasing the entropy estimate, just reseeding the generator):

$ cat /tmp/webcamfile.fifo > /dev/random

Lava lamps and plasma globes are only the beginning. Anything quantum or chaotic that can be visually observed also works. Things like:

  • Double pendulums
  • Brownian motion
  • Rayleigh-Benard convection
  • CCD noise from the webcam itself
  • A bouncing ball on a sinusoidal vibrating table

So, there you have it. Plasma globes and lava lamps providing sufficiently random data via a webcam, either to be used as a secret seed, or as a TRNG itself. If you have other systems you could point a webcam at, or suggestions for improvement in the Python code, let me know in the comments.

CPU Jitter Entropy for the Linux Kernel

Normally, I keep a sharp eye on all things cryptographic-related with the Linux kernel. However, in 4.2, I missed something fantastic: jitterentropy_rng.ko. This is a Linux kernel module that measures the jitter of the high resolution timing available in modern CPUs, and uses this jitter as a source of true randomness. In fact, using the CPU timer as a source of true randomness isn't anything new. If you've read my blog for some time, you're already familiar with haveged(8). This daemon also collects CPU jitter and feeds the collected data into the kernel's random number generator.

The main site is at http://www.chronox.de/jent.html and the PDF paper describing the CPU jitter entropy can be found on the site.

So why the blog post about jitterentropy_rng.ko? Because now that we have something in the mainline kernel, we get a few benefits:

  1. More eyes are looking at the code, and can adjust, analyze, and refine the entropy gathering process, making sure it's neither too aggressive nor too conservative in its approach.
  2. We now have something that can collect entropy much earlier in the boot sequence, even before the random number generator has been initialized. This means we can have a properly seeded CSPRNG when the CSPRNG is initialized.
  3. While not available now, we could have a kernelspace daemon collecting entropy and feeding it to the CSPRNG without the need for extra software.
  4. This isn't just for servers, desktops, and VMs, but anything that runs the Linux kernel on a modern CPU, including Android phones, embedded devices, and SoCs.
  5. While haveged(8) has been a good solution for a long time, it has been heavily criticized, and it seems development on it has stalled. Here is another software solution for true randomness without the need of potentially dangerous 3rd party USB random number generators.
  6. You don't need Intel's RDRAND. Any modern CPU with a high resolution timer will work. AMD, SPARC, ARM, MIPS, PA-RISC, Power, etc.

As mentioned in the list, unfortunately, loading the kernel module doesn't automatically top off the entropy estimate of the internal state of the CSPRNG (/proc/sys/kernel/random/entropy_avail). As such, /dev/random will still block when the estimate is low or exhausted. So you'll still need to run a userspace daemon to prevent this behavior. The author has also shipped a clean, light userspace daemon that just reads the data provided by the jitterentropy_rng.ko kernel module, and uses ioctl(2) to increase the estimate. The jitterentropy_rng.ko module provides about 10 KBps of random data.
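
For the curious, crediting the kernel's entropy estimate from userspace goes through the RNDADDENTROPY ioctl(2) on /dev/random. This is a rough sketch of what such a daemon does (it needs root, and the constant comes from linux/random.h):

import fcntl
import struct

RNDADDENTROPY = 0x40085203  # _IOW('R', 0x03, int[2]) from linux/random.h

def add_entropy(data, bits):
    # struct rand_pool_info { int entropy_count; int buf_size; __u32 buf[]; }
    pool = struct.pack('ii', bits, len(data)) + data
    with open('/dev/random', 'wb') as rnd:
        fcntl.ioctl(rnd, RNDADDENTROPY, pool)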

Again, this isn't anything that something like haveged(8) doesn't already have access to. However, by taking advantage of a loaded kernel module, we can ensure that randomness is being collected before the CSPRNG is initialized. So, when CSPRNG initialization happens, we can ensure that it is properly seeded on first boot, minimizing the likelihood that exact keys will be created on distinct systems. This is something haveged(8) can't provide, as it runs entirely in userspace.

Unfortunately, jitterentropy-rngd(8) isn't available in the Debian repositories yet, so you'll need to download the compressed tarball from the author's website, and compile and install it yourself. However, he does ship a systemd(8) service file, which makes it easy to get the daemon up and running on boot with minimal effort.

I've had the jitterentropy_rng.ko module installed with the jitterentropy-rngd(8) userspace daemon running all day today, without haveged(8), and needless to say, I'm pleased. It keeps the CSPRNG entropy estimate sufficiently topped off for software that still relies on /dev/random (please stop doing this, developers- start using /dev/urandom please) and provides adequate performance. Near as I can tell, there is no character device created when loading the kernel module, so you can't access the unbiased data before it's fed into the CSPRNG. As such, I don't have a way to test its randomness quality. Supposedly, there is a way to access this via debugfs, but I haven't found it.

Anyway, I would recommend using jitterentropy_rng.ko and jitterentropy-rngd(8) over haveged(8) as the source for your randomness.

Weechat Relay With Let's Encrypt Certificates

I've been on IRC for a long time. Not as long as some, granted, but likely longer than most. I've had my hand in a number of IRC clients, mostly terminal-based. Yup, I was (shortly) using the ircII client, then (also shortly) BitchX. Then I found irssi, and stuck with that for a long time. Search irssi help topics on this blog, and you'll see just how long. Then, after getting hired at XMission in January 2012, I switched full-time to WeeChat. I haven't looked back. This IRC client is amazing.

One of the outstanding features of WeeChat is the relay, effectively turning your IRC client into a bouncer. This feature isn't unique- it's in irssi also. However, the irssi proxy does not support SSL (2009). The WeeChat relay does. And with Let's Encrypt certificates freely available, this is the perfect opportunity to use TLS with a trusted certificate.

This post assumes that you are running WeeChat on a box whose firewall you control. In my case, I run WeeChat on an externally available SSH server behind tmux. With Let's Encrypt certificates, you will need to provide a FQDN for your Common Name (CN). This is all part of the standard certificate verification procedure. I purchased a domain that points to the IP of that server, and you will need to do the same.

The official Let's Encrypt "certbot" package used for creating Let's Encrypt certificates is already available in Debian unstable. A simple "apt install certbot" will get that up and running for you. Once installed, you will need to create your certificate.

$ certbot certonly --standalone -d weechat.example.com -m aaron.toponce@gmail.com

Per Let's Encrypt documentation, you need ports 80 and 443 open to the world when creating and renewing your certificate. The execution will create several directories under "/etc/letsencrypt/", including four files for your domain:

# ls -l /etc/letsencrypt/
total 24
drwx------ 3 root root 4096 May 19 12:36 accounts/
drwx------ 3 root root 4096 May 19 12:39 archive/
drwxr-xr-x 2 root root 4096 May 19 12:39 csr/
drwx------ 2 root root 4096 May 19 12:39 keys/
drwx------ 3 root root 4096 May 19 12:39 live/
drwxr-xr-x 2 root root 4096 May 19 12:39 renewal/
# ls -l /etc/letsencrypt/live/weechat.example.com/
total 0
lrwxrwxrwx 1 root root 43 May 19 12:39 cert.pem -> ../../archive/weechat.example.com/cert1.pem
lrwxrwxrwx 1 root root 44 May 19 12:39 chain.pem -> ../../archive/weechat.example.com/chain1.pem
lrwxrwxrwx 1 root root 48 May 19 12:39 fullchain.pem -> ../../archive/weechat.example.com/fullchain1.pem
lrwxrwxrwx 1 root root 46 May 19 12:39 privkey.pem -> ../../archive/weechat.example.com/privkey1.pem

The "cert.pem" file is your public certificate for your CN. The "chain.pem" file in the Let's Encrypt intermediate certificate. The "fullchain.pem" file is the "cert.pem" and "chain.pem" files combined. Of course, the "privkey.pem" file is your private key. For the WeeChat relay, it needs the "privkey.pem" and "fullchain.pem" files combined into a single file.

Because the necessary directories under "/etc/letsencrypt/" are accessible only by the root user, you will need root access to copy the certificates out and make them available to WeeChat, which hopefully isn't running as root. Also, Let's Encrypt certificates need to be renewed no sooner than every 60 days and no later than every 90 days. So, not only will you want to automate renewing the certificate, but you'll probably want to automate moving it into the right directory when the renewal is complete.

As you can see from above, I set up my certificate on a Thursday at 12:39. So weekly, on Thursday at 12:39, I'll check to see if the certificate needs to be renewed. Because it won't renew any more frequently than every 60 days, but it must be renewed every 90 days, this gives me a 30-day window in which to get the certificate updated. So, I'll keep checking weekly. If a renewal isn't needed, the certbot(1) tool will gracefully exit. If a renewal is needed, the tool will update the certificate. Unfortunately, certbot(1) does not provide a useful exit code when renewals aren't needed, so rather than parsing text, I'll just copy the new certs into my WeeChat directory, regardless of whether they get updated or not.

So, in my root's crontab, I have the following:

39 12 * * 4 /usr/local/sbin/renew.sh

Where the contents of "/usr/local/sbin/renew.sh" are:

#!/bin/bash

certbot renew -q
cat /etc/letsencrypt/live/weechat.example.com/privkey.pem \
    /etc/letsencrypt/live/weechat.example.com/fullchain.pem > \
    ~aaron/.weechat/ssl/relay.pem
chown aaron.aaron ~aaron/.weechat/ssl/relay.pem

Now the only thing left to do is setup the relay itself in WeeChat. So, from within the client:

/relay sslcertkey
/relay add ssl.weechat 8443

You will need port 8443 open in your firewall, of course.

That's it. I have had some problems with certificate caching in WeechatAndroid it seems. So far, I have had to manually restart the relay in WeeChat, and flush the cache in WeechatAndroid and restart it to get the new certificate (I was previously using a self-signed certificate). Hopefully, this can also be automated, so I don't have to manually keep restarting the relay in WeeChat and flushing the cache in WeechatAndroid.

Regardless, this is how you use Let's Encrypt certificates with WeeChat SSL relay. Hopefully this is beneficial to someone.

Say Allo To Insecurity

Yesterday, Google announced two new encrypted messaging apps called "Allo" and "Duo". There has been some talk about the security of Allo's end-to-end encryption and incognito mode. Most of it was speculation, until Thai Duong blogged about it. Well, it's time to see what he said, and see if Allo stands up to scrutiny.

"Allo offers two chat modes: normal and incognito. Normal is the default, but incognito can be activated with one touch. I want to stress that both modes encrypt chat messages when they are in transit or at rest. The Allo clients talk to Google servers using QUIC or TLS 1.2. When messages are temporarily stored on our servers waiting for delivery they are also encrypted, and will be deleted as soon as they're delivered."

There are a few things in this paragraph that need some explanation. First, "both modes encrypt chat messages when they are in transit or at rest". This is good, but the devil is in the details. In transit, Thai explains how they're encrypted: "Allo clients talk to Google servers using QUIC or TLS 1.2". This has a couple of ramifications. First, this isn't end-to-end encryption (E2E). This is client-server encryption, which means both the server and the client are encrypting and decrypting the data. As a result, any Google employee with the appropriate privileges can read the messages delivered to Google servers. That's sort of the point of why E2E encryption exists- to prevent this from happening.

Second, kudos for storing the messages encrypted on disk. But, realize that Google has the master key to decrypt these messages if needed. Also, kudos for deleting them off of Google's servers as soon as they're delivered. However, just like VPN service providers promising they don't log your connections, Google promising not to log your messages sort of falls into this category. That is, although they might not be storing the messages right now, they may store them later, especially if presented with a warrant from law enforcement. So, Google promising to not store your messages really doesn't amount to much other than maybe they don't want to unnecessarily chew through disk unless forced. Just remember, Google isn't going to go to jail for you, so they will comply with law enforcement.

"In normal mode, an artificial intelligence run by Google (but no humans including the Allo team or anyone at Google) can read your messages. This AI will use machine learning to analyze your messages, understand what you want to do, and give you timely and useful suggestions. For example, if you want to have dinner, it'll recommend restaurants or book tables. If you want to watch movies, it can buy you tickets.

Like it or not, this AI will be super useful. It's like having a personal assistant that can run a lot of errands for you right in your pocket. Of course, to help it help you you'll have to entrust it with your chat messages. I really think that this is fine, because your chat messages are used to help you and you only, and contrary to popular beliefs Google never sells your personal information to anyone."

Herein lies the real reason why E2E is not enabled by default- Google would like to mine your messages, on your phone, and present you with real-time ads. Ads not just on your phone, but likely when you're logged into your Google account on your desktop or laptop as well. If the data is E2E encrypted, this poses a problem for a company that has made Big Bucks off of advertising. With incognito mode, you are enabling E2E encryption, and the AI no longer has access to this data. Application and browser ads become generic, or must get their mining elsewhere. Because Google is allowing an AI to mine Allo messages for targeted ads, could it be possible that this same AI could be mining other data on your phone for the same goal? Could this AI be mining your email, Twitter, Facebook, photos, and other data? Will this AI be shipping solely with Allo, or will it be a separate service in Android N?

While Google might not be selling your data, they are making a percentage of sales that come from ads. The more targeted the ads become, the more likely you are to make a purchase, and the more likely Google will be to get a percentage of that sale. Google isn't selling your data, but they are making money off of it.

"But what if I want to stay off the grid? What if I don't want even the AI or whatever to see my messages?

"That's fine. We understand your concerns. Everybody including me has something to hide. This is why we develop the incognito mode. In this mode, all messages are further encrypted using the Signal protocol, a state of the art end-to-end chat encryption protocol which ensures that only you and your recipients can read your messages."

WhatsApp, acquired by Facebook, and pushing nearly one billion active messaging accounts, recently enabled E2E encryption with the Signal Protocol as well. The difference is that with WhatsApp, E2E is the default for every account once they update their app. E2E is not the default for Allo; it is only enabled in incognito mode. So, if "everybody including me has something to hide", then why isn't E2E the default with Allo?

Thai then quotes a survey explaining that users want self-destructing messages more than E2E. He comments on that survey with (emphasis mine):

"So to most users what matters the most is not whether the NSA can read their messages, but the physical security of their devices, blocking unwanted people, and being able to delete messages already sent to other people. In other words, their threat model doesn't include the NSA, but their spouses, their kids, their friends, i.e., people around and near them. Of course it's very likely that users don't care because they don't know what the NSA has been up to. If people know that the NSA is collecting their dick pics, they probably want to block them too. At any rate, NSA is just one of the threat sources that can harm normal users."

Sure, my threat model is also losing my phone. I find that much more likely than either the NSA confiscating my phone, issuing a warrant to collect my data, or decrypting my traffic in real-time (which isn't practical anyway). However, while the NSA isn't in my threat model, the NSA should be in Google's threat model. In other words, Google should be worrying about the NSA for me.

This is why I created d-note, which is running at https://secrets.xmission.com. As a system administrator, I don't want to turn over logs to the NSA or any other organization. As such, the messages are encrypted server-side before being stored to disk, and destroyed immediately upon viewing. The goal isn't necessarily to protect the end user, but to protect the server administrator. Legitimately not being able to provide logs or data when a warrant is issued is extremely valuable.

Google should be protecting the "dick pics" of users from getting into the NSA's hands. Apple recently made a strong stand here against the FBI regarding Syed Farook's iPhone. Apple technically could not help the FBI, because of the protections that Apple baked into their product. Apple's hands were tied. As such, the FBI wanted to set a precedent about enabling government backdoors into the OS for future releases, so they would no longer be blocked from access. Apple is protecting the "dick pics" of its users from the NSA, FBI, and everyone else. Why isn't Google? As we mentioned earlier, the answer to that question is data mining and advertising revenue.

"This is why I think end-to-end encryption is not an end in itself, but rather a means to a real end which is disappearing messaging. End-to-end encryption without disappearing messaging doesn't cover all the risks a normal user could face, but disappearing messaging without end-to-end encryption is an illusion. Users need both to have privacy in a way that matters to them."

Emphases mine. So, Thai recognizes that disappearing messaging without E2E encryption is an illusion. So, why isn't it the default? The higher powers that be, likely. He mentions in his conclusion that he would like E2E to be the default, with a single tap: something of an "Always start in incognito" option, thus always starting with E2E and always having self-destructing messages. However, rather than opt-in, it should be opt-out. If the prior message history is more important to you than the security of E2E encryption and self-destructing messages, then it should be something that you switch. If SnapChat is so popular because of self-destructing messages, and WhatsApp has one billion users with E2E encryption by default, then Google, a company larger than both combined, should be able to do the same.

Finally, one point that Thai does not mention in his post: Allo is proprietary closed-source software. From a security perspective, this is problematic. First, because you don't have access to the source, you cannot audit it to make sure it holds up to its security claims. As security and software engineers, not having access to the source code should be a major blocker when considering the use of non-free software.

Second, without access to the source code, you cannot create reproducible builds. Even if you did have access to the source code, are you sure the binary you have installed matches the binary you can build? If not, how do you know the binary isn't spying on you? Or compromised? Or just compiled incorrectly, causing undesired behavior? Not being able to create reproducible builds of software means not being able to verify the integrity of the shipped binary. Debian is making it a high priority to ship packages with reproducible builds. It's important to Debian, because they want to be transparent with their userbase. If you don't trust Debian is doing what they claim, you can rebuild the binaries and packages yourself, and they should match what Debian has shipped.

I know this sounds very Richard Stallman and GNU, but proprietary closed-source software is scary when it comes to security. While your immediate threat model might just be those you interact with on a daily basis, the immediate threats to Google, Apple, SnapChat, and others are well-funded organizations that have legal weight. Ultimately, they're after your data, which in the end, puts them in your threat model. There are no safety or security guarantees with proprietary closed-source software. You are at the mercy of the software vendor to Do The Right Thing, and many companies just don't.

So, while Allo might be the new kid on the block with E2E encrypted and self-destructing messages, as I've shown, it can't be trusted for your security and privacy. You're best off ignoring it, not recommending it to family and friends, and sticking with Free Software alternatives where E2E messages are default.

How To Always Encrypt Chromium Saved Passwords On GNU/Linux - No Matter What

One of the things that has always bothered me about the Chromium project (the project the Google Chrome browser is based on) is that passwords are encrypted if, and only if, your operating system provides an authentication API through your account login. For example, on Windows, this is accomplished through the "CryptProtectData" function. This function uses your existing account credentials when logging into your computer as a "master key" to encrypt the passwords on your hard drive. For Mac OS X, this is accomplished with Keychain, and for GNU/Linux users, KWallet if you're running KDE or GNOME Keyring if you're running GNOME.

In all those cases, your saved passwords will be encrypted before getting saved to disk. But, what if you're like me, and do not fall into any of those situations? Now, granted, GNU/Linux and BSD users (you're welcome) make up about 3% of the desktop installs.

Graph showing operating system market share.

Of that 3%, although I don't have any numbers, maybe 2/3 run GNOME or KDE. That leaves 1 out of every 100 users where Chromium is not encrypting passwords on disk by default. For me, who lands in that 1%, this is unacceptable. So, I wanted a solution.

Before I go any further, let me identify the threat and adversary. The threat is offline disk analysis. I'm going to assume that you're keeping your operating system up-to-date with the latest security patches, and that your machine is not infected with malware. Instead, I'm going to assume that when you are finished using your machine, when you upgrade the hardware, or when a hard drive fails, the disk is discarded. I'm further going to assume that you either can't or didn't digitally wipe or physically destroy the drive once decommissioned. So, the threat is someone getting a hold of that drive, or laptop, or computer, and imaging the drive for analysis. This means that our adversary is a global adversary: it could be anyone.

Now, the obvious solution would be to run an encrypted filesystem on that drive. dm-crypt with or without LUKS makes this possible. But, let's assume you're not running FDE. Any options? In my case, I run eCryptfs, and store the Chromium data there, symbolically linking to it from the default location.

By default, Chromium stores its passwords in ~/.config/chromium/Default/Login\ Data. This is an SQLite 3.x database, and as mentioned, the passwords are stored in plaintext. A simple solution is to create an eCryptfs private directory, and symlink the database to that location. However, Chromium also stores cookies, caches, and other data in ~/.config/chromium/ that might be worth encrypting as well. So, you can just symlink the entire ~/.config/chromium/ directory to the eCryptfs mount.
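To see just how exposed those passwords are, here's a minimal sketch that dumps the logins database with Python's built-in sqlite3 module. It assumes Chromium's usual schema (a "logins" table with origin_url, username_value, and password_value columns), and that no OS keyring is in use, as described above:

#!/usr/bin/env python3
# Sketch: dump Chromium's saved logins to show they sit in plaintext
# on disk when no OS keyring is available. Assumes Chromium's usual
# schema: a "logins" table with origin_url, username_value, and
# password_value columns. Close Chromium first; it locks the database.
import os
import sqlite3

db_path = os.path.expanduser("~/.config/chromium/Default/Login Data")
conn = sqlite3.connect(db_path)

for url, user, password in conn.execute(
        "SELECT origin_url, username_value, password_value FROM logins"):
    # With no OS keyring, password_value is the password, unencrypted.
    print(url, user, password)

conn.close()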

I'll assume you've already set up eCryptfs and have it mounted to ~/Private/. If not, run the "ecryptfs-setup-private" command, follow the prompts, then run "ecryptfs-mount-private" to get it mounted to ~/Private/.

Make sure Chromium is not running and move the ~/.config/chromium/ directory to ~/Private/. Then create the necessary symlink, so Chromium does not create a new profile:

$ mv ~/.config/chromium/ ~/Private/
$ ln -s ~/Private/chromium/ ~/.config/

At this point, all your Chromium data is now stored in your eCryptfs encrypted filesystem, and Chromium will follow the symlink, reading and writing passwords in the encrypted mount. This means that whether you use KWallet, GNOME Keyring, or nothing at all, your passwords will always be encrypted on disk. Of course, in the SQLite 3.x database, the passwords are still in plaintext, but the database file is encrypted in eCryptfs, thus giving us the security we're looking for.

However, there is a caveat which needs to be mentioned. The entire security of the encryption rests solely on the entropy of your eCryptfs passphrase. If that passphrase does not have sufficient entropy to withstand a sophisticated attack from a well-funded organization (our global adversary), then all bets are off. Essentially, this eCryptfs solution is acting like a "master password", and all encryption strength rests on your ability to use a strong password, as measured by Shannon entropy. Current best practice to guard against an offline password cracking attack is to pick a password with at least 128 bits of entropy. You can use zxcvbn.js from Dropbox to estimate your passphrase entropy, which I have installed at http://ae7.st/ent/ (no, I'm not logging passphrases; save the page offline, pull your network cable, and run it locally if you don't believe me).
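If you want to gut-check that arithmetic, the entropy of a password drawn uniformly at random is just its length times log2 of the alphabet size. Here's a quick sketch; the alphabets are illustrative, not a recommendation:

#!/usr/bin/env python3
# Entropy of a uniformly random password: H = length * log2(alphabet size).
import math

def entropy_bits(length, alphabet_size):
    return length * math.log2(alphabet_size)

# How long must a random password be to reach ~128 bits of entropy?
for name, size in [("digits", 10), ("lowercase", 26),
                   ("alphanumeric", 62), ("printable ASCII", 94)]:
    length = math.ceil(128 / math.log2(size))
    print(f"{name}: {length} chars ({entropy_bits(length, size):.1f} bits)")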

Opera, VPNs, and Security

Yesterday, Opera announced that they are bundling a VPN with the latest release of their browser. This is what the release says:

Why we are adding free VPN in Opera

Bringing this important privacy improvement marks another step in building a browser that matches up to people’s expectations in 2016. When you think about it, many popular options offered by desktop browsers today were invented (quite frequently by Opera) many years ago. The innovation energy in the industry has been recently so focused on mobile, even if the desktop is still thriving.

In January, we were reviewing our product plans, and we realized that people need new features in order to browse the web efficiently in 2016. It also became apparent to us that what people need are not the same features that were relevant for their browsers ten years ago. This is why we today have more engineers than ever before working on new features for our desktop browser.

So far we have the native ad blocker. And, we’re introducing another major feature in just a matter of a few weeks; a native, unlimited and free VPN client, right inside your browser!

Enhanced privacy online with Opera’s free VPN

According to Global Web Index*, more than half a billion people (24% of the world’s internet population) have tried or are currently using VPN services. According to the research, the primary reasons for people to use a VPN are:

– To access better entertainment content (38%)
– To keep anonymity while browsing (30%)
– To access restricted networks and sites in my country (28%)
– To access restricted sites at work (27%)
– To communicate with friends/family abroad (24%)
– To access restricted news websites in my country (22%)

According to the research, young people are leading the way when it comes to VPN usage, with almost one third of people between 16-34 having used a VPN.

Better than traditional VPNs

Until now, most VPN services and proxy servers have been limited and based on a paid subscription. With a free, unlimited, native VPN that just works out-of-the-box and doesn’t require any subscription, Opera wants to make VPNs available to everyone.

That’s why Opera’s built-in free VPN feature is easy to use. To activate it, Mac users just need to click the Opera menu, select “Preferences” and toggle the feature VPN on, while Windows and Linux users need to go to the “Privacy and Security” section in “Settings” and enable VPN there. A button will appear in the browser address field, from which the user can see and change location (more locations will appear later), check whether their IP is exposed and review statistics for their data used. It’s free and unlimited to use, yet it offers several must-have options available in paid VPNs, such as:

  • Hide your IP address – Opera will replace your IP address with a virtual IP address, so it’s harder for sites to track your location and identify your computer. This means you can browse the web more privately.
  • Unblocking of firewalls and websites – Many countries, schools and workplaces block video-streaming sites, social networks and other services. By using a VPN you can access your favorite content, no matter where you are.
  • Public Wi-Fi security – When you’re surfing the web on public Wi-Fi, intruders can easily sniff data. By using a VPN, you can improve the security of your personal information.

There were a couple of things that stuck out to me rather quickly when reading this press release:

  1. Is it a true VPN, or just an HTTP proxy?
  2. If either a VPN or an HTTP proxy, how is it handling DNS requests?
  3. If an HTTP proxy, is the request through a transparent TLS connection to Opera?
  4. Why is the press release specifically absent about logs and tracking?

Well, some of these questions have been answered. First, it's not a true VPN. Instead, it's just an HTTP/HTTPS proxy. Here are the details:

How the “VPN” works

Once the user enables the feature in settings, Opera VPN sends API requests to https://api.surfeasy.com to obtain credentials and proxy IPs. The browser then talks to a proxy like de0.opera-proxy.net, and its IP address can only be resolved from within Opera when the VPN feature is turned on. It’s an HTTP/S proxy that requires authentication.

When the Opera browser with enabled VPN loads a page, it sends many requests to de0.opera-proxy.net with a Proxy-Authorization request header.

The Proxy-Authorization header decoded:
CC68FE24C34B5B2414FB1DC116342EADA7D5C46B:9B9BE3FAE674A33D1820315F4CC94372926C8210B6AEC0B662EC7CAD611D86A3

Since we’re talking about a proxy, these credentials can be used with de0.opera-proxy.net even when connecting from a different machine. This means that if you use the proxy on a computer with no Opera installed, you’ll get the same IP as when using Opera’s VPN.

From this, we can learn that it's not a VPN at all. In fact, it's not even deploying a TLS tunnel for the HTTP/S proxy. So, traditional HTTP requests will still be in the clear, just with a different target. And while a school or library might be filtering requests based on DNS, this HTTP/S proxy in Opera doesn't address more active smart filtering based on content.

Unfortunately, Help Net Security also suggests the use of a general VPN service provider (emphasis mine):

"What Opera offers is not a VPN as such. It's just a proxy for the browser. You still need a full VPN if privacy is what you care about (and you should care about your privacy). Other tools you use, including for example email clients like Outlook, won’t use this 'VPN'," Špaček told Help Net Security.

VPN service providers are scary. Sven Slootweg posted a "Don't use VPN services" Gist where he addresses some real concerns with using VPN service providers:

  • VPN service providers log connections and other metadata.
  • VPN service providers have full accounting and payment information of their customers.
  • VPNs really are just glorified proxies, and don't provide any meaningful security or privacy.
  • VPNs don't obfuscate your IP address like Tor, and your IP address is meaningless to trackers anyway.
  • VPN service providers exist, because it's easy money.

I don't fully agree with a couple of points (IP addresses are extremely valuable to trackers), but I think the overall topics Sven is trying to drive home are these: know how VPNs work, know who has access to data at the VPN endpoint(s), and know your security and privacy risks when using a VPN. There are valid times to use a VPN. Data is encrypted between your VPN client and the provider, so it is an easy way to get around restrictive firewalls, which you would think Opera would be trying to address with their HTTP/S proxy. You may also need to access your corporate internal network while "on the road", in which case using your corporate VPN server is needed.

But in both cases, understand the security and privacy concerns when using the VPN. Your VPN provider isn't going to go to jail for you. If the FBI catches unsavory traffic coming out of the VPN provider, you can rest assured they'll give the authorities all the logging, account, and payment information to comply with the request. You can rest assured that if your employer catches you breaking policy with the VPN, you will lose your VPN access, and possibly your job.

So, what to do? Well, realistically, if you want to obfuscate your traffic dynamically, securely, and pseudonymously, then use Tor. Install a Tor client on your machine, install a Tor proxy extension in your browser, and when you want to get around restrictive firewalls, flip the proxy switch, and get on Tor.

Of course, Tor isn't a security and privacy panacea. You still need to understand the risks associated with using Tor. For example, the extension you installed may not tunnel DNS requests through Tor. HTTP traffic is still in the clear when it leaves the Tor exit relay. Tor clients and extensions may contain vulnerabilities that reveal metadata about you. Basically, don't be ignorant or stupid with your Tor connection.
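As a concrete illustration of the DNS pitfall, here's a sketch that routes a request through a local Tor client's SOCKS port. It assumes the third-party requests and PySocks packages are installed, and a Tor client listening on the default 127.0.0.1:9050; note the "socks5h" scheme, which resolves hostnames through the proxy instead of leaking DNS to your local resolver:

#!/usr/bin/env python3
# Sketch: route an HTTP request through a local Tor client's SOCKS port.
# Assumes the third-party "requests" and "PySocks" packages, and a Tor
# client listening on the default 127.0.0.1:9050.
import requests

# "socks5h" resolves hostnames through the proxy, so DNS queries also
# go over Tor; plain "socks5" would leak DNS to your local resolver.
proxies = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

# check.torproject.org reports whether the request arrived via Tor.
print(requests.get("https://check.torproject.org/", proxies=proxies).text[:200])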

Regardless, I think we can take a few things away from this post:

  1. The Opera VPN is just an HTTP/S proxy.
  2. Opera is very likely logging all your traffic.
  3. Your Opera VPN browsing habits are likely unique enough for Opera to identify you.
  4. VPN service providers should be avoided.
  5. VPN service providers also are likely logging all your traffic.
  6. Your VPN service provider won't go to jail for you.
  7. When in doubt, use Tor, just understand the risks.

Tor and the CloudFlare Problem

Before I go anywhere with this post, let me make three things very clear:

  1. I do not work for CloudFlare.
  2. I work for a small local ISP in Utah.
  3. I have been using Tor probably almost as long as many of you have been alive.

I first blogged about Tor in 2006. I had discovered it around 2004, only a couple of years after its first release. I used it as a way to prevent my ISP and my employer from tracking what I was doing with my Internet connection. I would set up a simple SOCKS proxy in Firefox, then switch to it when I wanted to get on the Tor network, and switch away from it when I didn't. Oh, and you think latencies are bad on Tor now? You should have been on it back then.

Here is a metrics graph showing the time it took to download a 50 KiB file over the Tor network. Unfortunately, they don't have the data back when I started using the network, but you get a rough idea of what it was like:

Graph from metrics.torproject.org showing the latencies of downloading a 50 KiB file.

This makes a good deal of sense, because back then, ISPs didn't provide a lot of bandwidth to customers (it can be argued they still don't), and there weren't a lot of exit nodes in the Tor network to handle the bandwidth (again, it can be argued there still aren't enough):

Graph from metrics.torproject.org showing the bandwidth of relays in the network.

Graph from metrics.torproject.org showing the size of the Tor network

Spend some time on metrics.torproject.org looking over the historical data, and you'll get a good sense that using Tor in 2004 was a lot like getting data over dial-up. It was anything but pleasant.

What's the point? The point is that while things can still be improved (we need more exit nodes, and we need more bandwidth on each exit node), the Tor network's latencies, bandwidth, and relay count are in a good position compared to 12 years ago when I started using it. It also means that running large-scale attacks through the network is now practical.

So, where does CloudFlare fit into this? CloudFlare deploys captchas that you must solve before consuming a service behind the CloudFlare CDN. For example, while connected to Tor, visit medium.com, and you will be presented with a captcha, similar to something like this:

Screenshot of Chromium showing the need to solve a visual captcha while trying to browse medium.com while connected to Tor.

This has gotten a lot of criticism from the "cypherpunk" millennials who feel that Tor access should be unrestricted. If you follow the "#dontblocktor" hashtag on Twitter, you will see the continued repeated criticism of CloudFlare deploying these captchas to Tor users on their CDN. Some of the arguments include:

  • Solving the captcha may only bring up another, repeatedly, never being able to consume the website in question.
  • Visually impaired users cannot solve the visual captchas.
  • Non-native English speakers will not be able to solve the audio version of the captcha.
  • People using browsers that disable JavaScript will not be able to reach the page.
  • There may be other security concerns where the choice of Tor is preferred over not using Tor.

No doubt, all captchas on the Web should be reconsidered. Personally, for JavaScript-enabled browsers, I think forcing a proof-of-work puzzle onto the browser is transparent, and provides exactly the sort of rate-limiting needed for mitigating large-scale malicious attacks. For non-JavaScript browsers, captchas seem to be the best alternative for now, though I'm sure, as a society, we can find alternatives there too (such as network-based proof-of-work puzzles).
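For the curious, here's a minimal sketch of what a hashcash-style proof-of-work puzzle looks like. The parameters are illustrative, not from any deployed system:

#!/usr/bin/env python3
# Sketch of a hashcash-style proof-of-work puzzle: the server hands the
# client a random challenge, and the client must find a nonce such that
# SHA-256(challenge || nonce) falls below a difficulty target.
import hashlib
import os
from itertools import count

def solve(challenge: bytes, difficulty_bits: int) -> int:
    target = 2 ** (256 - difficulty_bits)
    for nonce in count():
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return int.from_bytes(digest, "big") < 2 ** (256 - difficulty_bits)

challenge = os.urandom(16)
nonce = solve(challenge, 20)          # ~2^20 hashes of work, on average
print(verify(challenge, nonce, 20))   # verification costs a single hash

The asymmetry is the point: the client burns CPU finding the nonce, while the server verifies it with one hash, which is exactly the rate-limiting property a captcha is trying to approximate.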

No doubt physical limitations, such as visual or audible impairments, can make solving a visual or audible captcha challenging, if not impossible. I don't have good solutions here except for JavaScript-based proof-of-work puzzles. But the real question that needs to be addressed is: why is CloudFlare deploying captchas for Tor users in the first place?

CloudFlare addressed this due to the ongoing criticism a select few on Twitter have given the company. The blog post "The Trouble with Tor" basically comes down to the following:

  1. You must pick two of security, anonymity, and convenience.
  2. CloudFlare is a large CDN that deals regularly with malicious traffic sourced from Tor exit relays.
  3. Captchas are a compromise, allowing Tor users to remain anonymous, while also getting access to the website.
  4. A CloudFlare CDN customer has an option in their control panel to whitelist Tor or captcha Tor.
  5. CloudFlare is investigating "blind token" proof-of-work client puzzles for something long-term.

I don't see anything unreasonable here. As a system administrator and security engineer for XMission, I understand and sympathize with CloudFlare's stance toward captchas, even if I don't agree with the implementation of the captcha itself. I have had to fight off malicious Tor traffic from our network many times during my employment, such as DNS and NTP amplification attacks, HTTP POST DDoS attacks, SQL injection and XSS attacks, and many others.

So, as CloudFlare put it in their reasonable post, how do you allow honest Tor users to consume the website with a high degree of convenience, while also minimizing and proactively mitigating malicious Tor traffic?

Again, I don't care for captchas, and wish they would die in a fire. But what should CloudFlare do? Should they abandon the captcha altogether? If so, how should they proactively prevent malicious Tor traffic from negatively impacting their customer base? It's easy and knee-jerky to post screenshots to Twitter with the "#dontblocktor" hashtag, and shame CloudFlare and the customer using the CDN. I don't think that's the right approach, personally (never mind that a captcha isn't a block; yes, semantics are important). I'm curious how many of those who are reacting to CloudFlare captchas are actual system or network administrators that have to deal with these attacks. Instead, I would try to architect solutions to the problem.

Personally, I see the following:

  • Consume CloudFlare without Tor. There are no captchas, but you sacrifice a level of anonymity.
  • Consume CloudFlare behind Tor, but understand the compromise you are making: solving captchas sacrifices convenience.
  • Consume CloudFlare behind a VPN, thus providing both anonymity and convenience.

If it really bothers you that you have to solve a captcha to reach a CloudFlare website, then rather than shaming CloudFlare, it might be worth your time to reach out to the site operator, and let them know about whitelisting Tor. If they engage in conversation, they may not have been aware of the configuration option, or they may have reasons why they want you to solve the captcha. Either way, you've come out ahead without the knee-jerking of #dontblocktor.

I guess in conclusion, while I hate captchas as much as the next guy, what would you do if you were employed by CloudFlare and in charge of this problem? What is a reasonable solution to keeping customers happy by mitigating malicious Tor traffic while also allowing honest Tor users to consume the website with high levels of convenience? Let's engage in discussion about how to create and architect these solutions, so we get as many people happy as possible: CloudFlare network admins, customers, and clients.

A final note about the term "block". The CloudFlare captcha is not blocking you from reading the website. Instead, it's rate-limiting you. Some will argue that you get caught in endless captcha loops, consistently solving them over and over, never actually reaching the service. Personally, I have never encountered this, but others swear it exists. At most, I've had to solve 3 captchas in a row, usually because I did not solve them quickly enough. I guess the effect is the same, but as already mentioned, the "#dontblocktor" hashtag is a knee-jerk, and incorrectly placed. Semantics are important, because CloudFlare is not actually blocking Tor, like Akamai does with "Access denied". It's one thing to deny a request outright; it's quite another to rate-limit requests.

Two OCB Block Cipher Mode Patents Expired Due To Nonpayment

Peter Gutmann on the "[Cryptography]" mailing list wrote some thoughts about the impending crypto monoculture of all-things-Bernstein that seems to be currently sweeping the crypto world. In his post, he mentions the following (emphasis mine):

The remaining mode is OCB, which I'd consider the best AEAD mode out there (it shares CBC's graceful-degradation property in which reuse or misuse of the IV doesn't lead to a total loss of security, only the authentication property breaks but not the confidentiality). Unfortunately it's patented, and even though there are fairly broad exceptions allowing it to be used in many situations, the legal minefield that ensues makes it untouchable for most potential users. For example does the prohibition on military use cover the situation where an open-source crypto package is used in a vendor library that's used in a medical insurance app that's used by the US Navy, or where banking transactions protected by TLS may include ones of a military nature (both of these are actual examples that affected decisions not to use OCB). Since no-one wants to call in lawyers every time a situation like this comes up, and indeed can't call in lawyers when the crypto is several levels away in the service stack, OCB won't be used even though it may be the best AEAD mode out there.

Dr. Matthew Green also wrote about authenticated encryption and block cipher modes. He had this to say about OCB mode (emphasis mine):

In performance terms Offset Codebook Mode blows the pants off of all the other modes I mention in this post. It's 'on-line' and doesn't require any real understanding of Galois fields to implement** -- you can implement the whole thing with a block cipher, some bit manipulation and XOR. If OCB was your kid, he'd play three sports and be on his way to Harvard. You'd brag about him to all your friends.

I've known that OCB mode was patented, and as a result, why it has not been included in OpenSSL and other cryptographic protocol implementations. Peter said it correctly: it is a legal minefield. However, I wanted to read up on the patents, their design, operation, etc., mostly because I wanted to get out of doing the dishes. Imagine my shock when I stumbled upon the following:

  • Patent 7,046,802 - Method and apparatus for facilitating efficient authenticated encryption
    • Status: Lapsed
  • Patent 7,200,227 - Method and apparatus for facilitating efficient authenticated encryption
    • Status: Lapsed

Not fully understanding what "Lapsed" means, I went to the official source: The United States Patent and Trademark Office website. I searched for those two patent numbers, and got the following:

  • Patent 7,046,802 - Method and apparatus for facilitating efficient authenticated encryption
    • Status: Patent Expired Due to NonPayment of Maintenance Fees Under 37 CFR 1.362
    • Status Date: 06-06-2014
  • Patent 7,200,227 - Method and apparatus for facilitating efficient authenticated encryption
    • Status: Patent Expired Due to NonPayment of Maintenance Fees Under 37 CFR 1.362
    • Status Date: 05-04-2015

Sure enough, Phillip Rogaway's first two patents regarding the OCB block cipher mode of encryption have expired due to nonpayment. I had to tweet this.

Patents 7949129 (Method and apparatus for facilitating efficient authenticated encryption) and 8321675 (Method and apparatus for facilitating efficient authenticated encryption) are still valid, however. I'm not sure how this applies to Charanjit Jutla's IAPM mode patents, now owned by IBM. Also, I don't know exactly which OCB modes patents 7,046,802 and 7,200,227 cover. OCB1 and OCB2? If someone can comment here, that would be great.

So, what does this mean for the cryptography world? It means that OCB covered by those two patents can now be implemented royalty-free, without fear of legal entanglements, in Free Software as well as proprietary and commercial software. OpenSSL, LibreSSL, BoringSSL, OpenPGP, Open Whisper Systems Signal, and so many other protocols, projects, and software should be able to implement OCB now.

All because Phillip Rogaway did not make the payments necessary to keep the patents valid. Two more software patents bite the dust.

Linux Kernel CSPRNG Performance

I'm hardly the first one to notice this, but I was having a discussion in ##crypto on Freenode about the Linux kernel CSPRNG performance. It was mentioned that the kernelspace CSPRNG was "horrendously slow". Personally, I found the performance sufficient for my needs, but I decided to entertain his definition. I'm glad I did; I wasn't disappointed.

Pull up a terminal, and run the following command, passing 1 GB of data from /dev/urandom to /dev/null:

$ dd if=/dev/urandom of=/dev/null bs=1M count=1024 iflag=fullblock  
1024+0 records in  
1024+0 records out  
1073741824 bytes (1.1 GB) copied, 80.1537 s, 13.4 MB/s
$ pv < /dev/urandom > /dev/null # cancel in a different terminal, unless you have "-S"
1.02GB 0:01:20 [13.3MB/s] [                   < =>                              ]

13.4 MBps of throughput for reading data directly out of the kernelspace CSPRNG. But, can we do better?

In the ##crypto channel, as I do across development mailing lists, forums, groups, and discussion channels, I recommend that developers not develop their own userspace CSPRNG. There are all sorts of pitfalls and traps waiting for you when you attempt it. Unless you know what you're doing, you could end up with a CSPRNG that isn't actually cryptographically secure (the "CS" in "CSPRNG").

However, what happens when I do actually run a userspace CSPRNG on the same machine? What can I expect out of performance? For example, I could implement AES-128 in CTR mode as a CSPRNG. In fact, we can do this with OpenSSL:

$ dd if=/dev/zero bs=10M count=1024 iflag=fullblock 2> /dev/null | openssl enc -aes-128-ctr -pass pass:"sHgEOKTB8bo/52eDszkHow==" -nosalt | dd of=/dev/null
20971520+0 records in
20971520+0 records out
10737418240 bytes (11 GB) copied, 15.3137 s, 701 MB/s
$ openssl enc -aes-128-ctr -pass pass:"sHgEOKTB8bo/52eDszkHow==" -nosalt < /dev/zero | pv > /dev/null
31.9GB 0:00:34 [ 953MB/s] [                                  < =>               ]

700-950 MBps (notice that dd(1) incurs a performance penalty). That's 52-70x the speed of reading the kernelspace CSPRNG directly. That's more than a full order of magnitude faster. However, this is on a box with AES-NI. What about disabling AES-NI on the same box? How badly does it damage performance, and how does it compare to reading the kernelspace CSPRNG? We can use OpenSSL speed(1SSL) to benchmark algorithms.

First, with AES-NI enabled:

$ openssl speed -elapsed -evp aes-128-ctr 2> /dev/null  
(...snip...)
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-ctr     468590.43k  1174849.02k  1873606.83k  2178642.60k  2244471.47k

And with AES-NI disabled:

$ OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-ctr 2> /dev/null  
(...snip...)  
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-ctr      74272.21k    83315.43k   340393.30k   390135.47k   391279.96k

In this case, we see about a 5x performance improvement when using the AES-NI instruction set as compared to when not using it. That's significant. And even with AES-NI disabled in userspace, we're still outperforming /dev/urandom by almost 30x.

Interestingly enough, even the OpenBSD CSPRNG (different hardware than previously tested), which uses ChaCha20, outperforms the Linux CSPRNG (although its userspace CSPRNG with openssl(1) doesn't outperform kernelspace):

% dd if=/dev/urandom of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 13.630 secs (78775541 bytes/sec)
% dd if=/dev/zero bs=1M count=1024 2> /dev/null | openssl enc -aes-128-ctr -pass pass:"sHgEOKTB8bo/52eDszkHow==" -nosalt | dd of=/dev/null 
2097152+0 records in
2097152+0 records out
1073741824 bytes transferred in 33.498 secs (32052998 bytes/sec)
% openssl speed -elapsed -evp aes-128-ctr 2> /dev/null
(...snip...)
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-ctr      41766.37k    46930.74k    49593.54k    50669.32k    50678.33k

Roughly 78 MBps for OpenBSD on an Intel Xeon CPU running at 2.80GHz. Basically, six times the speed of the Linux kernel CSPRNG on an Intel Xeon CPU running at 2.67GHz.

So why is the Linux CSPRNG so slow? And what can we do about it? Well, first, the kernel is using SHA-1 for its cryptographic primitive. In very loose terms, the CSPRNG hashes the input pool with SHA-1, and spits out the output to /dev/urandom. Its output is also its input, so it's digesting its own output.

But that's not all it's doing, actually. The first function adds data into the input pool without increasing the entropy estimate. Then, after adding those bytes, the input pool is mixed with a Skein-like mixing function. Then some math is done to credit the entropy estimator, and the system is polled for data to add to the input entropy pool: things like disk IO, CPU timings, interrupts, and user activity. Finally, we're ready to hash the data. This is done by extracting the data out of the input pool, and hashing it with SHA-1. But we don't want any recognizable output, so the output is left-rotated and folded in half. Then, and only then, is the data ready for consumption.
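To appreciate how convoluted that is, here's a very loose sketch of just the extraction step in Python. This is not the kernel's actual code, only the shape of the process described above:

#!/usr/bin/env python3
# A very loose sketch of the extraction step described above -- NOT the
# actual kernel code, just the shape of it: hash the pool with SHA-1,
# then mangle the output so raw SHA-1 state is never exposed.
import hashlib
import os

pool = bytearray(os.urandom(128))  # stand-in for the input entropy pool

def extract(pool: bytearray) -> bytes:
    digest = hashlib.sha1(pool).digest()          # 20 bytes
    # Fold the digest in half with XOR, per the description above,
    # so only 10 unrecognizable bytes reach the caller.
    half = len(digest) // 2
    return bytes(a ^ b for a, b in zip(digest[:half], digest[half:]))

print(extract(pool).hex())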

W.T.F.

Unfortunately, the Linux kernel CSPRNG is not based on any sound theoretical security design. It's very much a hodge-podge home-brew design by developers who think they know what they're doing, when in reality, they don't. In 2013, a security audit and analysis was performed on the Linux kernel CSPRNG (PDF), and concluded that not only is it not robust, but it has some weaknesses:

In the literature, four security notions for a PRNG with input have been proposed: resilience (RES), forward security (FWD), backward security (BWD) and robustness (ROB), with the latter being the strongest notion among them.

(...snip...)

Distributions Used in Attacks based on the Entropy Estimator As shown in Section 5.4, LINUX uses an internal Entropy Estimator on each input that continuously refreshes the internal state of the PRNG. We show that this estimator can be fooled in two ways. First, it is possible to define a distribution of zero entropy that the estimator will estimate of high entropy, secondly, it is possible to define a distribution of arbitrary high entropy that the estimator will estimate of zero entropy. This is due to the estimator conception: as it considers the timings of the events to estimate their entropy, regular events (but with unpredictable data) will be estimated with zero entropy, whereas irregular events (but with predictable data) will be estimated with high entropy.

(...snip...)

As shown in Section 5.7, it is possible to build a distribution D0 of null entropy for which the estimated entropy is high (cf. Lemma 3) and a distribution D1 of high entropy for which the estimated entropy is null (cf. Lemma 4). It is then possible to mount attacks on both /dev/random and /dev/urandom, which show that these two generators are not robust.

(...snip...)

We have proposed a new property for PRNG with input, that captures how it should accumulate the entropy of the input data into the internal state. This property actually expresses the real expected behavior of a PRNG after a state compromise, where it is expected that the PRNG quickly recovers enough entropy. We gave a precise assessment of Linux PRNG /dev/random and /dev/urandom security. In particular, we prove that these PRNGs are not robust. These properties are due to the behavior of the entropy estimator and the mixing function used to refresh its internal state. As pointed by Barak and Halevi [BH05], who advise against using run-time entropy estimation, we have shown vulnerabilities on the entropy estimator due to its use when data is transferred between pools in Linux PRNG. We therefore recommend that the functions of a PRNG do not rely on such an estimator.

Finally, we proposed a construction that meets our new property in the standard model and we showed that it is noticeably more efficient than the Linux PRNGs. We therefore recommend to use this construction whenever a PRNG with input is used for cryptography.

TL;DR? The Linux CSPRNG does not meet the definitions of a secure CSPRNG, per the PDF. It's not that it's theoretically broken; it's just not theoretically secure either. It has no theoretical foundation at all. This isn't great.

A replacement for random.c in the kernel would be to ditch the homebrew entropy collection, mixing, and output mangling, and instead stick with AES-128 in CTR mode. Of course, as per the PDF, the entropy collectors need serious work, but if AES-128-CTR were deployed as the CSPRNG instead of SHA-1, then the generator could take advantage of hardware AES performance, which, as I've shown, is exceptionally superior. It's frustrating, because the kernel already ships AES, so the code is already there. It's just not being utilized.
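As a sketch of the core construction (not a drop-in kernel patch, and assuming the third-party pyca/cryptography package), an AES-128-CTR generator is just a running counter encrypted under a secret key:

#!/usr/bin/env python3
# Userspace sketch of an AES-128-CTR generator: encrypt a running counter
# under a secret key. Assumes the third-party "cryptography" package.
# A real kernel replacement would live in C and need careful seeding
# and reseeding logic; this only shows the core construction.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

class CtrGenerator:
    def __init__(self, seed: bytes):         # seed: 16-byte key from the pool
        nonce = bytes(16)                     # the counter block, starting at 0
        self._enc = Cipher(algorithms.AES(seed), modes.CTR(nonce)).encryptor()

    def read(self, n: int) -> bytes:
        # The CTR keystream is the output: AES-CTR over an all-zero input.
        return self._enc.update(bytes(n))

gen = CtrGenerator(os.urandom(16))
print(gen.read(32).hex())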

The Linux kernel could have 1 GBps in CSPRNG output, but is deliberately choosing not to. That's like having a V12 turbo-charged sleeper, without the turbo, and only firing on 3 of the 12 cylinders, with a duct taped muffler on the back.

Why does 1 GBps of performance matter? How about wiping hard drives or secure data removal in general? At 20 MBps, we can't even saturate the write throughput of a single drive. At 1 GBps, we could saturate many simultaneously. As someone who wipes old employee workstations when they leave the company, backup servers with dozens of drives, and old decommissioned hardware, I see great benefit here.

Or how about HTTPS websites for a shared web hosting provider? I have seen HTTPS and SSH connections lag countless times while waiting on the CSPRNG. Not that the connections are being intentionally blocked, but because the load is so intense on the server that it just can't generate enough cryptographic randomness to keep up with requests.

I'm sure there are plenty of other examples where end userspace applications could benefit with improved performance of the CSPRNG. And, as shown, it can't be that difficult to implement correctly. The real question is, of course, who will do the work and submit the patch?

Cryptographic Hashing, Part I- Introduction

Introduction

Lately, I've been seeing some discussion online about cryptographic hashing functions, along with some confusion between a cryptographic digest, a cryptographic signature, and a message authentication code. At least in that last post, I think I did well defining and clarifying the differences between those terms, but I also feel like I could take this discussion a lot further. So, I decided to dedicate a series to generic cryptographic hashing functions, which will include building compression frameworks with security proofs, specific cryptographic hashing functions, and some applications of these functions. So, without further ado, let's get started.

Collisions

When we talk about a hashing function (cryptographic or otherwise), we are referring to any function that can take an arbitrary length of data and compress it into a fixed-length digest. Typically, this digest is called a "fingerprint", a "checksum", or a "hash". The goal is that any time we input the same data, our function outputs the same digest. Further, it's important that not only can I produce that digest, but anyone can produce the same digest. This gets us prepared for the Random Oracle, but we still have some ground to cover first.
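You can see this determinism with a few lines of Python and the standard hashlib module:

#!/usr/bin/env python3
# The defining behavior: same input, same fixed-length digest, no matter
# who computes it or how many times.
import hashlib

print(hashlib.sha256(b"hello world").hexdigest())
print(hashlib.sha256(b"hello world").hexdigest())     # identical digest
print(len(hashlib.sha256(b"x" * 1000000).digest()))   # always 32 bytes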

Because our hashing function has a fixed-length output, say 128 bits, an ideal function would map inputs uniformly across those outputs. In other words, our function maps each element in the domain (our data to be hashed) to exactly one element in the range (our actual hash). So, if our function produces 128-bit digests, then there are a total of 2^128 digests in the range. Again, we're speaking about an ideal hashing function.

However, we know that there are many more inputs than just 2^128; there are infinitely many, actually. But think about it for a second. Take the number zero, and send it through our hashing function. Increment that number by 1, then hash that number. Continue in this manner, assuming infinite computing resources and infinite time, until you've hashed every number from 0 through 2^128 - 1. Ideally, you've produced exactly 2^128 unique digests. But what happens when you hash one more input? Now we have what is called a collision. In other words, two distinct inputs were hashed to the same output. To put it formally:

Definition: A collision is when two distinct pieces of data hash to the same digest, checksum, or fingerprint.
Theorem: For any fixed-length hashing function, there are infinitely many collisions.
Proof: This can be proven using the pigeon-hole principle. Given a fixed-length hashing function with n bits of output, hashing 2^n + 1 distinct inputs must produce at least one collision in the range. Since there are infinitely many possible inputs, there are infinitely many collisions. Q.E.D.

I don't think I need to tell you how much larger infinity is than 2^128. As a result, collisions are overwhelming. In fact, would you like to see a collision in practice? Below are 2 different hexadecimal strings. The differences are very subtle, but they are indeed distinct. Here, we'll take the two strings and hash them with the well-known MD5 algorithm. Then, just to show I'm not cheating, we'll hash the same strings with SHA-1. While we produce a collision in MD5, we have distinct digests with SHA-1. Go ahead, and verify that you get the same results.

$ INPUT1=d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89\
55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b\
d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0\
e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70
$ INPUT2=d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89\
55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b\
d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0\
e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70
$ printf "$INPUT1" | xxd -r -p | md5sum
79054025255fb1a26e4bc422aef54eb4  -
$ printf "$INPUT2" | xxd -r -p | md5sum
79054025255fb1a26e4bc422aef54eb4  -
$ printf "$INPUT1" | xxd -r -p | sha1sum
a34473cf767c6108a5751a20971f1fdfba97690a  -
$ printf "$INPUT2" | xxd -r -p | sha1sum 
4283dd2d70af1ad3c2d5fdc917330bf502035658  -

Crazy, right? With an ideal hashing function, you might expect that finding a collision should require searching the entire 128-bit range by brute force. Unfortunately, even for an ideal function, a blind brute force search finds collisions much faster than that, thanks to the Birthday Paradox. The Birthday Paradox says the following:

In a room of just 23 people, there is a 50% probability that at least two of them share the same birthday. In a room of just 75 people, there is a 99.9% probability that at least two of them share the same birthday.

Wait, what? Uhm, last I checked, there are 366 days in a year, assuming a leap year. Soooo, if there are 23 people in a room, then there should be a 23/366, or about a 6%, probability that two people share the same birthday. Unfortunately, this isn't how it works. There may be a 6% chance someone shares your birthday, but there is a 50% chance two arbitrary people share the same birthday. Now do you see the problem? Not only must you compare your birthday to everyone, but so must everyone else. This is a case of counting pairs. So, with 23 people in the room, there are actually 253 possible comparisons that must be made (23*22/2). The math gets a little hairy, and to be honest, it's a bit outside the scope of this post, and this series (it's going to be long enough as it is). Refer to the Wikipedia article if you want to work through the theory and the proof.
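If you don't want to wade through the theory, you can at least verify the numbers with a few lines of Python:

#!/usr/bin/env python3
# Exact birthday-paradox probability: the chance that at least two of n
# people share a birthday, out of 366 possible days.
def p_collision(n, days=366):
    p_unique = 1.0
    for i in range(n):
        p_unique *= (days - i) / days
    return 1 - p_unique

print(round(p_collision(23), 3))   # ~0.506
print(round(p_collision(75), 4))   # ~0.9997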

We can use this Birthday Paradox to work out an attack on finding two distinct inputs that produce an identical digest. This is called the Birthday Attack, and it's the primary driver in finding collisions. The attack basically says something like this:

To find a collision in an n-bit range with approximately 50% probability, you only need to search about the square root of the number of elements in the range, or 2^(n/2) inputs.

So, for a 128-bit digest (2^128 possible distinct outputs), using the Birthday Attack, I only need to search 2^64 possible inputs to have approximately a 50% probability that I have found a collision. If you don't think 2^64 is small, consider that the Bitcoin network is currently mining 2^64 SHA-256 digests about every 20 seconds.
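You can watch the Birthday Attack work on a deliberately weakened hash. The sketch below truncates SHA-256 to 32 bits, so a collision should turn up after roughly 2^16 attempts rather than 2^32:

#!/usr/bin/env python3
# Birthday attack demo on a deliberately tiny 32-bit hash (SHA-256
# truncated to 4 bytes): a collision appears after roughly 2^16 tries,
# the square root of the 2^32 range.
import hashlib

def tiny_hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()[:4]   # 32-bit "hash"

seen = {}
i = 0
while True:
    msg = str(i).encode()
    digest = tiny_hash(msg)
    if digest in seen:
        print(f"collision after {i} tries: {seen[digest]} vs {msg}")
        break
    seen[digest] = msg
    i += 1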

Blind, preimage, and second preimage collisions

Armed with this knowledge, we can now formalize some definitions of collision attacks. This might be confusing, so I'll define it first, then give some examples.

Collision attack:
A blind search, where two distinct inputs produce the same digest.
Preimage attack:
A search to find an input that matches a defined digest.
Second preimage attack:
A search to find a second file that matches the digest of a defined file.

Let's break these down individually. A collision search is literally a blind search, without any respect to inputs or outputs. You don't know what the inputs will be nor do you know what their outputs will be. You only know that you have found two distinct inputs that collide to the same output, all of which is entirely arbitrary.

A preimage attack is where you have a digest in your possession, but you would like to find an input that matches it. In this case, while the input is completely arbitrary, the output is static. For example, suppose you have the 256-bit hexadecimal digest "ec58d903a9f9dcc9d783da72401b1c94fc8fb9d9623d7141b8b90997382088f9". A preimage attack would be successfully finding the input that produced it. In this case, it was "Cz3eJlm4I2I2rHt8hioZ7evonLyukwlz".

A second preimage attack means having both the input "Cz3eJlm4I2I2rHt8hioZ7evonLyukwlz" and its 256-bit hexadecimal digest "ec58d903a9f9dcc9d783da72401b1c94fc8fb9d9623d7141b8b90997382088f9", and finding a second input that produces that same digest.

Usually, when breaking cryptographic hash functions, the first thing to break is the compression function, which I'll cover in later posts. Once the compression function is broken, the next step is to break the search for blind collisions. This is generally done by analyzing the weaknesses in the mathematics, finding bias in the output, observing the quality of the avalanche effect, and so forth. You eventually learn where the hashing function is weak, and where you can take "shortcuts" to get to your goal. Eventually, the algorithm is broken to the point that finding blind collisions is practical. MD5 is broken in this regard.

After breaking the compression function, and weakening the algorithm to the point of practical collision attacks, preimage attacks become the next focus of analysis. However, when the compression function is broken, such as in the case of SHA-1, it's a strong sign to start moving away from the algorithm, long before collisions are found. So, analysis tends to slow down after collisions have been found, because no one should be using the function anymore. This also means finding second preimage collisions gets even less attention.

Avalanche Effect

The final property of cryptographic hashing functions that needs to be addressed is the "avalanche effect". It is absolutely critical in cryptographic hashing functions that even though inputs may be sequential, their outputs do not show that to be the case. For example, consider the SHA-256 digests of the first 10 counting numbers:

$ for I in {1..10}; do printf "$I: "; printf "$I" | sha256sum -; done
1: 6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b  -
2: d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35  -
3: 4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce  -
4: 4b227777d4dd1fc61c6f884f48641d02b4d121d3fd328cb08b5531fcacdabf8a  -
5: ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d  -
6: e7f6c011776e8db7cd330b54174fd76f7d0216b612387a5ffcfb81e6f0919683  -
7: 7902699be42c8a8e46fbbb4501726517e86b22c56a189f7625a6da49081b2451  -
8: 2c624232cdd221771294dfbb310aca000a0df6ac8b66b696d90ef06fdefb64a3  -
9: 19581e27de7ced00ff1ce50b2047e7a567c76b1cbaebabe5ef03f7c3017bb5b7  -
10: 4a44dc15364204a80fe80e9039455cc1608281820fe2b24f1e5233ade6af1dd5  -

Notice that there is no clear relationship between sequential inputs and their digests. For all practical purposes, the outputs are randomized, despite each input differing only slightly from the previous one. However, can we formally define the avalanche effect? What would be ideal is that with each bit change on the input, every bit in the digest output has as close to a 50% chance of being flipped as theoretically possible.

I'll talk more about "rounds" in future posts when I talk about specific implementations and designs. Suffice it to say that a cryptographic hashing function will iterate through the compression function a certain number of times before outputting the state. On each round, the bits in the output should each have a 50% chance of being flipped. So, on each output of each round iteration, close to half of the bits have been flipped in some pseudorandom manner. After a certain number of rounds, the final output should be indistinguishable from true random noise.

So, how about this as a formal definition:

When a single input bit is flipped, each output bit should change with a 50% probability.
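We can measure how close a real function comes to that ideal. The following sketch flips each input bit in turn and counts how many of SHA-256's 256 output bits change; the average should hover around 128:

#!/usr/bin/env python3
# Measure the avalanche effect: flip one input bit and count how many
# of SHA-256's 256 output bits change. Ideally, about half (128).
import hashlib

def bits_changed(data: bytes, bit: int) -> int:
    flipped = bytearray(data)
    flipped[bit // 8] ^= 1 << (bit % 8)        # flip a single input bit
    a = hashlib.sha256(data).digest()
    b = hashlib.sha256(bytes(flipped)).digest()
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

msg = b"avalanche effect test"
changes = [bits_changed(msg, bit) for bit in range(len(msg) * 8)]
print(sum(changes) / len(changes))             # should hover around 128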

Of course, the cryptographic strength doesn't rest solely on the avalanche effect. There are mathematical properties that determine that. But the output should be completely unpredictable. You could apply the "next bit test": there should be no algorithm that can determine the state of the next bit without actually compromising the state of the machine (this is a test applied to cryptographically secure pseudorandom number generators).

Unfortunately, all we have to test the avalanche effect is standard randomness tests, such as the chi-square distribution, Monte Carlo for Pi, and the Birthday Paradox, among others. This doesn't say anything about the cryptographic strength of the hashing function, but says a lot about randomness properties (non-cryptographic hashing functions can also exhibit strong randomness qualities).

There are a couple of software utilities we can use to test and analyze cryptographic hashing functions. First, we have standard randomness tests, such as Dieharder and the FIPS 140-2 suite. But for something more specific to analyzing cryptographic primitives, I would recommend Cryptol. On the one hand, this isn't an out-of-the-box software solution for just running a battery of tests and analysis. It is actually a domain-specific language that will require a bit of a learning curve. On the other hand, it's Free Software, and you'll probably learn more about cryptanalysis with this tool than just playing with randomness tests.

Conclusion

This was just a primer post to get you thinking about cryptographic hashes, specifically thinking about their output, and the task of finding collisions. The rest of the posts in the series will cover specific functions such as MD5, SHA-1, -2, and -3, as well as some others. We'll talk about hashing constructions, and where you'll find cryptographic functions in practice (I think you'll be surprised). I may even throw in a post or two about random oracles, and how we want cryptographic hashing functions to not only imitate them, but be proven secure under the "Random Oracle Model".

Regardless, this post will get you started, and hopefully excited for what is to come.

Manual Authenticated File Encryption With OpenSSL

One thing that bothers me about OpenSSL is the lack of commandline support for AEAD ciphers, specifically AES in CCM and GCM block modes. Why does this matter? Suppose you want to save an encrypted file to disk, without GnuPG, because you don't want to get into key management. Further, suppose you want to send this data to a recipient or store it on a server outside of your full control. The authenticated encryption is important, otherwise the ciphertext is malleable and vulnerable to bit flipping.

So, when you get to the shell, you may try using AES in GCM mode with OpenSSL's "enc(1)" command, only to be left wanting. Here, we generate a key from /dev/urandom, convert it to hexadecimal, and provide the key as an argument on the command line.

$ LC_CTYPE=C tr -cd 'abcdefghjkmnpqrstuvwxyz23456789-' < /dev/urandom | head -c 20; echo
sec2tk24ppprcze33ucs
$ echo sec2tk24ppprcze33ucs | xxd -p
73656332746b323470707072637a6533337563730a
$ openssl enc -aes-256-gcm -k 73656332746b323470707072637a6533337563730a -out file.txt.aes -in file.txt
AEAD ciphers not supported by the enc utility
$ echo $?
1

So, rather than using GCM, we can build the authentication tag manually with HMAC-SHA-512, which OpenSSL does support. This means using a non-authenticated block cipher mode, such as CTR, as a first step, then authenticating the ciphertext manually as a second step.

Using our same password from the previous example, we'll do this in two steps now:

$ openssl enc -aes-256-ctr -k 73656332746b323470707072637a6533337563730a -out file.txt.aes -in file.txt
$ openssl dgst -sha512 -binary -mac HMAC -macopt hexkey:73656332746b323470707072637a6533337563730a -out file.txt.aes.mac file.txt.aes

Now you have three files: your plaintext file, your AES-encrypted ciphertext file, and your HMAC-SHA-512 authentication file:

$ ls -l file.txt*
-rw-rw-r--. 1 aaron aaron 1050 Feb 27 10:26 file.txt
-rw-rw-r--. 1 aaron aaron 1066 Feb 27 10:27 file.txt.aes
-rw-rw-r--. 1 aaron aaron   64 Feb 27 10:28 file.txt.aes.mac

When sending or remotely storing the "file.txt.aes" file, you'll want to make sure the "file.txt.aes.mac" authentication file accompanies it. Unfortunately, the OpenSSL dgst(1) command does not support verifying message authentication codes, so you'll have to script this manually: recompute the MAC over the received ciphertext, then compare it to the shipped MAC, in constant time, as explained earlier. If they match, you can decrypt the "file.txt.aes" ciphertext file. If not, discard the data.
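Here's a sketch of that verification step in Python, using the key from the examples above and hmac.compare_digest for the constant-time comparison. The file names are from the example and would be arguments in a real script:

#!/usr/bin/env python3
# Sketch of the verification step: recompute HMAC-SHA-512 over the
# ciphertext and compare against the shipped .mac file in constant time.
# Uses the same hex key passed to openssl dgst above.
import binascii
import hashlib
import hmac
import sys

key = binascii.unhexlify("73656332746b323470707072637a6533337563730a")

with open("file.txt.aes", "rb") as f:
    ciphertext = f.read()
with open("file.txt.aes.mac", "rb") as f:
    shipped_tag = f.read()

tag = hmac.new(key, ciphertext, hashlib.sha512).digest()

if hmac.compare_digest(tag, shipped_tag):
    print("MAC OK - safe to decrypt")
else:
    sys.exit("MAC mismatch - discard the ciphertext")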

This isn't elegant, and I wish enc(1) supported AEAD, but as it stands, it doesn't. So, you'll have to stick with doing things manually. However, this is something simple enough to script, and provides both data confidentiality and authenticity, which should be the goal of every ciphertext.