Aaron Toponce
https://pthree.org
Linux. GNU. Freedom.

1,000 Books Read In One Year? No, Not By A Long Shot
https://pthree.org/2017/09/27/1000-books-read-in-one-year-no-not-by-a-long-shot/
Wed, 27 Sep 2017 06:34:31 +0000

Recently, Goodreads sent out a tweet about how to remove social media and the Internet from your life, so you can focus on reading 1,000 books in one year. The post follows this sort of math:

  1. The average person reads 400 words per minute.
  2. The typical non-fiction book has around 50,000 words.
  3. Reading 200 books will take you 417 hours.
  4. The average person spends 608 hours on social media annually.
  5. The average person spends 1,642 hours watching TV annually.
  6. Giving up 2,250 hours annually will allow you to read 1,000 books in one year.

This blew my mind. I'm a very avid reader. Since signing up for Goodreads in 2013, I've been hitting at least 20,000 pages read every year, and I'm on track to read 25,000 pages this year. But, I'm only putting down 75 books each year. Now granted, a spare 2,250 hours per year ÷ 365 days per year is just over 6 hours per day of reading. I'm not reading 6 hours per day. I don't watch TV, I have a job, kids and a wife to take care of, and other things that keep me off the computer most of my time at home (I'm writing this blog post after midnight).

No doubt, 6 hours per day is a lot of reading. But I average 2 hours per day, and I'm only putting down 75 books annually. 6 hours of reading per day would only put me around 225 books each year, a far cry from the 1,000 I should be hitting. What gives?

Well, it turns out, Charles Chu is being a bit ... liberal with his figures. First off, the average person does not read 400 words per minute. Try about half, at only 200 words per minute, according to Iris Reading, a company that sells a product on improving your reading speed and memory comprehension. This cuts our max books from 1,000 in a year to 500.

Second, Chu claims the average non-fiction book is 50,000 words in length. I can tell you that 50,000 words is a very slim novel. This feels like a Louis L'Amour western length to me. Most books that I have read are probably closer to twice that length. However, according to HuffPost which quotes Amazon Text Stats, the average book is 64,000 words in length. But, according to this blog post by Writers Workshop, the average "other fiction" novel length is 70,000 to 120,000 words. This feels much more in line with what I've read personally, so I'll go with about 100,000 words in a typical non-fiction novel.

So now that brings our annual total down from 500 to 250 books. That's reading 200 words per minute, for 6 hours every day, with non-fiction books that average 100,000 words in length. I claimed that I would probably come in around 225 books, so this seems to be a much closer ballpark figure.

But, does it line up? Let's look at it another way, and see if we can agree that 200-250 books annually, reading 6 hours per day, is more realistic.

I claimed I'm reading about 2 hours per day. I read about 3 hours for 4 of the 7 days in a week while commuting to work. For the other three days in the week, I can read anywhere from 1 hour to 3 hours, depending on circumstances. So my week can see anywhere from 13 hours to 15 hours on average. That's about 2 hours per day.

During 2016, I read 24,048 pages. That's about 65 pages per day, which feels right on target. But, how many words are there per page? According to this Google Answers answer, which offers a couple citations, a novel averages about 250 words per page.

But, readinglength.com shows that many books I've read are over 300 words per page, and some denser at 350 words per page, with the average sitting around 310. So 250 words per page at 65 pages per day is 16,250 words per day, and 310 words per page at 65 pages per day is 20,150 words per day that I'm reading.

Because I'm only reading about 2 hours per day, that means I'm reading at a meager 135 to 168 words per minute, based on the above stats. I guess I'm a slow reader.

If I highball it at 168 words per minute, then in 6 hours, I will have read 60,480 words. After a year of reading, that's 22,075,200 words. An independent blog post confirms this finding of 250-300 words per page, but also uses that to say that most adult books are 90,000 - 100,000 words in length (additional confirmation from earlier), and young adult novels target the 55,000 word length that Chu cited (maybe Chu likes reading young adult non-fiction?). As such, I can expect to read 22,075,200 words per year ÷ 100,000 words per book, or about 220 books in a year of reading 6 hours every day.

Bingo.

So, what can we realistically expect from reading?

  1. Readers average 200 words per minute.
  2. A page averages 250 words.
  3. A novel averages 100,000 words.
  4. One hour of reading per day can hit 30-40 books per year.
  5. Six hours of reading per day can hit 200-250 books per year.
  6. To read 1,000 books in a year, you need to read 22 hours per day.
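
The arithmetic behind that list can be sketched in a few lines of Python. This is only a back-of-the-envelope model, not anything taken from Goodreads or Chu. The function names are my own, and it assumes a constant reading speed and 365 reading days per year.

```python
MINUTES_PER_HOUR = 60
DAYS_PER_YEAR = 365  # assumption: reading every single day


def books_per_year(words_per_minute, hours_per_day, words_per_book):
    """Books finished in a year at a constant reading speed."""
    words_per_year = (
        words_per_minute * MINUTES_PER_HOUR * hours_per_day * DAYS_PER_YEAR
    )
    return words_per_year / words_per_book


def hours_per_day_needed(books, words_per_minute, words_per_book):
    """Daily reading hours required to finish `books` books in a year."""
    total_minutes = books * words_per_book / words_per_minute
    return total_minutes / MINUTES_PER_HOUR / DAYS_PER_YEAR


# 168 wpm for 6 hours per day, with 100,000-word books.
print(round(books_per_year(168, 6, 100_000)))               # 221
# The 1,000-book goal at 200 wpm, with 100,000-word books.
print(round(hours_per_day_needed(1000, 200, 100_000), 1))   # 22.8
```

Plugging in your own reading speed and typical book length is the quickest way to see how far a given goal is from realistic.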

This is reading average length adult non-fiction books at an average speed of 200 words per minute. The calculus completely changes if your average reading speed is faster than 200 wpm, you read primarily graphic novels with little text, or read shorter non-fiction novels. Fourth-grade chapter books? Yeah, I could read 1,000 of those in a year. 🙂

Password Best Practices I - The Generator
https://pthree.org/2017/09/18/password-best-practices-i-the-generator/
Mon, 18 Sep 2017 13:00:46 +0000

This is the first in a series of posts about password best practices. The series will cover best practices from a few different angles- the generator targeted at developers creating those generators, the end user (you, mom, dad, etc.) as you select passwords for accounts from those generators, and the service provider storing passwords in the database for accounts that your users are signing up for.

Motivation

When end users are looking for passwords, they may turn to password generators, whether they be browser extensions, websites, or offline installable executables. Regardless, as a developer, you will need to ensure that the passwords you are providing for your users are secure. Unfortunately, that's a bit of a buzzword, and can be highly subjective. So, we'll motivate what it means to be "secure" here:

  • The generator is downloaded via HTTPS, whether it's a website, executable ZIP, or browser extension.
  • The generator uses a cryptographically secure random number generator.
  • The generator provides at least 70-bits of entropy behind the password.
  • The generator is open source.
  • The generator generates passwords client-side, not server-side.
  • The generator does not serve any ads or client-side tracking software.

I think most of us can agree on these points- the software should be downloaded over HTTPS to mitigate man-in-the-middle attacks. A cryptographically secure RNG should be used to ensure unpredictability in the generated password. In addition to that, the CRNG should also be uniformly distributed across the set, so no elements of the password are more likely to appear than any other. Creating an open source password generator ensures that the software can be audited for correctness and instills trust in the application. Generating passwords client-side means the server has no possible way of knowing what passwords were generated, unless the client is also calling home (the code should be inspected). And of course, we don't want any adware or malware installed in the password generating application to further compromise the security of the generator.
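
To make the "cryptographically secure and uniformly distributed" requirement concrete, here is a minimal sketch in Python. It is not how any particular generator is implemented. The tiny word list and the `generate_passphrase` name are placeholders of my own, and a real generator would load a full list of several thousand words.

```python
import secrets

# Hypothetical miniature word list, for illustration only.
WORDS = ["gnu", "hush", "gut", "modem", "scamp", "giddy", "soot", "tiger"]


def generate_passphrase(words, count):
    """Pick `count` words independently and uniformly at random."""
    # secrets.choice draws from the operating system's CSPRNG and selects
    # uniformly, avoiding the bias of a naive `random_byte % len(words)`.
    return [secrets.choice(words) for _ in range(count)]


passphrase = generate_passphrase(WORDS, 6)
print(" ".join(passphrase))
```

The important design choice is what is absent: no `random.random()`, no timestamps as seeds, and no modulo reduction of raw random bytes.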

Okay. That's all well and good, but what about this claim to generate passwords from at least 70-bits in entropy? Let's dig into that.

Brute force password cracking

Password cracking is all about reducing possibilities. Professional password crackers will have access to extensive word lists of previously compromised password databases, they'll have access to a great amount of hardware to rip through the password space, and they'll employ clever tricks in the password cracking software, such as Hashcat or MDXFind, to further reduce the search space, to make finding the passwords more likely. In practice, 90% of leaked hashed password databases are reversed trivially. With the remaining 10%, half of that space takes some time to find, but those passwords are usually recovered. The remaining few, maybe 3%-5%, contain enough entropy that the password cracking team likely won't recover those passwords in a week, or a month, or even a year.

So the question is this- what is that minimum entropy value that thwarts password crackers? To answer this question, let's look at some real-life brute force searching to see if we can get a good handle on the absolute minimum security margin necessary to keep your client's leaked password hash out of reach.

Bitcoin mining

Bitcoin mining is the modern-day version of the 1849 California Gold Rush. As of right now, Bitcoin is trading at $3,665.17 per BTC. As such, people are fighting over each other to get in on the action, purchasing specialized mining hardware, called "Bitcoin ASICs", to find those Bitcoins as quickly as possible. These ASICs are hashing blocks of data with SHA-256, and checking a specific difficulty criteria to see if it meets the requirements as a valid Bitcoin block. If so, the miner that found that block is rewarded that Bitcoin and it's recorded in the never-ending, ever-expanding, non-scalable blockchain.

How many SHA-256 hashes is the world at large calculating? As of this writing, the current rate is 7,751,843.02 TH/s, which is 7,751,843,020,000,000,000 SHA-256 hashes per second. At one point, it peaked at 8,715,000 TH/s, and there is no doubt in my mind that it will pass 10,000,000 TH/s before the end of the year. So let's run with that value, of 10,000,000,000,000,000,000 SHA-256 hashes per second, or 10^19 SHA-256 hashes per second.

If we're going to talk about that in terms of bits, we need to convert it to a base-2 number, rather than base-10. Thankfully, this is easy enough. All we need to calculate is the log2(X) = log(X)/log(2). Doing some math, we see that Bitcoin mining is roughly flipping every combination of bits in a:

  • 63-bit number every second.
  • 69-bit number every minute.
  • 74-bit number every hour.
  • 79-bit number every day.
  • 84-bit number every month.
  • 88-bit number every year.
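
The list above can be reproduced with a short Python sketch. The helper names are mine, and two assumptions are baked in: a month is taken as 30 days, and the result is rounded down to a whole number of bits, which is what the figures in the list correspond to.

```python
import math

# Seconds in each period. The 30-day month is an assumption.
SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 3_600,
    "day": 86_400,
    "month": 30 * 86_400,
    "year": 365 * 86_400,
}


def exhausted_bits(hashes_per_second, seconds):
    """Largest whole number of bits fully searched in `seconds`."""
    return math.floor(math.log2(hashes_per_second * seconds))


def exhaustion_table(hashes_per_second):
    """Map each period name to the bit size exhausted in that period."""
    return {
        period: exhausted_bits(hashes_per_second, seconds)
        for period, seconds in SECONDS.items()
    }


bitcoin = exhaustion_table(10**19)
for period, bits in bitcoin.items():
    print(f"{bits}-bit number every {period}")
```

Passing a different hash rate to `exhaustion_table` gives the same breakdown for any other adversary.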

What does this look like? Well, the line is nearly flat. Here in this image, the x-axis is the number of days spent mining for Bitcoin, starting from 0 through a full year of 365 days. The y-axis is the search space exhaustion in bits. So, you can see that in roughly 45 days, Bitcoin mining has calculated enough SHA-256 hashes to completely exhaust an 85-bit search space:

Plot showing log(x*10^19*86400)/log(2) for Bitcoin mining.

Real-world password cracking

That's all fine and dandy, but I doubt professional password crackers have access to that sort of hardware. Instead, let's look at a more realistic example.

Recently, Australian security researcher Troy Hunt, the guy that runs https://haveibeenpwned.com/, released a ZIP of 320 million SHA-1 hashed passwords that he's collected over the years. Because the passwords were hashed with SHA-1, recovering them should be like shooting fish in a barrel. Sure enough, a team of password crackers got together, and made mincemeat of the dataset.

In the article, it is mentioned that they had a peak password cracking speed of 180 GHps, or 180,000,000,000 SHA-1 hashes per second, or 18*10^10 SHA-1 hashes per second. The article mentions that's the equivalent of 25 NVidia GTX1080 GPUs working in concert. To compare this to Bitcoin mining, the team was flipping every combination of bits in a:

  • 37-bit number every second.
  • 43-bit number every minute.
  • 49-bit number every hour.
  • 53-bit number every day.
  • 58-bit number every month.
  • 62-bit number every year.

As we can see, this is a far cry from the strength of Bitcoin mining. But, are those numbers larger than you expected? Let's see how it looks on the graph, compared to Bitcoin:

Plot showing log(x*18*10^10*86400)/log(2) for this cluster of password cracking hobbyists.

So, it seems clear that our security margin is somewhere above that line. Let's look at one more example, a theoretical one.

Theoretical password cracking by Edward Snowden

Before Edward Snowden became known to the world as Edward Snowden, he was known to Laura Poitras as "Citizenfour". In emails back-and-forth between Laura and himself, he told her (emphasis mine):

"Please confirm that no one has ever had a copy of your private key and that it uses a strong passphrase. Assume your adversary is capable of one trillion guesses per second. If the device you store the private key and enter your passphrase on has been hacked, it is trivial to decrypt our communications."

But one trillion guesses per second is only about 5x the collective power of our previous example of a small team of password cracking hobbyists. That's only about 125 NVidia GTX1080 GPUs. Certainly interested adversaries would have more money on hand to invest in more computing power than that. So, let's increase the rate to 10 trillion guesses per second. 1,250 NVidia GTX1080 GPUs would cost our adversary maybe $500,000. A serious investment, but possibly justifiable, and certainly not outside the $10 billion annual budget of the NSA. So let's roll with it.

At 10^13 password hashes per second, we are flipping every combination of bits in a:

  • 43-bit number every second.
  • 49-bit number every minute.
  • 54-bit number every hour.
  • 59-bit number every day.
  • 64-bit number every month.
  • 68-bit number every year.

Plotting this on our chart with both Bitcoin mining and clustered hobbyist password cracking, we see:

Plot of log(x*86400*10^13)/log(2)

The takeaway

What does all this math imply? That as a developer of password generator software, you should be targeting a minimum of 70-bits of entropy with your password generator. This will give your users the necessary security margins to steer clear of well-funded adversaries, should some service provider's password database get leaked to the Internet, and they find themselves as a target.

As a general rule of thumb, for password generator developers, these are the sort of security margins you can expect with entropy:

  • 70-bits or more: Very secure.
  • 65-69 bits: Moderately secure.
  • 60-64 bits: Weakly secure.
  • 59 bits or less: Not secure.

Colored recommendation of the previous plot showing all brute force attempts.

What does this mean for your generator then? This means that the size of the password or passphrase that you are giving users should be at least:

  • Base-94: 70/log2(94)=11 characters
  • Base-64: 70/log2(64)=12 characters
  • Base-32: 70/log2(32)=14 characters
  • Base-16: 70/log2(16)=18 characters
  • Base-10: 70/log2(10)=22 characters
  • Diceware: 70/log2(7776)=6 words

Now, there is certainly nothing wrong with generating 80-bit, 90-bit, or even 128-bit entropy. The only thing you should consider with this, is the size of the resulting password and passphrases. For example, if you were providing a minimum of 128-bit security for your users with the password generator, then things would look like:

  • Base-94: 128/log2(94)=20 characters
  • Base-64: 128/log2(64)=22 characters
  • Base-32: 128/log2(32)=26 characters
  • Base-16: 128/log2(16)=32 characters
  • Base-10: 128/log2(10)=39 characters
  • Diceware: 128/log2(7776)=10 words

As you can see, as you increase the security for your users, the size of the generated passwords and passphrases will also increase.
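
Both length tables can be checked with a small Python sketch. The `min_length` helper is a name I made up for illustration. It uses exact integer arithmetic rather than floating-point logarithms, finding the smallest length whose search space is at least 2^bits.

```python
def min_length(bits, alphabet_size):
    """Smallest n such that alphabet_size**n >= 2**bits."""
    target = 2 ** bits
    length, space = 0, 1
    while space < target:
        space *= alphabet_size
        length += 1
    return length


ALPHABETS = {
    "Base-94": 94,
    "Base-64": 64,
    "Base-32": 32,
    "Base-16": 16,
    "Base-10": 10,
    "Diceware": 7776,  # counted in words rather than characters
}

for bits in (70, 128):
    for name, size in ALPHABETS.items():
        print(f"{bits}-bit {name}: {min_length(bits, size)}")
```

Rounding up is deliberate: a password one character short of the target falls below the entropy floor, so the generator should always err on the long side.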

Conclusion

It's critical that we are doing right by our users when it comes to security. I know Randall Munroe of XKCD fame created the "correct horse battery staple" comic, advising everyone to create 4-word passphrases. This is fine, provided that those 4 words meet that minimum 70-bits of entropy. In order for that to happen though, the word list needs to be:

   4 = 70/log2(x)
=> 4 = 70/(log(x)/log(2))
=> 4 = 70*log(2)/log(x)
=> 4*log(x) = 70*log(2)
=> log(x) = (70/4)*log(2)
=> x = 10^((70/4)*log(2))
=> x ~= 185,364

You would need a word list of at least 185,364 words to provide at least 17.5-bits of entropy per word, which brings us to the required 70-bits of total entropy for 4 words. All too often, I see generators providing four words, but the word list is far too small, like around Diceware size, which yields only around 51-bits of entropy. As we just concluded, that's not providing the necessary security for our users.
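
The same result falls out of a few lines of Python. The `min_wordlist_size` helper is hypothetical, named only for this sketch. It starts from a floating-point estimate of 2^(bits/words) and then corrects it with exact integer comparisons.

```python
import math


def min_wordlist_size(bits, words):
    """Smallest list size x such that x**words >= 2**bits."""
    target = 2 ** bits
    x = math.floor(2 ** (bits / words))  # float estimate, may be off by one
    while x ** words < target:           # exact integer correction upward
        x += 1
    while x > 1 and (x - 1) ** words >= target:  # and downward, if needed
        x -= 1
    return x


print(min_wordlist_size(70, 4))        # 185364
# Entropy of four words drawn from a 7,776-word Diceware-sized list.
print(round(4 * math.log2(7776), 1))   # 51.7
```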

So, developers, when creating password and passphrase generators, make sure they are at least targeting the necessary 70-bits of entropy, in addition to the other qualifications that we outlined at the beginning of this post.

Colorful Passphrases
https://pthree.org/2017/09/15/colorful-passphrases/
Fri, 15 Sep 2017 13:00:03 +0000

Since the development of my passphrase and password generator, I started working toward improving the other online generators out there on the web. I created a Google Spreadsheet to work toward that goal, by doing reasonable audits to "rank" each generator, and see how they stacked up against the rest. Then, I started submitting patches in hopes of making things better.

One passphrase generator that was brought to my attention was Pass Plum. Pass Plum supplies an example word list to use for generating your passphrases, if you choose to install the software on your own server. Unfortunately, the list is only 140 words in size, so if you choose to use that for your word list, then you only get about 7.13-bits of entropy per word. Sticking to the default configuration of 4 words given to the user, that's a scant 28-bits of security on your passphrase, which is trivially reversed. I submitted a pull request to extend it to 4,096 words, providing exactly 13-bits of entropy per word, or 52-bits of entropy for a 4-word passphrase- a significant improvement.

I noticed, however, that the default list was nothing but color names, and that got me thinking- what if not only the generator provided color names for passphrases, but also colored the word that color name? Basically, a sort of false visual synesthesia. What I want to know is this, is it easier to remember passphrases when you can associate each word with a visual color?

So, over the past several nights, and during weekends, I've been putting this together. So, here it is- colorful passphrases.

Collage of 4 separate screenshots of what the color passphrase generator looks like.

Head over to my site to check it out. If a color is too light (its luma value is very high), then the word is outlined with CSS. Every word is bold, to make the word even more visible on the default white background.

As I mentioned, the idea is simple: people struggle remembering random meaningless strings of characters for passwords, so passphrases are a way to make a random series of words easier to recall. After all, it should be easier to remember "gnu hush gut modem scamp giddy" than it is to remember "$5hKXuE[\NK". It's certainly easier to type on mobile devices, and embedded devices without keyboards, like smart TVs and video game consoles.

But, even then, there is nothing that is really tying "gnu hush gut modem scamp giddy" together, so you force yourself into some sort of mnemonic to recall it. Visually stimulated color passphrases have the benefit of not only using a mnemonic to recall the phrase, but an order of colors as well. For example, you might not recall "RedRobin Pumpkin Revolver DeepPuce Lucky Crail TealDeer", but you may remember its color order of roughly "red orange black purple gold brown teal". "A RedRobin is red. A pumpkin is orange. A revolver (gun) is black. DeepPuce is a purple. Lucky coins are gold. Crail, Scotland has brown dirt. TealDeer are teal."

However, it also comes with a set of problems. First, what happens if you actually have visual synesthesia? Will seeing these colors conflict with your mental image of what the color should be for that word? Second, many of the words are very obscure, such as "Crail" or "Tussock" or "Tuatara" (as all seen in the previous screenshot collage). Finally, what happens when you have a color passphrase where two similar colors are adjacent to each other? Something like "Veronica Affair Pipi DeepOak Atoll BarnRed RedOxide"? Both "BarnRed" and "RedOxide" are a deep reddish color. Will it be more difficult to recall which comes first?

A screenshot showing a color collision between "BarnRed" and "RedOxide"

As someone who is interested in password research, I wanted to see what sort of memory potential visually colorful passphrases could have. As far as I know, this has never been investigated before (at least I couldn't find any research done in this area, and I can't find any passphrase generators doing it). This post from Wired investigates alternatives to text entry for password support, such as using color wheels, but doesn't say anything about visual text. Here is a browser extension that colors password form fields on websites, with the SHA-1 hash of your password as you type it. You know if it's correct, by recognizing if the pattern is the same as it always is when logging in.

Long story short, I think I'm wading into unknown territory here. If you find this useful, or even if you don't, I would be very interested in your feedback.

A Practical and Secure Password and Passphrase Generator
https://pthree.org/2017/09/04/a-practical-and-secure-password-and-passphrase-generator/
Mon, 04 Sep 2017 17:59:59 +0000

The TL;DR

Go to https://ae7.st/g/ and check out my new comprehensive password and passphrase generator. Screenshots and longer explanation below.

Introduction

Sometime during the middle of last summer, I started thinking about password generators. The reason for this, was that I noticed a few things when I used different password generators, online or offline:

  1. The generator created random meaningless strings.
  2. The generator created XKCD-style passphrases.
  3. The generator gave the user knobs and buttons galore to control
    • Uppercase characters
    • Lowercase characters
    • Digits
    • Nonalphanumeric characters
    • Pronounceable passwords
    • Removing ambiguous characters
    • Password Length

The Problem

Here is just one example of what I'm talking about:

Screenshot showing a "secure" password generator from a website.

This password generator has a lot of options for tweaking your final password.

Ever since Randall Munroe published https://xkcd.com/936/, people started creating "XKCD-style" passphrase generators. Here's a very simple one that creates a four-word passphrase. No knobs, bells, or whistles. Just a button to generate a new XKCD passphrase. Ironically, the author provides an XKCD passphrase generator for you to use, then tells you not to use it. 🙂

On the other hand, why not make the XKCD password generation as complex as possible? Here at https://xkpasswd.net/s/, not only do you have an XKCD password generator, but you have all the bells, whistles, knobs, buttons, and control to make it as ultimately complex as possible. Kudos to the generator for even making entropy estimates about the generated passwords!

Screenshot showing a very complex control board of an XKCD style password generator.

Why not add all the complexity of password generation to XKCD passwords?

What bothers me about the "XKCD password" crowd, however, is that no one knows that Diceware was realized back in 1995, making passphrases commonplace. Arnold Reinhold created a list of 7,776 words, enough for every combination of a 6-sided die rolled 5 times. Arnold explains that the passphrase needs to be chosen from a true random number generator (thus the dice) and as a result each word in the list will have approximately 12.9-bits of entropy. Arnold recommends throwing the dice enough times to create a five-word Diceware passphrase. That would provide about 64-bits of entropy, a modestly secure result.

A six-word Diceware passphrase could be:

  • soot laid tiger rilly feud pd
  • 31 al alibi chick retch bella
  • woven error rove pliny dewey quo

My Solution

While these password generators are all unique, and interesting, and maybe even secure, it boils down to the fact that my wife, never mind my mom or grandma, isn't going to use them. They're just too complex. But worse, they give the person using them a false sense of security, and in most cases, they're not secure at all. I've talked with my wife, family, and friends about what it requires to have a strong password, and I've asked them to give me examples. You can probably guess what I got.

  • Spouse's first name with number followed by special character. EG: "Alan3!"
  • Favorite sports team in CamelCase. EG: "UtahUtes"
  • Keyboard patterns. EG: "qwertyasdf"

The pain goes on and on. Usually, the length of each password is somewhere around 6-7 characters. However, when you start talking about some of these generators, and they see passwords like "(5C10#+b" or "V#4I5'4c", their response is usually "I'm never going to remember that!". Of course, this is a point of discussion about password managers, but I'll save that for another post.

So I wanted to create a password and passphrase generator that met everyone's needs:

  • Simplicity of use
  • Length and complexity
  • Provably secure
  • Desktop and mobile friendly

If you've been a subscriber to my blog, you'll know that I post a lot about Shannon entropy. Entropy is maximized when a uniform unbiased random function controls the output. Shannon entropy is just a fancy way of estimating the total number of possibilities something could be, and it's measured in bits. So, when I say a Diceware passphrase has approximately 64-bits of entropy, I'm saying that the passphrase that was generated is 1 in 2^64 or 18,446,744,073,709,551,616 possibilities. Again, this is only true if the random function is uniform and unbiased.

So, I built a password generator around entropy, and entropy only. The question became, what should the range be, and what's my threat model? I decided to build my threat model after offline brute force password cracking. A single computer with a few modest GPUs can work through every 8-character password built from all 94 graphical characters on the ASCII keyboard hashed with SHA-1 in about a week. That's 94^8 or 6,095,689,385,410,816 total possibilities. If chosen randomly, Shannon entropy places any password built from that set at about 52-bits. If the password chosen randomly from the same set of 94 graphical characters was 9 characters long, then the password would have about 59-bits of Shannon entropy. This would also take that same GPU password cracking machine 94 weeks to fully exhaust every possibility.
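
For anyone who wants to check those figures, here is a rough Python sketch. The one-week baseline for 8 characters is simply the estimate quoted above rather than a benchmark, and the function names are my own.

```python
import math

ALPHABET = 94         # graphical ASCII characters
BASELINE_LENGTH = 8   # length the example GPU rig exhausts...
BASELINE_WEEKS = 1    # ...in about one week (estimate from the text)


def keyspace(length):
    """Total number of passwords of the given length."""
    return ALPHABET ** length


def entropy_bits(length):
    """Entropy of a uniformly random password of the given length."""
    return length * math.log2(ALPHABET)


def weeks_to_exhaust(length):
    """Scale the one-week baseline to another password length."""
    return BASELINE_WEEKS * keyspace(length) / keyspace(BASELINE_LENGTH)


print(f"{keyspace(8):,}")             # 6,095,689,385,410,816
print(round(entropy_bits(8), 1))      # 52.4
print(round(entropy_bits(9), 1))      # 59.0
print(weeks_to_exhaust(9))            # 94.0
```

Each extra character multiplies the attacker's work by 94, which is why a few more bits of entropy move a password out of reach so quickly.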

This seemed like a good place to start the range. So, for simplicity's sake, I started the entropy range at 55-bits, then incremented by 5 bits until the maximum of 80-bits. As you can see from the screenshot of the entropy toolbar, 55-bits is red as we are in dangerous territory of an offline password cracker with maybe a cluster of GPUs finding the password. But things get exponentially expensive very quickly. Thus, 60-bits is orange, 65-bits is yellow, and 70-bits and above are green. Notice that the default selection is 70-bits.

Screenshot showing the entropy toolbar of my password generator.

The entropy toolbar of my password generator, with 70-bits as the default.

When creating the generator, I realized that some sites will have length restrictions on your password, such as not allowing more than 12 characters, or not allowing certain special characters, or forcing at least one uppercase character and one digit, and so forth. Some service providers, like Google, will allow you any length with any complexity. But further, people remember things differently. Some people don't need to recall the passwords, as they are using password managers on all their devices, with a synced database, and can just copy/paste. Others want to remember the password, and others yet want it easy to type.

So, it seemed to me that not only could I build a password generator, but also a passphrase generator. However, I wanted this to be portable, so rather than create a server-side application, I made a client-side one. This does mean that you download the wordlists as you need them to generate the passphrases, and the wordlists are anything but light. However, you only download them as you need them, rather than downloading all on page load.

To maximize Shannon entropy, I am using the cryptographically secure pseudorandom number generator from the Stanford Javascript Crypto Library. I'm using this, rather than the web crypto API, because I use some fairly obscure browsers that don't support it. It's only another 11KB download, which I think is acceptable. SJCL does use the web crypto API to seed its generator, if the browser supports it. If not, an entropy collector listener event is launched, gathering entropy from mouse movements. The end result is that Shannon entropy is maximized.

Passphrases

There are 5 types of passphrases in my generator:

  • Alternate
  • Bitcoin
  • Diceware
  • EFF
  • Pseudowords

Diceware

For the Diceware generator, I support all the languages that you'll find on the main Diceware page, in addition to the Beale word list. As of this writing, that's Basque, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Esperanto, Finnish, French, German, Italian, Japanese (Romaji), Maori, Norwegian, Polish, Portuguese, Russian, Slovenian, Spanish, Swedish, and Turkish. There are 7,776 words in each word list, providing about 12.9248-bits of entropy per word.

EFF

For the EFF generator, I support the three word lists that the EFF has created- the short word list, the long word list, and the "distant" word list, where every word has an edit distance of at least three from the others in the list. The long list is similar to the Diceware list, in that it is 7,776 words providing about 12.9248-bits of entropy per word. However, the words in the EFF long list are longer on average, at around 7 characters per word, than those in the Diceware word list, at around 4.3 characters per word. So, for the same entropy estimate, you'll have a longer EFF passphrase than a Diceware passphrase. The short word list contains only 1,296 words, to be used with 4 dice, instead of 5, and the maximum character length of any word is 5 characters. The short word list provides about 10.3399-bits of entropy per word. Finally, the "distant" word list is short in number of words also at 1,296 words, but longer in character count, averaging 7 characters per word.

Bitcoin

For the Bitcoin generator, I am using the BIP-0039 word lists to create the passphrase. These lists are designed to be a mnemonic code or sentence for generating deterministic Bitcoin wallets. However, because they are a list of words, they can be used for building passphrases too. Each list is 2,048 words, providing exactly 11-bits of entropy per word. Like Diceware, I support all the languages of the BIP-0039 proposal, which as of this writing includes Simplified Chinese, Traditional Chinese, English, French, Italian, Japanese (Hiragana), Korean (Hangul), and Spanish.

Alternate

Elvish

In the Alternate generator, I have a few options that provide various strengths and weaknesses. The Elvish word list is for entertainment value only. The word list consists of 7,776 words, making it suitable for Diceware, and provides about 12.9248-bits of entropy per word. However, because the generator is strictly electronic, and I haven't assigned dice roll values to each word, I may bump this up to 8,192 words providing exactly 13-bits of entropy per word. The word list was built from the Eldamo lexicon.

Klingon

Another passphrase generator for entertainment value is the Klingon generator. This word list comes from the Klingon Pocket Dictionary, and my word list provides exactly 2,604 unique words from the 3,028 words in the Klingon language. Thus, each word provides about 11.3465-bits of entropy.

PGP

The PGP word list was created to make hexadecimal strings easier to speak aloud and phonetically unambiguous. It comprises exactly 256 words, providing exactly 8-bits of entropy per word. This generator works well in noisy environments, such as server rooms, where passwords need to be spoken from one person to another to enter into a physical terminal.

Simpsons

The Simpsons passphrase generator consists of 5,000 words, providing about 12.2877-bits of entropy per word. The goal of this generator is not only educational, showing that any source of words can be used for a password generator, such as the episodes of a television series, but also to produce more memorable passphrases. Because this list contains the 5,000 most commonly spoken words from the Simpsons episodes, it supplies a good balance of verbs, nouns, adjectives, etc. As such, the generated passphrases seem to be easier to read, and less noun-heavy than those from the Diceware or EFF word lists. These passphrases may just be the easiest to recall, aside from the Trump word list.

Trump

And now my personal favorite. The Trump generator was initially built for entertainment purposes, but ended up having the advantage of providing a well-balanced passphrase of nouns, verbs, adjectives, etc. much like the Simpsons generator. As such, these passphrases may be easier to recall, because they are more likely to read as valid sentences than those from the Diceware or EFF generators. This list is pulled from Donald J. Trump's Twitter account. The list is always growing, currently at 5,343 words providing about 12.3404-bits of entropy per word.

Pseudowords

The pseudowords generator is a cross between unreadable/unpronounceable random strings and memorable passphrases. They are pronounceable, even if the words themselves are gibberish. They are generally shorter in practice than passphrases, and longer than pure random strings. The generators are here to show what you can do with random pronounceable strings.

Bubble Babble

Bubble Babble is a hexadecimal encoder, with built-in checksumming, initially created by Antti Huima, and implemented in the original proprietary SSH tool (not the one by the OpenSSH developers). Part of the specification is that every encoded string begins and ends with "x". However, rather than encode data from the RNG, the generator randomly generates 5-character words in the syntax of "CVCVC", where "C" is a consonant and "V" is a vowel. As such, each 5-character word, except for the end points, provides 21*5*21*5*21=231,525 unique combinations, or about 17.8208-bits of entropy. The end points are in the syntax of "xVCVC" or "CVCVx", which is 21*5*21*5=11,025 unique combinations, or about 13.4285-bits of entropy.
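The counting above can be checked with a short Python sketch. The two alphabets below are an assumption inferred from the 21 and 5 in the arithmetic (the 21 non-vowel Latin letters and the 5 vowels); the actual character sets used by the generator may differ.

```python
import math
import secrets

VOWELS = "aeiou"                       # 5 vowels (assumed)
CONSONANTS = "bcdfghjklmnpqrstvwxyz"   # 21 consonants (assumed)

def pseudoword() -> str:
    """One consonant-vowel-consonant-vowel-consonant pseudoword."""
    return "".join(
        secrets.choice(CONSONANTS if i % 2 == 0 else VOWELS) for i in range(5)
    )

inner = len(CONSONANTS) ** 3 * len(VOWELS) ** 2   # middle words
end = len(CONSONANTS) ** 2 * len(VOWELS) ** 2     # first and last words
print(pseudoword(), inner, end)
print(round(math.log2(inner), 4), round(math.log2(end), 4))
```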

Secret Ninja

This generator comes from a static character-to-string assignment that produces pronounceable Asian-styled words. As such, there are only 26 assignments, providing about 4.7004-bits of entropy per string. There are three strings concatenated together per hyphenated word.

Cosby Bebop

I was watching this YouTube video with Bill Cosby and Stewie from Family Guy, and about half-way through the skit, Bill Cosby starts using made-up words as part of his routine. I've seen other skits by comedians where they use made-up words to characterize Bill Cosby, so I figured I would create a list of these words, and see how they fell out. There are 32 unique words, providing exactly 5-bits of entropy per word. Unlike the Bubble Babble and Secret Ninja generators, this generator uses both uppercase and lowercase Latin characters.

Korean K-pop

Following on from the Bill Cosby Bebop generator, I created a Korean "K-pop" generator that uses the 64 most common male and female Korean names, providing exactly 6-bits of entropy per name. I got the list of names from various sites listing common male and female Korean names.

Random

These are random strings provided as a last resort for sites or accounting software that have very restrictive password requirements. These passwords will be some of the shortest generated while meeting the same minimum entropy requirement. Because these passwords are not memorable, they should absolutely be stored in a password manager (you should be using one anyway).

  • Base-94: Uses all graphical U.S. ASCII characters (does not include horizontal space). Each character provides about 6.5546-bits of entropy. This password will contain ambiguous characters.
  • Base-64: Uses all digits, lowercase and uppercase Latin characters, and the "+" and "/". Each character provides exactly 6-bits of entropy. This password will contain ambiguous characters.
  • Base-32: Uses the characters defined in RFC 4648, which strives to use an unambiguous character set. Each character provides exactly 5-bits of entropy.
  • Base-16: Uses all digits and lowercase characters "a" through "f". Each character provides exactly 4-bits of entropy. This password will contain fully unambiguous characters.
  • Base-10: Uses strictly the digits "0" through "9". This is mostly useful for PINs or other applications where only digits are required. Each digit provides about 3.3219-bits of entropy. This password will contain fully unambiguous characters.
  • Emoji: There are 881 emoji glyphs provided by Google's Noto Emoji font, yielding about 9.7830-bits per glyph. One side effect is that even though there is a character count in the generator box, each glyph may be more than 1 byte, so some input forms may count that glyph as more than 1 character. Regardless, the minimum entropy is met, so the emoji password is still secure.
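To illustrate how the length of each random string follows from the alphabet size, here is a minimal Python sketch. The 70-bit target and the function name `random_password` are assumptions chosen for the example; the idea is just to divide the target by the bits per character and round up.

```python
import math
import secrets
import string

def random_password(alphabet: str, min_bits: float) -> str:
    """Shortest uniformly random string over `alphabet` with >= min_bits of entropy."""
    length = math.ceil(min_bits / math.log2(len(alphabet)))
    return "".join(secrets.choice(alphabet) for _ in range(length))

# 10 digits + 52 letters + 32 punctuation marks = 94 graphical ASCII characters.
BASE94 = string.digits + string.ascii_letters + string.punctuation
BASE16 = string.digits + "abcdef"

print(random_password(BASE94, 70))
print(random_password(BASE16, 70))
```

Smaller alphabets need more characters for the same target, which is why the Base-10 and Base-16 strings come out longer than the Base-94 ones.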

I want to say something a bit extra about the Emoji generator. With the rise of Unicode and the UTF-8 standard, and the near ubiquitous popularity of smartphones and mobile devices, having access to non-Latin character sets is becoming easier and easier. As such, password forms are more likely to support UTF-8 on input, allowing Cyrillic, Coptic, Arabic, and East Asian ideographs. So, if Unicode is fast becoming the norm, why not take advantage of it while having a little fun?

I opted for the black-and-white font, as opposed to the color font, to stay consistent with the look and feel of the other generators. This generator uses the emoji character sets provided by Google's Noto Emoji fonts, as that makes it easy for me to support the font in CSS 3, allowing every browser that supports CSS 3 to take advantage of the font and render the glyphs in a standard fashion. The license is also open so that I can redistribute the font without paying royalties, and others can do the same.

Screenshots

The post wouldn't be complete without some screenshots. The generator is both desktop friendly, fitting comfortably in a 1280x800 screen resolution, as well as mobile friendly, working well on even some of the oldest mobile devices.

Desktop screenshot of my password generator.

Desktop screenshot.

First mobile screenshot of my password generator.

First mobile screenshot.

Second mobile screenshot of my password generator.

Second mobile screenshot.

]]>
https://pthree.org/2017/09/04/a-practical-and-secure-password-and-passphrase-generator/feed/ 0
Random Passphrases Work, Even If They're Built From Known Passwords https://pthree.org/2017/08/03/random-passphrases-work-even-if-theyre-built-from-known-passwords/ https://pthree.org/2017/08/03/random-passphrases-work-even-if-theyre-built-from-known-passwords/#respond Thu, 03 Aug 2017 23:40:23 +0000 https://pthree.org/?p=4841 Just this morning, security researcher Troy Hunt released a ZIP containing 306 million passwords that he's collected over the years from his ';--have i been pwned? service. As an extension, he created a service where you can provide either a password or a SHA-1 hash to see if your password has been pwned.

Screenshot showing the entry text field to test your password from the new password checking service Troy Hunt has provided.

In 2009, the social network RockYou was breached, and 32 million accounts of usernames and passwords were released to the public Internet. No doubt those 32 million passwords are included in Troy Hunt's password dump. What's interesting, and the point of this post, is that individually, each password from the RockYou breach will fail.

Collage of six screenshots of individual RockYou passwords failing Troy Hunt's password check. They were: caitlin, peachy, keeley, doreen, ursulet, & juggalo.

However, what would happen if you took 6 random RockYou passwords, and created a passphrase? Below is a screenshot demonstrating just that, using the above 6 randomly chosen RockYou passwords. Individually, they all fail. Combined, they pass.

Screenshot showing how creating the passphrase 'caitlinpeachykeeleydoreenursuletjuggalo' passes Troy Hunt's check.

Now, to be fair, I'm choosing these passwords from my personalized password generator. The list is the top 7,776 passwords from the 32 million RockYou dump. As such, you could use this list as a Diceware replacement with 5 dice. Regardless, each password is chosen at random from that list, and enough passwords are chosen to reach a 70-bit entropy target, which happens to take 6 passwords. Mandatory screenshot below:
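As a rough sketch of that selection process in Python: the number of words is the entropy target divided by the bits per word, rounded up, and each word is picked with a cryptographically secure RNG. The word list below is a placeholder of the right size, not the actual RockYou list, and the function names are only illustrative.

```python
import math
import secrets
from typing import Sequence

def words_needed(list_size: int, target_bits: float) -> int:
    """Smallest number of uniformly chosen words that meets the entropy target."""
    return math.ceil(target_bits / math.log2(list_size))

def passphrase(words: Sequence[str], target_bits: float, sep: str = "") -> str:
    """Join enough randomly chosen words to reach target_bits of entropy."""
    count = words_needed(len(words), target_bits)
    return sep.join(secrets.choice(words) for _ in range(count))

# Stand-in list of 7,776 placeholder "words"; the real list is not reproduced here.
words = [f"word{i}" for i in range(7776)]
print(words_needed(len(words), 70))    # 6
print(passphrase(words, 70, sep="-"))  # hyphenated variant
```

Five words from a 7,776-word list would only give about 64.6 bits, which is why the generator lands on six.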

Screenshot showing my password generator using the RockYou passwords as a source for a strong passphrase. One screenshot is hyphenating the passwords, the other is not.

The point of this post is that you can actually build a secure password for sites and services using previously breached passwords for your word list source, in this case, RockYou. The only conditions are that you have a word list large enough to create a reasonable passphrase with few selections, and that the process picking the passwords for you is cryptographically random.

]]>
https://pthree.org/2017/08/03/random-passphrases-work-even-if-theyre-built-from-known-passwords/feed/ 0
Electronic Slot Machines and Pseudorandom Number Generators https://pthree.org/2017/02/17/electronic-slot-machines-and-pseudorandom-number-generators/ https://pthree.org/2017/02/17/electronic-slot-machines-and-pseudorandom-number-generators/#respond Fri, 17 Feb 2017 19:22:53 +0000 https://pthree.org/?p=4834 TL;DR

An Austrian casino company used a predictable pseudorandom number generator, rather than a cryptographically secure one, and people are taking advantage of it, and cashing out big.

The Story

Wired published an article about an amazing operation for beating electronic slot machines: hold your phone to the slot machine screen for a time while playing, leave the slot machine, then come back a second time and cash in big.

Unlike most slots cheats, he didn’t appear to tinker with any of the machines he targeted, all of which were older models manufactured by Aristocrat Leisure of Australia. Instead he’d simply play, pushing the buttons on a game like Star Drifter or Pelican Pete while furtively holding his iPhone close to the screen.

He’d walk away after a few minutes, then return a bit later to give the game a second chance. That’s when he’d get lucky. The man would parlay a $20 to $60 investment into as much as $1,300 before cashing out and moving on to another machine, where he’d start the cycle anew.

These machines were made by Austrian company Novomatic. When Novomatic engineers learned of the problem, the best explanation they could come up with, after a deep investigation, was that the random number generator in the machine was predictable:

Novomatic’s engineers could find no evidence that the machines in question had been tampered with, leading them to theorize that the cheaters had figured out how to predict the slots’ behavior. “Through targeted and prolonged observation of the individual game sequences as well as possibly recording individual games, it might be possible to allegedly identify a kind of ‘pattern’ in the game results,” the company admitted in a February 2011 notice to its customers.

The article, focused on a single incident in Missouri, mentions that the state vets the machines before they go into production:

Recognizing those patterns would require remarkable effort. Slot machine outcomes are controlled by programs called pseudorandom number generators that produce baffling results by design. Government regulators, such as the Missouri Gaming Commission, vet the integrity of each algorithm before casinos can deploy it.

On random number generators

I'll leave you to read the rest of the article. Suffice it to say, the Novomatic machines were using a pseudorandom number generator whose output becomes predictable after observing it for a period of time. This raises some questions that should immediately start popping up in your head:

  1. What is the vetting process by states to verify the quality of the pseudorandom number generators in slot machines?
  2. Who is on that vetting commission? Is it made up of mathematicians and cryptographers? Or just a board of executives and politicians?
  3. Why aren't casino manufacturers using cryptographically secure pseudorandom number generators?

For me, that third item is the most important. No doubt, as the Wired article states, older machines just cannot be fixed. They need to be taken out of production. So long as they occupy casinos, convenience stores, and gas stations, they'll be attacked, and the owner will lose money. So let's talk about random number generators for a second, and see what the gambling industry can do to address this problem.

You can categorize random number generators into four categories:

  1. Nonsecure pseudorandom
  2. Cryptographically secure pseudorandom
  3. Chaotic true random
  4. Quantum true random

What I would be willing to bet, is that most electronic machines out there are of the "nonsecure pseudorandom" type of random number generator, and Novomatic just happened to pick a very poor one. Again, there likely isn't anything they can do about existing machines in production now, but what can they do moving forward? They should start using cryptographically secure pseudorandom number generators (CSPRNGs).
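To make "nonsecure" concrete, here is a toy Python example of a linear congruential generator, a classic non-cryptographic design. This is purely an illustration under my own assumptions: the constants are arbitrary textbook-style values, and nothing here is claimed to be the algorithm used in any real slot machine. Because each output is the entire internal state, an observer who sees one value can predict every value after it.

```python
class ToyLCG:
    """Linear congruential generator; its entire state is its last output."""
    A, C, M = 1103515245, 12345, 2**31

    def __init__(self, seed: int) -> None:
        self.state = seed % self.M

    def next(self) -> int:
        self.state = (self.A * self.state + self.C) % self.M
        return self.state

machine = ToyLCG(seed=20170217)   # stands in for the "slot machine"
observed = machine.next()         # the attacker watches a single output

clone = ToyLCG(seed=0)
clone.state = observed            # the attacker copies the observed state
predictions = [clone.next() for _ in range(5)]
actual = [machine.next() for _ in range(5)]
print(predictions == actual)      # True
```

Real nonsecure generators usually hide part of their state, so an attacker needs more observations, but the principle is the same.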

In reality, this is trivial. There are plenty of CSPRNGs to choose from. CSPRNGs can be broken down further into three subcategories:

  1. Designs based on cryptographic primitives.
  2. Number theoretic designs.
  3. Special-purpose designs.

Let's look at each of these in turn.

Designs based on cryptographic primitives

These are generators that use things like block ciphers, stream ciphers, or hashing functions for the generator. There are some NIST and FIPS standardized designs:

  • NIST SP 800-90A rev. 1 (PDF): CTR_DRBG (a block cipher, such as AES in CTR mode), HMAC_DRBG (hash-based message authentication code), and Hash_DRBG (based on cryptographically secure hashing functions such as SHA-256).
  • ANSI X9.31 Appendix A.2.4: This is based on AES, and obsoletes ANSI X9.17 Appendix C, which is based on 3DES. It requires a high-precision clock to initially seed the generator. It was eventually obsoleted by ANSI X9.62-1998 Annex A.4.
  • ANSI X9.62-2005 Annex D: This standard defines an HMAC_DRBG, similar to NIST SP 800-90A, using an HMAC as the cryptographic primitive. It obsoletes ANSI X9.62-1998 Annex A.4, and also requires a high-precision clock to initially seed the generator.

It's important that these designs are backtracking resistant, meaning that even if you know the current state of the RNG, you cannot reconstruct the previous states of the generator. The above standards are backtracking resistant.
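The idea of backtracking resistance can be sketched with a toy hash-based generator in Python. To be clear, this is not one of the standardized DRBGs listed above and should not be used in production; the class name and the domain-separation labels are made up for the illustration. After every read, the state is pushed through a one-way function, so someone who captures the current state would have to invert SHA-256 to recover earlier states or outputs.

```python
import hashlib

class ToyHashGenerator:
    """Toy generator for illustration only; NOT a standardized DRBG."""

    def __init__(self, seed: bytes) -> None:
        self.state = hashlib.sha256(b"seed" + seed).digest()

    def read(self) -> bytes:
        """Return 32 bytes of output, then ratchet the state forward."""
        output = hashlib.sha256(b"output" + self.state).digest()
        # Ratchet through a one-way function. Recovering the old state
        # from the new one would require inverting SHA-256.
        self.state = hashlib.sha256(b"ratchet" + self.state).digest()
        return output

gen = ToyHashGenerator(b"example seed")
print(gen.read().hex())
print(gen.read().hex())
```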

Number theoretic designs

There are really only two current designs, that are based on either the factoring problem or the discrete logarithm problem:

  • Blum-Blum-Shub: This is a generator based on the fact that it is difficult to compute the prime factors of very large composites (on the order of 200 or more digits in length). Due to the size of the prime factors, this is a very slow algorithm, and not practical generally.
  • Blum-Micali: This is a generator based on the discrete logarithm problem: given two known integers "b" and "g" and a prime modulus "p", it is difficult to find "k" where "b^k = g (mod p)". Like Blum-Blum-Shub, this generator is also very slow, and not practical generally.
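For a feel of why these generators are slow, here is a minimal Python sketch of Blum-Blum-Shub: every output bit costs one modular squaring. The primes and seed below are tiny values picked for readability only; a real deployment would need a modulus hundreds of digits long, and the function name is just illustrative.

```python
def blum_blum_shub(p: int, q: int, seed: int, nbits: int) -> list:
    """Return nbits output bits; p and q must be primes congruent to 3 mod 4."""
    assert p % 4 == 3 and q % 4 == 3
    m = p * q
    x = (seed * seed) % m     # initial state x0
    bits = []
    for _ in range(nbits):
        x = (x * x) % m       # one modular squaring per output bit
        bits.append(x & 1)    # emit the least significant bit
    return bits

# Tiny parameters for readability only; this is trivially breakable.
print(blum_blum_shub(p=11, q=23, seed=3, nbits=8))
```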

Special-purpose designs

Thankfully, there are a lot of special-purpose designs by cryptographers that are either stream ciphers that can be trivially turned into a CSPRNG, or deliberately designed CSPRNGs:

  • Yarrow: Created by cryptographer Bruce Schneier (deprecated by Fortuna)
  • Fortuna: Also created by Bruce Schneier, and obsoletes Yarrow.
  • ISAAC: Designed to address the problems in RC4.
  • ChaCha20: Designed by cryptographer Daniel Bernstein, our crypto Lord and Savior.
  • HC-256: The 256-bit alternative to HC-128, which is part of the eSTREAM portfolio.
  • eSTREAM portfolio: (7 algorithms- 3 hardware, 4 software)
  • Random123 suite: Contains four highly parallelizable counter-based algorithms, only two of which are cryptographically secure.

The solution for slot machines

So now what? Slot machine manufacturers should be using cryptographically secure algorithms in their machines, full stop. To be cryptographically secure, the generator:

  • Must pass the next-bit test (you cannot predict the next bit with any better than 50% probability).
  • Must withstand a state compromise (you cannot reconstruct past states of the generator based on the current state).

If those two properties are met in the generator, then the output will be indistinguishable from true random noise, and the generator will be unbiased, so an adversary, such as someone with a cellphone monitoring the slot machine, cannot get the upper hand on the slot machine and cash out early.

However, the question should then be raised- "How do you properly seed the CSPRNG, so it starts in an unpredictable state, before release?" Easy, you have two options here:

  • Seed the CSPRNG with a hardware true RNG (HWRNG), such as a USB HWRNG, or....
  • Build the machine such that it collects environmental noise as entropy

The first point is much easier to achieve than the second. Slot machines likely don't have a lot of interrupts built into the system-on-a-chip (SoC). So aside from a microphone, video camera, or antenna recording external events, you're going to be hard-pressed to get any sort of high-quality entropy into the generator. USB TRNGs are available all over the web, and cheap. When the firmware is ready to be deployed, read 512-bits out of the USB generator, hash it with SHA-256, and save the resulting hash on disk as an "entropy file".

Then all that is left is when the slot machine boots up and shuts down:

  • On startup, read the "entropy file" saved from the previous shutdown, to seed the CSPRNG.
  • On shutdown, save 256-bits of data out of the generator to disk as an "entropy file".

This is how most operating systems have solved the problem with their built-in CSPRNGs. Provided that the very first "entropy file" was initially seeded with a USB true HWRNG, the state of every slot machine will always be different, and will always be unpredictable. Also, 256-bits is more than sufficient to make sure the initial state of the generator is unpredictable; physics proves it.
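The provisioning, startup, and shutdown steps could look something like the following Python sketch. Everything here is an assumption made for illustration: `os.urandom` stands in for reading from a USB hardware RNG, a SHA-256 call stands in for output from the machine's CSPRNG, and the file location and function names are hypothetical.

```python
import hashlib
import os
import tempfile
from pathlib import Path

def provision(path: Path) -> None:
    """Factory step: hash 512 bits from a hardware RNG into the first entropy file."""
    raw = os.urandom(64)   # stand-in for reading 64 bytes from a USB HWRNG
    path.write_bytes(hashlib.sha256(raw).digest())

def load_seed(path: Path) -> bytes:
    """Startup: read the 256-bit seed saved at the previous shutdown."""
    return path.read_bytes()

def save_seed(path: Path, generator_output: bytes) -> None:
    """Shutdown: persist 256 bits of generator output for the next boot."""
    assert len(generator_output) == 32
    path.write_bytes(generator_output)

entropy_file = Path(tempfile.mkdtemp()) / "entropy"   # hypothetical location
provision(entropy_file)
seed = load_seed(entropy_file)                        # seed the CSPRNG with this
# Stand-in for "read 256 bits out of the running generator" at shutdown.
save_seed(entropy_file, hashlib.sha256(b"next" + seed).digest())
```

The important design point is that the file written at shutdown comes from the generator's output, never from a copy of the seed it booted with, so two consecutive boots do not start from the same state.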

Of course, the SoC could have a HWRNG onboard, but then you run the risk of hardware failure, and the generator becoming predictable. This risk doesn't exist with software-based CSPRNGs, so provided you can always save the state of the generator on disk at shutdown, and read it on startup, you'll always have an unpredictable slot machine.

]]>
https://pthree.org/2017/02/17/electronic-slot-machines-and-pseudorandom-number-generators/feed/ 0
Adblockers Aren't Part Of The Problem- People Are https://pthree.org/2016/11/30/adblockers-arent-part-of-the-problem-people-are/ https://pthree.org/2016/11/30/adblockers-arent-part-of-the-problem-people-are/#comments Wed, 30 Nov 2016 15:06:44 +0000 https://pthree.org/?p=4766 Troy Hunt, a well-respected security researcher, and public speaker, wrote a blog post recently about how adblockers are part of the bad experience of the web. His article is about a sponsorship banner he posts at the top of his site, just below the header. It's not flashy, intrusive, loud, obnoxious, or a security or privacy concern. He gets paid better for the sponsorship strip than he does for ads, and the strip is themed with the rest of his site. It's out of the way of the site content, and scrolls with the page. In my opinion, it's in perfectly good taste. See for yourself:

Screenshot of Troy Hunt's homepage, showing the sponsorship strip just below the header.

Troy was surprised to find out, however, that his sponsorship strip is not showing when the AdBlock Plus or uBlock Origin ad blockers are installed and enabled in the browser. He is understandably upset, as he is avoiding everything that pisses off the standard web user when it comes to ads. He reached out to ABP about whitelisting his strip, and they've agreed it's hardly violating web user experience. However, someone added it to the EasyList filters, which means any ad blocker outside of ABP will filter the sponsorship strip.

So, here's my question- are users wrong in filtering it?

Let's look at the state of web ads over the past couple decades. First, there was the ad popup, where the web page you were visiting would popup an ad right in front of the page. Sometimes they were difficult to close, and sometimes closing one would open a different one. Some pages would open dozens of popups, some fullscreen. It wasn't long before browsers across the board blocked popups by default, baked right into the browser.

Screenshot showing a Windows XP desktop littered with ad popups.

After popups were unanimously blocked across every browser, advertisers turned to ad banners. These were just as obnoxious as the popups, even if you didn't have to close a window. They flashed, blinked, falsely promised free trips and gadgets, and even sometimes auto-played videos. They were rarely relevant to the site content, but web page owners were promised a revenue per click, regardless. So, the more you could fit on the page, the more likely someone would click on an ad, and you would get paid. Web page owners placed these obnoxious ads above the header, below the header, in the sidebars, in the middle of the pages breaking up paragraphs in posts, in the footers. In some cases, the screen real estate dedicated to ads was more than the actual content on the site.

An image showing a collection of annoying banner ads.

Some HTML5 and CSS3 solutions now include overlays, that have to be manually closed or escaped, in order to continue parsing the site content. Unfortunately, ad blockers don't do a great job at blocking these. While they're great at finding and filtering out elements, blocking CSS overlay popups seems to be too difficult, as they are prevalent on the web, much to the chagrin of many ad block users.

Screenshot showing a CSS overlay on a web page showing an ad.

Ad blockers then became a mainstay. Web users were pissed off due to Flash crashing the browser (most ads were Flash-based), slowing down their connection to download additional content (at the time, most were on dial-up or slow DSL), and in general just getting in the way. It got so bad, that DoubleClick's "privacy chief" wrote a rant about ad blockers, and how they were unethical, and ad blocker users were stealing revenue.

As web page analytics started becoming a thing, more and more website owners wanted to know how traffic was arriving at their site, so they could further increase that traffic, and in addition, increase ad revenue. Already, page counters like StatCounter existed, to help site owners understand partially how traffic was hitting them, where they came from, what time, what search engine they used, how long they stayed, etc. Well, advertisers started putting these analytics in their ads. So, not only did the website owner know who you were, the advertising company did too. And worse, while the website owner might not be selling that tracking data, the advertiser very likely is.

The advertiser also became a data broker.

But here's the tricky part- ad blocking was no longer enough. Now website owners were adding JavaScript trackers to their HTML. They're not visible on the page, so the ad blocker isn't hiding an element. It's not enough to block ads any longer. Privacy advocates began warning about "browser fingerprinting" based on the specific details in your browser that can uniquely identify you. Those unique bits are then tracked with these tracking scripts, and sent to advertisers and data brokers, changing many hands along the way. The EFF created a project to help users understand how unique they appeared on the web through the Panopticlick Project.

Screenshot of my browser results after testing at https://panopticlick.eff.org.

As a result, other browser extensions dedicated to blocking trackers started showing up. Things like Ghostery, Disconnect, Privacy Badger, and more. Even extensions that completely disable JavaScript and Flash became popular. Popular enough, that browsers implemented a "click-to-play" setting, where Flash and other plugin content was blocked by default, and you would need to click the element to display it. It's not uncommon now to visit a web page where your tracker blocker will block a dozen or more trackers.

Screenshot of Ghostery blocking 20 trackers on a web page.

I wish I could stop here, but alas, many advertisers have made a turn for the ugly. Now, web ads are the most common way to get malware installed on your computer. Known as "malvertising", it is a more common way of infecting your computer than shady porn sites. Even more worrisome, is that this trend is shifting away from standard desktops to mobile. Your phone is now more valuable than your desktop, and advertisers know it. Never mind shady apps that compromise your device, ads in legitimate "safe" apps are compromising devices as well.

Infographic showing the threat of malvertising on mobile in 2014.

So, to summarize, the history of ads has been:

  1. Annoying popups.
  2. Annoying banners.
  3. Annoying CSS overlays.
  4. Transparent trackers.
  5. Malvertising.

So, to Troy Hunt, here's my question: Given the awful history of advertisements on the web, are you honestly surprised that users don't trust a sponsorship strip?

Consider the following analogy: Suppose I brought a bunch of monkeys to your home, and they trashed the place. Smashed dishes, tore up furniture, destroyed computers and televisions, ruined floors, broke windows, and generally just destroyed anything and everything in sight. Then, after cleaning the place up, not only do I bring the monkeys back, but this time, they have digital devices (cameras, microphones, etc.) that report back to me about what your house looks like, where you live, what you're doing in response to the destruction. Again, you kick them out, clean up the place, and I return with everything as before, with some of them carrying a contagious disease that can get you and your family sick. I mean, honestly, one visit of these monkeys is enough, but they've made three visits, each worse than before.

Now, you show up at my doorstep, with a well-trained, leashed, groomed, clean, tame monkey, and I'm supposed to trust that it isn't anything like the past monkeys I've experienced before? As tame as it may be, call me rude, but I'm not trusting of monkeys right now. I've installed all sorts of alarm and monitoring systems, to warn me when monkeys are nearby, and nuke them with lasers. I've had too many bad experiences with monkeys in the past, to trust anyone bringing a new monkey to the premises.

So, you can see, it's not ad blockers that are the problem. It's the people behind the advertising firms and it's the people not trusting the Internet. The advertising C-level executives are trying to find ways to get their ad in front of your eyes, and are using any sort of shady means necessary to do it. The average web user is trying to find ways to have a pleasant experience on the web, without getting tracked, infected with malware, or shouted at by a video, while still being able to consume the content.

People arguably don't trust ads. The people in the advertising firms have ruined that trust. You may have a clean privacy-aware non-intrusive sponsorship strip, but you can't blame people for not trusting it. We've just had too long of a history of bad ad experiences. So, while reaching out to the ad blocker developers to whitelist the sponsorship strip is a good first step, ultimately, if people don't trust it, and want to block it, you can't blame them. Instead, continue focusing on what makes you successful, and what still earns you revenue from ad blocker users- blogging, speaking, developing, engaging. Your content, who you are, and how you handle yourself are your most valuable ad.

]]>
https://pthree.org/2016/11/30/adblockers-arent-part-of-the-problem-people-are/feed/ 1
Breaking HMAC https://pthree.org/2016/07/29/breaking-hmac/ https://pthree.org/2016/07/29/breaking-hmac/#respond Fri, 29 Jul 2016 13:56:46 +0000 https://pthree.org/?p=4749 Okay. The title might be click bait, just a little, but after you finish reading this post, I think you'll be a bit more careful picking your HMAC keys. After learning this, I know I will be. However, HMAC is not broken. It just has an interesting ... property that's worth knowing about.

First off, let's remind ourselves what HMAC is. HMAC, or Hash-based Message Authentication Code, is a way to authenticate a cryptographic message using a shared secret key. The key is typically established through an asymmetric key agreement protocol, such as Diffie-Hellman, where two parties securely share symmetric keys. These keys are used to encrypt messages as well as authenticate data. HMAC tags prevent chosen plaintext attacks, where the attacker can insert malicious data into the payload (send $1,000,000 to my account), and HMAC tags prevent adaptive chosen ciphertext attacks, where the attacker can send encrypted data to the server, and learn what is being protected (is "password" in the payload?).

Authenticated messages are absolutely essential to modern cryptographic software. If you're writing cryptographic software, and you're not authenticating your ciphertext, you're doing it wrong. It doesn't matter if the data is at rest or in motion. Authenticate your ciphertext. This is where HMAC fits in.

So, best practice, when not using native AEAD ciphers, such as AES-GCM, or ChaCha20-Poly1305, is to encrypt the plaintext then authenticate it with HMAC, and prepend or append the digest (called a "MAC tag" or just "tag") to the ciphertext.

Something like this pseudocode:

ciphertext = AES-256-CTR(nonce, plaintext, key1)
tag = HMAC-SHA-512(ciphertext, key2)
ciphertext = tag || ciphertext

Then we ship off the newly bundled ciphertext and MAC tag. When it arrives at our destination, the recipient can verify if the ciphertext is what it should be, before decrypting it, by verifying the MAC tag, which is the whole point:

tag1 = ciphertext[:64]
data = ciphertext[64:]
tag2 = HMAC-SHA-512(data, key2)
hmac_check = 0

for char1, char2 in zip(tag1, tag2):
    hmac_check |= ord(char1) ^ ord(char2)

if hmac_check == 0:
    plaintext = AES-256-CTR(nonce, data, key1)
else:
    return False
return True

Notice that we're doing constant time comparison of the shipped tag and the calculated tag. Last thing we want is to introduce a timing attack by doing "if tag1 != tag2". However, after the constant time comparison, if tag1 and tag2 match, we can decrypt the data, and retrieve the plaintext.
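If you'd rather not write the comparison loop by hand, here is a minimal Python 3 sketch of the tagging and verification steps using only the standard library. The names seal and open_sealed are mine, and the "ciphertext" is just random bytes standing in for real AES-CTR output, since the standard library has no AES. hmac.compare_digest does the constant time comparison for us.

```python
import hashlib
import hmac
import os

TAG_LEN = hashlib.sha512().digest_size  # 64 bytes for HMAC-SHA-512


def seal(ciphertext: bytes, mac_key: bytes) -> bytes:
    """Prepend an HMAC-SHA-512 tag to already-encrypted data."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha512).digest()
    return tag + ciphertext


def open_sealed(bundle: bytes, mac_key: bytes) -> bytes:
    """Return the ciphertext if the tag verifies, otherwise raise ValueError."""
    tag, data = bundle[:TAG_LEN], bundle[TAG_LEN:]
    expected = hmac.new(mac_key, data, hashlib.sha512).digest()
    # compare_digest avoids leaking where the two tags first differ.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC tag mismatch")
    return data


mac_key = os.urandom(32)
fake_ciphertext = os.urandom(48)  # placeholder for real AES-CTR output
bundle = seal(fake_ciphertext, mac_key)
```

Only after open_sealed returns would you hand the data to the decryption routine with the separate encryption key.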

So, now you have the background on HMAC, let's look at some examples with Python, and see where HMAC breaks. After all, that's the click bait title, no?

>>> import os
>>> import hmac
>>> import hashlib
>>> key = os.urandom(16)
>>> msg = os.urandom(256)
>>> hmac.new(key, msg).hexdigest()
'd5a94f051b1e6ff67065b6f4c3a60130'

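In this example, the default HMAC is HMAC-MD5. (This session is Python 2. Python 3.8 and later require you to pass the digest explicitly, such as hashlib.md5, as the third argument to hmac.new().) HMAC-MD5 is still considered cryptographically secure, even though vanilla MD5 is broken. Regardless, it'll suffice for this example, and we'll look at SHA-1 and SHA-2 also.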

In RFC 2104, where HMAC is standardized, section 3 has this odd little tidbit (emphasis mine):

The key for HMAC can be of any length (keys longer than B bytes are
first hashed using H). However, less than L bytes is strongly
discouraged as it would decrease the security strength of the
function. Keys longer than L bytes are acceptable but the extra
length would not significantly increase the function strength. (A
longer key may be advisable if the randomness of the key is
considered weak.)

The "B bytes" length is the block size of the underlying hash function, and "L bytes" is the length of its digest. If the key is shorter than the block size, zeroes are appended to the key. If the key is longer than the block size, it is first hashed with the HMAC's underlying cryptographic hash.

The block size is 64 bytes for the following HMACs:

  • HMAC-MD5
  • HMAC-RIPEMD128
  • HMAC-RIPEMD160
  • HMAC-SHA1

In other words, HMAC wants the key to be exactly one block in length. If it's shorter, it's padded with zeros. If it's longer, it's hashed, and the digest is then padded with zeros to fit exactly into one block.

Can we test this?

>>> key = os.urandom(65) # longer than one block
>>> msg = os.urandom(256)
>>> hmac.new(key, msg).hexdigest()
'f887a4146e94ed47405c97931798885d'
>>> hmac.new(hashlib.md5(key).digest(),msg).hexdigest()
'f887a4146e94ed47405c97931798885d'

We have a collision. In other words:

For:
* 'H' a cryptographic hash
* 'k' a private key
* 'm' a message
* 'B' an HMAC block size

    HMAC(k, m) == HMAC(H(k), m)

for all 'k', where len(k) > B
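To see why this falls straight out of the construction, here is a from-scratch sketch of HMAC following the steps in RFC 2104, written for Python 3. The function name manual_hmac is mine, and I'm using SHA-256 rather than MD5 only so it runs on Python builds where MD5 is disabled. The first two steps are the whole story: a long key is replaced by its hash before anything else happens.

```python
import hashlib
import hmac
import os


def manual_hmac(key: bytes, msg: bytes, hash_name: str = "sha256") -> bytes:
    """HMAC as described in RFC 2104, written out step by step."""
    block_size = hashlib.new(hash_name).block_size  # "B" in the RFC
    # Step 1: keys longer than one block are replaced by their hash.
    if len(key) > block_size:
        key = hashlib.new(hash_name, key).digest()
    # Step 2: the (possibly hashed) key is zero-padded to exactly B bytes.
    key = key.ljust(block_size, b"\x00")
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.new(hash_name, ipad + msg).digest()
    return hashlib.new(hash_name, opad + inner).digest()


key = os.urandom(65)   # one byte longer than the SHA-256 block size
msg = os.urandom(256)
hashed_key = hashlib.sha256(key).digest()

matches_stdlib = manual_hmac(key, msg) == hmac.new(key, msg, hashlib.sha256).digest()
collides = manual_hmac(key, msg) == manual_hmac(hashed_key, msg)
print(matches_stdlib, collides)
```

After step 1, the long key and its digest are the same bytes, so everything downstream is identical.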

Does this work with HMAC-SHA1?

>>> key = os.urandom(65)
>>> msg = os.urandom(256)
>>> hmac.new(key, msg, hashlib.sha1).hexdigest()
'1070312944223b36928382d7a53ca54f7204ad4a'
>>> hmac.new(hashlib.sha1(key).digest(), msg, hashlib.sha1).hexdigest()
'1070312944223b36928382d7a53ca54f7204ad4a'

How about all the SHA-2 functions? (SHA-224, SHA-256, SHA-384, & SHA-512)?

SHA-224:

>>> key = os.urandom(65)
>>> msg = os.urandom(256)
>>> hmac.new(key, msg, hashlib.sha224).hexdigest()
'9ea8c3f667e55e6e9c5d63c5dd1b569ca69e2cc69f5e3fa3f87e94ba'
>>> hmac.new(hashlib.sha224(key).digest(), msg, hashlib.sha224).hexdigest()
'9ea8c3f667e55e6e9c5d63c5dd1b569ca69e2cc69f5e3fa3f87e94ba'

SHA-256:

>>> key = os.urandom(65)
>>> msg = os.urandom(256)
>>> hmac.new(key, msg, hashlib.sha256).hexdigest()
'2aa02e678fcfe7ecaa1475efb70fe284fe91cc81e5a9c543433b70f5f5112c4b'
>>> hmac.new(hashlib.sha256(key).digest(), msg, hashlib.sha256).hexdigest()
'2aa02e678fcfe7ecaa1475efb70fe284fe91cc81e5a9c543433b70f5f5112c4b'

SHA-384:

>>> key = os.urandom(65)
>>> msg = os.urandom(256)
>>> hmac.new(key, msg, hashlib.sha384).hexdigest()
'0941e6502233a72d01beeec729eaa7db2469f8ce96339cd5b3b2c9a4684501e6a7025fac6c9c20a511c48df76b453ec3'
>>> hmac.new(hashlib.sha384(key).digest(), msg, hashlib.sha384).hexdigest()
'5380905fb89fee68836be076ebfccff600e3b89c6554840fe61fed01b049d6a6a77423d2f5f4be1afb9d1c6f63b8b7fc'

SHA-512:

>>> key = os.urandom(65)
>>> msg = os.urandom(256)
>>> hmac.new(key, msg, hashlib.sha512).hexdigest()
'36063bdc2d02ce8ea4b01b40ba040094c640959e0cc5716f7a75f119cbc348aa93d555f86bfcdaee5dad4ec2e5d53ed4362f9df0720ec0e1272288d49a912f7e'
>>> hmac.new(hashlib.sha512(key).digest(), msg, hashlib.sha512).hexdigest()
'8bfe675e3ca35a8680243d3747b5d3ce7ded1731e1a307cf5d1b00ae9243395ab94039f22585b417d7cbdf09f3d8dcf39c85ce147ff77c901c1a21f8de981b6a'

So it appears that the block size is indeed 64 bytes for MD5, SHA-1, SHA-224, and SHA-256, but for SHA-384 and SHA-512, it doesn't appear to be working. That is because the block size has changed to 128 bytes for these two functions. So, if our key is 129 bytes, we should be able to replicate collisions:

SHA-384 with a 129-byte key:

>>> key = os.urandom(129)
>>> msg = os.urandom(256)
>>> hmac.new(key, msg, hashlib.sha384).hexdigest()
'bda2b586637e3bd73a27919601d7d5a1c1743f1f9f5cb72a0aa874f832046f4bc396ff8e307f9318dc404c4b432ca491'
>>> hmac.new(hashlib.sha384(key).digest(), msg, hashlib.sha384).hexdigest()
'bda2b586637e3bd73a27919601d7d5a1c1743f1f9f5cb72a0aa874f832046f4bc396ff8e307f9318dc404c4b432ca491'

SHA-512 with a 129-byte key:

>>> key = os.urandom(129)
>>> msg = os.urandom(256)
>>> hmac.new(key, msg, hashlib.sha512).hexdigest()
'd0153f8bb6a549539abbcff8ee5ac7592c48c082bbbb7b3cc95dcb2166f162e5c59bb7bb3316e65d1481bd8697e8d3bc91deb46ad44845b972c57766f45c54bd'
>>> hmac.new(hashlib.sha512(key).digest(), msg, hashlib.sha512).hexdigest()
'd0153f8bb6a549539abbcff8ee5ac7592c48c082bbbb7b3cc95dcb2166f162e5c59bb7bb3316e65d1481bd8697e8d3bc91deb46ad44845b972c57766f45c54bd'
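The interactive sessions above can be condensed into one Python 3 loop that asks hashlib for each hash's block size instead of guessing it, then tries a key exactly one byte longer than a block. This is just a sketch of the same experiment. I've left MD5 out of the list only because some hardened Python builds disable it.

```python
import hashlib
import hmac
import os

results = {}
for name in ("sha1", "sha224", "sha256", "sha384", "sha512"):
    block_size = hashlib.new(name).block_size
    key = os.urandom(block_size + 1)  # one byte longer than a block
    msg = os.urandom(256)
    long_key_tag = hmac.new(key, msg, name).digest()
    hashed_key = hashlib.new(name, key).digest()
    hashed_key_tag = hmac.new(hashed_key, msg, name).digest()
    results[name] = (block_size, long_key_tag == hashed_key_tag)

for name, (block_size, collides) in results.items():
    print(f"{name:7s} block size {block_size:3d} bytes, collision: {collides}")
```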

This isn't a poor Python implementation of HMAC. Try it in your favorite language, and you should be able to replicate the collisions. This is a "bug" in HMAC. If the key is longer than the block size, it's hashed with the HMAC's underlying cryptographic hash, then padded with zeros to fit a single block.

So what does this mean? It means that when choosing your HMAC keys, you should stay within one block size: 64 bytes or less for MD5, RIPEMD-128/160, SHA-1, SHA-224, and SHA-256, and 128 bytes or less for SHA-384 and SHA-512. If you do this, you'll be fine.

Then again, you should probably be using NaCl or libsodium rather than piecing these cryptographic primitives manually yourself. These sorts of pitfalls are already handled for you.

This post is the result of a Twitter discussion between Scott Arciszewski, myself, and some others.

UPDATE: I'm not aware of any actual security implications of two distinct keys producing the same HMAC digest. Ultimately, what are you trying to get out of HMAC? It's used as an authentication and integrity mechanism. So what if a long key and the hash of that key produce the same HMAC digest? At 64 bytes, or 512 bits, the amount of work required to guess the right key is anything but practical. Still, this is interesting.

Further Investigation Into Scrypt and Argon2 Password Hashing
https://pthree.org/2016/06/29/further-investigation-into-scrypt-and-argon2-password-hashing/
Thu, 30 Jun 2016 04:59:11 +0000

Introduction

In my previous post, I didn't pay close attention to the memory requirements of Argon2 when running my benchmarks. Instead, I just ran them until I got tired of waiting around. Further, I really didn't do justice to either scrypt or Argon2 when showing the parallelization factor. So, as a result, I did a lot more benchmarks on both, so you could more clearly see how the cost affects the time calculating the password hash, and how parallelization can affect that time.

More scrypt Benchmarks and Clarification

So, let's run some more scrypt benchmarks, and take a closer look at what's going on. Recall that the cost parameters for scrypt are:

  • N: The CPU and RAM cost
  • r: The mixing loop memory block size
  • p: The product factor

The recommended cost factors from Colin Percival are:

  • N: 16384 (2^14)
  • r: 8
  • p: 1

To calculate the amount of memory used, we use the following equation:

Memory in bytes = (N * r * 128) + (r * p * 128)

So, according to the recommendation:

(2^14 * 8 * 128) + (8 * 1 * 128) = 16,778,240 bytes

Or, about 16 megabytes. According to Anthony Ferrara, you should be using at least 16 MiB with scrypt. At 4 MiB or less, it is demonstrably weaker than bcrypt. So, when you're looking at the images below, you'll notice that the first memory result is in red text, to show that 8 MiB is the weak point with those cost factors, and the 16 MiB is green, showing cost factors from there up are preferred. As a result, anything between 8 MiB and 16 MiB is default black, in that you should probably be cautious using these cost factors. While you might be demonstrably stronger than bcrypt with these work loads, you're not at the developer's recommendation of 16 MiB.
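If you want to check a set of cost factors against these thresholds before benchmarking, the memory equation is easy to wrap in a small Python function. This is only a sketch of the arithmetic above. The function name is mine, and the two smaller N values are ones I picked to land near the 4 MiB and 8 MiB marks.

```python
def scrypt_memory_bytes(n: int, r: int, p: int) -> int:
    """Memory in bytes = (N * r * 128) + (r * p * 128)."""
    return (n * r * 128) + (r * p * 128)


MIB = 2 ** 20

recommended = scrypt_memory_bytes(n=2 ** 14, r=8, p=1)
print(recommended)        # 16778240
print(recommended / MIB)  # just over 16 MiB

# The same r and p at smaller N, for comparison with the thresholds.
for exponent in (12, 13, 14):
    size = scrypt_memory_bytes(n=2 ** exponent, r=8, p=1)
    print(f"N=2^{exponent}: {size / MIB:.3f} MiB")
```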

So, knowing that, let's look at the results. Notice that when we increase our product factor, our execution time increases by that factor. Despite urban legend, this isn't a parallelization constant (well, it is, but it's not breaking up the problem into smaller ones- it's multiplying it). The idea is that once you've reached a reasonable memory cost, you can increase the execution time by creating more mixing loops with the same cost on each loop. So, instead of one mixing loop costing you 16 MiB, you can have two mixing loops costing you 16 MiB. We didn't divide the problem, we multiplied it. As such, our execution time will double from one mixing loop to two mixing loops.

This seems strange, and indeed it is, but you should start with "p=1" at the memory cost you can afford, then increase the product factor if you can donate more time to the execution. In other words, the product factor is designed for hardware limited scenarios. In general, you'll want to look at your execution time, and let the memory come in as an afterthought (provided it's at or more than 16 MiB).
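To see the multiplication for yourself, here is a rough timing sketch using Python's hashlib.scrypt, which is available when Python is built against OpenSSL 1.1 or newer. The password, salt, and the small N=2^12 are placeholders I picked so it finishes quickly. They are not recommendations, and the exact timings will vary by machine.

```python
import hashlib
import time


def time_scrypt(n: int, r: int, p: int) -> float:
    """Return the seconds taken to derive one 32-byte key with scrypt."""
    start = time.perf_counter()
    hashlib.scrypt(b"password", salt=b"0123456789abcdef", n=n, r=r, p=p, dklen=32)
    return time.perf_counter() - start


# Same N and r each time; only the product factor changes.
timings = {p: time_scrypt(n=2 ** 12, r=8, p=p) for p in (1, 2, 4)}

for p, seconds in timings.items():
    print(f"p={p}: {seconds * 1000:.1f} ms")
```

On a quiet machine, the p=2 and p=4 runs should take roughly two and four times as long as p=1.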

As in the last post, I have highlighted the green cells with interactive password sessions targeting half-of-a-second and red cells with symmetric key derivation targeting a full five seconds.


Scrypt table showing memory block sizes of 1 and 2 with product multipliers of 1, 2, and 4.

Scrypt table showing memory block sizes of 4 and 6 with product multipliers of 1, 2, and 4.

Scrypt table showing memory block sizes of 8 and 10 with product multipliers of 1, 2, and 4.

Scrypt table showing memory block sizes of 12 and 14 with product multipliers of 1, 2, and 4.

Scrypt table showing memory block sizes of 16 and 18 with product multipliers of 1, 2, and 4.

More Argon2 Benchmarks

When showing Argon2 in my last post, I did a poor job demonstrating several additional cost factors, and I didn't make it clear how the parallelization played a role when keeping the same cost factor. As a result, I ran additional benchmarks to make it more clear exactly how CPU and RAM play with each other in your work load.

As a reminder, the cost parameters for Argon2 are as follows:

  • n: The number of iterations on the CPU.
  • m: The memory work load.
  • p: The parallelization factor.

Unlike scrypt, where a single factor manipulates both the CPU and RAM cost, Argon2 separates them out. You deliberately have two knobs to play with- "n" for the CPU and "m" for the RAM. But, one affects the other. If you are targeting a specific execution time, and you increase your memory factor by two, then your CPU work factor will be decreased by half. On the reverse, if you increase your CPU work factor by two, then your memory work factor will be decreased by half. So, affecting one work factor affects the other.

Why is this important? Well, let's consider setting your memory requirement into the gigabyte range. At 1 GiB, for an interactive login session of .5 seconds, you would need at least both cores working on the hash, and you would only get a single iteration. In other words, your work is entirely memory dependent without any significant CPU cost. Maybe you're trying to thwart FPGAs or ASICs with the large memory requirement. However, is it possible that an adversary has 1 GiB of on-die cache? If so, because you're entirely memory-dependent, and no CPU work load, you've been able to cater to the adversary, without significant hardware cost.

On the reverse, you could get CPU heavy with 2048 iterations to hit your .5 seconds execution time, but then you would only be using 256 KiB of memory. You're likely not defeating the FPGAs and ASICs that Argon2 is designed for, as you're almost entirely processor-driven.

So, what to do? It would probably be a good idea to target a balance- require a significant amount of memory, even if it doesn't break the on-die cache barrier, while also requiring a significant amount of processor work. Sticking with Colin's recommendation of 16 MiB (2^14) of memory and 32 iterations on 4 cores for interactive logins is probably a good balance. Then again, it will all depend on your hardware, what you can expect in customer execution time, load, and other variables.

However, here are additional timings of Argon2, just like with scrypt, so you can see how parallelization affects identical costs. Again, green cells are targeting .5 seconds for interactive logins, and red cells are targeting 5 seconds for symmetric key derivation.


Argon2 table showing memory requirements of 256 KiB and 512 KiB with parallel factors of 1, 2, and 4.

Argon2 table showing memory requirements of 1 MiB and 2 MiB with parallel factors of 1, 2, and 4.

Argon2 table showing memory requirements of 4 MiB and 8 MiB with parallel factors of 1, 2, and 4.

Argon2 table showing memory requirements of 16 MiB and 32 MiB with parallel factors of 1, 2, and 4.

Argon2 table showing memory requirements of 64 MiB and 128 MiB with parallel factors of 1, 2, and 4.

Argon2 table showing memory requirements of 256 MiB and 512 MiB with parallel factors of 1, 2, and 4.

Argon2 table showing memory requirements of 1 GiB and 2 GiB with parallel factors of 1, 2, and 4.

Conclusion

Hopefully, this will help you make a more educated decision about your cost factors when deploying either scrypt or Argon2 as your password hash or symmetric key derivation function. Remember, that you have a few things to consider when picking your costs:

  • Make it expensive for both CPU and memory.
  • Target a realistic execution time for the situation.
  • Guarantee that you can always meet these goals after deployment.

Also, don't deploy Argon2 into production quite yet. Let it bake for a while. If it still stands secure in 2020, then you're probably good to go. Otherwise, deploy scrypt, or the other functions mentioned in the prior post.

Let's Talk Password Hashing
https://pthree.org/2016/06/28/lets-talk-password-hashing/
Tue, 28 Jun 2016 06:43:48 +0000

TL;DR

In order of preference, hash passwords with:

  1. scrypt
  2. bcrypt
  3. Argon2
  4. sha512crypt
  5. sha256crypt
  6. PBKDF2

Do not hash passwords with:

  1. MD5
  2. md5crypt
  3. UNIX crypt(3)
  4. SHA-1/2/3
  5. Skein
  6. BLAKE2
  7. Any general purpose hashing function.
  8. Any encryption algorithm.
  9. Your own design.
  10. Plaintext

Introduction

Something that comes up frequently in crypto circles, aside from the constant database leaks of accounts and passwords, is hashing passwords. Because of the phrase "hashing passwords", developers who may not know better will think of using generic one-way fixed-length collision-resistant cryptographic hashing functions, such as MD5, SHA-1, SHA-256, or SHA-512, without giving a second thought to the problem. Of course, using these functions is problematic, because they are fast. Turns out, we don't like fast hashing functions, because password crackers do like fast hashing functions. The faster they can hash, the sooner they can recover the password.

The Problem

So, instead of using MD5, SHA-1, SHA-256, SHA-512, etc., the cryptographic community got together, and introduced specifically designed password hashing functions, where a custom work factor is included as a cost. Separately, key derivation functions were also designed for creating cryptographic keys, where a custom work factor was also included as a cost here. So, with password-based key derivation functions and specifically designed password hashing functions, we came up with some algorithms that you should be using instead.

The Solution

The most popular algorithms of this type would include, in my personal order of preference from most preferred to least preferred:

  1. scrypt (KDF)
  2. bcrypt
  3. Argon2 (KDF)
  4. sha512crypt
  5. sha256crypt
  6. PBKDF2 (KDF)

The only difference between a KDF and a password hashing function, is that the digest length can be arbitrary with KDFs, whereas password hashing functions will have a fixed length output.

For the longest time, I was not a fan of scrypt as a password hashing function. I think I've changed my mind. Even though scrypt is sensitive to the parameters picked, and it suffers from a time-memory trade-off (TMTO), it's still considered secure, provided you pick sane defaults. I also place bcrypt over Argon2, because Argon2 was just recently announced as the Password Hashing Contest winner. As with all cryptographic primitives, we need time to analyze, attack, and pick apart the design. If after about 5 years, it still stands strong and secure, then it can be recommended as a solution for production. In the meantime, it's something certainly worth testing, but maybe not for production code. Finally, I prefer sha512crypt and sha256crypt over PBKDF2, mostly because they are included with every GNU/Linux distribution by default, they are based on the strong SHA-2 hashing function, which has had years and mountains of analysis, and unlike PBKDF2, you know exactly which hashing function is used. PBKDF2 could be using SHA-2 functions by default, or it could be using SHA-1. You'll need to check your library to be sure.

Different Strokes for Different Folks

Regardless, all of the above functions include cost parameters for manipulating how long it takes to calculate the hash from a password. It's less important exactly what the cost parameters are, and more important that you are targeting an appropriate time to work through the cost, and create the hash. This means you need to identify your threat model and your adversary.

The two common scenarios you'll find yourself in, are:

  1. Password storage
  2. Encryption keys

For password storage, your threat model is likely the password database getting leaked to the Internet, and password crackers around the world working on the hashes in the database to recover passwords. Thus, your adversary is malware, Anonymous, and password crackers. For encryption keys, your threat model is likely private encrypted keys getting compromised and side-channel attacks. Thus, your adversary is also malware, poor key exchanges, or untrusted networks. Knowing your threat model and your adversary changes how you approach the problem.

With password storage, you may be dealing with an interactive login, such as through a website. As such, you probably want the password hashing time to be quick, while still maintaining a work factor that would discourage large distributed attacks on your leaked database. Possibly, .5 seconds. This means if the database was leaked, the password cracker could do no more than 2 passwords per second. When you compare this to the millions of hashes per second a GPU could execute on Windows NTLM passwords, 2 passwords per second is extremely attractive. For encryption keys, you probably don't need to worry about interactive sessions, so taking 5 seconds to create the key from the password probably isn't a bad thing. So key crackers spending 5 seconds per guess trying to recover the password that created the encrypted private key is really nice.
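To put those rates in perspective, here is a back-of-the-envelope Python sketch of how long a single cracking machine would need to exhaust a password space at 0.5 and 5 seconds per guess. The keyspace of eight lowercase letters is a hypothetical I picked for illustration, and a real attacker would have many machines and a smarter guessing order, so treat the numbers as a feel for scale only.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_exhaust(keyspace: int, seconds_per_guess: float) -> float:
    """Worst-case years for one machine to try every candidate password."""
    return keyspace * seconds_per_guess / SECONDS_PER_YEAR


keyspace = 26 ** 8  # hypothetical: 8 characters, lowercase letters only

interactive_years = years_to_exhaust(keyspace, 0.5)   # 2 guesses per second
key_derivation_years = years_to_exhaust(keyspace, 5)  # 1 guess per 5 seconds

print(f"{interactive_years:,.0f} years at 0.5 s per guess")
print(f"{key_derivation_years:,.0f} years at 5 s per guess")
```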

bcrypt, sha256crypt, sha512crypt, & PBKDF2

So, knowing the work factors, what would it look like for the above algorithms? Below, I look at bcrypt, sha256crypt, sha512crypt, and PBKDF2 with their appropriate cost. I've highlighted the row green where a possible work factor could mean spending 0.5 seconds on hashing the password, and a red row where a possible work factor could mean spending 5 full seconds on creating a password-based encryption key.

Spreadsheet table showing various cost factors for bcrypt, sha256crypt, sha512crypt, and PBKDF2.

Notice that for bcrypt, this means for password hashing, a factor of 13 would provide a cost of about 0.5s to hash the password, where a factor of 16 would get me close to my cost of about 5 seconds for creating a password-based key. For sha256crypt, sha512crypt, and PBKDF2, that seems to be about 640,000 and 5,120,000 iterations respectively.
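Those iteration counts are specific to my test machine, so you'll want to measure your own. Here is a minimal Python sketch that does so for PBKDF2 with hashlib.pbkdf2_hmac. The function name, the probe size of 20,000 iterations, and the choice of SHA-512 are my assumptions, and it relies on PBKDF2's cost growing linearly with the iteration count.

```python
import hashlib
import os
import time


def calibrate_pbkdf2(target_seconds: float, probe_iterations: int = 20_000) -> int:
    """Estimate the PBKDF2-HMAC-SHA512 iteration count that takes target_seconds."""
    salt = os.urandom(16)
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha512", b"benchmark password", salt, probe_iterations)
    elapsed = time.perf_counter() - start
    # Cost is linear in the iteration count, so scale the probe run.
    return max(1, int(probe_iterations * target_seconds / elapsed))


interactive = calibrate_pbkdf2(0.5)     # interactive login target
key_derivation = calibrate_pbkdf2(5.0)  # encryption key target

print(f"about {interactive:,} iterations for 0.5 seconds")
print(f"about {key_derivation:,} iterations for 5 seconds")
```

A single short probe is noisy, so run it a few times on the production hardware and round down to be safe.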

scrypt

When we move to scrypt, things get a touch more difficult. With bcrypt, sha256crypt, sha512crypt, and PBKDF2, our cost is entirely a CPU load factor. Unfortunately, while possibly problematic for fast GPU clusters, they still fall victim to algorithm-specific FPGAs and ASICs. In order to combat this, we need to also include a memory cost, seeing as though memory on these devices is expensive. However, having both a CPU and a RAM cost means multiple knobs to tweak. So, Colin Percival, the designer of scrypt, decided to bundle both the CPU and the RAM cost into three factors: "N", "r", and "p". The resulting memory usage is calculated as follows:

Memory in bytes = (N * r * 128) + (r * p * 128)

There are a lot of suggestions out there about what's "best practice". It seems that you should at least have the following cost factors with scrypt, which provides a 16 MiB memory load:

  • N: 16384 (2^14)
  • r: 8
  • p: 1

While you should be aware of the sensitivity of scrypt parameters, provided you are working with at least 16 MiB of RAM, you aren't any worse than other password hashing functions or KDFs. So, in the following tables, I increase the memory cost needed for the hash by tweaking the three parameters.

Update 2016-06-29: I've clarified these parameters in a follow-up post, which you should most definitely read at https://pthree.org/2016/06/29/further-investigation-into-scrypt-and-argon2-password-hashing/.

First table showing the cost factors of scrypt. Second table showing different cost factors of scrypt. Third table showing yet different cost factors of scrypt.

Because I only have access to a single-socket-quad core CPU in this testing machine, I wanted to limit my "p" cost to 1, 2, and 4, which is displayed in those tables. Further, I'm limited on RAM, and don't want to disrupt the rest of the applications and services running on the box, so I've limited my "r" cost to 4, 8, and 16 multiplied by 128 bytes (512 bytes, 1024 bytes, and 2048 bytes).

Interestingly enough, Colin Percival recommends 16 MiB (N=16384 (2^14), r=8, p=1) for interactive logins and 16 MiB (N=131072 (2^17), r=1, p=1) for symmetric key derivation. If I were targeting my 0.5s password hashing time, then I could improve that to 256 MiB (N=65536 (2^16), r=8, p=1), or 2 GiB (N=2097152 (2^21), r=8, p=1), if targeting just slightly more than 5 seconds for symmetric key derivation.

Argon2

Finally, we look at Argon2. Argon2 comes in two flavors- Argon2d and Argon2i; the first of which is data (d)ependent and the latter is data (i)ndependent. The former is supposed to be resistant against GPU cracking while the latter is supposed to be resistant against side-channel attacks. In other words, Argon2d would be suitable for password hashing, while Argon2i would be suitable for encryption key derivation. However, regardless of Argon2d or Argon2i, the cost parameters will perform the same, so we'll treat them as a single unit here.

Like scrypt, Argon2 has both a CPU and a RAM cost. However, both are handled separately. The CPU cost is handled through standard iterations, like with bcrypt or PBKDF2, and the RAM cost is handled through specifically ballooning the memory. When I started playing with it, I found that just manipulating the iterations felt very much like bcrypt, but I could affect the overall time it took to calculate the hash by just manipulating the memory also. When combining the two, I found that iterations affected the cost more than the RAM, but both had significant say in the calculation time, as you can see in the tables below. As with scrypt, it also has a parallelization cost, defining the number of threads you want working on the problem:

First table showing the cost factors of Argon2. Second table showing different cost factors of Argon2. Third table showing yet different cost factors of Argon2.

Note the RAM cost between 256 KiB and 16 MiB, in addition to the number of iterations and the processor count cost. As we balloon our RAM, we can bring our iteration cost down. As we require more threads to work on the hash, we can bring that iteration count down even further. Regardless, we are trying to target 0.5s for an interactive password login, and a full 5 seconds for password-based encryption key derivation.

Conclusion

So, what's the point? When hashing passwords, whether to store them on disk, or to create encryption keys, you should be using password-based cryptographic primitives that were specifically designed for this problem. You should not be using general purpose hashing functions of any type, because of their speed. Further, you should not be rolling out your own "key-stretching" algorithm, such as recursively hashing your password digest and additional output.

Just keep in mind- if the algorithm was specifically designed to handle passwords, and the cost is sufficient for your needs, threat model, and adversary, then you're doing just fine. Really, you can't go wrong with any of them. Just avoid any algorithm not specifically designed around passwords. The goal is security through obesity.

Best practice? In order of preference, use:

  1. scrypt
  2. bcrypt
  3. Argon2
  4. sha512crypt
  5. sha256crypt
  6. PBKDF2

Do not use:

  1. MD5
  2. md5crypt
  3. UNIX crypt(3)
  4. SHA-1/2/3
  5. Skein
  6. BLAKE2
  7. Any general purpose hashing function.
  8. Any encryption algorithm.
  9. Your own design.
  10. Plaintext
The Physics of Brute Force
https://pthree.org/2016/06/19/the-physics-of-brute-force/
Sun, 19 Jun 2016 19:34:14 +0000

Introduction

Recently, MyDataAngel launched a Kickstarter project to sell a proprietary encryption algorithm and software with 512-bit and 768-bit symmetric keys. The motivation was that 128-bit and 256-bit symmetric keys just aren't strong enough, especially when AES and OpenSSL are older than your car (a common criticism they would mention in their vlogs). Back in 2009, Bruce Schneier blogged about Crypteto having a 49,152-bit symmetric key. As such, their crypto is 100% stronger, because their key is 100% bigger (than 4096-bit keys?). Meganet, which apparently still exists, has a 1 million-bit symmetric key!

It's hard to take these encryption products seriously, when there are no published papers on existing primitives, no security or cryptography experts on your team, and you're selling products with ridiculous key lengths (to be fair, 512-bit and 768-bit symmetric keys aren't really that ridiculous). Never mind that your proprietary encryption algorithm is neither peer-reviewed nor freely available to the public. Anyone can create a symmetric encryption algorithm that they themselves cannot break. The trick is releasing your algorithm for peer review, letting existing cryptography experts analyze the design, and still coming out on top with a strong algorithm (it wouldn't hurt if you analyzed existing algorithms and published papers yourself).

So with that, I want to talk a bit about the length of symmetric keys, and what it takes to brute force them. Bruce Schneier addressed this in his "Applied Cryptography" book through the laws of thermodynamics. Unfortunately, he got some of the constants wrong. Although the conclusion is basically the same, I'm going to give you the same argument, with updated constants, and we'll see if we come to the same conclusion.

Counting Bits

Suppose you want to see how many bits you can flip in one day by counting in binary every second. Of course, when you start counting, you would start with "0", and your first second would flip your first bit to "1". Your second second would flip your second bit to "1" while also flipping your first bit back to "0". Your third second would flip the first bit back to "1", and so forth. Here is a simple GIF (pronounced with a hard "G") counting from 0 to 127, flipping bits each second.

Bit counter animation counting from 0 to 127.

By the end of a 24-hour period, I would have hit 86,400 seconds, which is represented as a 17-bit number. In other words, every 24 hours, flipping 1 bit per second, I can flip every combination of bits in a 16-bit number.

Binary representation of 86400

By the end of a single year, we end up with a 25-bit number, which means flipping a single bit every second can flip every combination of 24 bits every year.

Binary representation of 31536000
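Both bit counts are easy to confirm in Python, since integers know their own bit length. This is just a check of the arithmetic above, assuming a 365-day year.

```python
seconds_per_day = 24 * 60 * 60
seconds_per_year = 365 * seconds_per_day

# bit_length() is the number of bits needed to write the number in binary.
print(seconds_per_day, seconds_per_day.bit_length())    # 86400 17
print(seconds_per_year, seconds_per_year.bit_length())  # 31536000 25

# A counter only finishes every combination of one bit fewer.
full_bits_per_day = seconds_per_day.bit_length() - 1
full_bits_per_year = seconds_per_year.bit_length() - 1
```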

So, the obvious question is then this- what is the largest combination of bits that I can flip through to exhaustion? More importantly, how many computers would I need to do this work (what is this going to cost)?

Some Basic Physics

One of the consequences of the second law of thermodynamics, is that it requires energy to do a certain amount of work. This could be anything from lifting a box over your head, to walking, to even getting out of bed in the morning. This also includes computers and hard drives. When the computer wishes to store data on disk, energy is needed to do that work. This is expressed with the equation:

Energy = kT

Where "k" is Boltzmann's constant of 1.38064852×10^-16 ergs per Kelvin, and "T" is the temperature of the system. I'm going to use ergs as our unit, as we are speaking about work, and an "erg" is a unit of energy. Of course, a "Kelvin" is a unit of temperature, where 0 Kelvin is defined as a system devoid of energy; also known as "absolute zero".

It would make the most sense to get our computer as absolutely cool as possible to maximize our output while also minimizing our energy requirements. Current background radiation in outer space is about 2.72548 Kelvin. To run a computer cooler than that would require a heat pump, which means adding additional energy to the system than what is needed for our computation. So, we'll run this ideal computer at 2.72548 Kelvin.

As a result, this means that to flip a single bit with our ideal computer, it requires:

Energy = (1.38064852×10^-16 ergs per Kelvin) * (2.72548 Kelvin) = 3.762929928*10^-16 ergs
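As a quick sanity check of that product in Python (the constant names are mine, the values are the ones quoted above):

```python
BOLTZMANN_ERG_PER_KELVIN = 1.38064852e-16  # k
BACKGROUND_TEMPERATURE_KELVIN = 2.72548    # T, the cosmic background temperature

# Energy = kT, the cost of flipping one bit on the ideal computer.
energy_per_bit_erg = BOLTZMANN_ERG_PER_KELVIN * BACKGROUND_TEMPERATURE_KELVIN

print(f"{energy_per_bit_erg:.9e} ergs per bit")
```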

Some Energy Sources

The Sun

Now that we know our energy requirement, let's start looking at some energy sources. The total energy output from our star is about 1.2*10^34 Joules per year. Because one Joule is the same as 1*10^7 ergs, the total annual energy output of the Sun is about 1.2*10^41 ergs. So, doing some basic math:

Bits flipped = (1.2*10^41 ergs) / (3.762929928*10^-16 ergs per bit) = 3.189004374*10^56 bits

3.189004374*10^56 bits means I can flip every combination of bits in a 187-bit number, if I could harness 100% of the solar energy output from the sun each year. Unfortunately, our Sun is a weak star.

A Supernova

A supernova is calculated to release something around 10^44 Joules or 10^51 ergs of energy. Doing that math:

Bits flipped = (10^51 ergs) / (3.762929928*10^-16 ergs per bit) = 2.657503608*10^66 bits

2.657503608*10^66 bits is approximately 2^220 bits. Imagine flipping every bit in a 220-bit number in an orgy of computation.

A Hypernova

A hypernova is calculated to release something around 10^46 Joules or 10^53 ergs of energy. Doing that math:

Bits flipped = (10^53 ergs) / (3.762929928*10^-16 ergs per bit) = 2.657503608*10^68 bits

2.657503608*10^68 bits is approximately 2^227 bits. This is a computation orgy turned up to 11.

Of course, in all 3 cases, I would have to harness 100% of that energy into my ideal computer, to flip every combination of these bits. Never mind finding transportation to get me to that hypernova, the time taken in that travel (how many millions of light years away is it?), and the cost of the equipment to harness the released energy.
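
If you want to redo the three calculations above in one place, a short Python sketch follows. The energy figures and the per-bit cost are the ones quoted in the text; the names `bits_flipped` and `counter_width` are just labels I picked for this example.

```python
import math

# kT at 2.72548 Kelvin, in ergs, from the calculation earlier in the post.
ERGS_PER_BIT = 3.762929928e-16

# Energy budgets in ergs, as quoted in the text.
SOURCES = [
    ("Sun, one year", 1.2e41),
    ("Supernova", 1e51),
    ("Hypernova", 1e53),
]


def bits_flipped(energy_ergs):
    """Number of single-bit flips a given energy budget pays for."""
    return energy_ergs / ERGS_PER_BIT


def counter_width(flips):
    """Largest n for which 2**n does not exceed the number of flips."""
    return int(math.floor(math.log2(flips)))


for name, ergs in SOURCES:
    flips = bits_flipped(ergs)
    print("{0}: {1:.9e} flips, roughly 2^{2}".format(
        name, flips, counter_width(flips)))
```

None of the three budgets gets anywhere near 2^256, which is the whole point.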

Bitcoin Mining

As a comparative study, Bitcoin mining has almost surpassed 2 quintillion SHA-256 hashes per second. If you don't think this is significant, it is. That's processing all of a 60-bit number (all 2^60 bits) every second, or an 85-bit number (all 2^85 bits) every year. This is hard evidence, right now, of a large scale 256-bit brute force computing project, and it's barely flipping all the bits in an 85-bit number every year. The hash rate would have to double (4 quintillion SHA-256 hashes every second) to surpass flipping all the bits in an 86-bit number every year.
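
The 60-bit and 85-bit figures come straight from taking a base-2 logarithm of the hash rate. A quick Python sketch, assuming a 365-day year (my assumption, the text doesn't say) and using the 2 quintillion figure from above:

```python
import math

HASHES_PER_SECOND = 2e18                 # "2 quintillion", from the text
SECONDS_PER_YEAR = 365 * 24 * 60 * 60    # assumption: a non-leap year

# How many bits of search space the network covers per second and per year.
per_second_bits = math.log2(HASHES_PER_SECOND)
per_year_bits = math.log2(HASHES_PER_SECOND * SECONDS_PER_YEAR)
# Doubling the hash rate adds exactly one bit.
doubled_per_year_bits = math.log2(2 * HASHES_PER_SECOND * SECONDS_PER_YEAR)

print("per second: 2^{0:.2f}".format(per_second_bits))
print("per year:   2^{0:.2f}".format(per_year_bits))
print("doubled:    2^{0:.2f}".format(doubled_per_year_bits))
```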

Further, we do not have any evidence of any clustered supercomputing project that comes close to that processing rate. It can be argued that the rate of Bitcoin mining is the upper limits of what any group of well-funded organizations could afford (I think it's fair to argue several well-funded organizations are likely Bitcoin mining). To produce a valid conspiracy theory to counteract that claim, you would need to show evidence of organizations that have their own semiconductor chip manufacturing, that has outpaced ARM, AMD, Intel and every other chip maker on the market, by several orders of magnitude.

Regardless, we showed the amount of energy needed anyway to flip every bit in a 256-bit number, and the laws of thermodynamics strongly imply that it's just not physically possible.

Asymmetric Cryptography

Things change when dealing with asymmetric cryptography. Now, instead of creating a secret 256-bit number, you're using mathematics, such as prime number factorization or elliptic curve equations. This changes things dramatically when dealing with key lengths, because even though we assume some mathematical problems are easy to calculate but hard to reverse, we need to deal with exceptionally large numbers to give us the security margins necessary to prove that hardness.

As such, it becomes less of a concern about energy, and more a concern about time. Of course, key length is important up to a point. We just showed with the second law of thermodynamics, that brute forcing your way from 0 to 2^256 is just physically impossible. However, finding the prime factors of that 256-bit number is a much easier task, does not require as much energy, and can be done by testing no more than half of the square root amount of numbers (in this case, 2^127, assuming we're only testing prime numbers).
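
To make the "half of the square root" bound concrete, here is a toy trial division sketch in Python. It is only an illustration of the counting argument: it tests 2 and then every odd candidate up to the square root (a superset of the primes), `smallest_factor` is a name I made up, and nobody attacks real keys this way.

```python
def smallest_factor(n):
    """Return the smallest prime factor of n (n > 1) by trial division."""
    if n % 2 == 0:
        return 2
    candidate = 3
    # Only odd candidates up to sqrt(n) are tried, which is about half of
    # the square root amount of numbers.
    while candidate * candidate <= n:
        if n % candidate == 0:
            return candidate
        candidate += 2
    # No divisor found, so n itself is prime.
    return n


print(smallest_factor(3127))
```

For a 256-bit modulus the loop above would run at most about 2^127 times, which is far less than 2^256 but still hopeless in practice.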

As such, we need to deal with prime factors that are difficult to find. It turns out that it's not enough to just have a 512-bit private key to prevent the Bad Guys from finding your prime factors. This is largely because there are efficient algorithms for calculating and testing prime numbers. So, it must also be expensive to calculate and find those primes. Currently, best practice seems to be generating two 1024-bit prime factors to produce a 2048-bit private RSA key.

Table showing recommendations of key length from various authorities

Fixed-length Collision-resistant Hashing

Fixed-length collision-resistant hashing puts a different twist on brute force searching. The largest problem comes from the Birthday Attack. This states that if you have approximately the square root of 2 × ln(2) × 365 people in the room (about 23 people), the chance that any two of them share the same birthday is 50%. Notice that this comes from any two people in the room. It does not mean that you have singled out 1 person, and that the odds of one of the other 22 people in the room having that same birthday are 50%. This isn't a pre-collision search. This is a blind search. You ask the first person what their birthday is, and compare it with the other 22 people in the room. Then you ask the second person what their birthday is, and compare it with the remaining 21 people in the room. And so on and so forth. After working through all 23 people, comparing everyone's birthday to everyone else's birthday, the odds that you found a match between two random people are 50%.
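
The 50% figure is easy to verify numerically. This Python sketch multiplies out the probability that everyone in the room has a distinct birthday, assuming 365 equally likely birthdays (the usual simplification); the function name is mine.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for i in range(people):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


for people in (22, 23):
    print(people, shared_birthday_probability(people))
```

Swap `days` for the number of possible hash outputs and the same function describes collisions between hashes, although for real digest sizes you would use the square root approximation rather than this loop.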

Why is this important? Suppose you are processing data with SHA-1 (160-bit output). You only need to calculate 2^80 SHA-1 hashes before your odds of finding a duplicate hash out of the currently calculated hashes reach 50%. As we just learned with Bitcoin, this is practical within one year with a large orchestrated effort. Turns out, SHA-1 is weaker than that (we only need to calculate 2^64 hashes for a 50% probability), which is why the cryptographic community has been pushing so hard to get everyone and everything away from SHA-1.

Now you may understand why 384-bit and 512-bit (and more up to 1024-bit) cryptographically secure fixed-length collision-resistant hashing functions exist. Due to the Birthday Attack, we can make mince meat of our work.

Conclusion

As clearly demonstrated, the second law of thermodynamics provides a clear upper bound on what can be found with brute force searches. Of course, brute force searches are the least effective way to find the private keys you're looking for, and indeed, there are more efficient ways to get to the data. However, if you provide a proprietary encryption algorithm with a closed-source implementation, that uses ridiculously long private keys, then it seems clear that you don't understand the physics behind brute force. If you can't grasp the simple concept of these upper bounds, why would I want to trust you and your product in other areas of security and data confidentiality?

Quantum computing does give us some far more efficient algorithms that classical computing cannot achieve, but even then, 256-bits still remains outside of the practical realm of mythical quantum computing when brute force searching.

As I've stated many times before- trust the math.

Webcam Random Number Generation
https://pthree.org/2016/06/12/webcam-random-number-generation/
Sun, 12 Jun 2016 21:13:28 +0000

A couple weeks ago, I purchased a lava lamp for $5 at a thrift store. It was in brand spanking new condition, and worked like a charm. The only thing going through my head at the time? I can't wait to point my webcam at it, and start generating some random numbers! Okay, well that, and mood lighting for the wife.

Anyway, I wrote a quickie Python script which will capture a frame from the webcam, hash it with a keyed BLAKE2, and output the result to a FIFO file to be processed. The BLAKE2 digest of the frame also becomes the key for the next BLAKE2 instance, making this script very CBC-like in execution (the first function is keyed from /dev/urandom, and each digest keys the next iteration).

#!/usr/bin/python

# Create true random seeds (near as we can tell) with your webcam.
#
# This script will use your webcam pointed at a source of entropy, keyed with
# random data from the OS CSPRNG. You could point the camera at:
#
#   * Lava lamps
#   * Plasma globes
#   * Double pendulums
#   * Rayleigh-Benard convection
#   * Brownian motion
#
# Performance is ~ 2 KiB/s.
# Requires pyblake2: https://pypi.python.org/pypi/pyblake2
#
# Released to the public domain.

import os
import cv2
import pyblake2

cap = cv2.VideoCapture(0)
webcamfile = '/tmp/webcamfile.fifo'
key = os.urandom(64)

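try:
    os.mkfifo(webcamfile)
except OSError as e:
    # Without the FIFO there is nowhere to send the digests, so stop here.
    raise SystemExit("Cannot create FIFO: {0}".format(e))

# Read-write, unbuffered binary mode: on Linux this does not block waiting for a reader.
fifo = open(webcamfile, 'w+b', 0)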

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Pass the key by keyword; the first positional argument is message data.
    b2sum = pyblake2.blake2b(key=key)
    b2sum.update(frame)
    digest = b2sum.digest()
    key = digest

    fifo.write(digest)
    fifo.flush()

    cv2.imshow('webcamlamp', frame)
    k = cv2.waitKey(1) & 0xFF
    if k == 27:
        break

fifo.close()
os.remove(webcamfile)
cap.release()
cv2.destroyAllWindows()

As you'll notice in the code, you should point your webcam at a source of either chaotic randomness, like a lava lamp, or quantum randomness, like a plasma globe. Because the frame is whitened with a keyed BLAKE2, it could be considered as a true random number generator, or you could use it as a seed for a cryptographically secure pseudorandom number generator, such as those shipped with modern operating systems. If you do use this as a TRNG, realize that it's slow- it only operates at about 2 KiBps.
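
If you don't have a webcam handy, the chaining itself can be tried with stand-in data. This is only a sketch: it uses `hashlib.blake2b` from the Python 3 standard library instead of pyblake2, random bytes instead of real frames, and `chain_digests` is a name I made up for the example.

```python
import hashlib
import os


def chain_digests(frames, key):
    """Yield one 64-byte digest per frame; each digest keys the next hash."""
    for frame in frames:
        digest = hashlib.blake2b(frame, key=key).digest()
        # The digest of this frame becomes the key for the next frame.
        key = digest
        yield digest


# Stand-in "frames": random bytes instead of real webcam images.
fake_frames = [os.urandom(1024) for _ in range(3)]
for digest in chain_digests(fake_frames, os.urandom(64)):
    print(digest.hex())
```

Because every digest depends on the previous one, feeding the same frame twice still produces two different outputs.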

Here is a screenshot of the webcam itself looking at a USB desk plasma globe, which you can purchase from ThinkGeek for $10.

Webcam view of a plasma globe in operation.

The data is sent to a FIFO in /tmp/. If you don't do anything with the data, and let the buffer fill, the script will hang until you read data out of the FIFO. As such, you could do something like this to reseed your CSPRNG (of course, it's not increasing the entropy estimate, just reseeding the generator):

$ cat /tmp/webcamfile.fifo > /dev/random

Lava lamps and plasma globes are only the beginning. Anything quantum or chaotic that can be visually observed also works. Things like:

  • Double pendulums
  • Brownian motion
  • Rayleigh-Benard convection
  • CCD noise from the webcam itself
  • A bouncing ball on a sinusoidal vibrating table

So, there you have it. Plasma globes and lava lamps providing sufficiently random data via a webcam, either to be used as a secret seed, or as a TRNG itself. If you know of other systems worth pointing a webcam at, or have suggestions for improving the Python code, let me know in the comments.

CPU Jitter Entropy for the Linux Kernel
https://pthree.org/2016/05/24/cpu-jitter-entropy-for-the-linux-kernel/
Wed, 25 May 2016 02:14:23 +0000

Normally, I keep a sharp eye on all things cryptographic-related with the Linux kernel. However, in 4.2, I missed something fantastic: jitterentropy_rng.ko. This is a Linux kernel module that measures the jitter of the high resolution timing available in modern CPUs, and uses this jitter as a source of true randomness. In fact, using the CPU timer as a source of true randomness isn't anything new. If you've read my blog for some time, you're already familiar with haveged(8). This daemon also collects CPU jitter and feeds the collected data into the kernel's random number generator.

The main site is at http://www.chronox.de/jent.html and the PDF paper describing the CPU jitter entropy can be found on the site.

So why the blog post about jitterentropy_rng.ko? Because now that we have something in the mainline kernel, we get a few benefits:

  1. More eyes are looking at the code, and can adjust, analyze, and refine the entropy gathering process, making sure it's neither too aggressive nor too conservative in its approach.
  2. We now have something that can collect entropy much earlier in the boot sequence, even before the random number generator has been initialized. This means we can have a properly seeded CSPRNG when the CSPRNG is initialized.
  3. While not available now, we could have a kernelspace daemon collecting entropy and feeding it to the CSPRNG without the need for extra software.
  4. This isn't just for servers, desktops, and VMs, but anything that runs the Linux kernel on a modern CPU, including Android phones, embedded devices, and SoC.
  5. While haveged(8) has been a good solution for a long time, it has been heavily criticized, and it seems development on it has stalled. Here is another software solution for true randomness without the need of potentially dangerous 3rd party USB random number generators.
  6. You don't need Intel's RDRAND. Any modern CPU with a high resolution timer will work. AMD, SPARC, ARM, MIPS, PA-RISC, Power, etc.

As mentioned in the list, unfortunately, loading the kernel module doesn't automatically top off the entropy estimate of the internal state of the CSPRNG (/proc/sys/kernel/random/entropy_avail). As such, /dev/random will still block when the estimate is low or exhausted. So you'll still need to run a userspace daemon to prevent this behavior. The author has also shipped a clean, light userspace daemon that just reads the data provided by the jitterentropy_rng.ko kernel module, and uses ioctl(2) to increase the estimate. The jitterentropy_rng.ko module provides about 10 KBps of random data.
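
If you want to watch the estimate while experimenting with the daemon, a few lines of Python are enough. This is a minimal sketch that just reads the procfs file named above; the function and constant names are my own, and on a system without that file it prints nothing.

```python
import os

# Path named in the text; only present on Linux.
ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def read_entropy_estimate(path=ENTROPY_AVAIL):
    """Return the kernel's current entropy estimate, in bits."""
    with open(path) as handle:
        return int(handle.read().strip())


if os.path.exists(ENTROPY_AVAIL):
    print("entropy estimate: {0} bits".format(read_entropy_estimate()))
```

Running it in a loop before and after starting the userspace daemon is a simple way to see whether the estimate is actually being topped off.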

Again, this isn't anything that something like haveged(8) doesn't already have access to. However, by taking advantage of a loaded kernel module, we can ensure that randomness is being collected before the CSPRNG is initialized. So, when CSPRNG initialization happens, we can ensure that it is properly seeded on first boot, minimizing the likelihood that identical keys will be created on distinct systems. This is something haveged(8) can't provide, as it runs entirely in userspace.

Unfortunately, jitterentropy-rngd(8) isn't available in the Debian repositories yet, so you'll need to download the compressed tarball from the author's website, manually compile and install yourself. However, he does ship a systemd(8) service file, which makes it easy to get the daemon up and running on boot with minimal effort.

I've had the jitterentropy_rng.ko module installed with the jitterentropy-rngd(8) userspace daemon running all day today, without haveged(8), and needless to say, I'm pleased. It keeps the CSPRNG entropy estimate sufficiently topped off for software that still relies on /dev/random (please stop doing this developers- start using /dev/urandom please) and provides adequate performance. Near as I can tell, there is not a character device created when loading the kernel module, so you can't access the unbiased data before feeding it into the CSPRNG. As such, I don't have a way to test its randomness quality. Supposedly, there is a way to access this via debugfs, but I haven't found it.

Anyway, I would recommend using jitterentropy_rng.ko and jitterentropy-rngd(8) over haveged(8) as the source for your randomness.

Weechat Relay With Let's Encrypt Certificates
https://pthree.org/2016/05/20/weechat-relay-with-lets-encrypt-certificates/
Fri, 20 May 2016 15:40:26 +0000

I've been on IRC for a long time. Not as long as some, granted, but likely longer than most. I've had my hand in a number of IRC clients, mostly terminal-based. Yup, I was (briefly) using the ircII client, then (also briefly) BitchX. Then I found irssi, and stuck with that for a long time. Search irssi help topics on this blog, and you'll see just how long. Then, after getting hired at XMission in January 2012, I switched full-time to WeeChat. I haven't looked back. This IRC client is amazing.

One of the outstanding features of WeeChat is the relay, effectively turning your IRC client into a bouncer. This feature isn't unique- it's in irssi also. However, the irssi proxy does not support SSL (2009). The WeeChat relay does. And with Let's Encrypt certificates freely available, this is the perfect opportunity to use TLS with a trusted certificate.

This post assumes that you are running WeeChat on a box that you can control the firewall to. In my case, I run WeeChat on an externally available SSH server behind tmux. With Let's Encrypt certificates, you will need to provide a FQDN for your Common Name (CN). This is all part of the standard certificate verification procedure. I purchased a domain that points to the IP of that server, and you will need to do the same.

The official Let's Encrypt "certbot" package used for creating Let's Encrypt certificates is already available in Debian unstable. A simple "apt install certbot" will get that up and running for you. Once installed, you will need to create your certificate.

$ certbot certonly --standalone -d weechat.example.com -m aaron.toponce@gmail.com

Per Let's Encrypt documentation, you need ports 80 and 443 open to the world when creating and renewing your certificate. The execution will create four files:

# ls -l /etc/letsencrypt/
total 24
drwx------ 3 root root 4096 May 19 12:36 accounts/
drwx------ 3 root root 4096 May 19 12:39 archive/
drwxr-xr-x 2 root root 4096 May 19 12:39 csr/
drwx------ 2 root root 4096 May 19 12:39 keys/
drwx------ 3 root root 4096 May 19 12:39 live/
drwxr-xr-x 2 root root 4096 May 19 12:39 renewal/
# ls -l /etc/letsencrypt/live/weechat.example.com/
total 0
lrwxrwxrwx 1 root root 43 May 19 12:39 cert.pem -> ../../archive/weechat.example.com/cert1.pem
lrwxrwxrwx 1 root root 44 May 19 12:39 chain.pem -> ../../archive/weechat.example.com/chain1.pem
lrwxrwxrwx 1 root root 48 May 19 12:39 fullchain.pem -> ../../archive/weechat.example.com/fullchain1.pem
lrwxrwxrwx 1 root root 46 May 19 12:39 privkey.pem -> ../../archive/weechat.example.com/privkey1.pem

The "cert.pem" file is your public certificate for your CN. The "chain.pem" file is the Let's Encrypt intermediate certificate. The "fullchain.pem" file is the "cert.pem" and "chain.pem" files combined. Of course, the "privkey.pem" file is your private key. For the WeeChat relay, it needs the "privkey.pem" and "fullchain.pem" files combined into a single file.

Because the necessary directories under "/etc/letsencrypt/" are accessible only by the root user, you will need root access to copy the certificates out and make them available to WeeChat, which hopefully isn't running as root. Also, Let's Encrypt certificates need to be renewed no sooner than every 60 days and no later than every 90 days. So, not only will you want to automate renewing the certificate, but you'll probably want to automate moving it into the right directory when the renewal is complete.

As you can see from above, I set up my certificate on a Thursday at 12:39. So weekly, on Thursday, at 12:39, I'll check to see if the certificate needs to be renewed. Because it won't renew any more frequently than every 60 days, but I have to have it renewed every 90 days, this gives me a 30-day window in which to get the certificate updated. So, I'll keep checking weekly. If a renewal isn't needed, the certbot(1) tool will gracefully exit. If a renewal is needed, the tool will update the certificate. Unfortunately, certbot(1) does not provide a useful exit code when renewals aren't needed, so rather than parsing text, I'll just copy the new certs into my WeeChat directory, regardless of whether they get updated or not.

So, in my root's crontab, I have the following:

39 12 * * 4 /usr/local/sbin/renew.sh

Where the contents of "/usr/local/sbin/renew.sh" are:

#!/bin/bash

certbot renew -q
cat /etc/letsencrypt/live/weechat.example.com/privkey.pem \
    /etc/letsencrypt/live/weechat.example.com/fullchain.pem > \
    ~aaron/.weechat/ssl/relay.pem
chown aaron:aaron ~aaron/.weechat/ssl/relay.pem
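
The concatenation step in the script above could also be written in Python, if you'd rather keep tighter control over the permissions of the combined file. This is only a sketch: `build_relay_pem` is a hypothetical helper name, and the paths in the usage comment mirror the ones in renew.sh.

```python
import os


def build_relay_pem(privkey_path, fullchain_path, relay_path):
    """Write the private key followed by the full chain to relay_path."""
    with open(privkey_path, "rb") as key, open(fullchain_path, "rb") as chain:
        combined = key.read() + chain.read()
    # A newly created file gets mode 0600, since it holds the private key.
    # An existing file is truncated and keeps whatever mode it already had.
    fd = os.open(relay_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as out:
        out.write(combined)


# Example call, run as root after "certbot renew -q":
# build_relay_pem(
#     "/etc/letsencrypt/live/weechat.example.com/privkey.pem",
#     "/etc/letsencrypt/live/weechat.example.com/fullchain.pem",
#     "/home/aaron/.weechat/ssl/relay.pem",
# )
```

You would still need to hand ownership of the resulting file to the user running WeeChat, as the chown line does.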

Now the only thing left to do is set up the relay itself in WeeChat. So, from within the client:

/relay sslcertkey
/relay add ssl.weechat 8443

You will need port 8443 open in your firewall, of course.

That's it. I have had some problems with certificate caching in WeechatAndroid it seems. So far, I have had to manually restart the relay in WeeChat, and flush the cache in WeechatAndroid and restart it to get the new certificate (I was previously using a self-signed certificate). Hopefully, this can also be automated, so I don't have to manually keep restarting the relay in WeeChat and flushing the cache in WeechatAndroid.

Regardless, this is how you use Let's Encrypt certificates with WeeChat SSL relay. Hopefully this is beneficial to someone.

Say Allo To Insecurity
https://pthree.org/2016/05/19/say-allo-to-insecurity/
Thu, 19 May 2016 12:45:24 +0000

Yesterday, Google announced two new encrypted messaging apps called "Allo" and "Duo". There has been some talk about the security of Allo's end-to-end encryption and incognito mode. Most of it was speculation, until Thai Duong blogged about it. Well, it's time to see what he said, and see if Allo stands up to scrutiny.

"Allo offers two chat modes: normal and incognito. Normal is the default, but incognito can be activated with one touch. I want to stress that both modes encrypt chat messages when they are in transit or at rest. The Allo clients talk to Google servers using QUIC or TLS 1.2. When messages are temporarily stored on our servers waiting for delivery they are also encrypted, and will be deleted as soon as they're delivered."

There are a few things in this paragraph that need some explanation. First, "both modes encrypt chat messages when they are in transit or at rest". This is good, but the devil is in the details. In transit, Thai explains how they're encrypted: "Allo clients talk to Google servers using QUIC or TLS 1.2". This has a couple of ramifications. First, this isn't end-to-end encryption (E2E). This is client-server encryption, which means both the server and the client are encrypting and decrypting the data. As a result, any Google employee with the appropriate privileges can read the messages delivered to Google servers. That's sort of the point of why E2E encryption exists- to prevent this from happening.

Second, kudos for storing the messages encrypted on disk. But, realize that Google has the master key to decrypt these messages if needed. Also, kudos for deleting them off of Google's servers as soon as they're delivered. However, just like VPN service providers promising they don't log your connections, Google promising not to log your message sort of falls into this category. That is, although they might not be storing the messages right now, they may store them later, especially if presented with a warrant from law enforcement. So, Google promising to not store your messages really doesn't amount to much other than maybe they don't want to unnecessarily chew through disk unless forced. Just remember, Google isn't going to go to jail for you, so they will comply with law enforcement.

"In normal mode, an artificial intelligence run by Google (but no humans including the Allo team or anyone at Google) can read your messages. This AI will use machine learning to analyze your messages, understand what you want to do, and give you timely and useful suggestions. For example, if you want to have dinner, it'll recommend restaurants or book tables. If you want to watch movies, it can buy you tickets.

Like it or not, this AI will be super useful. It's like having a personal assistant that can run a lot of errands for you right in your pocket. Of course, to help it help you you'll have to entrust it with your chat messages. I really think that this is fine, because your chat messages are used to help you and you only, and contrary to popular beliefs Google never sells your personal information to anyone."

Herein lies the real reason why E2E is not enabled by default- Google would like to mine your messages, on your phone, and present you with real-time ads. Ads not just on your phone, but likely when you're logged into your Google account on your desktop or laptop as well. If the data is E2E encrypted, this poses a problem for a company that has made Big Bucks off of advertising. With incognito mode, you are enabling E2E encryption, and the AI no longer has access to this data. Application and browser ads become generic, or must get their mining elsewhere. Because Google is allowing an AI to mine Allo messages for targeted ads, could it be possible that this same AI could be mining other data on your phone for the same goal? Could this AI be mining your email, Twitter, Facebook, photos, and other data? Will this AI be shipping solely with Allo, or will it be a separate service in Android N?

While Google might not be selling your data, they are making a percentage of sales that come from ads. The more targeted the ads become, the more likely you are to make a purchase, and the more likely Google will be to get a percentage of that sale. Google isn't selling your data, but they are making money off of it.

"But what if I want to stay off the grid? What if I don't want even the AI or whatever to see my messages?

"That's fine. We understand your concerns. Everybody including me has something to hide. This is why we develop the incognito mode. In this mode, all messages are further encrypted using the Signal protocol, a state of the art end-to-end chat encryption protocol which ensures that only you and your recipients can read your messages."

WhatsApp, acquired by Facebook, and pushing nearly one billion active messaging accounts, recently enabled E2E encryption also with the Signal Protocol. The difference being, with WhatsApp, E2E is default for every account when they update their app. E2E is not default for Allo, and only enabled for incognito mode. So, if "everybody including me has something to hide", then why isn't E2E default with Allo?

Thai then quotes a survey explaining that users want self-destructing messages more than E2E. He explains that survey with (emphasis mine):

"So to most users what matters the most is not whether the NSA can read their messages, but the physical security of their devices, blocking unwanted people, and being able to delete messages already sent to other people. In other words, their threat model doesn't include the NSA, but their spouses, their kids, their friends, i.e., people around and near them. Of course it's very likely that users don't care because they don't know what the NSA has been up to. If people know that the NSA is collecting their dick pics, they probably want to block them too. At any rate, NSA is just one of the threat sources that can harm normal users."

Sure, my threat model is also losing my phone. I find that much more likely than either the NSA confiscating my phone, issuing a warrant to collect my data, or decrypting my traffic in real-time (which isn't practical anyway). However, while the NSA isn't in my threat model, the NSA should be in Google's threat model. In other words, Google should be worrying about the NSA for me.

This is why I created d-note, which is running at https://secrets.xmission.com. As a system administrator, I don't want to turn over logs to the NSA or any other organization. As such, the messages are encrypted server-side before being stored to disk, and destroyed immediately upon viewing. The goal isn't necessarily to protect the end user, but to protect the server administrator. Legitimately not being able to provide logs or data when a warrant is issued is extremely valuable.

Google should be protecting the "dick pics" of users from getting into the NSA hands. Apple recently made a strong stand here against the FBI regarding Syed Farook's iPhone. Apple technically could not help the FBI, because of the protections that Apple baked into their product. Apple's hands were tied. As such, the FBI wanted to set a precedent about enabling government backdoors into the OS for future releases, so they would no longer be blocked from access. Apple is protecting the "dick pics" of its users from the NSA, FBI, and everyone else. Why isn't Google? As we mentioned earlier, the answer to that question is data mining and advertising revenue.

"This is why I think end-to-end encryption is not an end in itself, but rather a means to a real end which is disappearing messaging. End-to-end encryption without disappearing messaging doesn't cover all the risks a normal user could face, but disappearing messaging without end-to-end encryption is an illusion. Users need both to have privacy in a way that matters to them."

Emphases mine. So, Thai recognizes that disappearing messaging without E2E encryption is an illusion. So, why isn't it default? The higher powers that be, likely. He mentions in his conclusion that he would like E2E to be default, with a single tap. Something of an option with "Always start in incognito", thus always starting with E2E and always having self-destructing messages. However, rather than opt-in, it should be opt-out. If the prior message history is more important to you than the security of E2E encryption and self-destructing messages, then it should be something that you switch. If SnapChat is so popular because of self-destructing messages, and WhatsApp has one billion users with E2E encryption by default, Google, a company larger than both combined, should be able to do the same.

Finally, one point that Thai does not mention in his post. Allo is proprietary closed-source software. From a security perspective, this is problematic. First, because you don't have access to the source, you cannot audit it to make sure it holds up to the security claims that it has. As security and software engineers, not having access to the source code should be a major block when considering the use of non-free software.

Second, without access to the source code, you cannot create reproducible builds. Even if you did have access to the source code, are you sure the binary you have installed matches the binary you can build? If not, how do you know the binary isn't spying on you? Or compromised? Or just compiled incorrectly, causing undesired behavior? Not being able to create reproducible builds of software means not being able to verify the integrity of the shipped binary. Debian is making it a high priority to ship packages with reproducible builds. It's important to Debian, because they want to be transparent with their userbase. If you don't trust Debian is doing what they claim, you can rebuild the binaries and packages yourself, and they should match what Debian has shipped.

I know this sounds very Richard Stallman and GNU, but proprietary closed-source software is scary when it comes to security. While your immediate threat model might just be those you interact with on a daily basis, the immediate threat model to Google, Apple, SnapChat, and others, are well-funded organizations that have legal weight. Ultimately, they're after your data, which in the end, puts them in your threat model. There are no safety or security guarantees with proprietary closed-source software. You are at the mercy of the software vendor to Do The Right Thing, and many companies just don't.

So, while Allo might be the new kid on the block with E2E encrypted and self-destructing messages, as I've shown, it can't be trusted for your security and privacy. You're best off ignoring it, not recommending it to family and friends, and sticking with Free Software alternatives where E2E messages are default.

How To Always Encrypt Chromium Saved Passwords On GNU/Linux - No Matter What
https://pthree.org/2016/05/01/how-to-always-encrypt-chromium-saved-passwords-on-gnulinux-no-matter-what/
Sun, 01 May 2016 21:47:03 +0000

One of the things that has always bothered me about the Chromium project (the project the Google Chrome browser is based on) is that passwords are encrypted, if and only if your operating system provides an authentication API through your account login. For example, on Windows, this is accomplished through the "CryptProtectData" function. This function uses your existing account credentials when logging into your computer, as a "master key" to encrypt the passwords on your hard drive. For Mac OS X, this is accomplished with Keychain, and with GNU/Linux users, KWallet if you're running KDE or GNOME Keyring if you're running GNOME.

In all those cases, your saved passwords will be encrypted before getting saved to disk. But, what if you're like me, and do not fall into any of those situations? Now, granted, GNU/Linux and BSD users (you're welcome) make up about 3% of the desktop installs.

Graph showing operating system market share.

Of that 3%, although I don't have any numbers, maybe 2/3 run GNOME or KDE. That leaves 1 out of every 100 users where Chromium is not encrypting passwords on disk by default. For me, who lands in that 1%, this is unacceptable. So, I wanted a solution.

Before I go any further, let me identify the threat and adversary. The threat is offline disk analysis. I'm going to assume that you're keeping your operating system up-to-date with the latest security patches, and that your machine is not infected with malware. Instead, I'm going to assume that after you are finished using your machine, upgrading the hardware, or a hard drive fails, that the disk is discarded. I'm further going to assume that you either can't or didn't digitally wipe or physically destroy the drive once decommissioned. So, the threat is someone getting a hold of that drive, or laptop, or computer, and imaging the drive for analysis. This means that our adversary is a global adversary- it could be anyone.

Now, the obvious solution would be to run an encrypted filesystem on that drive. dm-crypt with or without LUKS makes this possible. But, let's assume you're not running FDE. Any options? In my case, I run eCryptfs, and store the Chromium data there, symbolically linking to it from the default location.

By default, Chromium stores its passwords in ~/.config/chromium/Default/Login\ Data. This is an SQLite 3.x database, and as mentioned, the passwords are stored in plaintext. A simple solution is to create an eCryptfs private directory, and symlink the database to that location. However, Chromium also stores cookies, caches, and other data in ~/.config/chromium/ that might be worth encrypting as well. So, you can just symlink the entire ~/.config/chromium/ directory to the eCryptfs mount.
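If you want to see for yourself what someone holding an image of your disk would find, here is a minimal Python sketch that opens the database read-only and lists its contents. The table name `logins` and the three column names are my assumption about Chromium's schema, so check them against your own copy, and `list_saved_logins` is just a name made up for illustration. Close Chromium first, or the database may be locked.

```python
import sqlite3
from pathlib import Path


def list_saved_logins(db_path):
    """Return (origin_url, username_value, password_value) rows.

    Assumes a table named `logins` with these columns; that schema is an
    assumption, not something taken from Chromium's documentation.
    """
    # Build a file: URI so the space in "Login Data" is percent-encoded,
    # and open read-only so the database cannot be modified by accident.
    uri = Path(db_path).expanduser().resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        query = "SELECT origin_url, username_value, password_value FROM logins"
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in list_saved_logins("~/.config/chromium/Default/Login Data"):
        print(row)
```

If the third field comes back readable, so would it for anyone else who gets hold of the file.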

I'll assume you've already set up eCryptfs and have it mounted to ~/Private/. If not, run the "ecryptfs-setup-private" command, and follow the prompts, then run "ecryptfs-mount-private" to get it mounted to ~/Private/.

Make sure Chromium is not running and move the ~/.config/chromium/ directory to ~/Private/. Then create the necessary symlink, so Chromium does not create a new profile:

$ mv ~/.config/chromium/ ~/Private/
$ ln -s ~/Private/chromium/ ~/.config/

At this point, all your Chromium data is now stored in your eCryptfs encrypted filesystem, and Chromium will follow the symlink, reading and writing passwords in the encrypted mount. This means, no matter if using KWallet or GNOME Keyring, or nothing at all, your passwords will always be encrypted on disk. Of course, in the SQLite 3.x database, the passwords are still in plaintext, but the database file is encrypted in eCryptfs, thus giving us the security we're looking for.
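It's worth confirming that the profile really does resolve into the encrypted mount, and didn't get silently recreated in the old location. A small Python sketch of that check follows; `resolves_inside` is a hypothetical helper name, and the two paths in the usage line are simply the ones used in this post.

```python
import os


def resolves_inside(path, mount_point):
    """True if `path`, after following symlinks, lives under `mount_point`."""
    real_path = os.path.realpath(os.path.expanduser(path))
    real_mount = os.path.realpath(os.path.expanduser(mount_point))
    # commonpath() compares whole path components, so /home/u/Private2
    # is not mistaken for something inside /home/u/Private.
    return os.path.commonpath([real_path, real_mount]) == real_mount


if __name__ == "__main__":
    print(resolves_inside("~/.config/chromium", "~/Private"))
```

This only tells you where the data lives. It says nothing about whether ~/Private/ is actually mounted as eCryptfs at the moment.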

However, there is a caveat which needs to be mentioned. The entire security of the encryption rests solely on the entropy of your eCryptfs passphrase. If that passphrase does not have sufficient entropy to withstand a sophisticated attack from a well-funded organization (our global adversary), then all bets are off. Essentially, this eCryptfs solution is acting like a "master password", and all of the encryption strength rests on your ability to use a strong password as defined by Shannon entropy. Current best practice to guard against an offline password cracking attack is to pick a password with at least 128 bits of entropy. You can use zxcvbn.js from Dropbox to estimate your passphrase entropy, which I have installed at http://ae7.st/ent/ (no, I'm not logging passphrases- save the page offline, pull your network cable and run it locally if you don't believe me).
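To put a number on "128 bits", here is a small Python sketch of the arithmetic. It assumes every character or word is picked uniformly at random and independently from a fixed pool, which is the only case where the simple length × log2(pool size) formula holds; a passphrase you invented yourself will score lower. The function names and the example pool sizes (94 printable ASCII characters, a 7,776-word Diceware-style list) are illustrative choices of mine.

```python
import math


def entropy_bits(pool_size, length):
    """Entropy in bits of `length` symbols drawn uniformly from `pool_size`."""
    return length * math.log2(pool_size)


def min_length(target_bits, pool_size):
    """Smallest number of symbols that reaches `target_bits` of entropy."""
    return math.ceil(target_bits / math.log2(pool_size))


if __name__ == "__main__":
    for name, pool in [("lowercase letters", 26),
                       ("printable ASCII", 94),
                       ("7,776-word list", 7776)]:
        print(name, min_length(128, pool))
```

Under those assumptions, 128 bits works out to 28 random lowercase letters, 20 random printable ASCII characters, or 10 random words from a 7,776-word list.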

]]>
https://pthree.org/2016/05/01/how-to-always-encrypt-chromium-saved-passwords-on-gnulinux-no-matter-what/feed/ 0
Opera, VPNs, and Security https://pthree.org/2016/04/22/opera-vpns-and-security/ https://pthree.org/2016/04/22/opera-vpns-and-security/#comments Fri, 22 Apr 2016 13:30:43 +0000 https://pthree.org/?p=4635 Yesterday, Opera announced that they are bundling a VPN with the latest release of their browser. This is what the release says:

Why we are adding free VPN in Opera

Bringing this important privacy improvement marks another step in building a browser that matches up to people’s expectations in 2016. When you think about it, many popular options offered by desktop browsers today were invented (quite frequently by Opera) many years ago. The innovation energy in the industry has been recently so focused on mobile, even if the desktop is still thriving.

In January, we were reviewing our product plans, and we realized that people need new features in order to browse the web efficiently in 2016. It also became apparent to us that what people need are not the same features that were relevant for their browsers ten years ago. This is why we today have more engineers than ever before working on new features for our desktop browser.

So far we have the native ad blocker. And, we’re introducing another major feature in just a matter of a few weeks; a native, unlimited and free VPN client, right inside your browser!

Enhanced privacy online with Opera’s free VPN

According to Global Web Index*, more than half a billion people (24% of the world’s internet population) have tried or are currently using VPN services. According to the research, the primary reasons for people to use a VPN are:

– To access better entertainment content (38%)
– To keep anonymity while browsing (30%)
– To access restricted networks and sites in my country (28%)
– To access restricted sites at work (27%)
– To communicate with friends/family abroad (24%)
– To access restricted news websites in my country (22%)

According to the research, young people are leading the way when it comes to VPN usage, with almost one third of people between 16-34 having used a VPN.

Better than traditional VPNs

Until now, most VPN services and proxy servers have been limited and based on a paid subscription. With a free, unlimited, native VPN that just works out-of-the-box and doesn’t require any subscription, Opera wants to make VPNs available to everyone.

That’s why Opera’s built-in free VPN feature is easy to use. To activate it, Mac users just need to click the Opera menu, select “Preferences” and toggle the feature VPN on, while Windows and Linux users need to go to the “Privacy and Security” section in “Settings” and enable VPN there. A button will appear in the browser address field, from which the user can see and change location (more locations will appear later), check whether their IP is exposed and review statistics for their data used. It’s free and unlimited to use, yet it offers several must-have options available in paid VPNs, such as:

  • Hide your IP address – Opera will replace your IP address with a virtual IP address, so it’s harder for sites to track your location and identify your computer. This means you can browse the web more privately.
  • Unblocking of firewalls and websites – Many countries, schools and workplaces block video-streaming sites, social networks and other services. By using a VPN you can access your favorite content, no matter where you are.
  • Public Wi-Fi security – When you’re surfing the web on public Wi-Fi, intruders can easily sniff data. By using a VPN, you can improve the security of your personal information.

There were a couple things that stuck out to me rather quickly when reading this press release:

  1. Is it a true VPN, or just an HTTP proxy?
  2. If either a VPN or an HTTP proxy, how is it handling DNS requests?
  3. If an HTTP proxy, is the request through a transparent TLS connection to Opera?
  4. Why is the press release conspicuously silent about logs and tracking?

Well, some of these questions have been answered. First, it's not a true VPN. Instead, it's just an HTTP/HTTPS proxy. Here are the details:

How the “VPN” works

Once the user enables the feature in settings, Opera VPN sends API requests to https://api.surfeasy.com to obtain credentials and proxy IPs. The browser then talks to a proxy like de0.opera-proxy.net, and its IP address can only be resolved from within Opera when the VPN feature is turned on. It’s an HTTP/S proxy that requires authentication.

When the Opera browser with enabled VPN loads a page, it sends many requests to de0.opera-proxy.net with a Proxy-Authorization request header.

The Proxy-Authorization header decoded:
CC68FE24C34B5B2414FB1DC116342EADA7D5C46B:9B9BE3FAE67
4A33D1820315F4CC94372926C8210B6AEC0B662EC7CAD611D86A3

Since we’re talking about a proxy, these credentials can be used with de0.opera-proxy.net even when connecting from a different machine. This means that if you use the proxy on a computer with no Opera installed, you’ll get the same IP as when using Opera’s VPN.
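The quote doesn't say which authentication scheme the header uses. Assuming it is plain HTTP Basic, where the header value is "Basic " followed by the Base64 encoding of "user:password", decoding it takes only a few lines of Python. `decode_basic_proxy_auth` is a name I made up, and the credentials in the usage line are placeholders, not real Opera values.

```python
import base64


def decode_basic_proxy_auth(header_value):
    """Split a 'Basic <base64>' Proxy-Authorization value into (user, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential: %r" % scheme)
    decoded = base64.b64decode(token).decode("ascii")
    # Only the first colon separates user from password.
    user, _, password = decoded.partition(":")
    return user, password


if __name__ == "__main__":
    example = "Basic " + base64.b64encode(b"someuser:somepassword").decode("ascii")
    print(decode_basic_proxy_auth(example))
```

Base64 is an encoding, not encryption, so anyone who can read the header can read the credentials.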

From this, we can learn that it's not a VPN at all. In fact, it's not even deploying a TLS tunnel for the HTTP/S proxy. So, traditional HTTP requests will still be in the clear, just with a different target. So while a school or library might be filtering requests based on DNS, this HTTP/S proxy in Opera doesn't address more active smart filtering based on content.

Unfortunately, Help Net Security also suggests the use of a general VPN service provider (emphasis mine):

"What Opera offers is not a VPN as such. It's just a proxy for the browser. You still need a full VPN if privacy is what you care about (and you should care about your privacy). Other tools you use, including for example email clients like Outlook, won’t use this 'VPN'," Špaček told Help Net Security.

VPN service providers are scary. Sven Slootweg posted a "Don't use VPN services" Gist where he addresses some real concerns with using VPN service providers (I don't agree with a couple points):

  • VPN service providers log connections and other metadata.
  • VPN service providers have full accounting and payment information of their customers.
  • VPNs really are just glorified proxies, and don't provide any meaningful security or privacy.
  • VPNs don't obfuscate your IP address like Tor, and your IP address is meaningless to trackers anyway.
  • VPN service providers exist, because it's easy money.

I don't fully agree with a couple points (IP addresses are extremely valuable to trackers), but I think the overall topics Sven is trying to drive home, are the following: know how VPNs work, who has access to data at the VPN endpoint(s), and your security and privacy risks when using a VPN. There are valid times when using a VPN. Data is encrypted between your VPN client and the provider, so it is an easy way to get around restrictive firewalls, which you would think Opera would be trying to address with their HTTP/S proxy. You may also need to access your corporate internal network when "on the road", in which case using your corporate VPN server is needed.

But in both cases, understand the security and privacy concerns when using the VPN. Your VPN provider isn't going to go to jail for you. If the FBI catches unsavory traffic coming out of the VPN provider, you can rest assured they'll give the authorities all the logging, account, and payment information to comply with the request. You can rest assured that if your employer catches you breaking policy with the VPN, you will lose your VPN access, and possibly your job.

So, what to do? Well, realistically, if you want to obfuscate your traffic dynamically, securely, and pseudoanonymously, then use Tor. Install a Tor client on your machine, install a Tor proxy extension in your browser, and when you want to get around restrictive firewalls, flip the proxy switch, and get on Tor.

Of course, Tor isn't a security and privacy panacea. You still need to understand the risks associated with using Tor. For example, the extension you installed may not tunnel DNS requests through Tor. Of course, HTTP traffic is still in the clear when it leaves the Tor exit relay. Tor clients and extensions may contain vulnerabilities that reveal metadata about you. Basically, don't be ignorant or stupid with your Tor connection.

Regardless, I think we can take a few things away from this post:

  1. The Opera VPN is just an HTTP/S proxy.
  2. Opera is very likely logging all your traffic.
  3. Your Opera VPN browsing habits are likely unique enough for Opera to identify you.
  4. VPN service providers should be avoided.
  5. VPN service providers also are likely logging all your traffic.
  6. Your VPN service provider won't go to jail for you.
  7. When in doubt, use Tor, just understand the risks.
]]>
https://pthree.org/2016/04/22/opera-vpns-and-security/feed/ 2
Tor and the CloudFlare Problem https://pthree.org/2016/04/17/tor-and-the-cloudflare-problem/ https://pthree.org/2016/04/17/tor-and-the-cloudflare-problem/#comments Sun, 17 Apr 2016 19:29:32 +0000 https://pthree.org/?p=4625 Before I go anywhere with this post, let me make three things very clear:

  1. I do not work for CloudFlare.
  2. I work for a small local ISP in Utah.
  3. I have been using Tor probably almost as long as many of you have been alive.

I first blogged about Tor in 2006. I had discovered it around 2004, only a couple of years after its first release. I had used it as a way to prevent my ISP and my employer from tracking what I was doing with my Internet connection. I would set up a simple SOCKS proxy in Firefox, then switch to it when I wanted to get on the Tor network, and switch away from it when I didn't. Oh, and you think latencies are bad on Tor now? You should have been on it back then.

Here is a metrics graph showing the time it took to download a 50 KiB file over the Tor network. Unfortunately, they don't have the data back when I started using the network, but you get a rough idea of what it was like:

Graph from metrics.torproject.org showing the latencies of downloading a 50 KiB file.

This makes a good deal of sense, because back then, ISPs didn't provide a lot of bandwidth to customers (it can be argued they still don't), and there weren't a lot of exit nodes in the Tor network to handle the bandwidth (again, it can be argued there still aren't enough):

Graph from metrics.torproject.org showing the bandwidth of relays in the network.

Graph from metrics.torproject.org showing the size of the Tor network

Spend some time on metrics.torproject.org looking over the historical data, and you'll get a good sense that using Tor in 2004 was a lot like getting data over dial-up. It was anything but pleasant.

What's the point? The point is, that while things can still be improved (we need more exit nodes, and we need more bandwidth on each exit node), the Tor network's latencies, bandwidth, and relays are in a good position compared to 12 years ago when I started using it. So running large-scale attacks through the network is now practical.

So, where does CloudFlare fit into this? CloudFlare deploys solving captchas when you wish to consume a service behind the CloudFlare CDN. For example, while connected to Tor, visit medium.com, and you will be presented with a captcha, similar to something like this:

Screenshot of Chromium showing the need to solve a visual captcha while trying to browse medium.com while connected to Tor.

This has gotten a lot of criticism from the "cypherpunk" millennials who feel that Tor access should be unrestricted. If you follow the "#dontblocktor" hashtag on Twitter, you will see the continued repeated criticism of CloudFlare deploying these captchas to Tor users on their CDN. Some of the arguments include:

  • Solving the captcha may only bring up another, repeatedly, never being able to consume the website in question.
  • Visually impaired users cannot solve the visual captchas.
  • Non-native English speakers will not be able to solve the audio version of the captcha.
  • People using browsers that disable JavaScript will not be able to reach the page.
  • There may be other security concerns where the choice of Tor is preferred over not using Tor.

No doubt, all captchas on the Web should be reconsidered. Personally, for JavaScript-enabled browsers, I think forcing a proof-of-work puzzle onto the browser is transparent, and provides exactly the sort of rate-limiting needed for mitigating large-scale malicious attacks. For non-JavaScript browsers, captchas seem to be the best alternative. But, I'm sure as a society, we can find alternatives for non-JavaScript browsers (such as network-based proof-of-work puzzles).
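For anyone who hasn't seen one, a proof-of-work client puzzle can be sketched in a few lines. What follows is a generic hashcash-style toy in Python, not anything CloudFlare has deployed or proposed: the server hands out a random challenge, the client burns CPU finding a nonce whose SHA-256 hash starts with enough zero bits, and the server verifies with a single hash. The function names and the 12-bit difficulty are arbitrary choices for illustration.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Number of zero bits at the front of a byte string."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def check_solution(challenge, nonce, difficulty_bits):
    """Server side: one hash is enough to verify the client's work."""
    digest = hashlib.sha256(challenge + str(nonce).encode("ascii")).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve_puzzle(challenge, difficulty_bits):
    """Client side: brute-force nonces until one satisfies the difficulty."""
    for nonce in count():
        if check_solution(challenge, nonce, difficulty_bits):
            return nonce


if __name__ == "__main__":
    challenge = b"random-server-challenge"  # placeholder; a real one must be unpredictable
    nonce = solve_puzzle(challenge, 12)
    print(nonce, check_solution(challenge, nonce, 12))
```

Each extra bit of difficulty doubles the client's expected work while the server's cost stays at one hash, which is where the rate-limiting comes from.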

No doubt physical limitations, such as visual or audible impairments, can make solving a visual and audible captcha challenging, if not impossible. I don't have good solutions here except for JavaScript-based proof-of-work puzzles. But the real question that needs to be addressed is, why is CloudFlare deploying captchas for Tor users?

CloudFlare addressed this due to the ongoing criticism a select few on Twitter have given the company. The blog post "The Trouble with Tor" basically comes down to the following:

  1. You must pick two between: security, anonymity, and convenience.
  2. CloudFlare is a large CDN that deals regularly with malicious traffic sourced from Tor exit relays.
  3. Captchas are a compromise, allowing Tor users to remain anonymous, while also getting access to the website.
  4. A CloudFlare CDN customer has an option in their control panel to whitelist Tor or captcha Tor.
  5. CloudFlare is investigating "blind token" proof-of-work client puzzles for something long-term.

I don't see anything unreasonable here. As a system administrator and security engineer for XMission, I understand and sympathize with CloudFlare's stance toward captchas, even if I don't agree with the implementation of the captcha itself. I have had to fight off malicious Tor traffic from our network many times during my employment, such as DNS and NTP amplification attacks, HTTP POST DDoS attacks, SQL injection and XSS attacks, and many others.

So, even as CloudFlare put it in their reasonable post, how do you allow honest Tor users with high degrees of convenience to consume the website while also minimizing and proactively mitigating malicious Tor traffic?

Again, I don't care for captchas, and wish they would die in a fire. But, what should CloudFlare do? Should they abandon the captcha altogether? If so, how should they proactively prevent malicious Tor traffic from negatively impacting their customer base? It's easy and knee-jerky to post screenshots to Twitter with the "#dontblocktor" hashtag, and shame CloudFlare and the customer using the CDN. I don't think that's the right approach, personally (nevermind that a captcha isn't a block (yes, semantics are important)). I'm curious how many of those who are reacting to CloudFlare captchas are actual system or network administrators that have to deal with these attacks. Instead, I would try to architect solutions to the problem.

Personally, I see the following:

  • Consume CloudFlare without Tor. There are no captchas, but you sacrifice a level of anonymity.
  • Consume CloudFlare behind Tor, but understand the compromise you are making to solve captchas sacrificing convenience.
  • Consume CloudFlare behind a VPN, thus providing both anonymity and convenience.

If it really bothers you that you have to solve a captcha to reach a CloudFlare website, then rather than shaming CloudFlare, it might be worth your time to reach out to the site operator, and let them know about whitelisting Tor. If they engage in conversation, they may not have been aware of the configuration option, or they may have reasons why they want you to solve the captcha. Either way, you've come out ahead without the knee-jerking of #dontblocktor.

I guess in conclusion, while I hate captchas as much as the next guy, what would you do if you were employed by CloudFlare and in charge of this problem? What is a reasonable solution to keeping customers happy by mitigating malicious Tor traffic while also allowing honest Tor users to consume the website with high levels of convenience? Let's engage in discussion about how to create and architect these solutions, so we get as many people happy as possible- CloudFlare network admins, customers, and clients.

A final note about the term "block". The CloudFlare captcha is not blocking you from reading the website. Instead, it's rate-limiting you. Some will argue that you get caught in endless captcha loops, consistently solving them over and over, never to actually reach the service. Personally, I have never encountered this, but others swear it exists. At most, I've had to solve 3 captchas in a row, usually because I did not solve them quickly enough. I guess the effect is the same, but as already mentioned, the "#dontblocktor" hashtag is a knee-jerk, and incorrectly placed. Semantics are important, because CloudFlare is not actually blocking Tor, like Akamai does with "Access denied". It's one thing to provide a 502 HTTP error, it's quite another to rate limit requests.

]]>
https://pthree.org/2016/04/17/tor-and-the-cloudflare-problem/feed/ 1
Two OCB Block Cipher Mode Patents Expired Due To Nonpayment https://pthree.org/2016/03/31/two-ocb-block-cipher-mode-patents-expired-due-to-nonpayment/ https://pthree.org/2016/03/31/two-ocb-block-cipher-mode-patents-expired-due-to-nonpayment/#comments Fri, 01 Apr 2016 04:21:50 +0000 https://pthree.org/?p=4615 Peter Gutmann on the "[Cryptography]" mailing list wrote some thoughts about the impending crypto monoculture of all-things-Bernstein that seems to be currently sweeping the crypto world. In his post, he mentions the following (emphasis mine):

The remaining mode is OCB, which I'd consider the best AEAD mode out there (it shares CBC's graceful-degradation property in which reuse or misuse of the IV doesn't lead to a total loss of security, only the authentication property breaks but not the confidentiality). Unfortunately it's patented, and even though there are fairly broad exceptions allowing it to be used in many situations, the legal minefield that ensues makes it untouchable for most potential users. For example does the prohibition on military use cover the situation where an open-source crypto package is used in a vendor library that's used in a medical insurance app that's used by the US Navy, or where banking transactions protected by TLS may include ones of a military nature (both of these are actual examples that affected decisions not to use OCB). Since no-one wants to call in lawyers every time a situation like this comes up, and indeed can't call in lawyers when the crypto is several levels away in the service stack, OCB won't be used even though it may be the best AEAD mode out there.

Dr. Matthew Green also wrote about authenticated encryption and block cipher modes. He had this to say about OCB mode (emphasis mine):

In performance terms Offset Codebook Mode blows the pants off of all the other modes I mention in this post. It's 'on-line' and doesn't require any real understanding of Galois fields to implement** -- you can implement the whole thing with a block cipher, some bit manipulation and XOR. If OCB was your kid, he'd play three sports and be on his way to Harvard. You'd brag about him to all your friends.

I've known that OCB mode was patented, which is why it has not been included in OpenSSL and other cryptographic protocol implementations. Peter said it correctly: it is a legal minefield. However, I wanted to read up on the patents, their design, operation, etc., mostly because I wanted to get out of doing the dishes. Imagine my shock when I stumbled upon the following:

  • Patent 7,046,802 - Method and apparatus for facilitating efficient authenticated encryption
    • Status: Lapsed
  • Patent 7,200,227 - Method and apparatus for facilitating efficient authenticated encryption
    • Status: Lapsed

Not fully understanding what "Lapsed" means, I went to the official source: The United States Patent and Trademark Office website. I searched for those two patent numbers, and got the following:

  • Patent 7,046,802 - Method and apparatus for facilitating efficient authenticated encryption
    • Status: Patent Expired Due to NonPayment of Maintenance Fees Under 37 CFR 1.362
    • Status Date: 06-06-2014
  • Patent 7,200,227 - Method and apparatus for facilitating efficient authenticated encryption
    • Status: Patent Expired Due to NonPayment of Maintenance Fees Under 37 CFR 1.362
    • Status Date: 05-04-2015

Sure enough, Phillip Rogaway's first two patents regarding the OCB block cipher mode of encryption are expired due to nonpayment. I had to tweet this:

Patents 7949129 (Method and apparatus for facilitating efficient authenticated encryption) and 8321675 (Method and apparatus for facilitating efficient authenticated encryption) are still valid however. I'm not sure how this applies to Charanjit Jutla's IAPM mode patents now owned by IBM. Also, I don't know exactly which OCB modes patents 7,046,802 and 7,200,227 cover. OCB1 and OCB2? If someone can comment here, that would be great.

So, what does this mean for the cryptography world? It means that OCB covered by those two patents can now be implemented royalty-free, without fear of legal entanglements, in Free Software as well as proprietary and commercial software. OpenSSL, LibreSSL, BoringSSL, OpenPGP, Open Whisper Systems Signal, and so many other protocols, projects, and software should be able to implement OCB now.

All because Phillip Rogaway did not make the payments necessary to keep the patent valid. Two more software patents bite the dust.

]]>
https://pthree.org/2016/03/31/two-ocb-block-cipher-mode-patents-expired-due-to-nonpayment/feed/ 3
Linux Kernel CSPRNG Performance https://pthree.org/2016/03/08/linux-kernel-csprng-performance/ https://pthree.org/2016/03/08/linux-kernel-csprng-performance/#comments Wed, 09 Mar 2016 02:34:23 +0000 https://pthree.org/?p=4606 I'm hardly the first one to notice this, but I was having a discussion in ##crypto on Freenode about the Linux kernel CSPRNG performance. It was mentioned that the kernelspace CSPRNG was "horrendously slow". Personally, I found the performance sufficient for my needs, but I decided to entertain his definition. I'm glad I did; I wasn't disappointed.

Pull up a terminal, and run the following command, passing 1 GB of data from /dev/urandom to /dev/null:

$ dd if=/dev/urandom of=/dev/null bs=1M count=1024 iflag=fullblock  
1024+0 records in  
1024+0 records out  
1073741824 bytes (1.1 GB) copied, 80.1537 s, 13.4 MB/s
$ pv < /dev/urandom > /dev/null # cancel in a different terminal, unless you have "-S"
1.02GB 0:01:20 [13.3MB/s] [                   < =>                              ]

13.4 MBps of throughput for reading data directly out of the kernelspace CSPRNG. But, can we do better?

In the ##crypto channel, and as should be across development mailing lists, forums, groups, and discussion channels, I recommend that developers should not generally develop their own userspace CSPRNG. There are all sorts of pitfalls and traps waiting for you when you attempt it. Unless you know what you're doing, you could end up with a CSPRNG that isn't actually cryptographically secure (the "CS" in "CSPRNG").

However, what happens when I do actually run a userspace CSPRNG on the same machine? What can I expect out of performance? For example, I could implement AES-128 in CTR mode as a CSPRNG. In fact, we can do this with OpenSSL:

$ dd if=/dev/zero bs=10M count=1024 iflag=fullblock 2> /dev/null | openssl enc -aes-128-ctr -pass pass:"sHgEOKTB8bo/52eDszkHow==" -nosalt | dd of=/dev/null
20971520+0 records in
20971520+0 records out
10737418240 bytes (11 GB) copied, 15.3137 s, 701 MB/s
$ openssl enc -aes-128-ctr -pass pass:"sHgEOKTB8bo/52eDszkHow==" -nosalt < /dev/zero | pv > /dev/null
31.9GB 0:00:34 [ 953MB/s] [                                  < =>               ]

700-950 MBps (notice that dd(1) incurs a performance penalty). That's 52-70x the speed of reading the kernelspace CSPRNG directly. That's more than a full order of magnitude faster. However, this is on a box with AES-NI. What about disabling AES-NI on the same box? How badly does it damage performance, and how does it compare to reading the kernelspace CSPRNG? We can use OpenSSL speed(1SSL) to benchmark algorithms.

First, with AES-NI enabled:

$ openssl speed -elapsed -evp aes-128-ctr 2> /dev/null  
(...snip...)
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-ctr     468590.43k  1174849.02k  1873606.83k  2178642.60k  2244471.47k

And with AES-NI disabled:

$ OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-ctr 2> /dev/null  
(...snip...)  
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-ctr      74272.21k    83315.43k   340393.30k   390135.47k   391279.96k

In this case, we see about a 5x performance improvement when using the AES-NI instruction set as compared to when not using it. That's significant. And even with AES-NI disabled in userspace, we're still outperforming /dev/urandom by almost 30x.
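The ratios quoted here are plain division on the measured throughputs. Here is that arithmetic as a Python sketch, with every figure copied from the output above. One caveat: the OpenSSL speed(1SSL) numbers are an in-memory benchmark, while the /dev/urandom figure comes from dd(1), so this is a rough comparison rather than a like-for-like one. The constant and function names are mine.

```python
# All figures are copied from the command output in this post.
URANDOM_MBPS = 13.4          # dd reading /dev/urandom
OPENSSL_DD_MBPS = 701.0      # openssl enc piped through dd
OPENSSL_PV_MBPS = 953.0      # openssl enc piped through pv
AESNI_ON_KBPS = 2244471.47   # openssl speed, 8192-byte blocks, AES-NI enabled
AESNI_OFF_KBPS = 391279.96   # openssl speed, 8192-byte blocks, AES-NI disabled


def speedup(fast, slow):
    """How many times faster `fast` is than `slow` (same units for both)."""
    return fast / slow


if __name__ == "__main__":
    print("userspace vs kernel (dd): %.1fx" % speedup(OPENSSL_DD_MBPS, URANDOM_MBPS))
    print("userspace vs kernel (pv): %.1fx" % speedup(OPENSSL_PV_MBPS, URANDOM_MBPS))
    print("AES-NI on vs off:         %.1fx" % speedup(AESNI_ON_KBPS, AESNI_OFF_KBPS))
    # speed(1SSL) reports 1000s of bytes per second, so divide by 1000 for MB/s.
    print("AES-NI off vs kernel:     %.1fx" % speedup(AESNI_OFF_KBPS / 1000, URANDOM_MBPS))
```

That gives roughly 52x to 71x for userspace against the kernel, a bit under 6x for AES-NI on against off, and just over 29x for software-only AES against /dev/urandom.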

Interestingly enough, even the OpenBSD CSPRNG (different hardware than previously tested), which uses ChaCha20, outperforms the Linux CSPRNG (although its userspace CSPRNG with openssl(1) doesn't outperform kernelspace):

% dd if=/dev/urandom of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 13.630 secs (78775541 bytes/sec)
% dd if=/dev/zero bs=1M count=1024 2> /dev/null | openssl enc -aes-128-ctr -pass pass:"sHgEOKTB8bo/52eDszkHow==" -nosalt | dd of=/dev/null 
2097152+0 records in
2097152+0 records out
1073741824 bytes transferred in 33.498 secs (32052998 bytes/sec)
% openssl speed -elapsed -evp aes-128-ctr 2> /dev/null
(...snip...)
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-ctr      41766.37k    46930.74k    49593.54k    50669.32k    50678.33k

Roughly 78 MBps for OpenBSD on an Intel Xeon CPU running at 2.80GHz. Basically, six times the speed of the Linux kernel CSPRNG on an Intel Xeon CPU running at 2.67GHz.

So why is the Linux CSPRNG so slow? And, what can we do about it? Well, first, the kernel is using SHA-1 for its cryptographic primitive. In very loose terms, the CSPRNG hashes the input pool with SHA-1, and spits out the output to /dev/urandom. Its output is also its input, so it's digesting its own output.

But, that's not all it's doing actually. The first function actually adds data into the input pool without increasing the entropy estimate. Then, after adding those bytes, the input pool is mixed with a Skein-like mixing function. Then some math is done to credit the entropy estimator, and the system is polled for data to add to the input entropy pool. Things like disk IO, CPU timings, interrupts, and user activity. Finally, we're ready to hash the data. This is done by extracting the data out of the input pool, and hashing it with SHA-1. But, we don't want any recognizable output, so the output is left-rotated and folded in half. Then, and only then, is the data ready for consumption.
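To make the last few steps concrete, here is a deliberately simplified toy in Python. It is not the kernel's code: it skips the entropy accounting and the rotation, and it replaces the real mixing function with a plain XOR, all of which are my simplifications. It only illustrates the general shape of "hash the pool with SHA-1, feed the hash back in, fold the 20-byte digest down to 10 bytes".

```python
import hashlib


def extract_block(pool):
    """Toy extraction step: returns 10 output bytes and updates `pool` in place.

    `pool` is a bytearray standing in for the input pool. The real kernel
    uses a different mixing function and rotates before folding; this
    sketch assumes neither.
    """
    digest = hashlib.sha1(bytes(pool)).digest()  # 20 bytes

    # Feed the digest back into the pool so the next call sees new state.
    for i, byte in enumerate(digest):
        pool[i % len(pool)] ^= byte

    # Fold in half: XOR the first 10 bytes with the last 10, so the raw
    # SHA-1 output is never handed to the caller directly.
    return bytes(a ^ b for a, b in zip(digest[:10], digest[10:]))


if __name__ == "__main__":
    pool = bytearray(64)  # all-zero pool, purely for demonstration
    print(extract_block(pool).hex())
    print(extract_block(pool).hex())
```

Even in this stripped-down form, every 10 bytes of output costs a full SHA-1 pass over the pool plus a write back into it, which helps explain the throughput numbers above.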

W.T.F.

Unfortunately, the Linux kernel CSPRNG is not based on any sound theoretical security design. It's very much a hodge-podge home-brew design by developers who think they know what they're doing, when in reality, they don't. In 2013, a security audit and analysis was performed on the Linux kernel CSPRNG (PDF), and concluded that not only is it not robust, but it has some weaknesses:

In the literature, four security notions for a PRNG with input have been proposed: resilience (RES), forward security (FWD), backward security (BWD) and robustness (ROB), with the latter being the strongest notion among them.

(...snip...)

Distributions Used in Attacks based on the Entropy Estimator As shown in Section 5.4, LINUX uses an internal Entropy Estimator on each input that continuously refreshes the internal state of the PRNG. We show that this estimator can be fooled in two ways. First, it is possible to define a distribution of zero entropy that the estimator will estimate of high entropy, secondly, it is possible to define a distribution of arbitrary high entropy that the estimator will estimate of zero entropy. This is due to the estimator conception: as it considers the timings of the events to estimate their entropy, regular events (but with unpredictable data) will be estimated with zero entropy, whereas irregular events (but with predictable data) will be estimated with high entropy.

(...snip...)

As shown in Section 5.7, it is possible to build a distribution D0 of null entropy for which the estimated entropy is high (cf. Lemma 3) and a distribution D1 of high entropy for which the estimated entropy is null (cf. Lemma 4). It is then possible to mount attacks on both /dev/random and /dev/urandom, which show that these two generators are not robust.

(...snip...)

We have proposed a new property for PRNG with input, that captures how it should accumulate the entropy of the input data into the internal state. This property actually expresses the real expected behavior of a PRNG after a state compromise, where it is expected that the PRNG quickly recovers enough entropy. We gave a precise assessment of Linux PRNG /dev/random and /dev/urandom security. In particular, we prove that these PRNGs are not robust. These properties are due to the behavior of the entropy estimator and the mixing function used to refresh its internal state. As pointed by Barak and Halevi [BH05], who advise against using run-time entropy estimation, we have shown vulnerabilities on the entropy estimator due to its use when data is transferred between pools in Linux PRNG. We therefore recommend that the functions of a PRNG do not rely on such an estimator.

Finally, we proposed a construction that meets our new property in the standard model and we showed that it is noticeably more efficient than the Linux PRNGs. We therefore recommend to use this construction whenever a PRNG with input is used for cryptography.

TL;DR? The Linux CSPRNG does not meet the definitions of a secure CSPRNG per the PDF. It's not that it's theoretically broken, it's just not theoretically secure either. It's really nothing theoretically at all. This isn't great.

A replacement for random.c in the kernel would be to ditch the homebrew entropy collection, mixing, and output mangling, and instead, stick with AES-128 in CTR mode. Of course, as per the PDF, the entropy collectors need serious work, but if AES-128-CTR was deployed as the CSPRNG instead of SHA-1, then the generator could take advantage of hardware AES performance, which as I've shown, is exceptionally superior. It's frustrating, because the kernel already ships AES, so the code is already there. It's just not being utilized.

The Linux kernel could have 1 GBps in CSPRNG output, but is deliberately choosing not to. That's like having a V12 turbo-charged sleeper, without the turbo, and only firing on 3 of the 12 cylinders, with a duct taped muffler on the back.

Why does 1 GBps of performance matter? How about wiping hard drives or secure data removal in general? With 20 MBps, we can't even saturate the write throughput of a single drive. With 1 GBps, we could saturate many simultaneously. As someone who wipes old employee workstations when they leave the company, backup servers with dozens of drives, or old decommissioned hardware, I see great benefit here.

Or, how about HTTPS web sites for a shared web hosting provider? I have seen countless times HTTPS and SSH connections lag due to waiting on the CSPRNG. Not that it's being intentionally blocked, but because the load is so intense on the server, it just can't generate enough cryptographic randomness to keep up with requests.

I'm sure there are plenty of other examples where end userspace applications could benefit with improved performance of the CSPRNG. And, as shown, it can't be that difficult to implement correctly. The real question is, of course, who will do the work and submit the patch?

Cryptographic Hashing, Part I- Introduction
https://pthree.org/2016/03/07/cryptographic-hashing-part-i-introduction/
Tue, 08 Mar 2016 01:44:25 +0000

Introduction

Lately, I've been seeing some discussion online about cryptographic hashing functions, along with some confusion between a cryptographic digest, a cryptographic signature, and a message authentication code. At least in my last post, I think I did well defining and clarifying the differences between those terms, but I also feel like I could take this discussion a lot further. So, I decided to dedicate a series to generic cryptographic hashing functions, which will include building compression frameworks with security proofs, specific cryptographic hashing functions, and some implementations of those functions. So, without further ado, let's get started.

Collisions

When we talk about a hashing function (cryptographic or otherwise), we are referring to any function that can take an arbitrary length of data, and compress it into a fixed-length digest. Typically, this digest is called a "fingerprint", a "checksum", or a "hash". The goal is that any time we input the same data, our function outputs the same digest. Further, it's important that not only can I produce that digest, but anyone can produce the same digest. This gets us prepared for the Random Oracle, but we still have some ground to cover first.

Because our hashing function has a fixed-length output, say 128 bits, an ideal function would map every input to one of those outputs. In other words, our function maps an element in the domain (our data to be hashed) to exactly one element in the range (our actual hash). So, if our function produces 128-bit digests, then there are a total of 2^128 digests in the range. This means that, at best, we have a one-to-one mapping of elements in the domain to elements in the range, and only for as long as we hash no more than 2^128 distinct inputs. Again, speaking about an ideal hashing function.

However, we know that there are many more inputs than just 2^128; there are infinitely many, actually. But think about it for a second. Take the number zero, and send it through our hashing function. Increment that number by 1, then hash that number. Continue in this manner, assuming infinite computing resources and infinite time, until you've hashed every number from 0 through 2^128-1. Ideally, you've produced exactly 2^128 unique digests. But, what happens when you now want to hash 2^128? Every digest is already taken, so now we have what is called a collision. In other words, two distinct inputs were hashed to the same output. To put it formally:

Definition: A collision is when two distinct pieces of data hash to the same digest, checksum, or fingerprint.
Theorem: For any fixed-length hashing function, there are infinitely many collisions.
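Proof: This can be proven using the pigeon-hole principle. Given a fixed-length hashing function with n bits of output, there are only 2^n possible digests, so hashing 2^n+1 distinct inputs from the domain must produce at least one collision in the range. Because the domain is infinite and the range is finite, at least one digest must have infinitely many inputs mapped to it, so there are infinitely many collisions. Q.E.D.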
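To make the pigeon-hole argument concrete, here is a minimal Python sketch. It assumes a toy 8-bit "hash" built by keeping only the first byte of SHA-256; the names tiny_hash and first_collision are made up for this illustration. With only 256 possible digests, hashing 257 distinct inputs cannot avoid a collision.

```python
import hashlib


def tiny_hash(data: bytes) -> int:
    """Toy 8-bit hash: the first byte of SHA-256, so only 2^8 = 256 digests exist."""
    return hashlib.sha256(data).digest()[0]


def first_collision(limit: int = 257):
    """Hash the inputs "0", "1", "2", ... and return the first colliding pair."""
    seen = {}  # digest -> the input that first produced it
    for i in range(limit):
        data = str(i).encode()
        digest = tiny_hash(data)
        if digest in seen:
            return seen[digest], data, digest
        seen[digest] = data
    return None  # cannot happen when limit > 256: the pigeon-hole principle forbids it


first, second, digest = first_collision()
print(f"{first!r} and {second!r} both hash to {digest:#04x}")
```

In practice the collision shows up long before the 257th input, which is a preview of the Birthday Paradox discussed further down.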

I don't think I need to tell you how much larger infinity is than 2^128. As a result, collisions are overwhelming. In fact, would you like to see a collision in practice? Below are 2 different hexadecimal strings. The differences are very subtle (only six hexadecimal digits differ), but they are indeed distinct. Here, we'll take the two strings, and hash them with the well-known MD5 algorithm. Then, just to show I'm not cheating, we'll hash the same strings with SHA-1. While we produce a collision in MD5, we have distinct digests with SHA-1. Go ahead, and verify that you get the same results.

$ INPUT1=d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89\
55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b\
d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0\
e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70
$ INPUT2=d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89\
55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b\
d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0\
e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70
$ printf "$INPUT1" | xxd -r -p | md5sum
79054025255fb1a26e4bc422aef54eb4  -
$ printf "$INPUT2" | xxd -r -p | md5sum
79054025255fb1a26e4bc422aef54eb4  -
$ printf "$INPUT1" | xxd -r -p | sha1sum
a34473cf767c6108a5751a20971f1fdfba97690a  -
$ printf "$INPUT2" | xxd -r -p | sha1sum 
4283dd2d70af1ad3c2d5fdc917330bf502035658  -
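If you would rather check this without xxd, the same verification can be sketched in Python with nothing but hashlib. The two hexadecimal strings are copied from the shell session above; the variable names are just for illustration.

```python
import hashlib

INPUT1 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70"
)
INPUT2 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70"
)

# Count how many bits actually differ between the two 128-byte inputs.
differing_bits = sum(bin(a ^ b).count("1") for a, b in zip(INPUT1, INPUT2))

md5_collides = hashlib.md5(INPUT1).digest() == hashlib.md5(INPUT2).digest()
sha1_collides = hashlib.sha1(INPUT1).digest() == hashlib.sha1(INPUT2).digest()

print("bits that differ:", differing_bits)
print("same MD5 digest: ", md5_collides)
print("same SHA-1 digest:", sha1_collides)
```

The two inputs are distinct, yet MD5 cannot tell them apart, while SHA-1 can.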

Crazy, right? With an ideal hashing function, it should be at least as difficult as a brute force search to find these collisions, and it should take searching an entire 128-bit domain to find a collision. Unfortunately, however, finding blind collisions with a brute force search turns out to be much faster, thanks to the Birthday Paradox. The Birthday Paradox says the following:

In a room of just 23 people, there is a 50% probability that at least two of them share the same birthday. In a room of just 75 people, there is a 99.9% probability that at least two of them share the same birthday.

Wait, what? Uhm, last I checked, there are 366 days in a year, assuming leap year. Soooo, if there are 23 people in a room, then there should be a 23/366, or about a 6% probability that two people share the same birthday. Unfortunately, this isn't how it works. There may be a 6% chance someone shares your birthday, but there is a 50% chance two arbitrary people share the same birthday. Now do you see the problem? Not only must you compare your birthday to everyone, but so must everyone else. This is a case of combinations: every possible pair of people has to be compared. So, with 23 people in the room, there are actually 253 possible comparisons that must be made (23*22/2). The math gets a little hairy, and to be honest, it's a bit outside the scope of this post, and this series (it's going to be long enough as it is). Refer to the Wikipedia article if you want to work through the theory and the proof.
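The numbers are easy to reproduce, though. Here is a small Python sketch, assuming 365 equally likely birthdays (the usual textbook simplification, and an assumption on my part); the function names are made up for this example. It multiplies out the probability that everyone's birthday is distinct, then subtracts that from 1.

```python
def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday."""
    all_distinct = 1.0
    for k in range(people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        all_distinct *= (days - k) / days
    return 1.0 - all_distinct


def pairs(people: int) -> int:
    """Number of distinct pairs of people that could share a birthday."""
    return people * (people - 1) // 2


for n in (23, 75):
    print(f"{n} people, {pairs(n)} pairs: {shared_birthday_probability(n):.4f}")
```

With 23 people the result lands just above one half, and with 75 people it is above 0.999, matching the statement of the paradox.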

We can use this Birthday Paradox to work out an attack on finding two distinct inputs that produce an identical digest. This is called the Birthday Attack, and it's the primary driver in finding collisions. The attack basically says something like this:

To find a collision in an n-bit range with approximately 50% probability, you only need to search about the square root of 2^n, or 2^(n/2), elements in the domain.

So, for a 128-bit digest (2^128 possible distinct outputs), using the Birthday Attack, I only need to search 2^64 possible inputs to have approximately a 50% probability that I have found a collision. If you don't think 2^64 is very small, the bitcoin network is currently mining 2^64 SHA-256 digests about every 20 seconds.
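The Birthday Attack is easy to try at a small scale. The following Python sketch assumes a toy 32-bit hash made by truncating SHA-256 (truncated_sha256 and birthday_attack are names invented for this example, not part of any library). With a 32-bit range, a collision should turn up after roughly 2^16 attempts instead of 2^32.

```python
import hashlib
from itertools import count


def truncated_sha256(data: bytes, bits: int) -> int:
    """Keep only the top `bits` bits of SHA-256 to get a toy n-bit hash."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)


def birthday_attack(bits: int):
    """Hash "1", "2", "3", ... and remember every digest until one repeats."""
    seen = {}  # digest -> the input that first produced it
    for tries in count(1):
        data = str(tries).encode()
        digest = truncated_sha256(data, bits)
        if digest in seen:
            return seen[digest], data, tries
        seen[digest] = data


a, b, tries = birthday_attack(32)
print(f"{a!r} and {b!r} collide after {tries} hashes (2^16 = {2**16})")
```

The cost is memory: every digest seen so far has to be stored, which is why real attacks on full-size digests use cleverer, memory-light variants.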

Blind, preimage, and second preimage collisions

Armed with this knowledge, we can now formalize some definitions of collision attacks. This might be confusing, so I'll define it first, then give some examples.

Collision attack:
A blind search, where two distinct inputs produce the same digest.
Preimage attack:
A search to find an input that matches a defined digest.
Second preimage attack:
A search to find a second file that matches the digest of a defined file.

Let's break these down individually. A collision search is literally a blind search, without any respect to inputs or outputs. You don't know what the inputs will be nor do you know what their outputs will be. You only know that you have found two distinct inputs that collide to the same output, all of which is entirely arbitrary.

A preimage attack is where you have a digest in your possession, but you would like to find an input that matches it. In this case, while the input is completely arbitrary, the output is static. For example, suppose you have the 256-bit hexadecimal digest "ec58d903a9f9dcc9d783da72401b1c94fc8fb9d9623d7141b8b90997382088f9". A preimage attack would be successfully finding the input that produced it. In this case, it was "Cz3eJlm4I2I2rHt8hioZ7evonLyukwlz".

A second preimage attack means having both the input "Cz3eJlm4I2I2rHt8hioZ7evonLyukwlz" and its 256-bit hexadecimal digest "ec58d903a9f9dcc9d783da72401b1c94fc8fb9d9623d7141b8b90997382088f9", and finding a second input that produces that same digest.
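To see the difference between the two preimage attacks in code, here is a Python sketch against a deliberately weak target: SHA-256 truncated to 16 bits, so brute force finishes in a moment. The function names and the "guess-N" candidate format are assumptions for this illustration. Against a real 256-bit digest the same loops would never finish.

```python
import hashlib
from itertools import count

BITS = 16  # toy digest size so that brute force is feasible


def toy_hash(data: bytes) -> int:
    """Top 16 bits of SHA-256."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - BITS)


def find_preimage(target: int) -> bytes:
    """Preimage attack: only the digest is known; find any input matching it."""
    for i in count():
        candidate = f"guess-{i}".encode()
        if toy_hash(candidate) == target:
            return candidate


def find_second_preimage(original: bytes) -> bytes:
    """Second preimage attack: the input is known; find a different one that matches."""
    target = toy_hash(original)
    for i in count():
        candidate = f"guess-{i}".encode()
        if candidate != original and toy_hash(candidate) == target:
            return candidate


original = b"Cz3eJlm4I2I2rHt8hioZ7evonLyukwlz"
target = toy_hash(original)
preimage = find_preimage(target)
second = find_second_preimage(original)
print(f"digest {target:#06x}: preimage {preimage!r}, second preimage {second!r}")
```

With brute force the two searches cost about the same, roughly 2^16 guesses here. The distinction matters once cryptanalysis finds shortcuts, because a shortcut may apply to one attack and not the other.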

Usually, when breaking cryptographic hash functions, the first thing to break is the compression function, which I'll cover in later posts. Once the compression function is broken, the next step is to attack the search for blind collisions. This is generally done by analyzing weaknesses in the mathematics, finding bias in the output, observing the quality of the avalanche effect, and so forth. You eventually learn where the hashing function is weak, and where you can take "shortcuts" to get to your goal. Eventually, the algorithm is broken to the point that finding blind collisions is practical. MD5 is broken in this regard.

After breaking the compression function, and weakening the algorithm to the point of practical collision attacks, preimage attacks become the next focus of analysis. However, when the compression function is broken, such as in the case of SHA-1, it's a strong sign to start moving away from the algorithm, long before you find collisions. So, analysis tends to slow down after collisions have been found, because no one should be using the function anymore. This also means continuing to find second preimage collisions gets even less attention.

Avalanche Effect

The final property of cryptographic hashing functions that needs to be addressed is the "avalanche effect". It is absolutely critical in cryptographic hashing functions that even though inputs may be sequential, their outputs do not show that to be the case. For example, consider the SHA-256 of the numbers 1 through 10:

$ for I in {1..10}; do printf "$I: "; printf "$I" | sha256sum -; done
1: 6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b  -
2: d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35  -
3: 4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce  -
4: 4b227777d4dd1fc61c6f884f48641d02b4d121d3fd328cb08b5531fcacdabf8a  -
5: ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d  -
6: e7f6c011776e8db7cd330b54174fd76f7d0216b612387a5ffcfb81e6f0919683  -
7: 7902699be42c8a8e46fbbb4501726517e86b22c56a189f7625a6da49081b2451  -
8: 2c624232cdd221771294dfbb310aca000a0df6ac8b66b696d90ef06fdefb64a3  -
9: 19581e27de7ced00ff1ce50b2047e7a567c76b1cbaebabe5ef03f7c3017bb5b7  -
10: 4a44dc15364204a80fe80e9039455cc1608281820fe2b24f1e5233ade6af1dd5  -

Notice that there is no clear indication of sequence in the digests. For all practical purposes, they look like truly random output, despite the sequential input (each input differs from the previous one by only a few bits). However, can we formally define the avalanche effect? What would be ideal is that with each bit change on the input, every bit in the digest output has as close to a 50% chance of being flipped as theoretically possible.

I'll talk more about "rounds" in future posts when I talk about specific implementations and designs. Suffice it to say that a cryptographic hashing function will iterate through the compression functions a certain number of times, before outputting the state. On each round, the bits in the output should each have a 50% chance of being flipped. So, on each output of each round iteration, close to half of the bits have been flipped in some pseudorandom manner. After a certain number of rounds, the final output should be indistinguishable from true random noise.

So, how about this as a formal definition:

When a single input bit is flipped, each output bit should change with a 50% probability.
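That definition can be measured directly. Here is a minimal Python sketch, assuming SHA-256 as the function under test and an arbitrary sample message (both my choices, as is the name flipped_fraction). It flips each input bit in turn and counts how many of the 256 output bits change.

```python
import hashlib


def flipped_fraction(message: bytes) -> float:
    """Average fraction of SHA-256 output bits that change per single-bit input flip."""
    baseline = int.from_bytes(hashlib.sha256(message).digest(), "big")
    total_input_bits = len(message) * 8
    changed = 0
    for bit in range(total_input_bits):
        mutated = bytearray(message)
        mutated[bit // 8] ^= 1 << (bit % 8)  # flip exactly one input bit
        digest = int.from_bytes(hashlib.sha256(bytes(mutated)).digest(), "big")
        changed += bin(baseline ^ digest).count("1")  # output bits that differ
    return changed / (total_input_bits * 256)


fraction = flipped_fraction(b"The quick brown fox jumps over the lazy dog")
print(f"average fraction of output bits flipped: {fraction:.4f}")
```

A result very close to 0.5 is what the definition asks for. Keep in mind that this is only a statistical check, and a non-cryptographic function could pass it too.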

Of course, the cryptographic strength doesn't rest solely on the avalanche effect. There are mathematical properties that determine that. But, the output should be completely unpredictable. You could apply the "next bit test", in that there is no algorithm you could produce that would determine the next state of the next bit, without actually compromising the state of the machine (this is a test held to cryptographically secure pseudorandom number generators).

Unfortunately, all we have to test the avalanche effect is standard randomness tests, such as the chi-square distribution, Monte Carlo for Pi, and the Birthday Paradox, among others. This doesn't say anything about the cryptographic strength of the hashing function, but says a lot about randomness properties (non-cryptographic hashing functions can also exhibit strong randomness qualities).

There are a couple of software utilities we can use to test and analyze cryptographic hashing functions. First, we have standard randomness tests, such as Dieharder and the FIPS 140-2 suite. But, for something more specific on analyzing cryptographic primitives, I would recommend Cryptol. On the one hand, this isn't an out-of-the-box software solution for just running a battery of tests and analysis. It is actually a domain-specific language that will require a bit of a learning curve. On the other hand, it's Free Software, and you'll probably learn more about cryptanalysis with this tool, than just playing with randomness tests.

Conclusion

This was just a primer post to get you thinking about cryptographic hashes, specifically thinking about their output, and the task of finding collisions. The rest of the posts in the series will cover specific functions such as MD5, SHA-1, -2, and -3, as well as some others. We'll talk about hashing constructions, and where you'll find cryptographic functions in practice (I think you'll be surprised). I may even throw in a post or two about random oracles, and how we want cryptographic hashing functions to not only imitate them, but be proven secure under the "Random Oracle Model".

Regardless, this post will get you started, and hopefully excited for what is to come.

Manual Authenticated File Encryption With OpenSSL
https://pthree.org/2016/02/27/manual-authenticated-file-encryption-with-openssl/
Sat, 27 Feb 2016 17:37:52 +0000

One thing that bothers me about OpenSSL is the lack of command-line support for AEAD ciphers, specifically AES in CCM and GCM block modes. Why does this matter? Suppose you want to save an encrypted file to disk, without GnuPG, because you don't want to get into key management. Further, suppose you want to send this data to a recipient or store it on a server outside of your full control. The authenticated encryption is important, otherwise the ciphertext is malleable and vulnerable to bit flipping.

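So, when you get to the shell, you may try using AES in GCM mode with OpenSSL's "enc(1)" command, only to be left wanting. Here, we generate a passphrase from /dev/urandom, convert it to hexadecimal (the trailing "0a" is the newline that echo appends), and provide it as an argument on the command line. Note that "-k" takes a passphrase from which enc(1) derives the actual key; it is not the raw AES key.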

$ LC_CTYPE=C tr -cd 'abcdefghjkmnpqrstuvwxyz23456789-' < /dev/urandom | head -c 20; echo
sec2tk24ppprcze33ucs
$ echo sec2tk24ppprcze33ucs | xxd -p
73656332746b323470707072637a6533337563730a
$ openssl enc -aes-256-gcm -k 73656332746b323470707072637a6533337563730a -out file.txt.aes -in file.txt
AEAD ciphers not supported by the enc utility
$ echo $?
1

Rather than using GCM, however, we can build the authentication tag manually with HMAC-SHA-512, which OpenSSL does support. This means using a non-authenticated block cipher mode, such as CTR, as a first step, then authenticating the ciphertext manually as a second step.

Using our same password from the previous example, we'll do this in two steps now:

$ openssl enc -aes-256-ctr -k 73656332746b323470707072637a6533337563730a -out file.txt.aes -in file.txt
$ openssl dgst -sha512 -binary -mac HMAC -macopt hexkey:73656332746b323470707072637a6533337563730a -out file.txt.aes.mac file.txt.aes

Now you have three files: your plaintext file, your AES encrypted ciphertext file, and your HMAC-SHA-512 authentication file:

$ ls -l file.txt*
-rw-rw-r--. 1 aaron aaron 1050 Feb 27 10:26 file.txt
-rw-rw-r--. 1 aaron aaron 1066 Feb 27 10:27 file.txt.aes
-rw-rw-r--. 1 aaron aaron   64 Feb 27 10:28 file.txt.aes.mac

When sending or remotely storing the "file.txt.aes" file, you'll want to also make sure the "file.txt.aes.mac" authentication file accompanies it. Unfortunately, the OpenSSL dgst(1) command does not support verifying message authentication codes, so you'll have to script this manually. You'll need to generate a second file, maybe "file.txt.tmp.mac", then compare the two. If they match, you can decrypt the "file.txt.aes" ciphertext file. If not, discard the data.
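One way to script that comparison is sketched below in Python, using only the standard hmac and hashlib modules. The helper names and the stand-in ciphertext bytes are assumptions for illustration. The key is the same hex string passed to "-macopt hexkey:" above, so the recomputed tag should match what dgst(1) wrote with "-binary".

```python
import hashlib
import hmac


def hmac_sha512(key: bytes, data: bytes) -> bytes:
    """Raw 64-byte HMAC-SHA-512 tag over `data`."""
    return hmac.new(key, data, hashlib.sha512).digest()


def verify_mac(key: bytes, ciphertext: bytes, expected_mac: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(hmac_sha512(key, ciphertext), expected_mac)


key = bytes.fromhex("73656332746b323470707072637a6533337563730a")

# Stand-in for the contents of file.txt.aes and file.txt.aes.mac; in real use,
# read both files in binary mode instead.
ciphertext = b"stand-in for the bytes of file.txt.aes"
mac = hmac_sha512(key, ciphertext)

print(verify_mac(key, ciphertext, mac))            # True: safe to decrypt
print(verify_mac(key, ciphertext + b"\x00", mac))  # False: discard the data
```

The comparison uses hmac.compare_digest rather than "==" so that the time taken does not leak how many leading bytes of the tag were correct.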

This isn't elegant, and I wish enc(1) supported AEAD, but as it stands, it doesn't. So, you'll have to stick with doing things manually. However, this is something simple enough to script, and provides both data confidentiality and authenticity, which should be the goal of every ciphertext.

Digest Algorithms in Google Spreadsheets
https://pthree.org/2016/02/26/digest-algorithms-in-google-spreadsheets/
Fri, 26 Feb 2016 23:51:46 +0000

I can't imagine there are a lot of uses for digest algorithms in spreadsheets, but I came up with one, and I really wished I had access to them. Seeing as most spreadsheet applications don't ship one, I figured I would create my own.

Mostly, I use Google for my document processing and spreadsheet use, and I had a spreadsheet of Louis L'Amour books. My grandfather gave me his entire selection of Louis L'Amour books last year, and I made a goal to read them all during 2016. I grew up listening to them on audiotape in the truck when I was on the road with him laying carpet, heading up to the family cabin in Idaho, and other things, so, I have memories of many of the stories. It will be fun to read them.

So, what do digest algorithms have to do with Louis L'Amour novels? Well, after the spreadsheet was created to track what I've read and what I have left, as well as a pace (I have to be reading at least 60 pages every day), I wanted to start reading the books in a random order. Sure, I'll read the Sackett, Hopalong Cassidy, Talon & Chantry, and Kilkenny series first, but when I'm finished with the series, I want to read the novels in random order. Why? Because I don't want to get caught up in published year watching him change as a writer, or go in alphabetical order, because that's boring. Randomness is exciting!

Now I could have used the =RAND() function in the spreadsheet, but when I sort the columns, the numbers change. So, I need to copy and paste their values, then sort the columns. Besides, is RAND() even cryptographically secure (indistinguishable from true random noise)? Even better, I could just get ASCII data off of /dev/urandom and paste those results into the column, then sort off of that. But that requires using an external tool. However, I could also use a digest algorithm to calculate the digest of the book title, then sort by the digest. Because digest algorithms aren't part of the Google Spreadsheet default functions, my OCD kicked in, and I had to create one.

Here is what I came up with. You'll notice that MD2(), MD5(), and SHA1() are created, even if they're not cryptographically secure for today's modern cryptographic applications. However, in this specific use case, such as sorting columns, they are fine. Also, notice that SHA256(), SHA384(), and SHA512() exist, but not SHA224(). This is because "Utilities.DigestAlgorithm" does not export a "SHA_224" algorithm, which in my opinion, is just odd. MD4 is also not available, nor any of the SHA-3 functions. Regardless, all the digest algorithms supported by the API are available.

// Cryptographic hash functions for use in Google Spreadsheets
// Use with =MD5(string)
// A string or cell (or concatenation of cells) can be provided
// Output in hex
// Released to the public domain

// Shared helper: compute the digest of s and return it as lowercase hex.
// Apps Script treats names ending in an underscore as private, so this
// helper is not offered as a spreadsheet function itself.
function hexDigest_(algorithm, s) {
  var hexstr = '';
  var digest = Utilities.computeDigest(algorithm, s);
  for (var i = 0; i < digest.length; i++) {
    // computeDigest returns signed bytes (-128..127); map them to 0..255
    var val = (digest[i]+256) % 256;
    // zero-pad each byte to two hex characters
    hexstr += ('0'+val.toString(16)).slice(-2);
  }
  return hexstr;
}
function MD2(s) {
  return hexDigest_(Utilities.DigestAlgorithm.MD2, s);
}
function MD5(s) {
  return hexDigest_(Utilities.DigestAlgorithm.MD5, s);
}
function SHA1(s) {
  return hexDigest_(Utilities.DigestAlgorithm.SHA_1, s);
}
function SHA256(s) {
  return hexDigest_(Utilities.DigestAlgorithm.SHA_256, s);
}
function SHA384(s) {
  return hexDigest_(Utilities.DigestAlgorithm.SHA_384, s);
}
function SHA512(s) {
  return hexDigest_(Utilities.DigestAlgorithm.SHA_512, s);
}

So, how do we apply it? Because this is a script which is applied to your spreadsheet, you need to add it manually every time you create a new spreadsheet document. Supposedly, you can permanently add it through the Chrome Web Store (I created a project that is currently pending review), but for the time being, copying/pasting works.

Navigate to "Tools > Script Editor", remove the default function Google provides, and add the code above. Save it as a project, then it will be available to your spreadsheet. Now you can use it.

For example,

To calculate the MD5 of cell "A1":
    =MD5(A1)

To calculate the SHA1 of cells "A1" through "D1":
    =SHA1(CONCATENATE(A1:D1))

Retrieve the 12 left-most characters from a SHA512 digest of cells "A1" through "D1":
    =LEFT(SHA512(CONCATENATE(A1:D1)), 12)
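The sort-by-digest trick that motivated all of this can also be sketched outside the spreadsheet. Here is a minimal Python version, assuming a short sample list of titles and MD5 as the digest (digest_order is a made-up name). It mirrors adding an =MD5(A1) column and sorting on it.

```python
import hashlib


def md5_hex(title: str) -> str:
    """Hex MD5 digest of a title, the value an =MD5(A1) column would hold."""
    return hashlib.md5(title.encode("utf-8")).hexdigest()


def digest_order(titles):
    """Return the titles sorted by their digest instead of by name."""
    return sorted(titles, key=md5_hex)


# Sample titles for illustration only.
titles = ["Hondo", "Flint", "Shalako", "Conagher", "The Daybreakers"]
for title in digest_order(titles):
    print(md5_hex(title)[:12], title)
```

Unlike =RAND(), the order is stable: the same titles always sort the same way, so nothing changes when the sheet recalculates.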

I'm sure there are a few uses for digest algorithms in spreadsheets, they're just not very common; at least web searching for their use gives me very scant results. If you find these helpful, even if there are other solutions, I would be interested in how you used them in the comments below.

Oh, and here's my Louis L'Amour spreadsheet tracking my reading progress. 🙂

My Strange Tweets
https://pthree.org/2016/02/17/my-strange-tweets/
Thu, 18 Feb 2016 03:01:46 +0000

You may have noticed some tweets from me that look... strange. Probably something like these:

First, let me provide some background. When Twitter was announced, a couple of Free Software developers got together to create a self-hosted Free Software alternative. They called that alternative "Identica", because it was hosted in Canada, and a way to establish your social identity. It made sense, and the Free Software and Open Source ecosystem ate it up. Within no time, it was a thriving online social network, involving mostly those from the Free Software and Open Source world, with all sorts of very influential developers and people creating accounts.

One account that seemed to catch the eye of many was @key. It posted what appeared to be MD5 checksums every 2 hours, regularly and consistently. Plenty of people were following the account, yet it wasn't following anyone. People replied to the tweets, asking what it was posting, who it was, why it was doing what it was doing, if it was a government account, etc. No one could figure it out, and if they were MD5 checksums, no one could reproduce them. It was a social enigma, and it kept people enthralled and engaged.

I thought this was exceptionally creative, and I was quite jealous that I didn't think of it first. The best I figured was that it was posting the timestamp of the tweet with a custom salt. At least, that is what I would have done. It couldn't be an MD5 of random data, otherwise, why not just post the random data? Or is that exactly what it is? So, instead, I decided to play with the Identica API and roll my own, using my own account. I had already set up the "Identica-Twitter bridge", so anything I posted to my Identica account would get posted to Twitter automatically.

But, I have to be different. So rather than a random digest that no one could figure out (I'm sure it's a timestamp), I wanted something a little more transparent. I started with taking the SHA-1 of the Unix epoch (the number of seconds since Jan 1, 1970 00:00:00 UTC) at 13:37 local time, because it's leet. This was easily accomplished with a bit of shell code:

$ EPOCH=$(date --date="today 13:37" +%s); printf "$EPOCH: "; printf "$EPOCH" | sha1sum - | cut -d ' ' -f 1

This was the first tweet:

Later however, I wanted something even more creative. I go by the online nick "eightyeight" on IRC, because I play the piano. However, some Asian cultures see the number "8" as lucky. With "Chinese" fortune cookies, I figured I would "encrypt" a fortune at 08:08 local time. Again, I decided to do this with a bit of shell code:

$ fortune -s -n 70 | gzip -c | base64 | rot13 | paste -sd ''
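Since this "encryption" is just compression and encoding, anyone can undo it. Here is a Python sketch of both directions, assuming the tweet text is exactly what the pipeline above emits; obscure and reveal are names invented for this example. Python's gzip header will not be byte-for-byte identical to the one gzip(1) writes, but either output decompresses to the same text.

```python
import base64
import codecs
import gzip


def obscure(text: str) -> str:
    """Mirror of: gzip -c | base64 | rot13 | paste -sd ''"""
    compressed = gzip.compress(text.encode("utf-8"))
    encoded = base64.b64encode(compressed).decode("ascii")  # already a single line
    return codecs.encode(encoded, "rot_13")  # rotates letters, leaves digits and +/= alone


def reveal(tweet: str) -> str:
    """Undo the steps in reverse order: rot13, base64 decode, gunzip."""
    encoded = codecs.decode(tweet, "rot_13")
    return gzip.decompress(base64.b64decode(encoded)).decode("utf-8")


fortune = "You will be pleasantly surprised tonight."
tweet = obscure(fortune)
print(tweet)
print(reveal(tweet))
```

No key is involved anywhere, which is why "encrypt" is in quotes.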

The first tweet to hit that was (testing the API, so this one actually wasn't on 08:08):

However, Identica started going downhill. First, we had big challenges fighting bot spam. Despite repeated bug reports and discussion on the network, very little change was happening in the code to combat the spam (for future reference, just use Hashcash tokens as a proof-of-work for form submissions). Then, after getting venture capital and attempting to appeal to the mass market, things started changing. First it rebranded itself as "Status.Net", then we lost threaded replies. The API was no longer Twitter compatible (at least some things were different), and branding got real weird. Then it rebranded itself again under a completely new code rewrite as "pump.io", and that is the status today. At this last rebranding, the API was no longer functional, and my scripts stopped. I didn't want to work with the Twitter API, so I didn't bother setting it up again.

It wasn't until some time ago I decided to resurrect my cryptic tweets. However, I made some changes. Instead of using SHA-1, I decided to use RIPEMD-160. Although it hasn't had the mountains of analysis SHA-1 has had, RIPEMD-160 is still considered secure, although with its 160-bit digest size, the security margin might be a bit too slim for some. However, I stuck with the same Unix epoch timestamp automated at 13:37 local time.

Then, after developing my own playing card cipher, and refining it with the help of @timshadel, I decided to actually attempt a legitimate (if still insecure) cipher with Talon. It's still a fortune (BOFH style) and it's still published at 08:08 local time for the same reasons. If you want a crack at decrypting it, check out my playing card cipher repository at https://github.com/atoponce/cardciphers. There should be a new one every day, but it may be possible that the fortune is 1 character too long, and as a result, it doesn't get posted (I've accounted for this, but I'm sure I've missed something).

What's the point? Nothing more than just a bit of fun. It's probably not something you're interested in seeing on your timeline, and I don't blame you. Granted, there will be one of each every day. If you don't have a busy timeline, I guess it could get a bit old. But, I don't plan on stopping, nor using a separate account.

]]>
https://pthree.org/2016/02/17/my-strange-tweets/feed/ 2
Checksums, Digital Signatures, and Message Authentication Codes, OH MY! https://pthree.org/2016/02/16/checksums-digital-signatures-and-message-authentication-codes-oh-my/ https://pthree.org/2016/02/16/checksums-digital-signatures-and-message-authentication-codes-oh-my/#comments Wed, 17 Feb 2016 05:01:16 +0000 https://pthree.org/?p=4528 I recently submitted a bug to the Vim project about its Blowfish encryption not using authentication. Bram Moolenaar, the lead developer of Vim, responded about using checksums and digital signatures. I hope he doesn't mind me using him as an example here, but I want to quote the relevant bits (emphasis mine):

The encryption is meant to avoid other people, who don't have the key, from reading the text. It does not have the goal of protecting manipulation of the text, that is something else. You could add a checksum even when not using encryption. I believe it's called signing.

Unfortunately, Bram is confusing checksums, digital signatures, and message authentication codes, all rolled up into one. I don't blame him. This is a topic that is not well understood by those not intimately familiar with cryptography. In a nutshell, each provides data integrity at the core. Where they differ is whether or not you're using encryption keys, and whether or not those encryption keys are symmetric or asymmetric. So, in this post, I would like to break it down.

Checksums

Checksums do not require any sort of encryption key. They are simply digests, or "fingerprints" that represent some data. When you download a piece of software from the Internet, there may be a file with an MD5, SHA-1, or SHA-256 hash of the file. This is the software vendor providing a way for you to verify that you got all the correct bits when the download completes.

For example, suppose you wish to download the latest Debian 8.3.0 amd64 ISO from https://mirrors.xmission.com/debian-cd/8.3.0/amd64/iso-cd/. Notice that there are the following files: MD5SUMS, SHA1SUMS, SHA256SUMS, & SHA512SUMS. Part of the SHA256SUMS file looks like this:

$ head SHA256SUMS
1dae8556e57bb04bf380b2dbf64f3e6c61f9c28cbb6518aabae95a003c89739a  debian-8.3.0-amd64-CD-1.iso
89facfbb5039e49d4e3eeff1cca6ab55e9121ff46affeb46ed510c11731acf41  debian-8.3.0-amd64-CD-10.iso
7f6bc807d3636975374b937c2724353f7468ecd7a61e60f2a8b71f92eeefe629  debian-8.3.0-amd64-CD-11.iso
bd99b7c274ea400b50960ab9e46dd23bad76f87574d2ceee1e8e43859fbd045b  debian-8.3.0-amd64-CD-12.iso
e85679304a509593526cffa77ff0d675329565eb4430444ee2c0d2cdd87842a8  debian-8.3.0-amd64-CD-13.iso
69f727bceb0460957bbd5023fe79749c6bf9f0e3a1b89945e6c63c6b3f04f509  debian-8.3.0-amd64-CD-14.iso
d1dab389f8cb794013986d2da8a6dc72c0be8bc932fcc6d7291cb09b418724d5  debian-8.3.0-amd64-CD-15.iso
913b5d89322b500a02f699d44778901cb59aae909f09bff64963115143c2a6ca  debian-8.3.0-amd64-CD-16.iso
0638aca6f59a8f5bec6d1cd4d272cea01758c2b2d6ec1412048ecb78ef684a77  debian-8.3.0-amd64-CD-17.iso
6f17742fbc82828f04da39f66647e958b0ac667cb4d2a40c9888c749680f1eb8  debian-8.3.0-amd64-CD-18.iso

So, when downloading "debian-8.3.0-amd64-CD-1.iso", I can use the sha256sum(1) command to verify the file:

$ sha256sum debian-8.3.0-amd64-CD-1.iso
1dae8556e57bb04bf380b2dbf64f3e6c61f9c28cbb6518aabae95a003c89739a  debian-8.3.0-amd64-CD-1.iso

The digest matches, so the download was successful and all the correct bits exist. Another way would be to download the SHA256SUMS file, and use the "-c" switch for the utility to verify the checksum automatically, rather than you eyeballing it:

$ sha256sum -c SHA256SUMS 
debian-8.3.0-amd64-CD-1.iso: OK

The important thing to understand about checksums, is they are completely and totally anonymous. There is no secret shared between the server where I downloaded the software and myself, and there is no identity attached to the checksum. This means that anyone can change the original file and recalculate the checksum. So, if transferring data over the Internet, there is nothing preventing a man-in-the-middle attack from replacing the bits you're downloading with something else, while also replacing the checksum.

In other words, checksums provide data integrity, but they do not offer any sort of authentication. However, there are a number of checksum hashing functions, both cryptographically secure and not, such as CRC, MurmurHash, MD5, SHA-1, SHA-2, SHA-3, and so forth. For non-authenticated data integrity, a cryptographically secure hash function isn't always desirable, which is why non-cryptographic hash functions exist.
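To make the "integrity without authentication" point concrete, here is a minimal Python sketch. The byte strings are hypothetical stand-ins for a downloaded file and the vendor's published digest; nothing here comes from the Debian mirror.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex, like sha256sum(1)."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-ins for an ISO image and the digest the vendor publishes.
original = b"pretend these bytes are an ISO image"
published = sha256_hex(original)

# Integrity: a single changed byte in transit produces a different digest.
corrupted = b"pretend these bytes are an ISO imagf"
integrity_ok = sha256_hex(original) == published
corruption_detected = sha256_hex(corrupted) != published

# No authentication: whoever replaces the file can also replace the checksum,
# and the victim's comparison still succeeds.
malicious = b"pretend these bytes are malware"
forged_checksum = sha256_hex(malicious)
attack_goes_unnoticed = sha256_hex(malicious) == forged_checksum

print(integrity_ok, corruption_detected, attack_goes_unnoticed)
```

All three values come out true, which is exactly the problem: the checksum catches accidents, but it says nothing about who produced it.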

Digital Signatures

Digital signatures are a form of checksum, in that they provide data integrity, but they require asymmetric cryptography to also provide authenticity. Digital signatures are a way to attach an identity to the checksum. This implies a level of trust between you and the 3rd party, such as Debian with our example above. If you have met with the 3rd party, or dealt with them enough to establish some level of trust, then you can install the 3rd party's public key into your system. Then, when they provide data attached with a digital signature, you can verify that the data did in fact come from the 3rd party, and no other source.

Notice that a man-in-the-middle attack is no longer valid here, if I already have the 3rd party's public key installed on my system. Going back to our example with Debian, I have the Debian signing public key already installed. So, I can now download the MD5SUMS.sign, SHA1SUMS.sign, SHA256SUMS.sign, or SHA512SUMS.sign file, along with the checksums file I already downloaded, and verify that the checksums are those intended by Debian:

$ gpg --verify SHA256SUMS.sign                   
gpg: assuming signed data in `SHA256SUMS'
gpg: Signature made Sun 24 Jan 2016 11:08:33 AM MST using RSA key ID 6294BE9B
gpg: Good signature from "Debian CD signing key <debian-cd@lists.debian.org>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: DF9B 9C49 EAA9 2984 3258  9D76 DA87 E80D 6294 BE9B

If we look at the contents of the SHA256SUMS.sign file, we get the following:

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAABCAAGBQJWpRMhAAoJENqH6A1ilL6bYz4P/3ZNCR8N+rrlSSgTN/AkpSVt
WXWg2BTflY3cPYmKK/osJUvLT7HTPDhabPiuQY2jJxrHYJhq5sCOrhbgc4eSRmIf
IsSm7OxQ9TXqde4mg9DVsxmIRui/rVbhEjkAVu47A0eGDUrRxczgJUo14En3jO0Z
qhXypCIN90y8HWaqy6OMe+eCsPyGxmXpWRT1XEH9tOX21wCAaxUl6ZHkiqNqdt8u
Erojls77nlaBR/tvB9CHXTkUmqsocdYD+n5UsvtmLlYN0nz85b7NhrLEW2QtugLd
MJngeI5eJvI4Hjyas0HfSlsdoBAvF+Uw3Dn9aHiTIeWVIeCYUKhdXmLww0dL0n95
jVuBSuMavQwOKKRGTbvG++RET9s/2U/G95wK0Vfx5fsf1neKVJgYf9q9iyObgcH8
dRLAqkgWJBNkvm9oXmpcy7jAq8jlXzDfaPz8plAyqDuIXoOSCHpJ5KAbAS1cYLIT
9U2cQLKTbCPrWJT5xZzOMuCPWu1CzfluDEafFsNzurWG5vCmFEJ+vV9strkeEIuX
tFeKVDkkhVEZYQKSbIlidXBa/WP2Q0g1KvlKXb+nsnWDtWAjLUPD621F3ZjUcjlX
aDPv3J+7kqfryA/7qYMVTH67KY3DwKIDKt6XtquxSf7HuYqEwXKIXp2De7zCCEqH
csWVPFNUQyOdetIC/l/w
=TjXr
-----END PGP SIGNATURE-----

The details of a PGP signature aren't important for this blog post. Suffice it to say that it requires the signer's private key to produce the signature, and the signer's public key to verify it. The sender will hash the message, sign that hash with their private key, and append the resulting signature to the message. When the recipient receives the message, they will separate the signature from the message, hash the message themselves, and use the sender's public key to verify that the signature matches that hash. If it does, the recipient knows that the holder of the private key signed the data, and no one else.

Because the private key is required to create the signature, and because only the 3rd party should have access to the private key, this means that a man-in-the-middle attack is no longer effective. A miscreant should not be able to remove the signature, apply a new signature, and have the recipient still verify the signature as "good" from Debian, unless that miscreant also had access to Debian's private signing key.
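The hash-then-sign flow can be sketched in a few lines of Python. This is a toy: it uses textbook RSA with two small primes I picked for illustration, no padding, and a digest reduced into a tiny modulus, so it is nowhere near what GnuPG actually does and must not be used for anything real. The names `sign` and `verify` are my own.

```python
import hashlib

# Toy RSA key from two small primes (illustrative only; real keys are 2048+ bits).
P, Q = 104729, 1299709
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    # Hash the message, then reduce the digest into the range of the toy modulus.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    # Only the holder of the private exponent D can compute this value.
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    # Anyone holding the public key (N, E) can check the signature.
    return pow(signature, E, N) == digest_as_int(message)


message = b"hypothetical contents of a checksums file"
signature = sign(message)

print(verify(message, signature))
print(verify(b"tampered contents of a checksums file", signature))
```

The first check passes and the second fails, because the tampered message hashes to a different value than the one the private key signed.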

To make an example of an existing software vendor, Arch Linux was under heat about this. A core developer strongly disagreed about digitally signed packages. They would provide their software from their repositories with MD5 checksums only. The packages were not digitally signed.

So, when your local Arch Linux installation would request packages from the Arch Linux software repository, unless served over HTTPS, a man-in-the-middle could interject their own bits with their own MD5 checksum. Your pacman(8) package manager would verify that the MD5 is valid, and proceed to install the software with root privileges, because that is what you told it to do. By also digitally signing the package with an Arch Linux signing key, this attack is no longer possible.

Eventually, Arch Linux fixed the vulnerability, and closed a very large security hole, by digitally signing their packages.

As with checksums, digital signatures really should be using a cryptographically secure hashing function as part of the protocol. This can include RIPEMD160, SHA-2, SHA-3, BLAKE2, Skein, and others. MD5 and SHA-1 are no longer considered cryptographically secure, and should not be used with digital signatures (which is why SSL certificates are now signed with SHA-256 instead of SHA-1).

Message Authentication Codes

Finally, message authentication codes (also called "MAC tags") are another way to provide data integrity with authentication, but this time using symmetric encryption. Where digital signatures imply a physical identity behind the authentication, MAC tags provide anonymous authentication. Generally, symmetric keys don't have identities associated with them. They're usually short-lived and shared via complex key exchanges, such as the Diffie-Hellman key exchange.

A MAC tag is keyed, meaning a shared secret is used when calculating the digest. There are a number of different implementations of MACs, such as CBC-MAC, HMAC, UMAC, & Poly1305, among others. The differences between those aren't important for this post. What is important is how they are calculated, and how they are used with encryption.

The sender of some message will apply a keyed cryptographic hashing function to the message, append (or prepend) the resulting digest to the message, and send the full payload off. Because both the ciphertext and the MAC were calculated with a shared secret key, a man-in-the-middle cannot strip the MAC tag and apply their own without knowing the shared secret. When the recipient receives the payload, they will strip off the MAC tag, rehash the message with the same keyed hashing function, and see if the two MAC tags match. If they do match, the message can be acted upon. If they do not match, something happened to the data in transit, and the payload can be safely ignored.
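Here is a small sketch of that exchange using HMAC-SHA-256 from the Python standard library. The key and messages are made-up placeholders, and the helper names `make_tag` and `verify_tag` are mine.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA-256 produces a 32-byte tag


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute the MAC tag over the message with the shared secret key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


shared_key = b"hypothetical shared secret"
message = b"transfer 10 coins to alice"

# Sender: append the tag to the message and ship the payload.
payload = message + make_tag(shared_key, message)

# Recipient: strip the tag off, recompute it, and compare.
accepted = verify_tag(shared_key, payload[:-TAG_LEN], payload[-TAG_LEN:])

# A man-in-the-middle without the key can swap the message, but cannot
# produce a tag that the recipient will accept.
forged_message = b"transfer 10 coins to mallory"
forged = forged_message + make_tag(b"attacker's guess", forged_message)
rejected = not verify_tag(shared_key, forged[:-TAG_LEN], forged[-TAG_LEN:])

print(accepted, rejected)
```

Note the use of `hmac.compare_digest` rather than `==`, so the comparison itself doesn't leak timing information about the expected tag.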

There are three main ways to apply MAC tags to messages: encrypt-then-MAC, MAC-then-encrypt, and encrypt-and-MAC. The first, encrypt-then-MAC, is considered "best practice" for message authentication. First the message is encrypted, then the ciphertext is authenticated, and the resulting MAC is appended to the ciphertext. This provides both ciphertext and plaintext integrity. The big advantage to this approach is that if the MAC tag does not match the newly calculated MAC tag during verification, the ciphertext does not need to be decrypted. This is the default approach with IPsec and modern versions of OpenSSH. RFC 7366 standardizes this for TLS (yet to be implemented by OpenSSL last I checked). It is also standardized in ISO/IEC 19772:2009.

Encrypt-then-MAC flowchart.

The next approach, MAC-then-encrypt, means authenticating the plaintext, appending the resulting MAC tag to the plaintext, and then encrypting the full plaintext and MAC tag payload. While this approach offers plaintext data integrity, it does not offer ciphertext integrity. As such, the ciphertext must be decrypted before the MAC tag can be verified. This is the default behavior in SSL/TLS, and therefore in OpenSSL, when RFC 7366 is not negotiated.

MAC-then-encrypt flowchart.

Finally, encrypt-and-MAC means authenticating the plaintext first, then encrypting the plaintext. The resulting MAC tag is appended to the ciphertext. Again, like MAC-then-encrypt, this approach offers plaintext data integrity, but it does not offer ciphertext integrity. So, you must detach the MAC tag first, then decrypt the ciphertext, then verify if the MAC tag is valid. This is the default behavior in older versions of OpenSSH.

Encrypt-and-MAC flowchart.

As I understand it, neither MAC-then-encrypt nor encrypt-and-MAC is broken by design, although MAC-then-encrypt is what made padding oracle attacks against CBC-mode TLS possible. Encrypt-then-MAC provides both ciphertext and plaintext integrity, and does not need to decrypt the ciphertext on failure, which is why it is the preferred way to handle message authentication.
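To show why encrypt-then-MAC lets the recipient reject a bad payload without decrypting anything, here is a rough Python sketch. The "cipher" is a toy XOR keystream built from SHA-256 that I made up for illustration; it is not a real cipher and stands in for something like AES in a proper mode. The function names `seal` and `open_sealed` are also hypothetical.

```python
import hashlib
import hmac
import os

TAG_LEN = 32
NONCE_LEN = 16


def toy_keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (NOT secure): XOR with SHA-256(key || nonce || counter)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + nonce + counter).digest()
        out.extend(a ^ b for a, b in zip(data[offset:offset + 32], block))
    return bytes(out)


def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    # Encrypt first, then authenticate the ciphertext (nonce included).
    nonce = os.urandom(NONCE_LEN)
    ciphertext = nonce + toy_keystream_xor(enc_key, nonce, plaintext)
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def open_sealed(enc_key: bytes, mac_key: bytes, payload: bytes):
    ciphertext, tag = payload[:-TAG_LEN], payload[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return None  # reject without ever touching the decryption routine
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    return toy_keystream_xor(enc_key, nonce, body)


enc_key, mac_key = os.urandom(32), os.urandom(32)
payload = seal(enc_key, mac_key, b"attack at dawn")
tampered = bytes([payload[0] ^ 1]) + payload[1:]

print(open_sealed(enc_key, mac_key, payload))
print(open_sealed(enc_key, mac_key, tampered))
```

The tampered payload is thrown away at the MAC check, before any decryption happens. With MAC-then-encrypt or encrypt-and-MAC, that check can only happen after decrypting.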

As with digital signatures, MACs should be calculated with cryptographically secure hashing functions, such as RIPEMD160, SHA-2, SHA-3, BLAKE2, Skein, etc. MD5 and SHA-1 would not qualify (although we could get into a discussion about HMAC-MD5 and HMAC-SHA1, but we won't).

Conclusion

No doubt, it's confusing to separate checksums from digital signatures from message authentication codes. Things get even a bit more hairy with blind signatures (used primarily in digital currencies) and Merkle trees (used primarily in peer-to-peer networks and copy-on-write filesystems), but they're special cases of the primary three functions discussed above. However, if you can get checksums, digital signatures, and message authentication codes cleared up, then you're that much closer to implementing cryptographic protocols correctly.

]]>
https://pthree.org/2016/02/16/checksums-digital-signatures-and-message-authentication-codes-oh-my/feed/ 3
Bitcoin Mining Rate and Waste https://pthree.org/2016/01/30/bitcoin-mining-rate-and-waste/ https://pthree.org/2016/01/30/bitcoin-mining-rate-and-waste/#comments Sat, 30 Jan 2016 13:27:01 +0000 https://pthree.org/?p=4520 Recently, the Bitcoin mining rate surpassed 1 exahash per second, or 1 quintillion SHA-256 hashes per second.

Bitcoin mining graph showing gigahashes per second over time.

If we do some quick math, we can determine the following:

  • If SHA-1 collisions can be found in 2^65.3 hashes, that's one SHA-1 collision found every 45 seconds.
  • Every combination of bits can be flipped in an 84-bit keyspace every year.
  • If mining is done strictly with ASICs and each ASIC can produce 1 trillion hashes per second, that's 1,000,000 ASICs.
  • If each ASIC above consumes 650 Watts of power, that's 650 Megawatts of power consumed.
  • At 650,000 kWh consumed every hour across all of those ASICs, that's 1.3 million pounds of CO2 released into the atmosphere every hour if using fossil fuels.
  • Current global rate is about 160 Bitcoins mined per hour.
  • At $0.15 USD per kWh, that's $609 spent on electricity per Bitcoin mined. Bitcoin is currently trading at $376/BTC.

Those are "back of the envelope" calculations, with some big assumptions made about the mining operation (how it's powered, who is powering it, etc.).
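If you want to check the arithmetic yourself, the figures can be reproduced with a few lines of Python. The only number not stated outright in the list is the CO2 factor: I'm assuming roughly 2 pounds of CO2 per kWh, which is what the 1.3 million pound figure implies.

```python
import math

HASH_RATE = 1e18                      # 1 exahash per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Time to grind through 2^65.3 hashes (the SHA-1 collision estimate).
sha1_collision_seconds = 2 ** 65.3 / HASH_RATE

# Size of the keyspace that can be exhausted in one year, in bits.
keyspace_bits_per_year = math.log2(HASH_RATE * SECONDS_PER_YEAR)

# ASIC count and power draw, at 1 trillion hashes/second and 650 W each.
asics = HASH_RATE / 1e12
power_watts = asics * 650
energy_kwh_per_hour = power_watts / 1000   # kW sustained for one hour

# Assumption: about 2 lbs of CO2 per kWh of fossil-fuel electricity.
LBS_CO2_PER_KWH = 2.0
co2_lbs_per_hour = energy_kwh_per_hour * LBS_CO2_PER_KWH

# Electricity cost per coin at $0.15/kWh and 160 BTC mined per hour.
cost_per_btc = energy_kwh_per_hour * 0.15 / 160

print(round(sha1_collision_seconds), round(keyspace_bits_per_year, 1))
print(int(asics), int(power_watts), int(energy_kwh_per_hour))
print(int(co2_lbs_per_hour), round(cost_per_btc, 2))
```

Swap in your own electricity price or ASIC efficiency and the break-even price moves accordingly.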

Of course, not all mining is using fossil fuels, not all miners are using ASICs, not all ASICs can do 1 trillion hashes per second (some more, some less), not all ASICs are consuming that wattage for that rate, and the cost of electricity was strictly a U.S. figure. Of course, if you're using a GPU, or worse, a CPU for your mining, then you're expending more electricity per hash than ASICs are. That may help balance out some of the miners who are using renewable energy, such as solar power for their mining. Many Chinese and Russian mining data centers certainly have lower overhead costs on electricity. You get the point: we've made some big assumptions to come to some very rough "ballpark" figures. I don't think we're too far off.

So, according to those numbers, unless using renewable energy, cheaper electricity, or Bitcoin trading goes north of $610 USD/BTC, mining for Bitcoin is a net loss. This comes at the expense of 1.3 million pounds of CO2 released into the atmosphere every hour. I would argue that Bitcoin is the worst idea to come out of Computer Science in the history of mankind.

]]>
https://pthree.org/2016/01/30/bitcoin-mining-rate-and-waste/feed/ 8
Using Your Monitors As A Cryptographically Secure Pseudorandom Number Generator https://pthree.org/2016/01/21/using-your-monitors-as-a-cryptographically-secure-pseudorandom-number-generator/ https://pthree.org/2016/01/21/using-your-monitors-as-a-cryptographically-secure-pseudorandom-number-generator/#respond Thu, 21 Jan 2016 12:55:26 +0000 https://pthree.org/?p=4511 File this under the "I'm bored and have nothing better to do" category. While coming into work this morning, I was curious if I could use my monitors as a cryptographically secure pseudorandom number generator (CSPRNG). I don't know what use this would have, if any, as your GNU/Linux operating system already ships a CSPRNG with /dev/urandom. So, in reality, there is really no need to write a userspace CSPRNG. But what the hell, let's give it a try anywho.

The "cryptographically secure" piece of this will come from the SHA-512 function. Basically, the idea is this:

  • Take a screenshot of your monitors.
  • Take the SHA-512 of that screenshot.
  • Resize the screenshot to 10% of its original size.
  • Take the SHA-512 of that resized file.
  • Take the SHA-512 of your previous two SHA-512 digests.
  • Take the last n-bits of that final digest as your random number.

Most GNU/Linux systems come with ImageMagick pre-installed, as well as the "sha512sum(1)" utility. So thankfully, we won't need to install any software. Here's a simple shell script that can achieve our goals:

#!/bin/sh
# Produces random numbers in the range of [0, 65535].
# Not licensed. Released to the public domain.

cd ~/Private # assuming you have an encrypted filesystem mounted here

TS1=$(date +%Y%m%d%H%M%S%N)
import -window root ${TS1}.png
sha512sum ${TS1}.png > /tmp/SHA512SUMS

TS2=$(date +%Y%m%d%H%M%S%N)
convert -scale 10% ${TS1}.png ${TS2}.png
sha512sum ${TS2}.png >> /tmp/SHA512SUMS

DIGEST=$(sha512sum /tmp/SHA512SUMS)
printf "%d\n" 0x$(printf '%s' "$DIGEST" | awk '{print substr($1, 125, 4)}')

shred ${TS1}.png ${TS2}.png
rm ${TS1}.png ${TS2}.png
shred /tmp/SHA512SUMS
rm /tmp/SHA512SUMS

Running it for 10 random numbers:

$ for i in {1..10}; do sh monitor-csprng.sh; done
15750
36480
64651
7942
2367
10905
53889
9346
52726
63570

A couple things to note:

  • This is slow, due to taking the screenshot, and resizing it.
  • The data on your monitors should be sufficiently random. Chats, social updates, etc. The security of this will depend entirely on the entropy of the initial screenshot.
  • You really should be saving your screenshots to an encrypted filesystem, such as eCryptfs.
  • We're using timestamps with nanosecond accuracy to provide some additional entropy for the final SHA-512 digest.
  • This is using the last 4 hexadecimal characters to be converted to decimal. In reality, it could be anything, including some convoluted dynamic search algorithm in the string.
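For those who would rather read the hashing steps without the shell plumbing, here is a rough Python sketch of the same idea. It doesn't take a real screenshot: the `screenshot` argument is a stand-in for the PNG bytes, and keeping every tenth byte is only a crude placeholder for ImageMagick's 10% resize. The function names are mine.

```python
import hashlib
import time


def sums_line(data: bytes, name: str) -> str:
    """Format one line the way sha512sum(1) does: '<hex digest>  <filename>'."""
    return f"{hashlib.sha512(data).hexdigest()}  {name}\n"


def monitor_random(screenshot: bytes) -> int:
    # Nanosecond timestamps play the role of the two timestamped filenames.
    first_name = f"{time.time_ns()}.png"
    second_name = f"{time.time_ns()}.png"

    # Placeholder for the resize step: keep every tenth byte.
    resized = screenshot[::10]

    # Equivalent of the two-line SHA512SUMS file, then the digest of that file.
    sums = sums_line(screenshot, first_name) + sums_line(resized, second_name)
    final = hashlib.sha512(sums.encode()).hexdigest()

    # The last 4 hexadecimal characters are 16 bits: a number in [0, 65535].
    return int(final[-4:], 16)


value = monitor_random(b"pretend these are the bytes of a screenshot" * 100)
print(value)
```

Taking more or fewer trailing hex characters changes the range: 2 characters give [0, 255], 8 characters give a 32-bit number.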

It's worth noting that the entropy of the initial screenshot is critical, which is actually difficult to accurately measure. So, it may help to have a chat window or more open, with recent chat logs. Same could be said for social update "walls", with the most recent updates (Twitter, Facebook, Goodreads, etc.). Having a clock with seconds ticking in a status bar can also help (although not unpredictable, at least semi-unique). Tabs in browsers, running applications, etc. The more unpredictable your workspace in the screenshot, the better off you'll be. But, people in general suck at randomness, so I'm not advocating this as something you should rely on for a cryptographically secure random number generator.

If you wanted, you could add this to a terminal, giving you a sort of "disco rave" before taking the screenshot:

#!/bin/sh
# Disco lights in your terminal
# No license. Released to the public domain

while true; do
    printf "\033[38;5;$(($(od -d -N 2 -A n /dev/urandom)%$(tput colors)))m•\033[0m"
done

Get that running first, then take your screenshot. But then, if you're reading data off of /dev/urandom, you might as well do that for your random numbers anyway...

]]>
https://pthree.org/2016/01/21/using-your-monitors-as-a-cryptographically-secure-pseudorandom-number-generator/feed/ 0
Disable Pocket From Iceweasel https://pthree.org/2016/01/02/disable-pocket-from-iceweasel/ https://pthree.org/2016/01/02/disable-pocket-from-iceweasel/#comments Sat, 02 Jan 2016 14:20:52 +0000 https://pthree.org/?p=4501 I'm not sure who I should be more disappointed in- Mozilla or Debian. Iceweasel 43 recently arrived in Debian unstable, and with it, Pocket. For those who are not familiar, Pocket is a 3rd party service that allows users to save sites they want to read or visit for later. Provided the extension is installed, this allows users to sync pages they want to read for later, across devices and platforms.

But here's the catch: it's a proprietary non-free service-as-a-software-substitute (SaaSS).

Thankfully, you can disable it, and it really isn't that difficult. Open up about:config in a new tab, and type "pocket" into the search filter. From there, set "browser.pocket.api" and "browser.pocket.site" to "localhost", and set "browser.pocket.enabled" to "false", then restart your browser.

Screenshot showing about:config with the above settings in place.

It really bothers me that Mozilla has enabled this sort of integration into their browser. Not only Pocket, but other proprietary or privacy invasive plugins and extensions also, such as "sponsored tiles" (which is finally removed), "encrypted media extensions", and "Hello" (which I haven't figured out how to disable). These sorts of things should be separate extensions or plugins that the user can install at their whim. Shipping it by default takes away freedom and choice, and it's turning the browser into a proprietary non-free software application.

What ultimately bothers me about this, is that Mozilla already has bookmark synchronization support, and their sync server is Free Software, allowing you to roll your own. Pocket doesn't offer anything that Mozilla Sync doesn't. I already have a "TOREAD" bookmark folder, where I can put pages I want to read later. And it's synched across all of my devices.

Mozilla pushing the 3rd party proprietary Pocket, and Debian shipping it in Iceweasel (thankfully, a bug is submitted) is a great disservice to users and a threat to software freedom.

Hopefully, Pocket goes the way of sponsored tiles, and gets removed.

]]>
https://pthree.org/2016/01/02/disable-pocket-from-iceweasel/feed/ 1
Encrypted Account Passwords with Vim and GnuPG https://pthree.org/2015/12/31/encrypted-account-passwords-with-vim-and-gnupg/ https://pthree.org/2015/12/31/encrypted-account-passwords-with-vim-and-gnupg/#comments Thu, 31 Dec 2015 12:40:05 +0000 https://pthree.org/?p=4466 Screenshot of terminal showing hidden password with Vim syntax highlighting

Background

I've been a long-time KeepassX user, and to be honest, I don't see that changing any time soon. I currently have my password database on an SSH-accessible server, of which I use kpcli as the main client for accessing the db. I use Keepass2Android with SFTP on my phone to get read-only access to the db, and I use sshfs mounts on my workstations with KeepassX for read-only GUI access. It works great, and allows me to securely access my password databases from any client, mobile or otherwise.

However, I recently stumbled on this post on how to use Vim with GnuPG to create an encrypted file of passwords: http://pig-monkey.com/2013/04/password-management-vim-gnupg/. I've heard about a GnuPG plugin for Vim for years now, and know friends that use it. I've even recommended that others use it as a simplistic means of keeping an encrypted password database, instead of relying on 3rd-party tools. However, I've never really used it myself. Well, after reading that post, I decided to give it a try.

Defining a specification

Ultimately, everything in that post I'm carrying over here, with only a couple modifications. First, fields should end with a colon, including the comments field. Comments could be just a single line, or multi-line, but it's still a field just as much as "user" or "pass". Further, there should be a little flexibility in the field keywords, such as "user" or "username". Additionally, because I exported my Keepass db to an XML file, then used a Python script to convert it into this syntax, I also carried over some additional fields. So, I've defined my database with the following possible fields:

  • comment|comments
  • expire|expires
  • pass|password
  • tag|tags
  • type
  • url
  • user|username

Notice that I did not define a "title" as would be the case in the Keepass XML. The entry itself is the title, so I find this redundant. Also, you'll notice I defined an additional "type" field. While not explicitly defined in the Keepass XML, it is implicitly defined with icons for entries. This could be useful for defining "ssh" vs "mysql" vs "ldap" vs "http" authentications when searching in the file.

So, an invalid example on pig-monkey.com is:

Super Ecommerce{{{
    user:   foobar
    pass:   g0d
    Comments{{{
        birthday:   1/1/1911
        first car:  delorean
    }}}
}}}

This is invalid due to the "Comments" field. Fixed would be:

Super Ecommerce{{{
    user:   foobar
    pass:   g0d
    Comments:{{{
        birthday:   1/1/1911
        first car:  delorean
    }}}
}}}

Another valid entry could be:

Example {{{
    username: aarontoponce
    password: toomanysecrets
    url: https://example.com
    type: http
    tags: internet,social,2fa
    comments: {{{
        backup codes: vbrd83ezn2rjeyj, p89r4zdpjmyys2k, rdh6e7ubz8vh82g, er4ug6vp25xsgn5
        2fa-key: "3udw mkmm uszh cw2a 5agm 7c3p 5x32 tyqz"
    }}}
}}}

Notice that I have not defined file comments, such as those found in configuration files or source code. There is a comment section per entry, so that seems to be the fitting place for any and all comments.
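The field rules above are simple enough to check mechanically. Here's a small Python sketch of a validator, assuming the keyword list I defined earlier; the `FIELD` pattern and the `field_name` helper are illustrative only and aren't part of the Vim setup.

```python
import re

# One of the allowed keywords, followed immediately by the required colon.
FIELD = re.compile(
    r"^\s*(comments?|expires?|pass(?:word)?|tags?|type|url|user(?:name)?):",
    re.IGNORECASE,
)


def field_name(line: str):
    """Return the lowercase keyword if the line starts a valid field, else None."""
    match = FIELD.match(line)
    return match.group(1).lower() if match else None


print(field_name("    Comments{{{"))            # missing colon, so invalid
print(field_name("    Comments:{{{"))           # valid
print(field_name("    username: aarontoponce")) # valid
```

Lines inside a comments block, such as "birthday:", aren't fields, so the validator returning None for them is expected.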

I really liked the post, and how thought out the whole thing was, including automatically closing the PGP file after an inactivity timeout, automatically folding entries to prevent shoulder surfing, and clearing the clipboard when Vim closes. However, one oversight that bothered me, was not concealing the actual password when the entry is expanded. Thankfully, Vim supports syntax highlighting. So, we just need to define a filetype for GnuPG encrypted accounts, and define syntax rules.

Vim syntax highlighting

EDITED TO ADD: I tried getting the Vim syntax working in this post, but WordPress is clobbering it. So, you'll need to get it from pastebin instead. Sorry.

To get this working, we need a syntax file. I don't know if one exists already for this syntax structure, but it isn't too difficult to define one. Let's look at what I've defined in this pastebin, then I'll go over it line-by-line.

The first four lines in the syntax file are just comments. Next is just a simple if-statement checking if syntax highlighting is enabled. If so, use it. The first interesting line is the following:

let b:current_syntax = "gpgpass"

This defines our syntax. Whenever we load a file with syntax highlighting enabled, and we set the "filetype" to "gpgpass", this syntax will be applied.

syntax case ignore

This just allows us to have "comment" or "Comment" or "COMMENT" or any variations on the letter case, while still matching and providing a highlight for the match.

After that, we get into the meat of it. This "syntax match" section allows me to conceal the passwords on the terminal to prevent shoulder surfing, even when the entry is expanded. This is done by setting the background terminal color to "red" and the foreground text color also to "red". Thus, we have red text on a red background. The text is still yankable and copyable, even with the mouse cursor; it's just not visible on screen.

The actual concealment is done with the regular expression. An atom is created to match "pass:" or "password:" surrounded by whitespace as the first word on the line. However, I don't want to conceal the actual text "pass:", just the password itself. So, the regular expression "\@<=" says to ignore our atom in the match, and only match "\S\+" for concealing. The concealment is achieved with red foreground text on a red background with:

highlight gpgpassPasswords ctermbg=red ctermfg=red
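If the Vim regular expression syntax is unfamiliar, the same "match the keyword, but only act on the password" idea can be sketched in Python. This is an analogue and not the syntax file itself: Python's lookbehind must be fixed-width, so I'm using a capture group in place of "\@<=", and masking with asterisks in place of red-on-red highlighting.

```python
import re

# Group 1 is the "pass:" or "password:" keyword with its surrounding whitespace;
# group 2 is the run of non-whitespace characters that should be hidden.
PASSWORD = re.compile(
    r"^(\s*pass(?:word)?:\s+)(\S+)",
    re.IGNORECASE | re.MULTILINE,
)


def conceal(text: str) -> str:
    """Replace each password with asterisks, leaving the keyword visible."""
    return PASSWORD.sub(lambda m: m.group(1) + "*" * len(m.group(2)), text)


entry = "    user:   foobar\n    pass:   g0d\n"
print(conceal(entry))
```

Only the characters after "pass:" are touched; the "user:" line passes through unchanged.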

The rest of the syntax matching in that pastebin is for identifying our fields, and highlighting them as a "Keyword" using regular expressions. All field names will be highlighted the same color based on your colorscheme, as they are all defined the same. Thus, aside from the hidden password, there is uniformity and elegance in the presentation of the syntax.

Using the syntax in Vim

This syntax file won't do us much good if it isn't installed and Vim isn't configured to use it. We could save it system-wide to "/usr/share/vim/vim74/syntax/gpgpass.vim", or just keep it in our home directory at "~/.vim/syntax/gpgpass.vim". Whatever works.

Now that the syntax file is installed, we need to call it when editing or viewing GnuPG password files. We can use the vimrc from pig-monkey.com, with one addition: we're going to add "set filetype=gpgpass" under the "SetGPGOptions()" function. Now, I understand that you may edit encrypted files that are not GnuPG password files. So, you're going to get syntax highlighting in those cases. Or, you could enable the modeline and set a modeline in the password file. The problem with the modeline is its long history of vulnerabilities. Most distributions, including Debian, disable it, and for good reason too. So, I'd rather have it set here, and unset the "filetype" if it's bothering me.

Here's the relevant config:

if has("autocmd")
    """"""""""""""""""""
    " GnuPG Extensions "
    """"""""""""""""""""

    " Tell the GnuPG plugin to armor new files.
    let g:GPGPreferArmor=1

    " Tell the GnuPG plugin to sign new files.
    let g:GPGPreferSign=1

    augroup GnuPGExtra
        " Clear this group first so re-sourcing the vimrc doesn't duplicate autocommands.
        autocmd!
        " Set extra file options.
        autocmd BufReadCmd,FileReadCmd *.\(gpg\|asc\|pgp\) call SetGPGOptions()
        " Automatically close unmodified files after inactivity.
        autocmd CursorHold *.\(gpg\|asc\|pgp\) quit
    augroup END

    " The bang lets the function be redefined when the vimrc is sourced again.
    function! SetGPGOptions()
        " Set the filetype for syntax highlighting.
        set filetype=gpgpass
        " Set updatetime to 1 minute.
        set updatetime=60000
        " Fold at markers.
        set foldmethod=marker
        " Automatically close all folds.
        set foldclose=all
        " Only open folds with insert commands.
        set foldopen=insert
    endfunction
endif " has ("autocmd")

Conclusion

What I like about this setup is the portability and simplicity. I am in a terminal on a GNU/Linux box most of my waking hours. It makes sense to use tools that I already enjoy, without needing to rely on 3rd party tools. This also closes the gap of potential bugs with 3rd party password managers leaking my passwords. I'm not saying that Vim and GnuPG won't be vulnerable, of course, but I do place more trust in these tools than the Keepass ones, to be honest.

As of right now, however, I am still a Keepass user. But, I wanted to put this together, and try it out for size, and see how the shoe fits. As such, I've exported my KeepassX database, encrypted it with GnuPG, configured Vim, and I'm off to the races. I'll give this a go for a few months, and see how I like it. I know it's going to pose issues for me on my phone, even with ConnectBot and SSH keys. But, maybe I don't need it on my phone anyway. Time will tell.

Oh, and I can still view the database as read-only and still enjoy the syntax highlighting benefits by using "view /path/to/passwords.gpg" instead of "vim /path/to/passwords.gpg".

]]>
https://pthree.org/2015/12/31/encrypted-account-passwords-with-vim-and-gnupg/feed/ 3
Multiple Encryption https://pthree.org/2015/12/26/multiple-encryption/ https://pthree.org/2015/12/26/multiple-encryption/#respond Sat, 26 Dec 2015 14:37:37 +0000 https://pthree.org/?p=4462 I hang out in ##crypto in Freenode, and every now and then, someone will ask about the security of multiple encryption, usually with the context that AES could be broken in the near future. When talking about multiple encryption, they are usually referring to cascade encryption which has the form of:

CT = Alg_B(Alg_A(M, key_A), key_B)

The discussion revolves around the differences between "Alg_A" and "Alg_B". Such as using AES for "Alg_A" and Camellia for "Alg_B". Also, the discussion will include whether or not "key_A" and "key_B" should be the same key, or different.

Cascade encryption is more efficient in storage space than some alternatives, such as this one suggested by Bruce Schneier:

CT = Alg_A(OTP, key_A) || Alg_B(XOR(M, OTP), key_B),  where OTP is a true one-time pad
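To make the two constructions concrete, here is a minimal Python sketch. It is not real cryptography: "toy_cipher" is a hypothetical stand-in that XORs the data with a SHA-256 counter keystream, chosen only so the example runs with the standard library, and every function name is made up for illustration. What it does show is the shape of each scheme and the storage difference: the cascade ciphertext is as long as the message, while the one-time pad split is twice as long.

```python
import hashlib
import os


def toy_cipher(label: bytes, data: bytes, key: bytes) -> bytes:
    """Toy stand-in for a real cipher (NOT secure): XOR with a SHA-256 counter keystream.

    Because it is a plain XOR, applying it twice with the same label and key
    returns the input, so the same function encrypts and decrypts.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(label + key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def alg_a(data: bytes, key: bytes) -> bytes:
    return toy_cipher(b"A", data, key)


def alg_b(data: bytes, key: bytes) -> bytes:
    return toy_cipher(b"B", data, key)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cascade_encrypt(m: bytes, key_a: bytes, key_b: bytes) -> bytes:
    # CT = Alg_B(Alg_A(M, key_A), key_B)
    return alg_b(alg_a(m, key_a), key_b)


def cascade_decrypt(ct: bytes, key_a: bytes, key_b: bytes) -> bytes:
    # Peel the layers off in reverse order: B first, then A.
    return alg_a(alg_b(ct, key_b), key_a)


def split_encrypt(m: bytes, key_a: bytes, key_b: bytes) -> bytes:
    # CT = Alg_A(OTP, key_A) || Alg_B(XOR(M, OTP), key_B)
    otp = os.urandom(len(m))
    return alg_a(otp, key_a) + alg_b(xor(m, otp), key_b)


def split_decrypt(ct: bytes, key_a: bytes, key_b: bytes) -> bytes:
    half = len(ct) // 2
    otp = alg_a(ct[:half], key_a)
    return xor(alg_b(ct[half:], key_b), otp)
```

With a real library you would swap "alg_a" and "alg_b" for actual AES and Camellia calls; the layering logic stays the same.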

I'm not going to go into the theoretical concerns with multiple encryption. However, I would like to cover some practical considerations:

  1. Multiple key security.
  2. Long-term storage.
  3. Complexity.
  4. Host security.

Multiple key security

It should come as no surprise that when dealing with multiple encryption, you are going to be dealing with multiple keys, if you choose to keep "key_A" and "key_B" separate. Probably the most difficult aspect of encryption implementations is keeping the secret key secret. For example, key exchanges between machines over the scary Internet have been notoriously difficult to get correct. Current best practice is implementing authenticated ephemeral elliptic-curve Diffie-Hellman (ECDHE) when communicating secret symmetric keys between machines. So, not only do you need to communicate one key, but multiple keys when encrypting and decrypting data.

If the multiple-encrypted data is to be stored on disk, then the keys will need to be stored for later retrieval. How are these stored? This isn't an easy question to answer. If you store them in a password manager, they are likely just getting single-encrypted, probably with AES. So, the security of your ciphertext rests on the security of your stored keys, likely protected by the very algorithm you are trying to safeguard against.

Now, you could use the same key for every encryption layer. But, this poses a theoretical concern (which I promised I wouldn't cover- sorry). If the same key is used for every layer, then if an attacker can recover the key through cryptanalysis of the first encryption layer, then the attacker could possibly decrypt all remaining layers. Obviously, you don't want to use ciphers where the decryption process is exactly the same as the encryption process. Otherwise, the second encryption process on the ciphertext would decrypt the first encryption! While not probable, this last scenario could even occur with different algorithms, such as AES and Camellia. So, it seems at least at a cursory glance, that using the same key for all encryption layers probably is not a wise idea. So, we're back to key management, which is the bane of cryptographers everywhere.
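The "second encryption decrypts the first" case is easy to see with a cipher that is its own inverse. The following is a small Python sketch, assuming a toy XOR cipher keyed from SHA-256 (a hypothetical stand-in, not something you would ever deploy): with the same key on both layers, the "double encrypted" result is the plaintext again.

```python
import hashlib


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy self-inverse cipher (NOT secure): XOR with a repeating SHA-256 keystream."""
    keystream = hashlib.sha256(key).digest()
    return bytes(b ^ keystream[i % len(keystream)] for i, b in enumerate(data))


message = b"my secret"
key = b"same key for both layers"

once = xor_cipher(message, key)   # first layer: looks like ciphertext
twice = xor_cipher(once, key)     # second layer with the same key undoes the first

print(once != message)   # the single layer did change the data
print(twice == message)  # the "double encryption" is the plaintext again
```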

Long-term storage

In my opinion, a larger problem is that of storage. It's one thing to get multiple encryption correct and safe on the wire, it's another to place value on long-term data storage. Think about it for a second- what is the longest you have kept data on the same drive? In personal scenarios, I have some friends that have had personal backups for up to five years. To me, this is impressive. It's likely more common that data switches drives every couple of years. RAID arrays die, hardware is replaced, higher drive capacity is demanded, or even bit rot creeps in, destroying data (such as on magnetic or optical mediums). When push comes to shove, the encrypted data is just going to move from drive-to-drive. But, ask yourself this next question- what is the oldest data you have in your possession right now?

Let's be realistic here for a second. I would be hard-pressed to find data I stored back in 2000, 15 years ago. I could find some photos in photo albums, on mugs, and on Christmas cards, but I'm not 100% confident I could get the digital original. Despite my best efforts, accidents happen, mistakes are made, and data is just lost. I don't think I'm alone here. I've even worked for companies, with large budgets, that had a hard time recovering data that is 10+ years old. For one, it's expensive to hold on to data indefinitely, but a great amount of data also becomes less and less valuable as time progresses. Yes, I still use my same email account from 2004- Google has done a great job of keeping all of my emails these past 11 years, and I would expect other data service providers to do the same. But, how many of you have kept an email address for 10+ years? Or even the data, for that matter? (This blog is actually 11 years old as well- kudos to me on keeping the data going this long).

My point is, hardware fails and is changed. Your personal value on data also changes, and accidents happen. So you're concerned about AES being broken in 20 years, or even sooner. Do you think by that time you'll still place value on that encrypted data? Do you think you'll even still have access to it, or be able to find it? And, if so, will it really be that difficult to decrypt the AES data, and encrypt it with the current best practice encryption algorithm?

Complexity

This is probably the problem you should be concerned with the most. As a collective group, we as developers have a hard time getting single encryption correct, let alone multiple encryption. This deeply enters the theoretical realm, which I promised I wouldn't blog about. But, you do have a practical concern as well- order of operations and correct implementations.

First, order of operations. It's one thing to do "double encryption", where only two algorithms are chosen and used. If you can't recall if you used AES first or second, it's a 50/50 shot at getting the order correct (provided you know which key belongs to which algorithm, otherwise it's a one-in-four chance). Imagine however, using three encryption layers, and lining the keys up correctly. Imagine the complexity of four layers, or more. Ugh. Seems like you certainly don't want to go higher than two layers.
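The arithmetic behind those odds can be sketched in a few lines of Python. This assumes every layer uses a distinct algorithm and a distinct key, and the function name is made up for illustration: there are n! possible layer orders, and if you also forgot which key goes with which algorithm, another n! possible key assignments on top of that.

```python
from math import factorial


def decryption_guesses(layers: int, keys_labeled: bool) -> int:
    """Worst-case number of combinations to try when the layer order is forgotten.

    Assumes each layer uses a distinct algorithm and a distinct key.
    """
    orderings = factorial(layers)
    key_assignments = 1 if keys_labeled else factorial(layers)
    return orderings * key_assignments


for n in (2, 3, 4):
    print(n, decryption_guesses(n, True), decryption_guesses(n, False))
```

Two layers give 2 combinations with labeled keys (the 50/50 shot) and 4 without (the one-in-four chance). Three layers give 6 and 36, and four layers give 24 and 576.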

Second, look at implementations. AES is AES. It shouldn't matter which implementation does the calculations. But, implementations like to put "magic bytes" at the beginning of ciphertexts (OpenSSL, OpenPGP, etc.). This data is only valuable for that implementation, and even worse, for a specific subset of versions. Just imagine encrypting a file with OpenSSL version 1.0 now, and needing to decrypt it in 10 years. Will OpenSSL version X be able to read those magic bytes, and correctly decrypt the file? Or will it error out, unable to decrypt the data because the data structure of the magic bytes changed in that 10 year time frame?

So, it seems best to encrypt it with some programming language library, where you can control exactly what data is stored. But, as everyone will tell you while frothing at the mouth, "don't roll your own crypto". Technically, you aren't if you "import aes" and use the "aes" module provided by that language correctly. It just remains to be seen if you implemented it correctly to thwart an attacker. Crypto is hard and full of sharp edges. It's very difficult to get things right, without getting cut. Regardless, while the "aes" module might be available in 10 years, what about the "camellia" module, or whatever algorithm you chose for the second layer? Is it still in development, or was it abandoned due to either being broken, or lack of development? Can you find that module, so you can decrypt your data?

Host Security

In a more practical real-world, everyday person scenario, how secure is the host that is doing the multiple encryption? Do others have physical access to the machine? Is it free of viruses, malware, and other badware? Does the system run an encrypted filesystem? Where and how are backups stored? Who has access to those backups? So many more questions can be asked that judge the quality of the security level of the host storing or processing the data.

Viruses and malware would probably be my number one concern if the data was so valuable, as to be multiple-encrypted. So, I would probably encrypt the plaintext on one machine, encrypt the ciphertext on a second machine, and store it on a third machine, preferably air-gapped. Thus, if a virus exists on one machine, hopefully it doesn't exist on another, and hopefully it doesn't attach itself to my encrypted data, and hopefully the badware didn't report my plaintext to a botnet pre-encryption.

Physical host security is hard. People have crappy passwords protecting their workstations. Physical access can get the attacker root regardless. Systems are infected with badware all the time, just by visiting websites! So there is hardly a guarantee that your data is safe, even though it was encrypted multiple times with different keys and algorithms.

A Couple Thoughts

It hardly seems worth the effort to encrypt your data multiple times with different algorithms and different keys, given the overhead necessary in managing everything (hardware and software). Further, in reality, modern encryption algorithms aren't usually broken. For example, DES as an algorithm isn't broken; it just has too small a key space. So, encrypting your data multiple times is solving a problem that for the most part, just doesn't exist.

That's not to say that AES will remain secure in 10, 20, or 40 years. I'm not that naive. But, as a user, you do have the ability to switch algorithms when AES does break. So, decrypt your AES ciphertext, and encrypt it with SevenFish (sorry Bruce- bad joke). Keep it encrypted with SevenFish until that breaks, and then decrypt it, and encrypt it with whatever the new modern cipher is at the time (if you still have the data, it's still valuable to you, and all implementations can still work with the ciphertext).

Conclusion

In my opinion, don't worry about multiple encryption. Generate a GnuPG key pair, encrypt your data once, and be done with it.

]]>
https://pthree.org/2015/12/26/multiple-encryption/feed/ 0
Getting Root On The Nexus 6 With Android 6 https://pthree.org/2015/12/22/getting-root-on-the-nexus-6-with-android-6/ https://pthree.org/2015/12/22/getting-root-on-the-nexus-6-with-android-6/#comments Tue, 22 Dec 2015 21:25:45 +0000 https://pthree.org/?p=4456 This is probably the 40 millionth time, since owning this phone, that I've needed to root my device. Because I keep doing it over and over, while also referring to past commands and notes, it's high time I blogged the steps. If I can benefit myself from my own blog post, then chances are someone else can. So, with that said, here's what we're going to do:

  1. Grab the latest Nexus factory images from Google.
  2. Update the phone by flashing all the images (without wiping user data).
  3. Flash the recovery with the latest TWRP image.
  4. Get root on the device with Chainfire's "system-less root" SuperSU package.
  5. Enable USB tethering and the wireless hotspot functionality.

Before beginning, I should mention that if the title isn't immediately clear, this post is specific to the Motorola Nexus 6, which is the phone I currently own. It's probably generic enough, however, to be applied to a few Nexus devices. Minus getting the factory Nexus images from Google, this might even be generic enough for non-Nexus devices, but you're on your own there. Proceed at your own risk. With that said, it's fairly hard to brick an Android phone these days.

Also, you need to make sure you have an unlocked bootloader. Google ships the phone with the bootloader locked by default. Unlocking it will wipe your user partition, meaning you will lose any and all user data (images, videos, text messages, application data, etc.). I'm going to assume that you've already unlocked the bootloader, and are ready to proceed.

TL;DR

If you don't want to read the post, and know what you're doing, here's the short of it:

$ tar -xf shamu-mmb29k-factory-9a76896b.tgz
$ cd shamu-mmb29k
$ adb reboot bootloader
$ fastboot flash bootloader bootloader-shamu-moto-apq8084-71.15.img
$ fastboot reboot-bootloader
$ fastboot flash radio radio-shamu-d4.01-9625-05.32+fsg-9625-02.109.img
$ fastboot reboot-bootloader
$ fastboot update image-shamu-mmb29k.zip
$ fastboot flash recovery twrp-2.8.7.1-shamu.img
$ fastboot reboot recovery
(reboot normally)
$ adb push UPDATE-SuperSU-v2.46.zip /sdcard/supersu.zip
$ adb reboot recovery
(install /sdcard/supersu.zip from TWRP)
(do not install TWRP root)
(reboot normally)
(install build.prop editor from Google Play)
(set "net.tethering.noprovisioning" to "true")

Otherwise ...

Getting the Google Nexus factory images

Navigate to https://developers.google.com/android/nexus/images#shamu and grab the version you are looking for. For example, I recently wanted to flash 6.0.1, so I grabbed the "MMB29K" image. Before flashing, I find it critical to verify the checksums. They are "27dde1258ccbcbdd3451d7751ab0259d" for MD5 and "9a76896bed0a0145dc71ff14c55f0a590b83525d" for SHA-1. So, after downloading, I pulled up a terminal, and verified them:

$ md5sum shamu-mmb29k-factory-9a76896b.tgz
27dde1258ccbcbdd3451d7751ab0259d  shamu-mmb29k-factory-9a76896b.tgz
$ sha1sum shamu-mmb29k-factory-9a76896b.tgz 
9a76896bed0a0145dc71ff14c55f0a590b83525d  shamu-mmb29k-factory-9a76896b.tgz

After examination, it's clear these checksums match, so I'm ready to flash.
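Comparing 40 hex characters by eye is error-prone, so the comparison can be left to the machine. Here is a minimal Python sketch using only the standard library's hashlib; the helper names "file_digest" and "verify" are hypothetical, and it does the same job as running md5sum(1) and sha1sum(1) and comparing against the published values.

```python
import hashlib


def file_digest(path: str, algorithm: str) -> str:
    """Return the hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: dict) -> bool:
    """True only if every published digest matches the file on disk."""
    return all(
        file_digest(path, algorithm) == digest.lower()
        for algorithm, digest in expected.items()
    )


# Usage with the values published for the MMB29K image:
#
#   published = {
#       "md5": "27dde1258ccbcbdd3451d7751ab0259d",
#       "sha1": "9a76896bed0a0145dc71ff14c55f0a590b83525d",
#   }
#   verify("shamu-mmb29k-factory-9a76896b.tgz", published)
```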

Flashing the images

This step does not require root on your device. I'll need to connect my phone to my computer via USB, and verify that I can talk to it via adb(1). This means installing the Debian "android-tools-adb" and "android-tools-fastboot" packages if they're not already installed. Once they are, I should be able to verify that I can talk to the phone:

$ sudo apt-get install android-tools-adb android-tools-fastboot
(...snip...)
$ adb devices
List of devices attached 
[serial number]      device

If your device is visible, we are ready to rock-n-roll. First, extract the tarball, and enter the directory:

$ tar -xf shamu-mmb29k-factory-9a76896b.tgz
$ cd shamu-mmb29k
$ ls -lh
total 2.3G
-rw-r--r-- 1 atoponce atoponce  124 Jan  1  2009 android-info.txt
-rw-r--r-- 1 atoponce atoponce 8.1M Jan  1  2009 boot.img
-rw-r----- 1 atoponce atoponce  11M Nov 18 16:59 bootloader-shamu-moto-apq8084-71.15.img
-rw-r--r-- 1 atoponce atoponce 6.2M Jan  1  2009 cache.img
-rw-r----- 1 atoponce atoponce  985 Nov 18 16:59 flash-all.bat
-rwxr-x--x 1 atoponce atoponce  856 Nov 18 16:59 flash-all.sh*
-rwxr-x--x 1 atoponce atoponce  814 Nov 18 16:59 flash-base.sh*
-rw-r----- 1 atoponce atoponce 964M Nov 18 16:59 image-shamu-mmb29k.zip
-rw-r----- 1 atoponce atoponce 113M Nov 18 16:59 radio-shamu-d4.01-9625-05.32+fsg-9625-02.109.img
-rw-r--r-- 1 atoponce atoponce 8.8M Jan  1  2009 recovery.img
-rw-r--r-- 1 atoponce atoponce 2.0G Jan  1  2009 system.img
-rw-r--r-- 1 atoponce atoponce 136M Jan  1  2009 userdata.img

Notice a couple of things- first, there are shell scripts "flash-all.sh" and "flash-base.sh" for Unix-like systems. Also, notice the "bootloader-shamu-moto-apq8084-71.15.img" & "radio-shamu-d4.01-9625-05.32+fsg-9625-02.109.img" raw images, as well as the "image-shamu-mmb29k.zip". These are the only files we're going to concern ourselves with when flashing the phone.

However, we want to be careful that we don't flash "userdata.img". This will format your user partition and all user data will be wiped (see above). What we're going to do, is basically the same execution as the "flash-all.sh" shell script. However, we're going to make just one small modification. Further, we need our phone already booted into the bootloader. As such, here's what we're going to do:

$ adb reboot bootloader
$ fastboot flash bootloader bootloader-shamu-moto-apq8084-71.15.img
$ fastboot reboot-bootloader
$ fastboot flash radio radio-shamu-d4.01-9625-05.32+fsg-9625-02.109.img
$ fastboot reboot-bootloader
$ fastboot update image-shamu-mmb29k.zip

Notice that I removed -w from that last command (if you looked in the "flash-all.sh" shell script). That option wipes user data, which would be necessary if we wanted to return the phone back to factory state. We don't- we're just upgrading. Also, I don't see the need for "sleep 5". Just wait for the phone to successfully reboot before running the next command.

At this point, the phone is successfully updated. If you were to reboot the phone, it would be perfectly operational as if you did an OTA update, or purchased it from the store. However, we want root, so we have a few more steps to accomplish.

Getting and flashing TWRP

This step also does not require root on your phone. I prefer TWRP for my recovery on Android. It's touch-based, which sets the UI apart from the other recoveries, and it's Free Software, unlike ClockworkMod. Both of these are big wins for me. Grab the latest image at https://twrp.me/devices/motorolanexus6.html. I downloaded twrp-2.8.7.1-shamu.img. Unfortunately, I couldn't find any published checksums to verify the download. So, I installed it anyway, knowing I could flash the stock "recovery.img" if something goes wrong. So far, things have been great, so I calculated the checksums for you:

$ md5sum twrp-2.8.7.1-shamu.img 
f040c3a26f71dfce2f04339f62e162b8  twrp-2.8.7.1-shamu.img
$ sha1sum twrp-2.8.7.1-shamu.img
40017e584879fad2be4043c397067fe4d2d76c88  twrp-2.8.7.1-shamu.img
$ sha256sum twrp-2.8.7.1-shamu.img
ebe5af833e8b626e478b11feb99a566445d5686671dcbade17fe39c5ce8517c7  twrp-2.8.7.1-shamu.img

If those check out, you should be safe in flashing. Currently, the phone should already be booted into the bootloader. If not, make sure it is. Once in the bootloader, we can flash TWRP:

$ fastboot flash recovery twrp-2.8.7.1-shamu.img

Now, it's critical that we don't reboot the phone normally just yet. If we do, recovery will be overwritten, and we'll have to reflash. So, while your phone is still booted into the bootloader, reboot it into recovery. You can do this by pressing the volume up/down buttons until rebooting into recovery is available, and then pressing the power button. This should boot you into TWRP. Now that you're there, you can reboot the phone normally.

WARNING
It is possible that while booting, your phone will notify you that the system cannot be verified. One of two things will happen: either the boot will pause and not go further, or the phone will boot despite the warning. With these exact versions, my phone boots without the warning at all. However, don't panic if you see it. Remember, you have the factory images. Just reflash the recovery.img, and you will be just fine.

More info can be found at http://www.xda-developers.com/a-look-at-marshmallow-root-verity-complications/.

Getting and flashing SuperSU (getting root)

WARNING
At this point, the phone should be booted into its regular state. We are now ready to root the phone. This means getting the latest SuperSU package, and installing it through TWRP. However, I need to throw out another caution. We'll be installing a beta version of SuperSU to do something called "system-less root". This means that the package will only be modifying the boot image to get root, and will not be touching the system partition. This is both good, and bad. It's good in that we only need to reflash the stock "boot.img" to remove root. It's bad in that this is experimental software, and really not ready for production. Further, unlike TWRP, SuperSU is proprietary software, which sucks. It does make me a bit nervous, to be honest, to rely on non-free closed-source proprietary software, on such a critical piece of my life. Proceed at your own risk.

As of this writing, you'll need to get the SuperSU package from the XDA forums at http://forum.xda-developers.com/showpost.php?p=64161125&postcount=3. I grabbed version "BETA-SuperSU-v2.64-20151220185127.zip". There may be updates since this post was published.

Unfortunately, again, I did not see any published checksums. So, I've installed it, with the knowledge of how to reflash my boot image should I encounter problems.

$ md5sum UPDATE-SuperSU-v2.46.zip 
332de336aee7337954202475eeaea453  UPDATE-SuperSU-v2.46.zip
$ sha1sum UPDATE-SuperSU-v2.46.zip 
6135f9d0af28e02f4292c324bf5983998e7ae006  UPDATE-SuperSU-v2.46.zip
$ sha256sum UPDATE-SuperSU-v2.46.zip 
d44cdd09e99561132b2a4cd19d707f7126722a9c051dc23f065a948c7248dc4e  UPDATE-SuperSU-v2.46.zip

Provided these checksums match, we're good to go. We need to push the ZIP to our phone with the Android debugger, and reboot into the TWRP recovery:

$ adb push UPDATE-SuperSU-v2.46.zip /sdcard/supersu.zip
$ adb reboot recovery

From the TWRP interface, tap "Install" and install the /sdcard/supersu.zip package. When it finishes, tap "Reboot". TWRP will ask if you would like to install the root provided by the image. You do NOT want to install this root- you just flashed one.

The phone should boot normally.

Enable USB tethering and the wireless hotspot

This step requires root. Finally, we want to enable the hotspot and tethering. Google is bending to wireless carriers, forcing the user to prove that they are subscribing to a cellular service that allows them to use USB tethering or the wireless hotspot. Personally, I find this dirty, and unfortunate. Even worse, is the fact that cellular providers think they can get away by charging double for using your own data. Data is data; it shouldn't matter if it comes from your phone, or your laptop connected to your phone. If they want to charge for overages on caps, whatever. But charging double, just because you connected your phone via USB? Or setting up a hotspot in your grandma's house, because she doesn't have WiFi but you have cellular coverage? Please. This is clearly grandfathered from the days of feature phones, where you couldn't tether or hotspot. So, you purchased a USB dongle to enable the hotspot. Even then, it was dirty, but it's clear that this is a byproduct of days gone by.

To enable tethering and the hotspot, you just need to add one line to the /system/build.prop config file. Unfortunately, /system/ is mounted read-only. So, you'll have to remount it as read-write and edit the file. However, every attempt I have made at modifying it has ended up with an empty file, i.e. losing all its contents. So, rather than editing it manually, there is an app for that.

Install https://play.google.com/store/apps/details?id=com.jrummy.apps.build.prop.editor&hl=en. Add "net.tethering.noprovisioning" and set the property to "true", then reboot your phone. At that point, you should be able to USB tether and set up a wireless hotspot.
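For reference, the change itself is a single "key=value" line. The Python sketch below shows what the edit amounts to on the text of a build.prop file, for example a copy pulled to your computer. The function name "set_property" is hypothetical, and this is not how the editor app works internally, just the same result expressed as a small, idempotent text transformation.

```python
def set_property(build_prop_text: str, key: str, value: str) -> str:
    """Return build.prop text with key set to value, replacing any existing entry."""
    new_line = f"{key}={value}"
    lines = build_prop_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Entries look like "key=value"; compare only the part before the first "=".
        if line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


original = "ro.build.id=MMB29K\nro.product.name=shamu\n"
updated = set_property(original, "net.tethering.noprovisioning", "true")
print(updated)
```

Running it a second time on its own output changes nothing, and every line that was already there is kept.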

Conclusion

This wasn't for the faint of heart, or for someone who doesn't care about gaining the control over their Android phone that root would give them (setting up firewalls, ad blockers, tethering/hotspot, etc.). However, as mentioned earlier, it's getting fairly difficult to hard brick an Android phone these days. Even better, the steps are getting somewhat standardized, i.e. flash factory images, flash custom recovery, install SuperSU, and optionally enable tethering/hotspot.

]]>
https://pthree.org/2015/12/22/getting-root-on-the-nexus-6-with-android-6/feed/ 3
Your GnuPG Private Key https://pthree.org/2015/11/19/your-gnupg-private-key/ https://pthree.org/2015/11/19/your-gnupg-private-key/#comments Fri, 20 Nov 2015 02:36:08 +0000 https://pthree.org/?p=4416 This post is inspired by a discussion in irc://irc.freenode.net/#gnupg about Keybase and a blog post by Filippo Valsorda.

I was curious just exactly how my private key is encrypted. Turns out, gpg(1) can tell you directly:

$ gpg --output /tmp/secret-key.gpg --export-secret-keys 0x22EEE0488086060F
$ gpg --list-packets /tmp/secret-key.gpg
:secret key packet:
	version 4, algo 17, created 1095486266, expires 0
	skey[0]: [1024 bits]
	skey[1]: [160 bits]
	skey[2]: [1023 bits]
	skey[3]: [1023 bits]
	iter+salt S2K, algo: 3, SHA1 protection, hash: 2, salt: ad8d24911a490591
	protect count: 65536 (96)
	protect IV:  01 1e 07 58 4a b6 68 a0
	encrypted stuff follows
	keyid: 22EEE0488086060F
(...snip...)

Notice the line "iter+salt S2K, algo: 3, SHA1 protection, hash: 2, salt: ad8d24911a490591". In there, you see "algo: 3" and "hash: 2". What do those identifiers reference? If you refer to RFC4880, you can learn what they are:

Symmetric Encryption Algorithms

  0. Plaintext or unencrypted data
  1. IDEA
  2. 3DES
  3. CAST5
  4. Blowfish
  5. Reserved
  6. Reserved
  7. AES-128
  8. AES-192
  9. AES-256
  10. Twofish

Cryptographic Hashing Algorithms

  1. MD5
  2. SHA-1
  3. RIPEMD-160
  4. Reserved
  5. Reserved
  6. Reserved
  7. Reserved
  8. SHA-256
  9. SHA-384
  10. SHA-512
  11. SHA-224

The defaults are CAST5 and SHA-1, which is what "algo: 3" and "hash: 2" refer to in the packet listing above. So, your passphrase is salted and repeatedly hashed with SHA-1 (the "iter+salt S2K" in the listing), and the result is used as the key for CAST5 to encrypt your private key. Thus, the whole security of your encrypted private key rests on the entropy of your passphrase, provided that sane defaults are chosen for the encryption and hashing algorithms, which they are.
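The "iter+salt S2K" and "protect count: 65536 (96)" lines can be unpacked a little further. What follows is a hedged Python sketch based on a reading of RFC 4880, section 3.7.1.3, not GnuPG's actual code: the one-octet coded count (the 96 in parentheses) expands to 65536, and the key is the hash of the salt and passphrase repeated until that many octets have been fed to the hash. The passphrase is made up and the function names are hypothetical; the salt is the one from the listing above.

```python
import hashlib


def decode_count(coded: int) -> int:
    """Expand the one-octet coded count from the S2K specifier (RFC 4880, 3.7.1.3)."""
    return (16 + (coded & 15)) << ((coded >> 4) + 6)


def s2k_iterated_salted(passphrase: bytes, salt: bytes, count: int,
                        hash_name: str, key_len: int) -> bytes:
    """Derive key_len octets from a passphrase with iterated and salted S2K."""
    data = salt + passphrase
    # At least one full copy of salt+passphrase is always hashed.
    total = max(count, len(data))
    reps, rem = divmod(total, len(data))
    stream = data * reps + data[:rem]

    key = b""
    zero_prefix = 0
    while len(key) < key_len:
        # If one digest is too short for the key, additional hash contexts
        # are used, each preloaded with one more zero octet than the last.
        h = hashlib.new(hash_name)
        h.update(b"\x00" * zero_prefix)
        h.update(stream)
        key += h.digest()
        zero_prefix += 1
    return key[:key_len]


salt = bytes.fromhex("ad8d24911a490591")   # salt from the packet listing
count = decode_count(96)                   # 65536, as gpg reports
# CAST5 takes a 128-bit key, so keep the leftmost 16 octets of the SHA-1 output.
cast5_key = s2k_iterated_salted(b"hypothetical passphrase", salt, count, "sha1", 16)
print(count, cast5_key.hex())
```

The same function with "sha512" and a 32-octet key length would model an AES-256 key derived with SHA-512.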

CAST5 has been well analyzed and it is not showing any practical or near practical weaknesses. It is a sane default to choose for a symmetric encryption algorithm. However, CAST5 uses 64-bit blocks for encrypting and decrypting data, which may have some theoretical weaknesses. AES uses 128-bit blocks, and thus has a larger security margin. Because AES-256 is available as a symmetric encryption algorithm, there really is no reason not to use it, even if the main benefit is feeling more secure.

SHA-1 is showing near practical attacks on blind collisions, but for use with keying a block cipher from a passphrase, it's still exceptionally secure. What is needed to break SHA-1 in this regard is a pre-image attack. A pre-image attack is where you have the hash, but you do not know the input that created it. This is not brute force; it means breaking the algorithm in such a way that, given any hash, you can reliably produce an input that creates it. SHA-1 has a wide security margin here, so there really is nothing practical to worry about. However, with SHA-512 available, there is also really no reason not to use a SHA-2 algorithm. In fact, aside from the increased security margin, SHA-512 is designed to work well on 64-bit platforms, but struggles on 32-bit ones. So this gives us an additional security margin, albeit negligible, over using something like SHA-256.

So, how can we change these? Turns out to be quite simple. All you need to do is specify the secret key symmetric encryption algorithm and hashing algorithm, then change your password (retype it with the same password if you don't want to change it):

$ gpg --s2k-cipher-algo AES256 --s2k-digest-algo SHA512 --edit-key 0x22EEE0488086060F
Secret key is available.

pub  1024D/0x22EEE0488086060F  created: 2004-09-18  expires: never       usage: SCA 
                               trust: unknown       validity: unknown
sub  1792g/0x7345917EE7D41E4B  created: 2004-09-18  expires: never       usage: E   
sub  2048R/0xCE7911B7FC04088F  created: 2005-07-04  expires: never       usage: S   
(...snip...)

gpg> passwd
(...snip...)
gpg> save

Now if we export our key, and look at the OpenPGP packets, we should see the new updates:

$ gpg --output /tmp/secret-key.gpg --export-secret-keys 0x22EEE0488086060F
$ gpg --list-packets /tmp/secret-key.gpg
:secret key packet:
	version 4, algo 17, created 1095486266, expires 0
	skey[0]: [1024 bits]
	skey[1]: [160 bits]
	skey[2]: [1023 bits]
	skey[3]: [1023 bits]
	iter+salt S2K, algo: 9, SHA1 protection, hash: 10, salt: 9c3dbf2880791f2e
	protect count: 65536 (96)
	protect IV:  db f5 e8 1c 98 03 99 7c 77 33 4e cd d3 3c 1f 4f
	encrypted stuff follows
	keyid: 22EEE0488086060F
(...snip...)

Now I have "algo: 9" which is "AES256" and "hash: 10" which is "SHA512" protecting my private key. I've gained a little bit extra security margin at the cost of retyping my passphrase. Not bad. Now, I'd like to pose you a question:

If I were to publish my encrypted GnuPG private key, would you think I'm crazy?

Let's look at this for a second. We know that the key is protected with AES-256. AES remains secure, after 15 years of intense scrutiny and analysis, and is showing no signs of wear. It's the most used symmetric encryption algorithm that protects HTTPS, SSH, VPN, OTR, Tor, and even your GnuPG encrypted emails. It protects social security numbers, credit card transactions, usernames and passwords, patient health records, hard drives, and on, and on, and on. AES is secure.

So, breaking the encrypted key will be at least as hard as breaking AES, which seems to be a long shot. So, you would be better off attacking my passphrase. Knowing that it was used to derive the key for AES, and that it is hashed with SHA-512, we can write a brute force algorithm to generate passphrases, hash them with SHA-512, and attempt to decrypt the private key. After all, we have the salt and the IV right there in plaintext in the packets. Turns out, there are already projects for this.

This should be a breeze. Unless, of course, you have sufficient entropy behind your passphrase. What is sufficient entropy? Well, take some time searching "entropy" on my blog. You'll quickly learn that I recommend at least 80 bits. 100 bits would give you a paranoid security margin, even with the password being hashed with a fast hashing algorithm like SHA-512.
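To put numbers on those targets, here is a small Python sketch. It assumes every character or word of the passphrase is picked uniformly at random, which is the only case where this simple formula holds; the guess rate is an arbitrary illustrative figure, not a measurement of any real cracking rig, and the function names are hypothetical.

```python
import math


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy of a passphrase of `length` symbols drawn uniformly at random."""
    return length * math.log2(alphabet_size)


def min_length(alphabet_size: int, target_bits: float) -> int:
    """Smallest number of random symbols that reaches the target entropy."""
    return math.ceil(target_bits / math.log2(alphabet_size))


def years_to_exhaust(bits: float, guesses_per_second: float) -> float:
    """Years needed to try every passphrase at an assumed, constant guess rate."""
    seconds = 2 ** bits / guesses_per_second
    return seconds / (365.25 * 24 * 3600)


for target in (80, 100):
    chars = min_length(94, target)     # 94 printable ASCII characters
    words = min_length(7776, target)   # a 7776-word Diceware-style list
    print(target, chars, words)

print(years_to_exhaust(80, 1e12))      # assumed rate: one trillion guesses per second
```

Under those assumptions, 80 bits takes 13 random printable characters or 7 random words, and 100 bits takes 16 characters or 8 words.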

So, if the private key is encrypted with AES-256, and the passphrase has sufficient entropy to withstand a sophisticated attack, and it is hashed with SHA-512, then I should have no concern posting my encrypted key into the open, right?

Without further ado, here is my encrypted GnuPG private key:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

- -----BEGIN PGP PRIVATE KEY BLOCK-----
Version: GnuPG v1

lQHpBEFLyzoRBACXCUta5CK+DCgnXn9wkqUumkcbenibGPBe3Y8IEY4BjkdbGdTN
tiGB+Tvo0hzn2qzy4mNPlOx/LWZWF2MdwF3WS77wwIskMb8W314zhE2RS0G318YY
X7zMGSF+7QiNXNsW/d0t1RonYOKIS96zKOtFQZrTr//V8+1rxEa4rvO5dwCgul0s
pt2BUDqwoy2Q/5UKgnmrzmsD/37/3g5zXykvTH2P6BlgTdfnVvpOLDT3CyWlAynz
u5hdmgYNT50I2w5TstY+uViYhAbMiyIT1HwBRcaQh8hUWkzDGyzJF7pS4pZeD0M9
u0P7Cejm2+ENdOX66ablWjP7GLJRcToGxnAZ6hgPpWLen8lHYaUK//g4JJx8UJ/n
wifeA/9xYWDi3ur/fFCKQZIPV9Ziw1oL58su948yWRn2WN7m74+bSldkXzkc4jRe
Q51FpGBHMswRIJKB6yG1FbfLum8ppGbvtz9NrMMZuirguTWetX8aJrjr0ddGjTsY
uZPfKoUiqDUXSFc3hmVgQQQ4MFdD3XYy6AQTyI1vstCS/Tdn7P4JAwqcPb8ogHkf
LmDb9egcmAOZfHczTs3TPB9Pg1SJqjvSz7nKDY87EVmeM46YBaCs1XScaOF4Gs+x
u0LNAxlfX3xOUIWRtCdBYXJvbiBUb3BvbmNlIDxhYXJvbi50b3BvbmNlQGdtYWls
LmNvbT6IWQQTEQIAGQUCQUvLOgQLBwMCAxUCAwMWAgECHgECF4AACgkQIu7gSICG
Bg/+cACeM0EeO7gE85/OSwMzxjvQAGB53jgAnik6qvFWyQtvp71KElbpZUsa0YNj
nQIoBEFLy0IQBwCjVGmY/PmOtRHtBIuANfg9zf8thGXZtFZWEgzHLGUgSfIjb0di
F24mwiVw2k3gzqBKuFBJ633F3AhwlTBnXS3tLWQgwSrm3BcCOOn+wJvwgUXa0iBn
gXhcq/7IL7HnKsYiG9EFMI8mAd10t7zdsA/dS2xUmFNexQvUdra/uRU+eeQbrXYe
iwfymw2RCZROl4/QXA9/a/aTyUhKgkj2vieo0jh394h7grPZxw0lgCclvTN/0jGq
dPkp56NMDb0eGlVzWeEiseD1JxXeYaSeToJP3zmx+nFoiBa+VVVeUhzYAwADBQb+
P373jEOwDu63py4FWdMPMYNWv1xFWLYI5hWaTKPSRwG/NZICRDF+QNztSmVOhW7Z
SFY/nTHq5yFp+QID63VMGv5Cunse9QXAoarecyV2hllwUq7l7wHujJhvvqyEgL9G
ah/drkZMGe8btYihz/M4g5i1P2DPr4CL/46eZxgjmjuVw7Nb0UsgUPgGizPCbnJ3
ye1ahxc1dOX80Guh0ZDRfR/ehZkk07wN2H6KRrxFCAmDaCR9KxwGYbpepND0t4HG
qCMji37lzYUrS4PV6yK0DGekqF98xgiIVBYFjF0jwQD+CQMKnD2/KIB5Hy5g4K9F
3JFIPxw+L+Gdc2MSYuJI3Y5kpZluUmYYYYFgOH64/J8egeYSSKWUIPqMhnwBWtbW
GQCNdztQyIIi6mB3YiDeK2AMnhRq+PwwwGG1iEYEGBECAAYFAkFLy0IACgkQIu7g
SICGBg+MhQCfZ7FNu4wMtdifkblGkN5Qqj+cYWYAoLiipdgnnhPTP2z7SgOsxiR4
YI4wnQPEBELI4BIBCADCobEk1f0sByuV4p2moEmZIXXEJhzTolO2mmBLBSmbjPMg
OBpAFTmWYXJxo8oZnNeeTOWN/hsHV9rmC/c/bZ7FGX4u/pN0l3qOSoSO6m6yvM/O
q4idyl17SfqDR9AUdEwJIA9zfSomzmzfO/q3rIAOPNkd71RQ4YMkHn8SfQLaojxl
+N5pPB29c6Cy+hltP6JRibAHgYCkxQGLK6ZLQ1LTwkDLNxCH8qfmHvg7M3qcXxl4
KOgtb8kNZ3PpAvek1GNL/eaF7nnd/u5uHuJM5V7czVHjpMbBfRGgLzR6Tu6v8qm7
nJzhSErxC7lK0hXLgGuHlk26bj9pzb+lfBk7GnkRAAYp/gkDCpw9vyiAeR8uYPUV
mINtxpejZeID6EaNZ8wTKNB1e3yt/As2Svkpcl1ivosbADhxCTFSJ4RtmEynFPCw
qgYbjuTBK+Fv2LSd7CYHJm4zZ/hfiCBgL4jI3ZY52qHFnBhgHtLwza/eLeaC0i4+
RWUAN9JEW1Ygpsh0ApRd1UvY8dGoLF24fy3C2jeh5Q3SVw2SnhUryLU7u6gTTLt4
yskT+/pzdNrb2n3dvLLUhQJ50SeTroa/swB/SXnjWtDSRGpvv3HAStHSENaZCIrV
e/u/WSNd99nklpYdPwh3UFO9OOayYErYiprEhLC7JjdVbqbu2aIfrw+nS/PS+tlm
wQTYH/DKA6wIeSZK4y9Byfrx3oq4DPk9pge7n3z3oz9pzd+xG2GMvExwKiCUlcF5
he5Peb4sTsJz0epMpwcBXFNYyVO++rZvBTABhgY5MQAcympoBAAcspo8AvBWx2xK
SaIyp5XDtT8jqey5Jjo3qmIQvdpo5oQub73JCMDSHi7KZ/IkgtggpgDqNh9bT5ZE
C7d3lprPFQJNQtFTO2K8NecRkgatn9Imomy1DidlnsCuZjqfamNVssWhm0uNu8SC
aJYBtC9kZXhBLAy0mFCMfJQ8ysDzlG3+Fu9wE9tCQvkh7y0ZKo9Ortqnwqp6S0I3
2xMBOg2EMHRP+ADF3q/i2wlgbH0MHsiX4oMxd0eIZV+rJIQSnO547UvvzAUBjXjZ
1l23wmg8FctYIE0UPsTo1QNArGO1sIOdv807UMolB9lq1pKAXkJP0GPRZLv9bLDj
kkKLEBELXCCQA0vDH+N/eGXRdJbWv5i0mOXmxK2JAZZw8rsJF3XZ22PABrC0NaCP
qHftcq20PPMCRZ/TfjQmwlot495KmLUJo2G6JasdcEilyPH2GVUwiqCM2/wOWk4U
xX+FmA40NYmU64hJBBgRAgAJBQJCyOASAhsCAAoJECLu4EiAhgYPt9kAn3r0Hhf9
aTFyowH3pgqIUiaMVQo5AKCFWzcU23YT0E2/LZNl6Yqzcs113Q==
=SHc5
- -----END PGP PRIVATE KEY BLOCK-----
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBCgAGBQJWToHpAAoJEM55Ebf8BAiPk7kH+weVf8kVJRjSaSWE+Aft76iA
Nzj1cVUfpWoT/K139i3TMiZ6PpAQtCRyEakdxfeSfXiOz83pqmKSL5ADCdlRoxuB
HtkoLW6thETOs70mDrrsEQgBZgMYPMsiKG1W/M3xppRGZxUM7/UEXhjHYiThe1Qd
Dkwot+hu5EttQpu0kKFmPrviPpJOk0gJ5SQrhlROWCS+aT9TyhbswMRpSyurDZ2H
LppGk8EtBeWTsTf9AhemX1GFu4iJPwIDfZtiWOLGGjQn4ROqb/RWqLG254O//Gw6
jtDRGHIGyYk+2NQ6/gAKWI9Sxaz5kUxqKSzDU9WuDCE3peIB9HXM+ynFVKLgsXE=
=Rnfg
-----END PGP SIGNATURE-----

Now that this is posted, I should expect everyone to steal my GnuPG identity, parading around as me, forging signatures, correct? No, it's not happening. I trust the crypto. I trust the math. I trust the software.

Why? Why am I doing this? Well, a recent discussion popped up on IRC about Keybase. Some don't like the fact that they are encouraging you to upload your encrypted private key to the server. Some claim that it is "insecure". However, didn't we just logically and reasonably conclude that AES and SHA-2 are protecting my best interests? So, if it's insecure, then it can't be due to the crypto. It must be something else. Let's look at all the risk factors:

Keybase servers are compromised and you use the web interface
If the server is compromised, then the attackers can modify the code, providing malicious JavaScript to the browser, so when you successfully decrypt your private key, it can be sent and stored elsewhere under their control. There is nothing you can do here. You are screwed. This is a very valid concern.

Keybase servers are compromised and you use the command line interface
The attackers only have access to your encrypted private key. Provided you never use the web interface, all encryption and decryption is handled out-of-band. This means that regardless of the Keybase server compromise, the attackers will never actually get to forge signatures using your GnuPG private key.

Your local client is compromised
There is nothing I can do for you here. Keybase isn't to blame, and not even the gpg(1) client can protect you. You have bigger problems on your hands than the attackers gaining access to your unencrypted GnuPG private key.

I think a lot of GnuPG users are paranoid about their key getting leaked, because they are unsure of exactly how it is stored. Hopefully this post lays those fears to rest. If there are still concerns about the leak of encrypted private keys, then it's probably due to a fear of not fully understanding the strength of passwords and their entropy requirements. However, if you meet the necessary password entropy requirements, and your key is encrypted with CAST5/SHA-1 or AES-256/SHA-512, there is nothing wrong with keeping a backup of your GnuPG key on Github, AWS, Dropbox, or other "cloud" hosting solutions.

Trust the math. Trust the software.

By the way, if you do successfully recover my private key passphrase (and I know you won't), I would be interested in hearing from you. Send me a signed email with my private key, and I'll make it financially worth your while. 🙂

Now Using miniLock
https://pthree.org/2015/11/10/now-using-minilock/
Wed, 11 Nov 2015 02:45:35 +0000

I have long been a proponent of OpenPGP keys as a way to communicate securely. I have used my personal key for signing emails since ~ 2005. I have used my key at dozens and dozens of keysigning parties. I have used my key to store account passwords and credentials with vim(1), Python, and so many other tools. I have used my key to encrypt files to myself. And so much more. There is only one problem.

No one is using my key to send me encrypted data.

Sure, when attending keysigning parties, it's an encryption orgy. Attendees will sign keys, then send encrypted copies to recipients. And, of course, people will send encrypted emails before and after the party, for various reasons. But, when the party dies down, and people get back to their regular lives, very few actually send encrypted data with OpenPGP keys. Realistically, it's rare. Let's be honest. Sure, there's the one-off, either as a corporation (XMission uses OpenPGP extensively internally for password storage) or individuals (sending encrypted tax forms to a spouse or accountant), but by and large, it's rarely used.

I have a good idea of why, and it's nothing groundbreaking: OpenPGP is hard. It's hard to create keys. It's hard to manage the keys. It's hard to grasp the necessary concepts of public keys, private keys, encryption, decryption, signatures, verification, the Web of Trust, user identities, key signing parties, revocation certificates, and so much more.

OpenPGP is just hard. Very hard.

Well, in an effort to encourage more people, such as my family and friends that would not use OpenPGP, to encrypt sensitive data, I've jumped on board with miniLock. What is miniLock? Currently, it's a Free Software browser application for the Google Chrome/Chromium browser (not an extension). It uses ECC, scrypt, BLAKE2, zxcvbn, and a number of other tools that you really don't need to worry about, unless you want to audit the project. All you need is an email and a password. The keys are deterministically generated based on that.

Think about this for a second. You don't need a public and private keyring to store your keys. You don't need to upload them to a key server. You don't need to attend keysigning parties, worry about the Web of Trust, or any of that other stuff that makes OpenPGP the nightmare it is.

All you need is an email and a password.

Unfortunately, this does have one big drawback: your email or password can't change without changing your keys. However, the miniLock keys are cheap, i.e., you can change them any time, or create as many as you want. You only need to distribute your miniLock ID. In fact, the miniLock ID is the entire public key. So, they don't even need to be long term. Generate a one-time session miniLock ID for some file that you need to send to your accountant during tax season, and call it good.
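To illustrate what "deterministically generated" means, here is a minimal Python sketch. It is not miniLock's code: the function name `derive_seed` is hypothetical, the hash and key-stretching calls from Python's `hashlib` are stand-ins, and the cost parameters are demo values chosen so the example runs quickly. It also stops at a 32-byte secret seed rather than computing an actual curve public key or miniLock ID.

```python
import hashlib


def derive_seed(email: str, passphrase: str) -> bytes:
    """Derive a 32-byte secret seed from an email and passphrase (illustrative only)."""
    # Hash the passphrase first so the key-stretching input has a fixed length.
    hashed = hashlib.blake2s(passphrase.encode("utf-8")).digest()
    # Stretch it with scrypt, using the email address as the salt.
    # n, r and p are assumed demo values, not miniLock's real parameters.
    return hashlib.scrypt(hashed, salt=email.encode("utf-8"),
                          n=2**14, r=8, p=1, dklen=32)


seed_a = derive_seed("alice@example.com", "correct horse battery staple")
seed_b = derive_seed("alice@example.com", "correct horse battery staple")
seed_c = derive_seed("alice@work.example", "correct horse battery staple")

print(seed_a == seed_b)  # compares two derivations from identical inputs
print(seed_a == seed_c)  # compares derivations that differ only in the email
```

Because nothing random goes into the derivation, there is no key file to store or back up, which is also why changing either input necessarily produces a different key.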

However, I prefer long-term keys, so as such, I created 3 IDs, one for each email account that I use. If you want to send me encrypted data, without the hassle of OpenPGP, feel free to use the correct miniLock ID for the paired email address.

Email miniLock ID
aaron.toponce@gmail.com mWdv6o7TxCEFq1uN6Q6xiWiBwMc7wzyzCfMa6tVoEPJ5S
atoponce@xmission.com qU7DJqG7UzEWYT316wGQHTo2abUZQk6PG8B6fMwZVC9MN
aaron.toponce@utah.edu 22vDEVchYhUbGY9Wi6EdhsS47EUeLKQAVEVat56HK8Riry

Don't misunderstand me. If you have an OpenPGP key, and would prefer to use that instead, by all means do so. However, if you don't want to set up OpenPGP, and deal with the necessary overhead, I can now decrypt data with miniLock. Maybe that will be a better alternative for you instead.

Do XKCD Passwords Work?
https://pthree.org/2015/09/15/do-xkcd-passwords-work/
Tue, 15 Sep 2015 12:22:33 +0000

You'll always see comments on web forums, social sites, blog posts, and emails about "XKCD passwords". This is of course referring to the XKCD comic by Randall Munroe describing what he thinks is the best password generator:

What no one has bothered asking, is if this actually works.

Lorrie Faith Cranor, director of the Carnegie Mellon Usable Privacy and Security Laboratory at Carnegie Mellon University, a member of the Electronic Frontier Foundation Board of Directors, and Professor in the School of Computer Science and the Engineering and Public Policy Department at Carnegie Mellon University, did ask this question. In fact, she studied it to the point that she gave a TED talk on the subject. The transcript of her talk can be found here. Here are the relevant bits (emphasis mine):

Now another approach to better passwords, perhaps, is to use pass phrases instead of passwords. So this was an xkcd cartoon from a couple of years ago, and the cartoonist suggests that we should all use pass phrases, and if you look at the second row of this cartoon, you can see the cartoonist is suggesting that the pass phrase "correct horse battery staple" would be a very strong pass phrase and something really easy to remember. He says, in fact, you've already remembered it. And so we decided to do a research study to find out whether this was true or not. In fact, everybody who I talk to, who I mention I'm doing password research, they point out this cartoon. "Oh, have you seen it? That xkcd. Correct horse battery staple." So we did the research study to see what would actually happen.

So in our study, we used Mechanical Turk again, and we had the computer pick the random words in the pass phrase. Now the reason we did this is that humans are not very good at picking random words. If we asked a human to do it, they would pick things that were not very random. So we tried a few different conditions. In one condition, the computer picked from a dictionary of the very common words in the English language, and so you'd get pass phrases like "try there three come." And we looked at that, and we said, "Well, that doesn't really seem very memorable." So then we tried picking words that came from specific parts of speech, so how about noun-verb-adjective-noun. That comes up with something that's sort of sentence-like. So you can get a pass phrase like "plan builds sure power" or "end determines red drug." And these seemed a little bit more memorable, and maybe people would like those a little bit better. We wanted to compare them with passwords, and so we had the computer pick random passwords, and these were nice and short, but as you can see, they don't really look very memorable. And then we decided to try something called a pronounceable password. So here the computer picks random syllables and puts them together so you have something sort of pronounceable, like "tufritvi" and "vadasabi." That one kind of rolls off your tongue. So these were random passwords that were generated by our computer.

So what we found in this study was that, surprisingly, pass phrases were not actually all that good. People were not really better at remembering the pass phrases than these random passwords, and because the pass phrases are longer, they took longer to type and people made more errors while typing them in. So it's not really a clear win for pass phrases. Sorry, all of you xkcd fans. On the other hand, we did find that pronounceable passwords worked surprisingly well, and so we actually are doing some more research to see if we can make that approach work even better. So one of the problems with some of the studies that we've done is that because they're all done using Mechanical Turk, these are not people's real passwords. They're the passwords that they created or the computer created for them for our study. And we wanted to know whether people would actually behave the same way with their real passwords.

So, in her research, XKCD passwords really didn't work out that well. They are longer, so they take longer to type, which increases the chance for error, and people are no better at remembering an XKCD passphrase than they are at remembering a short string of random characters.

To me, this is unsurprising. If you look at the history of my blogging on passwords, you'll find that I continually advocate true random events to build your passwords, maximizing entropy. In my last post, I even blogged two shell functions that you can use to build XKCD passwords, and "monkey passwords" (monkeys generating passwords by banging away at a keyboard). Both target 80-bits of entropy in the generation. Check out the lengths:

$ gen-monkey-pass 9
cxqwtw63taxdr3zn	uaq4tbt43japmm2q	mptwrxhhb486yfuv
-cb73b9-kgzhmww3	s45t3x6r9smw-7yr	hjkgzkha-qup4gh4
34c5rg4ksw-aprvk	uug-2vq7pfze6dnp	s4qx4eazbnrd2pqe

$ gen-xkcd-pass 9
sorestdanklyAlbanyluckyRamonaFowler   (sorest dankly Albany lucky Ramona Fowler)
towsscareslaudedrobinawardsrenal      (tows scares lauded robin awards renal)
thinkhazelsvealjuggedagingscareen     (think hazels veal jugged agings careen)
tarotpapawsNolanpacketAvonwiped       (tarot papaws Nolan packet Avon wiped)
surgesakimbohardercruelArjunablinds   (surges akimbo harder cruel Arjuna blinds)
amountlopsedgemeaslyCannoninseam      (amount lops edge measly Cannon inseam)
EssexIzmirwizesPattygroutszodiac      (Essex Izmir wizes Patty grouts zodiac)
hoursmailedslamsvowedallowspar        (hours mailed slams vowed allow spar)
AfghanNigelnutriadillmoldertrolly     (Afghan Nigel nutria dill molder trolly)

XKCD passwords average 32 characters to achieve 80-bits of entropy, compared to 16 characters that "monkey passwords" produce. And, according to the research done by Lorrie, people won't necessarily recall XKCD passwords any easier than "monkey passwords". So, if that's the case, then what's the point? Why bother? Why not just create "monkey passwords", and use a password manager?

Exactly. It's 2015. There are password managers for your browser, all versions of every desktop operating system, command-line based utilities for servers, and even apps for your smartphone. There are plenty of "cloud" synchronization services to make sure each instance is up-to-date. At this point, your passwords should:

  • Contain at least 80-bits of entropy.
  • Be truly randomly generated (no influence from you).
  • Be unique for each and every account.
  • Be protected with two-factor authentication, where available.
  • Be stored in a password manager, that is easily accessible.

You'll remember the ones you type in frequently, and you'll memorize them quickly. The others are stored for safe keeping, should you need to recall them.
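The length comparison above can be put in numbers. The following Python sketch is only a back-of-the-envelope calculation: the 32-symbol alphabet and the 19,198-word dictionary size are the figures quoted in the shell-function post, the 32-character average is the one quoted above, and the helper name `bits_per_char` is hypothetical.

```python
import math


def bits_per_char(total_bits: float, length: int) -> float:
    """Average entropy carried by each typed character."""
    return total_bits / length


# "Monkey" password: 16 characters drawn from a 32-symbol alphabet.
monkey_bits = 16 * math.log2(32)

# XKCD passphrase: 6 words drawn from a 19,198-word list,
# averaging about 32 characters once the words are joined.
xkcd_bits = 6 * math.log2(19198)

print(f"monkey: {monkey_bits:.1f} bits, {bits_per_char(monkey_bits, 16):.2f} bits/char")
print(f"xkcd:   {xkcd_bits:.1f} bits, {bits_per_char(xkcd_bits, 32):.2f} bits/char")
```

Both schemes clear the 80-bit target, but each character of the passphrase carries roughly half the entropy of a "monkey password" character, which is why it takes about twice as much typing.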

Password Generation in the Shell
https://pthree.org/2015/09/05/password-generation-in-the-shell/
Sat, 05 Sep 2015 13:44:49 +0000

No doubt, some people use password generators- not many, but some. Unfortunately, this means relying on 3rd party utilities, where the source code may not always be available. Personally, I would rather be in full control of the entire generation stack. I know how to make sure plenty of entropy is available in the generation, and I know which sources of entropy to draw on to maximize the entropy estimate. As such, I don't use tools like pwgen(1), apg(1), or anything else. I rely strictly on /dev/urandom, grep(1), and other tools guaranteed to be on every BSD and GNU/Linux operating system.

As such, the script below has been successfully tested in various shells on Debian GNU/Linux, PC-BSD, FreeBSD, OpenBSD, NetBSD, and SmartOS. If you encounter a shell or operating system this script does not work in, please let me know. Thanks to all those who helped me test it and offered suggestions for improvement.

So, with that said, here they are:

# No copyright. Released under the public domain.
# You really should have shuf(1) or shuffle(1) installed. Crazy fast.
shuff(){
    if [ $(command -v shuf) ]; then
        shuf -n "$1"
    elif [ $(command -v shuffle) ]; then
        shuffle -f /dev/stdin -p "$1"
    else
        awk 'BEGIN{
            "od -tu4 -N4 -A n /dev/urandom" | getline
            srand(0+$0)
        }
        {print rand()"\t"$0}' | sort -n | cut -f 2 | head -n "$1"
    fi
}
gen_monkey_pass(){
    I=0
    [ $(printf "$1" | grep -E '[0-9]+') ] && NUM="$1" || NUM="1"
    until [ "$I" -eq "$NUM" ]; do
        I=$((I+1))
        LC_CTYPE=C strings /dev/urandom | \
            grep -o '[a-hjkmnp-z2-9-]' | head -n 16 | paste -s -d \\0 /dev/stdin
    done | column
}
gen_xkcd_pass(){
    I=0
    [ $(printf "$1" | grep -E '[0-9]+') ] && NUM="$1" || NUM="1"
    [ $(uname) = "SunOS" ] && FILE="/usr/dict/words" || FILE="/usr/share/dict/words"
    DICT=$(LC_CTYPE=C grep -E '^[a-zA-Z]{3,6}$' "$FILE")
    until [ "$I" -eq "$NUM" ]; do
        I=$((I+1))
        WORDS=$(printf "$DICT" | shuff 6 | paste -s -d ' ' /dev/stdin)
        XKCD=$(printf "$WORDS" | sed 's/ //g')
        printf "$XKCD ($WORDS)" | awk '{x=$1;$1="";printf "%-36s %s\n", x, $0}'
    done | column
}

Nothing fancy about them. The first function, "shuff" is really just a helper function for systems that might not have shuf(1) or shuffle(1) installed. It's used only in the "gen_xkcd_pass" function. The next function, "gen_monkey_pass" acts like monkeys banging on the typewriter. It reads /dev/urandom directly, reading the printable characters that come out of it, counting them to 16, and putting them in an orderly set of columns for output as seen below. The input is a total set of 32-characters, giving each character exactly 5-bits of entropy. So, at 16 characters, each password comes with exactly 80-bits of entropy. The character set was chosen to stay entirely lowercase plus digits, and remain unambiguous, so it's clear, and easy to type, even though it may still be hard to remember. The function can take a numerical argument, for generating exactly that many passwords:

$ gen_monkey_pass 24
awdq2zwwfcdgzqpm	t54zqxus77zsu6j6	-2h6dkp93bjdb496
thm9m9nusqxuewny	qmsv2vqw-4-q4b4d	ttbhpnh4n7nue5g8
ytt6asky765avkpr	grwhsfmyz872zwk3	mzq-5ytdv8zawhy6
zb46qgnt62k74xwf	uydrsh2axaz5-ymx	6knh32qj4yk885ea
vky55q2ubgaucdnh	5dhk9t97pfja9phj	rhn2qg734p83wnxs
-q2hb833c-54z-9j	t33shcc55e3kqcd6	q6fwn3396h4ygvq4
232hr73rkymerpyg	u2pq-3ytcpc79nb9	7hqqwqujz4mxa-en
jj9vdj3jtpjhwcp6	mqc97ktz-78tb2bp	q7-6jug86kqhjfxn
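For readers who prefer to see the arithmetic spelled out, the same idea can be sketched in Python. This is not a port of the shell function: it swaps the `strings /dev/urandom` pipeline for Python's `secrets` module, and the names `ALPHABET`, `monkey_pass`, and `entropy_bits` are made up for the example.

```python
import math
import secrets

# Same 32 symbols as the grep character class [a-hjkmnp-z2-9-]:
# lowercase without i, l, o; digits without 0 and 1; plus the hyphen.
ALPHABET = "abcdefghjkmnpqrstuvwxyz23456789-"


def monkey_pass(length: int = 16) -> str:
    """Return `length` symbols chosen uniformly at random from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy of a uniformly random string: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


password = monkey_pass()
print(password, entropy_bits(len(password), len(ALPHABET)))
```

With 32 symbols, each character contributes log2(32) = 5 bits, so 16 characters give the 80 bits described above.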

The last function, "gen_xkcd_pass" comes from the "correct horse battery staple" comic from XKCD. On every Unix system, there is a dictionary file installed at /usr/share/dict/words or /usr/dict/words. On Debian GNU/Linux, it contains 99,171 words (OpenBSD contains 234,979!). However, many of them have the apostrophe as a valid character. Taking out any punctuation and digits, we are left with just lowercase and uppercase characters for our words. Further, the total word space is limited to at least 3 characters in length and at most 6 characters in length. This leaves us with 19,198 words, or about 14.229-bits of entropy per word. This means generating at least 6 words to achieve an 80-bit entropy minimum. For clarity, the password is space-separated to the right in parens, to make it more clear what exactly the password is, as shown below. Even if all 6 words have 6 characters (the password is 36 characters in total), the formatted line will never be longer than 80 characters in width, making it fit perfectly in an 80x24 terminal. It also takes a numerical argument, for generating exactly that many passwords:

$ gen_xkcd_pass 8
flyersepticspantearruinedwoo         (flyer septic span tear ruined woo)
boasgiltCurrywaivegalsAndean         (boas gilt Curry waive gals Andean)
selectpugjoggedlargeArabicbrood      (select pug jogged large Arabic brood)
titshubbubAswancartharmedtaxi        (tits hubbub Aswan cart harmed taxi)
Reaganmodestslowleessamefoster       (Reagan modest slow lees same foster)
tussleFresnoJensentheirsNohhollow    (tussle Fresno Jensen theirs Noh hollow)
Laredoriffplunkbarredhikersrearm     (Laredo riff plunk barred hikers rearm)
demostiffnukesvarlethakegilt         (demo stiff nukes varlet hake gilt)

Of course, as you can see, some fairly obscure words pop out as a result, such as "gilt" and "rearm". But then, you could think of it as expanding your vocabulary. If you install the "american-insane" dictionary, then you can get about 650,722 words in your total set, bringing your per-word entropy north of 16-bits. This would allow you to cut your number of generated words down to 5 instead of 6, to keep the 80-bits entropy minimum. But then, you also see far more obscure words than with the standard dictionary, and it will take a touch longer to randomize the file.
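The word-count reasoning above (6 words for the standard dictionary, 5 for the larger one) can be checked with a short Python sketch. It is an illustration rather than a port of the shell function: the helper names are hypothetical, the dictionary sizes are the ones quoted in this post, and the tiny word list exists only to exercise the code.

```python
import math
import re
import secrets

WORD_RE = re.compile(r"[a-zA-Z]{3,6}")


def filter_words(lines):
    """Keep only words of 3 to 6 ASCII letters, like the grep -E filter."""
    return [w for w in (line.strip() for line in lines) if WORD_RE.fullmatch(w)]


def words_needed(dict_size: int, target_bits: int = 80) -> int:
    """Smallest number of words whose combined entropy reaches target_bits."""
    return math.ceil(target_bits / math.log2(dict_size))


def xkcd_pass(words, count):
    """Pick `count` distinct words at random, as `shuf -n` does."""
    return secrets.SystemRandom().sample(words, count)


print(words_needed(19198))   # filtered Debian dictionary size quoted above
print(words_needed(650722))  # size quoted for the "american-insane" dictionary

# Hypothetical word list, just to exercise the functions.
demo = filter_words(["tea", "don't", "robin", "a", "zodiac", "Essex", "Fowler", "juggling"])
print("".join(xkcd_pass(demo, 3)))
```

Passing the real dictionary in is a matter of calling `filter_words(open(path))` with whichever path your system uses.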

This script should be platform agnostic. If not, let me know what isn't exactly working in your shell or operating system, and why, and I'll try to address it.

Setting Up A Global VPN Proxy on Android with L2TP/IPSec PSK
https://pthree.org/2015/09/04/setting-up-a-global-vpn-proxy-on-android-with-l2tpipsec-psk/
Fri, 04 Sep 2015 12:00:26 +0000

In my last post in this short series, I want to discuss how to set up a transparent proxy on your Android phone using the builtin VPN for L2TP. As usual, the same precautions apply here. Don't be stupid with your data, just because you can hide it from your ISP.

In general, I'm skeptical of VPN service providers, which is partially why I'm writing this post. There isn't a VPN provider on this planet that will go to jail for you. And I don't buy into the hype that they aren't logging your traffic. Too often, VPN providers have been all too hasty to turn over user account information and logs, when Big Brother comes knocking. Instead, install strongSwan on your own L2TP VPN server, in a datacenter you trust to handle your traffic, and configure your Android to use that.

Unlike the previous posts, this one does not require root access. To start, you need to navigate to "Settings -> More -> VPN":

vpn-0 vpn-1

Tap the "+" sign to add a new VPN configuration. In this example, we'll configure it to connect to an L2TP/IPSec PSK VPN. As such, you'll need to fill out the server address (pixelated here), and the IPSec pre-shared key. Give the configuration a name, such as "My VPN", and tap "SAVE".

vpn-2 vpn-3 vpn-4

When tapping on the "My VPN" defined configuration, you will be asked to authenticate with your credentials. These can be from the operating system accounting database, LDAP, NIS, or IPSec specific. Provide your username and password, and tap "Save account information" if you want to save the credentials to disk on the phone. Then tap "CONNECT". At this point, you should see a little key in the status bar, confirming that you are indeed connected to the VPN server. If you want, you can create a "VPN" quick-access widget on your home screen, so you can get immediate access to your "My VPN" configuration with a single tap.

vpn-5 vpn-6 vpn-7

Setting Up A Global Tor Proxy on Android with Orbot
https://pthree.org/2015/08/27/setting-up-a-global-tor-proxy-on-android-with-orbot/
Thu, 27 Aug 2015 12:00:25 +0000

In my last post, I explained how to set up a Global SSH proxy on Android with ConnectBot and ProxyDroid. In this article, I'll do the same thing, but with Orbot. Also, as with the last article, the same precautions apply here. If you're on an untrusted or unknown network, using an encrypted proxy can be helpful. However, just because you're using Tor, doesn't mean you should trust its network blindly either. There are all sorts of practical attacks on Tor that have been reaching the press lately, and you would be wise to read them, and proceed with caution.

With that said, sometimes all you want to do is get around a content filter, such as viewing Reddit at church, or getting on Twitter while at work. Of course, there are necessary risks with those actions as well. Basically, don't be an idiot.

With that out of the way, this requires that you have root access on your phone, and that you have installed the Orbot Android app. Once the app is installed, we really only need to make one adjustment, and that is enabling two check boxes: "Transparent Proxying" and "Tor Everything":

orbot-2

As something you should keep in mind, you may also want to check "Use Bridges". Relay bridges are entry nodes that are not listed in the main Tor directory. As such, it is more difficult for ISPs to filter them. If you suspect that your ISP is blocking all known entry nodes, then using bridges can be helpful to get around the problem. But, using bridges may be unnecessary. Check if your Tor connection is getting filtered first. If so, enable the use of bridges, otherwise, you're just fine using Tor without them.

Also, Orbot has some interesting settings, such as specifically setting a whitelist of entry and exit nodes, and a black list of nodes to avoid. If you know someone is operating a Tor node, and you trust them, then I would recommend setting them as either an entry or exit, whichever is appropriate. The reason for this, is it is not impractical for a well-funded organization to have a large number of entry and exit nodes. If so, they can build traffic profiles on who is connecting to the entry node, and which site they are visiting from the exit. However, by specifying specific nodes for either entry or exit (or both), you eliminate this threat. Sadly enough, I could not get this working with Orbot.

One last setting that has caught my eye, is "Tor Tethering". If you use your phone as a wireless hotspot, or USB tethering, you can also transparently route all the traffic from those connected clients through the Tor proxy. I haven't tested this yet with the latest version, but with previous versions of Orbot, it didn't work.

Other settings are listed below, page after page.

orbot-1 orbot-3 orbot-4 orbot-5 orbot-6

When at the main page of the app, long-tap the power button in the center of the droid, to connect to the Tor network. When the arms of the droid are down, you are not connected. When the arms are yellow, and pointing to the sides of the phone, the app is trying to get a connection to the Tor network. When the arms are green, pointing up, you are fully connected, and can start enjoying your proxy.

orbot-0 orbot-7 orbot-8

Notice that when you are connected, an onion icon is in the status bar at the top of the phone, showing as a permanent notification. If you have "Expanded Notifications" set, you can get IP address and country information in the notification. If you swipe the droid right or left, the droid will spin, and you will end up with a new "Tor Identity". Basically, you'll be connected to a new set of nodes.

orbot-9 orbot-10 orbot-11

Tapping the "CHECK BROWSER" button at the bottom left of the landing screen will use your default browser app to connect to https://check.torproject.org and verify whether or not transparent proxying over Tor is working.

Setting Up A Global SSH Proxy on Android with ConnectBot and ProxyDroid
https://pthree.org/2015/08/26/setting-up-a-global-ssh-proxy-on-android-with-connectbot-and-proxydroid/
Wed, 26 Aug 2015 12:00:31 +0000

I'm one that takes precautions with my data when on unfamiliar or untrusted networks. While for the most part, I trust TLS to handle my data securely, I find that it doesn't take much effort to set up a transparent proxy on my Android handset, to route all packets through an encrypted proxy.

In this case, I happen to work for the greatest ISP in the world, and so I have an SSH server in the datacenter. I wholly trust the network from my SSH server to the border routers, so the more traffic I can send that direction, the better. I realize that may not be the case for all of you. However, if you have an externally available SSH server on a trusted network, this post may be of interest.

First, setting up this proxy requires having root. I'm not going to cover how to get root in this post. You can find it elsewhere. Next, you'll need two apps installed; namely ConnectBot and ProxyDroid. Both are Free Software apps. Also, you can do this with SSH Tunnel on its own, if you have Android 4.2.2 or older. Unfortunately, it doesn't work for 4.3 and newer. I have Android 5.1, and it isn't setting up the firewall rules correctly.

Once they are installed, you'll want to set them up. Here I walk through setting up ConnectBot.

  1. Pull up ConnectBot from your app drawer, and setup a new connection by typing in the username, host, and optionally port.
  2. When asked if you want to accept the server's public SSH key, verify the key, then tap "YES"
  3. Enter in your password to connect, and verify that you can successfully connect to the remote SSH server.
  4. Now, disconnect, sending you back to the app's landing screen.

connectbot-1 connectbot-2 connectbot-3

  1. At this point, long-tap the SSH profile you just created, and tap "Edit port forwards".
  2. Tap the menu in the upper-right hand corner of the profile, and tap "Add port forward".
  3. Give the forward a nickname, such as "ProxyDroid".
  4. Tap "Dynamic (SOCKS)" from the list under "Type".
  5. Provide any source port. It must be above 1024, and cannot be currently in use. I find "1984" apropos.
  6. Leave the "Destination" blank, and tap "CREATE PORT FORWARD".

connectbot-4 connectbot-5 connectbot-6 connectbot-7

You have now successfully created a SOCKS listening port on localhost:1984. Now, we need to create software firewall rules in the phone, to globally forward all packets through localhost on port 1984, creating our transparent proxy. As such, pull up ProxyDroid, and I'll walk you through setting that up:

  1. In ProxyDroid, set "127.0.0.1" as the "Host".
  2. Match the port with what you set in ConnectBot's port forward ("1984" in our example).
  3. Set the "Proxy Type" to "SOCKS5"
  4. Scroll to the bottom of the app, and check the checkbox for "Global Proxy".
  5. OPTIONAL: Check the checkbox for "DNS Proxy".

That last step will also tunnel DNS requests through the proxy. Unfortunately, I have found it to be buggy and unstable. So, unfortunately, leaving it unchecked is what gives you a stable encrypted SSH proxy experience.
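To make concrete what a SOCKS5 listener such as the one on localhost:1984 expects from a client, here is a small Python sketch that builds the first two client messages defined by the SOCKS5 specification (RFC 1928). It only constructs the bytes and does not open a connection. It describes the protocol in general, not the internals of ConnectBot or ProxyDroid, and the function names are hypothetical.

```python
import struct


def socks5_greeting() -> bytes:
    """Client greeting: version 5, one method offered, 0x00 = no authentication."""
    return b"\x05\x01\x00"


def socks5_connect_request(host: str, port: int) -> bytes:
    """Build a SOCKS5 CONNECT request that names the target by domain."""
    name = host.encode("idna")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name),
    # then a length-prefixed host name and a big-endian 16-bit port.
    return b"\x05\x01\x00\x03" + bytes([len(name)]) + name + struct.pack(">H", port)


request = socks5_connect_request("example.org", 443)
print(socks5_greeting().hex(), request.hex())
```

When the target is sent as a domain name like this, the name is resolved at the far end of the tunnel, which is the general idea behind proxying DNS as well as the traffic itself.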

 

Now that both are configured, connect to your remote SSH server with ConnectBot that you have configured, then enable the proxy by tapping the slider next to "Proxy Switch". You should have a running global SSH proxy from your smartphone to the remote SSH server, where all packets are being sent. You can visit a site that returns your external IP address, such as http://findmyipaddress.com/, to verify that the source IP address of the HTTP request is the same IP address as your SSH server. If so, your packets are being tunneled through your SSH connection.

[Screenshot: proxy-running]

]]>
https://pthree.org/2015/08/26/setting-up-a-global-ssh-proxy-on-android-with-connectbot-and-proxydroid/feed/ 0
md5crypt() Explained https://pthree.org/2015/08/07/md5crypt-explained/ https://pthree.org/2015/08/07/md5crypt-explained/#comments Fri, 07 Aug 2015 13:25:20 +0000 https://pthree.org/?p=4215 Recently, the Password Hashing Competition announced its winner, namely Argon2, as the future of password hashing. It's long since been agreed that using general-purpose cryptographic hashing algorithms for passwords is not a best practice. This is due to their speed. Cryptographic hashing algorithms are designed to be lightning fast, while also maintaining large margins of security. However, Poul-Henning Kamp noticed in the early 1990s that the DES-based crypt() function was no longer providing the necessary margins of security for hashing passwords. He noticed how fast crypt() had become, and that greatly bothered him. Even worse was the realization that FPGAs could mount practical attacks against crypt() in a reasonable amount of time. As he was the FreeBSD release engineer, this meant putting something together that was intentionally slow, but also with safe security margins. He chose MD5 as the basis for his new "md5crypt password scrambler", as he called it.

Before delving into the algorithm, the first thing you'll notice is the strange number of steps and mixing that PHK does with his md5crypt() algorithm. When I was reading the algorithm, the first question that popped into my mind was: "Why not just do standard key-stretching with the password?" Something like this (pseudocode):

digest = md5(password + salt).digest()
rounds = 1000
while rounds > 0:
  digest = md5(password + salt + digest).digest()
  rounds -= 1

This certainly seems to be the most straightforward approach, and the entirety of the security is based on the cryptographic security of MD5. If you were concerned about the output digest being recognizable, it might make sense to scramble it. You could scramble the remaining bytes in a deterministic fashion, which PHK actually ends up doing before saving to disk.
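
For reference, the pseudocode above can be made runnable with Python's hashlib. This is only a sketch of the naive key-stretching idea, not md5crypt() itself; the function name "stretch" and its parameters are my own, chosen for illustration.

```python
from hashlib import md5

def stretch(password: bytes, salt: bytes, rounds: int = 1000) -> bytes:
    """Naive key-stretching: re-hash password + salt + previous digest."""
    digest = md5(password + salt).digest()
    for _ in range(rounds):
        digest = md5(password + salt + digest).digest()
    return digest

# Hex form of the stretched 16-byte digest
print(stretch(b"toomanysecrets", b"2Z4e3j5f").hex())
```

Every extra round costs an attacker one more MD5 computation per guess, which is the whole point of the loop.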

But then it hit me: PHK wanted his new algorithm to be intentionally slow, even if using MD5. This means adding additional steps to mixing the password, which requires more CPU, and thus, more time. If raw MD5 could process 1,000,000 hashes per second, then standard key-stretching of 1,000 iterations would bring it down to 1,000 hashes per second. However, if the additional operations slow each iteration down by a factor of N, then the resulting throughput would be 1,000/N hashes per second. I can see it now: anything to slow down the process, without overburdening the server, is a gain. As such, the md5crypt() function was born.

Here is the algorithm, including what I think may be a bug:

  1. Set some constants:
    "pw" = user-supplied password.
    "pwlen" = length of "pw".
    "salt" = system-generated random salt, 8-characters, from [./0-9A-Za-z].
    "magic" = the string "$1$".
    "itoa64" = our custom base64 alphabet "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    
  2. Initialize digest "a", and add the password, magic, and salt strings to it:
    da = MD5.init()
    da.update(pw)
    da.update(magic)
    da.update(salt)
    
  3. Initialize digest "b", and add the password, salt, and password strings to it:
    db = MD5.init()
    db.update(pw)
    db.update(salt)
    db.update(pw)
    final = db.digest()
    
  4. Update digest "a" by repeating digest "b", providing "pwlen" bytes (using a separate counter "pl", so "pwlen" is still intact for step 6):
    for(pl = pwlen; pl > 0; pl -= 16):
      if(pl > 16):
        da.update(final)
      else:
        da.update(final[0:pl])
    
  5. Clear virtual memory
    memset(final, 0, length(final))
    
  6. Update digest "a" by adding a character at a time from either digest "final" or from "pw" based on each bit from "pwlen":
    for(i = pwlen; i; i >>= 1):
      if i % 2 == 1:
        da.update(final[0])
      else:
        da.update(pw[0])
    dc = da.digest()
    
  7. Iterate 1,000 times to prevent brute force password cracking from going too fast. Mix the MD5 digest while iterating:
    for(i=0; i<1000; i++):
      tmp = MD5.init()
      if i % 2 == 0:
        tmp.update(dc)
      else:
        tmp.update(pw)
      if i % 3 != 0:
        tmp.update(salt)
      if i % 7 != 0:
        tmp.update(pw)
      if i % 2 == 0:
        tmp.update(pw)
      else:
        tmp.update(dc)
      dc = tmp.digest()
    
  8. Convert 3 8-bit words of digest "c" into 4 6-bit words:
    final = ''
    for a, b, c in ((0, 6, 12), (1, 7, 13), (2, 8, 14), (3, 9, 15), (4, 10, 5)):
      v = ord(dc[a]) << 16 | ord(dc[b]) << 8 | ord(dc[c])
      for i in range(4):
        final += itoa64[v & 0x3f]
        v >>= 6
    v = ord(dc[11])
    for i in range(2):
      final += itoa64[v & 0x3f]
      v >>= 6
    
  9. Clear virtual memory:
    memset(dc, 0, length(dc))
    

Notice that between steps 5 and 6, the virtual memory is cleared, leaving the digest "final" as NULLs. Yet, in step 6, the for-loop attempts to address the first byte of digest "final". It seems clear that PHK introduced a bug in this algorithm, that was never fixed. As such, every implementation must add a C NULL in step 6, instead of final[0]. Otherwise, you will end up with a different output than the original source code by PHK.
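
To make the effect of that bug concrete, here is a small sketch of exactly which bytes step 6 feeds into digest "a". The helper name "step6_bytes" is mine and is not part of PHK's source; it simply walks the bits of the password length, emitting a NUL for every 1 bit and the first password byte for every 0 bit.

```python
def step6_bytes(pw: bytes) -> bytes:
    """Bytes that step 6 appends to digest "a", one per bit of len(pw)."""
    out = bytearray()
    i = len(pw)
    while i:
        # 1 bit: final[0], which is always NUL after the memset in step 5.
        # 0 bit: the first byte of the password.
        out += b"\x00" if i & 1 else pw[:1]
        i >>= 1
    return bytes(out)

# len("toomanysecrets") is 14, or 0b1110, read from the low bit upward
print(step6_bytes(b"toomanysecrets"))  # b't\x00\x00\x00'
```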

Anyway, that's the algorithm behind md5crypt(). Here's a simple Python implementation that creates valid md5crypt() hashes:

from hashlib import md5

# $ mkpasswd --method='md5' --salt='2Z4e3j5f' --rounds=1000 --stdin 'toomanysecrets'
# $1$2Z4e3j5f$sKZptx/P5xzhQZ821BRFX1

# hashlib operates on bytes under Python 3, so the inputs are byte strings
pw = b"toomanysecrets"
salt = b"2Z4e3j5f"

magic = b"$1$"
pwlen = len(pw)
itoa64 = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

# Start digest "a"
da = md5(pw + magic + salt)

# Create digest "b"
db = md5(pw + salt + pw).digest()

# Update digest "a" by repeating digest "b", providing "pwlen" bytes:
i = pwlen
while i > 0:
    da.update(db if i > 16 else db[:i])
    i -= 16

# Update digest "a" by adding either a NULL or the first char from "pw"
i = pwlen
while i:
    da.update(b"\x00" if i & 1 else pw[:1])
    i >>= 1
dc = da.digest()

# iterate 1000 times to slow down brute force cracking
for i in range(1000):
    tmp = md5(pw if i & 1 else dc)
    if i % 3: tmp.update(salt)
    if i % 7: tmp.update(pw)
    tmp.update(dc if i & 1 else pw)
    dc = tmp.digest()

# convert 3 8-bit words to 4 6-bit words
final = ''
for x, y, z in ((0, 6, 12), (1, 7, 13), (2, 8, 14), (3, 9, 15), (4, 10, 5)):
    # indexing a bytes object already yields integers
    v = dc[x] << 16 | dc[y] << 8 | dc[z]
    for i in range(4):
        final += itoa64[v & 0x3f]
        v >>= 6
v = dc[11]
for i in range(2):
    final += itoa64[v & 0x3f]
    v >>= 6

# output the result ("magic" already ends with "$")
print("{0}{1}${2}".format(magic.decode(), salt.decode(), final))
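
The final conversion in that script can be isolated into a small helper, which makes the odd character ordering easier to see. This is just a sketch; the name "to64" is borrowed from the usual C naming, and the signature here is my own.

```python
ITOA64 = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def to64(value: int, length: int) -> str:
    """Emit `length` characters, 6 bits at a time, lowest 6 bits first."""
    chars = []
    for _ in range(length):
        chars.append(ITOA64[value & 0x3f])
        value >>= 6
    return "".join(chars)

print(to64(64, 2))  # "./" : the low 6 bits are 0 ('.'), the next 6 bits are 1 ('/')
```

Because the low bits come out first, this is not the same as standard base64, even ignoring the different alphabet.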

Ulrich Drepper created "sha256crypt()" and "sha512crypt()" functions, which are very similar in design, and which I'll blog about later.

It's important to note that while PHK may have declared md5crypt() insecure, it's not for the reasons you might think. Yes, MD5 is broken, horribly, horribly broken. However, these breaks only deal with the compression function and blind collision attacks. No practical preimage or second-preimage attack against MD5 is known. In the case of a stored md5crypt() hash, recovering the plaintext that produced the hash requires either a brute force search or a preimage attack. The reason md5crypt() has been deemed "insecure" is that MD5 is fast, fast, fast. Password hashing should instead be slow, slow, slow, and no amount of creativity with MD5 can adequately address its performance. As such, you should migrate to a password hashing solution designed specifically to slow attackers, such as bcrypt or scrypt, with appropriate parameters for security margins.
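
As a rough illustration of what that migration could look like, here is a minimal sketch using hashlib.scrypt from the Python standard library (available when Python is built against a recent OpenSSL). The helper names and the cost parameters (n=2**14, r=8, p=1, 32-byte output) are assumptions picked for the example, not a recommendation; tune them for your own hardware.

```python
import hashlib
import hmac
import os

def hash_password(password: bytes, salt: bytes = None):
    """Return (salt, key) for a password, generating a random salt if needed."""
    if salt is None:
        salt = os.urandom(16)
    # Illustrative cost parameters only; they need roughly 16 MiB of memory
    key = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)
    return salt, key

def verify_password(password: bytes, salt: bytes, expected: bytes) -> bool:
    """Recompute the key and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected)

salt, key = hash_password(b"toomanysecrets")
print(verify_password(b"toomanysecrets", salt, key))
```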

]]>
https://pthree.org/2015/08/07/md5crypt-explained/feed/ 1
The Chaocipher With Playing Cards https://pthree.org/2015/07/09/the-chaocipher-with-playing-cards/ https://pthree.org/2015/07/09/the-chaocipher-with-playing-cards/#comments Thu, 09 Jul 2015 15:26:21 +0000 https://pthree.org/?p=4143 As you know, I am a cryptography hobbyist. More specifically, I have an interest in pencil and paper ciphers, also referred to as "hand ciphers" or "field ciphers". Since Bruce Schneier released his Solitaire Cipher for Neal Stephenson's book "Cryptonomicon" (known in the book as "Pontifex"), I have had a real desire to learn hand ciphers with playing cards, that I'll refer to as "card ciphers".

Further, in 1918, John F. Byrne invented a mechanical encryption system that he called "Chaocipher". He released an autobiography titled "The Silent Years", in which he describes the system without the algorithm, and releases a series of exhibits of ciphertexts for cryptography experts to break.

Unfortunately, I think because he didn't release the algorithm, no one took his encryption system seriously, despite his best efforts to get the War Department to use it. It wasn't until 2010, when the widow of John F. Byrne's son released the Chaocipher papers, mechanics, and artifacts to the National Cryptologic Museum in Maryland, that we finally fully understood the algorithm.

In this post, I am going to describe the algorithm using playing cards, whereas John F. Byrne's original invention required two circular rotating disks. Another hindering aspect to Byrne's invention was the mechanical engineering required. At best, the device is clunky and awkward to use. However, using playing cards, I think you'll find the system much more elegant and easier to carry around, with the added advantage that a deck of cards doesn't incriminate you as carrying a cryptographic device. Playing cards were certainly available in the early 1900s, so it's unfortunate that he didn't think of using playing cards as the primary mechanism for the cipher.

Setup

The Chaocipher uses the concept of lookup tables to encrypt or decrypt a message. This is done by maintaining two separate alphabets. When encrypting a message, a character is located in the plaintext alphabet, and its location is noted. Then, the ciphertext character is identified by locating the character in the ciphertext alphabet at the same position. After the ciphertext character has been recorded, both the plaintext and ciphertext alphabets are permuted. We'll get into those details in a moment, but first, let's set aside some definitions.

Because John F. Byrne's original invention required two circular disks, there are a few definitions that you should be aware of:

Zenith
The top of the wheel or circle. In our case, this will be the top of the pile.

Nadir
The bottom of the wheel or circle. In our case, this will be in the middle of the pile (the 14th of 26 cards).

Deck
All 52 cards in a standard poker deck.
Pile
Either the red playing cards (Diamonds and Hearts) dedicated to the ciphertext alphabet, or the black playing cards (Clubs and Spades) dedicated to the plaintext alphabet. Each pile is exactly 26 cards.

Left alphabet
The red pile of playing cards containing the ciphertext characters A-Z.
Right alphabet
The black pile of playing cards containing the plaintext characters A-Z.

We will be treating our two piles (the red and black piles) as circular. The piles will always be face-up on the table and in our hands. The top card in the face-up pile will be the 1st card while the bottom card will be the 26th card. Because the pile is circular in nature, this means that the top and bottom cards in the pile are "next" to each other in succession. This means further, then, that the 14th card in the pile is our nadir, while the top card, or the 1st card in the pile, is our zenith.

Now that we've set that aside, we need to create some definitions so we know exactly which playing card in every suit is assigned to which English alphabet character. I've assigned them as follows:

        Hearts and Spades                      | Clubs and Diamonds
  card: A  2  3  4  5  6  7  8  9  10 J  Q  K  | A  2  3  4  5  6  7  8  9  10 J  Q  K
letter: A  B  C  D  E  F  G  H  I  J  K  L  M  | N  O  P  Q  R  S  T  U  V  W  X  Y  Z

This means that the English character "X" would be the Jack of Clubs in the plaintext "black" pile ("right alphabet" in Chaocipher-speak), and the Jack of Diamonds in the ciphertext "red" pile ("left alphabet" in Chaocipher-speak). This also means that the 8 of Spades would be the English character "H", just as much as the 8 of Hearts.

On a side note, if you wish to program this in software, and you populate an array with the values of 1 through 52 to represent each card in the deck, it's standard to use 1-13 for the Clubs, 14-26 for the Diamonds, 27-39 for the Hearts, and 40-52 for the Spades ("bridge order").
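
As a sketch of that numbering, the helpers below convert a card number from 1 through 52 in bridge order into a card name, a pile color, and the letter from the table above. The function names are mine and only serve the illustration.

```python
RANKS = "A23456789TJQK"
SUITS = "CDHS"  # bridge order: Clubs, Diamonds, Hearts, Spades

def card_name(n: int) -> str:
    """Card number 1-52 in bridge order -> name such as 'JC'."""
    suit, rank = divmod(n - 1, 13)
    return RANKS[rank] + SUITS[suit]

def card_pile(n: int) -> str:
    """Diamonds and Hearts form the red pile, Clubs and Spades the black pile."""
    return "red" if SUITS[(n - 1) // 13] in "DH" else "black"

def card_letter(n: int) -> str:
    """Hearts and Spades carry A-M, Clubs and Diamonds carry N-Z."""
    suit, rank = divmod(n - 1, 13)
    offset = 0 if SUITS[suit] in "HS" else 13
    return chr(ord("A") + offset + rank)

for n in (11, 24, 34, 47):
    print(n, card_name(n), card_pile(n), card_letter(n))
```

Cards 11 and 24 are the Jack of Clubs and the Jack of Diamonds, both "X", while 34 and 47 are the 8 of Hearts and the 8 of Spades, both "H".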

Algorithm

The algorithm can consist of a standard "simple" set of steps, or it can use a more advanced "takeoff pattern" for enciphering and deciphering the message. First, let me discuss the simple pattern, then I'll look at the more advanced takeoff pattern.

An overview of the algorithm could be described as follows:

  1. Determine the ciphertext character according to the plaintext character (and vice versa for decryption).
  2. Permute the red pile.
  3. Permute the black pile.

These three steps are executed in order for each character until the message is exhausted.

Enciphering Plaintext

Looking at it in closer detail, suppose I had the following red and black piles (using "+" to identify the zenith, and "*" to identify the nadir):

            +                                      *
  red (ct): 7D 3H TH 2H JD AD 8H 8D 5H TD QH 9H JH 2D 6D KH QD 9D 5D KD AH 7H 6H 4H 4D 3D
black (pt): TC 3S JS 2C 5S AC 4C KC 9S TS 9C 6S 7S 8S QS QC 7C JC 4S 3C 8C AS 2S 5C KS 6C

If I wanted to encrypt the character "A", I would look it up in the black pile; according to our table above, that is the Ace of Spades. As such, I need to find the Ace of Spades in my black pile. While locating the card, I need to be counting, so I know the position of the Ace of Spades in the pile. In this case, the Ace of Spades is the 22nd card in the black pile. The ciphertext card sits at the same position in the red pile, and the 22nd card in the red pile is the Seven of Hearts:

            +                                      *                       ↓
  red (ct): 7D 3H TH 2H JD AD 8H 8D 5H TD QH 9H JH 2D 6D KH QD 9D 5D KD AH 7H 6H 4H 4D 3D
black (pt): TC 3S JS 2C 5S AC 4C KC 9S TS 9C 6S 7S 8S QS QC 7C JC 4S 3C 8C AS 2S 5C KS 6C

The Seven of Hearts produces the English character "G". Thus, with these two piles, "A" encrypts to "G" before permutation. Conversely, "G" would decrypt to "A" with these starting piles.
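
The lookup above is easy to check in a few lines of Python. This is a sketch using the same two-character card names as the diagrams; the helper names "letter_of" and "black_card" are my own.

```python
RANKS = "A23456789TJQK"

def letter_of(card: str) -> str:
    """Letter carried by a card: Hearts/Spades are A-M, Clubs/Diamonds are N-Z."""
    rank, suit = card[0], card[1]
    return chr(ord("A") + RANKS.index(rank) + (0 if suit in "HS" else 13))

def black_card(ch: str) -> str:
    """Black (plaintext) card carrying the letter `ch`."""
    idx = ord(ch) - ord("A")
    return RANKS[idx % 13] + ("S" if idx < 13 else "C")

red = "7D 3H TH 2H JD AD 8H 8D 5H TD QH 9H JH 2D 6D KH QD 9D 5D KD AH 7H 6H 4H 4D 3D".split()
black = "TC 3S JS 2C 5S AC 4C KC 9S TS 9C 6S 7S 8S QS QC 7C JC 4S 3C 8C AS 2S 5C KS 6C".split()

pos = black.index(black_card("A"))   # 0-based index; add 1 for the card count
ciphertext = letter_of(red[pos])
print(pos + 1, red[pos], ciphertext)  # 22 7H G
```

Decryption is the same lookup in the other direction: find the card in the red pile, and read the black card at that position.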

Permuting the Red and Black Piles

Now that we've discovered our plaintext and ciphertext characters, we need to cut each pile, such that the plaintext and ciphertext cards are at the zenith of their respective piles. The resulting piles would then be as follows:

            +                                      *
  red (ct): 7H 6H 4H 4D 3D 7D 3H TH 2H JD AD 8H 8D 5H TD QH 9H JH 2D 6D KH QD 9D 5D KD AH
black (pt): AS 2S 5C KS 6C TC 3S JS 2C 5S AC 4C KC 9S TS 9C 6S 7S 8S QS QC 7C JC 4S 3C 8C

Permuting the Red Pile

Permuting the red pile uses the following steps:

  1. Remove the zenith + 1 card (2nd card) from the red pile.
  2. Place the removed card into the nadir of the red pile (will be the 14th card).

So, we'll follow these steps by taking the zenith + 1 card (2nd card), which is the "6H", and placing it at the nadir of the red pile (14th card). The resulting red pile will look as follows:

            +                                      *
  red (ct): 7H .. 4H 4D 3D 7D 3H TH 2H JD AD 8H 8D 5H TD QH 9H JH 2D 6D KH QD 9D 5D KD AH

            +                                      *
  red (ct): 7H 4H 4D 3D 7D 3H TH 2H JD AD 8H 8D 5H .. TD QH 9H JH 2D 6D KH QD 9D 5D KD AH

            +                                      *
  red (ct): 7H 4H 4D 3D 7D 3H TH 2H JD AD 8H 8D 5H 6H TD QH 9H JH 2D 6D KH QD 9D 5D KD AH
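
In code, the cut and the red permutation are just list slicing. The sketch below reproduces the red pile from this example; "cut_to_zenith" and "permute_red" are names I made up for the illustration.

```python
def cut_to_zenith(pile, card):
    """Cut the pile so that `card` becomes the top (zenith) card."""
    i = pile.index(card)
    return pile[i:] + pile[:i]

def permute_red(pile):
    """Move the 2nd card to the 14th position; cards 3-14 slide up one place."""
    return pile[:1] + pile[2:14] + pile[1:2] + pile[14:]

start_red = "7D 3H TH 2H JD AD 8H 8D 5H TD QH 9H JH 2D 6D KH QD 9D 5D KD AH 7H 6H 4H 4D 3D".split()
cut_red = cut_to_zenith(start_red, "7H")
new_red = permute_red(cut_red)
print(" ".join(new_red))
```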

Permuting the Black Pile

Permuting the black pile uses the following steps:

  1. Take the zenith (top) card and place it at the bottom of the black pile.
  2. Remove the zenith + 2 card (3rd card) from the black pile.
  3. Place the removed card into the nadir of the black pile (will be the 14th card).

So, we'll follow these steps by taking the zenith card (top card), which is the "AS", and placing it at the bottom of the black pile. The resulting black pile will look as follows:

            +                                      *
black (pt): 2S 5C KS 6C TC 3S JS 2C 5S AC 4C KC 9S TS 9C 6S 7S 8S QS QC 7C JC 4S 3C 8C AS

Now take the zenith + 2 (3rd card), which is the "KS" and place it at the nadir of the black pile (14th card). The final black pile will look as follows:

            +                                      *
black (pt): 2S 5C .. 6C TC 3S JS 2C 5S AC 4C KC 9S TS 9C 6S 7S 8S QS QC 7C JC 4S 3C 8C AS

            +                                      *
black (pt): 2S 5C 6C TC 3S JS 2C 5S AC 4C KC 9S TS .. 9C 6S 7S 8S QS QC 7C JC 4S 3C 8C AS

            +                                      *
black (pt): 2S 5C 6C TC 3S JS 2C 5S AC 4C KC 9S TS KS 9C 6S 7S 8S QS QC 7C JC 4S 3C 8C AS
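
The black permutation can be sketched the same way. The function name "permute_black" is mine, and the starting pile is the already-cut black pile from this example, with the Ace of Spades at the zenith.

```python
def permute_black(pile):
    """Top card to the bottom, then move the 3rd card to the 14th position."""
    pile = pile[1:] + pile[:1]                       # zenith card goes to the bottom
    return pile[:2] + pile[3:14] + pile[2:3] + pile[14:]

cut_black = "AS 2S 5C KS 6C TC 3S JS 2C 5S AC 4C KC 9S TS 9C 6S 7S 8S QS QC 7C JC 4S 3C 8C".split()
new_black = permute_black(cut_black)
print(" ".join(new_black))
```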

As such, both the red and black piles should look like the following after enciphering the plaintext character "A" and permuting both piles:

            +                                      *
  red (ct): 7H 4H 4D 3D 7D 3H TH 2H JD AD 8H 8D 5H 6H TD QH 9H JH 2D 6D KH QD 9D 5D KD AH
black (pt): 2S 5C 6C TC 3S JS 2C 5S AC 4C KC 9S TS KS 9C 6S 7S 8S QS QC 7C JC 4S 3C 8C AS

To summarize, the algorithm steps are as follows:

  1. Find the plaintext character in the black pile.
  2. Record the position of this card in the black pile.
  3. Find the ciphertext character in the red pile by counting to that position.
  4. Bring the plaintext character to the zenith by cutting the black pile at that position.
  5. Bring the ciphertext character to the zenith by cutting the red pile at that position.
  6. Permute the red pile:
    1. Remove the zenith + 1 card from the red pile (2nd card).
    2. Insert the removed card into the nadir of the red pile (14th location).
  7. Permute the black pile:
    1. Move the card at the zenith to the bottom of the black pile.
    2. Remove the zenith + 2 card from the black pile (3rd card).
    3. Insert the removed card into the nadir of the black pile (14th location).

Make sure you understand these steps before continuing.
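
If it helps, here is the whole summary as a short Python sketch that works on letters instead of cards. LEFT and RIGHT are the example red and black piles from above, converted to letters with the table from the setup section; the function names are my own, and this is an illustration rather than a reference implementation.

```python
LEFT = "TCJBXNHUEWLIKOSMYVRZAGFDQP"    # red pile (ciphertext alphabet)
RIGHT = "WCKOENQZIJVFGHLYTXDPUABRMS"   # black pile (plaintext alphabet)

def step(left, right, pos):
    """Cut both piles at `pos` (0-based), then permute each pile."""
    left = left[pos:] + left[:pos]
    right = right[pos:] + right[:pos]
    left = left[0] + left[2:14] + left[1] + left[14:]       # 2nd card to nadir
    right = right[1:] + right[0]                            # zenith to bottom
    right = right[:2] + right[3:14] + right[2] + right[14:]  # 3rd card to nadir
    return left, right

def chaocipher(text, left, right, decrypt=False):
    out = []
    for ch in text:
        if decrypt:
            pos = left.index(ch)
            out.append(right[pos])
        else:
            pos = right.index(ch)
            out.append(left[pos])
        left, right = step(left, right, pos)
    return "".join(out)

ct = chaocipher("ATTACKATDAWN", LEFT, RIGHT)
print(ct, chaocipher(ct, LEFT, RIGHT, decrypt=True))
```

Note that decryption permutes the piles exactly as encryption does; only the direction of the lookup changes.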

Permuting with a Takeoff Pattern

John F. Byrne described a "takeoff pattern" in which the left and right alphabets are used for both the plaintext and ciphertext characters. In the simple method, the right alphabet (black pile) is used exclusively for all plaintext characters in the message. So, if the plaintext message was "ATTACKATDAWN", then you could think of using the right pile 12 times, or "RRRRRRRRRRRR" ("BBBBBBBBBBBB" if we're thinking "black pile").

However, suppose you would like to use both of the red and black piles (left and right alphabets respectively) for your plaintext message. Then you could create a "takeoff pattern" for encrypting your text. Suppose you used the following takeoff pattern: "RLRRLLRRRLLL" (right, left, right, right, left, left, right, right, right, left, left, left). This means that you would use the right alphabet for the first plaintext character, then the left alphabet for the second plaintext character, the right alphabet for the 3rd, the right alphabet for the 4th, etc. Or, if using playing cards, you could think of the same takeoff pattern as "BRBBRRBBBRRR" (black, red, black, black, red, red, black, black, black, red, red, red).

Personally, I don't care for the takeoff pattern for two main reasons: first, the takeoff pattern needs to be communicated with the key. This may not be a problem if code books are distributed among field agents, as the takeoff pattern can be printed on the same page as the key. However, this does mean that the takeoff pattern needs to be as long as the key itself.

The second reason I don't care for the takeoff pattern is its unnecessary complexity, which greatly increases the chance of making a mistake. Already, the sender and recipient will be going back and forth frequently between the red and black pile of cards. By creating a takeoff pattern, this makes that back and forth more frequent. Further, if you are using the 3rd of 5 "L"s in a stream, but you think you are on the 4th "L", then the encryption or decryption will be wrong from there out. Chaocipher doesn't have the ability to correct itself from a mistake.

For these two reasons, I suggest that when using playing cards with the Chaocipher, you always use the black pile for the plaintext characters, and the red pile for the ciphertext characters. Then, the only thing that you need to keep track of is the characters in the message itself.

Keying the Deck

Before executing the Chaocipher algorithm, the deck should be "keyed". This refers to the order of the deck. Both the sender and the recipient must have the same deck order in order to successfully encrypt and decrypt a message. The deck can be keyed by either a sufficient set of shuffling and cutting, or keyed with a key phrase. First, let's look at thoroughly shuffling and cutting a full 52-card deck.

Keying with Shuffling and Cutting

Suppose after thoroughly shuffling and cutting the deck, the deck order face-up is as follows:

|<- top                                                                                                                                          bottom ->|
3H 8D QH 4C 6S QS 8C 4D 9S 5D 8S QC 3C 6H JS 7H 5S TS QD 7C 4H JC KD TH 3S KS 6D 9C 9D 2C JD 2H 2D 6C 8H KC 9H JH 7S KH AS AH 5C AD TC 7D 4S 3D 2S TD 5H AC

We now need a deterministic algorithm for separating the red cards from the black cards. Holding the deck face-up in your hand, deal out two face-down piles, the left pile of red cards, and the right pile of black cards. Do this card-for-card, one-at-a-time. Do not grab a bunch of similarly-colored cards. This can introduce error into the keying process. Doing it one-at-a-time ensures exactness, and minimizes the chances for mistake.

After the full deck has been dealt into two face-down piles, turn the piles over, so they are face-up. Using the standard Chaocipher tokens of "+" to identify the zenith, or top of the pile, and the "*" to identify the nadir, or 14th card in the pile, your two piles should be in the following order:

            +                                      *
  red (ct): 3H 8D QH 4D 5D 6H 7H QD 4H KD TH 6D 9D JD 2H 2D 8H 9H JH KH AH AD 7D 3D TD 5H
black (pt): 4C 6S QS 8C 9S 8S QC 3C JS 5S TS 7C JC 3S KS 9C 2C 6C KC 7S AS 5C TC 4S 2S AC
            -----------------------------------------------------------------------------
  position:  1  2  3  4  5  6  7  8  9  1  1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  2
                                        0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5  6

Verify that you can do this by hand, and that it matches with the deck order above. Remember, the red pile is our "left alphabet" in the Chaocipher which contains all ciphertext English characters. The black pile is our "right alphabet" in the Chaocipher which contains all plaintext English characters. In other words, if we converted them to English characters, then the left and right alphabets would be as follows, using the same notation to identify the zenith and nadir:

            +                         *
 left (ct): C U L Q R F G Y D Z J S V X B O H I K M A N T P W E
right (pt): Q F L U I H Y P K E J T X C M V O S Z G A R W D B N
            ---------------------------------------------------
  position: 1 2 3 4 5 6 7 8 9 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2
                              0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6
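
If you'd like to verify the dealing by machine rather than by hand, the following sketch splits the shuffled deck above into the two piles and converts them to letters. The helper name "letter_of" is my own.

```python
RANKS = "A23456789TJQK"

def letter_of(card: str) -> str:
    """Hearts/Spades carry A-M, Clubs/Diamonds carry N-Z."""
    rank, suit = card[0], card[1]
    return chr(ord("A") + RANKS.index(rank) + (0 if suit in "HS" else 13))

deck = ("3H 8D QH 4C 6S QS 8C 4D 9S 5D 8S QC 3C 6H JS 7H 5S TS QD 7C 4H JC KD TH 3S KS "
        "6D 9C 9D 2C JD 2H 2D 6C 8H KC 9H JH 7S KH AS AH 5C AD TC 7D 4S 3D 2S TD 5H AC").split()

# Dealing one card at a time preserves the relative order within each color
red = [c for c in deck if c[1] in "DH"]
black = [c for c in deck if c[1] in "CS"]

left = "".join(letter_of(c) for c in red)
right = "".join(letter_of(c) for c in black)
print(left)
print(right)
```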

Keying with a Key Phrase

Already knowing the algorithm prepares you for using a key phrase to key the deck. Basically, you'll just use the characters in your key phrase as the plaintext message, using the black pile to find each key phrase character, just as you would if encrypting a message. Both piles will be permuted, as normal. The only difference is that you will not be recording the ciphertext characters. Further, you will start with alphabetized piles.

Both piles will start with the following order:

            +                                      *
 left (ct): AH 2H 3H 4H 5H 6H 7H 8H 9H TH JH QH KH AD 2D 3D 4D 5D 6D 7D 8D 9D TD JD QD KD
right (pt): AS 2S 3S 4S 5S 6S 7S 8S 9S TS JS QS KS AC 2C 3C 4C 5C 6C 7C 8C 9C TC JC QC KC

Suppose our key phrase is "CHAOCIPHER". Then, working through the key phrase character by character, the steps proceed as follows:

Locate "C" in the black pile:
            +                                      *
 left (ct): AH 2H 3H 4H 5H 6H 7H 8H 9H TH JH QH KH AD 2D 3D 4D 5D 6D 7D 8D 9D TD JD QD KD
right (pt): AS 2S 3S 4S 5S 6S 7S 8S 9S TS JS QS KS AC 2C 3C 4C 5C 6C 7C 8C 9C TC JC QC KC

Bring both characters to zenith:
            +                                      *
 left (ct): 3H 4H 5H 6H 7H 8H 9H TH JH QH KH AD 2D 3D 4D 5D 6D 7D 8D 9D TD JD QD KD AH 2H
right (pt): 3S 4S 5S 6S 7S 8S 9S TS JS QS KS AC 2C 3C 4C 5C 6C 7C 8C 9C TC JC QC KC AS 2S

Permute the red pile. Remove the zenith + 1 card:
            +                                      *
 left (ct): 3H .. 5H 6H 7H 8H 9H TH JH QH KH AD 2D 3D 4D 5D 6D 7D 8D 9D TD JD QD KD AH 2H
right (pt): 3S 4S 5S 6S 7S 8S 9S TS JS QS KS AC 2C 3C 4C 5C 6C 7C 8C 9C TC JC QC KC AS 2S

            +                                      *
 left (ct): 3H 5H 6H 7H 8H 9H TH JH QH KH AD 2D 3D .. 4D 5D 6D 7D 8D 9D TD JD QD KD AH 2H
right (pt): 3S 4S 5S 6S 7S 8S 9S TS JS QS KS AC 2C 3C 4C 5C 6C 7C 8C 9C TC JC QC KC AS 2S

Insert the card into the nadir:
            +                                      *
 left (ct): 3H 5H 6H 7H 8H 9H TH JH QH KH AD 2D 3D 4H 4D 5D 6D 7D 8D 9D TD JD QD KD AH 2H
right (pt): 3S 4S 5S 6S 7S 8S 9S TS JS QS KS AC 2C 3C 4C 5C 6C 7C 8C 9C TC JC QC KC AS 2S

Permute the black pile. Move the top card to the bottom:
            +                                      *
 left (ct): 3H 5H 6H 7H 8H 9H TH JH QH KH AD 2D 3D 4H 4D 5D 6D 7D 8D 9D TD JD QD KD AH 2H
right (pt): 4S 5S 6S 7S 8S 9S TS JS QS KS AC 2C 3C 4C 5C 6C 7C 8C 9C TC JC QC KC AS 2S 3S

Remove the zenith + 2 card:
            +                                      *
 left (ct): 3H 5H 6H 7H 8H 9H TH JH QH KH AD 2D 3D 4H 4D 5D 6D 7D 8D 9D TD JD QD KD AH 2H
right (pt): 4S 5S .. 7S 8S 9S TS JS QS KS AC 2C 3C 4C 5C 6C 7C 8C 9C TC JC QC KC AS 2S 3S

            +                                      *
 left (ct): 3H 5H 6H 7H 8H 9H TH JH QH KH AD 2D 3D 4H 4D 5D 6D 7D 8D 9D TD JD QD KD AH 2H
right (pt): 4S 5S 7S 8S 9S TS JS QS KS AC 2C 3C 4C .. 5C 6C 7C 8C 9C TC JC QC KC AS 2S 3S

Insert the card into the nadir:
            +                                      *
 left (ct): 3H 5H 6H 7H 8H 9H TH JH QH KH AD 2D 3D 4H 4D 5D 6D 7D 8D 9D TD JD QD KD AH 2H
right (pt): 4S 5S 7S 8S 9S TS JS QS KS AC 2C 3C 4C 6S 5C 6C 7C 8C 9C TC JC QC KC AS 2S 3S

Repeat for "H", "A", "O", "C", "I", "P", "H", "E", & "R".

When you are finished keying the deck with the key phrase "CHAOCIPHER", you should have the following order for the red and black piles:

            +                                      *
 left (ct): 6D JH 7D 8D 9D 2D KH JD QD 2H 3H 6H 7H 8H 9H TH QH AD KD AH TD 3D 4H 5H 4D 5D
right (pt): 4S AS 3S JS 5S 7S 9S QS AC 2C 3C 4C 6S 2S 6C 8S 7C 8C KS TS 9C TC JC QC KC 5C
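
Keying by hand is tedious, so here is a sketch that keys the alphabetized piles with a key phrase, using the same cut and permutation steps as the walkthrough. The function names are mine; the key phrase is assumed to contain only the uppercase letters A-Z.

```python
RANKS = "A23456789TJQK"

def start_piles():
    """Alphabetized piles: Hearts then Diamonds, Spades then Clubs."""
    red = [r + "H" for r in RANKS] + [r + "D" for r in RANKS]
    black = [r + "S" for r in RANKS] + [r + "C" for r in RANKS]
    return red, black

def black_card(ch):
    """Black (plaintext) card carrying the letter `ch`."""
    idx = ord(ch) - ord("A")
    return RANKS[idx % 13] + ("S" if idx < 13 else "C")

def key_step(red, black, ch):
    pos = black.index(black_card(ch))
    red = red[pos:] + red[:pos]                               # cut both piles
    black = black[pos:] + black[:pos]
    red = red[:1] + red[2:14] + red[1:2] + red[14:]           # permute red
    black = black[1:] + black[:1]                             # permute black
    black = black[:2] + black[3:14] + black[2:3] + black[14:]
    return red, black

def key_deck(phrase):
    red, black = start_piles()
    for ch in phrase:
        red, black = key_step(red, black, ch)   # ciphertext output is ignored
    return red, black

red, black = key_deck("CHAOCIPHER")
print(" ".join(red))
print(" ".join(black))
```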

Enhancements

Initialization Vectors

One thing that we have learned with modern computer encryption primitives is to prepend initialization vectors to the ciphertext. The initialization vector must be random and unpredictable. However, its function is to create a unique state on the system before the plaintext is encrypted or before the ciphertext is decrypted. The point is to modify a secret state (our key, or pile orders) while ignoring the output. By adding an initialization vector to the system, we limit the effectiveness of attacks on the ciphertext. For example, if the initialization vector is 26 characters long (one character for each character in the English alphabet, or 26! total combinations), then an attacker would need 25 collisions on one initialization vector to launch an attack on the state (the last element can be determined by process of elimination).

Unfortunately, a 26-character initialization vector is not very practical to use by hand. Knowing that it is standard for field agents to break up their messages into blocks of five characters, it would seem reasonable to use a 5-character initialization vector. However, this doesn't seem to mix the state well enough.

For example, consider using an unkeyed deck to encrypt the text "AARON" 10 times with different initialization vectors at each round:

BSTPR VUYVS
LOKJY WJXTR
YFTLN WJLHQ
UAOZP UTVIV
YXTXU VILUH
WGQCJ UTLUE
LYPYZ WHYSS
QHESJ VHKEQ
CLZRN WVMRE
FEKEQ VUKDR

The first five characters in each ciphertext are the initialization vector, randomly generated. The second block of 5 characters is my name encrypted after the initialization vector keyed the deck. Notice that the first character in the second block seems to have a lot of "V"s and "W"s. If I do 100 rounds, and count the frequency of the first character, I get the following:

F:1, L:1, G:3, K:3, T:4, I:7, J:7, U:7, H:8, W:14, V:45

That is not a good distribution of characters for the first plaintext character being an "A" over 100 different initialization vectors. I would expect it to be much more diffuse. So, how about instead of a 5-character initialization vector, we bump it to 10? How does the frequency distribution look then?

A:1, I:1, U:1, V:1, X:1, Z:1, T:2, E:3, G:3, N:5, O:5, Q:5, B:6, C:6, F:7, D:10, S:11, R:13, P:18

That's a little bit better. A 26-character initialization vector would certainly show a flatter frequency distribution for the first ciphertext character in the message. However, as mentioned, that's cumbersome. So, at this point, it's up to you. Using a 5-character initialization vector would provide about 10 or 11 possible first ciphertext characters. Using a 10-character initialization vector increases that to about 18 with a flatter distribution.
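
You can repeat this experiment yourself with a short sketch like the one below. It assumes the initialization vector is absorbed exactly like a key phrase (looked up in the black pile, with the output ignored) on an unkeyed, alphabetized deck; the function names are my own. The counts will differ from run to run, since the initialization vectors are random.

```python
import secrets
import string
from collections import Counter

ALPHABET = string.ascii_uppercase

def step(left, right, pos):
    """Cut both alphabets at `pos`, then permute each one."""
    left = left[pos:] + left[:pos]
    right = right[pos:] + right[:pos]
    left = left[0] + left[2:14] + left[1] + left[14:]
    right = right[1:] + right[0]
    right = right[:2] + right[3:14] + right[2] + right[14:]
    return left, right

def first_ciphertext_char(iv, plaintext_char="A"):
    left = right = ALPHABET                      # unkeyed piles
    for ch in iv:                                # absorb the IV, ignore its output
        left, right = step(left, right, right.index(ch))
    return left[right.index(plaintext_char)]

def distribution(iv_len, trials=100):
    counts = Counter()
    for _ in range(trials):
        iv = "".join(secrets.choice(ALPHABET) for _ in range(iv_len))
        counts[first_ciphertext_char(iv)] += 1
    return counts

for n in (5, 10, 26):
    print(n, sorted(distribution(n).items(), key=lambda kv: kv[1]))
```

As a sanity check, the first initialization vector in the list above, "BSTPR", yields "V" as the first ciphertext character with this sketch, which matches "VUYVS".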

PKCS#7 Padding

As mentioned, it has become a field cipher standard to separate your ciphertext into blocks of 5 characters. This means that if the length of your message is not a multiple of 5 characters, you add padding at the end until it is. However, when the recipient decrypts the message, it should be unambiguous exactly what is padding, and what is not. The padding in PKCS#7 makes this possible.

We can define this easily enough by determining exactly how many characters must be added to pad the message into multiples of 5 characters. So, we'll count:

  • If the message needs only one character appended, append a single "V".
  • If the message needs two characters appended, append "WW".
  • If the message needs three characters appended, append "XXX".
  • If the message needs four characters appended, append "YYYY".
  • If the message is already a multiple of five characters, append "ZZZZZ".

By using the padding described above, after decrypting the message, the recipient needs to only look at the last character to determine exactly how many characters make up the padding, and to strip from the plaintext.
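
The padding rules above fit in a few lines of Python. This is a sketch with names of my own choosing; it assumes the message has already had its spaces removed, and it does not validate that the padding is well formed.

```python
PAD_CHARS = "VWXYZ"   # V = 1 character of padding, ..., Z = 5 characters

def pad(message: str) -> str:
    """Pad to a multiple of 5; an already-aligned message gets a full block."""
    n = 5 - len(message) % 5          # always between 1 and 5
    return message + PAD_CHARS[n - 1] * n

def unpad(padded: str) -> str:
    """The last character says how many characters to strip."""
    n = PAD_CHARS.index(padded[-1]) + 1
    return padded[:-n]

print(pad("ATTACKATDAWN"))            # ATTACKATDAWNXXX
print(unpad("ATTACKATDAWNXXX"))       # ATTACKATDAWN
```

Padding an already-aligned message with "ZZZZZ" is what keeps the scheme unambiguous: the last character is always padding, even if the real message happens to end in "V" through "Z".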

To illustrate this, let's take an unkeyed deck, add a 5-character initialization vector, and encrypt the message "ATTACK AT DAWN". This message is only 12 characters, so I would need to add "XXX" at the end of the message according to the definition above. Thus my message becomes (removing spaces) "ATTACKATDAWNXXX". Adding the 5-character initialization vector "KEQPN" then encrypting, I get the following result:

plaintext: ATTACK AT DAWN
initialization vector: KEQPN
padding: XXX

ciphertext: KEQPN XLHTT PRUCA FHUEC

Of course, decrypting "KEQPN XLHTT PRUCA FHUEC" and removing the initialization vector "KEQPN" will reveal "ATTACKATDAWNXXX". It's clear to the recipient that "XXX" is padding, and can be stripped without affecting the plaintext.

Conclusion

This has been a lengthy post, and I commend you for reading this far. The Chaocipher is an interesting algorithm, and I'll be studying its properties as time moves forward. I think the Chaocipher fits well as a playing card cipher, and gets as close to "bare metal" as you can without designing an actual mechanical mechanism with two rotating disks and removable character tiles. Playing cards are easy to carry around with you in your pocket, so the portability is nice.

Further, we can increase the strength of the algorithm, as mentioned, by adding an initialization vector at the start of the message, and by adding padding, we can stick with the standard of 5-character blocks in our ciphertext. Of course, this means adding 6-10 additional characters, but for a 160-character message, this doesn't seem too cumbersome.

There are some things that I have observed while using playing cards for the cipher hardware. First, encrypting and decrypting are slow. It takes me about a minute to encrypt or decrypt two to three characters. So, for a 160-character message, it could take the better part of an hour to work through.

Second, due to its slow speed, you may be tempted to speed things up a bit, so you can work through the message more quickly. However, this drastically opens you up to mistakes. I was encrypting the plaintext "JELLY LIKE ABOVE THE HIGHWIRE SIX QUACKING PACHYDERMS KEPT THE CLIMAX OF THE EXTRAVAGANZA IN A DAZZLING STATE OF FLUX" over and over. After about 10 ciphertexts, I wrote a Python script to automate the process for me. Only 1 of the ciphertexts was 100% accurate, character-for-character. And, unfortunately, 1 ciphertext was 0% accurate, with every character in the message incorrect. On the other 8 messages, I maintained accuracy for about 2/3 of the characters on most of them; on a few, I made a mistake earlier on. Regardless, the point is, I was making frequent mistakes, despite my best effort not to. Only 1 out of 10 ciphertexts would decrypt cleanly. It might be worth having two decks, one for producing the ciphertext character, and one for double-checking your work. Of course, this slows you down further, but could be doable for minimizing mistakes.
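To put a number on how far a hand-worked ciphertext stays accurate, one option is to compare it against a reference ciphertext and find where the two first diverge. The sketch below assumes you have both strings on hand; `first_divergence` and `accurate_fraction` are hypothetical helper names, and the sample strings are made up.

```python
def first_divergence(reference: str, attempt: str) -> int:
    """Index of the first differing character, ignoring block spaces.

    Returns the length of the shorter string if no character differs.
    """
    ref = reference.replace(" ", "")
    att = attempt.replace(" ", "")
    for index, (expected, got) in enumerate(zip(ref, att)):
        if expected != got:
            return index
    return min(len(ref), len(att))


def accurate_fraction(reference: str, attempt: str) -> float:
    """Fraction of the reference that was correct before the first slip."""
    ref = reference.replace(" ", "")
    return first_divergence(reference, attempt) / len(ref)


# Made-up strings: the hand-worked attempt goes wrong at the seventh character.
print(first_divergence("ABCDE FGHIJ", "ABCDE FXHIJ"))   # 6
print(accurate_fraction("ABCDE FGHIJ", "ABCDE FXHIJ"))  # 0.6
```

Measuring up to the first slip, rather than counting matching characters overall, seems the fairer metric here, since both decks are rearranged after every character and a single mistake carries forward into the rest of the message.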

However, the Chaocipher with playing cards is a fun cipher to work, and easy once you get the hang of it. I would recommend using plastic playing cards, such as the ones from Kem, Copag, or Bicycle Prestige. This way, the cards don't get gummed up like paper cards, are washable, last longer due to their extra durability, and overall, are just a better-feeling playing card.

If you work the Chaocipher with playing cards, let me know what you think.

The Kidekin TRNG Hardware Random Number Generator
https://pthree.org/2015/06/20/the-kidekin-trng-hardware-random-number-generator/
Sat, 20 Jun 2015 17:11:33 +0000

Yesterday, I received my Kidekin TRNG hardware random number generator. I was eager to purchase this, because on the Tindie website, the first 2 people to purchase the RNG would get $50 off, making the device $30 total. I quickly ordered one. Hilariously enough, I received a letter from the supplier that I was their first customer! Hah!

Image of the Kidekin Digital TRNG

Upon opening the package, I noticed the size of the TRNG. It's roughly 10.5 cm from end-to-end which makes it somewhat awkward for a device sitting in your USB port on your laptop. It would work fine sitting in the back of a desktop or server, out of the way, but on my Thinkpad T61, it's a bit large to be sitting there 24/7 feeding my kernel CSPRNG.

Plugging the device in, the kernel actually sees two USB devices, not just one, and sets them up as /dev/ttyUSB0 and /dev/ttyUSB1. Curious. Downloading the software ZIP file from their webpage, and looking through it, the following UDEV rules are provided:


$ cat /etc/udev/rules.d/98-kidekin.rules 
#SYMLINK+= method works on more systems, if it does not on your system, please switch to the NAME= method.

#disable the unused port.
#SUBSYSTEM=="tty", ATTRS{interface}=="kidekin_trng", ATTRS{bInterfaceNumber}=="00", NAME="kidekin_dont_use", MODE="0000", ENV{ID_MM_DEVICE_IGNORE}="1", ENV{ID_MM_CANDIDATE}="0"
SUBSYSTEM=="tty", ATTRS{interface}=="kidekin_trng", ATTRS{bInterfaceNumber}=="00", SYMLINK+="kidekin_dont_use", MODE="0000", ENV{ID_MM_DEVICE_IGNORE}="1", ENV{ID_MM_CANDIDATE}="0"

#connect kidekin TRNG to /dev/random
#SUBSYSTEM=="tty", ATTRS{interface}=="kidekin_trng", ATTRS{bInterfaceNumber}=="01", NAME="kidekin_trng", MODE="0777", RUN+="/bin/stty raw -echo -crtscts -F /dev/kidekin_trng speed 3000000", ENV{ID_MM_DEVICE_IGNORE}="1", ENV{ID_MM_CANDIDATE}="0"
SUBSYSTEM=="tty", ATTRS{interface}=="kidekin_trng", ATTRS{bInterfaceNumber}=="01", SYMLINK+="kidekin_trng", MODE="0777", RUN+="/bin/stty raw -echo -crtscts -F /dev/kidekin_trng speed 3000000", ENV{ID_MM_DEVICE_IGNORE}="1", ENV{ID_MM_CANDIDATE}="0"
SUBSYSTEM=="tty", ATTRS{interface}=="kidekin_trng", ATTRS{bInterfaceNumber}=="01", RUN+="/etc/init.d/rng-tools restart"

This is a bit presumptuous, and a bit overdone IMO, so I simplified it and set up the following:

SUBSYSTEM=="tty", ATTRS{interface}=="kidekin_trng", ATTRS{bInterfaceNumber}=="01", SYMLINK+="kidekin", MODE="0777", RUN+="/bin/stty raw -echo -crtscts -F /dev/kidekin speed 3000000", ENV{ID_MM_DEVICE_IGNORE}="1", ENV{ID_MM_CANDIDATE}="0"

This avoids setting up a "do not use" symlink for the unnecessary USB device, and changes the symlink of the usable USB device to /dev/kidekin. This also doesn't restart rngd(8), as I'll administer that on my own. At this point, I am ready for testing.

First and foremost, I wanted to test its throughput:

$ dd if=/dev/kidekin count=1G | pv -a > /dev/null
[ 282KiB/s]

The device held stable at 282 KBps, or roughly 2.2 Mbps. That works out to about 75.2 Kbps per dollar for my $30 purchase. Not bad.
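The unit conversion behind those figures is easy to get wrong, so here is the arithmetic spelled out as a small Python sketch. It assumes pv(1) is reporting binary kilobytes (1 KiB = 1024 bytes); the function name `throughput` is just for illustration.

```python
def throughput(rate_kib_per_s: float, price_usd: float) -> dict:
    """Convert a pv(1) rate in KiB/s into bit rates and a per-dollar figure."""
    bits_per_s = rate_kib_per_s * 1024 * 8
    return {
        "bits_per_s": bits_per_s,
        "mibit_per_s": bits_per_s / 2**20,   # binary megabits per second
        "mbit_per_s": bits_per_s / 10**6,    # decimal megabits per second
        "kibit_per_s_per_dollar": rate_kib_per_s * 8 / price_usd,
    }


result = throughput(282, 30)
print(result["mibit_per_s"])             # 2.203125
print(result["kibit_per_s_per_dollar"])  # 75.2
```

Note that the per-dollar figure is in kilobits, not kilobytes; in bytes, it is 282 / 30, or about 9.4 KBps per dollar.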

The Kidekin is based on astable free running oscillators, or multivibrators. Unfortunately, a security proof does not accompany the device. So, while this may hold up to the suite of randomness tests, the output may not be cryptographically secure, and could also potentially be backdoored, as verifying the hardware is not easily doable. So, let's see if it at least holds up to the randomness tests. I created a 256 MB file, and ran the standard suites of tests:

$ dd if=/dev/kidekin of=entropy.kidekin bs=1M count=256 iflag=fullblock
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 928.326 s, 289 kB/s

At this point, I can start my testing. First, let's quantify the amount of entropy per byte, as well as some basic tests with ent(1):

$ ent entropy.kidekin
Entropy = 7.999999 bits per byte.

Optimum compression would reduce the size
of this 268435456 byte file by 0 percent.

Chi square distribution for 268435456 samples is 248.92, and randomly
would exceed this value 59.56 percent of the times.

Arithmetic mean value of data bytes is 127.4924 (127.5 = random).
Monte Carlo value for Pi is 3.141825693 (error 0.01 percent).
Serial correlation coefficient is -0.000003 (totally uncorrelated = 0.0).
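
For reference, the entropy and arithmetic mean figures in a report like this can be approximated with a few lines of Python. This is a sketch of the standard Shannon entropy over byte frequencies, not the ent(1) source code, and `byte_stats` is a name I made up for it.

```python
import math
from collections import Counter


def byte_stats(data: bytes) -> tuple:
    """Return (Shannon entropy in bits per byte, arithmetic mean of the bytes)."""
    total = len(data)
    counts = Counter(data)
    entropy = -sum(
        (count / total) * math.log2(count / total) for count in counts.values()
    )
    mean = sum(data) / total
    return entropy, mean


# A perfectly flat byte distribution gives the ideal values.
entropy, mean = byte_stats(bytes(range(256)) * 4)
print(entropy, mean)  # 8.0 127.5
```

Passing it the contents of entropy.kidekin (opened in binary mode) should land close to the 7.999999 bits per byte reported above, though reading a 256 MB file in one go is memory hungry.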

Everything good so far. How about the FIPS 140-2 tests for randomness:

$ rngtest < entropy.kidekin
rngtest 2-unofficial-mt.14
Copyright (c) 2004 by Henrique de Moraes Holschuh
This is free software; see the source for copying conditions.  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

rngtest: starting FIPS tests...
rngtest: entropy source exhausted!
rngtest: bits received from input: 2147483648
rngtest: FIPS 140-2 successes: 107292
rngtest: FIPS 140-2 failures: 82
rngtest: FIPS 140-2(2001-10-10) Monobit: 14
rngtest: FIPS 140-2(2001-10-10) Poker: 13
rngtest: FIPS 140-2(2001-10-10) Runs: 26
rngtest: FIPS 140-2(2001-10-10) Long run: 30
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
rngtest: input channel speed: (min=317.891; avg=7386.982; max=19073.486)Mibits/s
rngtest: FIPS tests speed: (min=6.563; avg=109.376; max=114.901)Mibits/s
rngtest: Program run time: 19261018 microseconds
$ echo $?
1

Again, so far so good. Some failures are expected with random input of this size. 82 failures against 107292 successes means that only about 0.08% of the 20,000-bit blocks failed any test, which is nothing to worry about. Now the Dieharder battery of tests:
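
Before the Dieharder output, a quick sanity check on the rngtest numbers above. The FIPS 140-2 tests work on 20,000-bit blocks, so the block count and the failure rate follow directly from the report. This is a minimal sketch; `fips_summary` is a hypothetical helper, not part of rng-tools.

```python
FIPS_BLOCK_BITS = 20_000  # each FIPS 140-2 test block is 20,000 bits


def fips_summary(bits_received: int, successes: int, failures: int) -> dict:
    """Cross-check an rngtest report and compute its block failure rate."""
    blocks = successes + failures
    return {
        "blocks": blocks,
        "expected_blocks": bits_received // FIPS_BLOCK_BITS,
        "failure_rate": failures / blocks,
    }


report = fips_summary(2147483648, 107292, 82)
print(report["blocks"], report["expected_blocks"])  # 107374 107374
print(round(report["failure_rate"] * 100, 3))       # 0.076
```

The two block counts agreeing confirms that every complete block in the 256 MB file was tested.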

$ dieharder -a < entropy.kidekin
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
   rng_name    |rands/second|   Seed   |
        mt19937|  8.99e+07  | 722892634|
#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
   diehard_birthdays|   0|       100|     100|0.87388974|  PASSED  
      diehard_operm5|   0|   1000000|     100|0.25081726|  PASSED  
  diehard_rank_32x32|   0|     40000|     100|0.80329585|  PASSED  
    diehard_rank_6x8|   0|    100000|     100|0.87234234|  PASSED  
   diehard_bitstream|   0|   2097152|     100|0.27873738|  PASSED  
        diehard_opso|   0|   2097152|     100|0.05958924|  PASSED  
        diehard_oqso|   0|   2097152|     100|0.10540020|  PASSED  
         diehard_dna|   0|   2097152|     100|0.30006047|  PASSED  
diehard_count_1s_str|   0|    256000|     100|0.43809130|  PASSED  
diehard_count_1s_byt|   0|    256000|     100|0.29758303|  PASSED  
 diehard_parking_lot|   0|     12000|     100|0.78081639|  PASSED  
    diehard_2dsphere|   2|      8000|     100|0.58294587|  PASSED  
    diehard_3dsphere|   3|      4000|     100|0.04012616|  PASSED  
     diehard_squeeze|   0|    100000|     100|0.97651988|  PASSED  
        diehard_sums|   0|       100|     100|0.01875349|  PASSED  
        diehard_runs|   0|    100000|     100|0.17566659|  PASSED  
        diehard_runs|   0|    100000|     100|0.78887310|  PASSED  
       diehard_craps|   0|    200000|     100|0.16369886|  PASSED  
       diehard_craps|   0|    200000|     100|0.42148915|  PASSED  
 marsaglia_tsang_gcd|   0|  10000000|     100|0.27534860|  PASSED  
 marsaglia_tsang_gcd|   0|  10000000|     100|0.45190499|  PASSED  
         sts_monobit|   1|    100000|     100|0.88204376|  PASSED  
            sts_runs|   2|    100000|     100|0.15277754|  PASSED  
          sts_serial|   1|    100000|     100|0.71489026|  PASSED  
          sts_serial|   2|    100000|     100|0.85005457|  PASSED  
          sts_serial|   3|    100000|     100|0.77631916|  PASSED  
          sts_serial|   3|    100000|     100|0.81111751|  PASSED  
          sts_serial|   4|    100000|     100|0.72512842|  PASSED  
          sts_serial|   4|    100000|     100|0.68758000|  PASSED  
          sts_serial|   5|    100000|     100|0.69083583|  PASSED  
          sts_serial|   5|    100000|     100|0.09706031|  PASSED  
          sts_serial|   6|    100000|     100|0.52758972|  PASSED  
          sts_serial|   6|    100000|     100|0.27970465|  PASSED  
          sts_serial|   7|    100000|     100|0.07925569|  PASSED  
          sts_serial|   7|    100000|     100|0.25874891|  PASSED  
          sts_serial|   8|    100000|     100|0.33647659|  PASSED  
          sts_serial|   8|    100000|     100|0.80952471|  PASSED  
          sts_serial|   9|    100000|     100|0.99948911|   WEAK   
          sts_serial|   9|    100000|     100|0.32461849|  PASSED  
          sts_serial|  10|    100000|     100|0.69360795|  PASSED  
          sts_serial|  10|    100000|     100|0.96022345|  PASSED  
          sts_serial|  11|    100000|     100|0.91349333|  PASSED  
          sts_serial|  11|    100000|     100|0.95918606|  PASSED  
          sts_serial|  12|    100000|     100|0.69821905|  PASSED  
          sts_serial|  12|    100000|     100|0.57652285|  PASSED  
          sts_serial|  13|    100000|     100|0.28393582|  PASSED  
          sts_serial|  13|    100000|     100|0.45849491|  PASSED  
          sts_serial|  14|    100000|     100|0.30832853|  PASSED  
          sts_serial|  14|    100000|     100|0.89099315|  PASSED  
          sts_serial|  15|    100000|     100|0.87022105|  PASSED  
          sts_serial|  15|    100000|     100|0.06938123|  PASSED  
          sts_serial|  16|    100000|     100|0.79568629|  PASSED  
          sts_serial|  16|    100000|     100|0.53218489|  PASSED  
         rgb_bitdist|   1|    100000|     100|0.38552808|  PASSED  
         rgb_bitdist|   2|    100000|     100|0.79403454|  PASSED  
         rgb_bitdist|   3|    100000|     100|0.66811643|  PASSED  
         rgb_bitdist|   4|    100000|     100|0.84954470|  PASSED  
         rgb_bitdist|   5|    100000|     100|0.90198903|  PASSED  
         rgb_bitdist|   6|    100000|     100|0.98808244|  PASSED  
         rgb_bitdist|   7|    100000|     100|0.25730860|  PASSED  
         rgb_bitdist|   8|    100000|     100|0.43237015|  PASSED  
         rgb_bitdist|   9|    100000|     100|0.90916135|  PASSED  
         rgb_bitdist|  10|    100000|     100|0.81131338|  PASSED  
         rgb_bitdist|  11|    100000|     100|0.31361128|  PASSED  
         rgb_bitdist|  12|    100000|     100|0.40786889|  PASSED  
rgb_minimum_distance|   2|     10000|    1000|0.03358258|  PASSED  
rgb_minimum_distance|   3|     10000|    1000|0.99298827|  PASSED  
rgb_minimum_distance|   4|     10000|    1000|0.47721533|  PASSED  
rgb_minimum_distance|   5|     10000|    1000|0.86641982|  PASSED  
    rgb_permutations|   2|    100000|     100|0.10084049|  PASSED  
    rgb_permutations|   3|    100000|     100|0.99560585|   WEAK   
    rgb_permutations|   4|    100000|     100|0.42217190|  PASSED  
    rgb_permutations|   5|    100000|     100|0.95466090|  PASSED  
      rgb_lagged_sum|   0|   1000000|     100|0.64120688|  PASSED  
      rgb_lagged_sum|   1|   1000000|     100|0.22106106|  PASSED  
      rgb_lagged_sum|   2|   1000000|     100|0.41244281|  PASSED  
      rgb_lagged_sum|   3|   1000000|     100|0.98880097|  PASSED  
      rgb_lagged_sum|   4|   1000000|     100|0.78380177|  PASSED  
      rgb_lagged_sum|   5|   1000000|     100|0.25533777|  PASSED  
      rgb_lagged_sum|   6|   1000000|     100|0.78150371|  PASSED  
      rgb_lagged_sum|   7|   1000000|     100|0.53903267|  PASSED  
      rgb_lagged_sum|   8|   1000000|     100|0.04436257|  PASSED  
      rgb_lagged_sum|   9|   1000000|     100|0.77174302|  PASSED  
      rgb_lagged_sum|  10|   1000000|     100|0.54862612|  PASSED  
      rgb_lagged_sum|  11|   1000000|     100|0.48691334|  PASSED  
      rgb_lagged_sum|  12|   1000000|     100|0.06308057|  PASSED  
      rgb_lagged_sum|  13|   1000000|     100|0.42530804|  PASSED  
      rgb_lagged_sum|  14|   1000000|     100|0.86907366|  PASSED  
      rgb_lagged_sum|  15|   1000000|     100|0.66262930|  PASSED  
      rgb_lagged_sum|  16|   1000000|     100|0.85485044|  PASSED  
      rgb_lagged_sum|  17|   1000000|     100|0.39817394|  PASSED  
      rgb_lagged_sum|  18|   1000000|     100|0.90608610|  PASSED  
      rgb_lagged_sum|  19|   1000000|     100|0.94996515|  PASSED  
      rgb_lagged_sum|  20|   1000000|     100|0.78715690|  PASSED  
      rgb_lagged_sum|  21|   1000000|     100|0.93364519|  PASSED  
      rgb_lagged_sum|  22|   1000000|     100|0.84438533|  PASSED  
      rgb_lagged_sum|  23|   1000000|     100|0.77439531|  PASSED  
      rgb_lagged_sum|  24|   1000000|     100|0.12530311|  PASSED  
      rgb_lagged_sum|  25|   1000000|     100|0.79035917|  PASSED  
      rgb_lagged_sum|  26|   1000000|     100|0.93286961|  PASSED  
      rgb_lagged_sum|  27|   1000000|     100|0.32567247|  PASSED  
      rgb_lagged_sum|  28|   1000000|     100|0.39563718|  PASSED  
      rgb_lagged_sum|  29|   1000000|     100|0.15628693|  PASSED  
      rgb_lagged_sum|  30|   1000000|     100|0.69368810|  PASSED  
      rgb_lagged_sum|  31|   1000000|     100|0.00197963|   WEAK   
      rgb_lagged_sum|  32|   1000000|     100|0.23325783|  PASSED  
     rgb_kstest_test|   0|     10000|    1000|0.18940877|  PASSED  
     dab_bytedistrib|   0|  51200000|       1|0.57007834|  PASSED  
             dab_dct| 256|     50000|       1|0.76567665|  PASSED  
Preparing to run test 207.  ntuple = 0
        dab_filltree|  32|  15000000|       1|0.60537852|  PASSED  
        dab_filltree|  32|  15000000|       1|0.78894908|  PASSED  
Preparing to run test 208.  ntuple = 0
       dab_filltree2|   0|   5000000|       1|0.11775507|  PASSED  
       dab_filltree2|   1|   5000000|       1|0.34799105|  PASSED  
Preparing to run test 209.  ntuple = 0
        dab_monobit2|  12|  65000000|       1|0.69182598|  PASSED  
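
A handful of WEAK results in a table this long is not a red flag by itself. Judging only from the p-values printed above, Dieharder appears to mark a row WEAK when its p-value falls below 0.005 or above 0.995. I'm inferring those thresholds from the output rather than quoting the documentation. If the p-values are uniform, as they should be for good random data, about 1% of rows would land in that band by chance. The Python sketch below tallies the assessments from a saved report; `summarize` and `expected_weak` are made-up names.

```python
from collections import Counter

ASSESSMENTS = {"PASSED", "WEAK", "FAILED"}


def summarize(report: str) -> Counter:
    """Count the assessment column of a dieharder results table."""
    tally = Counter()
    for line in report.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Result rows have six columns and end with an assessment.
        if len(fields) == 6 and fields[5] in ASSESSMENTS:
            tally[fields[5]] += 1
    return tally


def expected_weak(rows: int, band: float = 0.01) -> float:
    """Rows expected to be WEAK by chance, assuming uniform p-values.

    The 1% default is the assumed width of the WEAK band (see above).
    """
    return rows * band


sample = """\
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
   diehard_birthdays|   0|       100|     100|0.87388974|  PASSED
          sts_serial|   9|    100000|     100|0.99948911|   WEAK
"""
print(summarize(sample))
```

The table above has 114 result rows, 3 of them WEAK, against roughly 1.1 expected by chance. Many of the rows (the sts_serial ones in particular) are not independent of each other, so this is a rough guide only.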

Finally, a visual check on the data, even though it's safe to assume that it's "true random" given the previous testing:

$ dd if=white.bmp of=entropy.kidekin bs=1 count=54 conv=notrunc
54+0 records in
54+0 records out
54 bytes (54 B) copied, 0.000547208 s, 98.7 kB/s
$ gimp entropy.kidekin # convert to grayscale, export as "entropy.png"
$ optipng entropy.png
** Processing: entropy.png
512x512 pixels, 8 bits/pixel, grayscale
Input IDAT size = 250107 bytes
Input file size = 250564 bytes

Trying:
  zc = 9  zm = 8  zs = 0  f = 0		IDAT size = 215319
  zc = 9  zm = 8  zs = 1  f = 0		IDAT size = 214467
  zc = 1  zm = 8  zs = 2  f = 0		IDAT size = 214467
  zc = 9  zm = 8  zs = 3  f = 0		IDAT size = 214467
                               
Selecting parameters:
  zc = 1  zm = 8  zs = 2  f = 0		IDAT size = 214467

Output IDAT size = 214467 bytes (35640 bytes decrease)
Output file size = 214564 bytes (36000 bytes = 14.37% decrease)

And the result is:

RNG visual output of the Kidekin TRNG
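
If you would rather skip the BMP header trick and GIMP, a different route to the same kind of picture is to wrap the first 512x512 bytes in a binary PGM (portable graymap) header, which most image viewers can open. This is not what I did above; it is a Python sketch of an alternative, and `to_pgm` is a made-up helper name.

```python
def to_pgm(data: bytes, width: int = 512, height: int = 512) -> bytes:
    """Wrap raw bytes as a binary PGM image, one byte per grayscale pixel."""
    needed = width * height
    if len(data) < needed:
        raise ValueError("need at least %d bytes, got %d" % (needed, len(data)))
    # "P5" marks a binary graymap; 255 is the maximum gray value.
    header = b"P5\n%d %d\n255\n" % (width, height)
    return header + data[:needed]


# Tiny 2x2 example; for the real thing, pass the first 262144 bytes of
# entropy.kidekin and write the result out in binary mode.
print(to_pgm(bytes([0, 85, 170, 255]), width=2, height=2))
```

As with the GIMP export, any visible banding or repeating texture in the result would be a bad sign.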

My conclusion of the Kidekin TRNG is positive. I love the throughput of the device, loved the price, and aside from the UDEV rule, it is plug-and-play. Unfortunately, the TRNG is a bit on the big side for a physical device, and because it doesn't come with a security proof, and the hardware design is closed, I would be skeptical of trusting it for your random numbers directly. Instead, I would recommend adding it to the Linux kernel's CSPRNG, and relying on /dev/urandom instead. This is trivial with rngd(8). But, overall, I am very pleased with the device, and wish I had actually purchased a second one.

Additional Testing Of The rtl-sdr Dongle As A HWRNG
https://pthree.org/2015/06/18/additional-testing-of-the-rtl-sdr-dongle-as-a-hwrng/
Fri, 19 Jun 2015 02:40:57 +0000

A couple of days ago, I put up a post about using the Realtek SDR dongles as a hardware true random number generator. I only tested the randomness of a 512 MB file. I thought this time, I would put a bit more stock into it. In this case, I let it run for a while, until it was 1.8 GB in size. Interestingly enough, it stopped getting bigger after that point. Not sure why. However, I ran more tests on that 1.8 GB file. Creating this file took its time:

$ tail -f /run/rtl_entropy.fifo | dd of=random.img iflag=fullblock
3554130+0 records in
3554130+0 records out
1819714560 bytes (1.8 GB) copied, 3897.22 s, 467 kB/s

This filled up a bit faster than I had previously tested, going at a clip of about 3.74 Mbps (467 kB/s times 8 bits).

Now it was time for the testing:

$ ent random.img
Entropy = 8.000000 bits per byte.

Optimum compression would reduce the size
of this 1819714560 byte file by 0 percent.

Chi square distribution for 1819714560 samples is 246.86, and randomly
would exceed this value 63.11 percent of the times.

Arithmetic mean value of data bytes is 127.4990 (127.5 = random).
Monte Carlo value for Pi is 3.141611317 (error 0.00 percent).
Serial correlation coefficient is 0.000013 (totally uncorrelated = 0.0).

It passes with flying colors on entropy estimation, compression, chi-square distributions, arithmetic mean, the Monte Carlo estimation for Pi, and serial correlation. Testing further, I ran it through the FIPS 140-2 tests:

$ rngtest < random.img
rngtest 2-unofficial-mt.14
Copyright (c) 2004 by Henrique de Moraes Holschuh
This is free software; see the source for copying conditions.  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

rngtest: starting FIPS tests...
rngtest: entropy source exhausted!
rngtest: bits received from input: 14557716480
rngtest: FIPS 140-2 successes: 727288
rngtest: FIPS 140-2 failures: 597
rngtest: FIPS 140-2(2001-10-10) Monobit: 99
rngtest: FIPS 140-2(2001-10-10) Poker: 57
rngtest: FIPS 140-2(2001-10-10) Runs: 210
rngtest: FIPS 140-2(2001-10-10) Long run: 233
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
rngtest: input channel speed: (min=114.212; avg=6626.942; max=9536.743)Mibits/s
rngtest: FIPS tests speed: (min=61.133; avg=147.877; max=151.377)Mibits/s
rngtest: Program run time: 96034230 microseconds
You have new mail.
$ echo $?
1

Finally, the beast of beasts, I ran it through every Dieharder test. This took some time to complete. Here is a listing of the tests that it went through:

$ dieharder -l
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
Installed dieharder tests:
 Test Number	                     Test Name	              Test Reliability
===============================================================================
  -d 0  	                  Diehard Birthdays Test	      Good
  -d 1  	                     Diehard OPERM5 Test	      Good
  -d 2  	          Diehard 32x32 Binary Rank Test	      Good
  -d 3  	            Diehard 6x8 Binary Rank Test	      Good
  -d 4  	                  Diehard Bitstream Test	      Good
  -d 5  	                            Diehard OPSO	   Suspect
  -d 6  	                       Diehard OQSO Test	   Suspect
  -d 7  	                        Diehard DNA Test	   Suspect
  -d 8  	      Diehard Count the 1s (stream) Test	      Good
  -d 9  	        Diehard Count the 1s Test (byte)	      Good
  -d 10  	                Diehard Parking Lot Test	      Good
  -d 11  	Diehard Minimum Distance (2d Circle) Test	      Good
  -d 12  	Diehard 3d Sphere (Minimum Distance) Test	      Good
  -d 13  	                    Diehard Squeeze Test	      Good
  -d 14  	                       Diehard Sums Test	Do Not Use
  -d 15  	                       Diehard Runs Test	      Good
  -d 16  	                      Diehard Craps Test	      Good
  -d 17  	            Marsaglia and Tsang GCD Test	      Good
  -d 100  	                        STS Monobit Test	      Good
  -d 101  	                           STS Runs Test	      Good
  -d 102  	           STS Serial Test (Generalized)	      Good
  -d 200  	               RGB Bit Distribution Test	      Good
  -d 201  	   RGB Generalized Minimum Distance Test	      Good
  -d 202  	                   RGB Permutations Test	      Good
  -d 203  	                     RGB Lagged Sum Test	      Good
  -d 204  	        RGB Kolmogorov-Smirnov Test Test	      Good
  -d 205  	                       Byte Distribution	      Good
  -d 206  	                                 DAB DCT	      Good
  -d 207  	                      DAB Fill Tree Test	      Good
  -d 208  	                    DAB Fill Tree 2 Test	      Good
  -d 209  	                      DAB Monobit 2 Test	      Good

So here are the results:

 $ dieharder -a < random.img
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
   rng_name    |rands/second|   Seed   |
        mt19937|  1.25e+08  | 169223456|
#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
   diehard_birthdays|   0|       100|     100|0.91937112|  PASSED  
      diehard_operm5|   0|   1000000|     100|0.77213572|  PASSED  
  diehard_rank_32x32|   0|     40000|     100|0.04709503|  PASSED  
    diehard_rank_6x8|   0|    100000|     100|0.93031877|  PASSED  
   diehard_bitstream|   0|   2097152|     100|0.12183977|  PASSED  
        diehard_opso|   0|   2097152|     100|0.96023625|  PASSED  
        diehard_oqso|   0|   2097152|     100|0.61237304|  PASSED  
         diehard_dna|   0|   2097152|     100|0.66045974|  PASSED  
diehard_count_1s_str|   0|    256000|     100|0.16999968|  PASSED  
diehard_count_1s_byt|   0|    256000|     100|0.00992823|  PASSED  
 diehard_parking_lot|   0|     12000|     100|0.69592283|  PASSED  
    diehard_2dsphere|   2|      8000|     100|0.95358410|  PASSED  
    diehard_3dsphere|   3|      4000|     100|0.89028448|  PASSED  
     diehard_squeeze|   0|    100000|     100|0.81631204|  PASSED  
        diehard_sums|   0|       100|     100|0.03559934|  PASSED  
        diehard_runs|   0|    100000|     100|0.75027140|  PASSED  
        diehard_runs|   0|    100000|     100|0.43076351|  PASSED  
       diehard_craps|   0|    200000|     100|0.57749359|  PASSED  
       diehard_craps|   0|    200000|     100|0.00599436|  PASSED  
 marsaglia_tsang_gcd|   0|  10000000|     100|0.60121369|  PASSED  
 marsaglia_tsang_gcd|   0|  10000000|     100|0.04254338|  PASSED  
         sts_monobit|   1|    100000|     100|0.94352358|  PASSED  
            sts_runs|   2|    100000|     100|0.77549833|  PASSED  
          sts_serial|   1|    100000|     100|0.46198961|  PASSED  
          sts_serial|   2|    100000|     100|0.46002706|  PASSED  
          sts_serial|   3|    100000|     100|0.73076110|  PASSED  
          sts_serial|   3|    100000|     100|0.90967100|  PASSED  
          sts_serial|   4|    100000|     100|0.32002297|  PASSED  
          sts_serial|   4|    100000|     100|0.07478887|  PASSED  
          sts_serial|   5|    100000|     100|0.27486408|  PASSED  
          sts_serial|   5|    100000|     100|0.57409336|  PASSED  
          sts_serial|   6|    100000|     100|0.05095556|  PASSED  
          sts_serial|   6|    100000|     100|0.06341272|  PASSED  
          sts_serial|   7|    100000|     100|0.00941089|  PASSED  
          sts_serial|   7|    100000|     100|0.53679805|  PASSED  
          sts_serial|   8|    100000|     100|0.00122125|   WEAK   
          sts_serial|   8|    100000|     100|0.16239101|  PASSED  
          sts_serial|   9|    100000|     100|0.24007712|  PASSED  
          sts_serial|   9|    100000|     100|0.02659941|  PASSED  
          sts_serial|  10|    100000|     100|0.64616186|  PASSED  
          sts_serial|  10|    100000|     100|0.78783799|  PASSED  
          sts_serial|  11|    100000|     100|0.77618602|  PASSED  
          sts_serial|  11|    100000|     100|0.33875893|  PASSED  
          sts_serial|  12|    100000|     100|0.50423715|  PASSED  
          sts_serial|  12|    100000|     100|0.77528158|  PASSED  
          sts_serial|  13|    100000|     100|0.57625144|  PASSED  
          sts_serial|  13|    100000|     100|0.73422196|  PASSED  
          sts_serial|  14|    100000|     100|0.40891605|  PASSED  
          sts_serial|  14|    100000|     100|0.48542772|  PASSED  
          sts_serial|  15|    100000|     100|0.67319390|  PASSED  
          sts_serial|  15|    100000|     100|0.74730027|  PASSED  
          sts_serial|  16|    100000|     100|0.67519158|  PASSED  
          sts_serial|  16|    100000|     100|0.73171087|  PASSED  
         rgb_bitdist|   1|    100000|     100|0.87216594|  PASSED  
         rgb_bitdist|   2|    100000|     100|0.18831902|  PASSED  
         rgb_bitdist|   3|    100000|     100|0.16757216|  PASSED  
         rgb_bitdist|   4|    100000|     100|0.05327115|  PASSED  
         rgb_bitdist|   5|    100000|     100|0.75278396|  PASSED  
         rgb_bitdist|   6|    100000|     100|0.64749144|  PASSED  
         rgb_bitdist|   7|    100000|     100|0.20311557|  PASSED  
         rgb_bitdist|   8|    100000|     100|0.39994123|  PASSED  
         rgb_bitdist|   9|    100000|     100|0.52805289|  PASSED  
         rgb_bitdist|  10|    100000|     100|0.96091722|  PASSED  
         rgb_bitdist|  11|    100000|     100|0.97794399|  PASSED  
         rgb_bitdist|  12|    100000|     100|0.75009561|  PASSED  
rgb_minimum_distance|   2|     10000|    1000|0.58923867|  PASSED  
rgb_minimum_distance|   3|     10000|    1000|0.54294743|  PASSED  
rgb_minimum_distance|   4|     10000|    1000|0.59446131|  PASSED  
rgb_minimum_distance|   5|     10000|    1000|0.00047025|   WEAK   
    rgb_permutations|   2|    100000|     100|0.89040191|  PASSED  
    rgb_permutations|   3|    100000|     100|0.47917416|  PASSED  
    rgb_permutations|   4|    100000|     100|0.30964668|  PASSED  
    rgb_permutations|   5|    100000|     100|0.70217495|  PASSED  
      rgb_lagged_sum|   0|   1000000|     100|0.12796648|  PASSED  
      rgb_lagged_sum|   1|   1000000|     100|0.15077254|  PASSED  
      rgb_lagged_sum|   2|   1000000|     100|0.31141471|  PASSED  
      rgb_lagged_sum|   3|   1000000|     100|0.94974697|  PASSED  
      rgb_lagged_sum|   4|   1000000|     100|0.99256987|  PASSED  
      rgb_lagged_sum|   5|   1000000|     100|0.67854004|  PASSED  
      rgb_lagged_sum|   6|   1000000|     100|0.08600877|  PASSED  
      rgb_lagged_sum|   7|   1000000|     100|0.91633363|  PASSED  
      rgb_lagged_sum|   8|   1000000|     100|0.06794590|  PASSED  
      rgb_lagged_sum|   9|   1000000|     100|0.59024027|  PASSED  
      rgb_lagged_sum|  10|   1000000|     100|0.59285975|  PASSED  
      rgb_lagged_sum|  11|   1000000|     100|0.87178336|  PASSED  
      rgb_lagged_sum|  12|   1000000|     100|0.63401541|  PASSED  
      rgb_lagged_sum|  13|   1000000|     100|0.47202172|  PASSED  
      rgb_lagged_sum|  14|   1000000|     100|0.34616699|  PASSED  
      rgb_lagged_sum|  15|   1000000|     100|0.97221211|  PASSED  
      rgb_lagged_sum|  16|   1000000|     100|0.95576739|  PASSED  
      rgb_lagged_sum|  17|   1000000|     100|0.32367098|  PASSED  
      rgb_lagged_sum|  18|   1000000|     100|0.92792046|  PASSED  
      rgb_lagged_sum|  19|   1000000|     100|0.58128429|  PASSED  
      rgb_lagged_sum|  20|   1000000|     100|0.78197001|  PASSED  
      rgb_lagged_sum|  21|   1000000|     100|0.86068846|  PASSED  
      rgb_lagged_sum|  22|   1000000|     100|0.22496908|  PASSED  
      rgb_lagged_sum|  23|   1000000|     100|0.52387665|  PASSED  
      rgb_lagged_sum|  24|   1000000|     100|0.52748770|  PASSED  
      rgb_lagged_sum|  25|   1000000|     100|0.96442902|  PASSED  
      rgb_lagged_sum|  26|   1000000|     100|0.51298847|  PASSED  
      rgb_lagged_sum|  27|   1000000|     100|0.99123470|  PASSED  
      rgb_lagged_sum|  28|   1000000|     100|0.69774674|  PASSED  
      rgb_lagged_sum|  29|   1000000|     100|0.83646714|  PASSED  
      rgb_lagged_sum|  30|   1000000|     100|0.98573851|  PASSED  
      rgb_lagged_sum|  31|   1000000|     100|0.23580471|  PASSED  
      rgb_lagged_sum|  32|   1000000|     100|0.19150884|  PASSED  
     rgb_kstest_test|   0|     10000|    1000|0.67771558|  PASSED  
     dab_bytedistrib|   0|  51200000|       1|0.07152541|  PASSED  
             dab_dct| 256|     50000|       1|0.53841656|  PASSED  
Preparing to run test 207.  ntuple = 0
        dab_filltree|  32|  15000000|       1|0.09092747|  PASSED  
        dab_filltree|  32|  15000000|       1|0.83382174|  PASSED  
Preparing to run test 208.  ntuple = 0
       dab_filltree2|   0|   5000000|       1|0.37363586|  PASSED  
       dab_filltree2|   1|   5000000|       1|0.26890999|  PASSED  
Preparing to run test 209.  ntuple = 0
        dab_monobit2|  12|  65000000|       1|0.80810458|  PASSED  

I don't have an image to look at to visually verify that there are no obvious patterns. At 1.8 GB, I feel that it would be just a bit too unwieldy anyway. So, I'll need to trust the previous tests for randomness that the data really is random. After these 3 series of tests, I can only conclude that using a Realtek SDR as a HWRNG will generate as "true random" data as you can hope for.

]]>
https://pthree.org/2015/06/18/additional-testing-of-the-rtl-sdr-dongle-as-a-hwrng/feed/ 2
Hardware RNG Through an rtl-sdr Dongle https://pthree.org/2015/06/16/hardware-rng-through-an-rtl-sdr-dongle/ https://pthree.org/2015/06/16/hardware-rng-through-an-rtl-sdr-dongle/#comments Tue, 16 Jun 2015 22:50:05 +0000 https://pthree.org/?p=4093 An rtl-sdr dongle allows you to receive radio frequency signals on your computer through a software interface. You can listen to Amateur Radio, watch analog television, listen to FM radio broadcasts, and a number of other things. I have a friend who uses it to monitor power usage at his house. However, I have a different use: true random number generation.

The theory behind the RNG is to take advantage of radio frequency noise, such as atmospheric noise, which is caused by natural occurrences ranging from weak galactic radiation from the center of our Milky Way Galaxy to the stronger local and remote lightning strikes. It's estimated that roughly 40 lightning strikes hit the Earth every second, which equates to about 3.5 million strikes per 24 hour period. Interestingly enough, this provides a great deal of entropy for a random number generator.

Check out Blitzortung. It is a community-run site, where volunteers can set up lightning monitoring stations and submit data to the server. Of course, it isn't an accurate picture of the entire globe, but you can at least get some idea of the scope of lightning strikes around the continents.

Lightning Map of the United States

Unfortunately, however, the rtl-sdr dongle won't get down to the frequencies necessary for sampling atmospheric noise, which are about 100 kHz to 10 MHz, and above 10 GHz. However, it can sample cosmic noise, man-made (urban and suburban) noise, solar noise, thermal noise, and other terrestrial noises that are well within the tuner frequency range of the dongle.

In order to take advantage of this, you obviously need an rtl-sdr dongle. They're quite cheap, about $15 or so, and plug in via USB with an external antenna. Of course, the larger the antenna, the more terrestrial noise you'll be able to observe. With a standard telescoping antenna, I can observe about 3 Mbps of true random data.

The other piece, however, will be compiling and installing the rtl-entropy software. This will provide a FIFO file for observing the random data. Reading the random data can be done as you would read any regular file:

$ sudo rtl_entropy -b -f 74M
$ tail -f /run/rtl_entropy.fifo | dd of=/dev/null
^C8999+10 records in
9004+0 records out
4610048 bytes (4.6 MB) copied, 13.294 s, 347 kB/s

That's roughly 2.8 Mbps. Not bad for $15. Notice that I passed the "-b" switch to detach the process from the controlling TTY and run it in the background. Further, I am not tuning to the default frequency of 70 MHz, which is part of Band I in the North America band plan for television broadcasting. Instead, I am tuning to 74 MHz, which is in the middle of a break in the band plan, where no television broadcasting should be transmitted. Of course, you'll need to make sure you are tuning to a frequency that is less likely to encounter malicious interference. Even though the rtl_entropy daemon has built-in debiasing and FIPS randomness testing, a malicious source could interfere with the output by transmitting on the frequency that you are listening to.
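As a quick check on the throughput figure, here is a minimal Python sketch that converts the byte count and elapsed time reported by dd into megabits per second. The function name is my own and purely illustrative.

```python
def throughput_mbps(num_bytes, seconds):
    """Convert a byte count and elapsed time into megabits per second."""
    return num_bytes * 8 / seconds / 1_000_000


# Figures taken from the dd output above.
rate = throughput_mbps(4610048, 13.294)
print(f"{rate:.2f} Mbps")
```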

In order to guarantee that you have random data, you should send it through a battery of standardized tests for randomness. One popular set of tests for randomness is the FIPS 140-2 suite. Suppose I create a 512 MB file from my rtl-sdr dongle. I can test it as follows:

$ rngtest < random.img
rngtest 2-unofficial-mt.14
Copyright (c) 2004 by Henrique de Moraes Holschuh
This is free software; see the source for copying conditions.  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

rngtest: starting FIPS tests...
rngtest: entropy source exhausted!
rngtest: bits received from input: 83886080
rngtest: FIPS 140-2 successes: 4190
rngtest: FIPS 140-2 failures: 4
rngtest: FIPS 140-2(2001-10-10) Monobit: 0
rngtest: FIPS 140-2(2001-10-10) Poker: 1
rngtest: FIPS 140-2(2001-10-10) Runs: 1
rngtest: FIPS 140-2(2001-10-10) Long run: 2
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
rngtest: input channel speed: (min=174.986; avg=4379.165; max=4768.372)Mibits/s
rngtest: FIPS tests speed: (min=113.533; avg=147.777; max=150.185)Mibits/s
rngtest: Program run time: 560095 microseconds
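To give a feel for what these counters measure, here is a minimal Python sketch of the simplest of them, the monobit test. It assumes the usual FIPS 140-2 setup of 20,000-bit blocks, and the bounds are the ones I recall from the 2001-10-10 revision, so treat them as an assumption and check them against the rngtest source before relying on this. It is not the rngtest implementation, and the function name is hypothetical.

```python
def fips_monobit_ok(block):
    """Return True if a 2,500-byte (20,000-bit) block passes the monobit check."""
    if len(block) != 2500:
        raise ValueError("the monobit test is defined on 20,000-bit blocks")
    # Count the 1 bits across the whole block.
    ones = sum(bin(byte).count("1") for byte in block)
    # Assumed bounds from the FIPS 140-2 (2001-10-10) statistical tests.
    return 9725 < ones < 10275


# A block of 0xAA bytes has exactly half of its bits set, so it passes
# this one test, even though it is obviously not random.
print(fips_monobit_ok(bytes([0xAA]) * 2500))
# An all-zero block fails.
print(fips_monobit_ok(bytes(2500)))
```

The 0xAA example also shows why a single test is never enough: the other tests in the suite exist to catch patterns like that.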

It's expected to see some failures, but they should be outliers. There is also the Dieharder battery of randomness tests. This will take substantially longer to work through, but it can be done. Here are the first few lines:

$ dieharder -a < random.img 
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
   rng_name    |rands/second|   Seed   |
        mt19937|  1.30e+08  | 334923062|
#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
   diehard_birthdays|   0|       100|     100|0.98331589|  PASSED  
      diehard_operm5|   0|   1000000|     100|0.12201131|  PASSED  
  diehard_rank_32x32|   0|     40000|     100|0.69993313|  PASSED  
    diehard_rank_6x8|   0|    100000|     100|0.55365877|  PASSED  
   diehard_bitstream|   0|   2097152|     100|0.85077208|  PASSED  
        diehard_opso|   0|   2097152|     100|0.76171650|  PASSED  

The whole dieharder results of my 512 MB random file can be found here.

Last, but not least, it helps to observe the data visually. In this image, I created a plain white file in Gimp, that was 600x600 pixels in size. I then counted the number of bytes in that file, and generated an equally sized random binary data file. Finally, I added the bitmap header to the file, converted it to a PNG file, optimized it, and uploaded it here. The steps are as follows:

$ gimp # create 600x600px plain white file and save as 16-bit "white.bmp"
$ ls -l white.bmp | awk '{print $5}'
720138
$ tail -f /run/rtl_entropy.fifo| dd of=random.img bs=1 count=720138 iflag=fullblock
720138+0 records in
720138+0 records out
720138 bytes (720 kB) copied, 24.8033 s, 29.0 kB/s
$ dd if=white.bmp of=random.img bs=1 count=54 conv=notrunc
$ gimp random.img # export to PNG file
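The header-copy step (the second dd above) can also be sketched in Python. This assumes the same 54-byte bitmap header used in the commands, and the function name is hypothetical. It works on bytes objects, so the file handling is left as a comment.

```python
def splice_header(template, payload, header_len=54):
    """Overwrite the first header_len bytes of payload with those of template."""
    if len(template) < header_len or len(payload) < header_len:
        raise ValueError("both inputs must be at least header_len bytes long")
    # Keep the header from the template and everything else from the payload,
    # which is what dd does with bs=1 count=54 conv=notrunc.
    return template[:header_len] + payload[header_len:]


# Usage, assuming white.bmp and random.img exist as in the steps above:
#   with open("white.bmp", "rb") as f:
#       white = f.read()
#   with open("random.img", "rb") as f:
#       noise = f.read()
#   with open("random.img", "wb") as f:
#       f.write(splice_header(white, noise))
```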

When viewing the image, there should be no obvious patterns. As an example:

Visual representation of random

For more practical use, here is a quick application for generating 80-bit entropy unambiguous passwords:

$ for i in {1..10}; do
> strings /run/rtl_entropy.fifo | grep -o '[a-hjkmnp-z2-9.]' | head -n 16 | tr -d '\n'; echo
> done
8dfn42w6dagqnt4z
2vcsqu6sew.g6pp2
kv9nstj4gq39x5f.
wmpdpy7yz75xrhkh
.4ra2b38hmbf5jw5
7ngyk3c58k3eeq7c
8e4t8ts3ykhckdst
9g6yqqce.bxrrhpb
xwnw6mtk8njv76b2
xdmd89n68f.kcthp
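Where the 80 bits come from: the character class above has 32 unambiguous characters, so each one carries 5 bits, and 16 of them give 80 bits. Here is a minimal Python sketch of the same idea. It is an assumption-laden variant, not a port of the shell pipeline: it skips strings(1) and simply discards any byte that is not in the alphabet. The function names are my own.

```python
import io
import math

# The same 32 characters as the grep class [a-hjkmnp-z2-9.].
ALPHABET = set(b"abcdefghjkmnpqrstuvwxyz23456789.")


def password_from_stream(stream, length=16):
    """Build a password from a binary stream, dropping bytes outside ALPHABET."""
    chars = []
    while len(chars) < length:
        byte = stream.read(1)
        if not byte:
            raise EOFError("ran out of random bytes")
        if byte[0] in ALPHABET:
            chars.append(chr(byte[0]))
    return "".join(chars)


def entropy_bits(length=16, alphabet_size=len(ALPHABET)):
    """Entropy of a password of uniformly chosen characters."""
    return length * math.log2(alphabet_size)


print(entropy_bits())  # 16 characters * 5 bits each

# Usage, assuming the FIFO from rtl_entropy is available:
#   with open("/run/rtl_entropy.fifo", "rb") as fifo:
#       print(password_from_stream(fifo))
```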

Obviously, the practical uses here include Monte Carlo simulations, game theory, gambling, cryptography, and anything else where high quality randomness is needed. Unfortunately, I can't seem to get rngd(8) to add the /run/rtl_entropy.fifo file as a hardware device. So, I can't feed the Linux CSPRNG with the dongle, other than "dd if=/run/rtl_entropy.fifo of=/dev/random", which doesn't increase the entropy estimate, of course.

]]>
https://pthree.org/2015/06/16/hardware-rng-through-an-rtl-sdr-dongle/feed/ 13
Encrypting Combination Locks https://pthree.org/2015/05/31/encrypting-combination-locks/ https://pthree.org/2015/05/31/encrypting-combination-locks/#comments Sun, 31 May 2015 12:33:50 +0000 https://pthree.org/?p=4085 This morning, my family and I went swimming at the community swimming center. Unfortunately, I couldn't find my key-based lock that I normally take. However, I did find my Master combination lock, but couldn't recall the combination. Fortunately, I knew how to find it. I took this lock with me to lock my personal items in the locker while swimming around in the pool.

While swimming, I started thinking about ways to better recall lock combinations in the future. The obvious choice is to encrypt it, so I could engrave the encrypted combination on the lock. However, it needs to be simple enough to do in my head should I temporarily forget it while swimming, and easy enough to recall if I haven't used the lock in a few years. Thankfully, this can be done easily enough with modulo addition and subtraction.

Before beginning, you need a 6-digit PIN that you won't easily forget. Dates are tempting: they fit easily in 6 digits, and something like a birthday or an anniversary is not hard to remember. Unfortunately, if someone knows you, and knows these dates, they can easily reverse the process to open the lock. So, as tempting as dates are, don't use them. Instead, you should use a 6-digit PIN that only you would know, and would always know. So, knowing this, let's see how this works.

You need to be familiar with modular arithmetic, aka "clock math". The idea is that after a certain maximum, the numbers reset back to 0. For example, 00:00 is midnight while 23:59 is the minute before. As soon as the hour is "24", it resets back to 0, for a full 24-hour day. You could call telling time "mod 24 math". For combination locks, we're going to be using "mod 40 math" if the maximum number on your combination lock is "40", or "mod 60 math" if the max is "60", and so forth.

Suppose the combination to your lock is "03-23-36", and suppose your 6-digit PIN is "512133". Let's encrypt the combination with our PIN, by using "mod 40 subtraction". We'll use subtraction now, because most people have an easier time with addition than subtraction, and addition is the step you'll need to do in your head later. When you are trying to rediscover your combination, you'll take your encrypted number, and do "mod 40 addition" to reverse it, and bring it back to the original combination lock numbers.

Here it is in action:

Encrypting the original combination

  03 23 36    <- original combination
- 51 21 33    <- secret PIN
= --------
 -48 02 03
= --------
  32 02 03    <- encrypted after "mod 40"

Because the first number in our combination is "03", and we are subtracting off "51", we end up with "-48". As such, we need to add "40" until our target new number is in the range of [0, 40), or "0 <= n < 40". This gives us "32" as the result. The rest of the numbers fell within that range, so no adjusting was necessary. I can then engrave "32-02-03" on the bottom of the lock, so when I hold the lock up while in a locker, the text is readable. Okay, that's all fine and dandy, but what about reversing it? Taking the encrypted combination, and returning to the original combination? This is where "mod 40 addition" comes in. For example:

Decrypting the encrypted combination

  32 02 03    <- encrypted combination
+ 51 21 33    <- secret PIN
= --------
  83 23 36
= --------
  03 23 36    <- original combination after "mod 40"

Notice that this time, the first number in our "mod 40 addition" is "83". So, we subtract off "40" until our original combination number is in the range of [0, 40), or "0 <= n < 40", just like when doing "mod 40 subtraction" to create the new combination lock values. At worst, you'll have to subtract "40" only three times per number.

One thing to watch out for is that your encrypted combination numbers are far enough away from the original that trying out the encrypted combination won't accidentally open the lock, due to their proximity to the original numbers. If only one number is substantially off, that should be good enough to prevent an accidental opening.

I want to come back to dates, however, and why not to use them. Not only do they fall victim to a targeted attack, but they also have an exceptionally small key space. Assuming there are only 365 days per year, and assuming the attacker has a good idea of your age, plus or minus five years, that's a total of 3,650 keys that must be tried following the common convention of "MM-DD-YY". It could be greatly reduced if the attacker has a better handle on when you were born. If a 6-digit PIN is chosen instead, then the search space has 1,000,000 possible PINs. This is greater than the 64,000 possible combinations a 40-number Master lock could have, which puts the attacker on a brute force search for the original combination, if they aren't aware that Master combination locks can be broken in 8 tries or less.
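The whole scheme fits in a few lines of Python, if you'd rather check your engraving before committing it to metal. This is a minimal sketch using the example combination and PIN from above; the function name is my own. Python's "%" operator already returns a value in [0, 40) for negative numbers, so no manual adding of "40" is needed.

```python
def shift_combination(numbers, pin, modulus=40, decrypt=False):
    """Subtract the PIN pairs to encrypt, or add them back to decrypt."""
    sign = 1 if decrypt else -1
    return [(n + sign * p) % modulus for n, p in zip(numbers, pin)]


combo = [3, 23, 36]   # original combination 03-23-36
pin = [51, 21, 33]    # secret PIN 512133, split into pairs

engraved = shift_combination(combo, pin)
recovered = shift_combination(engraved, pin, decrypt=True)
print(engraved)   # [32, 2, 3]
print(recovered)  # [3, 23, 36]
```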

]]>
https://pthree.org/2015/05/31/encrypting-combination-locks/feed/ 2
The Lagged Fibonacci Generator https://pthree.org/2015/05/29/the-lagged-fibonacci-generator/ https://pthree.org/2015/05/29/the-lagged-fibonacci-generator/#comments Fri, 29 May 2015 06:12:47 +0000 https://pthree.org/?p=4061 Lately, I have been studying pseudorandom number generators (PRNGs, also called "deterministic random bit generators", or DRBGs). I've been developing cryptographically secure PRNGs (CSPRNGs), and you can see my progress on GitHub at https://github.com/atoponce/csprng. This project is nothing more than a way for me to get a feel for new languages, while also learning a thing or two about applied cryptography. However, for the subject of this post, I want to address one PRNG that is not cryptographically secure: the Lagged Fibonacci Generator.

What drew me to this generator was thinking about a way to have a PRNG to do by hand. I started thinking about different ways to construct a PRNG mathematically. But, before creating an algorithm, I needed to identify all the points that make a good PRNG. A good PRNG should have:

  • An easy implementation.
  • High efficiency in calculating the pseudorandom values.
  • Long (practically un-observable) periods for most, if not all initial seeds.
  • A uniform distribution over the finite space.
  • No correlation between successive values.

I put a great deal of thought into it, but couldn't come up with anything I was very proud of. I thought of using trigonometric functions, various logarithm functions, geometric and algebraic expressions, and even fancy equations using derivatives. The more I thought about it, the further away I drifted from something simple that could be done by hand with pencil and paper.

The best I came up with, which required using a scientific calculator, was forcing the sequence to grow (a monotonically increasing function), then forcing it into a finite range with a modulus. However, no matter what I threw at it, I always struggled with either dealing with "0" or "1". For example, raising either "0" or "1" to the n-th power will always return a "0" or "1". I realized quickly that multiplication might be a problem. For example, one thought I had was the following:

S_i = floor((S_{i-1})^(3/2)) mod M

This works out fine, until your output is a "0" or "1", then the generator sits on either of those numbers indefinitely. I realized that my function should probably just stick with addition, or I'm bound to get myself into trouble. I thought, and thought about it, then it hit me. It was my "Beautiful Mind" moment.

I thought of the Fibonacci sequence.

The Fibonacci sequence is monotonically increasing for two seeds S1 and S2, where 0 < S1 < S2. If you put an upper bound on the sequence via a modulus, you can limit it to a finite space, and I can have my PRNG. However, I also know that the ratio between any two sequential numbers in the Fibonacci sequence approaches the Golden Ratio Phi. I'm not sure how this would affect my simple PRNG, and whether a correlation between successive numbers could be identified, but I started scribbling down numbers on a text pad anyway.

Immediately, however, I found something interesting: If both seeds are even, then the whole sequence of numbers would be even. For example, take the following Fibonacci PRNG:

S1 = 6, S2 = 8, mod 10
6 8 4 2 6 8 4 2 6 8 4 2 ...

There are two problems happening here- first, the period of the PRNG is 4 digits- 6, 8, 4, & 2. Second, because even numbers were chosen for the seeds, even numbers are the only possibility for the PRNG. So, either one of the seeds or the modulus must be odd, or the PRNG algorithm needs to be modified.
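The naive generator is easy to reproduce. Here is a minimal Python sketch, with a function name of my own choosing, that runs the Fibonacci recurrence under a modulus and shows both problems with the seeds above: the short period and the all-even output.

```python
def fibonacci_prng(s1, s2, modulus, count):
    """Return the first count values of the Fibonacci sequence mod modulus."""
    out = [s1, s2]
    while len(out) < count:
        # Each new value is the sum of the previous two, reduced by the modulus.
        out.append((out[-2] + out[-1]) % modulus)
    return out[:count]


values = fibonacci_prng(6, 8, 10, 12)
print(values)                             # [6, 8, 4, 2, 6, 8, 4, 2, 6, 8, 4, 2]
print(all(v % 2 == 0 for v in values))    # True
```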

At this point, I threw my hands up in the air, and said "screw it". I decided to see what history had discovered with simple PRNGs. Turns out, I wasn't far off. A Fibonacci sequence PRNG exists called the Lagged Fibonacci Generator. Here is how it works:

S_n = S_{n-j} ⊙ S_{n-k} mod M, 0 < j < k

Where "⊙" is any binary function, such as addition, subtraction, multiplication, or even the bitwise exclusive-or.

First off, it doesn't address the "all evens" problem with my naive generator. If addition is used to calculate the values, then at least one number in the seed must be odd. If multiplication is used, then at least k elements must be odd. However, what is interesting about this generator is that rather than picking the first and second elements of the list to calculate the random value (S_{i-1} and S_{i-2}), any j-th and k-th items in the list can be used (S_{i-j} and S_{i-k}). However, you must have at least k elements in the list as your seed before beginning the algorithm.

To simplify things, let's pick "j=3" and "k=7" with mod 10 addition. I need at least seven elements in the list, and at least one of them must be odd. I've always liked the phone number "867-5309", so let's use that as our seed. Thus, the first 10 steps of our generator would look like this:

j=3, k=7, mod 10 addition

        [j]       [k]
 1. 8 6 [7] 5 3 0 [9] => 7+9 = 6 mod 10
 2. 6 7 [5] 3 0 9 [6] => 5+6 = 1 mod 10
 3. 7 5 [3] 0 9 6 [1] => 3+1 = 4 mod 10
 4. 5 3 [0] 9 6 1 [4] => 0+4 = 4 mod 10
 5. 3 0 [9] 6 1 4 [4] => 9+4 = 3 mod 10
 6. 0 9 [6] 1 4 4 [3] => 6+3 = 9 mod 10
 7. 9 6 [1] 4 4 3 [9] => 1+9 = 0 mod 10
 8. 6 1 [4] 4 3 9 [0] => 4+0 = 4 mod 10
 9. 1 4 [4] 3 9 0 [4] => 4+4 = 8 mod 10
10. 4 4 [3] 9 0 4 [8] => 3+8 = 1 mod 10

Generated: 6 1 4 4 3 9 0 4 8 1

The following Python code should verify our results:

j = 3
k = 7
s = [8, 6, 7, 5, 3, 0, 9]  # the seed
generated = []
for n in range(10):
    out = (s[j-1] + s[k-1]) % 10  # the pseudorandom output
    s = s[1:] + [out]  # shift the array and append the new value
    generated.append(out)
print(" ".join(str(x) for x in generated))  # print the results

Running it verifies our results:

$ python lagged.py
6 1 4 4 3 9 0 4 8 1

It's a "lagged" generator, because "j" and "k" lag behind the generated pseudorandom value. Also, this is called a "two-tap" generator, in that you are using 2 values in the sequence to generate the pseudorandom number. However, a two-tap generator has some problems with randomness tests, such as the Birthday Spacings. Apparently, creating a "three-tap" generator addresses this problem. Such a generator would look like:

S_n = S_{n-j} ⊙ S_{n-k} ⊙ S_{n-l} mod M, 0 < j < k < l

Even though this generator isn't cryptographically secure (hint: it's linear), it meets the above requirements for a good PRNG, provided the "taps" are chosen carefully (the lags are exponents of a primitive polynomial), and the modulus is our traditional "power-of-2" (2^M, such as 2^32 or 2^64). Supposing we are using a two-tap LFG, it would have a maximum period of:

(2^k - 1) * k          if exclusive-or is used
(2^k - 1) * 2^(M-1)    if addition or subtraction is used
(2^k - 1) * 2^(M-3)    if multiplication is used (1/4 of the period of the additive case)
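To get a sense of scale, here is a minimal Python sketch that simply transcribes the three formulas as stated above. The function name is my own, and the values k=7 and M=32 are picked only for illustration. These are upper bounds, reached only when the taps are well chosen.

```python
def lfg_max_period(k, m_bits, op):
    """Maximum period of a two-tap LFG with longest lag k and modulus 2^m_bits."""
    base = 2**k - 1
    if op == "xor":
        return base * k
    if op in ("add", "sub"):
        return base * 2 ** (m_bits - 1)
    if op == "mul":
        return base * 2 ** (m_bits - 3)
    raise ValueError("op must be one of 'xor', 'add', 'sub', 'mul'")


for op in ("xor", "add", "mul"):
    print(op, lfg_max_period(7, 32, op))
```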

For a good LFG, it is found that a three-tap generator should be used, as a 3-element spacing correlation can be found in two-tap generators, and that initial taps should be very high for a large modulus. Further, the full mathematical theory hasn't been worked out on Fibonacci generators, so the quality of the generators rests mostly on the statistics of the generated output, and randomness tests.

However, this is simple enough to do by hand, if nothing else than to impress your friends.

]]>
https://pthree.org/2015/05/29/the-lagged-fibonacci-generator/feed/ 1
Financially Supporting Open Crypto https://pthree.org/2015/02/06/financially-supporting-open-crypto/ https://pthree.org/2015/02/06/financially-supporting-open-crypto/#respond Fri, 06 Feb 2015 21:14:36 +0000 https://pthree.org/?p=4032 In April 2014, Heartbleed shook the Internet. OpenSSL had introduced a feature called "TLS Heartbeats". Heartbeats allow for a client-encrypted session to remain open between the client and the server, without the need to renegotiate a new connection. In theory, the feature is sound. Heartbeats should minimize load on busy servers, and improve responsiveness on the client. However, due to a simple oversight in the code, buffers could be over-read, allowing the client to request much more data from the server's memory than needed. As a result, usernames and passwords cached in the server's memory could be leaked to the client.

This was a nasty bug, and it underscored how under-staffed and under-funded the OpenSSL development team is. OpenSSL is the de facto standard in securing data in motion for the Internet. It protects your web connections when visiting your bank's website, and it protects your email communication between your email client and the upstream mail server.

Ars Technica started off an article about tech giants finally agreeing to fund the OpenSSL development. Quote:

The open source cryptographic software library secures hundreds of thousands of Web servers and many products sold by multi-billion-dollar companies, but it operates on a shoestring budget. OpenSSL Software Foundation President Steve Marquess wrote in a blog post last week that OpenSSL typically receives about $2,000 in donations a year and has just one employee who works full time on the open source code.

If that isn't bad enough, Werner Koch, the sole developer and maintainer of the encryption software "GnuPG" is in much the same position as Steve Marquess. ProPublica put up a post about the very sobering financial situation of GnuPG. Quote:

The man who built the free email encryption software used by whistleblower Edward Snowden, as well as hundreds of thousands of journalists, dissidents and security-minded people around the world, is running out of money to keep his project alive.

Werner Koch wrote the software, known as Gnu Privacy Guard, in 1997, and since then has been almost single-handedly keeping it alive with patches and updates from his home in Erkrath, Germany. Now 53, he is running out of money and patience with being underfunded.

To understand just how critical this piece of software is to the Internet and the community at large, OpenPGP (the specification upon which GnuPG is built) is used by software developers around the world to prove the integrity of their software, when downloading it from their website. It's used by operating system vendors, such as Microsoft, Apple, Google, and GNU/Linux to provide package integrity when installing "apps" on your computer or mobile device. People and corporations have used it internally for data at rest as well, such as encrypting backups before sending them offsite.

Thankfully, after ProPublica published their article, Werner Koch, father and husband, got the donation funding he needed to continue focusing on it full time. Thanks to Facebook and Stripe, he has $100,000 of annual sponsored donations to help keep the development of GnuPG pressing forward.

Why is it that the two most fundamental cryptographic tools in our community are so under developed, under funded, and under staffed? I can understand that cryptography is hard. There is a reason why people get doctorate degrees in mathematics and computer science to understand this stuff. But with such critical pieces of infrastructure protection, you would think it would be getting much more attention than it is.

A good rule of thumb for cryptography, is if you want to protect your data in transit, use OpenSSL; if you want to protect your data at rest, use GnuPG. Let's hope that these two projects get the attention and funding they need to continue well into the future for years to come.

If you want to help these two projects, you can donate to GnuPG here and to OpenSSL here. Alternatively, there is a Flattr donation page for GnuPG where you can set up recurring donations here.

]]>
https://pthree.org/2015/02/06/financially-supporting-open-crypto/feed/ 0
Reasonable SSH Security For OpenSSH 6.0 Or Later https://pthree.org/2015/01/12/reasonable-ssh-security-for-openssh-6-0-or-later/ https://pthree.org/2015/01/12/reasonable-ssh-security-for-openssh-6-0-or-later/#comments Mon, 12 Jan 2015 13:00:54 +0000 https://pthree.org/?p=4006 As many of you have probably seen, Stribik András wrote a post titled Secure Secure Shell. It's made the rounds across the Internet, and has seen a good, positive discussion about OpenSSH security. It's got people thinking about their personal SSH keys, as well as the differences between ECC and RSA, why the /etc/ssh/moduli file matters, and other things. Because of that post, many people who use SSH are increasing their security when they get online.

However, the post does one disservice- it requires OpenSSH 6.5 or later. While this is good, and people should be running the latest stable release, there are many, many older versions of OpenSSH out there, that are still supported by the distro, such as Debian GNU/Linux 7.8, which ships OpenSSH 6.0. Most people will be using the release that ships with their distro.

As a side note, CentOS 5 ships OpenSSH 4.3, and CentOS 6 ships OpenSSH 5.3. Because these are very old releases, and CentOS is still providing support for them, you will need to check the man pages for OpenSSH, and see how your client and server configurations need to be adjusted. It won't be covered here.

So, with that in mind, let's look at OpenSSH 6.0, and see what it supports.

OpenSSH 6.0 Ciphers

The following is the default order for symmetric encryption ciphers:

  1. aes128-ctr
  2. aes192-ctr
  3. aes256-ctr
  4. arcfour256
  5. arcfour128
  6. aes128-cbc
  7. 3des-cbc
  8. blowfish-cbc
  9. cast128-cbc
  10. aes192-cbc
  11. aes256-cbc
  12. arcfour

CTR mode should be preferred over CBC mode, whenever possible. It can be executed in parallel, and it seems to be the "safer" choice over CBC, although its security margin over CBC is probably minimal. The internal mechanisms are simpler, which is why modes like EAX and GCM use CTR internally. With that said, CBC mode is not "unsafe", so there is no strong security argument to avoid it. However, both modern and older OpenSSH implementations support CTR mode, so there really is no need for CBC.

The "arcfour" protocols are "alleged RC4", but adhere to the RC4 RFC. RC4 has been showing weaknesses lately. Cryptographers have been advising to move off of it, PCI vendors will fail scans with SSL implementations that support RC4, and OpenBSD 5.5 switched to a modified ChaCha20 for its internal CSPRNG. So, it's probably a good idea to move away from the arcfour ciphers, even if it may not be practically broken yet.

However, arcfour is really the only high performance cipher in the OpenSSH 6.0 suite, and is very handy when trying to transfer many gigabytes of data over the network, as AES will pin the CPU before flooding the pipe (unless of course you have hardware AES on board). So, I would recommend the arcfour ciphers as a last resort, and only enable them on private networks, where you need the throughput.

The cast128 cipher (CAST-128, whose successor CAST-256 was an AES candidate) is a Canadian standard. To my knowledge, it does not have any near practical security attacks. However, because only CBC mode is supported with CAST, and not CTR mode, and we're disabling CBC mode, it is not included in our final list.

3DES was designed to address the short 56-bit key size of DES, which was later replaced by AES. 3DES cascades DES three times, with three distinct 56-bit keys. 3DES also does not have any near practical security attacks, and it is believed to be secure. However, DES was designed with hardware in mind, and is slow, slow, slow in software. 3DES is three times as slow. It's horribly inefficient. As such, I would recommend disabling 3DES.

Blowfish was designed by Bruce Schneier as a replacement for DES. While Blowfish might still have a considerable security margin, Blowfish suffers from attacks from weak keys. As such, Blowfish implementations must be careful when selecting keys. Blowfish can be efficient in both hardware and software, but it's usually less efficient than AES. Further, Bruce himself recommends that people stop using Blowfish and move to its successor Twofish, or even Threefish. As such, because both stronger and more efficient algorithms exist, I would recommend disabling Blowfish. It really isn't offering anything to OpenSSH clients.

So, in my opinion, I would sort my OpenSSH 6.0 ciphers like so:

  1. aes256-ctr
  2. aes192-ctr
  3. aes128-ctr
  4. arcfour256
  5. arcfour128
  6. arcfour
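The selection above boils down to two rules: keep the CTR ciphers, strongest key first, then keep the arcfour ciphers in their default order as a last resort. Here is a minimal Python sketch of those rules applied to the default list. The function names are my own, and this is only a way to double check the ordering, not anything OpenSSH itself provides.

```python
DEFAULT_CIPHERS = [
    "aes128-ctr", "aes192-ctr", "aes256-ctr", "arcfour256", "arcfour128",
    "aes128-cbc", "3des-cbc", "blowfish-cbc", "cast128-cbc", "aes192-cbc",
    "aes256-cbc", "arcfour",
]


def key_bits(name):
    """Pull the key size out of a cipher name, or 0 if none is present."""
    for size in (256, 192, 128):
        if str(size) in name:
            return size
    return 0


def preferred(ciphers):
    """CTR ciphers by descending key size, then arcfour in its default order."""
    ctr = sorted((c for c in ciphers if c.endswith("-ctr")),
                 key=key_bits, reverse=True)
    rc4 = [c for c in ciphers if c.startswith("arcfour")]
    return ctr + rc4


print("Ciphers " + ",".join(preferred(DEFAULT_CIPHERS)))
```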

OpenSSH 6.0 Key Exchange

The following is the default order for key exchange algorithms:

  1. ecdh-sha2-nistp256
  2. ecdh-sha2-nistp384
  3. ecdh-sha2-nistp521
  4. diffie-hellman-group-exchange-sha256
  5. diffie-hellman-group-exchange-sha1
  6. diffie-hellman-group14-sha1
  7. diffie-hellman-group1-sha1

The NIST curves are considered to be insecure. Not because NIST is some government agency tied with the NSA, but because the curves are not ECDLP rigid, lack constant-time single-coordinate single-scalar multiplication, aren't complete, and are distinguishable from uniform random strings. If you want to blame the NSA for rubber-stamping and backdooring the NIST ECC curves, fine. I'll stick with the crypto.

And, although the security margin gap is closing on SHA-1, some commercial SSH providers, such as Github may still require it for your SSH client. So, in your client config, I would put the preference on SHA-256 first, followed by SHA-1. On your own personal servers, you can disable the SHA-1 support completely.

Thus, I would recommend the following key exchange order:

  1. diffie-hellman-group-exchange-sha256
  2. diffie-hellman-group-exchange-sha1
  3. diffie-hellman-group14-sha1
  4. diffie-hellman-group1-sha1

OpenSSH 6.0 Message Authentication Codes

The following is the default order for message authentication codes:

  1. hmac-md5
  2. hmac-sha1
  3. umac-64@openssh.com
  4. hmac-ripemd160
  5. hmac-sha1-96
  6. hmac-md5-96
  7. hmac-sha2-256
  8. hmac-sha2-256-96
  9. hmac-sha2-512
  10. hmac-sha2-512-96

Things get interesting here, because with HMAC algorithms, successful attacks require breaking the preimage resistance on the cryptographic hash. This requires a complexity of 2^n, where "n" is the output digest size in bits. MD5 is 128-bits, and SHA-1 is 160-bits. All currently known attacks on MD5 and SHA-1 are collision attacks, and not preimage attacks. Collision attacks require a complexity of only 2^(n/2). Thus, for MD5, collision attacks require a complexity of only 64-bits at worst, and SHA-1 requires 80-bits. However, as we know now, MD5 collision resistance is fully broken in practical time with practical hardware. SHA-1 still remains secure, although its collision resistance has been weakened to 61-65-bits. This is almost practical.
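The arithmetic in that paragraph is just the generic bounds: 2^n work for a preimage and 2^(n/2) for a collision. Here is a minimal Python sketch that tabulates them. The function name is my own, and the numbers are the generic costs only, not the best known attacks (which, as noted above, are lower for MD5 and SHA-1 collisions).

```python
# Output digest sizes in bits.
DIGEST_BITS = {"MD5": 128, "SHA-1": 160, "SHA-256": 256, "SHA-512": 512}


def generic_attack_bits(digest_bits):
    """Generic work factors, as exponents of 2, for an n-bit hash."""
    return {"preimage": digest_bits, "collision": digest_bits // 2}


for name, bits in DIGEST_BITS.items():
    cost = generic_attack_bits(bits)
    print(f"{name}: preimage 2^{cost['preimage']}, collision 2^{cost['collision']}")
```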

Regardless, HMAC-MD5 and HMAC-SHA1 remain secure, with wide security margins, due to their preimage resistance. The only concern, however, is that in order to successfully break the preimage resistance of a cryptographic hash function, it requires first breaking its collision resistance. Because MD5 is broken in this regard, and SHA-1 is almost broken, it is advised to move away from any protocol that relies on MD5 or SHA-1. As such, even though HMAC-MD5 and HMAC-SHA1 remain very secure today, it would be best to disable their support. Interestingly enough, even though RIPEMD-160 has the same digest output space as SHA-1, it has no known collision weaknesses, and remains secure today, almost 20 years since its introduction.

Due to the almost practical collision attacks on SHA-1 with a complexity of 61-65 bits, UMAC-64 probably does not have a wide enough security margin. As such, it should probably be disabled.

I would recommend the following order for your MACs:

  1. hmac-sha2-512
  2. hmac-sha2-256
  3. hmac-ripemd160
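To make concrete what the first MAC on this list computes, here is a small sketch using Python's standard hmac module. It is not how OpenSSH is implemented, and the key and message are hypothetical placeholders; the real protocol feeds more than the bare payload into the MAC. It only shows that a tag is bound to both the shared key and the exact message bytes.

```python
import hashlib
import hmac

def make_tag(key, message):
    """Compute an HMAC-SHA2-512 tag over message with a shared secret key."""
    return hmac.new(key, message, hashlib.sha512).digest()

def check_tag(key, message, tag):
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)

key = b"hypothetical shared session key"
tag = make_tag(key, b"packet payload")
print(len(tag) * 8)                             # tag size in bits
print(check_tag(key, b"packet payload", tag))   # unmodified message
print(check_tag(key, b"packet payl0ad", tag))   # tampered message
```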

OpenSSH 6.0 Configuration

Okay. Now that we have everything ironed out for hardening our OpenSSH 6.0 connections, let's see how this would look in the client and on the server. Both the client config and the server config should support algorithms for both OpenSSH 6.0 and 6.7.

For an OpenSSH 6.0 client, I would recommend this config:

# OpenSSH 6.0 client config
Host *
    Ciphers aes256-ctr,aes192-ctr,aes128-ctr,arcfour256,arcfour128,arcfour
    KexAlgorithms diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
    MACs hmac-sha2-512,hmac-sha2-256,hmac-ripemd160

For an OpenSSH 6.0 server, I would recommend this config:

# OpenSSH 6.0 server config
Ciphers aes256-ctr,aes192-ctr,aes128-ctr,arcfour256,arcfour128,arcfour
KexAlgorithms diffie-hellman-group-exchange-sha256
MACs hmac-sha2-512,hmac-sha2-256,hmac-ripemd160

Going back now to Stribik András' post, here is what your configurations would look like for OpenSSH 6.7:

ChaCha20-Poly1305 is a high performance cipher, similar to RC4. So we should prefer it as our first cipher, with AES following, and finally disable RC4. For an OpenSSH 6.7 client, I would recommend this config:

# OpenSSH 6.7 client config
Host *
    Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
    KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1
    MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-ripemd160-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256,hmac-ripemd160,umac-128@openssh.com

For an OpenSSH 6.7 server, I would recommend this config (also disabling SHA-1 from the key exchanges):

# OpenSSH 6.7 server config
Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-ripemd160-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256,hmac-ripemd160,umac-128@openssh.com
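The order in these lists matters because of how SSH picks an algorithm: per RFC 4253, the cipher and the MAC used are the first names on the client's list that also appear on the server's list (key exchange selection has some extra conditions that are not modeled here). The following is a rough sketch of that rule, with a hypothetical helper name, applied to the 6.7 client and 6.0 server MAC lists from the configs above.

```python
def negotiate(client_prefs, server_prefs):
    """Return the first client preference the server also offers, or None."""
    offered = set(server_prefs)
    for name in client_prefs:
        if name in offered:
            return name
    return None

# MACs line from the OpenSSH 6.7 client config above.
client_67_macs = (
    "hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,"
    "hmac-ripemd160-etm@openssh.com,umac-128-etm@openssh.com,"
    "hmac-sha2-512,hmac-sha2-256,hmac-ripemd160,umac-128@openssh.com"
).split(",")

# MACs line from the OpenSSH 6.0 server config above.
server_60_macs = "hmac-sha2-512,hmac-sha2-256,hmac-ripemd160".split(",")

print(negotiate(client_67_macs, server_60_macs))
```

Because the 6.0 server offers none of the "-etm" variants, the sketch falls through to the first plain HMAC both sides share.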

Conclusion

It's important that you pay attention to the versions of the clients and servers that you are using, so you can accurately set your configuration. In this case, we looked at what would be necessary to support OpenSSH versions 6.0 and 6.7. There may be slight differences in versions between those two, and you'll need to make the necessary adjustments.

Verifying Keybase Identities https://pthree.org/2015/01/02/verifying-keybase-identities/ https://pthree.org/2015/01/02/verifying-keybase-identities/#respond Fri, 02 Jan 2015 14:37:06 +0000 https://pthree.org/?p=3988 When using Keybase, occasionally, people will track your identity. This has cryptographic value. Your identity on Keybase is based on what you do online and how long you have done it. As people track you, they cryptographically sign your Keybase identity. This creates a snapshot in time that states you've taken the precautions to verify the identity, by checking the digital signature of each of their online proofs. This snapshot is frozen in time, and the more people track your identity, the stronger the statement of the validity of that identity. In other words, Keybase complements the PGP Web of Trust, without actually replacing key signing parties, or actually signing PGP keys.

In this post, I want to discuss what it takes to verify signatures of Keybase identity proofs, so you can verify that Keybase isn't doing anything sneaky with the data. As an example, I am going to verify the identity proofs of a friend of mine, Joshua Galvez, checking each identity proof out-of-band (not using the Keybase client software).

First, all identity proofs are stored in JSON, which is a standardized format. The JSON object is cleanly formatted for easy readability, so you can examine what has been signed, and exactly what you are verifying. Nothing should be hidden up Keybase's sleeves. To start, I am going to navigate to Josh's Keybase identity page. I see that he has proved he owns a Twitter account, a Github account, a reddit account, and a personal website, all with his personal OpenPGP key.

To verify the proofs, I need to get a physical copy of the statement. Again, I am going to do this all out-of-band, away from the Keybase client software. As such, I'll copy and paste each statement proof into a text editor, and save it to disk, as well as each PGP signature. I'll do this with his Twitter account as an example.

Because of the brevity of Twitter, a full JSON object with a PGP signature can't be sent. So, Keybase keeps this proof on their server, with a link in the tweet pointing to the proof. So, we'll need to get it there. The link in his tweet points to https://keybase.io/zevlag/sigs/0Pl859RFLHZuEi7ozQyrbT1cphZCxYQMuoyM. There is a "Show the proof" link on the page, which gives me all the necessary data for verifying his identity. All I need is his JSON object and his PGP signature. I need to combine them in a single file, and save it to disk. As such, my file will look like this:

{
   "body": {
      "client": {
         "name": "keybase.io node.js client",
         "version": "0.7.3"
      },
      "key": {
         "fingerprint": "12c5e8619f36b0bb86b5be9aea1f03e20cf2fdbd",
         "host": "keybase.io",
         "key_id": "EA1F03E20CF2FDBD",
         "uid": "2b26e905f5b23528d91662374e840d00",
         "username": "zevlag"
      },
      "service": {
         "name": "twitter",
         "username": "zevlag"
      },
      "type": "web_service_binding",
      "version": 1
   },
   "ctime": 1416507777,
   "expire_in": 157680000,
   "prev": null,
   "seqno": 1,
   "tag": "signature"
}
-----BEGIN PGP MESSAGE-----
Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
Comment: GPGTools - https://gpgtools.org

owGbwMvMwMVYnXN9yvZMuQbG0wdeJjGE5BnOrlZKyk+pVLKqVkrOyUzNKwGx8hJz
U5WslLJTK5MSi1P1MvMV8vJTUvWyihWganSUylKLijPz84CqDPTM9YyVanVAykGa
0zLz0lOLCooyQWYpGRolm6ZamBlaphmbJRkkJVmYJZkmpVompiYaphkYpxoZJKcZ
paUkpQCNzMgvLkGxVQlsZnxmClDU1dHQzcDY1cjA2c3IzcXJBShXCpYwSjIyS7U0
ME0zTTIyNjWySLE0NDMzMjY3SbUwMUgxMAApLE4tgnqpKrUsJzEd5FqgWFlmciqS
d0vKM0tKUotwaSipLAAJlKcmxUP1xidl5qUAfYscHIZAlcklmSC9hiaGZqYG5kCg
o5RaUZBZlBqfCVJham5mYQAEOkoFRallSlZ5pTk5IPcU5uUDZYEWAe2zUirOTM9L
LCktSlWq7WSSYWFg5GJgY2UCxRgDF6cALB6Z5wowNEXaavBftOToU3lx8YCcUIzU
Y5FN3LxXjrbNOum2oiQyrMMszfWM5Kz+D3O5ZZJabOUO2v99UPBMRmqZy6ZLdh0n
1t92OPT+7ILwL2+Y+rfoLDJxufXh7ykfZos1L66fVhc+e1HOw9rEuChW+eBJkbCE
3y42k3yXNJ5sSpDc9Ujhcewsw3nuM86G/8tbzGUo9OSERcif0wou4Qtnnj2cs6Nz
nk3qLUvHWRfX753v+mPlNMm2hXZbbjzIF3XaK93JZj/5im6QyA3fjy6CbKGtqi1X
Am+d+ljtGD0h5MPRDLPG3Xtmiqps3Hfo7fGvUUwF6gnzHm7xdD42banr8wMHfZ6q
a26o3nPras+GHwERx898vvTq14lfxgKc7wXOXEl30bN7KTa/TERMXK2nafZT9qOb
7sSvSnjLsj9jw1nVGaKcvtd8855Xr6/eySwtsjPOapVT85JPisu3l/ten/XqxPa8
mRxWAvrLdl9NXNS46zYPW61MwJQPYQfYN3DxTX3ucMO5Qd/EftWeF0depDlsme7S
pTmr0Eey8sqkPo9f63/yfFgg9GPP3PY/22KP7+2Ifm7/N6FZ95xh2bXXM9dPvJ79
gCVVbH5m4OKOTI3aD9bWnVekglJLpwboJb6dfPC4V0vmpgWxb0w49JfE3j8fHHNa
jTnvgo3T+vzZ4mcOG06+/2CpebSKtmlTyIyYPzYbDXWczSPVPN28Yhr7rwIA
=V5X5
-----END PGP MESSAGE-----

I'll save this to disk as /tmp/zevlag-twitter.txt
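Before handing the file to gpg, it can help to look at what the JSON statement actually claims. The sketch below splits a saved proof into its JSON part and its armored PGP part and pulls out the fields that tie a key to an account. The function names are hypothetical, and it assumes the file is laid out exactly as shown above, with the JSON first and the PGP message after it.

```python
import json

PGP_HEADER = "-----BEGIN PGP MESSAGE-----"

def split_proof(text):
    """Split a saved proof into (parsed JSON statement, armored PGP block)."""
    json_part, sep, rest = text.partition(PGP_HEADER)
    if not sep:
        raise ValueError("no PGP message found in proof text")
    return json.loads(json_part), sep + rest

def summarize(statement):
    """Pull out the fields that say which key claims which account."""
    body = statement["body"]
    return {
        "keybase_user": body["key"]["username"],
        "fingerprint": body["key"]["fingerprint"],
        "service": body["service"]["name"],
        "service_user": body["service"]["username"],
        # expire_in is in seconds; convert to 365-day years.
        "years_valid": statement["expire_in"] / (365 * 24 * 3600),
    }
```

Reading /tmp/zevlag-twitter.txt into a string and passing it through split_proof() and then summarize() would list the Keybase username, key fingerprint, and Twitter username that the signature covers.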

Now, I just need Josh's public PGP key imported from a key server. I can, and should use Keybase here. Instead of using the MIT PGP key server, and running the risk of getting the wrong key, I can be reasonably confident I will get the correct key from Keybase. The raw public key can be accessed by appending "key.asc" at the end of their identity URL. So, in this case https://keybase.io/zevlag/key.asc. So, I'll grab it via the shell:

$ wget -O - https://keybase.io/zevlag/key.asc 2> /dev/null | gpg --import -

Now that I have Josh's public key imported into my GPG public key ring, I am ready to verify Josh's Twitter proof of identity:

$ gpg --verify /tmp/zevlag-twitter.txt
gpg: Signature made Thu 20 Nov 2014 11:23:23 AM MST using RSA key ID B7691E80
gpg: Good signature from "Joshua Galvez <josh@zevlag.com>"
gpg:                 aka "Joshua Galvez (Work - Emery Telcom) <jgalvez@emerytelcom.com>"
gpg:                 aka "keybase.io/zevlag <zevlag@keybase.io>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 12C5 E861 9F36 B0BB 86B5  BE9A EA1F 03E2 0CF2 FDBD
     Subkey fingerprint: DC35 E3CF 1179 41A9 7D72  BC9A 7B6C D794 B769 1E80

At this point, I can confirm that the owner of the private key for 0xEA1F03E20CF2FDBD cryptographically signed a JSON object for Twitter. Further, that individual has access to the Twitter account, so the signature can be posted. After verifying the other accounts, I can be reasonably confident that the individual is who they claim to be: Josh Galvez. Otherwise, an attacker has successfully compromised all of Josh Galvez's online accounts, as well as his OpenPGP key (or forged a new one), and either compromised his Keybase account, or created one masquerading as him. The former seems more likely than the latter. Further, because I have previously met with and engaged online with Josh, I have no doubt that this is indeed Josh Galvez, and 0xEA1F03E20CF2FDBD is indeed his public key.
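One check worth doing by hand is that the key gpg reports is the key named inside the signed JSON. A minimal sketch of that comparison follows, using the values copied from the JSON object and the gpg output shown above; the helper name is illustrative only.

```python
def normalize_fingerprint(fingerprint):
    """Strip whitespace and lowercase a fingerprint as printed by gpg."""
    return "".join(fingerprint.split()).lower()

# Values copied from the JSON object and the gpg output shown above.
json_fingerprint = "12c5e8619f36b0bb86b5be9aea1f03e20cf2fdbd"
json_key_id = "EA1F03E20CF2FDBD"
gpg_primary = "12C5 E861 9F36 B0BB 86B5  BE9A EA1F 03E2 0CF2 FDBD"

same_key = normalize_fingerprint(gpg_primary) == json_fingerprint
# The long key ID is the last 16 hex digits of the fingerprint.
same_id = json_fingerprint[-16:] == json_key_id.lower()
print(same_key, same_id)
```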

So, I can now track Josh through Keybase, which means cryptographically signing his Keybase identity, and creating a snapshot in time that says "I am reasonably sure this is Josh Galvez, these accounts are part of his online presence, and 0xEA1F03E20CF2FDBD is his OpenPGP key." Staying out of band from the Keybase client software, I can do this entirely with curl(1) and gpg(1).

Navigating to his Keybase identity, I'll click the "Track zevlag" button. A pop-up displays with the following options:

  • in the browser
  • command line with keybase
  • command line with [bash + GPG + cURL]

I have not integrated an encrypted copy of my private key with Keybase, so tracking Josh in the browser is unavailable to me. Further, I wish to do this out-of-band from Keybase anyway, so I'll select "command line with [bash + GPG + cURL]" and click "Continue". This displays that I need to copy and paste the following content into my shell:

echo '{"body": (... large JSON object snipped ....) }' | \
gpg -u 'e0413539273a6534a3e1925922eee0488086060f' -a --sign | \
perl -e '$_ = join("", <>); s/([^\w\.@ -])/sprintf("%%%2.2x",ord($1))/eg; s/ /+/g; print("sig=", $_)' | \
curl -d @- \
  -d type=track \
  -d session=lgHZIDg3ZWNjY2NiNTRiMTBiNThjOTQ2NDJhODA3MzM2NjAwzlSh4WnOAeEzgNkgZjZmNWVmZDg4YzcwZDI2NDNlZGY2ZWYyYTc3M2IyMDLEIM0QqHGrtfga4a%2Bnz7soXFHqFbbiio7PaVGjh7DfyyPG \
  -d csrf_token=lgHZIDg3ZWNjY2NiNTRiMTBiNThjOTQ2NDJhODA3MzM2NjAwzlSkLg7OAAFRgMDEIA8egS4XVUzH%2BkPY8pMJbmMFiN3%2BAdZEdTm7Buvm551L \
  -d plain_out=1 \
  -d uid=2b26e905f5b23528d91662374e840d00 https://keybase.io/_/api/1.0/follow.json

After entering that into my shell, and hitting enter, I am presented with typing in my passphrase for my private key, which in turn signs the object, and uses the Keybase API to post the result. I can then reload my profile, and see that I am now tracking Josh with Keybase. This means that at this point in time, I have made a cryptographic statement regarding the key ownership and identity of Joshua Galvez. Of course, I can revoke that statement at any time, if for any reason I believe his account has become compromised, he himself has become untrustworthy, or for other reasons.
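The Perl step in that pipeline is the least obvious part: it form-encodes the armored signature so curl can post it as the "sig" field. Here is a rough Python equivalent of the two substitutions, written only to show what they do. The function name is hypothetical, and the sketch assumes ASCII input, which holds for an ASCII-armored signature.

```python
import re

def form_encode(text):
    """Mimic the two Perl substitutions in the command above."""
    # Percent-encode everything except word characters, '.', '@', space and '-'.
    encoded = re.sub(
        r"[^\w.@ -]",
        lambda m: "%{:02x}".format(ord(m.group(0))),
        text,
        flags=re.ASCII,
    )
    # Then turn the remaining spaces into '+'.
    return encoded.replace(" ", "+")

print("sig=" + form_encode("-----BEGIN PGP MESSAGE-----\nowGb+/=\n"))
```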

Keybase and The PGP Web of Trust https://pthree.org/2014/12/31/keybase-and-the-pgp-web-of-trust/ https://pthree.org/2014/12/31/keybase-and-the-pgp-web-of-trust/#respond Wed, 31 Dec 2014 09:16:55 +0000 https://pthree.org/?p=3985 Recently, I have been playing with my Keybase account, and I thought I would weigh in on my thoughts about it compared to the PGP Web of Trust (WoT).

The PGP WoT tries to solve the following two problems directly:

  1. You have the correct key of the person with whom you wish to communicate.
  2. You have verified that the owner of that key is who they claim to be.

These two problems are solved through key signing parties. Two or more people will meet up, exchange key fingerprints, then verify personal identity, usually through government issued identification. Unfortunately, the PGP WoT is complex, and in practice, rarely, if ever used. The idea behind using the PGP WoT is this:

  • I have verified Adam's identity and confirmed I have his correct key.
  • I cryptographically signed his key as a statement of this verification.
  • Adam cryptographically signed Bruce's key, issuing a similar statement.
  • I haven't met Bruce, but I have met Adam, and trust him.
  • Through Adam, I can make a statement about Bruce's claim to identity.

In practice, if I wished to communicate securely with Bruce, I would see if Bruce's key has signatures of individuals that I have cryptographically signed. If so, I can make a weak statement about his identity, and the ownership of his key through that signature. From that standpoint, I can then determine if I wish to communicate securely with Bruce, or not.
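Looking for such a chain is a path search over the directed graph of signatures. A minimal sketch, assuming a hypothetical in-memory map from each signer to the people whose keys they have signed (real keyrings and GnuPG's trust model are more involved than this):

```python
from collections import deque

def trust_path(signatures, start, target):
    """Breadth-first search for a chain of key signatures from start to target.

    signatures maps each signer to the set of key owners they have signed.
    Returns the shortest chain as a list of names, or None if there is none.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for signed in sorted(signatures.get(path[-1], ())):
            if signed not in seen:
                seen.add(signed)
                queue.append(path + [signed])
    return None

# The example from the list above: I signed Adam's key, Adam signed Bruce's.
signatures = {"me": {"Adam"}, "Adam": {"Bruce"}}
print(trust_path(signatures, "me", "Bruce"))
```

Signatures are one-directional, so the same graph gives no path from Bruce back to me.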

Since using GnuPG these past 10 years, I have probably really used the PGP WoT only 2-3 times. Other than that, it makes for a sweet-looking directed graph.

Keybase is not a PGP WoT replacement. That is, it's not here to replace key signing parties, and it's not a tool for signing others' keys. However, Keybase does make strong statements regarding key ownership and identity. In fact, Keybase has given up on the PGP WoT entirely. Rather than validating government issued identification cards in person, Keybase solves identity through online social proofs. This is handled by what you have accomplished online and how long you have been using the account.

Looking first at accomplishing online tasks. When a user signs up for an account at Keybase, they need to prove identities that they own on the web. This is done by inserting some text at the online account, then cryptographically signing it with your private PGP key, and storing the signature at Keybase. This establishes a relationship between the owner of the PGP key and the online account. The more online accounts that the user can establish, the stronger the proof of identity for that individual.

Currently, accounts can be:

  • Twitter
  • Reddit
  • Hacker News
  • Coinbase
  • Github
  • Websites

For each of these accounts, I can pull down the notice, and verify the signature. Thus, each online account becomes coupled with the owner's PGP key. But, it's important to understand that this is making a statement of online activity. IE- "This is my Twitter account @AaronToponce, and I am Aaron Toponce."

Once the accounts have been proved, you can then make statements about other identities through "tracking". Tracking on Keybase is similar to "following" on other social sites, but it's actually cryptographically useful. Each account has a database object of their online identities (all cryptographically signed remember), among other data, including who they are tracking, and who is tracking them.

When you track someone, you cryptographically sign their identity with your personal PGP key. The previous signature is part of that identity, as well as the current signature. Each time someone is tracked, their identity gets cryptographically updated, and anyone can see when those signatures took place. Think of tracking like cryptographic snapshots, or digital photographs.

Tracking is useful for people with whom you wish to communicate, or whom you are interested in "following" online. By looking at the previous snapshots, you can get a sense of the age of that account. The older the account, and the more people tracking the account, the stronger the statement of identity, and that the account has not been compromised. Should the account get compromised at any time, people can revoke their tracking snapshot, thus removing the statement of identity.

Will Keybase improve the overall PGP WoT? I hope so. Currently, the accounts that you can make verifiable proofs with are limited, and you'll notice the Big Players like Google, Facebook, and Pinterest are missing. Currently Keybase is in limited invite-only alpha testing, so it makes sense why those accounts have not been brought into the system yet. However, Keybase will remain only a "geek it up" thing until those services are included in identity proofs. So, if Keybase wants to improve things with PGP in general, it must get those accounts on board, or it won't make a ripple in the world at large.

Oh, and the Keybase client is Free Software.

SHA512crypt Versus Bcrypt https://pthree.org/2014/12/26/sha512crypt-versus-bcrypt/ https://pthree.org/2014/12/26/sha512crypt-versus-bcrypt/#comments Fri, 26 Dec 2014 18:03:52 +0000 https://pthree.org/?p=3968 On the Internet, mostly in crypto circles, you'll see something like the following in a comment, a forum post, on a mailing list, or otherwise:

Do not use fast hashes to store passwords on disk. Use bcrypt.

In most cases, however, the understanding of why to use bcrypt isn't entirely clear. You'll hear the standard answer "It's slow", without a real understanding as to "how slow?" nor as to "why is it slow?". In this post, I want to explain why bcrypt is slow, some misconceptions about using fast hashes, and where the real strength of bcrypt lies (hint: it's not speed). Finally, I'll close with an alternative that many are starting to talk about as a possible replacement to bcrypt.

First, when people are talking about using bcrypt for password hashing, they are referring to the bcrypt cryptographic key derivation function, designed by Niels Provos and David Mazières. Bcrypt is designed to be intentionally slow and expensive. It was designed specifically with password storage in mind. The motivation is clear- if a password database of any kind is leaked to the Internet, it should be cost prohibitive for password crackers to make any sort of progress recovering the unknown passwords from the known hashes.

bcrypt algorithm
How does bcrypt work though? What is the algorithm? According to the paper, the core bcrypt function in pseudocode is as follows:

bcrypt(cost, salt, input)
    state = EksBlowfishSetup(cost, salt, input)
    ctext = "OrpheanBeholderScryDoubt" //three 64-bit blocks
    repeat (64)
        ctext = EncryptECB(state, ctext) //encrypt using standard Blowfish in ECB mode
    return Concatenate(cost, salt, ctext)

The first function, "EksBlowfishSetup(cost, salt, input)" in the algorithm is defined as follows:

EksBlowfishSetup(cost, salt, key)
    state = InitState()
    state = ExpandKey(state, salt, key)
    repeat (2^cost) // exponential cost by powers of 2
        state = ExpandKey(state, 0, key)
        state = ExpandKey(state, 0, salt)
    return state

In the "EksBlowfishSetup", you'll notice the "repeat" step uses a binary exponential parameter. As the cost is increased, the time it will take to finish the algorithm will take exponentially longer. Bcrypt was designed with this cost parameter to adjust for Moore's law. As computing strength continues to improve, bcrypt should be flexible in its design to adjust for those advancements. This is why the cost parameter is baked into bcrypt, and why people call it "slow".
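To put numbers on that exponential growth, this small sketch counts how many times the EksBlowfishSetup pseudocode above calls ExpandKey for a given cost. The function name is illustrative; it counts calls only and does no actual Blowfish work.

```python
def expand_key_calls(cost):
    """Count ExpandKey calls made by EksBlowfishSetup in the pseudocode above."""
    # One call before the loop, then two per iteration of a 2^cost loop.
    return 1 + 2 * 2 ** cost

for cost in (6, 10, 12):
    print(cost, expand_key_calls(cost))
```

Each increment of the cost roughly doubles the count, which is the doubling in run time one would expect to see when timing bcrypt at successive costs.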

Finally, you'll notice the "ExpandKey(state, salt, key)" function in the algorithm. It is defined as follows:

ExpandKey(state, salt, key)
    for(n = 1..18)
        P_n = key[32(n-1)..32n-1] XOR P_n //treat the key as cyclic
    ctext = Encrypt(salt[0..63])
    P_1 = ctext[0..31]
    P_2 = ctext[32..63]
    for(n = 2..9)
        ctext = Encrypt(ctext XOR salt[64(n-1)..64n-1]) //encrypt using the current key schedule and treat the salt as cyclic
        P_(2n-1) = ctext[0..31]
        P_(2n) = ctext[32..63]
    for(i = 1..4)
        for(n = 0..127)
            ctext = Encrypt(ctext XOR salt[64(n-1)..64n-1]) //as above
            S_i[2n] = ctext[0..31]
            S_i[2n+1] = ctext[32..63]
    return state

Because bcrypt was based on Blowfish, the "ExpandKey(state, 0, key)" function used in the "EksBlowfishSetup" function is the same as regular Blowfish key schedule since all XORs with the all-zero salt value are ineffectual. The bcrypt "ExpandKey(state, 0, salt)" function is similar, but uses the salt as a 128-bit key.

Also, to clarify, a 128-bit salt is also baked into the algorithm, as you can see. This is to prevent the building of lookup tables for bcrypt, such as rainbow tables. Salts do not slow down crackers, and it's assumed that salts will be leaked with the database. All salts provide is the protection against using a hash lookup table to find the originating plaintext. Because salts are baked into bcrypt, bcrypt lookup tables will never exist. This forces password crackers to brute force the hash.
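A short sketch of why a salt defeats a precomputed table. A single SHA-256 call stands in for the password hash here purely for illustration; it is not a recommendation, and the helper name is hypothetical.

```python
import hashlib
import os

def salted_digest(password, salt):
    """Hash the salt followed by the password with SHA-256."""
    return hashlib.sha256(salt + password).hexdigest()

password = b"password"
salt_a = os.urandom(16)   # 128-bit random salts, the size bcrypt uses
salt_b = os.urandom(16)

# The same password stored under two salts yields two unrelated digests,
# so a table precomputed without the salt matches neither of them.
print(salted_digest(password, salt_a))
print(salted_digest(password, salt_b))
print(hashlib.sha256(password).hexdigest())
```

An attacker who has the leaked salt can still recompute the digest for any single guess, which is why the salt does nothing to slow down brute force.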

Understanding password security
There are a few key security elements related to passwords that you must understand. They are the following:

  1. The unpredictability measurement, aka "entropy", of the password provided by the user.
  2. The speed at which brute forcing passwords can commence.
  3. The cryptographic strength of the function itself.

I ordered these for a specific reason: the most likely "weak link" in the chain of password security is the password the user provides. The history of leaked password databases has shown us that. In practice, 70-80% or more of the passwords in a leaked database are recovered, because this simple concept is not applied. If users understood the real strength behind passwords, they would understand the basic concepts of entropy, even if they weren't familiar with the term itself. If entropy levels were high in all users' passwords, no matter what, then recovering passwords from hashes via brute force would be ineffective. The speed at which password crackers brute forced their way through the hashes in the database would no longer matter, because no amount of practical computing power would be able to work fast enough, before the heat death of the Universe, to recover the user's password.

Sadly, this just isn't the case. People suck at picking passwords.
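For a sense of scale, the entropy of a password whose characters are picked uniformly at random is its length times the base-2 logarithm of the alphabet size. The sketch below computes that upper bound; passwords chosen by humans fall well short of it, which is the whole problem. The function name and the example alphabets are illustrative assumptions.

```python
import math

def entropy_bits(length, alphabet_size):
    """Entropy of a password whose characters are chosen uniformly at random."""
    return length * math.log2(alphabet_size)

# 26 lowercase letters versus the 95 printable ASCII characters.
print(round(entropy_bits(8, 26), 1))    # 8 random lowercase letters
print(round(entropy_bits(8, 95), 1))    # 8 random printable characters
print(round(entropy_bits(16, 95), 1))   # 16 random printable characters
```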

Key stretching
So, we need to compensate for users picking bad passwords, and bcrypt makes a great leap in this regard. Because of the cost parameter which is part of the algorithm, we can adjust the cost to make password hashing intentionally slow. And, as computing power increases, the cost parameter can continue to be adjusted to compensate. This is what most people understand, when they claim that "bcrypt is slow".

The argument is that cryptographic hashes are designed to be fast, fast, fast. And they're right. Cryptographic hash functions are designed to provide data integrity regardless of the size of the input. If I have a 4.7 GB CD image, I should be able to calculate its digest in reasonable time, so when I transfer the image to another computer, I can recalculate the digest, and compare that the two digests match, in reasonable time.

This would seem like a Bad Thing for password storage, because passwords are short (much shorter than 4.7 GB at least), so password crackers would be able to guess millions or billions of passwords per second using a fast cryptographic hash. You're right, for the most part. Ars Technica ran a story on password cracking with a 25-GPU cluster. It achieves a speed of 350 billion NTLM passwords per second, which means every possible eight-character Windows NTLM password can be recovered in less than 6 hours using this behemoth. It can work MD5 at 180 billion per second, or 63 billion per second with SHA-1.
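The six-hour figure can be sanity-checked with a few lines of arithmetic. The search space here is an assumption: passwords of exactly eight characters drawn from the 95 printable ASCII characters, which is the space that makes the figure work out at the quoted NTLM rate.

```python
def hours_to_exhaust(alphabet_size, length, guesses_per_second):
    """Hours needed to try every password of exactly the given length."""
    return alphabet_size ** length / guesses_per_second / 3600

# Rates quoted above for the 25-GPU cluster, in guesses per second.
rates = {"NTLM": 350e9, "MD5": 180e9, "SHA-1": 63e9}
for name, rate in rates.items():
    print(name, round(hours_to_exhaust(95, 8, rate), 1))
```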

At these speeds, the argument against using fast cryptographic hashes to store passwords is sounding pretty strong. Except, that Ars Technica article, and most bcrypt fanboys seem to overlook one thing- key stretching. Just because cryptographic hashes are fast, doesn't mean we can't intentionally slow them down. This is key stretching.

Key stretching is the idea that you reuse internal state for calculating a new key. For cryptographic hash functions, this is "iterations" or "rotations". The idea is taking the cryptographic digest of the input, and using this digest as the input for another hashing round. In pseudocode, you could think of it like this:

salt = random() // programmatically determine a salt randomly
password = input() // get the password from the user
key = ''
cost = 5000

for ROUND in 1 to cost: do
    digest = SHA512(salt, password,  key)
    key = digest
done

If our 25-GPU cluster could work through 50 billion SHA-512 cryptographic hashes per second, by forcing 5,000 SHA-512 calculations before getting to the desired hash, our 25-GPU cluster can now only work through 10 million SHA-512 hashes per second. As the iterative count is increased, the time it takes to calculate the resulting digest increases. As such, we have created a "sha512crypt" that has a similar cost parameter as bcrypt. Now the question remains- does it hold up?
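The pseudocode and the arithmetic above translate almost directly into Python. This is a minimal sketch of the idea only: the real sha512crypt used for /etc/shadow is a more involved construction, and the function names here are hypothetical.

```python
import hashlib
import os

def stretch(password, salt, cost):
    """Feed each SHA-512 digest back in as input for the next round."""
    key = b""
    for _ in range(cost):
        key = hashlib.sha512(salt + password + key).digest()
    return key

def stretched_rate(raw_hashes_per_second, cost):
    """Password guesses per second once each guess costs `cost` hashes."""
    return raw_hashes_per_second / cost

salt = os.urandom(16)
print(stretch(b"password", salt, 5000).hex())
print(stretched_rate(50e9, 5000))   # the 25-GPU example from the text
```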

Practical examples
To see if this "key stretching" idea holds up, I wrote two Python scripts: one using SHA-512, and the other using bcrypt. In both cases, I started the cost parameter at a reasonable value, and increased it well beyond a reasonable expectation.

Here is my Python code for "test-sha512.py":

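#!/usr/bin/python3
import hashlib
password = b'password'
cost = 5000
key = b''  # empty on the first round, then the previous digest
# a single running hash object: each update() appends to everything fed in so far
m = hashlib.sha512()
for i in range(cost):
    m.update(key + password)
    key = m.digest()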

And here is my Python code for "test-bcrypt.py":

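#!/usr/bin/python3
import bcrypt
cost = 6
password = b'password'
# first call: generate a random salt and hash the password at the given cost
hashed = bcrypt.hashpw(password, bcrypt.gensalt(cost))
# second call: re-hash with the salt and cost embedded in the first result,
# which is how a stored hash is checked at login
check = bcrypt.hashpw(password, hashed)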

In both cases, I incremented the cost, then timed re-running the script. Of course, Python is an interpreted language, so the absolute times would be much lower if this were implemented in C, or assembly. Further, this was done on my aging T61 laptop. Running it on my 8-core i7 workstation with triple-channel DDR3 would show improved times. It's not the times that are critical. What is critical is seeing the exponential back-off as the cost is increased.

Here is a table detailing my findings. Notice that the bcrypt cost increments by one each time. Because it's binary exponential back-off, the times roughly double at each increment. I also adjusted the sha512crypt cost a little to more closely match the bcrypt timings, even though it's not a strict doubling of each cost value.

        bcrypt               sha512crypt
cost    time          iterations    time
6       0.032              5,000    0.036
7       0.045             10,000    0.047
8       0.064             20,000    0.064
9       0.114             40,000    0.105
10      0.209             80,000    0.191
11      0.384            160,000    0.368
12      0.745            320,000    0.676
13      1.451            640,000    1.346
14      2.899          1,280,000    2.696
15      5.807          2,560,000    5.347
16      11.497         5,500,000    11.322
17      22.948        11,000,000    22.546
18      45.839        22,000,000    45.252
19      1:31.95       44,000,000    1:30.14
20      3:07.27       88,000,000    3:07.52

(Times are in seconds; the last two rows are in minutes:seconds.)
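For anyone who wants to see the doubling on their own hardware, here is a rough timing harness. It is a sketch rather than the method used for the table: it measures only the hashing loop, not interpreter start-up as timing a whole script would, and the absolute numbers will differ from machine to machine.

```python
import hashlib
import time

def time_rounds(rounds, password=b"password"):
    """Return the seconds taken to chain `rounds` SHA-512 computations."""
    key = b""
    start = time.perf_counter()
    for _ in range(rounds):
        key = hashlib.sha512(key + password).digest()
    return time.perf_counter() - start

# Double the round count a few times and watch the time roughly double.
for rounds in (5000, 10000, 20000, 40000):
    print(f"{rounds:>6} rounds: {time_rounds(rounds):.3f} s")
```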

In the Python bcrypt implementation, the default cost is "10". For most modern GNU/Linux operating systems, when storing the user password in /etc/shadow with sha512crypt (no, I didn't come up with the name), the default cost is 5,000 iterations. In both these cases, the cost can be adjusted. In the case of the Python bcrypt module, it's just passing a numerical argument to the function. In the case of GNU/Linux, it's editing the PAM configuration by adding "rounds=" to a config file.

As such, sha512crypt can be just as slow as bcrypt. Remember, we are trying to adjust for increased computing power that password crackers will have access to. In both cases, bcrypt and sha512crypt address that requirement.

Bcrypt's additional strength
So, if sha512crypt can operate with a cost parameter similar to bcrypt, and can provide that exponential back-off that we are looking for to slow down password brute force searching, then what's the point of bcrypt? Are there any advantages to running it? It turns out there is one, and I suspect this is a consequence of the design, and not something that was intentionally added.

What we would like is to prevent password crackers from using non-PC hardware to attack the password database. SHA-2 functions, such as SHA-512, work very well on GPUs. SHA-2 functions also work well on specialized hardware such as ASICs and FPGAs. As such, while we could make things slow for CPU or GPU crackers, those password crackers with specialized hardware would still have an upper hand on attacking the password database. Further, by addressing GPU cracking, and making it intentionally slow there, we make life more difficult for CPUs, which means hurting the honest user when trying to log in to your web application. In other words, if I adjusted sha512crypt for GPU crackers, such that only 1,000 passwords per second could be achievable on a GPU, that might be a full second, or more, for the user logging into your CPU server. This may or may not be desirable.

So, how does bcrypt attack this problem? Part of the algorithm requires a lookup table stored in RAM that is constantly modified during execution. This turns out to work out very well on a standard PC where the CPU has exclusive access to RAM. This turns out to work out fairly poorly on a GPU, where the cores share the on-board memory, and each core must compete on the bus for access to those registers. As a result, any additional cracking speed is greatly minimized on a GPU when compared to the CPU. In other words, GPU password cracking with bcrypt isn't entirely effective either.
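The size of that lookup table can be read straight off the ExpandKey pseudocode earlier in the post: 18 P-array subkeys plus four S-boxes of 256 entries, each entry 32 bits wide. A quick sketch of the arithmetic, with constant names of my own choosing:

```python
# Sizes taken from the ExpandKey pseudocode: 18 P-array subkeys and
# four S-boxes of 256 entries each, every entry 32 bits wide.
P_ENTRIES = 18
SBOXES = 4
SBOX_ENTRIES = 256
ENTRY_BYTES = 4

def blowfish_state_bytes():
    """Bytes of key-dependent state that bcrypt keeps rewriting."""
    return (P_ENTRIES + SBOXES * SBOX_ENTRIES) * ENTRY_BYTES

print(blowfish_state_bytes())
```

Roughly 4 KB is trivial for one CPU core, but every hash being computed in parallel needs its own copy, and that is where many GPU cores contending for shared memory start to hurt.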

For SHA-2 functions, like SHA-512, however, this is not the case. SHA-2 functions use only logic and arithmetic operations on fixed-size words (32-bit for SHA-256, 64-bit for SHA-512), which GPUs excel at. By using a GPU over a CPU for our sha512crypt function, a password cracker can get a couple to many orders of magnitude of additional cracking power.

So, the reason to use bcrypt isn't because "it's slow". The reason to use bcrypt is because "it's ineffective on GPUs".

A better alternative- scrypt
Unfortunately for bcrypt, however, due to its low memory requirement, bcrypt can be implemented in a field programmable gate array (FPGA) or custom ASICs. Understand that bcrypt was designed in 1999, when such specialized hardware had low gate counts, and was few and far between. 15 years later, times have drastically changed. Bitcoin ASICs with SHA-256 FPGAs on board are commonplace. Hardware AES is common in CPUs and embedded systems. The fact of the matter is, these FPGAs, with their fast onboard RAM, are well suited to bring bcrypt password cracking well into "fast" territory.

An alternative would be a solution that not only requires address registers to be constantly modified during algorithm execution, but to also exponentially bloat the memory requirement for the increased cost. scrypt addresses this shortcoming in bcrypt. Scrypt is another password key derivation function that was initially designed in 2009 by Colin Percival as part of the Tarsnap online backup service.

Scrypt has all of the advantages that bcrypt provides (a baked-in salt, an exponential cost parameter, and ineffectiveness on GPUs), while also adding a RAM requirement that grows exponentially with the cost. Because of this RAM requirement, it is no longer cost efficient to build FPGAs with the necessary RAM.
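To make the RAM requirement concrete, here is a minimal sketch using Python's standard `hashlib.scrypt` (which needs a Python built against OpenSSL 1.1 or newer). It assumes the common convention of expressing the cost as log2(N), so every increment of the cost doubles the memory needed. The helper names `scrypt_memory_bytes` and `scrypt_hash`, and the parameter values, are illustrative assumptions of mine, not a recommendation.

```python
import hashlib
import os


def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate size of scrypt's main memory table: 128 * N * r bytes."""
    return 128 * n * r


def scrypt_hash(password: str, log2_n: int = 14, r: int = 8, p: int = 1):
    """Hash a password with scrypt; returns (salt, derived_key).

    log2_n is the hypothetical "cost" knob: N = 2 ** log2_n, so each
    increment doubles the RAM an attacker must provision per guess.
    """
    salt = os.urandom(16)  # random per-password salt
    n = 2 ** log2_n
    # Allow twice the table size so the library's memory check passes.
    maxmem = 2 * scrypt_memory_bytes(n, r)
    key = hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt, n=n, r=r, p=p, maxmem=maxmem, dklen=32,
    )
    return salt, key


if __name__ == "__main__":
    for cost in (14, 15, 16):
        mib = scrypt_memory_bytes(2 ** cost, 8) / (1024 * 1024)
        print(f"cost={cost}: about {mib:.0f} MiB per hash")
```

With r = 8, a cost of 14 needs about 16 MiB per hash, and each step up doubles that, which is the property that makes dedicated cracking hardware expensive to build.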

Security, standards, and implementations
Scrypt is only 5 years young. This gives bcrypt a good 10 year head start. In terms of security, this is preferred. We want cryptographic functions that have withstood the test of time with cryptographers constantly attacking and analyzing their functions, primitives, and implementations. The longer it remains "unbroken", the more "secure" we deem the functions to be.

Bcrypt continues to be attacked and analyzed, and is showing no serious sign of weakness, 15 years later. This is good for the security of bcrypt. Scrypt however has seen less scrutiny, mostly due to its young age. However, it has been 5 years, and like bcrypt, no serious signs of weakness have been shown. By comparison, the SHA-2 family of functions was created in 2001, and has been scrutinized much more than bcrypt and scrypt combined, and also is not showing any serious signs of weakness. So, from a security standpoint, the SHA-2, bcrypt, and scrypt functions all seem to be fairly secure.

When looking at governing body standards, NIST has no paper on bcrypt or scrypt. They do recommend using PBKDF2 (another key derivation function, which I haven't explained here, but love) for password storage, and NIST has standardized on SHA-2 for data integrity. Personally, while I like the ideas of bcrypt and scrypt, I would recommend sticking with the NIST recommendations with high iteration counts, as shown above. Until we see more standards body interest in bcrypt and scrypt, IMO, you are taking a risk using them for password storage (less so for bcrypt than scrypt at least).
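Since PBKDF2 isn't explained in this post, here is a minimal sketch of what PBKDF2 with a high iteration count can look like, using `hashlib.pbkdf2_hmac` from Python's standard library. The function names, the choice of HMAC-SHA-512, the 16-byte salt, and the iteration count are all assumptions for illustration; tune the count to your own hardware.

```python
import hashlib
import hmac
import os

# Assumed value for illustration only; pick the highest count your
# login servers can tolerate.
ITERATIONS = 200_000


def pbkdf2_hash(password: str, iterations: int = ITERATIONS):
    """Return (salt, iterations, derived_key) for storage."""
    salt = os.urandom(16)  # random per-password salt
    key = hashlib.pbkdf2_hmac(
        "sha512", password.encode("utf-8"), salt, iterations
    )
    return salt, iterations, key


def pbkdf2_verify(password: str, salt: bytes, iterations: int,
                  expected: bytes) -> bool:
    """Recompute the key and compare it in constant time."""
    key = hashlib.pbkdf2_hmac(
        "sha512", password.encode("utf-8"), salt, iterations
    )
    return hmac.compare_digest(key, expected)


if __name__ == "__main__":
    salt, rounds, key = pbkdf2_hash("hunter2")
    print(pbkdf2_verify("hunter2", salt, rounds, key))
    print(pbkdf2_verify("wrong guess", salt, rounds, key))
```

Storing the iteration count next to the salt and key lets you raise the count later without invalidating existing hashes.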

Finally, because of the newness of scrypt, there are fewer tools for its use in programming languages than for bcrypt, and even fewer compared to SHA-2. Further, most programming languages don't include either bcrypt or scrypt in their "standard library" of modules or functions, while SHA-2 is more generally found. And for those implementations, some 3rd party libraries are more trustworthy than others. Because you're dealing with password storage, it's critical you get this right.

Conclusion
While I love the algorithms behind bcrypt and scrypt, I've always advocated for using high iteration counts on the SHA-2 or PBKDF2 functions. Even more, I advocate teaching people to understand the concepts behind their password entropy, and to improve their own online security. That is the weakest link, and needs the most work, IMO.
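For readers new to password entropy, a rough sketch of the usual estimate follows. It assumes every symbol is chosen uniformly at random, which human-chosen passwords are not, so treat the result as an upper bound. The function name and the example alphabets are my own illustrative choices.

```python
import math


def password_entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy in bits of `length` symbols drawn uniformly at random
    from an alphabet of `alphabet_size` symbols."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    # 8 characters from the 94 printable ASCII characters
    print(round(password_entropy_bits(8, 94), 1))
    # 4 words from a 7,776-word Diceware-style list
    print(round(password_entropy_bits(4, 7776), 1))
```

Each additional bit doubles the number of guesses an attacker needs, no matter which hashing function sits behind the password.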

So, if you ask me, I'll tell you to use either PBKDF2 or SHA-2 with high iteration counts. Or, if you absolutely must use bcrypt, then I'll recommend that you use scrypt instead.

]]>
https://pthree.org/2014/12/26/sha512crypt-versus-bcrypt/feed/ 3