
Opera, VPNs, and Security

Yesterday, Opera announced that they are bundling a VPN with the latest release of their browser. This is what the release says:

Why we are adding free VPN in Opera

Bringing this important privacy improvement marks another step in building a browser that matches up to people’s expectations in 2016. When you think about it, many popular options offered by desktop browsers today were invented (quite frequently by Opera) many years ago. The innovation energy in the industry has been recently so focused on mobile, even if the desktop is still thriving.

In January, we were reviewing our product plans, and we realized that people need new features in order to browse the web efficiently in 2016. It also became apparent to us that what people need are not the same features that were relevant for their browsers ten years ago. This is why we today have more engineers than ever before working on new features for our desktop browser.

So far we have the native ad blocker. And, we’re introducing another major feature in just a matter of a few weeks; a native, unlimited and free VPN client, right inside your browser!

Enhanced privacy online with Opera’s free VPN

According to Global Web Index*, more than half a billion people (24% of the world’s internet population) have tried or are currently using VPN services. According to the research, the primary reasons for people to use a VPN are:

– To access better entertainment content (38%)
– To keep anonymity while browsing (30%)
– To access restricted networks and sites in my country (28%)
– To access restricted sites at work (27%)
– To communicate with friends/family abroad (24%)
– To access restricted news websites in my country (22%)

According to the research, young people are leading the way when it comes to VPN usage, with almost one third of people between 16-34 having used a VPN.

Better than traditional VPNs

Until now, most VPN services and proxy servers have been limited and based on a paid subscription. With a free, unlimited, native VPN that just works out-of-the-box and doesn’t require any subscription, Opera wants to make VPNs available to everyone.

That’s why Opera’s built-in free VPN feature is easy to use. To activate it, Mac users just need to click the Opera menu, select “Preferences” and toggle the feature VPN on, while Windows and Linux users need to go to the “Privacy and Security” section in “Settings” and enable VPN there. A button will appear in the browser address field, from which the user can see and change location (more locations will appear later), check whether their IP is exposed and review statistics for their data used. It’s free and unlimited to use, yet it offers several must-have options available in paid VPNs, such as:

  • Hide your IP address – Opera will replace your IP address with a virtual IP address, so it’s harder for sites to track your location and identify your computer. This means you can browse the web more privately.
  • Unblocking of firewalls and websites – Many countries, schools and workplaces block video-streaming sites, social networks and other services. By using a VPN you can access your favorite content, no matter where you are.
  • Public Wi-Fi security – When you’re surfing the web on public Wi-Fi, intruders can easily sniff data. By using a VPN, you can improve the security of your personal information.

There were a couple of things that stuck out to me rather quickly when reading this press release:

  1. Is it a true VPN, or just an HTTP proxy?
  2. If either a VPN or an HTTP proxy, how is it handling DNS requests?
  3. If an HTTP proxy, is the request through a transparent TLS connection to Opera?
  4. Why does the press release say nothing about logs and tracking?

Well, some of these questions have been answered. First, it's not a true VPN; it's just an HTTP/HTTPS proxy. Here are the details:

How the “VPN” works

Once the user enables the feature in settings, Opera VPN sends API requests to https://api.surfeasy.com to obtain credentials and proxy IPs. The browser then talks to a proxy like de0.opera-proxy.net, and its IP address can only be resolved from within Opera when the VPN feature is turned on. It’s an HTTP/S proxy that requires authentication.

When the Opera browser with enabled VPN loads a page, it sends many requests to de0.opera-proxy.net with a Proxy-Authorization request header.

The Proxy-Authorization header decoded:
CC68FE24C34B5B2414FB1DC116342EADA7D5C46B:9B9BE3FAE674A33D1820315F4CC94372926C8210B6AEC0B662EC7CAD611D86A3

Since we’re talking about a proxy, these credentials can be used with de0.opera-proxy.net even when connecting from a different machine. This means that if you use the proxy on a computer with no Opera installed, you’ll get the same IP as when using Opera’s VPN.

From this, we can learn that it's not a VPN at all. In fact, it's not even deploying a TLS tunnel for the HTTP/S proxy, so traditional HTTP requests will still be in the clear, just with a different target. And while a school or library might be filtering requests based on DNS, this HTTP/S proxy in Opera doesn't address more active smart filtering based on content.
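You can even test the credential-reuse claim from the analysis above with curl(1). This is a hypothetical sketch: the proxy port is a guess on my part, it assumes the credentials above are still being handed out, and api.ipify.org is just one convenient "what is my IP" service:

$ # assumptions: the proxy listens on port 443, and the credentials above still work
$ curl --proxy de0.opera-proxy.net:443 \
       --proxy-user 'CC68FE24C34B5B2414FB1DC116342EADA7D5C46B:9B9BE3FAE674A33D1820315F4CC94372926C8210B6AEC0B662EC7CAD611D86A3' \
       https://api.ipify.org

If it prints a SurfEasy IP rather than your own, the proxy really is accepting those credentials from a machine with no Opera installed.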

Unfortunately, Help Net Security also suggests the use of a general VPN service provider (emphasis mine):

"What Opera offers is not a VPN as such. It's just a proxy for the browser. You still need a full VPN if privacy is what you care about (and you should care about your privacy). Other tools you use, including for example email clients like Outlook, won’t use this 'VPN'," Špaček told Help Net Security.

VPN service providers are scary. Sven Slootweg posted a "Don't use VPN services" Gist where he addresses some real concerns with using VPN service providers:

  • VPN service providers log connections and other metadata.
  • VPN service providers have full accounting and payment information of their customers.
  • VPNs really are just glorified proxies, and don't provide any meaningful security or privacy.
  • VPNs don't obfuscate your IP address like Tor, and your IP address is meaningless to trackers anyway.
  • VPN service providers exist, because it's easy money.

I don't fully agree with a couple of those points (IP addresses are extremely valuable to trackers), but I think the overall message Sven is trying to drive home is this: know how VPNs work, know who has access to data at the VPN endpoint(s), and know your security and privacy risks when using one. There are valid times to use a VPN. Data is encrypted between your VPN client and the provider, so it is an easy way to get around restrictive firewalls, which you would think Opera would be trying to address with their HTTP/S proxy. You may also need to access your corporate internal network when "on the road", in which case using your corporate VPN server is needed.

But in both cases, understand the security and privacy concerns of using the VPN. Your VPN provider isn't going to go to jail for you. If the FBI catches unsavory traffic coming out of the VPN provider, you can rest assured they'll hand over all the logging, account, and payment information the authorities request. And rest assured that if your employer catches you breaking policy with the VPN, you will lose your VPN access, and possibly your job.

So, what to do? Well, realistically, if you want to obfuscate your traffic dynamically, securely, and pseudonymously, then use Tor. Install a Tor client on your machine, install a Tor proxy extension in your browser, and when you want to get around restrictive firewalls, flip the proxy switch, and get on Tor.
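As a minimal sketch of that workflow on a Debian-like system (package names vary by distro), with Tor listening on its standard SOCKS port of 9050:

$ sudo apt-get install tor torsocks
$ torsocks curl https://check.torproject.org/api/ip    # should report "IsTor":true

In the browser, point the proxy extension at SOCKS5 127.0.0.1:9050, and make sure it is configured to resolve DNS through the proxy, for reasons discussed below.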

Of course, Tor isn't a security and privacy panacea. You still need to understand the risks associated with using it. For example, the extension you installed may not tunnel DNS requests through Tor. HTTP traffic is still in the clear when it leaves the Tor exit relay. Tor clients and extensions may contain vulnerabilities that reveal metadata about you. Basically, don't be ignorant or stupid with your Tor connection.

Regardless, I think we can take a few things away from this post:

  1. The Opera VPN is just an HTTP/S proxy.
  2. Opera is very likely logging all your traffic.
  3. Your Opera VPN browsing habits are likely unique enough for Opera to identify you.
  4. VPN service providers should be avoided.
  5. VPN service providers also are likely logging all your traffic.
  6. Your VPN service provider won't go to jail for you.
  7. When in doubt, use Tor, just understand the risks.

Tor and the CloudFlare Problem

Before I go anywhere with this post, let me make three things very clear:

  1. I do not work for CloudFlare.
  2. I work for a small local ISP in Utah.
  3. I have been using Tor probably almost as long as many of you have been alive.

I first blogged about Tor in 2006. I had discovered it around 2004, only a couple of years after its first release. I used it as a way to prevent my ISP and my employer from tracking what I was doing with my Internet connection. I would set up a simple SOCKS proxy in Firefox, then switch to it when I wanted to get on the Tor network, and switch away from it when I didn't. Oh, and you think latencies are bad on Tor now? You should have been on it back then.

Here is a metrics graph showing the time it took to download a 50 KiB file over the Tor network. Unfortunately, they don't have data from back when I started using the network, but you get a rough idea of what it was like:

Graph from metrics.torproject.org showing the latencies of downloading a 50 KiB file.

This makes a good deal of sense, because back then, ISPs didn't provide a lot of bandwidth to customers (it can be argued they still don't), and there weren't a lot of exit nodes in the Tor network to handle the bandwidth (again, it can be argued there still aren't enough):

Graph from metrics.torproject.org showing the bandwidth of relays in the network.

Graph from metrics.torproject.org showing the size of the Tor network

Spend some time on metrics.torproject.org looking over the historical data, and you'll get a good sense that using Tor in 2004 was a lot like getting data over dial-up. It was anything but pleasant.

What's the point? The point is that while things can still be improved (we need more exit nodes, and more bandwidth on each exit node), the Tor network's latencies, bandwidth, and relay count are in a much better position than 12 years ago when I started using it. Which also means that running large-scale attacks through the network is now practical.

So, where does CloudFlare fit into this? CloudFlare presents captchas when you wish to consume a service behind the CloudFlare CDN. For example, while connected to Tor, visit medium.com, and you will be presented with a captcha, similar to something like this:

Screenshot of Chromium showing the need to solve a visual captcha while trying to browse medium.com while connected to Tor.

This has gotten a lot of criticism from the "cypherpunk" millennials who feel that Tor access should be unrestricted. If you follow the "#dontblocktor" hashtag on Twitter, you will see the continued criticism of CloudFlare deploying these captchas to Tor users on their CDN. Some of the arguments include:

  • Solving the captcha may only bring up another, repeatedly, so the user never actually reaches the website in question.
  • Visually impaired users cannot solve the visual captchas.
  • Non-native English speakers will not be able to solve the audio version of the captcha.
  • People using browsers that disable JavaScript will not be able to reach the page.
  • There may be other security concerns where the choice of Tor is preferred over not using Tor.

No doubt, all captchas on the Web should be reconsidered. Personally, for JavaScript-enabled browsers, I think forcing a proof-of-work puzzle onto the browser is transparent, and provides exactly the sort of rate-limiting needed for mitigating large-scale malicious attacks (a toy example follows below). For non-JavaScript browsers, captchas seem to be the best alternative. But I'm sure that, as a society, we can find alternatives for non-JavaScript browsers (such as network-based proof-of-work puzzles).
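To illustrate the idea, here is a toy hashcash-style proof-of-work in shell. This is not anything CloudFlare actually deploys, just the general shape of the technique: the client burns CPU finding a nonce whose hash starts with enough zeros, and the server verifies the result with a single hash:

$ prefix="000"    # each additional hex zero multiplies the expected client work by 16
$ nonce=0
$ until [ "$(printf 'challenge:%d' "$nonce" | sha256sum | cut -c1-${#prefix})" = "$prefix" ]; do
>   nonce=$((nonce+1))
> done
$ echo "$nonce"   # the server checks this with one sha256sum invocation

An honest client pays a small, fixed cost per request; an attacker sending millions of requests pays it millions of times.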

No doubt physical limitations, such as visual or audible impairments, can make solving a visual or audible captcha challenging, if not impossible. I don't have good solutions here except for JavaScript-based proof-of-work puzzles. But the real question that needs to be addressed is: why is CloudFlare deploying captchas for Tor users in the first place?

CloudFlare addressed this due to the ongoing criticism a select few on Twitter have been giving the company. The blog post "The Trouble with Tor" basically comes down to the following:

  1. You must pick two among security, anonymity, and convenience.
  2. CloudFlare is a large CDN that deals regularly with malicious traffic sourced from Tor exit relays.
  3. Captchas are a compromise, allowing Tor users to remain anonymous, while also getting access to the website.
  4. A CloudFlare CDN customer has an option in their control panel to whitelist Tor or captcha Tor.
  5. CloudFlare is investigating "blind token" proof-of-work client puzzles for something long-term.

I don't see anything unreasonable here. As a system administrator and security engineer for XMission, I understand and sympathize with CloudFlare's stance toward captchas, even if I don't agree with the implementation of the captcha itself. I have had to fight off malicious Tor traffic from our network many times during my employment: DNS and NTP amplification attacks, HTTP POST DDoS attacks, SQL injection and XSS attacks, and many others.

So, as CloudFlare put it in their reasonable post: how do you allow honest Tor users to consume the website with a high degree of convenience, while also minimizing and proactively mitigating malicious Tor traffic?

Again, I don't care for captchas, and wish they would die in a fire. But what should CloudFlare do? Should they abandon the captcha altogether? If so, how should they proactively prevent malicious Tor traffic from negatively impacting their customer base? It's easy and knee-jerky to post screenshots to Twitter with the "#dontblocktor" hashtag, and shame CloudFlare and the customer using the CDN. I don't think that's the right approach, personally (never mind that a captcha isn't a block; yes, semantics are important). I'm curious how many of those reacting to CloudFlare captchas are actual system or network administrators who have to deal with these attacks. Instead, I would try to architect solutions to the problem.

Personally, I see the following:

  • Consume CloudFlare without Tor. There are no captchas, but you sacrifice a level of anonymity.
  • Consume CloudFlare behind Tor, but understand the compromise you are making: solving captchas sacrifices convenience.
  • Consume CloudFlare behind a VPN, thus providing both anonymity and convenience.

If it really bothers you that you have to solve a captcha to reach a CloudFlare website, then rather than shaming CloudFlare, it might be worth your time to reach out to the site operator, and let them know about whitelisting Tor. If they engage in conversation, they may not have been aware of the configuration option, or they may have reasons why they want you to solve the captcha. Either way, you've come out ahead without the knee-jerking of #dontblocktor.

I guess in conclusion, while I hate captchas as much as the next guy, what would you do if you were employed by CloudFlare and in charge of this problem? What is a reasonable solution that keeps customers happy by mitigating malicious Tor traffic, while also allowing honest Tor users to consume the website with high levels of convenience? Let's engage in discussion about how to create and architect these solutions, so we make as many people happy as possible: CloudFlare network admins, customers, and clients.

A final note about the term "block". The CloudFlare captcha is not blocking you from reading the website; it's rate-limiting you. Some will argue that you get caught in endless captcha loops, consistently solving them over and over, never actually reaching the service. Personally, I have never encountered this, but others swear it exists. At most, I've had to solve 3 captchas in a row, usually because I did not solve them quickly enough. I guess the effect is the same, but as already mentioned, the "#dontblocktor" hashtag is a knee-jerk, and incorrectly placed. Semantics are important, because CloudFlare is not actually blocking Tor, like Akamai does with "Access denied". It's one thing to serve a 502 HTTP error; it's quite another to rate-limit requests.

Two OCB Block Cipher Mode Patents Expired Due To Nonpayment

Peter Gutmann on the "[Cryptography]" mailing list wrote some thoughts about the impending crypto monoculture of all-things-Bernstein that seems to be currently sweeping the crypto world. In his post, he mentions the following (emphasis mine):

The remaining mode is OCB, which I'd consider the best AEAD mode out there (it shares CBC's graceful-degradation property in which reuse or misuse of the IV doesn't lead to a total loss of security, only the authentication property breaks but not the confidentiality). Unfortunately it's patented, and even though there are fairly broad exceptions allowing it to be used in many situations, the legal minefield that ensues makes it untouchable for most potential users. For example does the prohibition on military use cover the situation where an open-source crypto package is used in a vendor library that's used in a medical insurance app that's used by the US Navy, or where banking transactions protected by TLS may include ones of a military nature (both of these are actual examples that affected decisions not to use OCB). Since no-one wants to call in lawyers every time a situation like this comes up, and indeed can't call in lawyers when the crypto is several levels away in the service stack, OCB won't be used even though it may be the best AEAD mode out there.

Dr. Matthew Green also wrote about authenticated encryption and block cipher modes. He had this to say about OCB mode (emphasis mine):

In performance terms Offset Codebook Mode blows the pants off of all the other modes I mention in this post. It's 'on-line' and doesn't require any real understanding of Galois fields to implement** -- you can implement the whole thing with a block cipher, some bit manipulation and XOR. If OCB was your kid, he'd play three sports and be on his way to Harvard. You'd brag about him to all your friends.

I've known that OCB mode is patented and, as a result, why it has not been included in OpenSSL and other cryptographic protocol implementations. Peter said it correctly: it is a legal minefield. However, I wanted to read up on the patents, their design, operation, etc., mostly because I wanted to get out of doing the dishes. Imagine my shock when I stumbled upon the following:

  • Patent 7,046,802 - Method and apparatus for facilitating efficient authenticated encryption
    • Status: Lapsed
  • Patent 7,200,227 - Method and apparatus for facilitating efficient authenticated encryption
    • Status: Lapsed

Not fully understanding what "Lapsed" means, I went to the official source: The United States Patent and Trademark Office website. I searched for those two patent numbers, and got the following:

  • Patent 7,046,802 - Method and apparatus for facilitating efficient authenticated encryption
    • Status: Patent Expired Due to NonPayment of Maintenance Fees Under 37 CFR 1.362
    • Status Date: 06-06-2014
  • Patent 7,200,227 - Method and apparatus for facilitating efficient authenticated encryption
    • Status: Patent Expired Due to NonPayment of Maintenance Fees Under 37 CFR 1.362
    • Status Date: 05-04-2015

Sure enough, Phillip Rogaway's first two patents regarding the OCB block cipher mode of encryption are expired due to nonpayment. I had to tweet this:

Patents 7949129 (Method and apparatus for facilitating efficient authenticated encryption) and 8321675 (Method and apparatus for facilitating efficient authenticated encryption) are still valid, however. I'm not sure how this applies to Charanjit Jutla's IAPM mode patents, now owned by IBM. Also, I don't know exactly which OCB modes patents 7,046,802 and 7,200,227 cover. OCB1 and OCB2? If someone can comment here, that would be great.

So, what does this mean for the cryptography world? It means that OCB covered by those two patents can now be implemented royalty-free, without fear of legal entanglements, in Free Software as well as proprietary and commercial software. OpenSSL, LibreSSL, BoringSSL, OpenPGP, Open Whisper Systems Signal, and so many other protocols, projects, and software should be able to implement OCB now.

All because Phillip Rogaway did not make the payments necessary to keep the patents valid. Two more software patents bite the dust.

Linux Kernel CSPRNG Performance

I'm hardly the first one to notice this, but I was having a discussion in ##crypto on Freenode about the Linux kernel CSPRNG performance. It was mentioned that the kernelspace CSPRNG was "horrendously slow". Personally, I found the performance sufficient for my needs, but I decided to entertain his definition. I'm glad I did; I wasn't disappointed.

Pull up a terminal, and run the following command, passing 1 GB of data from /dev/urandom to /dev/null:

$ dd if=/dev/urandom of=/dev/null bs=1M count=1024 iflag=fullblock  
1024+0 records in  
1024+0 records out  
1073741824 bytes (1.1 GB) copied, 80.1537 s, 13.4 MB/s
$ pv < /dev/urandom > /dev/null # cancel in a different terminal, unless you have "-S"
1.02GB 0:01:20 [13.3MB/s] [                   < =>                              ]

13.4 MBps of throughput for reading data directly out of the kernelspace CSPRNG. But, can we do better?

In the ##crypto channel, as should be the case across development mailing lists, forums, groups, and discussion channels, I recommend that developers not write their own userspace CSPRNG. There are all sorts of pitfalls and traps waiting for you when you attempt it. Unless you know what you're doing, you could end up with a CSPRNG that isn't actually cryptographically secure (the "CS" in "CSPRNG").

However, what happens when I do actually run a userspace CSPRNG on the same machine? What can I expect out of performance? For example, I could implement AES-128 in CTR mode as a CSPRNG. In fact, we can do this with OpenSSL:

$ dd if=/dev/zero bs=10M count=1024 iflag=fullblock 2> /dev/null | openssl enc -aes-128-ctr -pass pass:"sHgEOKTB8bo/52eDszkHow==" -nosalt | dd of=/dev/null
20971520+0 records in
20971520+0 records out
10737418240 bytes (11 GB) copied, 15.3137 s, 701 MB/s
$ openssl enc -aes-128-ctr -pass pass:"sHgEOKTB8bo/52eDszkHow==" -nosalt < /dev/zero | pv > /dev/null
31.9GB 0:00:34 [ 953MB/s] [                                  < =>               ]

700-950 MBps (notice that dd(1) incurs a performance penalty): 52-70x the speed of reading the kernelspace CSPRNG directly, more than a full order of magnitude faster. However, this is on a box with AES-NI. What about disabling AES-NI on the same box? How badly does it damage performance, and how does it compare to reading the kernelspace CSPRNG? We can use OpenSSL speed(1SSL) to benchmark algorithms.

First, with AES-NI enabled:

$ openssl speed -elapsed -evp aes-128-ctr 2> /dev/null  
(...snip...)
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-ctr     468590.43k  1174849.02k  1873606.83k  2178642.60k  2244471.47k

And with AES-NI disabled:

$ OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-ctr 2> /dev/null  
(...snip...)  
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-ctr      74272.21k    83315.43k   340393.30k   390135.47k   391279.96k

In this case, we see about a 5x performance improvement when using the AES-NI instruction set as compared to when not using it. That's significant. And even with AES-NI disabled in userspace, we're still outperforming /dev/urandom by almost 30x.

Interestingly enough, even the OpenBSD CSPRNG (different hardware than previously tested), which uses ChaCha20, outperforms the Linux CSPRNG (although its userspace CSPRNG with openssl(1) doesn't outperform kernelspace):

% dd if=/dev/urandom of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 13.630 secs (78775541 bytes/sec)
% dd if=/dev/zero bs=1M count=1024 2> /dev/null | openssl enc -aes-128-ctr -pass pass:"sHgEOKTB8bo/52eDszkHow==" -nosalt | dd of=/dev/null 
2097152+0 records in
2097152+0 records out
1073741824 bytes transferred in 33.498 secs (32052998 bytes/sec)
% openssl speed -elapsed -evp aes-128-ctr 2> /dev/null
(...snip...)
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-ctr      41766.37k    46930.74k    49593.54k    50669.32k    50678.33k

Roughly 78 MBps for OpenBSD on an Intel Xeon CPU running at 2.80GHz. Basically, six times the speed of the Linux kernel CSPRNG on an Intel Xeon CPU running at 2.67GHz.

So why is the Linux CSPRNG so slow? And what can we do about it? Well, first, the kernel is using SHA-1 for its cryptographic primitive. In very loose terms, the CSPRNG hashes the input pool with SHA-1, and spits out the output to /dev/urandom. Its output is also its input, so it's digesting its own output.

But that's not all it's doing, actually. The first function adds data into the input pool without increasing the entropy estimate. Then, after adding those bytes, the input pool is mixed with a Skein-like mixing function. Then some math is done to credit the entropy estimator, and the system is polled for data to add to the input entropy pool: things like disk IO, CPU timings, interrupts, and user activity. Finally, we're ready to hash the data. This is done by extracting the data out of the input pool, and hashing it with SHA-1. But we don't want any recognizable output, so the output is left-rotated and folded in half. Then, and only then, is the data ready for consumption.

W.T.F.

Unfortunately, the Linux kernel CSPRNG is not based on any sound theoretical security design. It's very much a hodge-podge home-brew design by developers who think they know what they're doing, when in reality, they don't. In 2013, a security audit and analysis was performed on the Linux kernel CSPRNG (PDF), and concluded that not only is it not robust, but it has some weaknesses:

In the literature, four security notions for a PRNG with input have been proposed: resilience (RES), forward security (FWD), backward security (BWD) and robustness (ROB), with the latter being the strongest notion among them.

(...snip...)

Distributions Used in Attacks based on the Entropy Estimator As shown in Section 5.4, LINUX uses an internal Entropy Estimator on each input that continuously refreshes the internal state of the PRNG. We show that this estimator can be fooled in two ways. First, it is possible to define a distribution of zero entropy that the estimator will estimate of high entropy, secondly, it is possible to define a distribution of arbitrary high entropy that the estimator will estimate of zero entropy. This is due to the estimator conception: as it considers the timings of the events to estimate their entropy, regular events (but with unpredictable data) will be estimated with zero entropy, whereas irregular events (but with predictable data) will be estimated with high entropy.

(...snip...)

As shown in Section 5.7, it is possible to build a distribution D0 of null entropy for which the estimated entropy is high (cf. Lemma 3) and a distribution D1 of high entropy for which the estimated entropy is null (cf. Lemma 4). It is then possible to mount attacks on both /dev/random and /dev/urandom, which show that these two generators are not robust.

(...snip...)

We have proposed a new property for PRNG with input, that captures how it should accumulate the entropy of the input data into the internal state. This property actually expresses the real expected behavior of a PRNG after a state compromise, where it is expected that the PRNG quickly recovers enough entropy. We gave a precise assessment of Linux PRNG /dev/random and /dev/urandom security. In particular, we prove that these PRNGs are not robust. These properties are due to the behavior of the entropy estimator and the mixing function used to refresh its internal state. As pointed by Barak and Halevi [BH05], who advise against using run-time entropy estimation, we have shown vulnerabilities on the entropy estimator due to its use when data is transferred between pools in Linux PRNG. We therefore recommend that the functions of a PRNG do not rely on such an estimator.

Finally, we proposed a construction that meets our new property in the standard model and we showed that it is noticeably more efficient than the Linux PRNGs. We therefore recommend to use this construction whenever a PRNG with input is used for cryptography.

TL;DR? The Linux CSPRNG does not meet the definitions of a secure CSPRNG per the PDF. It's not that it's theoretically broken; it's just not theoretically secure either. It's really nothing, theoretically, at all. This isn't great.

A replacement for random.c in the kernel would be to ditch the homebrew entropy collection, mixing, and output mangling, and instead stick with AES-128 in CTR mode. Of course, as per the PDF, the entropy collectors need serious work, but if AES-128-CTR were deployed as the CSPRNG instead of SHA-1, then the generator could take advantage of hardware AES performance, which, as I've shown, is exceptionally superior. It's frustrating, because the kernel already ships AES, so the code is already there. It's just not being utilized.
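In the meantime, you can approximate that design in userspace: seed once from the kernel pool, then expand the seed with AES-128-CTR. A rough sketch (the key handling is illustrative, not hardened; a zero IV is fine here only because the key is never reused):

$ key=$(head -c 16 /dev/urandom | xxd -p)    # 128-bit seed from the kernel CSPRNG
$ openssl enc -aes-128-ctr -K "$key" -iv 00000000000000000000000000000000 < /dev/zero | pv > /dev/null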

The Linux kernel could have 1 GBps in CSPRNG output, but is deliberately choosing not to. That's like having a V12 turbo-charged sleeper, without the turbo, and only firing on 3 of the 12 cylinders, with a duct taped muffler on the back.

Why does 1 GBps of performance matter? How about wiping hard drives, or secure data removal in general? With 20 MBps, we can't even saturate the sequential write throughput of a single drive. With 1 GBps, we could saturate many simultaneously. As someone who wipes old employee workstations when they leave the company, backup servers with dozens of drives, and old decommissioned hardware, I see great benefit here.
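Reusing the AES-CTR stream from the sketch above, the common workaround for fast wipes looks like this, where /dev/sdX is a placeholder for the disk you intend to destroy (dd exits with "No space left on device" when it's done):

$ openssl enc -aes-128-ctr -K "$(head -c 16 /dev/urandom | xxd -p)" -iv 00000000000000000000000000000000 < /dev/zero | dd of=/dev/sdX bs=1M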

Or, how about HTTPS web sites at a shared web hosting provider? I have seen HTTPS and SSH connections lag countless times waiting on the CSPRNG. Not because it's being intentionally blocked, but because the load is so intense on the server that it just can't generate enough cryptographic randomness to keep up with requests.

I'm sure there are plenty of other examples where end userspace applications could benefit with improved performance of the CSPRNG. And, as shown, it can't be that difficult to implement correctly. The real question is, of course, who will do the work and submit the patch?

Cryptographic Hashing, Part I- Introduction

Introduction

Lately, I've been seeing some discussion online about cryptographic hashing functions, along with some confusion between a cryptographic digest, a cryptographic signature, and a message authentication code. At least in that last post, I think I did well defining and clarifying the differences between those terms, but I also feel like I could take this discussion a lot further. So, I decided to dedicate a series to generic cryptographic hashing functions, which will include building compression frameworks with security proofs, specific cryptographic hashing functions, and some implementations of those functions. So, without further ado, let's get started.

Collisions

When we talk about a hashing function (cryptographic or otherwise), we are referring to any function that can take an arbitrary length of data and compress it into a fixed-length digest. Typically, this digest is called a "fingerprint", a "checksum", or a "hash". The goal is that any time we input the same data, our function outputs the same digest. Further, it's important that not only can I produce that digest, but anyone can produce the same digest. This gets us prepared for the Random Oracle, but we still have some ground to cover first.

Because our hashing function has a fixed-length output, say 128 bits, an ideal function would distribute its inputs uniformly across those outputs. In other words, our function maps each element in the domain (our data to be hashed) to exactly one element in the range (our actual hash). So, if our function produces 128-bit digests, then there are a total of 2^128 digests in the range, and up to 2^128 distinct inputs can each receive a unique digest. Again, we're speaking about an ideal hashing function.

However, we know that there are many more inputs than just 2^128; there are infinitely many, actually. But think about it for a second. Take the number zero, and send it through our hashing function. Increment that number by 1, then hash that number. Continue in this manner, assuming infinite computing resources and infinite time, until you've hashed every number from 0 through 2^128 - 1. Ideally, you've produced exactly 2^128 unique digests. But what happens when you hash the next number, 2^128? Now we have what is called a collision. In other words, two distinct inputs were hashed to the same output. To put it formally:

Definition: A collision is when two distinct pieces of data hash to the same digest, checksum, or fingerprint.
Theorem: For any fixed-length hashing function, there are infinitely many collisions.
Proof: This can be proven using the pigeonhole principle. Given a fixed-length hashing function with n bits of output, hashing any 2^n + 1 distinct inputs from the domain guarantees at least one collision in the range. As the number of inputs tends to infinity, so does the number of collisions. Q.E.D.

I don't think I need to tell you how much larger infinity is than 2^128. As a result, collisions are unavoidable, and there are overwhelmingly many of them. In fact, would you like to see a collision in practice? Below are 2 different hexadecimal strings. The differences are very subtle, but the strings are indeed distinct. Here, we'll take the two strings, and hash them with the well-known MD5 algorithm. Then, just to show I'm not cheating, we'll hash the same strings with SHA-1. While we produce a collision in MD5, we have distinct digests with SHA-1. Go ahead, and verify that you get the same results.

$ INPUT1=d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89\
55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b\
d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0\
e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70
$ INPUT2=d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89\
55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b\
d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0\
e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70
$ printf "$INPUT1" | xxd -r -p | md5sum
79054025255fb1a26e4bc422aef54eb4  -
$ printf "$INPUT2" | xxd -r -p | md5sum
79054025255fb1a26e4bc422aef54eb4  -
$ printf "$INPUT1" | xxd -r -p | sha1sum
a34473cf767c6108a5751a20971f1fdfba97690a  -
$ printf "$INPUT2" | xxd -r -p | sha1sum 
4283dd2d70af1ad3c2d5fdc917330bf502035658  -

Crazy, right? With an ideal hashing function, finding a collision like this should be no easier than brute force, searching on the order of the entire 128-bit range. Unfortunately, finding blind collisions with a brute force search turns out to be much faster, thanks to the Birthday Paradox. The Birthday Paradox says the following:

In a room of just 23 people, there is a 50% probability that at least two of them share the same birthday. In a room of just 75 people, there is a 99.9% probability that at least two of them share the same birthday.

Wait, what? Uhm, last I checked, there are 366 days in a year, assuming a leap year. Soooo, if there are 23 people in a room, then there should be about a 22/366, or roughly 6%, probability that someone shares my birthday. Unfortunately, this isn't how it works. There may be a 6% chance someone shares your birthday, but there is a 50% chance two arbitrary people share the same birthday. Now do you see the problem? Not only must you compare your birthday to everyone else's, but so must everyone else. This is a case of combinations. So, with 23 people in the room, there are actually 253 possible comparisons that must be made (23*22/2). The math gets a little hairy, and to be honest, it's a bit outside the scope of this post and this series (it's going to be long enough as it is). Refer to the Wikipedia article if you want to work through the theory and the proof.
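If you would rather verify the 50% claim than work through the theory, a one-liner computes the exact probability for 23 people and 366 possible birthdays:

$ awk 'BEGIN { p = 1; for (k = 0; k < 23; k++) p *= (366 - k) / 366; printf "%.2f\n", 1 - p }'
0.51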

We can use this Birthday Paradox to work out an attack on finding two distinct inputs that produce an identical digest. This is called the Birthday Attack, and it's the primary driver in finding collisions. The attack basically says something like this:

To find a collision in an n-bit range with approximately 50% probability, you only need to search about the square root of the number of elements in the range, i.e. 2^(n/2) inputs from the domain.

So, for a 128-bit digest (2^128 possible distinct outputs), using the Birthday Attack, I only need to search 2^64 possible inputs to have approximately a 50% probability of finding a collision. And if 2^64 still sounds large, consider that the bitcoin network is currently mining 2^64 SHA-256 digests about every 20 seconds.
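You can also watch the Birthday Attack work at toy scale. Truncate SHA-256 to its first 4 hex characters, giving a 16-bit "digest"; since the square root of 2^16 is only 256, hashing 512 small inputs finds a collision with better than an 85% chance:

$ for i in $(seq 1 512); do
>   printf '%s %d\n' "$(printf '%d' "$i" | sha256sum | cut -c1-4)" "$i"
> done | sort | awk '$1 == prev { print "collision:", last, "and", $2; exit } { prev = $1; last = $2 }'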

Blind, preimage, and second preimage collisions

Armed with this knowledge, we can now formalize some definitions of collision attacks. This might be confusing, so I'll define it first, then give some examples.

Collision attack:
A blind search, where two distinct inputs produce the same digest.
Preimage attack:
A search to find an input that matches a defined digest.
Second preimage attack:
A search to find a second file that matches the digest of a defined file.

Let's break these down individually. A collision search is literally a blind search, without any respect to inputs or outputs. You don't know what the inputs will be nor do you know what their outputs will be. You only know that you have found two distinct inputs that collide to the same output, all of which is entirely arbitrary.

A preimage attack is where you have a digest in your possession, but you would like to find an input that matches it. In this case, while the input is completely arbitrary, the output is static. For example, suppose you have the 256-bit hexadecimal digest "ec58d903a9f9dcc9d783da72401b1c94fc8fb9d9623d7141b8b90997382088f9". A preimage attack would be successfully finding the input that produced it. In this case, it was "Cz3eJlm4I2I2rHt8hioZ7evonLyukwlz".

A second preimage attack means having both the input "Cz3eJlm4I2I2rHt8hioZ7evonLyukwlz" and its 256-bit hexadecimal digest "ec58d903a9f9dcc9d783da72401b1c94fc8fb9d9623d7141b8b90997382088f9", and finding a second input that produces that same digest.
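You can check that claim yourself at the shell:

$ printf "Cz3eJlm4I2I2rHt8hioZ7evonLyukwlz" | sha256sum
ec58d903a9f9dcc9d783da72401b1c94fc8fb9d9623d7141b8b90997382088f9  -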

Usually, when breaking cryptographic hash functions, the first thing to break is the compression function, which I'll cover in later posts. Once the compression function is broken, the next step is searching for blind collisions. This is generally done by analyzing the mathematics for weaknesses, finding bias in the output, observing the quality of the avalanche effect, and so forth. You eventually learn where the hashing function is weak, and where you can take "shortcuts" to get to your goal. Eventually, the algorithm is broken to the point that finding blind collisions is practical. MD5 is broken in this regard.

After breaking the compression function, and weakening the algorithm to the point of practical collision attacks, preimage attacks become the next focus of analysis. However, when the compression function is broken, as in the case of SHA-1, it's a strong sign to start moving away from the algorithm, long before collisions are found. So, analysis tends to slow down after collisions have been found, because no one should be using the function anymore. This also means that finding second preimages gets even less attention.

Avalanche Effect

The final property of cryptographic hashing functions that needs to be addressed is the "avalanche effect". It is absolutely critical in cryptographic hashing functions that even though inputs may be sequential, their outputs do not show that to be the case. For example, consider the SHA-256 digests of the first 10 counting numbers:

$ for I in {1..10}; do printf "$I: "; printf "$I" | sha256sum -; done
1: 6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b  -
2: d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35  -
3: 4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce  -
4: 4b227777d4dd1fc61c6f884f48641d02b4d121d3fd328cb08b5531fcacdabf8a  -
5: ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d  -
6: e7f6c011776e8db7cd330b54174fd76f7d0216b612387a5ffcfb81e6f0919683  -
7: 7902699be42c8a8e46fbbb4501726517e86b22c56a189f7625a6da49081b2451  -
8: 2c624232cdd221771294dfbb310aca000a0df6ac8b66b696d90ef06fdefb64a3  -
9: 19581e27de7ced00ff1ce50b2047e7a567c76b1cbaebabe5ef03f7c3017bb5b7  -
10: 4a44dc15364204a80fe80e9039455cc1608281820fe2b24f1e5233ade6af1dd5  -

Notice that there is no clear indication that the digests came from sequential inputs. For all practical purposes, they are truly randomized output, despite the sequential input (merely incrementing each input by one from the previous). However, can we formally define the avalanche effect? What would be ideal is that with each bit change on the input, every bit in the digest output has as close to a 50% chance of being flipped as theoretically possible.

I'll talk more about "rounds" in future posts when I talk about specific implementations and designs. Suffice it to say that a cryptographic hashing function will iterate through its compression function a certain number of times before outputting the state. On each round, every bit in the output should have a 50% chance of being flipped, so on each round iteration, close to half of the output bits have been flipped in some pseudorandom manner. After a certain number of rounds, the final output should be indistinguishable from true random noise.

So, how about this as a formal definition:

When a single input bit is flipped, each output bit should change with a 50% probability.
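You can measure this directly. The ASCII inputs "0" and "1" (0x30 and 0x31) differ by exactly one bit, so XOR their SHA-256 digests and count the differing bits; an ideal function should flip close to 128 of the 256:

$ a=$(printf '0' | sha256sum | cut -c1-64)
$ b=$(printf '1' | sha256sum | cut -c1-64)
$ awk -v a="$a" -v b="$b" 'BEGIN {
>   h = "0123456789abcdef"; bits = 0
>   for (i = 1; i <= length(a); i++) {
>     x = index(h, substr(a, i, 1)) - 1    # nibble value of each hex digit
>     y = index(h, substr(b, i, 1)) - 1
>     for (m = 8; m >= 1; m /= 2)          # compare the four bits of each nibble
>       if (int(x / m) % 2 != int(y / m) % 2) bits++
>   }
>   printf "%d of %d bits differ\n", bits, length(a) * 4
> }'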

Of course, the cryptographic strength doesn't rest solely on the avalanche effect; there are mathematical properties that determine that. But the output should be completely unpredictable. You could apply the "next bit test": there should be no algorithm that can predict the next bit of the output without actually compromising the state of the machine (a test also applied to cryptographically secure pseudorandom number generators).

Unfortunately, all we have to test the avalanche effect is standard randomness tests, such as the chi-square distribution, Monte Carlo for Pi, and the Birthday Paradox, among others. This doesn't say anything about the cryptographic strength of the hashing function, but says a lot about randomness properties (non-cryptographic hashing functions can also exhibit strong randomness qualities).

There are a couple of software utilities we can use to test and analyze cryptographic hashing functions. First, we have standard randomness tests, such as Dieharder and the FIPS 140-2 suite. But for something more specific to analyzing cryptographic primitives, I would recommend Cryptol. On the one hand, this isn't an out-of-the-box software solution for just running a battery of tests and analysis; it is actually a domain-specific language that will require a bit of a learning curve. On the other hand, it's Free Software, and you'll probably learn more about cryptanalysis with this tool than just playing with randomness tests.

Conclusion

This was just a primer post to get you thinking about cryptographic hashes, specifically thinking about their output, and the task of finding collisions. The rest of the posts in the series will cover specific functions such as MD5, SHA-1, -2, and -3, as well as some others. We'll talk about hashing constructions, and where you'll find cryptographic functions in practice (I think you'll be surprised). I may even throw in a post or two about random oracles, and how we want cryptographic hashing functions to not only imitate them, but be proven secure under the "Random Oracle Model".

Regardless, this post will get you started, and hopefully excited for what is to come.

Manual Authenticated File Encryption With OpenSSL

One thing that bothers me about OpenSSL is the lack of command-line support for AEAD ciphers, specifically AES in CCM and GCM block modes. Why does this matter? Suppose you want to save an encrypted file to disk, without GnuPG, because you don't want to get into key management. Further, suppose you want to send this data to a recipient or store it on a server outside of your full control. Authenticated encryption is important; otherwise the ciphertext is malleable and vulnerable to bit-flipping attacks.

So, when you get to the shell, you may try using AES in GCM mode with OpenSSL's "enc(1)" command, only to be left wanting. Here, we generate a key from /dev/urandom, convert it to hexadecimal, and provide the key as an argument on the command line.

$ LC_CTYPE=C tr -cd 'abcdefghjkmnpqrstuvwxyz23456789-' < /dev/urandom | head -c 20; echo
sec2tk24ppprcze33ucs
$ echo sec2tk24ppprcze33ucs | xxd -p
73656332746b323470707072637a6533337563730a
$ openssl enc -aes-256-gcm -k 73656332746b323470707072637a6533337563730a -out file.txt.aes -in file.txt
AEAD ciphers not supported by the enc utility
$ echo $?
1

So, rather than using GCM, we can build the authentication tag manually with HMAC-SHA-512, which OpenSSL does support. This means using a non-authenticated block cipher mode, such as CTR, as a first step, then authenticating the ciphertext manually as a second step.

Using our same password from the previous example, we'll do this in two steps now:

$ openssl enc -aes-256-ctr -k 73656332746b323470707072637a6533337563730a -out file.txt.aes -in file.txt
$ openssl dgst -sha512 -binary -mac HMAC -macopt hexkey:73656332746b323470707072637a6533337563730a -out file.txt.aes.mac file.txt.aes

Now you have three files: your plaintext file, your AES-encrypted ciphertext file, and your HMAC-SHA-512 authentication file:

$ ls -l file.txt*
-rw-rw-r--. 1 aaron aaron 1050 Feb 27 10:26 file.txt
-rw-rw-r--. 1 aaron aaron 1066 Feb 27 10:27 file.txt.aes
-rw-rw-r--. 1 aaron aaron   64 Feb 27 10:28 file.txt.aes.mac

When sending or remotely storing the "file.txt.aes" file, you'll want to make sure the "file.txt.aes.mac" authentication file accompanies it. Unfortunately, the OpenSSL dgst(1) command does not support verifying message authentication codes, so you'll have to script this manually: generate a second file, maybe "file.txt.tmp.mac", then compare the two. If they match, you can decrypt the "file.txt.aes" ciphertext file; if not, discard the data. A sketch follows below.
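A minimal verify-then-decrypt sketch, reusing the key from above (note that cmp(1) is not a constant-time comparison; that's a theoretical concern for an online attacker, but fine for verifying local files):

$ openssl dgst -sha512 -binary -mac HMAC -macopt hexkey:73656332746b323470707072637a6533337563730a \
      -out file.txt.tmp.mac file.txt.aes
$ if cmp -s file.txt.aes.mac file.txt.tmp.mac; then
>     openssl enc -d -aes-256-ctr -k 73656332746b323470707072637a6533337563730a -in file.txt.aes -out file.txt
> else
>     echo "MAC verification failed; discarding ciphertext" >&2
> fi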

This isn't elegant, and I wish enc(1) supported AEAD, but as it stands, it doesn't. So, you'll have to stick with doing things manually. However, this is simple enough to script, and it provides both data confidentiality and authenticity, which should be the goal of every ciphertext.

Digest Algorithms in Google Spreadsheets

I can't imagine there are a lot of uses for digest algorithms in spreadsheets, but I came up with one, and I really wished I had access to them. Seeing as most spreadsheet applications don't ship any, I figured I would create my own.

Mostly, I use Google for my document processing and spreadsheet use, and I had a spreadsheet of Louis L'Amour books. My grandfather gave me his entire selection of Louis L'Amour books last year, and I made a goal to read them all during 2016. I grew up listening to them on audiotape in the truck when I was on the road with him laying carpet, heading up to the family cabin in Idaho, and other things, so, I have memories of many of the stories. It will be fun to read them.

So, what do digest algorithms have to do with Louis L'Amour novels? Well, after the spreadsheet was created to track what I've read and what I have left, as well as a pace (I have to be reading at least 60 pages every day), I wanted to start reading the books in a random order. Sure, I'll read the Sackett, Hopalong Cassidy, Talon & Chantry, and Kilkenny series first, but when I'm finished with those, I want to read the novels in random order. Why? Because I don't want to get caught up in publication year, watching him change as a writer, and going in alphabetical order is boring. Randomness is exciting!

Now I could have used the =RAND() function in the spreadsheet, but when I sort the columns, the numbers change. So, I need to copy and paste their values, then sort the columns. Besides, is RAND() even cryptographically secure (indistinguishable from true random noise)? Even better, I could just get ASCII data off of /dev/urandom and paste those results into the column, then sort off of that. But that requires using an external tool. However, I could also use a digest algorithm to calculate the digest of the book title, then sort by the digest. Because digest algorithms aren't part of the Google Spreadsheet default functions, my OCD kicked in, and I had to create one.

Here is what I came up with. You'll notice that MD2(), MD5(), and SHA1() are provided, even though they're not cryptographically secure for today's modern cryptographic applications. However, in this specific use case, such as sorting columns, they are fine. Also, notice that SHA256(), SHA384(), and SHA512() exist, but not SHA224(). This is because "Utilities.DigestAlgorithm" does not export a "SHA_224" algorithm, which in my opinion is just odd. MD4 is also not available, nor are any of the SHA-3 functions. Regardless, all the digest algorithms supported by the API are available.

// Cryptographic hash functions for use in Google Spreadsheets
// Use with =MD5(string)
// A string or cell (or concatenation of cells) can be provided
// Output in hex
// Released to the public domain

// Helper: compute a digest with the given algorithm and convert the
// signed bytes Apps Script returns (-128..127) to a lowercase hex string.
// The trailing underscore keeps it from being callable from a cell.
function hexDigest_(algorithm, s) {
  var hexstr = '';
  var digest = Utilities.computeDigest(algorithm, s);
  for (var i = 0; i < digest.length; i++) {
    var val = (digest[i] + 256) % 256; // normalize signed byte to 0..255
    hexstr += ('0' + val.toString(16)).slice(-2);
  }
  return hexstr;
}

function MD2(s)    { return hexDigest_(Utilities.DigestAlgorithm.MD2, s); }
function MD5(s)    { return hexDigest_(Utilities.DigestAlgorithm.MD5, s); }
function SHA1(s)   { return hexDigest_(Utilities.DigestAlgorithm.SHA_1, s); }
function SHA256(s) { return hexDigest_(Utilities.DigestAlgorithm.SHA_256, s); }
function SHA384(s) { return hexDigest_(Utilities.DigestAlgorithm.SHA_384, s); }
function SHA512(s) { return hexDigest_(Utilities.DigestAlgorithm.SHA_512, s); }

So, how do we apply it? Because this is a script applied per spreadsheet, you need to add it manually every time you create a new spreadsheet document. Supposedly, you can permanently add it through the Chrome Web Store (I created a project that is currently pending review), but for the time being, copying and pasting works.

Navigate to "Tools > Script Editor", remove the default function Google provides, and add the code above. Save it as a project, then it will be available to your spreadsheet. Now you can use it.

For example,

To calculate the MD5 of cell "A1":
    =MD5(A1)

To calculate the SHA1 of cells "A1" through "D1":
    =SHA1(CONCATENATE(A1:D1))

Retrieve the 12 left-most characters from a SHA512 digest of cells "A1" through "D1":
    =LEFT(SHA512(CONCATENATE(A1:D1)), 12)

I'm sure there are a few uses for digest algorithms in spreadsheets; they're just not very common. At least, web searching for their use gives me very scant results. If you find these helpful, even if there are other solutions, I would be interested in hearing how you used them in the comments below.

Oh, and here's my Louis L'Amour spreadsheet tracking my reading progress. 🙂

My Strange Tweets

You may have noticed some tweets from me that look... strange. Probably something like these:


First, let me provide some background. When Twitter was announced, a couple of Free Software developers got together to create a self-hosted Free Software alternative. They called that alternative "Identica", because it was hosted in Canada and was a way to establish your social identity. It made sense, and the Free Software and Open Source ecosystem ate it up. Within no time, it was a thriving online social network, involving mostly those from the Free Software and Open Source world, with all sorts of very influential developers and people creating accounts.

One account that seemed to catch the eye of many was @key. It posted what appeared to be MD5 checksums every 2 hours, regularly and consistently. Plenty of people were following the account, yet it wasn't following anyone. People replied to the tweets, asking what it was posting, who it was, why it was doing what it was doing, if it was a government account, etc. No one could figure it out, and if they were MD5 checksums, no one could reproduce them. It was a social enigma, and it kept people enthralled and engaged.

I thought this was exceptionally creative, and I was quite jealous that I didn't think of it first. The best I figured was that it was posting the MD5 of the tweet's timestamp with a custom salt. At least, that is what I would have done. It couldn't be an MD5 of random data; otherwise, why not just post the random data? Or is that exactly what it is? So, instead, I decided to play with the Identica API and roll my own, using my own account. I had already set up the "Identica-Twitter bridge", so anything I posted to my Identica account would get posted to Twitter automatically.

But, I have to be different. So rather than a random digest that no one could figure out (I'm sure it's a timestamp), I wanted something a little more transparent. I started with taking the SHA-1 of the Unix epoch (the number of seconds since Jan 1, 1970 00:00:00 UTC) at 13:37 local time, because it's leet. This was easily accomplished with a bit of shell code:

$ EPOCH=$(date --date="today 13:37" +%s); printf "$EPOCH: "; printf "$EPOCH" | sha1sum - | cut -d ' ' -f 1

This was the first tweet:


Later, however, I wanted something even more creative. I go by the online nick "eightyeight" on IRC, because I play the piano. However, some Asian cultures see the number "8" as lucky, and with "Chinese" fortune cookies in mind, I figured I would "encrypt" a fortune at 08:08 local time. Again, I decided to do this with a bit of shell code:

$ fortune -s -n 70 | gzip -c | base64 | rot13 | paste -sd ''
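Decoding one of these tweets is just the pipeline run backwards. Here, $TWEET is a placeholder for the tweet text, and rot13 is spelled out with tr(1) in case your system doesn't ship a rot13 command:

$ printf '%s' "$TWEET" | tr 'A-Za-z' 'N-ZA-Mn-za-m' | base64 -d | gzip -dc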

The first tweet to hit that was (testing the API, so this one actually wasn't on 08:08):


However, Identica started going downhill. First, we had big challenges fighting bot spam. Despite repeated bug reports and discussion on the network, very little was changing in the code to combat the spam (for future reference, just use Hashcash tokens as a proof-of-work for form submissions). Then, after getting venture capital and attempting to appeal to the mass market, things started changing. First it rebranded itself as "Status.Net", then we lost threaded replies. The API was no longer Twitter-compatible (at least some things were different), and branding got real weird. Then it rebranded itself again, under a completely new code rewrite, as "pump.io", and that is the status today. At this last rebranding, the API was no longer functional, and my scripts stopped. I didn't want to work with the Twitter API, so I didn't bother setting it up again.

It wasn't until some time ago that I decided to resurrect my cryptic tweets. However, I made some changes. Instead of using SHA-1, I decided to use RIPEMD-160. Although it hasn't had the mountains of analysis SHA-1 has had, RIPEMD-160 is still considered secure, even if its 160-bit digest size leaves a security margin a bit too slim for some. I stuck with the same Unix epoch timestamp, automated at 13:37 local time.
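
Coreutils doesn't ship a ripemd160sum(1), but OpenSSL can fill the gap. A minimal sketch of the updated one-liner, assuming openssl(1) is available (older releases spell the digest "-rmd160", newer ones also accept "-ripemd160"):

$ EPOCH=$(date --date="today 13:37" +%s); printf "$EPOCH: "; printf "$EPOCH" | openssl dgst -rmd160 | awk '{print $NF}'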

Then, after developing my own playing card cipher, and refining it with the help of @timshadel, I decided to actually attempt a legitimate (if still insecure) cipher with Talon. It's still a fortune (BOFH style) and it's still published at 08:08 local time for the same reasons. If you want a crack at decrypting it, check out my playing card cipher repository at https://github.com/atoponce/cardciphers. There should be a new one every day, but it's possible that the fortune is 1 character too long and, as a result, doesn't get posted (I've accounted for this, but I'm sure I've missed something).

What's the point? Nothing more than just a bit of fun. It's probably not something you're interested in seeing on your timeline, and I don't blame you. Granted, there will be one of each every day. If you don't have a busy timeline, I guess it could get a bit old. But, I don't plan on stopping, nor using a separate account.

Checksums, Digital Signatures, and Message Authentication Codes, OH MY!

I recently submitted a bug to the Vim project about its Blowfish encryption not using authentication. Bram Moolenaar, the lead developer of Vim, responded about using checksums and digital signatures. I hope he doesn't mind me using him as an example here, but I want to quote the relevant bits (emphasis mine):

The encryption is meant to avoid other people, who don't have the key, from reading the text. It does not have the goal of protecting manipulation of the text, that is something else. You could add a checksum even when not using encryption. I believe it's called signing.

Unfortunately, Bram is confusing checksums, digital signatures, and message authentication codes, all rolled up into one. I don't blame him. This is a topic that is not well understood by those not intimately familiar with cryptography. In a nutshell, each provides data integrity at its core. Where they differ is whether or not you're using keys, and whether those keys are symmetric or asymmetric. So, in this post, I would like to break it down.

Checksums

Checksums do not require any sort of encryption key. They are simply digests, or "fingerprints" that represent some data. When you download a piece of software from the Internet, there may be a file with an MD5, SHA-1, or SHA-256 hash of the file. This is the software vendor providing a way for you to verify that you got all the correct bits when the download completes.

For example, suppose you wish to download the latest Debian 8.3.0 amd64 ISO from https://mirrors.xmission.com/debian-cd/8.3.0/amd64/iso-cd/. Notice that there are the following files: MD5SUMS, SHA1SUMS, SHA256SUMS, & SHA512SUMS. Part of the SHA256SUMS file looks like this:

$ head SHA256SUMS
1dae8556e57bb04bf380b2dbf64f3e6c61f9c28cbb6518aabae95a003c89739a  debian-8.3.0-amd64-CD-1.iso
89facfbb5039e49d4e3eeff1cca6ab55e9121ff46affeb46ed510c11731acf41  debian-8.3.0-amd64-CD-10.iso
7f6bc807d3636975374b937c2724353f7468ecd7a61e60f2a8b71f92eeefe629  debian-8.3.0-amd64-CD-11.iso
bd99b7c274ea400b50960ab9e46dd23bad76f87574d2ceee1e8e43859fbd045b  debian-8.3.0-amd64-CD-12.iso
e85679304a509593526cffa77ff0d675329565eb4430444ee2c0d2cdd87842a8  debian-8.3.0-amd64-CD-13.iso
69f727bceb0460957bbd5023fe79749c6bf9f0e3a1b89945e6c63c6b3f04f509  debian-8.3.0-amd64-CD-14.iso
d1dab389f8cb794013986d2da8a6dc72c0be8bc932fcc6d7291cb09b418724d5  debian-8.3.0-amd64-CD-15.iso
913b5d89322b500a02f699d44778901cb59aae909f09bff64963115143c2a6ca  debian-8.3.0-amd64-CD-16.iso
0638aca6f59a8f5bec6d1cd4d272cea01758c2b2d6ec1412048ecb78ef684a77  debian-8.3.0-amd64-CD-17.iso
6f17742fbc82828f04da39f66647e958b0ac667cb4d2a40c9888c749680f1eb8  debian-8.3.0-amd64-CD-18.iso

So, when downloading "debian-8.3.0-amd64-CD-1.iso", I can use the sha256sum(1) command to verify the file:

$ sha256sum debian-8.3.0-amd64-CD-1.iso
1dae8556e57bb04bf380b2dbf64f3e6c61f9c28cbb6518aabae95a003c89739a  debian-8.3.0-amd64-CD-1.iso

The digest matches, so the download was successful and all the correct bits exist. Another way would be to download the SHA256SUMS file, and use the "-c" switch for the utility to verify the checksum automatically, rather than you eyeballing it:

$ sha256sum -c SHA256SUMS 
debian-8.3.0-amd64-CD-1.iso: OK

The important thing to understand about checksums is that they are completely and totally anonymous. There is no secret shared between the server where I downloaded the software and myself, and there is no identity attached to the checksum. This means that anyone can change the original file and recalculate the checksum. So, if transferring data over the Internet, there is nothing preventing a man-in-the-middle from replacing the bits you're downloading with something else, while also replacing the checksum.

In other words, checksums provide data integrity, but they do not offer any sort of authentication. However, there are a number of checksum hashing functions, both cryptographically secure and not, such as CRC, MurmurHash, MD5, SHA-1, SHA-2, SHA-3, and so forth. For non-authenticated data integrity, a cryptographically secure hash function isn't always desirable, which is why non-cryptographic hash functions exist.

Digital Signatures

Digital signatures are a form of checksum, in that they provide data integrity, but they require asymmetric cryptography to also provide authenticity. Digital signatures are a way to attach an identity to the checksum. This implies a level of trust between you and the 3rd party, such as Debian with our example above. If you have met with the 3rd party, or dealt with them enough to establish some level of trust, then you can install the 3rd party's public key into your system. Then, when they provide data attached with a digital signature, you can verify that the data did in fact come from the 3rd party, and no other source.

Notice that a man-in-the-middle attack is no longer valid here, if I already have the 3rd party's public key installed on my system. Going back to our example with Debian, I have the Debian signing public key already installed. So, I can now download the MD5SUMS.sign, SHA1SUMS.sign, SHA256SUMS.sign, or SHA512SUMS.sign file, along with the checksums file I already downloaded, and verify that the checksums are those intended by Debian:

$ gpg --verify SHA256SUMS.sign                   
gpg: assuming signed data in `SHA256SUMS'
gpg: Signature made Sun 24 Jan 2016 11:08:33 AM MST using RSA key ID 6294BE9B
gpg: Good signature from "Debian CD signing key <debian-cd@lists.debian.org>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: DF9B 9C49 EAA9 2984 3258  9D76 DA87 E80D 6294 BE9B

If we look at the contents of the SHA256SUMS.sign file, we get the following:

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAABCAAGBQJWpRMhAAoJENqH6A1ilL6bYz4P/3ZNCR8N+rrlSSgTN/AkpSVt
WXWg2BTflY3cPYmKK/osJUvLT7HTPDhabPiuQY2jJxrHYJhq5sCOrhbgc4eSRmIf
IsSm7OxQ9TXqde4mg9DVsxmIRui/rVbhEjkAVu47A0eGDUrRxczgJUo14En3jO0Z
qhXypCIN90y8HWaqy6OMe+eCsPyGxmXpWRT1XEH9tOX21wCAaxUl6ZHkiqNqdt8u
Erojls77nlaBR/tvB9CHXTkUmqsocdYD+n5UsvtmLlYN0nz85b7NhrLEW2QtugLd
MJngeI5eJvI4Hjyas0HfSlsdoBAvF+Uw3Dn9aHiTIeWVIeCYUKhdXmLww0dL0n95
jVuBSuMavQwOKKRGTbvG++RET9s/2U/G95wK0Vfx5fsf1neKVJgYf9q9iyObgcH8
dRLAqkgWJBNkvm9oXmpcy7jAq8jlXzDfaPz8plAyqDuIXoOSCHpJ5KAbAS1cYLIT
9U2cQLKTbCPrWJT5xZzOMuCPWu1CzfluDEafFsNzurWG5vCmFEJ+vV9strkeEIuX
tFeKVDkkhVEZYQKSbIlidXBa/WP2Q0g1KvlKXb+nsnWDtWAjLUPD621F3ZjUcjlX
aDPv3J+7kqfryA/7qYMVTH67KY3DwKIDKt6XtquxSf7HuYqEwXKIXp2De7zCCEqH
csWVPFNUQyOdetIC/l/w
=TjXr
-----END PGP SIGNATURE-----

The details of a PGP signature aren't important for this blog post. Suffice it to say that it requires the signer's private key to produce the signature, and the signer's public key to verify it. The sender hashes the message, signs the resulting digest with their private key, and appends the signature to the message. When the recipient receives the message, they separate the signature from the message, hash the message themselves, and use the sender's public key to verify that the signature matches their freshly computed digest. If it does, the recipient knows that only the sender signed the data, and no one else.

Because the private key is required to create the signature, and because only the 3rd party should have access to the private key, this means that a man-in-the-middle attack is no longer effective. A miscreant should not be able to remove the signature, apply a new signature, and have the recipient still verify the signature as "good" from Debian, unless that miscreant also had access to Debian's private signing key.

To make an example of an existing software vendor, Arch Linux came under heat over exactly this. A core developer strongly disagreed with digitally signing packages. They would provide their software from their repositories with MD5 checksums only. The packages were not digitally signed.

So, when your local Arch Linux installation would request packages from the Arch Linux software repository, unless served over HTTPS, a man-in-the-middle could interject their own bits with their own MD5 checksum. Your pacman(8) package manager would verify that the MD5 is valid, and proceed to install the software with root privileges, because that is what you told it to do. By also digitally signing the package with an Arch Linux signing key, this attack is no longer possible.

Eventually, Arch Linux fixed the vulnerability, and closed a very large security hole, by digitally signing their packages.

As with checksums, digital signatures really should be using a cryptographically secure hashing function as part of the protocol. This can include RIPEMD-160, SHA-2, SHA-3, BLAKE2, Skein, and others. MD5 and SHA-1 are no longer considered cryptographically secure, and should not be used with digital signatures (hence the industry's move to SHA-256 SSL certificates instead of SHA-1).
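
If you want to try both halves of this yourself, gpg(1) makes it easy. Producing the detached signature requires the private key; verifying it requires only the public key:

$ gpg --armor --detach-sign SHA256SUMS      # creates SHA256SUMS.asc
$ gpg --verify SHA256SUMS.asc SHA256SUMS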

Message Authentication Codes

Finally, message authentication codes (also called "MAC tags") are another way to provide data integrity with authentication, but this time using symmetric encryption. Where digital signatures imply a physical identity behind the authentication, MAC tags provide anonymous authentication. Generally, symmetric keys don't have identities associated with them. They're usually short-lived and shared via complex key exchanges, such as the Diffie-Hellman key exchange.

A MAC tag is keyed, meaning a shared secret is used when calculating the digest. There are a number of different implementations of MACs, such as CBC-MAC, HMAC, UMAC, & Poly1305, among others. The differences between each of those aren't important for this post. What is important is how they are calculated, and how they are used with encryption.

The sender of some message applies a keyed cryptographic hash function to the message, appends (or prepends) the resulting digest to the message, and sends the full payload off. Because the MAC tag was calculated with a shared secret key, a man-in-the-middle cannot strip the MAC tag and apply their own without knowing the shared secret. When the recipient receives the payload, they strip off the MAC tag, rehash the message with the same keyed hash function, and check whether the two MAC tags match. If they match, the message can be acted upon. If they do not match, something happened to the data in transit, and the payload can be safely ignored.
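
As a concrete example, an HMAC-SHA256 tag can be computed with openssl(1). Both ends run the same command with the same shared secret and compare the results (passing the key on the command line is itself insecure, and done here purely for illustration):

$ openssl dgst -sha256 -hmac "sharedsecret" message.txt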

There are three main ways to apply MAC tags to messages: encrypt-then-MAC, MAC-then-encrypt, and encrypt-and-MAC. The first, encrypt-then-MAC, is considered "best practice" for message authentication. First the message is encrypted, then the ciphertext is authenticated, and the resulting MAC is appended to the ciphertext. This provides both ciphertext and plaintext integrity. The big advantage to this approach is that if the MAC tag does not match the newly calculated MAC tag during verification, the ciphertext does not need to be decrypted. This is the default approach with IPsec and modern versions of OpenSSH. RFC 7366 standardizes this for TLS (yet to be implemented by OpenSSL last I checked), and it is also standardized in ISO/IEC 19772:2009.

Encrypt-then-MAC flowchart.
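
For the shell-inclined, a minimal encrypt-then-MAC sketch using openssl(1) might look like the following (passphrase-derived keys purely for illustration; a real system derives independent encryption and MAC keys properly):

$ openssl enc -aes-256-cbc -pass pass:enc_key -in message.txt -out message.enc
$ openssl dgst -sha256 -hmac mac_key message.enc > message.tag

The recipient recomputes the HMAC over message.enc and compares it against message.tag before ever running the decryption step.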

The next approach, MAC-then-encrypt, means authenticating the plaintext, appending the resulting MAC tag to the plaintext, and then encrypting the full plaintext and MAC tag payload. While this approach offers plaintext data integrity, it does not offer ciphertext integrity. As such, the ciphertext must be decrypted before the MAC tag can be verified. This is the default behavior in older versions of OpenSSH.

MAC-then-encrypt flowchart.

Finally, encrypt-and-MAC means authenticating the plaintext first, then encrypting the plaintext. The resulting MAC tag is appended to the ciphertext. Again, like MAC-then-encrypt, this approach offers plaintext data integrity, but it does not offer ciphertext integrity. So, you must detach the MAC tag first, then decrypt the ciphertext, then verify whether the MAC tag is valid. This is the default behavior with OpenSSL.

Encrypt-and-MAC flowchart.

As I understand it, there are no known vulnerabilities in the MAC-then-encrypt and encrypt-and-MAC constructions themselves, although MAC-then-encrypt as deployed in TLS has been implicated in padding oracle attacks. Regardless, having both ciphertext and plaintext integrity, as well as not needing to decrypt the ciphertext on verification failure, is why encrypt-then-MAC is the preferred way to handle message authentication.

As with digital signatures, MACs should be calculated with cryptographically secure hashing functions, such as RIPEMD-160, SHA-2, SHA-3, BLAKE2, Skein, etc. MD5 and SHA-1 would not qualify (although we could get into a discussion about HMAC-MD5 and HMAC-SHA1, but we won't).

Conclusion

No doubt, it's confusing to separate checksums from digital signatures from message authentication codes. Things get even a bit more hairy with blind signatures (used primarily in digital currencies) and Merkle trees (used primarily in peer-to-peer networks and copy-on-write filesystems), but they're special cases of the primary three functions discussed above. However, if you can get checksums, digital signatures, and message authentication codes cleared up, then you're that much closer to implementing cryptographic protocols correctly.

Bitcoin Mining Rate and Waste

Recently, the Bitcoin mining rate surpassed 1 exahash per second, or 1 quintillion SHA-256 hashes per second.

Bitcoin mining graph showing gigahashes per second over time.

If we do some quick math, we can determine the following:

  • If SHA-1 collisions can be found in 2^65.3 hashes, that's one SHA-1 collision found every 45 seconds.
  • Every combination of bits can be flipped in an 84-bit keyspace every year.
  • If mining is done strictly with ASICs and each ASIC can produce 1 trillion hashes per second, that's 1,000,000 ASICs.
  • If each ASIC above consumes 650 Watts of power, that's 650 Megawatts of power consumed.
  • At 650 Megawatts, that's 650,000 kWh consumed every hour; at roughly 2 pounds of CO2 per kWh from fossil fuels, that's 1.3 million pounds of CO2 released into the atmosphere every hour.
  • Current global rate is about 160 Bitcoins mined per hour.
  • At $0.15 USD per kWh, that's $609 spent on electricity per Bitcoin mined. Bitcoin is currently trading at $376/BTC.
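
The per-coin electricity figure is easy to sanity-check in a shell, given the assumptions above (650,000 kWh consumed per hour, $0.15 per kWh, 160 BTC mined per hour):

$ echo "650000 * 0.15 / 160" | bc -l
609.37500000000000000000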

That's "back of the envelope" calculations, with some big assumptions made about the mining operation (how it's powered, who is powering it, etc.).

Of course, not all mining is using fossil fuels, not all miners are using ASICs, not all ASICs can do 1 trillion hashes per second (some more, some less), not all ASICs are consuming that wattage per rate, and the cost of electricity was strictly a U.S. figure. Of course, if you're using a GPU, or worse, a CPU for your mining, then you're expending more electricity per hash than ASICs are. That may help balance out some of the miners who are using renewable energy, such as solar power for their mining. Many Chinese and Russian mining data centers certainly have less overhead costs on electricity. You get the point- we've made some big assumptions, to come to some very rough "ballpark" figures. I don't think we're too far off.

So, according to those numbers, unless miners are using renewable energy or cheaper electricity, or Bitcoin trading goes north of $610 USD/BTC, mining for Bitcoin is a net loss. This comes at the expense of 1.3 million pounds of CO2 released into the atmosphere every hour. I would argue that Bitcoin is the worst idea to come out of Computer Science in the history of mankind.

Using Your Monitors As A Cryptographically Secure Pseudorandom Number Generator

File this under the "I'm bored and have nothing better to do" category. While coming into work this morning, I was curious if I could use my monitors as a cryptographically secure pseudorandom number generator (CSPRNG). I don't know what use this would have, if any, as your GNU/Linux operating system already ships a CSPRNG with /dev/urandom. So, in reality, there is really no need to write a userspace CSPRNG. But what the hell, let's give it a try anywho.

The "cryptographically secure" piece of this will come from the SHA-512 function. Basically, the idea is this:

  • Take a screenshot of your monitors.
  • Take the SHA-512 of that screenshot.
  • Resize the screenshot to 10% of its original size.
  • Take the SHA-512 of that resized file.
  • Take the SHA-512 of your previous two SHA-512 digests.
  • Take the last n-bits of that final digest as your random number.

Most GNU/Linux systems come with ImageMagick pre-installed, as well as the sha512sum(1) utility, so thankfully, we won't need to install any software. Here's a simple shell script that can achieve our goals:

#!/bin/sh
# Produces random numbers in the range of [0, 65535].
# Not licensed. Released to the public domain.

cd ~/Private # assuming you have an encrypted filesystem mounted here

# Take a screenshot of the full desktop, and hash it.
TS1=$(date +%Y%m%d%H%M%S%N)
import -window root ${TS1}.png
sha512sum ${TS1}.png > /tmp/SHA512SUMS

# Scale the screenshot to 10% of its size, and hash that too.
TS2=$(date +%Y%m%d%H%M%S%N)
convert -scale 10% ${TS1}.png ${TS2}.png
sha512sum ${TS2}.png >> /tmp/SHA512SUMS

# Hash the two digests together, then print the last 4 hex characters
# (16 bits) of the result as a decimal number in [0, 65535].
DIGEST=$(sha512sum /tmp/SHA512SUMS)
printf "%d\n" 0x$(printf "$DIGEST" | awk '{print substr($1, 125, 4)}')

# Overwrite the intermediate files before removing them.
shred ${TS1}.png ${TS2}.png
rm ${TS1}.png ${TS2}.png
shred /tmp/SHA512SUMS
rm /tmp/SHA512SUMS

Running it for 10 random numbers:

$ for i in {1..10}; do sh monitor-csprng.sh; done
15750
36480
64651
7942
2367
10905
53889
9346
52726
63570

A couple things to note:

  • This is slow, due to taking the screenshot, and resizing it.
  • The data on your monitors should be sufficiently random. Chats, social updates, etc. The security of this will depend entirely on the entropy of the initial screenshot.
  • You really should be saving your screenshots to an encrypted filesystem, such as eCryptfs.
  • We're using timestamps with nanosecond accuracy to provide some additional entropy for the final SHA-512 digest.
  • This is using the last 4 hexadecimal characters to be converted to decimal. In reality, it could be anything, including some convoluted dynamic search algorithm in the string.

It's worth noting that the entropy of the initial screenshot is critical, which is actually difficult to accurately measure. So, it may help to have a chat window or more open, with recent chat logs. Same could be said for social update "walls", with the most recent updates (Twitter, Facebook, Goodreads, etc.). Having a clock with seconds ticking in a status bar can also help (although not unpredictable, at least semi-unique). Tabs in browsers, running applications, etc. The more unpredictable your workspace in the screenshot, the better off you'll be. But, people in general suck at randomness, so I'm not advocating this as something you should rely on for a cryptographically secure random number generator.

If you wanted, you could add this to a terminal, giving you a sort of "disco rave" before taking the screenshot:

#!/bin/sh
# Disco lights in your terminal
# No license. Released to the public domain

while true; do
    # Read 2 random bytes as an unsigned decimal, reduce modulo the number
    # of colors the terminal supports, and print a bullet in that color.
    printf "\e[38;5;$(($(od -d -N 2 -A n /dev/urandom)%$(tput colors)))m•\e[0m"
done

Get that running first, then take your screenshot. But then, if you're reading data off of /dev/urandom, you might as well do that for your random numbers anyway...

Disable Pocket From Iceweasel

I'm not sure who I should be more disappointed in- Mozilla or Debian. Iceweasel 43 recently arrived in Debian unstable, and with it, Pocket. For those who are not familiar, Pocket is a 3rd party service that allows users to save sites they want to read or visit for later. Provided the extension is installed, this allows users to sync pages they want to read for later, across devices and platforms.

But here's the catch: it's a proprietary, non-free service-as-a-software-substitute (SaaSS).

Thankfully, you can disable it, and it really isn't that difficult. Open up about:config in a new tab, and type "pocket" into the search filter. From there, set "browser.pocket.api" and "browser.pocket.site" to "localhost", and set "browser.pocket.enabled" to "false", then restart your browser.

Screenshot showing about:config with the above settings in place.
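
If you'd rather not click through about:config on every profile, the same preferences can be pinned in your profile's user.js file (create it next to prefs.js if it doesn't exist); this is standard Firefox/Iceweasel behavior, and the pref names below are just the ones set above:

user_pref("browser.pocket.enabled", false);
user_pref("browser.pocket.api", "localhost");
user_pref("browser.pocket.site", "localhost");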

It really bothers me that Mozilla has enabled this sort of integration into their browser. Not only Pocket, but other proprietary or privacy invasive plugins and extensions also, such as "sponsored tiles" (which are finally removed), "encrypted media extensions", and "Hello" (which I haven't figured out how to disable). These sorts of things should be separate extensions or plugins that the user can install at their whim. Shipping them by default takes away freedom and choice, and it's turning the browser into a proprietary non-free software application.

What ultimately bothers me about this, is that Mozilla already has bookmark synchronization support, and their sync server is Free Software, allowing you to roll your own. Pocket doesn't offer anything that Mozilla Sync doesn't. I already have a "TOREAD" bookmark folder, where I can put pages I want to read later. And it's synched across all of my devices.

Mozilla pushing the 3rd party proprietary Pocket, and Debian shipping it in Iceweasel (thankfully, a bug is submitted) is a great disservice to users and a threat to software freedom.

Hopefully, Pocket goes the way of sponsored tiles, and gets removed.

Encrypted Account Passwords with Vim and GnuPG

Screenshot of terminal showing hidden password with Vim syntax highlighting

Background

I've been a long-time KeepassX user, and to be honest, I don't see that changing any time soon. I currently have my password database on an SSH-accessible server, and I use kpcli as the main client for accessing the db. I use Keepass2Android with SFTP on my phone to get read-only access to the db, and I use sshfs mounts on my workstations with KeepassX for read-only GUI access. It works great, and allows me to securely access my password databases from any client, mobile or otherwise.

However, I recently stumbled on this post on how to use Vim with GnuPG to create an encrypted file of passwords: http://pig-monkey.com/2013/04/password-management-vim-gnupg/. I've heard about a GnuPG plugin for Vim for years now, and know friends that use it. I've even recommended that others use it as a simplistic means of keeping an encrypted password database, instead of relying on 3rd-party tools. However, I've never really used it myself. Well, after reading that post, I decided to give it a try.

Defining a specification

Ultimately, I'm carrying everything in that post over here, with only a couple modifications. First, fields should end with a colon, and that includes the comments. A comment could be a single line, or multi-line, but it's still a field just as much as "user" or "pass". Further, there should be a little flexibility in the field keywords, such as "user" or "username". Additionally, because I exported my Keepass db to an XML file, then used a Python script to convert it into this syntax, I also carried over some additional fields. So, I've defined my database with the following possible fields:

  • comment|comments
  • expire|expires
  • pass|password
  • tag|tags
  • type
  • url
  • user|username

Notice that I did not define a "title" as would be the case in the Keepass XML. The entry itself is the title, so I find this redundant. Also, you'll notice I defined an additional "type" field. While not explicitly defined in the Keepass XML, it is implicitly defined with icons for entries. This could be useful for distinguishing "ssh" vs "mysql" vs "ldap" vs "http" authentications when searching the file.

So, an example from pig-monkey.com that is invalid under this spec is:

Super Ecommerce{{{
    user:   foobar
    pass:   g0d
    Comments{{{
        birthday:   1/1/1911
        first car:  delorean
    }}}
}}}

This is invalid due to the "Comments" field missing its colon. Fixed would be:

Super Ecommerce{{{
    user:   foobar
    pass:   g0d
    Comments:{{{
        birthday:   1/1/1911
        first car:  delorean
    }}}
}}}

Another valid entry could be:

Example {{{
    username: aarontoponce
    password: toomanysecrets
    url: https://example.com
    type: http
    tags: internet,social,2fa
    comments: {{{
        backup codes: vbrd83ezn2rjeyj, p89r4zdpjmyys2k, rdh6e7ubz8vh82g, er4ug6vp25xsgn5
        2fa-key: "3udw mkmm uszh cw2a 5agm 7c3p 5x32 tyqz"
    }}}
}}}

Notice that I have not defined file comments, such as those found in configuration files or source code. There is a comment section per entry, so that seems to be the fitting place for any and all comments.

I really liked the post, and how thought out the whole thing was, including automatically closing the PGP file after an inactivity timeout, automatically folding entries to prevent shoulder surfing, and clearing the clipboard when Vim closes. However, one oversight that bothered me was not concealing the actual password when the entry is expanded. Thankfully, Vim supports syntax highlighting. So, we just need to define a filetype for GnuPG encrypted accounts, and define syntax rules.

Vim syntax highlighting

EDITED TO ADD: I tried getting the Vim syntax working in this post, but WordPress is clobbering it. So, you'll need to get it from pastebin instead. Sorry.

To get this working, we need a syntax file. I don't know if one exists already for this syntax structure, but it isn't too difficult to define one. Let's look at what I've defined in this pastebin, then I'll go over it line-by-line.

The first four lines in the syntax file are just comments. Next is just a simple if-statement checking whether syntax highlighting is enabled. If so, use it. The first interesting line is the following:

let b:current_syntax = "gpgpass"

This defines our syntax. Whenever we load a file with syntax highlighting enabled, and we set the "filetype" to "gpgpass", this syntax will be applied.

syntax case ignore

This just allows us to have "comment" or "Comment" or "COMMENT" or any variation on the letter case, while still matching and providing a highlight for the match.

After that, we get into the meat of it. This "syntax match" section allows me to conceal the passwords on the terminal to prevent shoulder surfing, even when the entry is expanded. This is done by setting the background terminal color to "red" and the foreground text color also to "red". Thus, we have red text on a red background. The text is still yankable and copyable, even with the mouse cursor, it's just not visible on screen.

The actual concealment is done with the regular expression. An atom is created to match "pass:" or "password:" surrounded by whitespace as the first word on the line. However, I don't want to conceal the actual text "pass:", just the password itself. So, the regular expression "\@<=" says to ignore our atom in the match, and only match "\S\+" for concealing. The concealment is achieved with red foreground text on a red background with:

highlight gpgpassPasswords ctermbg=red ctermfg=red
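
For reference, here is a rough reconstruction of the full match rule that pairs with that highlight line (the canonical version lives in the pastebin; treat this as a sketch):

syntax match gpgpassPasswords /\(^\s*\(pass\|password\):\s\+\)\@<=\S\+/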

The rest of the syntax matching in that pastebin is for identifying our fields, and highlighting them as a "Keyword" using regular expressions. All field names will be highlighted the same color based on your colorscheme, as they are all defined the same. Thus, aside from the hidden password, there is uniformity and elegance in the presentation of the syntax.

Using the syntax in Vim

This syntax file won't do us much good if it isn't installed and Vim isn't configured to use it. We could save it system-wide to "/usr/share/vim/vim74/syntax/gpgpass.vim", or just keep it in our home directory at "~/.vim/syntax/gpgpass.vim". Whatever works.

Now that the syntax file is installed, we need to call it when editing or viewing GnuPG password files. We can use the vimrc from pig-monkey.com, with one addition- we're going to add "set filetype=gpgpass" under the "SetGPGOptions()" function. Now, I understand that you may edit encrypted files that are not GnuPG password files. So, you're going to get syntax highlighting in those cases. Or, you could enable the modeline and set a modeline in the password file. The problem with the modeline is its long history of vulnerabilities. Most distributions, including Debian, disable it, and for good reason too. So, I'd rather have it set here, and unset the "filetype" if it's bothering me.

Here's the relevant config:

if has("autocmd")
    """"""""""""""""""""
    " GnuPG Extensions "
    """"""""""""""""""""

    " Tell the GnuPG plugin to armor new files.
    let g:GPGPreferArmor=1

    " Tell the GnuPG plugin to sign new files.
    let g:GPGPreferSign=1

    augroup GnuPGExtra
    " Set extra file options.
	autocmd BufReadCmd,FileReadCmd *.\(gpg\|asc\|pgp\) call SetGPGOptions()
    " Automatically close unmodified files after inactivity.
	autocmd CursorHold *.\(gpg\|asc\|pgp\) quit
    augroup END

    function SetGPGOptions()
    " Set the filetype for syntax highlighting.
	set filetype=gpgpass
    " Set updatetime to 1 minute.
	set updatetime=60000
    " Fold at markers.
	set foldmethod=marker
    " Automatically close all folds.
	set foldclose=all
    " Only open folds with insert commands.
	set foldopen=insert
    endfunction
endif " has ("autocmd")

Conclusion

What I like about this setup is the portability and simplicity. I am in a terminal on a GNU/Linux box most of my waking hours. It makes sense to use tools that I already enjoy, without needing to rely on 3rd party tools. This also closes the gap of potential bugs with 3rd party password managers leaking my passwords. I'm not saying that Vim and GnuPG won't be vulnerable, of course, but I do place more trust in these tools than the Keepass ones, to be honest.

As of right now, however, I am still a Keepass user. But, I wanted to put this together, and try it out for size, and see how the shoe fits. As such, I've exported my KeepassX database, encrypted it with GnuPG, configured Vim, and I'm off to the races. I'll give this a go for a few months, and see how I like it. I know it's going to pose issues for me on my phone, even with ConnectBot and SSH keys. But, maybe I don't need it on my phone anyway. Time will tell.

Oh, and I can still view the database as read-only and still enjoy the syntax highlighting benefits by using "view /path/to/passwords.gpg" instead of "vim /path/to/passwords.gpg".

Multiple Encryption

I hang out in ##crypto on Freenode, and every now and then, someone will ask about the security of multiple encryption, usually with the context that AES could be broken in the near future. When talking about multiple encryption, they are usually referring to cascade encryption, which has the form of:

CT = Alg_B(Alg_A(M, key_A), key_B)

The discussion revolves around the differences between "Alg_A" and "Alg_B". Such as using AES for "Alg_A" and Camellia for "Alg_B". Also, the discussion will include whether or not "key_A" and "key_B" should be the same key, or different.
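
As a concrete sketch of a cascade using openssl(1), with AES as "Alg_A" and Camellia as "Alg_B" (passphrase-based keys purely for illustration; a real system would derive and manage two independent keys properly):

$ openssl enc -aes-256-cbc -pass pass:key_A -in message.txt -out inner.bin
$ openssl enc -camellia-256-cbc -pass pass:key_B -in inner.bin -out ct.bin

Decryption peels the layers in reverse order with the -d switch.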

Cascade encryption is more efficient in storage space than some alternatives, such as this one suggested by Bruce Schneier:

CT = Alg_A(OTP, key_A) || Alg_B(XOR(M, OTP), key_B),  where OTP is a true one-time pad

I'm not going to go into the theoretical concerns with multiple encryption. However, I would like to cover some practical considerations:

  1. Multiple key security.
  2. Long-term storage.
  3. Complexity.
  4. Host security.

Multiple key security

It should come as no surprise that when dealing with multiple encryption, you are going to be dealing with multiple keys, if you choose to keep "key_A" and "key_B" separate. Probably the most difficult aspect of encryption implementations is keeping the secret key secret. For example, key exchanges between machines over the scary Internet have been notoriously difficult to get correct. Current best practice is implementing authenticated ephemeral elliptic-curve Diffie-Hellman (ECDHE) when communicating secret symmetric keys between machines. So, not only do you need to communicate one key, but multiple keys, when encrypting and decrypting data.

If the multiple-encrypted data is to be stored on disk, then keys will need to be retrieved for later. How are these stored? This isn't an easy question to answer. If you store them in a password manager, they are likely just getting single-encrypted, probably with AES. So, the security of your ciphertext rests on the security of your stored keys, likely protected by the very algorithm you are trying to safe-guard.

Now, you could use the same key for every encryption layer. But, this poses a theoretical concern (which I promised I wouldn't cover- sorry). If the same key is used for every layer, and an attacker can recover the key through cryptanalysis of the first encryption layer, then the attacker could possibly decrypt all remaining layers. Obviously, you don't want to use ciphers where the decryption process is exactly the same as the encryption process. Otherwise, the second encryption process on the ciphertext would decrypt the first encryption! While not probable, this last scenario could even occur with different algorithms, such as AES and Camellia. So, it seems at least at a cursory glance, that using the same key for all encryption layers probably is not a wise idea. So, we're back to key management, which is the bane of cryptographers everywhere.

Long-term storage

In my opinion, a larger problem is that of storage. It's one thing to get multiple encryption correct and safe on the wire, it's another to place value on long-term data storage. Think about it for a second- what is the longest you have kept data on the same drive? In personal scenarios, I have some friends that have had personal backups for up to five years. To me, this is impressive. It's likely more common that data switches drives every couple of years. RAID arrays die, hardware is replaced, higher drive capacity is demanded, or even bit rot creeps in, destroying data (such as on magnetic or optical mediums). When push comes to shove, the encrypted data is just going to move from drive-to-drive. But, ask yourself this next question- what is the oldest data you have in your possession right now?

Let's be realistic here for a second. I would be hard-pressed to find data I stored back in 2000, 15 years ago. I could find some photos in photo albums, on mugs, and on Christmas cards, but I'm not 100% confident I could get the digital original. Despite my best efforts, accidents happen, mistakes are made, and data is just lost. I don't think I'm alone here. I've even worked for companies, with large budgets, that had a hard time recovering data that is 10+ years old. For one, it's expensive to hold on to data indefinitely, but a great amount of data also becomes less and less valuable as time progresses. Yes, I still use my same email account from 2004- Google has done a great job of keeping all of my emails these past 11 years, and I would expect other data service providers to do the same. But, how many of you have kept an email address for 10+ years? Or even the data, for that matter? (This blog is actually 11 years old as well- kudos to me on keeping the data going this long).

My point is, hardware fails and is changed. Your personal value on data also changes, and accidents happen. So you're concerned about AES being broken in 20 years, or even sooner. Do you think by that time you'll still place value on that encrypted data? Do you think you'll even still have access to it, or can find it? And, if so, will it really be that difficult to decrypt the AES data, and encrypt it with the current best practice encryption algorithm?

Complexity

This is probably the problem you should be concerned with the most. As a collective group, we as developers have a hard time getting single encryption correct, let alone multiple encryption. This deeply enters the theoretical realm, which I promised I wouldn't blog about. But, you do have a practical concern as well- order of operations and correct implementations.

First, order of operations. It's one thing to do "double encryption", where only two algorithms are chosen and used. If you can't recall if you used AES first or second, it's a 50/50 shot at getting the order correct (provided you know which key belongs to which algorithm, otherwise it's a one-in-four chance). Imagine however, using three encryption layers, and lining the keys up correctly. Imagine the complexity of four layers, or more. Ugh. Seems like you certainly don't want to go higher than two layers.

Second, look at implementations. AES is AES. It shouldn't matter what implementation does the calculations. But, implementations like to put "magic bytes" at the beginning of ciphertexts (OpenSSL, OpenPGP, etc.). This data is only valuable for that implementation, and even worse, for a specific subset of versions. Just imagine encrypting a file with OpenSSL version 1.0 now, and needing to decrypt it in 10 years. Will OpenSSL version X be able to read those magic bytes, and correctly decrypt the file? Or will it error out, unable to decrypt the data because the data structure of the magic bytes changed in that 10 year time frame?

So, it seems best to encrypt it with some programming language library, where you can control exactly what data is stored. But, as everyone will tell you while frothing at the mouth, "don't roll your own crypto". Technically, you aren't if you "import aes" and use the "aes" module provided by that language correctly. It just remains to be seen if you implemented it correctly to thwart an attacker. Crypto is hard and full of sharp edges. It's very difficult to get things right, without getting cut. Regardless, while the "aes" module might be available in 10 years, what about the "camellia" module, or whatever algorithm you chose for the second layer? Is it still in development, or was it abandoned due to either being broken, or lack of development? Can you find that module, so you can decrypt your data?

Host Security

In a more practical real-world, everyday person scenario, how secure is the host that is doing the multiple encryption? Do others have physical access to the machine? Is it free of viruses, malware, and other badware? Does the system run an encrypted filesystem? Where and how are backups stored? Who has access to those backups? So many more questions can be asked that judge the quality of the security level of the host storing or processing the data.

Viruses and malware would probably be my number one concern if the data was so valuable, as to be multiple-encrypted. So, I would probably encrypt the plaintext on one machine, encrypt the ciphertext on a second machine, and store it on a third machine, preferably air-gapped. Thus, if a virus exists on one machine, hopefully it doesn't exist on another, and hopefully it doesn't attach itself to my encrypted data, and hopefully the badware didn't report my plaintext to a botnet pre-encryption.

Physical host security is hard. People have crappy passwords protecting their workstations. Physical access can get the attacker root regardless. Systems are infected with badware all the time, just by visiting websites! So there is hardly a guarantee that your data is safe, even though it was encrypted multiple times with different keys and algorithms.

A Couple Thoughts

It hardly seems worth the effort to encrypt your data multiple times with different algorithms and different keys, given the overhead necessary in managing everything (hardware and software). Further, in reality, modern encryption algorithms aren't usually broken. For example, DES as an algorithm isn't broken- its key space is just too small. So, encrypting your data multiple times is solving a problem that, for the most part, just doesn't exist.

That's not to say that AES will remain secure in 10, 20, or 40 years. I'm not that naive. But, as a user, you do have the ability to switch algorithms when AES does break. So, decrypt your AES ciphertext, and encrypt it with SevenFish (sorry Bruce- bad joke). Keep it encrypted with SevenFish until that breaks, and then decrypt it, and encrypt it with whatever the new modern cipher is at the time (if you still have the data, it's still valuable to you, and all implementations can still work with the ciphertext).

Conclusion

In my opinion, don't worry about multiple encryption. Generate a GnuPG key pair, encrypt your data once, and be done with it.

Getting Root On The Nexus 6 With Android 6

This is probably the 40 millionth time, since owning this phone, that I've needed to root my device. Because I keep doing it over and over, while also referring to past commands and notes, it's high time I blogged the steps. If I can benefit from my own blog post, then chances are someone else can. So, with that said, here's what we're going to do:

  1. Grab the latest Nexus factory images from Google.
  2. Update the phone by flashing all the images (without wiping user data).
  3. Flash the recovery with the latest TWRP image.
  4. Get root on the device with Chainfire's "system-less root" SuperSU package.
  5. Enable USB tethering and the wireless hotspot functionality.

Before beginning, I should mention that if the title isn't immediately clear, this post is specific to the Motorola Nexus 6, which is the phone I currently own. It's probably generic enough, however, to be applied to a few Nexus devices. Minus getting the factory Nexus images from Google, this might even be generic enough for non-Nexus devices, but you're on your own there. Proceed at your own risk. With that said, it's fairly hard to brick an Android phone these days.

Also, you need to make sure you have an unlocked bootloader. Google ships with the bootloader locked by default. Unlocking it will wipe your user partition, meaning you will lose any and all user data (images, videos, text messages, application data, etc.). I'm going to assume that you've already unlocked the bootloader, and are ready to proceed.

TL;DR

If you don't want to read the post, and know what you're doing, here's the short of it:

$ tar -xf shamu-mmb29k-factory-9a76896b.tgz
$ cd shamu-mmb29k
$ adb reboot bootloader
$ fastboot flash bootloader bootloader-shamu-moto-apq8084-71.15.img
$ fastboot reboot-bootloader
$ fastboot flash radio radio-shamu-d4.01-9625-05.32+fsg-9625-02.109.img
$ fastboot reboot-bootloader
$ fastboot update image-shamu-mmb29k.zip
$ fastboot flash recovery twrp-2.8.7.1-shamu.img
$ fastboot reboot recovery
(reboot normally)
$ adb push UPDATE-SuperSU-v2.46.zip /sdcard/supersu.zip
$ adb reboot recovery
(install /sdcard/supersu.zip from TWRP)
(do not install TWRP root)
(reboot normally)
(install build.prop editor from Google Play)
(set "net.tethering.noprovisioning" to "true")

Otherwise ...

Getting the Google Nexus factory images

Navigate to https://developers.google.com/android/nexus/images#shamu and grab the version you are looking for. For example, I recently wanted to flash 6.0.1, so I grabbed the "MMB29K" image. Before flashing, I find it critical to verify the checksums. They are "27dde1258ccbcbdd3451d7751ab0259d" for MD5 and "9a76896bed0a0145dc71ff14c55f0a590b83525d" for SHA-1. So, after downloading, I pulled up a terminal, and verified them:

$ md5sum shamu-mmb29k-factory-9a76896b.tgz
27dde1258ccbcbdd3451d7751ab0259d  shamu-mmb29k-factory-9a76896b.tgz
$ sha1sum shamu-mmb29k-factory-9a76896b.tgz 
9a76896bed0a0145dc71ff14c55f0a590b83525d  shamu-mmb29k-factory-9a76896b.tgz

After examination, it's clear these checksums match, so I'm ready to flash.

Flashing the images

This step does not require root on your device. I'll need to connect my phone to my computer via USB, and verify that I can talk to it via adb(1). This means installing the Debian "android-tools-adb" and "android-tools-fastboot" packages if they're not installed already. Afterward, I should be able to verify that I can talk to the phone:

$ sudo apt-get install android-tools-adb android-tools-fastboot
(...snip...)
$ adb devices
List of devices attached 
[serial number]      device

If your device is visible, we are ready to rock-n-roll. First, extract the tarball, and enter the directory:

$ tar -xf shamu-mmb29k-factory-9a76896b.tgz
$ cd shamu-mmb29k
$ ls -lh
total 2.3G
-rw-r--r-- 1 atoponce atoponce  124 Jan  1  2009 android-info.txt
-rw-r--r-- 1 atoponce atoponce 8.1M Jan  1  2009 boot.img
-rw-r----- 1 atoponce atoponce  11M Nov 18 16:59 bootloader-shamu-moto-apq8084-71.15.img
-rw-r--r-- 1 atoponce atoponce 6.2M Jan  1  2009 cache.img
-rw-r----- 1 atoponce atoponce  985 Nov 18 16:59 flash-all.bat
-rwxr-x--x 1 atoponce atoponce  856 Nov 18 16:59 flash-all.sh*
-rwxr-x--x 1 atoponce atoponce  814 Nov 18 16:59 flash-base.sh*
-rw-r----- 1 atoponce atoponce 964M Nov 18 16:59 image-shamu-mmb29k.zip
-rw-r----- 1 atoponce atoponce 113M Nov 18 16:59 radio-shamu-d4.01-9625-05.32+fsg-9625-02.109.img
-rw-r--r-- 1 atoponce atoponce 8.8M Jan  1  2009 recovery.img
-rw-r--r-- 1 atoponce atoponce 2.0G Jan  1  2009 system.img
-rw-r--r-- 1 atoponce atoponce 136M Jan  1  2009 userdata.img

Notice a couple of things- first, there are shell scripts "flash-all.sh" and "flash-base.sh" for Unix-like systems. Also, notice the "bootloader-shamu-moto-apq8084-71.15.img" & "radio-shamu-d4.01-9625-05.32+fsg-9625-02.109.img" raw images, as well as the "image-shamu-mmb29k.zip". These are the only files we're going to concern ourselves with when flashing the phone.

However, we want to be careful that we don't flash "userdata.img", as that would format your user partition, and all user data would be wiped (see above). What we're going to do is basically the same execution as the "flash-all.sh" shell script, with just one small modification. Further, we need our phone already booted into the bootloader. As such, here's what we're going to do:

$ adb reboot bootloader
$ fastboot flash bootloader bootloader-shamu-moto-apq8084-71.15.img
$ fastboot reboot-bootloader
$ fastboot flash radio radio-shamu-d4.01-9625-05.32+fsg-9625-02.109.img
$ fastboot reboot-bootloader
$ fastboot update image-shamu-mmb29k.zip

Notice that I removed -w from that last command (if you looked in the "flash-all.sh" shell script). That option wipes user data, which would be necessary if we wanted to return the phone back to factory state. We don't- we're just upgrading. Also, I don't see the need for "sleep 5". Just wait for the phone to successfully reboot before running the next command.

At this point, the phone is successfully updated. If you were to reboot the phone, it would be perfectly operational as if you did an OTA update, or purchased it from the store. However, we want root, so we have a few more steps to accomplish.

Getting and flashing TWRP

This step also does not require root on your phone. I prefer TWRP for my recovery on Android. It's touch-based, which sets the UI apart from the other recoveries, and it's Free Software, unlike ClockworkMod. Both of these are big wins for me. Grab the latest image at https://twrp.me/devices/motorolanexus6.html. I downloaded twrp-2.8.7.1-shamu.img. Unfortunately, I couldn't find any checksums to verify the download against. So, I installed it anyway, knowing I could flash the stock "recovery.img" if something went wrong. So far, things have been great, so I calculated the checksums for you:

$ md5sum twrp-2.8.7.1-shamu.img 
f040c3a26f71dfce2f04339f62e162b8  twrp-2.8.7.1-shamu.img
$ sha1sum twrp-2.8.7.1-shamu.img
40017e584879fad2be4043c397067fe4d2d76c88  twrp-2.8.7.1-shamu.img
$ sha256sum twrp-2.8.7.1-shamu.img
ebe5af833e8b626e478b11feb99a566445d5686671dcbade17fe39c5ce8517c7  twrp-2.8.7.1-shamu.img

If those check out, you should be safe in flashing. Currently, the phone should already be booted into the bootloader. If not, make sure it is. Once in the bootloader, we can flash TWRP:

$ fastboot flash recovery twrp-2.8.7.1-shamu.img

Now, it's critical that we don't reboot the phone normally. If we do, the recovery will be overwritten, and we'll have to reflash it. So, while your phone is still booted into the bootloader, reboot it into recovery. You can do this by pressing the volume up/down keys until rebooting into recovery is available, then pressing the power button. This should boot you into TWRP. Once you're there, you can reboot the phone normally.

WARNING
It is possible that while booting, your phone will notify you that the system cannot be verified. One of two things will happen: either the boot will pause, and not go further, or it will boot despite the warning. With these exact versions, my phone boots without the warning at all. However, don't panic if you see it. Remember, you have the factory images. Just reflash the recovery.img, and you will be just fine.

More info can be found at http://www.xda-developers.com/a-look-at-marshmallow-root-verity-complications/.

Getting and flashing SuperSU (getting root)

WARNING
At this point, the phone should be booted into its regular state. We are now ready to root the phone. This means getting the latest SuperSU package, and installing it through TWRP. However, I need to throw out another caution. We'll be installing a beta version of SuperSU to do something called "system-less root". This means that the package will only be modifying the bootloader image to get root, and will not be touching the system partition. This is both good, and bad. It's good in that we only need to reflash the bootloader to remove root. It's bad in that this is experimental software, and really not ready for production. Further, unlike TWRP, SuperSU is proprietary software, which sucks. It does make me a bit nervous, to be honest, to rely on non-free closed-source proprietary software, on such a critical piece of my life. Proceed at your own risk.

As of this writing, you'll need to get the SuperSU package from the XDA forums at http://forum.xda-developers.com/showpost.php?p=64161125&postcount=3. I grabbed version "BETA-SuperSU-v2.64-20151220185127.zip". There may be updates since this post was published.

Unfortunately, again, I did not see any published checksums. So, I've installed it, with the knowledge of how to reflash my bootloader should I encounter problems.

$ md5sum UPDATE-SuperSU-v2.46.zip 
332de336aee7337954202475eeaea453  UPDATE-SuperSU-v2.46.zip
$ sha1sum UPDATE-SuperSU-v2.46.zip 
6135f9d0af28e02f4292c324bf5983998e7ae006  UPDATE-SuperSU-v2.46.zip
$ sha256sum UPDATE-SuperSU-v2.46.zip 
d44cdd09e99561132b2a4cd19d707f7126722a9c051dc23f065a948c7248dc4e  UPDATE-SuperSU-v2.46.zip

Provided these checksums match, we're good to go. We need to push the ZIP to our phone with the Android debugger, and reboot into the TWRP recovery:

$ adb push UPDATE-SuperSU-v2.46.zip /sdcard/supersu.zip
$ adb reboot recovery

From the TWRP interface, tap "Install" and install the /sdcard/supersu.zip package. When it finishes, tap "Reboot". TWRP will ask if you would like to install the root provided by the image. You do NOT want to install this root- you just flashed one.

The phone should boot normally.

Enable USB tethering and the wireless hotspot

This step requires root. Finally, we want to enable the hotspot and tethering. Google is bending to wireless carriers, forcing the user to prove that they are subscribing to a cellular service that allows them to use USB tethering or the wireless hotspot. Personally, I find this dirty, and unfortunate. Even worse, is the fact that cellular providers think they can get away by charging double for using your own data. Data is data; it shouldn't matter if it comes from your phone, or your laptop connected to your phone. If they want to charge for overages on caps, whatever. But charging double, just because you connected your phone via USB? Or setting up a hotspot in your grandma's house, because she doesn't have WiFi but you have cellular coverage? Please. This is clearly grandfathered from the days of feature phones, where you couldn't tether or hotspot. So, you purchased a USB dongle to enable the hotspot. Even then, it was dirty, but it's clear that this is a byproduct of days gone by.

To enable tethering and the hotspot, you just need to add one line to the /system/build.prop config file. Unfortunately, /system/ is mounted read-only, so you'll have to remount it as read-write and edit the file. However, every attempt I have made at modifying it by hand has ended up with an empty file- IE: losing all its contents. So, rather than editing it manually, there is an app for that.

Install https://play.google.com/store/apps/details?id=com.jrummy.apps.build.prop.editor&hl=en. Add "net.tethering.noprovisioning" and set the property to "true", then reboot your phone. At that point, you should be able to USB tether and setup a wireless hotspot.
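
For reference, the manual route (which, as mentioned, kept leaving me with an empty build.prop) would look roughly like this from an adb shell- a sketch only, as remount semantics vary by device and Android version:

$ adb shell
$ su
# mount -o remount,rw /system
# echo "net.tethering.noprovisioning=true" >> /system/build.prop
# mount -o remount,ro /system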

Conclusion

This wasn't for the faint of heart or for someone who doesn't care about gaining the necessary control over their Android phone that root would give them (setting up firewalls, ad blockers, tethering/hotspot, etc.). However, as mentioned earlier, it's getting fairly difficult to hard brick an Android phone these days. Even better, the steps are getting somewhat standardized. IE: flash factory images, flash custom recovery, install SuperSU, & optionally enable tethering/hotspot.