
Hello Morse Code

As many of you may know, I am a licensed Amateur Radio operator in the United States. Recently, I've set out to learn Morse Code at a full 25 WPM using the Koch method. I only started last week, and tonight I copied my first beginner's code, "A NOTE TO TENNESSEE", and other such silliness. I don't know how many of my readers are hams, or how many of them know their CW.

Some equipment that I'm practicing with:

  • Morse Code Trainer- An Android application for both sending (tapping the screen) and receiving. It's flexible, in that you can choose what to listen to, your speed, as well as your tone frequency and volume. Currently, I'm using it largely just to receive.
  • MFJ-557 code oscillator with key. This is very much a beginner's straight key, but it comes with its own speaker, so you can hear how you sound when you transmit.
  • Morse Code Reader- Another Android application, this time using the MIC input to listen to outside noise, and translate that to letters. I've found it to be somewhat unreliable, even in a quiet room with only code to be heard. With that said, during these beginning stages, it's still more reliable than I am at copying. So, it's good to look back, see what I got wrong, and where I need improvement.

I'm hoping that by the end of the year, I can copy code at a full 25 WPM with 90% or better accuracy. I'm not actually planning to work on transmitting and spacing until next year.

Goodbye Ubuntu

In 1999, I discovered GNU/Linux. Before then, I was a Solaris fanboy. Solaris could do no wrong, and it took until about 2003 before I finally took the plunge, removed Solaris from my Sun Ultra 1 (complete with 21" CRT monitor), and put Debian GNU/Linux on it. At the time, it was either Debian or Gentoo that had SPARC support, and compiling software from source didn't sound like a lot of fun. I also had an HP laptop. It ran SUSE, Red Hat Linux, and various other distros, until it too settled on Debian. Then, in October of 2004, while at a local LUG meeting, I learned of this Debian fork called "Ubuntu".

I gave it a try. I switched from using Debian to Ubuntu on my laptop. I liked the prospect of using something that had more frequent stable releases. After which, I helped set up the Ubuntu Utah users group. We had install fests, meetings, and other activities. Our group grew fast and strong. Then, I helped to start the Ubuntu US Teams project, helping local state and regional groups become as strong as the Utah group was. Eventually, I applied for Ubuntu membership, and in 2006, I got my membership, syndicated my blog to the Ubuntu Planet, and I have been here since.

Sometime around 2008, things started changing in the Ubuntu culture, and it was becoming difficult to enjoy working on it. I'm not going to list everything that Canonical has done to Ubuntu, but it's been steady. Not committing patches upstream to Linux mainline. Breaking ties with the Debian project, including rolling their own packages. Group development moved to centralized development. Copyright assignments. Switching from GNOME to Unity. Then Unity lenses and Amazon advertising. Over and over, things began changing, and as a result, the culture changed. I stopped really loving Ubuntu. Eventually, I went back to Debian for my servers, laptops and workstations. Ubuntu isn't Unix anymore. It's Apple, and I'm not sure I like the change.

Now, Micah Lee, who works for the EFF, put up a "sucks site" showing how to disable the privacy violations in Unity. Rather than take it in stride, Canonical has decided to abuse trademark law, and issue a cease and desist notice to the Fix Ubuntu site. United States courts have shown over and over that "sucks sites" are free speech, fair use, and do not infringe on the company mark. In fact, nowhere on the Fix Ubuntu site is the actual Ubuntu trademark. No logo, no marks, nothing. Just text. Yet, Canonical wants to silence its critics using a heavy hand. To be fair, their notice is less grumpy and bullying than most cease and desist notices. However, it doesn't change the principle.

I can't be associated with a project like this any longer. Effective immediately, my blog will no longer be syndicated on the Ubuntu Planet. My Ubuntu membership will be cancelled. My "UBUNTU" license plates, which have been on my car since August 2006, will be removed, in favor of my Amateur Radio callsign.

I wish everyone in the Ubuntu community the best. I also hope you have the power to change Ubuntu back to what it used to be. I have no ill feelings towards any person in the Ubuntu community. I just wish to distance myself from Ubuntu, and no longer be associated with the project. Canonical's goals and visions do not align with something I think should be a Unix. Don't worry though- I'll keep blogging. You can't get that out of my blood. Ubuntu just isn't for me any longer.

Goodbye Ubuntu.

Real Life NTP

I've been spending a good amount of my spare time recently configuring NTP, reading the documentation, setting up both a stratum 1 and stratum 2 NTP server, and in general, just playing around with NTP. This post is meant to be a set of notes on what I've learned in the process, and hopefully, it can benefit you. It's not meant to be an exhaustive or authoritative set of instructions on how you should configure your own NTP installation.

Strata
Before getting into the client configuration, we need to understand how NTP serves time to clients. We need to understand the concept of "strata" or "stratum". Authoritative time sources, such as GPS satellites, cesium atomic fountains, the WWVB radio broadcast, and so forth, are referred to as "stratum 0" clocks. They are authoritative, because they have some way of maintaining extremely accurate timekeeping. Any time source will suffice, including a standard quartz oscillating clock. However, knowing that quartz-based clocks can gain or lose up to 15 seconds per month, we don't generally use them as time sources. Instead, we're interested in time sources that don't gain or lose a second in 300,000 years, as an example.

Computers that connect to these accurate time sources to set their local time are referred to as "stratum 1" time sources. Because there are inherent latencies in connecting to the stratum 0 time source and in setting the time, as well as drift that the stratum 1 clocks themselves will exhibit, these stratum 1 computers may not be as accurate as their stratum 0 neighbors. In real life, the clocks on good stratum 1 computers will probably drift enough that their time will be off by a couple of microseconds, compared to the stratum 0 source from which they are getting their time.

Computers that connect to stratum 1 computers to synchronize their clocks are referred to as "stratum 2" time sources. Again, due to many latencies involved, stratum 2 clocks may not be as accurate as their stratum 1 neighbors, and even worse compared to the further upstream stratum 0 time sources. In practice, your stratum 2 server will probably be off from its stratum 1 upstream server by anywhere from a few microseconds to a few milliseconds. Many factors come into play in how this is calculated, but realize that stratum 2 computers, in practice, are probably the furthest time source from stratum 0 that you want to synchronize your clocks with.

Image showing the top 3 stratum levels of NTP.

As you would expect, stratum 3 clocks are connected upstream to stratum 2 clocks. Stratum 4 clocks are connected upstream to stratum 3 clocks, and so forth. Once you reach the bottom level, stratum 16, the clock is considered unsynchronized. So again, in practice, you probably don't want to sync your computer's clock with any stratum lower than 2, thus making your computer a stratum 3. At that point, you're far enough away from the "true time" source that your computer could exhibit time offsets anywhere from a few milliseconds to several hundred milliseconds.

If your clock is off by more than 1000 seconds, NTP will refuse to synchronize it, and manual intervention will be required. If an upstream stratum from which you are synchronizing your clock is off by 1000 milliseconds, or 1 full second, that time source will not be used when synchronizing your clock, and others will be picked instead (this helps weed out bad time sources).

Client
Debian, Ubuntu, Fedora, CentOS, and most other operating system vendors don't package NTP into separate client and server packages. When you install NTP, you've made your computer both a server and a client simultaneously. If you don't want to serve NTP to the network, then don't open the port in your firewall. In this section, we'll assume that you're not going to use NTP as a server, but wish to use it as a client instead.

I'm not going to cover everything in the /etc/ntp.conf configuration file, which is generally the standard installation path. However, there are a few things I do want to cover. First, the "server" lines. You can have multiple server lines in your configuration file. NTP will actively use up to 10. However, how many should you add? Consider the following:

  1. If you only have one server configured, and that server begins to drift, then you will blindly follow the drift. If that server consistently gained 5 seconds every month, so would you.
  2. If you only have two servers configured, then both will be automatically assigned as "false tickers" by NTP. If one of the servers began to drift, NTP would not be able to tell which upstream server is correct, as there would not be a quorum.
  3. If you have three or more servers configured, then you can support "false tickers", and still have an agreement on the exact time. If you have five or six servers, then you can support two false tickers. If you have seven or eight servers, you can support three false tickers, and if you have nine or ten servers configured, then you can support up to four false tickers.
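For example, a minimal client-only /etc/ntp.conf along these lines should work (the hostnames are hypothetical placeholders; the "iburst" option just speeds up initial synchronization):

# /etc/ntp.conf (client)- four servers tolerate one false ticker
server time1.example.org iburst
server time2.example.org iburst
server time3.example.org iburst
server time4.example.org iburst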

NTP Pool Project
As a client, rather than pointing your servers to static IP addresses, you may want to consider using the NTP Pool Project. Various people all over the world have donated their stratum 1 and stratum 2 servers to the pool; Microsoft, XMission, and even I have offered servers to the project. As such, clients can point their NTP configuration at the pool, which will round robin and load balance the servers you will be connecting to.

There are a number of different domains that you can use for the round robin. For example, if you live in the United States, you could use:

  • 0.us.pool.ntp.org
  • 1.us.pool.ntp.org
  • 2.us.pool.ntp.org
  • 3.us.pool.ntp.org

There are round robin domains for each continent, minus Antarctica, and for many countries in each of those continents. There are also round robin servers for projects, such as Ubuntu and Debian:

  • 0.debian.pool.ntp.org
  • 1.debian.pool.ntp.org
  • 2.debian.pool.ntp.org
  • 3.debian.pool.ntp.org
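Using the pool in your /etc/ntp.conf is then just a matter of listing the round robin names as your servers. On a Debian machine, for example, something like this (which mirrors Debian's default configuration):

server 0.debian.pool.ntp.org iburst
server 1.debian.pool.ntp.org iburst
server 2.debian.pool.ntp.org iburst
server 3.debian.pool.ntp.org iburst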

ntpq(1)
NTP ships with a good client utility for querying NTP; it's the ntpq(1) utility. However, understanding the output of this utility, as well as its many subcommands, can be daunting. I'll let you read its manpage and documentation online. I do want to discuss its peering output in this blog post though.

On my public NTP stratum 2 server, I run the following command to see its status:

$ ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*198.60.22.240   .GPS.            1 u  912 1024  377    0.488   -0.016   0.098
+199.104.120.73  .GPS.            1 u   88 1024  377    0.966    0.014   1.379
-155.98.64.225   .GPS.            1 u   74 1024  377    2.782    0.296   0.158
-137.190.2.4     .GPS.            1 u 1020 1024  377    5.248    0.194   0.371
-131.188.3.221   .DCFp.           1 u  952 1024  377  147.806   -3.160   0.198
-217.34.142.19   .LFa.            1 u  885 1024  377  161.499   -8.044   5.839
-184.22.153.11   .WWVB.           1 u  167 1024  377   65.175   -8.151   0.131
+216.218.192.202 .CDMA.           1 u   66 1024  377   39.293    0.003   0.121
-64.147.116.229  .ACTS.           1 u   62 1024  377   16.606    4.206   0.216

We need to understand each of the columns, so we understand what this is saying:

  • remote- The remote server you wish to synchronize your clock with.
  • refid- The reference ID: the source the remote server is itself synchronized to. For stratum 1 servers, this will be the stratum 0 source.
  • st- The stratum level, 0 through 16.
  • t- The type of connection. Can be "u" for unicast or manycast, "b" for broadcast or multicast, "l" for local reference clock, "s" for symmetric peer, "A" for a manycast server, "B" for a broadcast server, or "M" for a multicast server.
  • when- How long ago the server was last queried. The default unit is seconds; "m" will be displayed for minutes, "h" for hours, and "d" for days.
  • poll- How often the server is queried for the time, from a minimum of 16 seconds to a maximum of 36 hours. The value displayed is a power of two, in seconds. Typically, it's between 64 seconds and 1024 seconds.
  • reach- An 8-bit shift register, displayed in octal, recording the success or failure of the last eight polls. A set bit means a reply was received; 377 means all of the last eight polls succeeded (see the sketch after this list).
  • delay- This value is displayed in milliseconds, and shows the round trip time (RTT) of your computer communicating with the remote server.
  • offset- This value is displayed in milliseconds, calculated as a root mean square, and shows how far off your clock is from the reported time the server gave you. It can be positive or negative.
  • jitter- An absolute value in milliseconds, showing the root mean squared deviation of your offsets.
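If the octal "reach" value seems opaque, here is a small Python sketch (my own illustration, not part of NTP) that unpacks it into the results of the last eight polls:

# Unpack the ntpq "reach" column: an 8-bit shift register printed in
# octal, where a set bit means a reply was received for that poll.
def decode_reach(reach):
    bits = format(int(str(reach), 8), "08b")   # e.g. 377 -> "11111111"
    # The leftmost bit is the oldest poll; the rightmost is the newest.
    return ["ok" if b == "1" else "missed" for b in bits]

print(decode_reach(377))  # all of the last eight polls succeeded
print(decode_reach(376))  # the most recent poll failed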

Next to the remote server, you'll notice a single character. This character is referred to as the "tally code", and indicates whether or not NTP is or will be using that remote server in order to synchronize your clock. Here are the possible values:

  • " " Discarded as not valid. Could be that you cannot communicate with the remote machine (it's not online), this time source is a ".LOCL." refid time source, it's a high stratum server, or the remote server is using this computer as an NTP server.
  • "x" Discarded by the intersection algorithm.
  • "." Discarded by table overflow (not used).
  • "-" Discarded by the cluster algorithm.
  • "+" Included in the combine algorithm. This is a good candidate if the current server we are synchronizing with is discarded for any reason.
  • "#" Good remote server to be used as an alternative backup. This is only shown if you have more than 10 remote servers.
  • "*" The current system peer. The computer is using this remote server as its time source to synchronize the clock
  • "o" Pulse per second (PPS) peer. This is generally used with GPS time sources, although any time source delivering a PPS will do. This tally code and the previous tally code "*" will not be displayed simultaneously.

Lastly, in understanding the output, we need to understand what is being used as a reference clock, as shown in the "refid" column.

  • IP address- The IP address of the remote peer or server.
  • .ACST.- NTP manycast server.
  • .ACTS.- Automated Computer Time Service clock reference from the American National Institute of Standards and Technology.
  • .AUTH.- Authentication error.
  • .AUTO.- Autokey sequence error.
  • .BCST.- NTP broadcast server.
  • .CHU.- Shortwave radio receiver from station CHU operating out of Ottawa, Ontario, Canada.
  • .CRYPT.- Autokey protocol error.
  • .DCFx.- LF radio receiver from station DCF77 operating out of Mainflingen, Germany.
  • .DENY.- Access denied by server.
  • .GAL.- European Galileo satellite receiver.
  • .GOES.- American Geostationary Operational Environmental Satellite receiver.
  • .GPS.- American Global Positioning System receiver.
  • .HBG.- LF radio receiver from station HBG operating out of Prangins, Switzerland.
  • .INIT.- Peer association initialized.
  • .IRIG.- Inter Range Instrumentation Group time code.
  • .JJY.- LF radio receiver from station JJY operating out of Mount Otakadoya, near Fukushima, and also on Mount Hagane, located on Kyushu Island, Japan.
  • .LFx.- Generic LF radio receiver.
  • .LOCL.- The local clock on the host.
  • .LORC.- LF radio receiver from Long Range Navigation (LORAN-C) radio beacons.
  • .MCST.- NTP multicast server.
  • .MSF.- LF radio receiver from station MSF operating out of Anthorn, Cumbria, United Kingdom.
  • .NIST.- American National Institute of Standards and Technology clock reference.
  • .PPS.- Pulse per second clock discipline.
  • .PTB.- Physikalisch-Technische Bundesanstalt clock reference operating out of Brunswick and Berlin, Germany.
  • .RATE.- NTP polling rate exceeded.
  • .STEP.- NTP step time change. The offset is less than 1000 seconds, but more than 125 milliseconds.
  • .TDF.- LF radio receiver from station TéléDiffusion de France operating out of Allouis, France.
  • .TIME.- NTP association timeout.
  • .USNO.- United States Naval Observatory clock reference.
  • .WWV.- HF radio receiver from station WWV operating out of Fort Collins, Colorado, United States.
  • .WWVB.- LF radio receiver from station WWVB operating out of Fort Collins, Colorado, United States.
  • .WWVH.- HF radio receiver from station WWVH operating out of Kekaha, on the island of Kauai in the state of Hawaii, United States.

Client Best Practice
There seem to be a couple of long-standing myths out there about NTP configuration. The first is that you should only use stratum 1 NTP servers, because they are closest to the true time source. Well, this isn't always the case. Connecting to stratum 1 time servers over high-RTT links can produce large jitter and large offsets. Rather, you should find stratum 1 servers that are physically close to your client. Also, many stratum 1 servers might be overloaded, and finding less-stressed stratum 2 servers might deliver more accurate results.

The other myth out there is that you should only connect to physically close NTP servers. This isn't necessarily true either. If the closest NTP servers to you only have one physical link, and that link goes down, you're sunk. Further, if the closest NTP servers to you are stratum 4 or 5 servers, you may exhibit high offsets from the upstream stratum 0 sources. There is a reason why the NTP Pool Project only lists public stratum 1 and stratum 2 servers, and there's a reason why stratum 16 is considered unsynchronized.

Point is, there is a balance in configuring NTP. If you have a large infrastructure, it would make sense for you to build and install a stratum 1 or stratum 2 source at each logically different location (geographically or VLAN'd), and have each server and workstation connect to that logically local NTP server. If it's just your personal computer, then it probably makes sense to just use the NTP Pool Project, and use the round robin domain names. You should keep efficiency and redundancy in mind.

So, you should probably consider the following best practices when configuring your NTP client:

  • Use at least 3 servers, and don't statically use busy servers.
  • Consider using the NTP pool project, if you will be operating as a client only.
  • If statically setting IP addresses for your servers, try to keep the following in mind:
    • Use servers that are physically close to your computer. These servers should have low ping latencies.
    • Use servers that are geographically separated across the globe. Just in case the trans-Atlantic cable is cut, you can still communicate to other servers.
    • Use servers that use different time sources. If all of your servers use GPS as their time source, and GPS goes offline, you will not have anything to synchronize your clocks against.
    • Consider using all 3 of the above on a single client.

Open Letter To All GNU/Linux and Unix Operating System Vendors

This is an open letter to all GNU/Linux and Unix operating system vendors.

Please provide some sort of RSS or Atom feed for just new releases. Nothing else. No package updates. No "community" posts. No extra fluff. It shouldn't include news about being included in the Google Summer of Code. It shouldn't provide a list of package security advisories. It shouldn't include why you think dropping one package for a fork is a good idea.

Did you just release "H4x0rz Linux 13.37"? Great! Publish that release news to a central spot, where only releases are posted, and give me a feed to parse. Still confused? Let me give you an example:

Perfect. It includes alpha releases, which are fine. But it focuses only on the new releases. No community news; that's what planets are for. No package updates; I can figure those out in the OS itself. Just releases.

Here's a list of vendors that I would like to put in my feed reader, that I cannot find any such centralized feed source:

  • CentOS
  • Debian
  • Fedora
  • FreeBSD
  • Linux Mint
  • OpenBSD
  • OpenSUSE
  • Scientific Linux
  • Slackware

I know some projects have web forums, of which there may be a subforum dedicated to releases only. If that forum provides an RSS feed, perfect. I know some mailing list managers also provide RSS feeds for archives. That works too. I don't care where it comes from, just so long as there is a dependable source where I can get up-to-date news on just the latest release, nothing else.

If such feeds exist for these operating systems, please point me to them in the comments.

Thanks!

Masquerade Computer Network Interfaces

I just recently acquired a Raspberry Pi at SAINTCON 2013. I already had one, and forgot how much fun these little computers can be. I also forgot what a PITA they can be if you don't have your house hard wired to your switch for Internet access, and have to go into the basement to plug in. Plugging into a monitor and keyboard isn't a big deal for me, it's just the inconvenience of getting to the Internet. So, I downloaded Raspbian, ran through the initial config, including setting up an SSH server. The only thing left to do is get it online, and that will take a little config, which this post is about.

My laptop is connected wirelessly, so my ethernet port is available. So, I should be able to plug the Raspberry Pi into the laptop, and have it use the laptop's wireless connection. In other words, using my laptop as a router and a gateway. So, let's get started. Below is an image of what I am trying to accomplish:

Image showing the Raspberry Pi connected to the laptop, which in turn is connected to the Internet wirelessly.

The Raspberry Pi needs to be connected to the laptop via a standard twisted pair ethernet cable. The laptop will be connecting to the Internet wirelessly. So, while I still had my Raspberry Pi connected to the monitor and keyboard, and while it was still offline, I edited the /etc/network/interfaces file as follows:

# Raspberry Pi
iface eth0 inet static
    address 172.16.1.2
    netmask 255.255.255.252
    gateway 172.16.1.1

Then, on my laptop, I gave my ethernet port the address of "172.16.1.1" (mostly because no one ever uses this network- so it shouldn't conflict with your home/office). My laptop must be the gateway to the Internet for the Raspberry Pi. Notice that my laptop does not need a gateway for this interface. Instead, it's going to masquerade off of the "wlan0" interface, which already has a gateway configured:

# Laptop
iface eth0 inet static
    address 172.16.1.1
    netmask 255.255.255.252

Now, I need to make my laptop a router, so it can route packets from one network (172.16.1.0/30) to another (whatever the "wlan0" interface is connected to). As such, run the following command as root:

echo 1 > /proc/sys/net/ipv4/ip_forward
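Note that this setting will not survive a reboot. If you want it to persist, the usual approach is to set it in /etc/sysctl.conf (the exact file may vary by distribution), then apply it with "sysctl -p" as root:

net.ipv4.ip_forward = 1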

Now, at this point, the "eth0" and "wlan0" interfaces are logically disconnected. Any packets coming into the "eth0" device won't make it any further. So, we need to create a logical pairing, called a "masquerade". This will allow packets coming in on "eth0" to exit out "wlan0", and vice versa. So, as root, pull up a terminal, and type the following:

iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
iptables -A FORWARD -i eth0 -j ACCEPT

If you have any firewall rules in your INPUT chain, you will need to open up access for the 172.16.1.0/30 network.
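Also, these two rules assume the policy on your FORWARD chain is ACCEPT. If your policy is DROP, you will likely also need to allow the return traffic back to the Raspberry Pi, with something like:

iptables -A FORWARD -i wlan0 -o eth0 -m state --state ESTABLISHED,RELATED -j ACCEPT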

At this point, plug your Raspberry Pi into your laptop, SSH into the Pi, and see if you can ping out to the Internet.

NTP Drift File

Many things about NTP are elusive. For the casual user, there are a lot of terms to understand: broadcast, unicast, multicast, tally codes, servers, peers, stratum, delay, offset, jitter and so much more. Unless you set up your own NTP server, with the intent of providing accurate timekeeping for clients, many of those terms can be discarded. However, one term you may want to be familiar with is "drift".

Clock drift is when a clock runs too fast or too slow compared to a reference clock. NTP version 4 can represent time to a resolution of about 233 picoseconds (2^-32 of a second). Of course, to get anywhere near that sort of accuracy, you need exceptionally low latency networks with specialized hardware. High volume stock exchanges might keep time accuracy at this level. Generally speaking, for the average NTP server and client on the Internet, comparing time in milliseconds is usually sufficient.

So, where does NTP keep track of the clock drift? For Debian/Ubuntu, you will find this in the /var/lib/ntp/ntp.drift file. In the file, you'll find either a positive or negative number. If it's positive, your clock is fast; if it's negative, your clock is slow. This number, however, is not measured in seconds, milliseconds, nanoseconds or picoseconds. Instead, it measures the frequency error of your clock in "parts per million", or PPM. It's still related to time, and you can convert this number to seconds, which I'll show you here.

There are 86,400 seconds in one day. If I were to divide that number into one million pieces, then there would be .0864 seconds per piece, or 86.4 milliseconds per piece.

86,400 s / 1,000,000 = 0.0864 s

My laptop connects to the standard NTP pool (0.us.pool.ntp.org, etc.). I have a number of "3.322" in my drift file. This means that my laptop is fast by 3.322 PPM compared to the time source I am synchronizing my clock with (called the "sys_peer"). If I wanted to convert that to seconds per day, then:

0.0864 s × 3.322 = 0.2870208 s

My laptop gains roughly 287 milliseconds per day, compared to my "sys_peer".
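If you want to play with the conversion yourself, here is the arithmetic above as a trivial Python sketch:

# Convert an NTP drift value (PPM, a frequency error) into seconds
# gained or lost per day: 1 PPM of a day is 86400/1000000 = 0.0864 s.
def drift_seconds_per_day(ppm):
    return 86400.0 / 1000000.0 * ppm

print(drift_seconds_per_day(3.322))   # ~0.287 s gained per day
print(drift_seconds_per_day(-0.059))  # ~0.005 s lost per day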

I just recently announced an open access NTP server. It was critical for me that this server be as accurate as possible with its timekeeping. So, all of the stratum 1 time servers that it connects to had to have a ping latency of less than 10 milliseconds. Thankfully, I was able to find 3 servers with latencies less than 6 milliseconds, one of which is only 500 microseconds away. This became the preferred "sys_peer". Its drift file currently reads "-0.059". Again, converting this to seconds per day:

0.0864 s × -0.059 = -0.0050976 s

My NTP server loses roughly 5 milliseconds per day, compared to the "sys_peer" time source at that specific moment.

Hopefully this clears up the NTP drift file, which I'm sure many of you have noticed. If you connect to NTP servers with very low latencies, then you'll notice that the number in your drift file approaches zero. It's probably best to find 3 or 5 NTP servers that are physically close to you, to keep those latencies low. If you travel a lot with your laptop, then connecting to the NTP pool would probably be best, so you don't need to constantly change the servers you're connecting to.

New Public NTP Server

I just assembled a public access NTP stratum 2 server. Feel free to use it, if you wish. It is considered "Open Access". It has a public webpage at http://jikan.ae7.st. This stratum 2 server has a few advantages over some others online:

  • It connects to three stratum 1 GPS time-sourced servers.
  • Each stratum 1 server is less than 6 milliseconds away.
  • The preferred stratum 1 server is about .5 milliseconds away.
  • Stratum 2 peering available- just contact me.
  • It has a 100 Mbit connection to the Internet.
  • The ISP sits behind four redundant upstream transit providers.
  • The ISP also peers on the Seattle Internet Exchange.

It is also available in the NTP pool at http://www.pool.ntp.org/en/. If you want to synchronize your computer with this server, then just add the following line in your /etc/ntp.conf configuration file:

server jikan.ae7.st

Eventually, I'll also offer encrypted NTP for those who wish to have encrypted NTP packets on the wire (only if it's possible to offer both encrypted and unencrypted NTP simultaneously- I think it is). I'm also currently working on finding some other stratum 2 peers that are less than 30 milliseconds away. If you're running an NTP server, and want to peer with me, just let me know.

Hopefully, this will be of some benefit to the community.

Pthree.org Is Now SSL Enabled

Just a quick update to say that I have enabled SSL, and forced it by default, for this blog. Given all the revelations about the NSA, the straw finally broke the camel's back, and we are now live with SSL. There may be some growing pains, seeing as this will cost me a bit more CPU, but I should be able to adjust to the load as it grows.

If there are any oddities, or anything of concern that you notice regarding switching this blog to SSL, please let me know in the comments, email me, get me on IRC, or whatever. Thanks.

Sufficient Paranoia

With all the recent revelations about the NSA violating United States citizens' 4th amendment rights with their warrantless wiretapping, and now the news of Silk Road being taken down and the NSA trying to crack Tor (it won't happen- I trust the mathematics), I thought now would be a good time to discuss the concept of healthy, or sufficient, paranoia.

I am a system administrator by profession. I have certain levels of fear that keep me from making mistakes:

  • I assume that installing new software will break something.
  • I assume upgrading the BIOS will brick the hardware.
  • I assume the hardware firewall will fail.
  • I assume hard drives will fail.
  • I assume the janitors have installed a key logger on my machine.
  • I assume walking away from my machine, means my coworkers will want to hack my Gibson.
  • I assume backups aren't working.

As such, I take the following measures:

  • I have a backup of the data.
  • I have a disaster recovery plan to take out the old drives, and put them into new hardware.
  • I have redundant software firewalls installed on all my boxes.
  • I have redundant drives, and I have a backup of the data on those drives.
  • I run visual checks to make sure no new hardware has been added.
  • I always lock my workstation. Always.
  • I test restoring data, even when I don't have to.

There's other paranoia that I have. These things keep me in check. They help me sleep at night. Once, I heard a story from my scout leader about always being prepared. He shared the story like this:

It was at the annual county fair, and farmers from far and near had come to exhibit their harvest and to engage hired hands for the next year. One prosperous farmer came across a husky lad and asked: "What can you do?" The answer: "I can sleep when the wind blows." With such an answer the farmer turned and started to walk away, perturbed at the impudence of the man. But he turned again and asked: "What did you say?" "I can sleep when the wind blows." "Well," said the farmer, "I don't know what that means, but I'm going to hire you anyway."

Winter came, followed by the usual spring, and the new hired hand didn't show any particular signs of extra work, but filled the duties of his work as most others would have done. And then one night in early summer the farmer noticed a strong wind rising. He dashed to the hired hand's quarters to arouse him to see that all the stock was properly cared for. There he found the hired hand asleep. He was about to awaken him, when he remembered the boy's strange statement. He went to his barns and there found all his animals in their places, and the doors and windows securely locked. He found the haystack had been crisscrossed with heavy wires, anticipating such a night, and that it would weather the storm.

Then the farmer knew what his hired man meant when he gave as his only qualification, "I can sleep when the wind blows."

I'm sure you've heard similar versions of this story. It has a lot of applications, including sufficient paranoia. The hired hand understood the fear of lost or dead animals. He understood the fear of haystacks blown away with the wind. He knew what flooded barns and stables meant. He had sufficient paranoia that, in the worst of cases, he was prepared. However, not only was he sufficiently paranoid, but his paranoia likely led to behavior that most would consider odd.

The same can be said for security. I cryptographically sign all of my emails with my GPG key. I have been doing this since 2005, and I don't see any need to stop now. I've been asked about it many times. My response is always the same: "If you receive an unsigned email from me, then you should question the authenticity of the sender." Of course, it's their duty to verify the signature is valid. I've done my duty by signing them. And what happens when I appear in front of a judge in a court of law, and an email claiming to be sent from me is called into question? I can show with unwavering consistency that I have signed every email since 2005, which would then call into doubt the email in question, if that email is not cryptographically signed. Innocent until proven guilty.

I recently did an audit of all my account passwords. Not only does every account have a different, truly random password, but I make sure that the entropy of every password exceeds 120 bits, where possible. Further, every account uses a password I know from my password card, as well as a long password I don't know from my Yubikey. So, I have two-factor authentication for every account, where possible. Given what I know about password cracking, this is good security, for very little cost. Not even my wife knows my passwords (which could prove to be difficult if I die).

I even have a different SSH key for every computer, and each SSH key is encrypted with a different passphrase. I encrypt each key with OpenSSL's PKCS#8 format, instead of the default encryption OpenSSH uses, to slow down offline passphrase attacks.
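For the curious, the conversion looks something like this (the key path here is just an example); PKCS#8 with PBKDF2 makes each passphrase guess slower than OpenSSH's traditional PEM encryption, and OpenSSH reads PKCS#8 private keys just fine:

openssl pkcs8 -topk8 -v2 des3 -in ~/.ssh/id_rsa -out ~/.ssh/id_rsa.pkcs8
mv ~/.ssh/id_rsa.pkcs8 ~/.ssh/id_rsa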

I don't recycle my shredded paper. Instead, I use it as kindling for my parents' fireplace during the winter. I've also used it as mulch for our small box garden in the back yard, and our flower garden in the front. If it gets thrown away, I do it in sections- thoroughly mix the shredded paper, and throw away 1/10th of it one month. Then 1/10th the next month, at a different location. Et cetera. I'm paranoid that someone at the landfill is going through the garbage, looking for freebies. The last thing I want is my bank account number found (although improbable, given my super awesome paper shredder).

I use Ghostery and AdBlock as necessary extensions for my browsers. When I don't have control of the computer, or the network, I use a browser on a USB thumb drive, in private browsing mode, connected to either an SSH or Tor proxy, including proxying DNS, and I never view Flash media.

Whenever I walk away from my computer, I make sure I lock the screen, pull my Yubikey, and put it in my wallet. Yes, it's trivial for someone to take the contents of the key while I am away, and it's just as trivial for me to take my Yubikey with me when I leave the keyboard.

I run an encrypted filesystem on my computers and servers. For sensitive data, I keep it GPG-encrypted inside an eCryptfs mount, which is also two-factor password protected. I can give law enforcement what I know, without needing to tell them about what I have, without compromising the system.

There are many other things I do, such as not divulging private details of personal things over SMS or IM, or sometimes, even over voice. I always lock my doors, even if I'm occupying the space. When in crowded environments, I put my wallet in my front pocket, under my hand. I could go on and on.

I do these things, because I have what I call "sufficient paranoia". It's just good security practice. Does it make me look crazy, even to my coworkers? Of course. Am I worried that the NSA has bugged my house, or my wife is a secret spy? No. I maintain balance.

We don't know what the future will bring. We don't know if tomorrow it can be proved that P = NP, and all cryptography falls apart as a result. We don't know the full extent of the NSA's illegal spying. We don't know when Google will be breached, and all accounts sold to the highest bidder. We can't control these things. What we can control is how prepared we are for them. We can control a certain level of paranoia that keeps everything in check.

Sufficient paranoia.

Identification Versus Authentication

Recently, Apple announced and released the iPhone 5S. Part of the hardware specifications on the phone is a new fingerprint scanner, coupled with their TouchID software. Immediately upon the announcement, I wondered how they would utilize the fingerprint. It is unfortunate, but not surprising, that they are using your fingerprint incorrectly.

To understand how, we first need to understand the difference between "identification" and "authentication". Your fingerprint should be used as an identifying token, not an authenticating one. Unfortunately, most fingerprint scanner vendors don't follow this advice. In other words, when you scan your fingerprint, the software should identify you from a list of users. After identifying who you are, you then provide a token to authenticate that you are indeed the correct person. This is generally how usernames and passwords work. You provide a username to the login form to claim that you are indeed the correct person. Then you provide a password or some other token to prove that is the case. Your fingerprint should be used as the identifying token, such as the username in a login form, rather than as the authenticating token, such as the password.
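To make the distinction concrete, here is a toy Python sketch (all names and data are hypothetical, and a real system would use a salted KDF rather than a bare hash):

import hashlib

# Enrolled fingerprint templates, mapped to (username, password hash).
USERS = {
    "fp-template-alice": ("alice", hashlib.sha256(b"correct horse").hexdigest()),
}

def login(fingerprint_template, password):
    record = USERS.get(fingerprint_template)   # identification: who is this?
    if record is None:
        return False
    username, password_hash = record
    # Authentication: prove it with something you know.
    return hashlib.sha256(password).hexdigest() == password_hash

print(login("fp-template-alice", b"correct horse"))  # True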

Why? Here are some concerns with using fingerprints as authentication tokens:

  • Fingerprints can't be changed easily. Once someone has compromised your account by lifting your print off of a surface, you can't just "change your fingerprint".
  • Fingerprints are easy low-hanging fruit for Big Brother. If faced in a situation where you must turn over your authentication tokens, it's much easier for Big Brother to get your fingerprint, than it is to get a long password.
  • Fingerprint scanners are easily fooled by lifted prints, so they provide very little security as authenticators. Further, your fingerprints are everywhere, especially on your phone. If you lose your iPhone 5S, or it's stolen, the bad guys now have your fingerprints.

To illustrate how easy that last bullet point is, the Chaos Computer Club posted a YouTube video on breaking the TouchID software with little difficulty. And they're hardly the first. Over, and over, and over again, fingerprint scanners are quickly broken. While the tech is certainly cool, it's hardly secure.

While I like to throw jabs and punches at Apple, Inc., I expected much more from them. This seems like such a n00b mistake, it's almost hard to take seriously. A fingerprint scanner on a phone would make sense where multiple users could use the device, independent of each other, such as Android 4.2, where multiuser support was added. Scanning your finger would identify you to the device, which would then present a password, pattern, or PIN entry dialog, asking you to authenticate. That's appropriate use of a fingerprint scanner.

RIP Microsoft Tag

So yesterday, Microsoft announced (on Facebook, of all places) that it is ending support for Tag, its proprietary barcode format. To me, this doesn't come as a major surprise. Here's why:

  1. Tag arrived very late in the game, after QR Codes were pretty much establishing themselves as "the norm".
  2. Tag is a proprietary barcode. No way around it.
  3. Tag is a specific implementation of HCCB, and the specification for HCCB was never released to developers.
  4. Tag requires a data connection to retrieve the data out of the barcode.
  5. Tag requires a Microsoft account to create Tags.
  6. Tag requires a commercial non-free license for using Tags in a commercial space, or for selling products as a result of scanning the Tag.
  7. Tags can only be scanned using Microsoft Tag readers.
  8. Generating a Tag barcode means accepting its Terms of Service. A barcode has a TOS.

Just over a year ago, I gave credit where credit is due. Microsoft had something awesome with HCCB. It's technically superior to most 2D barcodes in many ways. But Microsoft never made the specification for creating HCCB codes available. In fact, I even had an email discussion with a Microsoft engineer regarding HCCB and Tag:

From: Aaron Toponce
To: Microsoft Tag Support
Subject: HCCB Specification

I'm familiar with Microsoft Tag, an HCCB implementation, but I'm interested in the specifications for HCCB. Specifically, I am interested in generating HCCB codes that are not Microsoft Tag. I have an account on http://tag.microsoft.com, but don't see any way to generate HCCB codes outside of Microsoft Tags.

Any help would be great.

--------------------
From: Microsoft Tag Support
To: Aaron Toponce
Subject: Re: HCCB Specification

Hello Aaron,

Thank you for contacting Microsoft Tag Support. Could you please elaborate the issue as we are not able to understand the below request. High Capacity Color Barcode (HCCB) is the name coined by Microsoft for its technology of encoding data in a 2D barcode using clusters of colored triangles instead of the square pixels. Microsoft Tag is an implementation of HCCB using 4 colors in a 5 x 10 grid. Apart from Tag barcode you can also create QR code and NFC URL using Microsoft Tag Manager.

For more information http://tag.microsoft.com/what-is-tag/home.aspx

Attached is the document on specification of Tag, hope this will help you.

If you have any questions, comments or suggestions, please let us know.

Thank you for your feedback and your interest in Microsoft Tag.

--------------------
From: Aaron Toponce
To: Microsoft Tag Support
Subject: Re: HCCB Specification

I'm not interested in Tag. I'm only interested in the High Capacity Color Barcode specification, that Tag uses. For example, how do I create offline HCCB codes? How do I implement digital signatures? How do I implement error correction? I would be interested in developing a Python library to create and decode HCCB codes.

I am already familiar with HCCB, what it is, how it works, and some of the features. I want to develop libraries for creating and decoding them. But I can't seem to find any APIs, libraries or documentation in encoding and decoding HCCB.

I never got a followup reply. Basically, I'm allowed to create Tags, but not HCCB codes. I'm sure I'm not the only developer denied access to HCCB. I've thoroughly scanned the Internet looking for anything regarding building HCCB barcodes. It just doesn't exist. Plenty of sites detailing how to create a Microsoft account to create Tags, but nothing for standard offline HCCB codes.

So, citing the reasons above, it's no surprise to me that Tag failed. Maybe Microsoft will finally release the HCCB specification to the market, so people can create their own offline HCCB codes, and develop apps for scanning, encoding and decoding them. Time will tell, and I'm not losing any sleep over it.

RIP Tag.

The NSA and Number Stations- An Historical Perspective

With all the latest news about PRISM and the United States government violating citizens' 4th amendment rights, I figured I would throw in a blog post about it. However, I'm not going to add anything really new about how to subvert the warrantless government spying. Instead, I figured I would offer an historical perspective on how some avoid being spied on.

The One-time Pad

In order to understand this post, we first need to understand the One-time Pad, or OTP for short. The OTP is a mathematically unbreakable encryption algorithm, which uses a unique and different random key for every message sent. The OTP must be the same length as the message being sent, or longer. The plaintext is then XOR'd with the OTP to create the ciphertext. The recipient on the other end has a copy of the OTP, which is used to XOR the ciphertext, and get back to the original plaintext. The system is extremely elegant, but it's not without its flaws.

First, the OTP must be communicated securely with the recipient. One argument against the OTP is if you can communicate your key securely, then why not just communicate the message in that manner? That's a fine question, except it misses one critical point: more than one OTP can be communicated at first meeting. The recipient might have 20 or 50 OTPs in their possession, knowing the order in which they are used.

Second, if the same OTP key is used for two or more messages, and those messages are intercepted, XORing the ciphertexts together cancels out the key, revealing the XOR of the plaintexts, from which both messages can be recovered! It is exceptionally critical that every message be encrypted with its own unique and random OTP. This is not trivial.

One major advantage of the OTP is the lack of incriminating evidence. OTPs have been found on rice paper, bars of soap, microfilm, or hidden in plain sight, such as using words from a book or a crossword puzzle. Once the key has been used, it can be destroyed with minimal effort. Destroying data on a computer is much more difficult than, say, burning the rice paper, or shuffling a deck of cards.
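To see how simple the mechanics are, here is a toy Python sketch of the XOR step (a real pad must be truly random, at least as long as the message, used exactly once, then destroyed):

import os

plaintext = b"ATTACK AT DAWN"
pad = os.urandom(len(plaintext))   # the one-time pad, shared in advance

ciphertext = bytes(p ^ k for p, k in zip(plaintext, pad))
recovered = bytes(c ^ k for c, k in zip(ciphertext, pad))
assert recovered == plaintext
# Reusing the pad is fatal: XORing two such ciphertexts together cancels
# the pad, leaving the XOR of the two plaintexts.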

Field Agents

Enter spies and field agents. Suppose a government wishes to communicate with a field agent in a remote country. The message they wish to send is "ATTACK AT DAWN". How do you get this message delivered to your agent securely and anonymously? More importantly, how can your field agent intercept the message without raising suspicion, or without any incriminating evidence against them?

This turns out to be a difficult problem to solve. If you meet at a specific location at a specific time, how do you communicate it without raising suspicion? Maybe you mail a package or envelope to your agent, but then how do you know it won't be intercepted and examined? Many totalitarian states, such as North Korea, examine all inbound and outbound mail.

Numbers Stations

Enter radio. First, in developed countries, just about everyone owns a radio. You can purchase them just about everywhere, and carrying one around, or having one in your room, is not incriminating enough to convict you as a spy. Second, your field agent already has a set of OTPs on hand. So, transmitting the encrypted message over the air isn't a problem, even if it is intercepted.

So, roughly around the time of World War 2, governments started communicating with field agents over the radio. Now, this can neither be confirmed nor denied, but numbers stations have been on the air for decades. Numbers stations are illegal transmissions, usually on the edges of the shortwave bands. Typically, this is referred to as "pirate radio", and governments are very effective at finding such stations. Most of these numbers stations have very rigid schedules; so rigid, you could set your watch to them. If they were not transmitted by government agencies, they would have been shut down fast. The length of time they've been on the air, the sheer number of them, and their rigid schedules all tell us that government agencies are the best bet for the source of the transmissions.

So, what does a numbers station sound like? Typically, most of them have some sort of "header" transmission, before getting into the "body" of the encrypted text. This header could be a series of digits repeated over and over, a musical melody, a sequence of tones, or nothing. Then the body is delivered. Typically, it's given in sets of 5 numbers, which is common in cryptography circles. Something like "51237 65500 81734", etc. The transmissions are usually short, roughly 3-5 minutes in length. Some transmissions will end with a "footer", like "000 000" or "end transmission" for the agent to identify the transmission is over. There is never any sort of station identification. They are one-way anonymous transmissions. Almost always, the voice reading the numbers is computer generated. They can be transmitted in many different languages: Spanish, English, German, Chinese, etc. And if that's not enough, some are verbally spoken, some in Morse code, some digital.

Want to hear what one sounds like? Here is a transmission from the "Lincolnshire Poacher", which has its own Wikipedia page. Some numbers stations have been given names by their enthusiasts, who listen to and record them frequently. This one is named after an English folk song, which was played as the header to every transmission. However, the station didn't operate from England. Rather, it was stationed in Cyprus.

Don't think that sounds eerie enough? There is a German numbers station called the "Swedish Rhapsody", which starts by ringing church bells as the header. Then, a female child's voice reads the numbers. You'd swear this could be something out of a horror movie.

Not all stations stay on the air, either. Many disappear over time, some quickly, some after many years. The Lincolnshire Poacher numbers station was on the air for about 20 years before it went silent. Numbers stations also don't always have rigid schedules. Some will just appear seemingly out of nowhere, and never come back online. And because these are on shortwave bands, they can travel hundreds and thousands of miles, so your field agent could literally be anywhere in the world. So long as he has his radio with him, a decent antenna, and a clear sky overhead, he'll pick it up.

The NSA

So, where does that bring us? Well, with the NSA spying on us, numbers stations sound like an attractive alternative to phone and email conversations. But, as already mentioned, numbers stations are illegal, especially in the United States. So, it doesn't seem like an attractive alternative after all, even if some are still on the air.

However, the OTP can be an effective and practical way to send messages securely. I mentioned almost a year ago a way to create a USB hard drive with an OTP on the drive. Both the sender and the recipient have an exact copy of the drive, along with a software utility necessary for encrypting and decrypting the data, as well as destroying the bits used for the OTP. Once the bits on the drives are all used up, the sender and the recipient meet to rebuild the OTP on the drive.

OpenSSH Keys and The Drunken Bishop

Introduction
Have you ever wondered what the "randomart" or "visual fingerprint" is all about when creating OpenSSH keys or connecting to OpenSSH servers? Surely, you've seen them. When generating a key on OpenSSH version 5.1 or later, you will see something like this:

$ ssh-keygen -f test-rsa
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in test-rsa.
Your public key has been saved in test-rsa.pub.
The key fingerprint is:
18:ff:18:d7:f4:a6:d8:ce:dd:d4:07:0e:e2:c5:f8:45 aaron@kratos
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
|      .     . E  |
|       +   = o   |
|      . S + = =  |
|         * * * ..|
|        . + + . +|
|           o . o.|
|            o . .|
+-----------------+

I'm sure you've noticed this, and probably thought, "What's the point?" or "What's the algorithm in generating the visual art?" Well, I'm going to answer those questions for you in this post.

This post is an explanation of the algorithm as explained by Dirk Loss, Tobias Limmer, and Alexander von Gernler in their PDF "The drunken bishop: An analysis of the OpenSSH fingerprint visualization algorithm". You can find their PDF at http://www.dirk-loss.de/sshvis/drunken_bishop.pdf. In the event that link is no longer available, I've archived the PDF at http://aarontoponce.org/drunken_bishop.pdf.

Motivations

Bishop Peter finds himself in the middle of an ambient atrium. There are walls on all four sides and apparently there is no exit. The floor is paved with square tiles, strictly alternating between black and white. His head heavily aching—probably from too much wine he had before—he starts wandering around randomly. Well, to be exact, he only makes diagonal steps—just like a bishop on a chess board. When he hits a wall, he moves to the side, which takes him from the black tiles to the white tiles (or vice versa). And after each move, he places a coin on the floor, to remember that he has been there before. After 64 steps, just when no coins are left, Peter suddenly wakes up. What a strange dream!

When creating OpenSSH key pairs, or when connecting to an OpenSSH server, you are presented with the fingerprint of the keypair. It may look something like this:

$ ssh example.com
The authenticity of host 'example.com (10.0.0.1)' can't be established.
RSA key fingerprint is d4:d3:fd:ca:c4:d3:e9:94:97:cc:52:21:3b:e4:ba:e9.
Are you sure you want to continue connecting (yes/no)?

At this point, as a responsible citizen of the community, you call up the system administrator of the host "example.com", and verify that the fingerprint you are being presented with is the same fingerprint he has on the server for the RSA key. If the fingerprints match, you type "yes", and continue the connection. If the fingerprints do not match, you suspect a man-in-the-middle attack, and type "no". If the server is a server under your control, then rather than calling up the system administrator for that domain, you physically go to the box, pull up a console, and print the server's RSA fingerprint.

In either case, verifying a 32-character hexadecimal string is cumbersome. If we could have a better visual on the fingerprint, it might be easier to verify that we've connected to the right server. This is where the "randomart" comes from. Now, when connecting to the server, I can be presented with something like this:

The authenticity of host 'example.com (10.0.0.1)' can't be established.
RSA key fingerprint is d4:d3:fd:ca:c4:d3:e9:94:97:cc:52:21:3b:e4:ba:e9.
+--[ RSA 2048]----+
|             o . |
|         . .o.o .|
|        . o .+.. |
|       .   ...=o+|
|        S  . .+B+|
|            oo+o.|
|           o  o. |
|          .      |
|           E     |
+-----------------+
Are you sure you want to continue connecting (yes/no)?

Because I have a visual representation of the server's fingerprint, it will be easier for me to verify that I am connecting to the correct server. Further, after connecting to the server many times, the visual fingerprint will become familiar. So, upon connection, when the visual fingerprint is displayed, I can think "yes, that is the same picture I always see, this must be my server". If a man-in-the-middle attack is in progress, a different visual fingerprint will probably be displayed, at which point I can avoid connecting, because I have noticed that the picture changed.

The picture is created by applying an algorithm to the fingerprint, such that different fingerprints should display different pictures. Turns out, there can be some visual collisions that I'll lightly address at the end of this post. However, this visual display should work in "most cases", and cause you to start verifying fingerprints of OpenSSH keys.

The Board
Because the bishop finds himself in a room with no exits, and only walls, we need to create a visually square room on the terminal. This is done by creating a room with 9 rows and 17 columns, for a total of 153 squares the bishop can travel. The bishop must start in the exact center of the room, thus the reason for odd-numbered rows and columns.

Our board setup then looks like this, where "S" is the starting location of the drunk bishop:

             1111111
   01234567890123456
  +-----------------+x (column)
 0|                 |
 1|                 |
 2|                 |
 3|                 |
 4|        S        |
 5|                 |
 6|                 |
 7|                 |
 8|                 |
  +-----------------+
  y
(row)

Each square on the board can be thought of as a numerical position derived from its Cartesian coordinates. As mentioned, there are 153 squares on the board, so each square gets a numerical value through the equation "p = x + 17y". So, p=0 for (0,0); p=76 for (8,4), the starting location of our bishop; and p=152 for (16,8), the lower right-hand corner of the board. Having a unique numerical value for each position on the board will allow us to do some simple math when the bishop begins his random walk.
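As a quick sanity check, here is that mapping in Python:

# p = x + 17y maps each (column, row) square to a unique position number.
def position(x, y):
    return x + 17 * y

print(position(0, 0))    # 0, the NW corner
print(position(8, 4))    # 76, the bishop's starting square
print(position(16, 8))   # 152, the SE corner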

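To make the position math concrete, here is a quick Python sketch (my own illustration, not code from the OpenSSH source) converting between coordinates and position values:

# Convert between (x, y) coordinates and the position value p = x + 17y.
def coords_to_pos(x, y):
    return x + 17 * y

def pos_to_coords(p):
    y, x = divmod(p, 17)    # 17 columns per row
    return x, y

print(coords_to_pos(8, 4))    # 76, the bishop's starting square
print(pos_to_coords(152))     # (16, 8), the lower right-hand corner
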
The Movement
In order to define movement, we need to understand the fingerprint that is produced from an OpenSSH key. An OpenSSH fingerprint is an MD5 checksum. As such, it has a 16-byte output. An example fingerprint could be "d4:d3:fd:ca:c4:d3:e9:94:97:cc:52:21:3b:e4:ba:e9".

Because the bishop can only move one of four valid ways, we can represent this in binary.

  • "00" means our bishop takes one move diagonally to the north-west.
  • "01" means our bishop takes one move diagonally to the north-east.
  • "10" means our bishop takes one move diagonally to the south-west.
  • "11" means our bishop takes one move diagonally to the south-east.

With the bishop in the center of the room, his first move will take him off square 76. After his first move, his new position will be as follows:

  • "00" will place him on square 58, a difference of -18.
  • "01" will place him on square 60, a difference of -16.
  • "10" will place him on square 92, a difference of +16.
  • "11" will place him on square 94, a difference of +18.

We must now convert our hexadecimal string to binary, so we can begin making movements based on our key. Our key:

d4:d3:fd:ca:c4:d3:e9:94:97:cc:52:21:3b:e4:ba:e9

would be converted to

11010100:11010011:11111101:11001010:...snip...:00111011:11100100:10111010:11101001

When reading the binary, we read each binary word (8-bits) from left-to-right, but we read each bit-pair in each word right-to-left (little endian). Thus our bishop's first 16 moves would be:

00 01 01 11 11 00 01 11 01 11 11 11 10 10 00 11

Or, you could think of it in terms of steps, if looking at the binary directly:

4,3,2,1:8,7,6,5:12,11,10,9:16,15,14,13:...snip...:52,51,50,49:56,55,54,53:60,59,58,57:64,63,62,61

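To see the bit-pair extraction in code, here is a minimal Python sketch (my own, not the OpenSSH implementation) that turns a fingerprint into the list of 64 moves, reading each byte's pairs from least significant to most significant:

# Parse a colon-separated MD5 fingerprint into 64 bishop moves (0-3).
# Each byte yields four 2-bit moves, least significant pair first.
def fingerprint_to_moves(fingerprint):
    moves = []
    for byte in bytes.fromhex(fingerprint.replace(':', '')):
        for shift in (0, 2, 4, 6):
            moves.append((byte >> shift) & 0b11)
    return moves

fp = 'd4:d3:fd:ca:c4:d3:e9:94:97:cc:52:21:3b:e4:ba:e9'
print(fingerprint_to_moves(fp)[:8])    # first 8 moves: [0, 1, 1, 3, 3, 0, 1, 3]
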
Board Coverage
All is well and good if our drunk bishop remains in the center of the room, but what happens when he slams into the wall, or walks himself into a corner? We need to take into account these situations, and how to handle them in our algorithm. First, let us define every square on the board:

+-----------------+
|aTTTTTTTTTTTTTTTb|   a = NW corner
|LMMMMMMMMMMMMMMMR|   b = NE corner
|LMMMMMMMMMMMMMMMR|   c = SW corner
|LMMMMMMMMMMMMMMMR|   d = SE corner
|LMMMMMMMMMMMMMMMR|   T = Top edge
|LMMMMMMMMMMMMMMMR|   B = Bottom edge
|LMMMMMMMMMMMMMMMR|   R = Right edge
|LMMMMMMMMMMMMMMMR|   L = Left edge
|cBBBBBBBBBBBBBBBd|   M = Middle pos.
+-----------------+

Now, let us define every move for every square on the board:

Pos  Bits  Heading  Adjusted  Offset
 a    00     NW     No Move      0
      01     NE        E        +1
      10     SW        S       +17
      11     SE       SE       +18
 b    00     NW        W        -1
      01     NE     No Move      0
      10     SW       SW       +16
      11     SE        S       +17
 c    00     NW        N       -17
      01     NE       NE       -16
      10     SW     No Move      0
      11     SE        E        +1
 d    00     NW       NW       -18
      01     NE        N       -17
      10     SW        W        -1
      11     SE     No Move      0
 T    00     NW        W        -1
      01     NE        E        +1
      10     SW       SW       +16
      11     SE       SE       +18
 B    00     NW       NW       -18
      01     NE       NE       -16
      10     SW        W        -1
      11     SE        E        +1
 R    00     NW       NW       -18
      01     NE        N       -17
      10     SW       SW       +16
      11     SE        S       +17
 L    00     NW        N       -17
      01     NE       NE       -16
      10     SW        S       +17
      11     SE       SE       +18
 M    00     NW       NW       -18
      01     NE       NE       -16
      10     SW       SW       +16
      11     SE       SE       +18

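Rather than special-casing all nine regions, an implementation can take the diagonal step and clamp the result at the walls, which reproduces every adjusted offset in the table above. A minimal Python sketch (my own, under the 17x9 board assumption):

# Move the bishop one step, clamping at the walls of the 17x9 room.
# move is one of 0-3: 00=NW, 01=NE, 10=SW, 11=SE.
def step(p, move):
    x, y = p % 17, p // 17
    x = max(0, min(16, x + (1 if move & 0b01 else -1)))    # low bit: east/west
    y = max(0, min(8, y + (1 if move & 0b10 else -1)))     # high bit: south/north
    return x + 17 * y

print(step(76, 0b00))    # 58, one step north-west from the start
print(step(0, 0b00))     # 0, the NW corner: "No Move"
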
How much of the board will our bishop walk? Well, with our fingerprints having a 16-byte output, there are 64 total moves the bishop will make. The most board a bishop could cover is if each square is visited at most once: 64 moves plus the starting square touch at most 65 squares. Thus 65/153 ~= 42.48%, which is less than half of the board.

Position Values
Remember that our bishop is making a random walk around the room, dropping coins on every square he's visited. If he's visited a square in the room more than once, we need a way to represent that in the art. As such, we will use a different ASCII character as the count increases.

Unfortunately, I think the OpenSSH developers picked the wrong characters. They mention in their PDF that the intention of the characters they picked was to increase the density of the character as the visitation count of a square increases. Personally, I don't think the developers have spent much time working with ASCII art. Had I been on the development team, I would have picked a different set. However, here is the set they picked for each count:

Freq   0        1   2   3   4   5   6   7   8   9   10  11  12  13  14
Char   (blank)  .   o   +   =   *   B   O   X   @   %   &   #   /   ^

A square the bishop never visits is left blank. The special characters "S" and "E" identify the starting and ending locations of the bishop respectively, and are drawn regardless of how many coins are on those squares.

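Putting the pieces together, rendering the picture is just a matter of walking the moves, counting coins on each square, and mapping counts to characters. Here is a rough Python sketch (my own; it reuses the fingerprint_to_moves() and step() helpers above, and the real OpenSSH code differs in details, such as the "+--[ RSA 2048]----+" header):

# Walk the fingerprint and render the 17x9 board as ASCII art.
CHARS = ' .o+=*BOX@%&#/^'    # index = coin count, capped at 14

def randomart(fingerprint):
    counts = [0] * 153
    p = 76                    # start in the center of the room
    for move in fingerprint_to_moves(fingerprint):
        p = step(p, move)
        counts[p] += 1
    board = [CHARS[min(c, 14)] for c in counts]
    board[76] = 'S'           # mark the start and end squares
    board[p] = 'E'
    rows = [''.join(board[r * 17:(r + 1) * 17]) for r in range(9)]
    border = '+' + '-' * 17 + '+'
    return '\n'.join([border] + ['|' + row + '|' for row in rows] + [border])

print(randomart('d4:d3:fd:ca:c4:d3:e9:94:97:cc:52:21:3b:e4:ba:e9'))
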
Additional Thoughts
Now that you know how the random art is generated for a given key, you can begin to ask yourself some questions:

  • When addressing picture collisions, how many fingerprints produce the same picture (same values for all positions)?
  • How many fingerprints produce the same shape (same visited squares with different values)?
  • How many different visualizations can the algorithm produce?
  • Can a fingerprint be derived by looking only at the random art?
  • How many different visualizations can a person easily distinguish?
  • What happens to the visualizations when changing the board size, either smaller or larger?
  • What visualizations can be produced with different chess pieces? How would the rules change?
  • What visualizations can be produced if the board were a torus (like Pac-man)?
  • Could this be extended to other fingerprints, such as OpenPGP keys? If so, could it simplify verifying keys at keysigning parties (and as a result, speed them up)?

Conclusion
Even though this post discussed the algorithm for generating the random art, it did not address the security models of those visualizations. There are many questions that need answers. Obviously, there are collisions. So, how many collisions are there? Can the number of visual collisions be predicted from the cryptographic hash and the size of the board? Does this weaken security when verifying OpenSSH keys, and if so, how?

These questions, and many others, should be addressed. But, for the time being, it seems to be working "well enough", and most people using OpenSSH probably only ever see the visualization when creating their own private and public key pair. You can enable it for every OpenSSH server you connect to by setting "VisualHostKey yes" in your ssh_config(5), as shown below.

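For example, the following stanza in ~/.ssh/config enables it for all hosts:

Host *
    VisualHostKey yes
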
In the meantime, I'll be working on a Python implementation for this on OpenPGP keys. It will use a larger board (11x19), and a different set of characters for the output, but the algorithm will remain the same. I'm interested to see if this can improve verifying people have the right OpenPGP key, by just checking the ASCII art, rather than reading out a 20-byte random string using the NATO alphabet. Keep an eye on https://github.com/atoponce/scripts/blob/master/art.py.

Strengthen Your Private Encrypted SSH Keys

Recently, on Hacker News, a post came through about improving the security of your encrypted private OpenSSH keys. I want to re-blog that post here (I'm actually jealous he blogged it first), in my own words, and provide a script at the end that will automate the process for you.

First off, Martin goes into great detail about the storage format of your unencrypted private OpenSSH keys. The unencrypted key is stored in a format known as Abstract Syntax Notation One (ASN.1) (for you web nerds, it's similar in function to JSON). However, when you encrypt the key with your passphrase, it is no longer valid ASN.1. So, Martin then takes you through the process of how the key is encrypted. The big take-away from that introduction is that, by default:

  • Encrypted OpenSSH keys use MD5, a horribly broken cryptographic hash, to derive the encryption key from your passphrase.
  • OpenSSH keys are encrypted with AES-128-CBC, which is fast, fast, fast.

It would be nice if our OpenSSH keys used a stronger cryptographic hash like SHA1, SHA2 or SHA3 in the encryption process, rather than MD5. Further, it would be nice if we could cause attackers who get our private encrypted OpenSSH keys to expend more computing resources when trying to brute force our passphrase. So, rather than using the speedy AES algorithm, how about 3DES or Blowfish?

This is where PKCS#8 comes into play. "PKCS" stands for "Public-Key Cryptography Standards". There are currently 15 standards, with 2 withdrawn and 2 under development. Standard #8 defines how private key certificates are to be handled, both in unencrypted and encrypted form. Because OpenSSH uses public key cryptography, and stores private keys on disk, it would be nice if it adhered to the standard. Turns out, it does. From the ssh-keygen(1) man page:

     -m key_format
             Specify a key format for the -i (import) or -e (export) conver‐
             sion options.  The supported key formats are: “RFC4716” (RFC
             4716/SSH2 public or private key), “PKCS8” (PEM PKCS8 public key)
             or “PEM” (PEM public key).  The default conversion format is
             “RFC4716”.

As mentioned, the supported key formats are RFC4716, PKCS8 and PEM. Seeing as PKCS#8 is supported, it seems we can take advantage of it in OpenSSH. So, the question then becomes: what does PKCS#8 offer me in terms of security that I don't already have? Well, Martin answers this question in his post as well. Turns out, there are 2 versions of PKCS#8 that we need to address:

  • The version 1 option specifies a PKCS#5 v1.5 or PKCS#12 algorithm to use. These algorithms only offer 56-bits of protection, since they both use DES.
  • The version 2 option specifies that PKCS#5 v2.0 algorithms are used which can use any encryption algorithm such as 168 bit triple DES or 128 bit RC2.

As I mentioned earlier, we want SHA1 (or better) and 3DES (or slower). Turns out, the OpenSSL implementation of PKCS#8 supports, among others, the following SHA1-based encryption algorithms:

  • PBE-SHA1-RC4-128
  • PBE-SHA1-RC4-40
  • PBE-SHA1-3DES
  • PBE-SHA1-2DES
  • PBE-SHA1-RC2-128
  • PBE-SHA1-RC2-40

PBE-SHA1-3DES is our target. So, the only question remaining is: can we convert our private OpenSSH keys to this format? If so, how? Well, because OpenSSH relies heavily on OpenSSL, we can use the openssl(1) utility to make the conversion to the new format, and thanks to the ssh-keygen(1) manpage quoted above, we know OpenSSH supports the PKCS#8 format for our private keys, so we should be good.

Before we go further though, why 3DES? Why not stick with the default AES? DES is slow, slow, slow. 3DES is DES chained together 3 times. Compared to AES, it's a snail racing a hare. With 3DES, the data is encrypted with a first 56-bit DES key, then encrypted with a second 56-bit DES key, then finally encrypted with a third 56-bit DES key. The result is a 168-bit key, although meet-in-the-middle attacks reduce the effective security to about 112 bits. There are no known practical attacks against 3DES, and NIST considers it secure through 2030. It's certainly appropriate to use as encrypted storage for our private OpenSSH keys.

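If you want to see the speed difference for yourself, openssl(1) ships with a built-in benchmark; something like the following (throughput numbers will vary by machine) makes the gap between the two ciphers obvious:

$ openssl speed des-ede3 aes-128-cbc
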
To convert our private key, all we need to do is rename it, run openssl(1) on the keys, then test. Here are the steps:

$ mv ~/.ssh/id_rsa{,.old}
$ umask 0077
$ openssl pkcs8 -topk8 -v2 des3 -in ~/.ssh/id_rsa.old -out ~/.ssh/id_rsa     # dsa and ecdsa are also supported

Now log in to a remote OpenSSH server where the public portion of that key is installed, and see if it works. If so, remove the old key. To simplify the process, I created a script where you provide your private OpenSSH key as an argument, and it does the conversion for you. You can find that script at https://github.com/atoponce/scripts/blob/master/ssh-to-pkcs8.zsh

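As a quick sanity check, the converted key should now carry the PKCS#8 header, rather than the traditional "BEGIN RSA PRIVATE KEY" header:

$ head -n 1 ~/.ssh/id_rsa
-----BEGIN ENCRYPTED PRIVATE KEY-----
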
What's the point? Basically, you should think of it the following way:

  • We're using SHA1 rather than MD5 as part of the encryption process.
  • By using 3DES rather than AES, we've slowed down brute force attacks to a crawl. This should buy us 2-3 extra characters of entropy in our passphrase.
  • Using PKCS#8 gives us the flexibility to use other algorithms in the future, as old ones are replaced.

I agree with Martin that it's a shame OpenSSH isn't using this by default. Why stick with the original OpenSSH storage format? Compatibility isn't a concern, as the support relies solely on the client, not the server. Because every client should have a different keypair installed, there is no worry about new versus old client. Extra security is purchased through the use of SHA1 and 3DES. Computing time to create the keys was trivial, and the performance difference when using them is not noticeable compared to the traditional format. Of course, if your passphrase protecting your keys is strong, with lots and lots of entropy, then an attacker will be foiled with a brute force attack anyway. Regardless, why not make it more difficult for him by slowing him down?

Martin's post is a great read, and as such, I've converted my OpenSSH keys to the new format. I'd encourage you to do the same.

ZFS Administration, Appendix B- Using USB Drives

Table of Contents

Zpool Administration
0. Install ZFS on Debian GNU/Linux
1. VDEVs
2. RAIDZ
3. The ZFS Intent Log (ZIL)
4. The Adjustable Replacement Cache (ARC)
5. Exporting and Importing Storage Pools
6. Scrub and Resilver
7. Getting and Setting Properties
8. Best Practices and Caveats

ZFS Administration
9. Copy-on-write
10. Creating Filesystems
11. Compression and Deduplication
12. Snapshots and Clones
13. Sending and Receiving Filesystems
14. ZVOLs
15. iSCSI, NFS and Samba
16. Getting and Setting Properties
17. Best Practices and Caveats

Appendices
A. Visualizing The ZFS Intent Log (ZIL)
B. Using USB Drives
C. Why You Should Use ECC RAM
D. The True Cost Of Deduplication

Introduction

This comes from the "why didn't I think of this before?!" department. I have a ton of USB 2.0 thumb drives lying around my home and office. I have six 16GB drives and eight 8GB drives, so 14 drives in total. I have two hypervisors in a GlusterFS storage cluster, and I just happen to have two USB squids that support seven USB drives each. Perfect! So, why not put these to good use, and add them as L2ARC devices to my pool?

Disclaimer

USB 2.0 is limited to 40 MBps per controller. A standard 7200 RPM hard drive can do 100 MBps. So, adding USB 2.0 drives to your pool as a cache is not going to increase read bandwidth, at least not for large sequential reads. However, the seek latency of a NAND flash device is typically around 1 to 3 milliseconds, whereas a platter HDD is around 12 milliseconds. If you do a lot of small random IO, like I do, then your USB drives will actually provide an overall performance increase that HDDs cannot provide.

Also, because there are no moving parts with NAND flash, every cache hit is data that does not need to be read from the HDD, which means less movement of the actuator arm, which means less power consumed in the long term. So, not only are they better for small random IO, they're saving you power at the same time! Yay for going green!

Lastly, the L2ARC should be read intensive. However, it can also be write intensive if you don't have enough room in your ARC and L2ARC to store all the requested data. If this is the case, you'll be constantly writing to your L2ARC. For USB drives without wear leveling algorithms, you'll chew through the drive quickly, and it will be dead in no time. If this is your case, you could store only metadata, rather than the actual data block pages in the L2ARC. You can do this with the following:

# zfs set secondarycache=metadata pool

You can set this pool-wide, or per dataset, as shown below. In the case outlined above, I would certainly do it pool-wide, which each dataset will then inherit by default.

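For example, to store only metadata for one dataset rather than the whole pool (the dataset name "pool/vms" here is hypothetical), scope the same command deeper:

# zfs set secondarycache=metadata pool/vms
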
Implementation

To set this up, it's rather straightforward. Just identify the drives by their unique identifiers, then add them to the pool:

# ls /dev/disk/by-id/usb-* | grep -v part
/dev/disk/by-id/usb-Kingston_DataTraveler_G3_0014780D8CEBEBC145E80163-0:0@
/dev/disk/by-id/usb-Kingston_DataTraveler_SE9_00187D0F567FEC2090007621-0:0@
/dev/disk/by-id/usb-Kingston_DataTraveler_SE9_00248121ABD5EC2070002E70-0:0@
/dev/disk/by-id/usb-Kingston_DataTraveler_SE9_00D0C9CE66A2EC2070002F04-0:0@
/dev/disk/by-id/usb-_USB_DISK_Pro_070B2605FA99D033-0:0@
/dev/disk/by-id/usb-_USB_DISK_Pro_070B2607A029C562-0:0@
/dev/disk/by-id/usb-_USB_DISK_Pro_070B2608976BFD58-0:0@

There are the seven drives I outlined at the beginning of the post. So, to add them to the pool as L2ARC drives, just run the following command:

# zpool add -f pool cache usb-Kingston_DataTraveler_G3_0014780D8CEBEBC145E80163-0:0 \
usb-Kingston_DataTraveler_SE9_00187D0F567FEC2090007621-0:0 \
usb-Kingston_DataTraveler_SE9_00248121ABD5EC2070002E70-0:0 \
usb-Kingston_DataTraveler_SE9_00D0C9CE66A2EC2070002F04-0:0 \
usb-_USB_DISK_Pro_070B2605FA99D033-0:0 \
usb-_USB_DISK_Pro_070B2607A029C562-0:0 \
usb-_USB_DISK_Pro_070B2608976BFD58-0:0

Of course, these are the unique identifiers for my USB drives. Change them as necessary for your drives. Now that they are installed, are they filling up?

# zpool iostat -v
                                                                 capacity     operations    bandwidth
pool                                                          alloc   free   read  write   read  write
------------------------------------------------------------  -----  -----  -----  -----  -----  -----
pool                                                           695G  1.13T     21     59  53.6K   457K
  mirror                                                       349G   579G     10     28  25.2K   220K
    ata-ST1000DM003-9YN162_S1D1TM4J                               -      -      4     21  25.8K   267K
    ata-WDC_WD10EARS-00Y5B1_WD-WMAV50708780                       -      -      4     21  27.9K   267K
  mirror                                                       347G   581G     11     30  28.3K   237K
    ata-WDC_WD10EARS-00Y5B1_WD-WMAV50713154                       -      -      4     22  16.7K   238K
    ata-WDC_WD10EARS-00Y5B1_WD-WMAV50710024                       -      -      4     22  19.4K   238K
logs                                                              -      -      -      -      -      -
  mirror                                                         4K  1016M      0      0      0      0
    ata-OCZ-REVODRIVE_OCZ-33W9WE11E9X73Y41-part1                  -      -      0      0      0      0
    ata-OCZ-REVODRIVE_OCZ-X5RG0EIY7MN7676K-part1                  -      -      0      0      0      0
cache                                                             -      -      -      -      -      -
  ata-OCZ-REVODRIVE_OCZ-33W9WE11E9X73Y41-part2                52.2G    16M      4      2  51.3K   291K
  ata-OCZ-REVODRIVE_OCZ-X5RG0EIY7MN7676K-part2                52.2G    16M      4      2  52.6K   293K
  usb-Kingston_DataTraveler_G3_0014780D8CEBEBC145E80163-0:0    465M  6.80G      0      0    319  72.8K
  usb-Kingston_DataTraveler_SE9_00187D0F567FEC2090007621-0:0  1.02G  13.5G      0      0  1.58K  63.0K
  usb-Kingston_DataTraveler_SE9_00248121ABD5EC2070002E70-0:0  1.17G  13.4G      0      0    844  72.3K
  usb-Kingston_DataTraveler_SE9_00D0C9CE66A2EC2070002F04-0:0   990M  13.6G      0      0  1.02K  59.9K
  usb-_USB_DISK_Pro_070B2605FA99D033-0:0                      1.08G  6.36G      0      0  1.18K  67.0K
  usb-_USB_DISK_Pro_070B2607A029C562-0:0                      1.76G  5.68G      0      1  2.48K   109K
  usb-_USB_DISK_Pro_070B2608976BFD58-0:0                      1.20G  6.24G      0      0    530  38.8K
------------------------------------------------------------  -----  -----  -----  -----  -----  -----

Something important to understand here is that the drives do not need to all be the same size. You can mix and match what you have on hand. Of course, the more space you can give to the cache, the better off you'll be.

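One nice property of cache devices is that they are disposable. If one of these cheap USB drives dies, or you need it back in your pocket, you can drop it from the pool at any time with zpool remove, using the same identifier you added it with:

# zpool remove pool usb-_USB_DISK_Pro_070B2605FA99D033-0:0
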
Conclusion

While this certainly isn't designed for speed, it can be used for lower random IO latencies, and it will reduce power in the datacenter. Further, what else are you going to do with those USB devices just lying around? Might as well put them to good use, especially seeing as "the cloud" is making it trivial to keep all of your files online.
