
How Much Swap?

I've been in the UNIX and GNU/Linux world since 1999. Back then, hard drives were barely passing double digits in GB, and RAM was PC100 topping out at roughly 128 MB. In fact, it wasn't uncommon for systems to have 32 MB of RAM and an 8 GB hard drive. And we ran GNOME, which had only just been released, and KDE on these machines!

However, when installing the operating system, it was a general rule of thumb that the size of your swap should be 2x the amount of RAM. So for a 32 MB RAM system, this meant dedicating 64 MB of disk to swap. This wasn't a big deal back then. After all, hard drives were 8-10 GB in size. What's 64 MB?

Fast forward just a few years, and it wasn't long before this recommendation became unreasonable. After all, if you had 4 GB of RAM in your laptop (which I had in my Lenovo T61 for years) and a 100 GB hard drive, this meant dedicating 8 GB of hard drive space to swap, roughly 1/10 the drive size! So, the recommendation shifted a bit to 1.5x the amount of RAM, or 6 GB of swap in my T61.

Now look at today, with DDR3 RAM sizes and the popularity of SSDs. My server has 32 GB of RAM, and 2x60 GB SSDs in a Linux software RAID mirror. So, given the recommendation of 2x the amount of RAM, this would mean 64 GB of swap on a 60 GB disk. Given the recommendation of 1.5x, this would mean 48 GB of swap on a 60 GB disk. Something tells me we need to re-evaluate the recommendation.

When I started training for Guru Labs, I started re-evaluating the recommendation. In fact, I started debating if I even NEEDED swap. After all, swap is nothing more than an extension of the installed RAM on your system; it just resides on disk, which is slow. So, for my laptop, I didn't bother with it. Instead, I upgraded my RAM to 8 GB, which the board supported. I then started telling students my own personal recommendation on swap sizes:

Unless you have an application that has advanced swapping algorithms, and needs more swap, 1-2 GB of swap should be more than sufficient for most situations. Otherwise, install more RAM, and don't even bother with it at all.

Now, there are some caveats. If you would like to hibernate your system, you may need as much swap as you have RAM, seeing as though hibernation implementations typically flush all of the contents of RAM into the swap/paging file. This could be useful for laptops. Also, there are some software applications out there which can swap in and out very efficiently, leaving active RAM available for the running software. The Oracle database software comes to mind here.

Swap can be part of a healthy system, and having it installed isn't "a bad thing", unless the system is thrashing of course. But it also isn't required, which I think some people aren't aware of. You don't NEED swap. It's just like the laundry basket in your house. Rather than leaving dirty clothes all over the house, you can put them neatly in one place, out of the way. But the laundry basket isn't required to do the laundry.

Announcing Hundun

Per my last post, I decided to set up an entropy server that the community could use. So, I've done just that. The server uses 5 entropy keys from Simtec Electronics in the U.K. as its hardware true random number generators. It hands out high quality randomness for the most critical cryptographic applications. The purpose is to keep your entropy pools filled.

The server was inspired by http://random.org. There, atmospheric noise is used to create true randomness, and the site provides a convenient way to play games with this randomness, such as rolling dice. It also provides an API that automated clients can use to fetch randomness. I decided to take a different approach: give you the raw bits themselves, and let you do with them as you please.

If you're interested in these raw bits, visit http://hundun.ae7.st. The web page is very minimalist right now. I'll get to that later. However, it contains all the instructions necessary to take advantage of those random bits. The bits are encrypted on the wire using stunnel, to ensure their integrity.

If this is useful to you, feel free to comment below, drop me a line via email on that site, or use any of the methods found at http://pthree.org/author-colophon, and let me know.

The Entropy Server

With my last post about the entropy key hardware true random number generator (TRNG), I was curious if I could set this up as a server. Basically, bind to a port that spits out true random bits over the internet, and allow clients to connect to it to fill their own entropy pools. One of the reasons I chose the entropy key from Simtec was that they provide Free Software client and server applications to make this possible.

So, I had a mission: make it happen, as cost effectively as possible. To be truly cost effective, the server the keys are plugged into cannot consume a great amount of wattage. It should be headless, not needing a monitor. It shouldn't be a large initial cost up front either. The application should be lightweight, and the client should only request the bits when needed, rather than consuming a constant stream. Thankfully, that last requirement is already met in the client software provided by Simtec. However, reaching the other points was a bit of a challenge, until I stumbled upon the Raspberry Pi. For only $35, and drawing roughly a watt, I found my target.

Unfortunately, the "Raspbian" operating system uses some binary blobs for the ARM hardware where source code is not available. Further, they chose HDMI over the cheaper and less proprietary DisplayPort as the digital video out. However, it's only $35, so meh. After getting the operating system installed on the SD card, I installed ekeyd per my previous post, and set up the entropy keys. Now, at this point, it's not bound to a TCP port. That needs to be enabled. Further, if binding to a TCP port, the data will be unencrypted. Because the data is whitened, true random noise, I would prefer to keep it that way, and not have the data biased on the wire. So, I would also need to set up stunnel.

First, to install the packages:

$ sudo aptitude install ekeyd stunnel4

Now, you'll also need to set up your entropy keys. That won't be covered here. You do need to configure ekeyd to bind to a port on localhost. We'll use stunnel to bind to an external port. Edit /etc/entropykey/ekeyd.conf, and uncomment the following line:

TCPControlSocket "1234"

The default port of 1234 is fine, as it's local. If it's already in use, you may want to choose something else. Whatever you do use, this is what stunnel will connect to. So, let's edit the /etc/stunnel/stunnel.conf file, and set up the connection. To be clear, stunnel will be acting as a client to ekeyd, and as a server to the network. Stunnel clients will be connecting on this external port.

cert = /etc/stunnel/ekeyd.pem
[ekeyd]
accept=8086
connect=1234

This configuration says that stunnel will connect locally on port 1234 and serve the resulting data on port 8086, encrypted with the /etc/stunnel/ekeyd.pem SSL certificate. Notice that we are using a single PEM file holding both the key and the certificate. This can be signed by a CA for your domain, or it can be self-signed. In my case, I went with self-signed and issued the following command, making the certificate good for 10 years:

# openssl req -new -x509 -days 3650 -nodes -out /etc/stunnel/ekeyd.pem -keyout /etc/stunnel/ekeyd.pem

When finished with creating the SSL certificate, we are ready to start serving the bits. Start up the ekeyd server, then the stunnel one:

$ sudo /etc/init.d/ekeyd restart
$ sudo /etc/init.d/stunnel4 restart

You can now verify that everything is set up correctly by checking the listening sockets on the box. You should see both ports bound, and waiting for connections based on our configuration above:

$ netstat -tan | awk '/LISTEN/ && /(8086|1234)/'
tcp        0      0 127.0.0.1:1234          0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:8086            0.0.0.0:*               LISTEN

At this point, the only thing left to do is to poke a hole in your firewall for port 8086 (not port 1234), and allow stunnel clients to connect. The clients will also need to install the ekeyd-egd-linux and stunnel4 packages. A separate post on this is forthcoming.
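While the client post is forthcoming, here is a rough sketch of what a client connection could look like in Python. The hostname is illustrative, and it assumes the endpoint simply streams raw bytes once connected; the real client stack (ekeyd-egd-linux behind stunnel) speaks the EGD protocol instead, so treat this only as an illustration of the TLS wrapping stunnel provides:

```python
import socket
import ssl

HOST = 'entropy.example.com'  # illustrative hostname
PORT = 8086                   # the external stunnel port configured above

def fetch_random(nbytes):
    """Open a TLS connection and read nbytes of data."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False        # the certificate is self-signed,
    ctx.verify_mode = ssl.CERT_NONE   # so skip verification in this sketch
    with socket.create_connection((HOST, PORT)) as raw:
        with ctx.wrap_socket(raw, server_hostname=HOST) as tls:
            data = b''
            while len(data) < nbytes:
                chunk = tls.recv(nbytes - len(data))
                if not chunk:
                    break
                data += chunk
            return data
```

A loop like this could feed the received bytes into a local entropy pool, which is exactly what the ekeyd-egd-linux client does for you.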

Obscure Email Addresses In HTML

I recently put up a web page with my email address. I'm confident in my email provider's ability to filter spam, so I don't worry about it too much, to be honest. However, I started thinking about different ways I could obscure the email address in the source. Of course, this isn't offering any sort of security, and any bot worth its weight in spam will have functions to detect the obscurity, and get to the address. Regardless, I figured this would be an interesting problem. Here are some ways of obscuring it I thought up quickly:

  • Replace the '@' and '.' characters.
  • Use an image.
  • Use plus-addressing.
  • Add nonsensical HTML in the source, e.g. aa<i></i>ron@foo<u></u>.com.
  • Crafty CSS tricks.
  • Crafty JavaScript tricks.
  • Use a contact form and POST.
  • Obfuscate using ASCII values.
  • Some crazy combination of the above.

I'm sure there are other ways, some of which may be more effective than others. However, it seemed easy enough to obscure the email using ASCII obfuscation. Further, it's trivial to code in Python. Case in point, suppose I'm in the Python REPL:

>>> import sys
>>> for char in 'aaron@example.com':
...    sys.stdout.write('&#{};'.format(ord(char)))
&#97;&#97;&#114;&#111;&#110;&#64;&#101;&#120;&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;>>>

Add the above string to your HTML, and the browser will display the original characters, even though the source only contains their ASCII values. Again, as already mentioned, I'm not expecting this to provide any sort of security, but I would be willing to bet that most spam bots aren't as sophisticated as you would like to think. This may just do the trick at fooling some of them. It may not. But, I have full faith in my mail provider to properly identify spam, and send it to my spam folder. So whether I put the raw email address in the form, obscure it with ASCII values, or use fancy CSS/JavaScript, it doesn't matter.
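HTML numeric character references take the form &#NNN;, and you can sanity-check the round trip from Python 3 itself, since html.unescape() decodes the references the same way a browser renders them:

```python
import html

address = 'aaron@example.com'
# Build the numeric character reference string, e.g. 'a' -> '&#97;'
encoded = ''.join('&#{};'.format(ord(c)) for c in address)

# html.unescape() decodes the references just as a browser would.
assert html.unescape(encoded) == address
```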

The Entropy Key

Recently, I purchased 5 entropy keys from http://entropykey.co.uk. They are hardware true random number generators using reverse-biased P-N junctions. Basically, they time when electrons jump the depletion zone in the junction, where a voltage is applied to the point of near breakdown. In effect, they take advantage of the random characteristics of quantum mechanics.

There are a lot of really interesting things about the keys that brought me to purchase them. First is the fact that they use two junctions along with the XOR function to test for correlations. If correlations exist, the keys shut down. Second, they use Skein-256 internally as an encryption mechanism to deliver the random bits to the kernel, where they are decrypted. The bits are encrypted to ensure that an attacker on the box cannot manipulate them. Third, the keys are physically protected against tampering. If the temperature is outside of operating range, the keys will shut down, and if a key is opened, epoxy will spill out onto the board, shorting it and preventing operation.
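As a side note on why XORing two junctions is a good test: XOR of two independent biased bit streams is strictly closer to uniform than either input (the piling-up effect), so any residual bias or correlation in the XORed output is a red flag. A quick sketch of the arithmetic (illustrative only, not the key's actual firmware logic):

```python
def xor_ones_probability(p, q):
    """P(a XOR b = 1) for independent bits with P(a=1)=p and P(b=1)=q."""
    return p * (1 - q) + q * (1 - p)

# Two streams, each biased 10% toward 1: the XOR is only 2% from uniform.
combined = xor_ones_probability(0.6, 0.6)
print(round(combined, 2))  # 0.48
assert abs(combined - 0.5) < abs(0.6 - 0.5)
```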

Tollef Fog Heen is seeing about 4 KBps with his key. I hooked my keys up to my Raspberry Pi, and I'm only seeing about 2 KBps per key. However, when connected to my T61, or other computers, I see about 3.5-4 KBps. Installing haveged alongside the entropy keys on my Raspberry Pi, I have a total throughput of about 150 KBps of random data.

To set up the entropy key, you will receive a package with a DVD, the entropy key, and a paper card with the master password to decrypt the bits from the key. If you have multiple keys, match the key ID "EKXXXXX" with the same ID on the card. Then, in Debian/Ubuntu, you can install the daemon to talk to the keys:

sudo aptitude install ekeyd

You will now have the userspace utilities to talk to the keys. As you plug each key into your computer, the ekeyd daemon will set up a device in /dev/entropykey/. My five keys show me:

$ ls -l /dev/entropykey/
total 0
lrwxrwxrwx 1 root root 10 Oct  4 07:04 Sf9sBkiDUkkSGBSH -> ../ttyACM0
lrwxrwxrwx 1 root root 10 Oct  4 07:04 Sf9sBkiDUkkTJRSH -> ../ttyACM4
lrwxrwxrwx 1 root root 10 Oct  4 07:04 Sf9tBkiDUkkHNBWH -> ../ttyACM1
lrwxrwxrwx 1 root root 10 Oct  4 07:04 Sf9tBkiDUkkJOBWH -> ../ttyACM2
lrwxrwxrwx 1 root root 10 Oct  4 07:04 Sf9vBkiDUkkTNBSH -> ../ttyACM3

Further, when using the ekeydctl userspace utility, you can see the status of each key:

$ ekeydctl list
NR,OK,Status,Path,SerialNo
1,NO,Long-Term-Key is bad,/dev/entropykey/Sf9sBkiDUkkSGBSH,Sf9sBkiDUkkSGBSH
2,NO,Long-Term-Key is bad,/dev/entropykey/Sf9tBkiDUkkHNBWH,Sf9tBkiDUkkHNBWH
3,NO,Long-Term-Key is bad,/dev/entropykey/Sf9tBkiDUkkJOBWH,Sf9tBkiDUkkJOBWH
4,NO,Long-Term-Key is bad,/dev/entropykey/Sf9vBkiDUkkTNBSH,Sf9vBkiDUkkTNBSH
5,NO,Long-Term-Key is bad,/dev/entropykey/Sf9sBkiDUkkTJRSH,Sf9sBkiDUkkTJRSH

So, we need to set up the keys and make them live, so we can start using the entropy that comes from them. Using the ekey-rekey userspace utility, I can set up each key with the master password sent on my card:

# ekey-rekey 'Sf9sBkiDUkkSGBSH' 'ajx1 b52m cHvd d51n e4tS fYPs g2p3 xDAZ IeYf jCYM kqWi'

Of course, I changed my master key in the example above. I only wish to show that the spaces are significant when setting up your keys. Do this for each of your keys, and you should see something to this effect:

NR,OK,Status,Path,SerialNo
1,YES,RUNNING OK,/dev/entropykey/Sf9sBkiDUkkSGBSH,Sf9sBkiDUkkSGBSH
2,YES,RUNNING OK,/dev/entropykey/Sf9tBkiDUkkHNBWH,Sf9tBkiDUkkHNBWH
3,YES,RUNNING OK,/dev/entropykey/Sf9tBkiDUkkJOBWH,Sf9tBkiDUkkJOBWH
4,YES,RUNNING OK,/dev/entropykey/Sf9vBkiDUkkTNBSH,Sf9vBkiDUkkTNBSH
5,YES,RUNNING OK,/dev/entropykey/Sf9sBkiDUkkTJRSH,Sf9sBkiDUkkTJRSH

Your entropy pool is now being filled with real, true random numbers. You can test this by looking at the raw random data, and when exhausting the pool, you can see how quickly it refills:

$ dd if=/dev/random count=1 | xxd
0+1 records in
0+1 records out
128 bytes (128 B) copied, 8.6482e-05 s, 1.5 MB/s
0000000: d932 fc37 089e 4229 81cd d433 4e62 472a  .2.7..B)...3NbG*
0000010: 3383 1f64 9b33 5797 f001 aa9a b15d 6581  3..d.3W......]e.
0000020: 758f cb1c a797 a39a 37c8 db67 ae0b ff19  u.......7..g....
0000030: bf0e 891d 702e 2f58 cfd8 963d e499 13db  ....p./X...=....
0000040: 5f48 f7d3 cdcc 2e52 e2fc 4685 ad38 68bd  _H.....R..F..8h.
0000050: 6de3 917b 4627 4695 3371 3335 9304 0f7a  m..{F'F.3q35...z
0000060: a540 62aa 01a6 1006 84b2 1cb5 23ce 790e  .@b.........#.y.
0000070: 12fb 8edc 78a2 13bf 1780 eb7e 1fbf a400  ....x......~....
$ pv -a < /dev/random > /dev/null
[18.5 kB/s]

Lastly, it's probably a good idea to test for bias in the random stream using the dieharder suite of tests, or the FIPS 140-2 tests with rngtest. I have run both. The tests are slow, so let them run for a while to collect a lot of data. Here is the output from rngtest after about 3-4 minutes; you will certainly want to let it run longer, like an hour or so:

# rngtest < /dev/random 
rngtest 2-unofficial-mt.14
Copyright (c) 2004 by Henrique de Moraes Holschuh
This is free software; see the source for copying conditions.  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

rngtest: starting FIPS tests...
^Crngtest: bits received from input: 27814504
rngtest: FIPS 140-2 successes: 1389
rngtest: FIPS 140-2 failures: 1
rngtest: FIPS 140-2(2001-10-10) Monobit: 0
rngtest: FIPS 140-2(2001-10-10) Poker: 0
rngtest: FIPS 140-2(2001-10-10) Runs: 1
rngtest: FIPS 140-2(2001-10-10) Long run: 0
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
rngtest: input channel speed: (min=34.735; avg=79.156; max=675.401)Kibits/s
rngtest: FIPS tests speed: (min=968.716; avg=4301.815; max=5622.121)Kibits/s
rngtest: Program run time: 349417245 microsecond

Automated Diceware Passwords

For those unfamiliar, Diceware.com is a way of picking truly random passphrases from a predefined dictionary list of words. The idea is that each word has a 5-digit number attached to it. Each digit of the number holds a value from 1 to 6, and the words are listed in numerical order. So, the first word is numbered 11111, then 11112, 11113, 11114, 11115, 11116, then moving to 11121, etc. There are 7,776 words in the list. You can find the list at http://world.std.com/~reinhold/diceware.wordlist.asc. Now, take 5 fair 6-sided dice, and throw them one at a time. Take note of the numbers that fall out. After all 5 are thrown, find the corresponding word in the dictionary list. This becomes the first word of your passphrase. Continue in like manner 5, 6 or more times, until you have a long passphrase.

For example, suppose you rolled, in succession, 12345 64213 43526 13243 44615. In this case, your passphrase would be "apathywildninebalepabst". If you're curious about the entropy of this passphrase, you can calculate it this way:

H = L * log2(N)
H = entropy in binary bits
L = length of your password
log2(N) = log(N)/log(2)
N = full size of the set(s) in your passwords

So N in our case is 7776, seeing as though each word belongs to that set and is equally likely to appear, and L = 5 (the "length" of your passphrase, in this case counting the number of words you picked). Thus, your entropy is about 64 bits. In fact, each word in the diceware list carries about 12.9248 bits itself. 64 bits is okay, but 77 bits, or 6 words, is much better. For comparison's sake, look at http://stats.distributed.net/projects.php?project_id=8, a distributed computing project working at brute forcing a 72-bit RC5 key. Look at their current pace of ~390 billion keys per second. To have a 100% guarantee of finding the key, it will take about 250 years to exhaust the entire key space at that pace, which is mostly GPU clients. The $500 video cards on the market today can do about 5 billion keys per second. Thus, 78 of those cards, at a cost of $39,000, would maintain the pace that distributed computing project is seeing.
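The arithmetic is easy to check in Python (7776 = 6^5 words, 5 words per passphrase):

```python
import math

N = 7776  # words in the Diceware list (6^5)
L = 5     # words in the passphrase

bits_per_word = math.log2(N)
H = L * bits_per_word

print(round(bits_per_word, 4))  # 12.9248 bits per word
print(round(H, 2))              # 64.62 bits for a 5-word passphrase
```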

So, seeing as though diceware is a great way to create awesome passwords, I thought "why not develop this in a shell script?" So, I did. See the script below. It does require the diceware word list to be in the same directory as the shell script. It can take a numeric argument for the number of words to include in the passphrase, or it can be run without an argument, in which case it will default to 6 words (77 bits of entropy).

#!/bin/zsh

# ZSH script to create true random diceware passphrases. Requires
# diceware.wordlist.asc to be present in the same directory as the script.
# Can be found at http://world.std.com/~reinhold/diceware.wordlist.asc
#
# Author: Aaron Toponce <aaron.toponce@gmail.com>
# Date: Sept 20, 2012
# License: Public Domain

BASEDIR="${0%/*}"
WORDLIST="$BASEDIR/diceware.wordlist.asc"

if [[ ! -f "$WORDLIST" ]]; then
    echo "The diceware.wordlist.asc file must be present in the same"
    echo "directory as the diceware.zsh script."
    echo # Blank line
    echo "http://world.std.com/~reinhold/diceware.wordlist.asc"
    exit 1
fi

# Function to generate each Diceware word from the list
function five-dice-roll {
    echo -n $(< /dev/random tr -dc 1-6 | head -c 5)
}

# Function to find the Diceware word based on our dice roll. Match the roll
# against the first field only, so a roll can't match part of a word.
function diceware-word {
    awk -v roll="$(five-dice-roll)" '$1 == roll {print $2}' "$WORDLIST"
}

if [[ "$1" = <-> ]]; then NUM="$1"; else NUM=6; fi

for i in {1.."$NUM"}; do
    DICEPASS="${DICEPASS}$(diceware-word)"
done

echo "$DICEPASS"

Here is an example of using the shell script to generate 7 passphrases. And yes, it is using true random numbers for the fair 6-sided die, which is completely unweighted (better than you can say for any physically fabricated die):

$ for i in {1..7}; do echo -n "$i: "; ./diceware.zsh; done
1: remanerosphonytuftcopewaggle
2: paeandickalecktggoodell
3: nodalknickameshfpatpiotr
4: epiclostgamarabsiftmarx
5: pourpaxvalueaorta9ve
6: apexantileas'ssteelefrye
7: smeltgorseimpocwhiffgray

Haveged Continued

I noticed that on my machine, my entropy was staying high, then falling off. Then, at what appeared to be some arbitrary point, it would fill back up, in a very periodic manner. This is, of course, after running haveged in the background. Curious, I started looking into it. It took a while to find, but then I noticed it. It was obvious. The "write_wakeup_threshold" is what tells the daemon to fill the entropy pool with more data.

$ cat /proc/sys/kernel/random/write_wakeup_threshold
1024

This is the default after installing haveged. But the poolsize is 4096. It sure would be nice if the write_wakeup_threshold were 4096, rather than 1024. Well, you have two options to set it: you can use sysctl, or you can use haveged. Let's look at both (I prefer the latter). With sysctl, you just need to edit the /etc/sysctl.conf file, and add the following lines:

## Keep the entropy at full up
kernel.random.write_wakeup_threshold = 4096

Then run:

# sysctl -p
kernel.random.write_wakeup_threshold = 4096

Or, haveged ships with a configuration file to set this automatically when the daemon starts, and this should probably be the preferred way to set it. Change the /etc/default/haveged file to use 4096 instead of 1024:

# Configuration file for haveged

# Options to pass to haveged:
#   -w sets low entropy watermark (in bits)
DAEMON_ARGS="-w 4096"

Then restart haveged:

# /etc/init.d/haveged restart
 * Restarting entropy daemon haveged
    ...done.

Now, check your Munin graphs (or whatever), and notice that your entropy never deviates from full up. Rawk.

Haveged - A True Random Number Generator

I admit that my last post sucked. I've been working on a few things that I want to blog about, but it's going to take time to get all my ducks in a row. So, that post was mostly "filler". Read as "I haven't blogged in a while, and should probably put something up".

Sorry.

However, I finally have something you can sink your teeth into: true random number generation. Consider the following scenario: you wish to generate a long-term cryptographic key. Maybe an OpenPGP key, an SSH key, an SSL key, OTR, whatever. What sucks is that these utilities rely on the Linux kernel software device /dev/random as a source of true randomness for their seed or numbers. However, /dev/random will block, or stop responding, when the system has run out of entropy. So, you sit and wait, maybe moving your mouse around and mashing your keyboard, to generate more entropy for the system.

I personally keep an encrypted file with all the passwords for all the servers, web sites, and accounts that I login to. On a regular basis, I'll rotate through the passwords. So, I typically will use my Password Card as a source for these passwords. But, suppose I wanted to use a password generator instead. The "apg" tool is a good solution.

On your system, run the following:

$ for i in {1..100}; do apg -a 0 -m 12 -x 16 -n 1; done

This will generate 100 passwords of varying length between 12 and 16 characters. Further, it will use a new seed for each of the newly generated passwords. Compare this to passing the '-n 100' switch, which will use a single seed for all 100.

You'll likely notice that the generation is slow. If you walk away from your computer, this could take 15 minutes, at least, probably much longer. If you babysit the process, you could probably get it finished in a minute or two. Regardless, it sucks.

We are draining our entropy pool faster than we can keep it filled. So, we must use external, unpredictable events, such as mouse movements, to generate entropy faster. Now, there are hardware true random number generators we could deploy, such as the Entropy Key from Simtec Electronics in the United Kingdom. However, there is a software daemon we can start that can use our existing hardware as a source of true randomness.

That software is haveged. I won't go into the nitty-gritty about haveged. However, I have run this against a slew of random number tests, and it has passed with flying colors every time. And, as you can write to the /dev/random device, haveged can keep the pool filled. So, the question is, how full can the pool be?

$ cat /proc/sys/kernel/random/poolsize
4096
$ cat /proc/sys/kernel/random/entropy_avail
109

So, by default, the entropy poolsize is 4096 bits, and I only have 109 bits available to the system. Not much. Haveged can fix this:

$ sudo aptitude install haveged
$ cat /proc/sys/kernel/random/entropy_avail
3966

Much better. Now, how quickly will my previous command of generating 100 passwords execute? With haveged running, it took about 2 seconds. And our entropy remains filled.

Try doing this with generating a 4096-bit RSA GnuPG keypair. Without something keeping the entropy pool filled, this can take hours. With haveged running, I can generate it in a minute flat on this wimpy netbook.

Now, if you're skeptical of haveged, and you should be, you can test it against dieharder. However, it will take a while to run. I walked away and did other things while it was running. When coming back, I saw that every test passed, except one, which came back as "WEAK". So, from my perspective, haveged produces very high quality random bits.

Now, what are the benefits? Do you run a lot of SSH connections to a single server? How about doing regular backups with rsync over SSH? Maybe you have a busy VPN or HTTPS server. It's not uncommon for these real world, common scenarios to produce timeouts or hangs, because the entropy pool has been exhausted and the system must wait for more before continuing. Using haveged fixes this without issue. Even better, haveged barely uses system resources. Your idle SSH connection is likely adding more to the load than haveged. And it delivers true random bits.

A Messaging Hub

I figured I would throw up a quick post about what I'm doing with my IRC client, and how it's turned into the de-facto messaging hub for just about everything in my online life. If it could handle SMTP/IMAP and RSS, it would be a done deal.

My main client is WeeChat. It connects to my ZNC bouncer. I run it both locally and remotely behind tmux (the remote client is useful for things like Windows and Mac OS X). In turn, ZNC connects to "standard" IRC servers such as Freenode, OFTC and XMission, as well as Bitlbee. Bitlbee in turn connects to my Google Talk account via XMPP and my Identica and Twitter accounts. Lastly, thanks to Google Voice, I can write an IRC bot that can both send and receive SMS notifications (code still in progress). This means I can interact with mobile phones through my IRC client.

So, in a nutshell, here are all the pieces put together:

  • WeeChat - Main IRC client which connects to my bouncer.
  • ZNC - IRC bouncer responsible for connecting to all the IRC servers.
  • Bitlbee - IRC to IM gateway responsible for XMPP, Twitter and Identica (also supports other chat protocols).
  • zmq_notify.rb - WeeChat Ruby script sending away highlights to a ZMQ socket.
  • ircsms - Python script that subscribes to the zmq_notify.rb ZMQ socket, and sends an email to mobile provider email-to-sms gateway.
  • gvbot - Google Voice IRC bot allowing direct SMS & voicemail interaction through Google Voice (code still in progress).

If you think about it, this means that I can interact with others using the following protocols:

  • IRC
  • XMPP (and everything else Bitlbee supports)
  • HTTP (Twitter/Identica)
  • SMS

Of course, a screenshot would be nice, but there are plenty of those online. Instead, why not just put it together? :)

IRC Notifications Over SMS

Recently, I pinged my boss about a networking question, and got the following response:

I am away but your message is being sent to my phone.

Well, there are a few things he could be doing here:

  1. Logged in 24/7 with a local IRC client on his phone. Easiest, but will drain his battery quickly.
  2. Using an IRC script to send away messages to email. Pull notifications could be slow.
  3. Using an IRC script to send away messages to SMS. Snappy push notifications.

I've done both #1 and #2, but have never attempted #3, so I thought I'd give it a go. That way, the alert would be the most responsive, and I could login to IRC with a local client on my phone to address the issue, if it's important enough.

Now, to be clear, I'm not using Irssi. This may come as a shock to many of you, but last April, I discovered ZNC and WeeChat, and I haven't looked back. I REALLY like this setup. So, that means no more Irssi posts for this blog. It's time for WeeChat posts. Hopefully, I can do as good of a job. Further, this post addresses a script that must be running in WeeChat, not ZNC.

First, install http://weechat.org/scripts/source/stable/zmq_notify.rb.html/ in WeeChat. If running Debian/Ubuntu, this will mean installing the "build-essential", "ruby1.8" and "ruby1.8-dev" packages, then running "gem install zmq" to get the 0mq Ruby modules installed. Then restart WeeChat (yes, this is necessary, trust me) and load the script, and you should be good to go. If not, troubleshoot.

Now, the script sends YAML through the 0mq socket. Unfortunately, the YAML is not syntactically correct, and it's delivering base64-encoded binary. Meh. We can handle that. So, we need to connect to the 0mq socket that the script sets up, parse the YAML, then send the message as an email to an email-to-sms gateway. If you have a mobile phone, then the major phone providers have likely already set this up for you: See https://en.wikipedia.org/wiki/List_of_SMS_gateways for a fairly comprehensive list.

So, what does the script look like?

UPDATE: I created a Github project: https://github.com/atoponce/ircsms

#!/usr/bin/python

import base64
import email.utils
import re
import smtplib
import yaml
import zmq
from email.mime.text import MIMEText

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.setsockopt(zmq.SUBSCRIBE, '')
socket.connect('tcp://127.0.0.1:2428')

while True:
    f = open('/var/log/0mq.log','a')
    msg = socket.recv()
    msg = re.sub('\n:', '\n', msg)
    msg = re.sub('^---| !binary \|-\n','',msg)
    y = yaml.load(msg)

    f.write(msg)
    f.close()

    # Ignore client events that aren't PUBLIC
    if not y['tags']:
        continue

    server = base64.b64decode(y['server'])
    channel = base64.b64decode(y['channel'])
    nick = base64.b64decode(y['tags'][3])
    nick = re.sub('^nick_','',nick)
    message = base64.b64decode(y['message'])

    # If sending messages to the channel while away, it shows up as
    # "prefix_nick_white". This can change it to your nick.
    if nick == 'prefix_nick_white':
        nick = 'eightyeight'

    # Change your email-to-sms address as provided by your mobile provider
    fromaddr = 'weechat@irc.example.com'
    toaddr = '1234567890@messaging.sprintpcs.com'
    msg = MIMEText("{0}/{1}: <{2}> {3}".format(server, channel, nick, message))
    msg['To'] = email.utils.formataddr(('eightyeight', toaddr))
    msg['From'] = email.utils.formataddr(('WeeChat', fromaddr))

    s = smtplib.SMTP('localhost')
    s.sendmail(fromaddr, [toaddr], msg.as_string())
    s.quit()

Place the code in your /etc/rc.local, or create a valid init script for it, and you're ready to go. Next time you set your status to away on IRC, you'll get an SMS alert every time someone highlights you in an IRC channel, or you receive a private message. If you're running WeeChat behind a terminal multiplexer, like GNU Screen or tmux, then you could also install an away script that sets your status away automatically when you detach from your session. Of course, if WeeChat disconnects from your bouncer, then this won't do much good for you.

Here is an example of an SMS alert from a private message from a bot:

weechat@irc.example.com said:
Subject: IRC Notification
21:59:34 freenode/ibot: <ibot> for heaven's sake, eightyeight, don't do that!

There are some outstanding bugs, and I'm working them out. In the meantime, if this interests you, then it should get you started with basic functionality. Happy IRC over SMS!

The One-Time Pad Hard Drive

I devised a system to use the one-time pad (OTP) using nothing more than a hard drive. It goes something like this:

  1. Meet in person with identical size hard drives.
  2. Encrypt the hard drive.
  3. Fill the drive with random keys of incrementing size.
  4. Devise an algorithm for using the keys.
  5. Unmount the drive.
  6. Enjoy the OTP for encryption/decryption.
  7. Profit.

Step number 3 requires creating files of incrementing size, 1 byte at a time, from 1 byte until the drive is filled. If your drive is 8 GB in size, then this should be approximately 131,000 files, with the smallest file at 1 byte, and the largest file at about 131,000 bytes. Further, the keys should be filled with cryptographically secure random data. Using /dev/random would be preferred; however, it will block when entropy is depleted. Using a hardware true random number generator to keep the entropy pool filled, like the Entropy Key from Simtec Electronics, would work. Last, both hard drives must have exactly the same keys. So, it would probably be best to create the random keys on one drive first, then rsync its contents to the second drive.
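
Step 3 can be sketched in a few lines of Python. This is only a sketch: the mount point and file naming scheme are my own choices, and os.urandom() stands in for whatever entropy source you prefer (the post suggests /dev/random backed by a hardware RNG).

```python
#!/usr/bin/python
# Sketch of step 3: fill a mounted drive with key files of incrementing
# size (1 byte, 2 bytes, 3 bytes, ...) until the space budget is exhausted.
import os

def fill_with_keys(mountpoint, budget):
    size, used = 1, 0
    while used + size <= budget:
        # Zero-padded names keep the files sorted by key size.
        path = os.path.join(mountpoint, '%08d.key' % size)
        with open(path, 'wb') as f:
            f.write(os.urandom(size))  # stand-in for /dev/random
        used += size
        size += 1
    return size - 1  # number of key files written

# fill_with_keys('/mnt/otp', 8 * 2**30) would create roughly 131,000 files,
# matching the 8 GB estimate above.
```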

Step number 4 is important, as you want to make sure that the recipient of your message uses the same keys you did to encrypt the message. So, an algorithm needs to be devised for using the keys. The OTP requires that the key be the same length as, or longer than, the message being encrypted or decrypted. Because the keys are of incrementing size from 1 byte on up, you should be able to choose a key of matching size for your plaintext. If not, you can combine keys of different sizes until their total length covers the full message. That combination of keys becomes the OTP. So, it would probably be best to have a computer program or script find the right keys for the job; that way, both the sender and recipient will be using the same keys. The OTP uses the XOR operation to encrypt and decrypt the messages. XOR works because it is its own inverse: applying the same key a second time completely undoes the encryption, and it's fast and clean.
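
The XOR step itself is tiny. A minimal sketch (the function name is my own; it doubles as both encrypt and decrypt precisely because XOR is its own inverse):

```python
# The OTP core: XOR the message against a same-length key. Because XOR is
# its own inverse, the one function both encrypts and decrypts.
import os

def otp_crypt(data, key):
    if len(key) < len(data):
        raise ValueError('key must be at least as long as the message')
    return bytes(d ^ k for d, k in zip(data, key))

message = b'meet at the usual place'
key = os.urandom(len(message))  # one key, used exactly once
ciphertext = otp_crypt(message, key)
assert otp_crypt(ciphertext, key) == message  # round-trips cleanly
```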

Step number 6 implies the encrypting and decrypting of messages using the OTP keys "out in the field". However, after the keys have been used, THEY MUST NEVER BE USED AGAIN. You could simply delete the file(s), or securely shred the file(s), if you have your tinfoil hat on. Regardless, their intent is to be thrown away after use. This is because if the key(s) are used more than once, then the key(s) can be derived from the multiple encrypted messages that shared the key(s). So, encrypt/decrypt the message, then remove the key(s) from the drive. After the keys have been used up on the hard drive, meet in person again to refill the drives.
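
The danger of key reuse is easy to demonstrate: XORing two ciphertexts that were encrypted with the same key cancels the key entirely, handing an attacker the XOR of the two plaintexts. A quick sketch:

```python
# Why a key must never be reused: XORing two ciphertexts that share a key
# cancels the key, leaving the XOR of the two plaintexts behind.
import os

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

key = os.urandom(14)
c1 = xor_bytes(b'attack at dawn', key)
c2 = xor_bytes(b'defend at dusk', key)

# The attacker never sees the key, yet recovers m1 XOR m2, from which
# known-plaintext guessing can peel out both messages:
assert xor_bytes(c1, c2) == xor_bytes(b'attack at dawn', b'defend at dusk')
```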

This works because Claude Shannon proved that the OTP provides perfect secrecy, meaning that there is no information contained in the ciphertext that will give you any clues as to how it was derived, such as patterns or structures in the data. This means that the ciphertext cannot be decrypted unless the key is known. This assumes that the key is truly random, the key is never used again, and the secrecy of the key is kept intact. It's clean, it works, and it's practical enough to use day-to-day. So, if you want to test out using the OTP to encrypt and decrypt messages, this is your tool.

Encrypted ZFS Filesystems On Linux

This is just a quick post about getting a fully kernel-space encrypted ZFS filesystem set up on GNU/Linux, while still keeping all the benefits of what ZFS offers. Rather than using dm-crypt and LUKS, which would bypass a lot of the features ZFS brings to the table, eCryptfs is our ticket. The catch is that Oracle has not released the source code to ZFS after pool version 28, while the code to create native ZFS encrypted filesystems arrived in pool version 30. So, we need to rely on a 3rd party utility.

First, create your ZPOOL:

# zpool create rpool raidz1 sdb sdc sdd sde sdf

Then create your ZFS filesystem:

# zfs create rpool/private

Lastly, install the eCryptfs software, then create the encrypted filesystem by mounting it and following the prompts:

# mount -t ecryptfs /rpool/private /rpool/private
Select key type to use for newly created files: 
 1) tspi
 2) passphrase
Selection: 2
Passphrase: 
Select cipher: 
 1) aes: blocksize = 16; min keysize = 16; max keysize = 32
 2) blowfish: blocksize = 8; min keysize = 16; max keysize = 56
 3) des3_ede: blocksize = 8; min keysize = 24; max keysize = 24
 4) twofish: blocksize = 16; min keysize = 16; max keysize = 32
 5) cast6: blocksize = 16; min keysize = 16; max keysize = 32
 6) cast5: blocksize = 8; min keysize = 5; max keysize = 16
Selection [aes]: 
Select key bytes: 
 1) 16
 2) 32
 3) 24
Selection [16]: 
Enable plaintext passthrough (y/n) [n]: 
Enable filename encryption (y/n) [n]: y
Filename Encryption Key (FNEK) Signature [53aad9b192678a8a]: 
Attempting to mount with the following options:
  ecryptfs_unlink_sigs
  ecryptfs_fnek_sig=53aad9b192678a8a
  ecryptfs_key_bytes=16
  ecryptfs_cipher=aes
  ecryptfs_sig=53aad9b192678a8a
Mounted eCryptfs

Notice that I enabled filename encryption, as I don't want anyone who gets hold of one of my USB drives deciphering what I'm trying to hide. This mounts the encrypted filesystem "on top" of the ZFS filesystem, allowing you to keep all the COW and error-correcting goodness, while keeping your data safe:

# mount | grep rpool
rpool on /rpool type zfs (rw,relatime,xattr)
rpool/private on /rpool/private type zfs (rw,relatime,xattr)
/rpool/private on /rpool/private type ecryptfs (rw,relatime,ecryptfs_fnek_sig...(snip))

Works like a charm.

Appropriate Use Of "kill -9"

There are times when "kill -9" is the only way to kill a PID that is behaving badly. However, it's usually not needed if you know your signals. When I encounter a badly behaving program, here is the procedure I usually take.

First, I'll send a SIGTERM (kill -15) to the PID. Sometimes this works, sometimes it doesn't. However, SIGTERM requests a clean shutdown of the program: it can flush any data to disk that needs to be written, release its resources, and exit.

If that doesn't work, then I will send a SIGHUP (kill -1). This will generally cause the program to restart. Restarting the program flushes any data to disk that needs to be written, releases its resources, and starts the program fresh.

If that doesn't work, then I will send a SIGINT (kill -2). This is the interrupt-from-keyboard signal, equivalent to pressing CTRL-C in the terminal. It's useful when the PID is not in the terminal foreground, but has been backgrounded as a daemon process.

If that doesn't work, then I will send a SIGSEGV (kill -11). This causes the program to experience a segmentation fault and exit. It won't flush any data that needs to be written to disk, but it may create a core dump file that could be useful for debugging why the program was behaving the way it was. Core dumps are handled through the kernel.

At this point, if none of the above signals have worked, only then will I issue a SIGKILL (kill -9). However, using this signal is potentially dangerous, as it is equivalent to pulling the rug out from under the program. It will not sync any data, at all, to disk. No unwritten data, no debugging data, no logging data, no nothing. It's equivalent to using a sledgehammer to sink a nail.

Further, at this point, if I haven't been able to clean up the PID with the previous signals, then a SIGKILL usually won't clean it up either. SIGTERM, SIGHUP, SIGINT and SIGSEGV usually clean up 80% of the PIDs I wish to clean. When I need to issue SIGKILL, I'm met with maybe 50% success.
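
The escalation above is easy to script. A minimal sketch in Python (the function name and the 2-second grace period are my own choices, not anything standard):

```python
#!/usr/bin/python
# Escalate through the signals discussed above, giving the process a couple
# of seconds to exit after each one, before finally resorting to SIGKILL.
import os
import signal
import time

def escalating_kill(pid, grace=2):
    for sig in (signal.SIGTERM, signal.SIGHUP, signal.SIGINT,
                signal.SIGSEGV, signal.SIGKILL):
        try:
            os.kill(pid, sig)
        except OSError:          # process is already gone
            return True
        time.sleep(grace)
        try:
            os.kill(pid, 0)      # signal 0 only checks for existence
        except OSError:
            return True          # it exited after the last signal
    return False                 # even SIGKILL didn't reap it
```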

High Capacity Color Barcode

Microsoft Tag of http://pthree.org

I've been reading up on how to actually create physical QR Codes, complete with error correction. It's been very enlightening, and has deepened my fascination with the symbology of barcodes in general. While I'm not in 100% agreement with how they are currently being used (marketing, mailing lists, coupons, etc.), I do believe there exists a market for mobile barcodes. Heck, I'm using them on this site for each and every post!

Regardless, I started reading up on the High Capacity Color Barcode (HCCB), which was developed by Gavin Jancke at Microsoft. It's important to separate HCCB codes from Microsoft Tag, which is a specific implementation of HCCB. There are some misconceptions about HCCB, and probably some things that you haven't thought about. Further, there are technical advantages that usually aren't mentioned. So, I'll discuss that here.

Here are some advantages that I think HCCB has over QR Codes, and I personally think they hold quite a bit of weight:

  • Unlike Microsoft Tag, HCCB can store a lot of data offline. QR codes can do this as well.
  • HCCB codes can be digitally signed with either Elliptic Curve Cryptography (ECC) or RSA, and the reader should verify the signature (while this is technically possible with any code, current readers in the "app store" markets don't verify them).
  • The HCCB specification was designed to take advantage of poor cellphone autofocus lenses.
  • HCCB was also designed to take advantage of varying light conditions.
  • Due to the triangular shape of the HCCB symbols, a black-and-white HCCB code can fit the same amount of data in a smaller space.
  • Due to the 4 and 8 color palette options, higher compression can be achieved with the same data in the physical HCCB code.
  • HCCB codes can store multiple payloads, such as a URL, text and a vcard in a single code.

Now, don't get me wrong. I'm not advocating HCCB codes by any stretch of the imagination. Here are a few major disadvantages:

  • The license to print HCCB codes, or develop readers to decode the codes, is 100% proprietary, and requires a Microsoft account.
  • Microsoft Tag, by far the most common HCCB implementation, requires a data connection to access the data in the code. Think of Microsoft Tags as a short URL redirector.
  • There are currently no HCCB decoding apps in any "app store" for the common mobile devices outside of the Microsoft Tag implementation.
  • HCCB code generation requires Microsoft libraries and software to create the codes. There are no 3rd party libraries, open source or otherwise.
  • Large-scale printing is an issue, due to offsetting the colors, so they align perfectly. This is also expensive, compared to black-and-white printing (Microsoft has also released a black-and-white version of the HCCB spec).
  • Due to 3rd party reliance on Microsoft software, most barcode decoding apps do not have the ability to decode HCCB codes, thus requiring multiple decoding apps to be installed on the device. This requires users to know the difference between barcodes, and which app to use for which code.

To me, a few of those disadvantages are heavy hitters. Now, Gavin has stated that HCCB was not designed to replace, or even compete with, QR Codes or other symbologies. Instead, it's trying to fill a niche market. I don't know if it's succeeding in that niche or not. However, I do believe that it won't reach the general population on a large scale, due to some of the disadvantages listed above. With that said, I am willing to give credit where credit is due, and recognize the technically superior specifications in the code. It was well researched and well executed, and I'm bothered that there does not exist an "open source" specification that meets or exceeds these specs.

I have the Microsoft Tag reader installed on my phone to handle decoding them anyway.

Network Gotcha

So, we were just recently troubleshooting a connectivity issue at the office. Some things seemed slow, other things seemed fast. It was hard to put a finger on it. So, we started pinging random stuff, trying to figure out exactly what was going on. Is it routing? Is it DNS? Is it layer 2? What gives?

Well, it turns out it was DNS. We were able to track it down, restart the DNS services, and everything was smooth sailing after that. But, the interesting thing is, the GNU ping(1) utility was telling us this all along, and we ignored it. Consider the following ping(1) to this blog, and its relevant packet capture:

% ping -c 4 pthree.org              
PING pthree.org (166.70.136.38) 56(84) bytes of data.
64 bytes from tao.ae7.st (166.70.136.38): icmp_req=1 ttl=62 time=0.341 ms
64 bytes from tao.ae7.st (166.70.136.38): icmp_req=2 ttl=62 time=0.349 ms
64 bytes from tao.ae7.st (166.70.136.38): icmp_req=3 ttl=62 time=0.318 ms
64 bytes from tao.ae7.st (166.70.136.38): icmp_req=4 ttl=62 time=0.334 ms

--- pthree.org ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 0.318/0.335/0.349/0.021 ms

While in another terminal running tcpdump(1), we see the following output:

# tcpdump -ni eth0 port 53     
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
11:17:50.452323 IP 199.104.120.74.56928 > 198.60.22.2.53: 48462+ A? pthree.org. (28)
11:17:50.452925 IP 198.60.22.2.53 > 199.104.120.74.56928: 48462 1/5/21 A 166.70.136.38 (496)
11:17:50.453579 IP 199.104.120.74.43665 > 198.60.22.2.53: 46650+ PTR? 38.136.70.166.in-addr.arpa. (44)
11:17:50.454236 IP 198.60.22.2.53 > 199.104.120.74.43665: 46650 1/3/3 PTR tao.ae7.st. (181)
11:17:51.454981 IP 199.104.120.74.40485 > 198.60.22.2.53: 26170+ PTR? 38.136.70.166.in-addr.arpa. (44)
11:17:51.455430 IP 198.60.22.2.53 > 199.104.120.74.40485: 26170 1/3/3 PTR tao.ae7.st. (181)
11:17:52.455999 IP 199.104.120.74.51659 > 198.60.22.2.53: 31631+ PTR? 38.136.70.166.in-addr.arpa. (44)
11:17:52.456500 IP 198.60.22.2.53 > 199.104.120.74.51659: 31631 1/3/3 PTR tao.ae7.st. (181)
11:17:53.457183 IP 199.104.120.74.55632 > 198.60.22.2.53: 24544+ PTR? 38.136.70.166.in-addr.arpa. (44)
11:17:53.457632 IP 198.60.22.2.53 > 199.104.120.74.55632: 24544 1/3/3 PTR tao.ae7.st. (181)
^C
10 packets captured
11 packets received by filter
0 packets dropped by kernel

This output tells me that ping(1) does an initial forward lookup, then caches the resulting IP. Then, FOR EACH ICMP PACKET SENT, it does a reverse lookup on the IP. This is actually shown in the output, if you pay attention. I wasn't paying attention. Of course, according to the ping(1) manpage, you can pass the "-n" switch to prevent ping(1) from doing reverse lookups.
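
The two lookups are easy to reproduce with Python's standard socket module (I'm using localhost here as a stand-in for a remote host, so the snippet works without network access):

```python
# The lookups ping(1) performs, sketched with the socket module:
# one forward lookup up front, then a reverse lookup for every reply.
import socket

ip = socket.gethostbyname('localhost')    # forward lookup, done once
print(ip)                                 # 127.0.0.1

for _ in range(4):                        # one reverse lookup per "reply"
    name = socket.gethostbyaddr(ip)[0]    # the name ping prints without -n
```

With "-n", ping skips the per-reply gethostbyaddr() step entirely, which is why a slow or broken resolver shows up as a one-second stall between replies but never in the rtt numbers.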

In any event, I'm not sure if you were aware of this or not, but I usually get so caught up in latencies and paths that I'm not always paying attention to DNS. I guess ping(1) can be a great diagnostic tool for DNS as well. Learn something new every day.
