ZFS Administration, Part VIII- Zpool Best Practices and Caveats

Table of Contents

Zpool Administration

  • 0. Install ZFS on Debian GNU/Linux
  • 1. VDEVs
  • 2. RAIDZ
  • 3. The ZFS Intent Log (ZIL)
  • 4. The Adjustable Replacement Cache (ARC)
  • 5. Exporting and Importing Storage Pools
  • 6. Scrub and Resilver
  • 7. Getting and Setting Properties
  • 8. Best Practices and Caveats

ZFS Administration

  • 9. Copy-on-write
  • 10. Creating Filesystems
  • 11. Compression and Deduplication
  • 12. Snapshots and Clones
  • 13. Sending and Receiving Filesystems
  • 14. ZVOLs
  • 15. iSCSI, NFS and Samba
  • 16. Getting and Setting Properties
  • 17. Best Practices and Caveats

Appendices

  • A. Visualizing The ZFS Intent Log (ZIL)
  • B. Using USB Drives
  • C. Why You Should Use ECC RAM
  • D. The True Cost Of Deduplication

We now reach the end of ZFS storage pool administration, as this is the last post in that subtopic. After this, we move on to a few theoretical topics about ZFS that will lay the groundwork for ZFS datasets. Our previous post covered the properties of a zpool. Without further ado, let's jump right in. First, we'll discuss the best practices for a ZFS storage pool, then we'll discuss some of the caveats I think are important to know before building your pool.

Best Practices

As with all recommendations, some of these guidelines carry more weight than others. You may not be able to follow them as rigidly as you would like. Regardless, you should be aware of them, and I'll try to provide a reason for each. They're listed in no specific order. The idea of "best practices" is to optimize space efficiency and performance, and to ensure maximum data integrity.

  • Only run ZFS on 64-bit kernels. It has 64-bit specific code that 32-bit kernels cannot do anything with.
  • Install ZFS only on a system with lots of RAM. 1 GB is a bare minimum, 2 GB is better, 4 GB would be preferred to start. Remember, ZFS will use 1/2 of the available RAM for the ARC.
  • Use ECC RAM when possible for scrubbing data in registers and maintaining data consistency. The ARC is an actual read-only data cache of valuable data in RAM.
  • Use whole disks rather than partitions. ZFS can make better use of the on-disk cache as a result. If you must use partitions, back up the partition table, and take care when writing data to the other partitions, so you don't corrupt the data in your pool.
  • Keep each VDEV in a storage pool the same size. If VDEVs vary in size, ZFS will favor the larger VDEV, which could lead to performance bottlenecks.
  • Use redundancy when possible, as ZFS can and will want to correct data errors that exist in the pool. You cannot fix these errors if you do not have a redundant good copy elsewhere in the pool. Mirrors and RAID-Z levels accomplish this.
  • For the number of disks in the storage pool, use the "power of two plus parity" recommendation. This is for storage space efficiency and hitting the "sweet spot" in performance. So, for a RAIDZ-1 VDEV, use three (2+1), five (4+1), or nine (8+1) disks. For a RAIDZ-2 VDEV, use four (2+2), six (4+2), ten (8+2), or eighteen (16+2) disks. For a RAIDZ-3 VDEV, use five (2+3), seven (4+3), eleven (8+3), or nineteen (16+3) disks. For pools larger than this, consider striping across mirrored VDEVs. UPDATE: The "power of two plus parity" is mostly a myth. See http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/ for more info.
  • Consider using RAIDZ-2 or RAIDZ-3 over RAIDZ-1. You've heard the phrase "when it rains, it pours". This is true for disk failures. If a disk fails in a RAIDZ-1, and the hot spare is getting resilvered, until the data is fully copied, you cannot afford another disk failure during the resilver, or you will suffer data loss. With RAIDZ-2, you can suffer two disk failures, instead of one, increasing the probability you have fully resilvered the necessary data before the second, and even third disk fails.
  • Perform regular (at least weekly) backups of the full storage pool. It's not a backup unless you have multiple copies. Just because you have redundant disks does not protect your live running data in the event of a power failure, hardware failure or disconnected cables.
  • Use hot spares to quickly recover from a damaged device. Set the "autoreplace" property to on for the pool.
  • Consider using a hybrid storage pool with fast SSDs or NVRAM drives. Using a fast SLOG and L2ARC can greatly improve performance.
  • If using a hybrid storage pool with multiple devices, mirror the SLOG and stripe the L2ARC.
  • If using a hybrid storage pool, and partitioning the fast SSD or NVRAM drive, unless you know you will need it, 1 GB is likely sufficient for your SLOG. Use the rest of the SSD or NVRAM drive for the L2ARC. The more storage for the L2ARC, the better.
  • Keep pool capacity under 80% for best performance. Due to the copy-on-write nature of ZFS, the filesystem gets heavily fragmented. Email reports of capacity at least monthly.
  • If possible, scrub consumer-grade SATA and SCSI disks weekly, and enterprise-grade SAS and FC disks monthly. Depending on a lot of factors, this might not be possible, so your mileage may vary. In short, scrub as frequently as you can.
  • Email reports of the storage pool health weekly for redundant arrays, and bi-weekly for non-redundant arrays.
  • When using advanced format disks that read and write data in 4 KB sectors, set the "ashift" value to 12 on pool creation for maximum performance. Default is 9 for 512-byte sectors.
  • Set "autoexpand" to on, so you can expand the storage pool automatically after all disks in the pool have been replaced with larger ones. Default is off.
  • Always export your storage pool when moving the disks from one physical system to another.
  • When considering performance, know that for sequential writes, mirrors will always outperform RAID-Z levels. For sequential reads, RAID-Z levels will perform more slowly than mirrors on smaller data blocks and faster on larger data blocks. For random reads and writes, mirrors and RAID-Z seem to perform in similar manners. Striped mirrors will outperform mirrors and RAID-Z in both sequential, and random reads and writes.
  • Compression is disabled by default. This doesn't make much sense with today's hardware. ZFS compression is extremely cheap, extremely fast, and barely adds any latency to reads and writes. In fact, in some scenarios, your disks will respond faster with compression enabled than disabled. A further benefit is the massive space savings.
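Several of these recommendations can be applied in one shot at pool creation time. The following is a hedged sketch only: the pool name "tank", the disk names and the striped-mirror layout are hypothetical, while "ashift", "autoreplace" and "autoexpand" are the standard zpool properties discussed above:

```
# zpool create -o ashift=12 -o autoreplace=on -o autoexpand=on tank mirror sda sdb mirror sdc sdd spare sde
# zfs set compression=on tank
```

This covers the 4 KB-sector alignment, hot spare, autoreplace, autoexpand and compression recommendations at once; the weekly scrub still needs its own cron entry.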

Caveats

The point of the caveat list is by no means to discourage you from using ZFS. Instead, as a storage administrator planning out your ZFS storage server, these are things that you should be aware of, so as not to be caught with your pants down, and without your data. If you don't heed these warnings, you could end up with corrupted data. The line may be blurred with the "best practices" list above; I've tried to make this list about what can lead to data corruption if ignored. Read and heed the caveats, and you should be good.

  • Your VDEVs determine the IOPS of the storage, and the slowest disk in that VDEV will determine the IOPS for the entire VDEV.
  • ZFS uses 1/64 of the available raw storage for metadata. So, if you purchased a 1 TB drive, the actual raw size is roughly 931 GiB. After ZFS reserves its metadata, you will have roughly 917 GiB of available space. The "zfs list" command will show an accurate representation of your available storage. Plan your storage keeping this in mind.
  • ZFS wants to control the whole block stack. It checksums, resilvers live data instead of full disks, self-heals corrupted blocks, and a number of other unique features. If using a RAID card, make sure to configure it as a true JBOD (or "passthrough mode"), so ZFS can control the disks. If you can't do this with your RAID card, don't use it. Best to use a real HBA.
  • Do not use other volume management software beneath ZFS. ZFS will perform better, and ensure greater data integrity, if it has control of the whole block device stack. As such, avoid using dm-crypt, mdadm or LVM beneath ZFS.
  • Do not share a SLOG or L2ARC device across pools. Each pool should have its own physical device, not a logical drive, as is the case with some PCI-Express SSD cards. Use the full card for one pool, and a different physical card for another pool. If you share a physical device, you will create race conditions, and could end up with corrupted data.
  • Do not share a single storage pool across different servers. ZFS is not a clustered filesystem. Use GlusterFS, Ceph, Lustre or some other clustered filesystem on top of the pool if you wish to have a shared storage backend.
  • Other than a spare, SLOG and L2ARC in your hybrid pool, do not mix VDEVs in a single pool. If one VDEV is a mirror, all VDEVs should be mirrors. If one VDEV is a RAIDZ-1, all VDEVs should be RAIDZ-1. Unless of course, you know what you are doing, and are willing to accept the consequences. ZFS attempts to balance the data across VDEVs. Having a VDEV of a different redundancy can lead to performance issues and space efficiency concerns, and make it very difficult to recover in the event of a failure.
  • Do not mix disk sizes or speeds in a single VDEV. Do mix fabrication dates, however, to prevent mass drive failure.
  • In fact, do not mix disk sizes or speeds in your storage pool at all.
  • Do not mix disk counts across VDEVs. If one VDEV uses 4 drives, all VDEVs should use 4 drives.
  • Do not put all the drives from a single controller in one VDEV. Plan your storage, such that if a controller fails, it affects only the number of disks necessary to keep the data online.
  • When using advanced format disks, you must set the ashift value to 12 at pool creation. It cannot be changed after the fact. Use "zpool create -o ashift=12 tank mirror sda sdb" as an example.
  • Hot spare disks will not be added to the VDEV to replace a failed drive by default. You MUST enable this feature. Set the autoreplace feature to on. Use "zpool set autoreplace=on tank" as an example.
  • The storage pool will not auto resize itself when all smaller drives in the pool have been replaced by larger ones. You MUST enable this feature, and you MUST enable it before replacing the first disk. Use "zpool set autoexpand=on tank" as an example.
  • ZFS does not restripe data in a VDEV nor across multiple VDEVs. Typically, when adding a new device to a RAID array, the RAID controller will rebuild the data, by creating a new stripe width. This will free up some space on the drives in the pool, as it copies data to the new disk. ZFS has no such mechanism. Eventually, over time, the disks will balance out due to the writes, but even a scrub will not rebuild the stripe width.
  • You cannot shrink a zpool, only grow it. This means you cannot remove VDEVs from a storage pool.
  • You can only remove drives from a mirrored VDEV, using the "zpool detach" command. You can, however, replace drives in both RAIDZ and mirror VDEVs, using the "zpool replace" command.
  • Do not create a storage pool of files or ZVOLs from an existing zpool. Race conditions will be present, and you will end up with corrupted data. Always keep multiple pools separate.
  • The Linux kernel may not assign a drive the same device name at every boot. Thus, you should use the /dev/disk/by-id/ convention for your SLOG and L2ARC. If you don't, devices can be shuffled on boot, and a device holding pool data could end up as a SLOG device, which would in turn clobber your ZFS data.
  • Don't create massive storage pools "just because you can". Even though ZFS can create 78-bit storage pool sizes, that doesn't mean you need to create one.
  • Don't put production data directly into the root of the zpool. Use ZFS datasets instead.
  • Don't commit production data to file VDEVs. Only use file VDEVs for testing scripts or learning the ins and outs of ZFS.
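To put some numbers on the 1/64 metadata caveat, and the 80% capacity recommendation from the best practices, here is a small Python sketch. The figures are illustrative only: a marketing "1 TB" drive is 10^12 bytes, while the operating system reports sizes in powers of two:

```python
# Rough capacity planning for a single "1 TB" drive under ZFS.
# Drive vendors use powers of ten; the OS reports powers of two.
raw_bytes = 10**12                       # "1 TB" as sold
raw_gib = raw_bytes / float(2**30)       # ~931 GiB as reported
metadata_gib = raw_gib / 64              # ZFS reserves 1/64 for metadata
usable_gib = raw_gib - metadata_gib      # ~917 GiB
target_gib = usable_gib * 0.80           # ~733 GiB: stay under 80% full

print("raw: %.0f GiB, usable: %.0f GiB, 80%% target: %.0f GiB"
      % (raw_gib, usable_gib, target_gib))
```

So a pool of "1 TB" drives should really be planned around roughly 733 GiB of comfortable working space per data disk.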

If there is anything I missed, or something needs to be corrected, feel free to add it in the comments below.

ZFS Administration, Part VI- Scrub and Resilver


Standard Validation

In GNU/Linux, we have a number of filesystem checking utilities for verifying data integrity on the disk. This is done through the "fsck" utility. However, it has a couple of major drawbacks. First, you must fsck the disk offline if you intend to fix data errors. This means downtime. So, you must use the "umount" command to unmount your disks before the fsck. For root partitions, this further means booting from another medium, like a CDROM or USB stick. Depending on the size of the disks, this downtime could take hours. Second, the filesystem, such as ext3 or ext4, knows nothing of the underlying data structures, such as LVM or RAID. You may have a bad block on one disk, but a good copy of that block on another disk. Unfortunately, Linux software RAID has no idea which copy is good and which is bad: from the perspective of ext3 or ext4, a read will return good data if it happens to come from the disk with the good block, and corrupted data if it comes from the disk with the bad block, with no control over which disk the data is pulled from, and no way to fix the corruption. These errors are known as "silent data errors", and there is really nothing you can do about them with the standard GNU/Linux filesystem stack.

ZFS Scrubbing

With ZFS on Linux, detecting and correcting silent data errors is done through scrubbing the disks. This is similar in technique to ECC RAM, where if an error resides in the ECC DIMM, you can find another register that contains the good data, and use it to fix the bad register. This is an old technique that has been around for a while, so it's surprising that it's not available in the standard suite of journaled filesystems. Further, just like you can scrub ECC RAM on a live running system, without downtime, you should be able to scrub your disks without downtime as well. With ZFS, you can.

While ZFS is performing a scrub on your pool, it is checking every block in the storage pool against its known checksum. Every block from top-to-bottom is checksummed using an appropriate algorithm. By default, this is the "fletcher4" algorithm, which produces a 256-bit checksum, and it's fast. This can be changed to the SHA-256 algorithm, although it may not be recommended, as calculating the SHA-256 checksum is more costly than fletcher4. However, with SHA-256, you have a 1 in 2^256, or about 1 in 10^77, chance that a corrupted block hashes to the same checksum. This is a 0.00000000000000000000000000000000000000000000000000000000000000000000000000001% chance. For reference, uncorrected ECC memory errors happen about 50 orders of magnitude more frequently, even with the most reliable hardware on the market. So, when scrubbing your data, either the checksum will match, and you have a good data block, or it won't match, and you have a corrupted data block.
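The checksum-collision arithmetic above is easy to sanity check with a couple of lines of Python (plain arithmetic, nothing ZFS-specific):

```python
import math

# A corrupted block only goes undetected if it hashes to the same
# 256-bit checksum, i.e. a 1 in 2^256 chance.
exponent = 256 * math.log10(2)               # ~77.06

print("2^256 is about 10^%.2f" % exponent)   # hence "1 in 10^77"
```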

Scrubbing ZFS storage pools is not something that happens automatically. You need to do it manually, and it's highly recommended that you do it on a regularly scheduled interval. The recommended frequency at which you should scrub the data depends on the quality of the underlying disks. If you have SAS or FC disks, then once per month should be sufficient. If you have consumer grade SATA or SCSI, you should do once per week. You can schedule a scrub easily with the following command:

# zpool scrub tank
# zpool status tank
  pool: tank
 state: ONLINE
 scan: scrub in progress since Sat Dec  8 08:06:36 2012
    32.0M scanned out of 48.5M at 16.0M/s, 0h0m to go
    0 repaired, 65.99% done
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            sdi     ONLINE       0     0     0
            sdj     ONLINE       0     0     0

errors: No known data errors

As you can see, you can get a status of the scrub while it is in progress. Doing a scrub can severely impact performance of the disks and the applications needing them. So, if for any reason you need to stop the scrub, you can pass the "-s" switch to the scrub subcommand. However, you should let the scrub continue to completion.

# zpool scrub -s tank

You should put something similar to the following in your root's crontab, which will execute a scrub every Sunday at 02:00 in the morning:

0 2 * * 0 /sbin/zpool scrub tank

Self Healing Data

If your storage pool is using some sort of redundancy, then ZFS will not only detect silent data errors on a scrub, but it will also correct them if good data exists on a different disk. This is known as "self healing", and is demonstrated in the image below. In my RAIDZ post, I discussed how data is self-healed with RAIDZ, using the parity and a reconstruction algorithm. I'm going to simplify it a bit, and use just a two-way mirror. Suppose that an application needs some data blocks, and one of those blocks is corrupted. How does ZFS know the data is corrupted? By checking the checksum of the block, as already mentioned. If the checksum does not match on a block, ZFS will look at the other disk in the mirror to see if a good block can be found. If so, the good block is passed to the application, then ZFS fixes the bad block in the mirror, so that it also passes its checksum. As a result, the application will always get good data, and your pool will always be in a good, clean, consistent state.
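You can watch self healing happen without risking real disks by using file VDEVs which, per the caveats in this series, are for testing only. A sketch, with hypothetical paths and sizes:

```
# truncate -s 128M /tmp/vdev1 /tmp/vdev2
# zpool create test mirror /tmp/vdev1 /tmp/vdev2
# cp -r /etc /test
# dd if=/dev/urandom of=/tmp/vdev1 bs=1M seek=10 count=64 conv=notrunc
# zpool scrub test
# zpool status test
```

The dd deliberately skips the first 10 MB, so the ZFS labels at the front of the file survive. After the scrub, "zpool status test" should report repaired data and non-zero CKSUM counts on /tmp/vdev1, while reads from /test continue to return good data.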

Image showing the three steps ZFS would take to deliver good data blocks to the application, by self-healing the data.
Image courtesy of root.cz, showing how ZFS self heals data.

Resilvering Data

Resilvering data is the same concept as rebuilding or resyncing data onto a new disk in the array. However, with Linux software RAID, hardware RAID controllers, and other RAID implementations, there is no distinction between which blocks are actually live and which aren't. So, the rebuild starts at the beginning of the disk, and does not stop until it reaches the end of the disk. Because ZFS knows about both the RAID structure and the filesystem metadata, it can be smart about rebuilding the data. Rather than wasting time on free space, where no live blocks are stored, it concerns itself with ONLY the live blocks. This can provide significant time savings if your storage pool is only partially filled. If the pool is only 10% filled, then only 10% of the data needs to be rebuilt. Win. Thus, with ZFS we need a new term beyond "rebuilding", "resyncing" or "reconstructing". In this case, we refer to the process of rebuilding data as "resilvering".

Unfortunately, disks die, and need to be replaced. Provided you have redundancy in your storage pool, and can afford some failures, you can still send data to and receive data from applications, even though the pool will be in "DEGRADED" mode. If you have the luxury of hot swapping disks while the system is live, you can replace the disk without downtime (lucky you). If not, you will still need to identify the dead disk and replace it. This can be a chore if you have many disks in your pool, say 24. However, most GNU/Linux distributions, such as Debian or Ubuntu, provide a utility called "hdparm" that allows you to discover the serial numbers of all the disks in your pool. This assumes, of course, that the disk controllers are presenting that information to the Linux kernel, which they typically do. So, you could run something like:

# for i in a b c d e f g; do echo -n "/dev/sd$i: "; hdparm -I /dev/sd$i | awk '/Serial Number/ {print $3}'; done
/dev/sda: OCZ-9724MG8BII8G3255
/dev/sdb: OCZ-69ZO5475MT43KNTU
/dev/sdc: WD-WCAPD3307153
/dev/sdd: JP2940HD0K9RJC
/dev/sde: /dev/sde: No such file or directory
/dev/sdf: JP2940HD0SB8RC
/dev/sdg: S1D1C3WR

It appears that /dev/sde is my dead disk. I have the serial numbers for all the other disks in the system, but not this one. So, by process of elimination, I can go to the storage array, and find which serial number was not printed. This is my dead disk. In this case, I find serial number "JP2940HD01VLMC". I pull the disk, replace it with a new one, and see if /dev/sde is repopulated, and the others are still online. If so, I've found my disk, and can add it to the pool. This has actually happened to me twice already, on both of my personal hypervisors. It was a snap to replace, and I was online in under 10 minutes.

To replace a dead disk in the pool with a new one, you use the "replace" subcommand. Supposing the new disk also identified itself as /dev/sde, I would issue the following command:

# zpool replace tank sde sde
# zpool status tank
  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h2m, 16.43% done, 0h13m to go
config:

        NAME          STATE       READ WRITE CKSUM
        tank          DEGRADED       0     0     0
          mirror-0    DEGRADED       0     0     0
            replacing DEGRADED       0     0     0
              sde     ONLINE         0     0     0
            sdf       ONLINE         0     0     0
          mirror-1    ONLINE         0     0     0
            sdg       ONLINE         0     0     0
            sdh       ONLINE         0     0     0
          mirror-2    ONLINE         0     0     0
            sdi       ONLINE         0     0     0
            sdj       ONLINE         0     0     0

The resilver is analogous to a rebuild with Linux software RAID. It rebuilds the data blocks on the new disk until the mirror, in this case, is in a completely healthy state. Viewing the status of the resilver will give you an idea of when it will complete.

Identifying Pool Problems

You can quickly determine whether everything is functioning as it should, without the full output of the "zpool status" command, by passing the "-x" switch. This is useful for scripts to parse without fancy logic, which could alert you in the event of a failure:

# zpool status -x
all pools are healthy
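Because "-x" prints a single predictable line when everything is fine, a cron entry can alert on anything else. A sketch for root's crontab; the mail(1) command and the recipient address are assumptions for your environment:

```
0 8 * * * /sbin/zpool status -x | grep -qv "all pools are healthy" && /sbin/zpool status | mail -s "ZFS pool alert" root
```

When all pools are healthy, the inverted grep matches nothing and exits non-zero, so no mail is sent; any other output triggers a full "zpool status" report.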

The rows in the "zpool status" command give you vital information about the pool, most of which are self-explanatory. They are defined as follows:

  • pool- The name of the pool.
  • state- The current health of the pool. This information refers only to the ability of the pool to provide the necessary replication level.
  • status- A description of what is wrong with the pool. This field is omitted if no problems are found.
  • action- A recommended action for repairing the errors. This field is an abbreviated form directing the user to one of the following sections. This field is omitted if no problems are found.
  • see- A reference to a knowledge article containing detailed repair information. Online articles are updated more often than this guide can be updated, and should always be referenced for the most up-to-date repair procedures. This field is omitted if no problems are found.
  • scrub- Identifies the current status of a scrub operation, which might include the date and time that the last scrub was completed, a scrub in progress, or if no scrubbing was requested.
  • errors- Identifies known data errors or the absence of known data errors.
  • config- Describes the configuration layout of the devices comprising the pool, as well as their state and any errors generated from the devices. The state can be one of the following: ONLINE, FAULTED, DEGRADED, UNAVAILABLE, or OFFLINE. If the state is anything but ONLINE, the fault tolerance of the pool has been compromised.

The columns in the status output, "READ", "WRITE" and "CKSUM", are defined as follows:

  • NAME- The name of each VDEV in the pool, presented in a nested order.
  • STATE- The state of each VDEV in the pool. The state can be any of the states found in "config" above.
  • READ- I/O errors occurred while issuing a read request.
  • WRITE- I/O errors occurred while issuing a write request.
  • CKSUM- Checksum errors. The device returned corrupted data as the result of a read request.

Conclusion

Scrubbing your data on regular intervals will ensure that the blocks in the storage pool remain consistent. Even though the scrub can put strain on applications wishing to read or write data, it can save hours of headache in the future. Further, because you could have a "damaged device" at any time (see http://docs.oracle.com/cd/E19082-01/817-2271/gbbvf/index.html about damaged devices with ZFS), properly knowing how to fix the device, and what to expect when replacing one, is critical to storage administration. Of course, there is plenty more I could discuss about this topic, but this should at least introduce you to the concepts of scrubbing and resilvering data.

A Messaging Hub

I figured I would throw up a quick post about what I'm doing with my IRC client, and how it's turned into the de-facto messaging hub for just about everything in my online life. If it could handle SMTP/IMAP and RSS, it would be a done deal.

My main client is WeeChat. It connects to my ZNC bouncer. I run it both locally and remotely behind tmux (the remote client is useful for things like Windows and Mac OS X). In turn, ZNC connects to "standard" IRC servers such as Freenode, OFTC and XMission, as well as Bitlbee. Bitlbee in turn connects to my Google Talk account via XMPP, and to my Identica and Twitter accounts. Lastly, thanks to Google Voice, I can write an IRC bot that can both send and receive SMS notifications (code still in progress). This means I can interact with mobile phones through my IRC client.

So, in a nutshell, here are all the pieces put together:

  • WeeChat - Main IRC client which connects to my bouncer.
  • ZNC - IRC bouncer responsible for connecting to all the IRC servers.
  • Bitlbee - IRC to IM gateway responsible for XMPP, Twitter and Identica (also supports other chat protocols).
  • zmq_notify.rb - WeeChat Ruby script sending away highlights to a ZMQ socket.
  • ircsms - Python script that subscribes to the zmq_notify.rb ZMQ socket, and sends an email to mobile provider email-to-sms gateway.
  • gvbot - Google Voice IRC bot allowing direct SMS & voicemail interaction through Google Voice (code still in progress).

If you think about it, this means that I can interact with others using the following protocols:

  • IRC
  • XMPP (and everything else Bitlbee supports)
  • HTTP (Twitter/Identica)
  • SMS

Of course, a screenshot would be nice, but there are plenty of those online. Instead, why not just put it together? 🙂

IRC Notifications Over SMS

Recently, I pinged my boss about a networking question, and got the following response:

I am away but your message is being sent to my phone.

Well, there are a few things he could be doing here:

  1. Logged in 24/7 with a local IRC client on his phone. Easiest, but will drain his battery quickly.
  2. Using an IRC script to send away messages to email. Pull notifications could be slow.
  3. Using an IRC script to send away messages to SMS. Snappy push notifications.

I've done both #1 and #2, but have never attempted #3, so I thought I'd give it a go. That way, the alert would be the most responsive, and I could login to IRC with a local client on my phone to address the issue, if it's important enough.

Now, to be clear, I'm not using Irssi. This may come as a shock to many of you, but last April, I discovered ZNC and WeeChat, and I haven't looked back. I REALLY like this setup. So, that means no more Irssi posts for this blog. It's time for WeeChat posts. Hopefully, I can do as good of a job. Further, this post addresses a script that must be running in WeeChat, not ZNC.

First, install http://weechat.org/scripts/source/stable/zmq_notify.rb.html/ in WeeChat. If running Debian/Ubuntu, this means installing the "build-essential", "ruby1.8" and "ruby1.8-dev" packages, then running "gem install zmq" to get the 0mq Ruby modules installed. Then restart WeeChat (yes, this is necessary- trust me), load the script, and you should be good to go. If not, troubleshoot.

Now, the script sends YAML through the 0mq socket. Unfortunately, the YAML is not syntactically correct, and it's delivering base64-encoded binary. Meh. We can handle that. So, we need to connect to the 0mq socket that the script sets up, parse the YAML, then send the message as an email to an email-to-sms gateway. If you have a mobile phone, then the major phone providers have likely already set this up for you: See https://en.wikipedia.org/wiki/List_of_SMS_gateways for a fairly comprehensive list.

So, what does the script look like?

UPDATE: I created a Github project: https://github.com/atoponce/ircsms

#!/usr/bin/python

import base64
import email.utils
import re
import smtplib
import yaml
import zmq
from email.mime.text import MIMEText

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.setsockopt(zmq.SUBSCRIBE, '')
socket.connect('tcp://127.0.0.1:2428')

while True:
    f = open('/var/log/0mq.log','a')
    msg = socket.recv()
    msg = re.sub('\n:', '\n', msg)
    msg = re.sub('^---| !binary \|-\n','',msg)
    y = yaml.load(msg)

    f.write(msg)
    f.close()

    # Ignore client events that aren't PUBLIC
    if not y['tags']:
        continue

    server = base64.b64decode(y['server'])
    channel = base64.b64decode(y['channel'])
    nick = base64.b64decode(y['tags'][3])
    nick = re.sub('^nick_','',nick)
    message = base64.b64decode(y['message'])

    # If sending messages to the channel while away, it shows up as
    # "prefix_nick_white". This can change it to your nick.
    if nick == 'prefix_nick_white':
        nick = 'eightyeight'

    # Change your email-to-sms address as provided by your mobile provider
    fromaddr = 'weechat@irc.example.com'
    toaddr = '1234567890@messaging.sprintpcs.com'
    msg = MIMEText("{0}/{1}: <{2}> {3}".format(server, channel, nick, message))
    msg['To'] = email.utils.formataddr(('eightyeight', toaddr))
    msg['From'] = email.utils.formataddr(('WeeChat', fromaddr))

    s = smtplib.SMTP('localhost')
    s.sendmail(fromaddr, [toaddr], msg.as_string())
    s.quit()

Place the code in your /etc/rc.local, or create a valid init script for it, and you're ready to go. Next time you set your status to away on IRC, you'll get an SMS alert every time someone highlights you in an IRC channel, or you receive a private message. If you're running WeeChat behind a terminal multiplexer, like GNU Screen or tmux, then you could also install an away script that sets your status to away automatically when you detach from your session. Of course, if WeeChat disconnects from your bouncer, then this won't do much good for you.

Here is an example of an SMS alert from a private message from a bot:

weechat@irc.example.com said:
Subject: IRC Notification
21:59:34 freenode/ibot: <ibot> for heaven's sake, eightyeight, don't do that!

There are some outstanding bugs, and I'm working them out. In the meantime, if this interests you, then it should get you started with basic functionality. Happy IRC over SMS!

Encrypted ZFS Filesystems On Linux

This is just a quick post about getting a fully kernel-space encrypted ZFS filesystem set up with GNU/Linux, while still keeping all the benefits of what ZFS offers. Rather than using dm-crypt and LUKS, which would bypass a lot of the features ZFS brings to the table, eCryptfs is our ticket. The reason we need a 3rd party utility is that Oracle has not released the source code to ZFS after pool version 28, and the code to create native ZFS encrypted filesystems arrived in pool version 30.

First, create your ZPOOL:

# zpool create rpool raidz1 sdb sdc sdd sde sdf

Then create your ZFS filesystem:

# zfs create rpool/private

Lastly, install the ecryptfs software, and make the encrypted filesystem by mounting it, and follow the prompts:

# mount -t ecryptfs /rpool/private /rpool/private
Select key type to use for newly created files: 
 1) tspi
 2) passphrase
Selection: 2
Passphrase: 
Select cipher: 
 1) aes: blocksize = 16; min keysize = 16; max keysize = 32
 2) blowfish: blocksize = 8; min keysize = 16; max keysize = 56
 3) des3_ede: blocksize = 8; min keysize = 24; max keysize = 24
 4) twofish: blocksize = 16; min keysize = 16; max keysize = 32
 5) cast6: blocksize = 16; min keysize = 16; max keysize = 32
 6) cast5: blocksize = 8; min keysize = 5; max keysize = 16
Selection [aes]: 
Select key bytes: 
 1) 16
 2) 32
 3) 24
Selection [16]: 
Enable plaintext passthrough (y/n) [n]: 
Enable filename encryption (y/n) [n]: y
Filename Encryption Key (FNEK) Signature [53aad9b192678a8a]: 
Attempting to mount with the following options:
  ecryptfs_unlink_sigs
  ecryptfs_fnek_sig=53aad9b192678a8a
  ecryptfs_key_bytes=16
  ecryptfs_cipher=aes
  ecryptfs_sig=53aad9b192678a8a
Mounted eCryptfs

Notice that I enabled filename encryption, as I don't want anyone getting any of my USB drives to decipher what I'm trying to hide. This will mount the encrypted filesystem "on top" of the ZFS filesystem, allowing you to keep all the COW and error correcting goodness, while keeping your data 100% safe:

# mount | grep rpool
rpool on /rpool type zfs (rw,relatime,xattr)
rpool/private on /rpool/private type zfs (rw,relatime,xattr)
/rpool/private on /rpool/private type ecryptfs (rw,relatime,ecryptfs_fnek_sig...(snip))

Works like a charm.

Ramadan, Take Two

Two years ago, I participated in the Islamic holy month of Ramadan. I blogged about my experiences, and you can read them here: Looking forward to Ramadan, Ramadan - Week One, Ramadan - Week Two, Ramadan - Week Three, An Open Letter to Pastor Terry Jones, and Ramadan - Week Four. Well, I intend to participate again this year, and I intend to blog about my experiences in a similar format as I did two years ago- once per week, summarizing how the week went.

Two years ago, I had three reasons for participating:

  1. Raise awareness about the Islam faith and promote religious tolerance.
  2. Grow closer to my God.
  3. Turn my personal weaknesses into strengths.

This year will be no different for me. The same three reasons above will apply. However, instead of reading the Holy Quran, as Muslims typically do, I will be reading one of my holy books, The Book of Mormon, from cover to cover. As I mentioned two years ago, I am a Christian belonging to The Church of Jesus Christ of Latter-day Saints. We regard The Book of Mormon, along with the Holy Bible, as Holy Scripture. The Book of Mormon is a bit lengthier than the Quran, with approximately 270,000 words, whereas the Quran has approximately 80,000 words. I felt rushed reading the Quran in one month, so I can only imagine how I'm going to feel reading 3x the amount of literature in the same time span. Should be interesting. I will still make attempts to attend the local mosque in Salt Lake City, at least once per week.

One interesting side note, when I fasted two years ago, I did it from sunrise to sunset. It wasn't until later that I learned that I am supposed to be fasting from dawn until sunset, which is about 30 minutes longer each day. Knowing this, I'll make sure to follow this a bit better. Also, last time, I chewed gum during the month to keep my breath smelling fresh. I learned that this was breaking the fast, so no chewing gum this year. Because the month is July 19 through August 18 this year, the days are longer than they were two years ago, which means this will certainly be more challenging. We start with a fast lasting roughly 16 hours on July 19.

See you then.

Mount Raw Images

Just recently, I needed to mount a KVM raw image file, because the guest depended on a network mount that was no longer accessible, and any attempts to interact with the boot process failed. So, rather than booting off a live CD, or some other medium, I decided to mount the raw image file. After all, it is ext4.

However, mounting an image file means knowing where the root filesystem begins, which means knowing how to offset the mount, so you can access your data correctly. I used the following:

First, I set up a loopback device, so I could gather information about its partition setup:

# losetup /dev/loop0 virt01.img
# fdisk -l /dev/loop0

Disk /dev/loop0: 21.5 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders, total 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0009bdb7

      Device Boot      Start         End      Blocks   Id  System
/dev/loop0p1        37943296    41940991     1998848   82  Linux swap / Solaris
/dev/loop0p2   *        2048    37943295    18970624   83  Linux

Partition table entries are not in disk order

In this case, the virtual machine disk is 21.5 GB in size, and reads and writes in 512-byte sectors. Further, it appears as though swap occupies the first partition (/dev/loop0p1), while the ext4 root filesystem occupies the second (/dev/loop0p2), which actually comes first on disk, beginning at sector 2048, or byte 2048*512=1048576.
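The offset arithmetic is worth spelling out, since losetup -o expects bytes while fdisk reports sectors. A minimal check, using the numbers from the fdisk output above:

```python
# Byte offset for losetup -o: the partition's start sector (from fdisk)
# multiplied by the logical sector size.
start_sector = 2048   # "Start" column for the root partition
sector_size = 512     # "Sector size (logical)" from fdisk
print(start_sector * sector_size)  # 1048576
```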

So, now I just need to tear down the loopback device, and create it again with an offset of 1048576 bytes, at which point, I should be able to mount the device:

# losetup -d /dev/loop0
# losetup /dev/loop0 virt01.img -o 1048576
# mount /dev/loop0 /mnt
# ls /mnt
bin/   home/            lib32/       mnt/   run/      sys/  vmlinuz@
boot/  initrd.img@      lib64/       opt/   sbin/     tmp/  vmlinuz.old@
dev/   initrd.img.old@  lost+found/  proc/  selinux/  usr/
etc/   lib/             media/       root/  srv/      var/

At this point, I can edit my problematic /mnt/etc/fstab file to fix the troubled boot, and boot it up.

Setup Network Interfaces in Debian

If you're not using NetworkManager or Wicd, or some other similar tool to automatically manage your network interfaces for you, this post is for you. In the Debian world, a single file manages your network interfaces. It can manage VLANs, bonded interfaces, virtual interfaces and more. You can establish rules on what should happen before an interface is brought online, while it is online, and after it comes online. The same sorts of rules can be applied when taking the interface down as well. Let's look at some of these.

First, let's look at the basic setup for getting an interface online with DHCP. The file we'll be looking at this entire time is the /etc/network/interfaces file:

auto eth0
allow-hotplug eth0
iface eth0 inet dhcp

The first line tells the kernel to bring the "eth0" interface up when the system boots. The second line tells the kernel to start the interface if a "hotplug" event is triggered. The third line defines the configuration of the "eth0" interface. In this case, it should use IPv4, and should request an IP address from a DHCP server. A static configuration could look like this:

auto eth0
allow-hotplug eth0
iface eth0 inet static
    address 10.19.84.2
    network 10.19.84.0
    gateway 10.19.84.1
    netmask 255.255.255.0

The first two lines remain the same. In the third line, we have decided to use static addressing, rather than dynamic. Then, we followed through by configuring the interface. It's important to note that the indentation is not required. I only indented it for my benefit.

What about bonding? Simple enough. Suppose you have 2 NICs, one on the motherboard, and the other in a PCI slot, and you want to ensure high availability, should the PCI card die. Then you could do something like this:

auto eth0
iface eth0 inet manual
    post-up ifconfig $IFACE up
    pre-down ifconfig $IFACE down

auto eth1
iface eth1 inet manual
    post-up ifconfig $IFACE up
    pre-down ifconfig $IFACE down

auto bond0
iface bond0 inet static
    bond-slaves eth0 eth1
    # LACP configuration
    bond_mode 802.3ad
    bond_miimon 100
    bond_lacp_rate fast
    bond_xmit_hash_policy layer2+3
    address 10.19.84.2
    network 10.19.84.0
    gateway 10.19.84.1
    netmask 255.255.255.0

Technically, I don't need to tell the kernel to bring up interfaces eth0 and eth1, if I tell the kernel to bring up bond0, and slave the eth0 and eth1 interfaces. But, this configuration illustrates some points. First, there are the pre-up, up, post-up, pre-down, down, and post-down commands that you can use in your network interfaces(5) file. Each does something to the interface at different times during the configuration. Also notice I'm using the $IFACE variable. There are others that exist, that allow you to create scripts for your interfaces. See http://www.debian.org/doc/manuals/debian-reference/ch05.en.html#_scripting_with_the_ifupdown_system for more information.

On the bonded interface, I'm putting in two slaves, then setting some bonding configuration that I want, such as using 802.3ad mode. Of course, the interface is static, so I provided the necessary information.

What if we wanted to add our bonded interface to a VLAN? Simple. Just append a dot "." and the VLAN number you want the interface in. Like so:

auto bond0
iface bond0 inet manual
    bond-slaves eth0 eth1
    # LACP configuration
    bond_mode 802.3ad
    bond_miimon 100
    bond_lacp_rate fast
    bond_xmit_hash_policy layer2+3

auto bond0.42
iface bond0.42 inet static
    address 10.19.84.2
    network 10.19.84.0
    gateway 10.19.84.1
    netmask 255.255.255.0
    # necessary due to a bonding bug in vlan tools
    vlan-raw-device bond0

Bring the interface up, then verify that the kernel has assigned it to the right VLAN:

$ sudo cat /proc/net/vlan/config
VLAN Dev name    | VLAN ID
Name-Type: VLAN_NAME_TYPE_RAW_PLUS_VID_NO_PAD
bond0.42        | 42  | bond0

Notice that I specified "vlan-raw-device bond0". This is due to a bonding bug in the VLAN tools, where merely specifying which VLAN the interface should be in by its name is not enough. You must also tell the kernel the bonded interface that the VLAN interface should be in.

How about bridged devices:

auto bond0
iface bond0 inet manual
    bond-slaves eth0 eth1
    # LACP configuration
    bond_mode 802.3ad
    bond_miimon 100
    bond_lacp_rate fast
    bond_xmit_hash_policy layer2+3

auto bond0.42
iface bond0.42 inet manual
    post-up ifconfig $IFACE up
    pre-down ifconfig $IFACE down
    # necessary due to a bonding bug in vlan tools
    vlan-raw-device bond0

auto br42
iface br42 inet static
    bridge_ports bond0.42
    address 10.19.84.1
    netmask 255.255.255.0
    network 10.19.84.0
    gateway 10.19.84.1

The only new thing here is the "bridge_ports" command. In this case, our bridged device is bridging our bond0.42 interface, which is in VLAN 42. Imagine having a KVM or Xen hypervisor that has a guest that needs to be in several VLANs. How would you setup all those bridges? Simple. Just create a VLAN interface for each VLAN, then create a bridge for each bonded interface in that VLAN.

Lastly, what about virtual IPs? I've heard that you can assign multiple IP addresses to a single NIC. How do you set that up? Simple. Just add a colon ":" then append a unique number. For example, say I have only one NIC, but wish to have 2 IP addresses, each in different networks:

auto eth0
iface eth0 inet static
    address 10.19.84.2
    netmask 255.255.255.0
    network 10.19.84.0
    gateway 10.19.84.1

auto eth0:1
iface eth0:1 inet static
    address 10.13.37.2
    netmask 255.255.255.0
    network 10.13.37.0

It's important to note that you generally only need one default gateway to get out. Your kernel will route packets accordingly. If you must specify multiple gateways, then you will need to manually edit the kernel's routing table to get everything set up correctly.
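You can confirm that the two addresses above really sit in different networks with Python's stdlib ipaddress module (a quick sanity check, using the addresses and netmask from the example):

```python
# Verify that the primary and virtual IPs live in different /24 networks.
import ipaddress

a = ipaddress.ip_interface("10.19.84.2/255.255.255.0")
b = ipaddress.ip_interface("10.13.37.2/255.255.255.0")
print(a.network)               # 10.19.84.0/24
print(b.network)               # 10.13.37.0/24
print(a.network == b.network)  # False
```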

Of course, we could combine everything we learned here. See if you can make out what each interface is doing:

auto eth0
iface eth0 inet manual
    pre-up ifconfig $IFACE up
    post-down ifconfig $IFACE down

auto eth1
iface eth1 inet manual
    pre-up ifconfig $IFACE up
    post-down ifconfig $IFACE down

auto bond0
iface bond0 inet manual
    bond-slaves eth0 eth1
    # LACP configuration
    bond_mode 802.3ad
    bond_miimon 100
    bond_lacp_rate fast
    bond_xmit_hash_policy layer2+3

auto bond0.42
iface bond0.42 inet static
    address 10.19.84.2
    netmask 255.255.255.0
    network 10.19.84.0
    gateway 10.19.84.1
    # necessary due to a bonding bug in vlan tools
    vlan-raw-device bond0

auto bond0.42:1
iface bond0.42:1 inet manual
    pre-up ifconfig $IFACE up
    post-down ifconfig $IFACE down
    # necessary due to a bonding bug in vlan tools
    vlan-raw-device bond0

auto br42
iface br42 inet static
    bridge_ports bond0.42:1
    address 10.13.37.2
    netmask 255.255.255.0
    network 10.13.37.0

Lastly, MTU. There is a lot of misinformation out there about frame size. In my professional experience, setting the MTU to 9000 bytes does not result in noticeably improved throughput. But it does have an effect on the CPU. Setting a larger frame size can result in much lower CPU usage, both on the switch, and in your box. However, some protocols, such as UDP, might break with a 9k MTU. So, use it appropriately. In any event, here is how I generally set my MTU when dealing with multiple interfaces:

auto eth0
iface eth0 inet manual
    pre-up ifconfig $IFACE up
    post-down ifconfig $IFACE down
    mtu 9000

auto eth1
iface eth1 inet manual
    pre-up ifconfig $IFACE up
    post-down ifconfig $IFACE down
    mtu 9000

auto bond0
iface bond0 inet manual
    bond-slaves eth0 eth1
    # LACP configuration
    bond_mode 802.3ad
    bond_miimon 100
    bond_lacp_rate fast
    bond_xmit_hash_policy layer2+3
    mtu 9000

auto bond0.42
iface bond0.42 inet static
    address 10.19.84.2
    netmask 255.255.255.0
    network 10.19.84.0
    gateway 10.19.84.1
    mtu 9000
    # necessary due to a bug in vlan tools
    vlan-raw-device bond0

auto bond0.43
iface bond0.43 inet static
    address 10.13.37.2
    netmask 255.255.255.0
    network 10.13.37.0
    mtu 1500
    # necessary due to a bug in vlan tools
    vlan-raw-device bond0

Note that I set the MTU to 9000 on all interfaces except for bond0.43, which is 1500. This is perfectly acceptable. In all reality, setting the MTU to 1500 on bond0.43 is just capping what bond0 can really do. But, it is important to set the MTU on each interface, otherwise the default frame size of 1500 bytes will be used, and you'll end up fragmenting your packets anyway. You must also set the MTU to 9000 on the switch ports, and on any other servers and interfaces that you want jumbo frames on.

Randomize First, Then Encrypt Your Block Device

This blog post is in continuation of the previous post, where I showed why you should not use ECB when encrypting your data. Well, when putting down an encrypted filesystem, such as LUKS, you've probably been told that you should put random data down on the partition first BEFORE encrypting the disk. Well, this post will illustrate why, and it's simple enough to do on your own GNU/Linux system.

I'll be using bitmaps in this example, as I did in the previous post, except I'll use a different image. First, let's create a "random filesystem". Encrypted data should appear as nothing more than random data to the casual eye. This will be our target image for this exercise.

$ dd if=/dev/urandom of=target.bmp bs=1 count=480054
$ dd if=glider.bmp of=target.bmp bs=1 count=54 conv=notrunc

Here is what my target "encrypted filesystem" should look like (converting to GIF format for this post). Click to zoom:

Plaintext image Target filesystem

Now let's create a file full of binary zeros. This file will be the basis for our block device, and imitates an unused hard drive quite well. I have chosen ext2 over other filesystems, mostly because of the size restriction with these small files. Feel free to increase the file sizes, and use ext3, ext4, XFS, JFS, or whatever you want.

The file "400x400.bmp" is a white bitmap that is 400x400 pixels in size, rather than the 200x200 pixel "glider.bmp". This is to accommodate for the larger filesystems used in this post, and make the illustrations more clear. For your convenience, download the 400x400.bmp and glider.bmp for this exercise.
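As a sanity check, the 480054-byte count used in the dd commands is just the 54-byte bitmap header plus 400x400 pixels at 3 bytes each (24-bit color):

```python
# BMP file size: 54-byte header + width * height * 3 bytes per pixel.
header = 54
width = height = 400
print(header + width * height * 3)  # 480054
```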

In these commands, "$" means running the command as an unprivileged user, "#" means running as root.

$ dd if=/dev/zero of=plain-zero-ext2.bmp bs=1 count=480054
# losetup /dev/loop0 plain-zero-ext2.bmp
# mkfs.ext2 /dev/loop0
# mount /dev/loop0 /mnt
# cp glider.bmp /mnt
# umount /mnt
# losetup -d /dev/loop0
$ dd if=400x400.bmp of=plain-zero-ext2.bmp bs=1 count=54 conv=notrunc

This should give us a reference image to see what a "plaintext" filesystem would look like with our file copied to it. Now, let's setup two encrypted filesystems, one using ECB and the other using CBC, and we'll compare the three files together:

First the ECB filesystem:

$ dd if=/dev/zero of=ecb-zero-ext2.bmp bs=1 count=480054
# losetup /dev/loop0 ecb-zero-ext2.bmp
# cryptsetup -c aes-ecb create ecb-disk /dev/loop0
# mkfs.ext2 /dev/mapper/ecb-disk
# mount /dev/mapper/ecb-disk /mnt
# cp glider.bmp /mnt
# umount /mnt
# dmsetup remove ecb-disk
# losetup -d /dev/loop0
$ dd if=400x400.bmp of=ecb-zero-ext2.bmp bs=1 count=54 conv=notrunc

Now the CBC filesystem:

$ dd if=/dev/zero of=cbc-zero-ext2.bmp bs=1 count=480054
# losetup /dev/loop0 cbc-zero-ext2.bmp
# cryptsetup create cbc-disk /dev/loop0
# mkfs.ext2 /dev/mapper/cbc-disk
# mount /dev/mapper/cbc-disk /mnt
# cp glider.bmp /mnt
# umount /mnt
# dmsetup remove cbc-disk
# losetup -d /dev/loop0
$ dd if=400x400.bmp of=cbc-zero-ext2.bmp bs=1 count=54 conv=notrunc

What do we have? Here are the results of my filesystems. Click to zoom:

Plaintext filesystem ECB filesystem CBC filesystem

How do they compare to our target filesystem? Well, not close really. Even when using CBC mode with AES, we can clearly see where the encrypted data resides, and where it doesn't. Now, rather than filling our disk with zeros, let's fill it with random data, and go through the same procedure as before:

First the "plaintext" filesystem:

$ dd if=/dev/urandom of=plain-urandom-ext2.bmp bs=1 count=480054
# losetup /dev/loop0 plain-urandom-ext2.bmp
# mkfs.ext2 /dev/loop0
# mount /dev/loop0 /mnt
# cp glider.bmp /mnt
# umount /mnt
# losetup -d /dev/loop0
$ dd if=400x400.bmp of=plain-urandom-ext2.bmp bs=1 count=54 conv=notrunc

Now the ECB filesystem:

$ dd if=/dev/urandom of=ecb-urandom-ext2.bmp bs=1 count=480054
# losetup /dev/loop0 ecb-urandom-ext2.bmp
# cryptsetup -c aes-ecb create ecb-disk /dev/loop0
# mkfs.ext2 /dev/mapper/ecb-disk
# mount /dev/mapper/ecb-disk /mnt
# cp glider.bmp /mnt
# umount /mnt
# dmsetup remove ecb-disk
# losetup -d /dev/loop0
$ dd if=400x400.bmp of=ecb-urandom-ext2.bmp bs=1 count=54 conv=notrunc

Finally, the CBC filesystem:

$ dd if=/dev/urandom of=cbc-urandom-ext2.bmp bs=1 count=480054
# losetup /dev/loop0 cbc-urandom-ext2.bmp
# cryptsetup create cbc-disk /dev/loop0
# mkfs.ext2 /dev/mapper/cbc-disk
# mount /dev/mapper/cbc-disk /mnt
# cp glider.bmp /mnt
# umount /mnt
# dmsetup remove cbc-disk
# losetup -d /dev/loop0
$ dd if=400x400.bmp of=cbc-urandom-ext2.bmp bs=1 count=54 conv=notrunc

Check our results. Click to zoom:

Plaintext filesystem ECB filesystem CBC filesystem

Much better! By filling the underlying disk with (pseudo)random data first, then encrypting the filesystem with AES using CBC, we have a hard time telling the difference between it and our target filesystem, which was our main goal.

So, please, for the love of security, before putting down an encrypted filesystem on your disk, make sure you fill it with random data FIRST! The Debian installer, and many others, do this by default. Let it run to completion, even if it takes a few hours.

ECB vs CBC Encryption

This is something you can do on your computer fairly easily, provided you have OpenSSL installed, which I would be willing to bet you do. Take a bitmap image (any image will work fine, I'm just going to use bitmap headers in this example), such as the Ubuntu logo, and encrypt it with AES in ECB mode. Then encrypt the same image with AES in CBC mode. Apply the 54-byte bitmap header to the encrypted files, and open up in an image viewer. Here are the commands I ran:

$ openssl enc -aes-256-ecb -in ubuntu.bmp -out ubuntu-ecb.bmp
$ openssl enc -aes-256-cbc -in ubuntu.bmp -out ubuntu-cbc.bmp
$ dd if=ubuntu.bmp of=ubuntu-ecb.bmp bs=1 count=54 conv=notrunc
$ dd if=ubuntu.bmp of=ubuntu-cbc.bmp bs=1 count=54 conv=notrunc

Now, open all three files, ubuntu.bmp, ubuntu-ecb.bmp and ubuntu-cbc.bmp, and see what you get. Here are my results with the password "chi0eeMieng7Ohe8ookeaxae6ieph1":

Plaintext ECB Encrypted CBC Encrypted

Feel free to play with different passwords, and notice the colors change. Or use a different block cipher such as "bf-ecb", "des-ecb", or "rc2-ecb" with OpenSSL, and notice details change.

What's going on here? Why can I clearly make out the image when encrypted with ECB? Well, ECB, or electronic codebook, is a block cipher mode that encrypts each block independently, with the same key for every block, and no initialization vector to kickstart the encryption. If any underlying plaintext block is the same as another, then the encrypted output is exactly the same. Thus, all "#000000" hexadecimal colors in our image, for example, will produce the same encrypted output, per block (thus, why you see stripes).

Compare this to CBC, or cipher-block chaining. An initialization vector (IV) must be supplied before the encryption can begin (OpenSSL derives both the key and the IV from our password). Each plaintext block is first XORed with the previous ciphertext block, or with the IV in the case of the first block, and the result is then encrypted. This chaining continues to the end of the file, so identical plaintext blocks produce different ciphertext. This ensures that every "#000000" hexadecimal color will have a different output, thus causing the file to appear as random (I have an attacking algorithm to still leak information out of a CBC-encrypted file, but that will be for another post).
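The difference can be shown with a toy cipher. This sketch is NOT real cryptography: a simple XOR stands in for AES, and the key and block size are made up, purely to show why ECB leaks repeated blocks while CBC's chaining hides them:

```python
# Toy "block cipher" (XOR with a fixed key) to contrast ECB and CBC modes.
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

KEY = b"secretk!"   # made-up key, one block wide
BS = 8              # made-up block size

def ecb(pt):
    # each block "encrypted" independently with the same key
    return b"".join(xor(pt[i:i+BS], KEY) for i in range(0, len(pt), BS))

def cbc(pt, iv):
    # each block XORed with the previous ciphertext block, then "encrypted"
    out, prev = [], iv
    for i in range(0, len(pt), BS):
        ct = xor(xor(pt[i:i+BS], prev), KEY)
        out.append(ct)
        prev = ct
    return b"".join(out)

pt = b"AAAAAAAA" * 2                  # two identical plaintext blocks
e = ecb(pt)
c = cbc(pt, b"\x00" * BS)
print(e[:BS] == e[BS:])               # True  - ECB leaks the repetition
print(c[:BS] == c[BS:])               # False - CBC hides it
```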

Hopefully, this simple illustration convinces you to use CBC, or at least to not use ECB, when encrypting data that might be public.

Expand URLs in Irssi

If you're an IRC junkie, and spend hours a day in Irssi, then this post might be useful to you.

It's all the rage these days to shorten URLs with fancy URL shortening services. Heck, even I have one. They are certainly nice to have, when links are exceptionally long, such as search result URLs, and just the mere wrapping from one line to the next breaks the URL (not to mention, any additional characters added in the line break, such as spaces, other characters, etc.). I've used, and still use, link shortening services for IM, IRC, email, Identi.ca, Twitter, etc., only when I suspect the link could break as a result of line wrapping. I use them sparingly, and only use them if they provide a preview feature, giving the link to the preview.

While they have their advantages, they certainly come with a cost. Link rot is a very real concern, should the link shortening service go offline. You can nest shortened links in each other, concealing JavaScript/CSS mouse hovers. They can contain all sorts of nasties, and you don't know what you're getting into, unless you use some sort of software to expand the URL for you, before you actually follow the link. I've already blogged about using a simple shell function to expand shortened URLs (post at http://pthree.org/2011/10/18/use-wget1-to-expand-shortened-urls/). Well, now it's time for Irssi to automatically provide the function for me.

Presenting https://github.com/jcande/Expand-URLs. This is a simple Irssi script that will identify URLs in a given notice, whether in private or in public, and expand them using the http://longurl.org service (I think a patch for doing the lookup without a 3rd party should probably be submitted, as any 3rd party expanding service might go offline).

For me, this script is exceptionally valuable, because I connect to a local Bitlbee instance with Irssi, and use Bitlbee to connect to Twitter. Unfortunately, Twitter wants to track your clicks with their http://t.co service. Every link longer than 19 characters (20 for HTTPS) submitted to Twitter is automatically shortened with this wrapper. They claim that the service is to identify malicious links, and prevent them from being posted, should one be identified. But certainly, a company the size of Twitter can do so much more with this new "service". They could track what links are clicked and when. They can use this information to identify what stuff you're interested in, and when you use the service. They can track who clicks the link by IP or ISP. Of course, it would be foolish to not sell this information to advertisers, to target additional advertising on Twitter or other sites, based on this info.

In any event, this is one of the few Irssi scripts that I find really, really useful day-to-day. It makes the Twitter timeline a bit chatty, now that lengthy URLs are being shown, and a few break due to line wrapping. And that is a pain, no doubt. But, the vast majority of links don't break, and it's nice seeing where I'll be taken when visiting the link. Keeping Twitter from tracking me, despite the occasional link breakage, is worth it.

P.S.: There is also a WeeChat script at http://www.weechat.org/files/scripts/expand_url.pl.

Use wget(1) To Expand Shortened URLs

I'm a fan of all things microblogging, but let's face it: until URLs become part of the XML, and not part of your character count (which is ridiculous anyway), shortened URLs are going to be a way of life. Unfortunately, those shortened URLs can be problematic. They could host malicious scripts and/or software that could infect your browser and/or system. They could lead you to an inappropriate site, or just something you don't want to see. And because these URLs are a part of our microblogging lives, they've also become a part of our email, SMS, IM, and IRC lives, as well as other online aspects.

So, the question is: do you trust the short URL? Well, I've generally gotten into the habit of asking people to expand the shortened URL for me if on IRC, email or IM, and it's worked just fine. But, I got curious if there was a way to do it automagically, and thankfully, you can use wget(1) for this very purpose. Here's a "quick and dirty" approach to expanding shortened URLs (emphasis mine):

$ wget --max-redirect=0 -O - http://t.co/LDWqmtDM
--2011-10-18 07:59:53--  http://t.co/LDWqmtDM
Resolving t.co (t.co)... 199.59.148.12
Connecting to t.co (t.co)|199.59.148.12|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://is.gd/jAdSZ3 [following]
0 redirections exceeded.

So, in this case "http://t.co/LDWqmtDM" is pointing to "http://is.gd/jAdSZ3", another shortened URL (thank you Twitter for shortening what is already short (other services are doing this too, and it's annoying- I'm looking at you StatusNet)). So, let's increase our "--max-redirect" (again, emphasis mine):

$ wget --max-redirect=1 -O - http://t.co/LDWqmtDM
--2011-10-18 08:02:12--  http://t.co/LDWqmtDM
Resolving t.co (t.co)... 199.59.148.12
Connecting to t.co (t.co)|199.59.148.12|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://is.gd/jAdSZ3 [following]
--2011-10-18 08:02:13--  http://is.gd/jAdSZ3
Resolving is.gd (is.gd)... 89.200.143.50
Connecting to is.gd (is.gd)|89.200.143.50|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://wiki.ubuntu.com/UbuntuOpenWeek [following]
1 redirections exceeded.

So, in this case, the link finally points to https://wiki.ubuntu.com/UbuntuOpenWeek. I'm familiar enough with the Ubuntu Wiki, that I know I should be safe visiting the initial shortened URL. If you want to add this to a script or shell function, then you can get a bit more fancy:

$ expandurl() { wget -O - --max-redirect=$2 $1 2>&1 | grep ^Location; }
$ expandurl http://t.co/LDWqmtDM 1
Location: http://is.gd/jAdSZ3 [following]
Location: https://wiki.ubuntu.com/UbuntuOpenWeek [following]

In this case, our "expandurl()" function takes two arguments: the first being the URL you wish to expand, and the second being the max redirects. You'll notice further that I added "-O -" to print the downloaded page to STDOUT rather than saving it to a file, just in case you allow too many redirects. Because you're grepping for "^Location", and the page's HTML gets filtered out by the grep anyway, technically you could get rid of "--max-redirect" altogether. But, keeping it in play does seriously decrease the time it takes to get the locations. Whatever works for you.

UPDATE (Oct 18, 2011): After some comments have come in on the post, and some discussion on IRC, there is a better way to handle this. According to the wget(1) manpage, "-S" or "--server-response" will print the headers and responses printed by the FTP/HTTP servers. So, here's the updated function that you might find to be less chatty, and faster to execute as well:

$ expandurl() { wget -S $1 2>&1 | grep ^Location; }
$ expandurl http://t.co/LDWqmtDM
Location: http://is.gd/jAdSZ3 [following]
Location: https://wiki.ubuntu.com/UbuntuOpenWeek [following]

Perfect.

Avoid Using which(1)

This post comes from BashFAQ/081 on Greg's Wiki. He argues why you should not be using which(1) to determine if a command is in your $PATH at the end of the page. I'll put that argument at the front:

The command which(1) (which is often a csh script, although sometimes a compiled binary) is not reliable for this purpose. which(1) may not set a useful exit code, and it may not even write errors to stderr. Therefore, in order to have a prayer of successfully using it, one must parse its output (wherever that output may be written).

Note that which(1)'s output when a command is not found is not consistent across platforms. On HP-UX 10.20, for example, it prints "no qwerty in /path /path /path ..."; on OpenBSD 4.1, it prints "qwerty: Command not found."; on Debian (3.1 through 5.0 at least) and SuSE, it prints nothing at all; on Red Hat 5.2, it prints "which: no qwerty in (/path:/path:...)"; on Red Hat 6.2, it writes the same message, but on standard error instead of standard output; and on Gentoo, it writes something on stderr.

(Quotation and manpage reference additions mine). So, if which(1) is bad news, then what is the "proper" way to determine if a command is in your $PATH? Well, POSIX has an answer, and not surprisingly, the command to use is "command":

# POSIX
if command -v qwerty >/dev/null; then
  echo qwerty exists
else
  echo qwerty does not exist
fi
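A quick demonstration (assuming a POSIX shell; the path printed for sh will vary by system):

```shell
# command -v prints where the command resolves and sets the exit status,
# so it works both in conditionals and for display:
command -v sh                      # e.g. /bin/sh
command -v cd                      # prints "cd": built-ins are reported too
command -v qwerty || echo "qwerty not found"
```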

The "command" built-in also returns true for shell built-ins. If you absolutely must check only PATH, the only POSIX way is to iterate over it:

# POSIX
IsInPath ()
(
  [ $# -eq 1 ] && [ "$1" ] || return 2
  set -f; IFS=:
  for dir in $PATH; do
    [ -z "$dir" ] && dir=. # Legacy behaviour
    [ -x "$dir/$1" ] && return
  done
  return 1
)

if IsInPath qwerty; then
  echo qwerty exists
else
  echo qwerty does not exist
fi
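The `[ -z "$dir" ] && dir=.` line in IsInPath handles a legacy quirk: an empty component in $PATH historically means the current directory. A quick illustration with a throwaway value (PATH_DEMO is my own name, not from the wiki):

```shell
# Split a PATH-like string on ":" the same way IsInPath does; the empty
# component between the two colons becomes "." (the current directory).
PATH_DEMO="/usr/bin::/bin"
(
  set -f; IFS=:
  for dir in $PATH_DEMO; do
    [ -z "$dir" ] && dir=.
    echo "$dir"
  done
)
# -> /usr/bin
# -> .
# -> /bin
```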

There are also Bash built-ins that can be used, should you have Bash installed on your system:

# Bash using the 'hash' built-in
if hash qwerty 2>/dev/null; then
  echo qwerty exists
else
  echo qwerty does not exist
fi

Or:

# Bash using the 'type' built-in
# type -P forces a PATH search, skipping builtins and so on
if type -P qwerty >/dev/null; then
  echo qwerty exists
else
  echo qwerty does not exist
fi

If, like me, you prefer ZSH (my addition, not present in the wiki), then you can look in the $commands associative array:

# ZSH using the $commands associative array
if (( $+commands[qwerty] )); then
    echo qwerty exists
else
    echo qwerty does not exist
fi

I like that at the end of the FAQ, he gives a shell script for using which(1) should it be absolutely necessary. Not only do you have to test for exit code, but you also have to test for common strings in the output, seeing as though which(1) doesn't always use exit codes properly:

# Bourne.  Last resort -- using which(1)
tmpval=`LC_ALL=C which qwerty 2>&1`
rc=$?
if test $rc -ne 0; then
  # FOR NOW, we'll assume that if this machine's which(1) sets a nonzero
  # exit status, that it actually failed.  I've yet to see any case where
  # which(1) sets an erroneous failure -- just erroneous "successes".
  echo "qwerty is not installed.  Please install it."

else
    # which returned 0, but that doesn't mean it succeeded.  Look for known error strings.
    case "$tmpval" in
      *no\ *\ in\ *|*not\ found*|'')
        echo "qwerty is not installed.  Please install it."
        ;;
      *)
        echo "Congratulations -- it seems you have qwerty (in $tmpval)."
        ;;
    esac
fi

CONCLUSION:
You have many options for determining whether or not a command exists in your $PATH: some POSIX, some shell built-ins. Regardless, you should be able to build platform-independent scripts using the proper tools, and which(1) is not the right tool for the job. Hopefully, this has convinced you of that.
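To tie the section together, a tiny portable wrapper is usually all you need in scripts (the function name have() is my own choice, not from the wiki):

```shell
# Portable "is this command available?" helper, POSIX sh.
have() { command -v "$1" >/dev/null 2>&1; }

if have awk; then
  echo "awk exists"
fi
have qwerty || echo "qwerty does not exist"
```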

Pimp My Irssi - Part 2

It's been over 3 years since I wrote the original article about pimping out Irssi, with various themes, scripts, aliases, etc. Well, I figure it's probably time for an update. After all, if you know anything about me, you know I'm an IRC junkie with Irssi, and currently having a love affair with Bitlbee.

Aliases
I've only added two aliases to Irssi since we last met. The first alias comes from the main Irssi.org website. If you scroll to the bottom of the page, there is a tip near the end for adding all open channels to your "/channel list". The only thing it was missing was saving the window layout to the config, so the next time you open Irssi, not only will you be rejoined to all of your channels, but they'll all be in the same place they were last time. Here's the alias:

/alias CHANNADD script exec foreach my \$channel (Irssi::channels()) { Irssi::command("channel add -auto \$channel->{name} \$channel->{server}->{tag} \$channel->{key}")\;}; layout save; save

This next alias will effectively "mark all as read" for your channels. For whatever reason, you may like to have a clean slate for channel activity in your statusbar, rather than all the channels showing some form of activity. This alias will clean up your statusbar, so any new activity becomes immediately available. As mentioned, think of this as "marking all channels as read":

/alias READ script exec \$_->activity(0) for Irssi::windows

Scripts
My scripts have changed a little bit, but not by much. Really, there are only two scripts worth mentioning that I haven't already covered in the previous post.

The first script is similar to "screen_away.pl", except for tmux(1). I've adopted nesting Irssi behind tmux(1) instead of screen(1) because I just couldn't get bolded black and white to work correctly. Now, everything looks as it should, with zero formatting issues. At any event, "tmux_away.pl" does the same thing, in that it sets your "/away" status when you detach your session, and resets it when you reattach.

The second script is "trigger.pl". This script is mostly a search/replace script for events that happen in the channel. For example, if you don't like the "F-word", you can change it to something less vulgar with trigger.pl. It can handle much more, but I only use it mainly for that purpose. The triggers I've set up this far are as follows:

-publics -regexp 'Message from unknown participant ' -replace ''
-all -nocase -regexp '</?(a|b|body|div|em|font|i|s|u)( +\w+=".*?")*>' -replace ''
-privmsgs -nocase -regexp '&amp;' -replace '&'
-privmsgs -nocase -regexp '&gt;' -replace '>'
-privmsgs -nocase -regexp '&lt;' -replace '<'
-privmsgs -nocase -regexp '&quot;' -replace '"'
-publics -masks 'update!update@identi.ca' -channels '&bitlbee' -regexp '^aaron: ' -replace ''

All of my triggers are Bitlbee triggers. Let's cover each trigger that I've created, just so there's no confusion:

  1. When joining Jabber MUCs with Bitlbee, the Jabber server you're connecting to can be configured to print the past 20 lines (or whatever) of the previous log, so you can get caught up to speed on the discussion quickly. However, if the user that said something in that log is not in the MUC when you join, you'll see "Message from unknown participant" in the backlog. This trigger removes that completely.
  2. When Pidgin or Adium users get online, sometimes the clients send HTML/XML to Bitlbee. This trigger completely strips the listed HTML/XML tags.
  3. Similar to trigger #2, this trigger replaces the HTML "&amp;" with "&".
  4. Similar to trigger #2, this trigger replaces the HTML "&gt;" with ">".
  5. Similar to trigger #2, this trigger replaces the HTML "&lt;" with "<".
  6. Similar to trigger #2, this trigger replaces the HTML "&quot;" with '"'.
  7. I use Identi.ca with the XMPP bot in the "&bitlbee" channel. I know the bot is addressing everything to me, so I don't want each notice to begin with "aaron: "; this trigger strips it out completely, so the notice actually starts with the user's nickname. If you're interfacing with the bot in a private channel rather than in "&bitlbee", this trigger won't do you much good. You'll need to modify it.
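Triggers #3 through #6 are just HTML entity decoding; outside of Irssi, the same transformation looks like this in sed (illustration only, not trigger.pl syntax):

```shell
# Decode the same entities the triggers above handle. Note that &amp;
# must be decoded last, or "&amp;lt;" would be double-decoded into "<".
printf '%s\n' 'Tom &amp; Jerry say &quot;hi&quot; &lt;3' |
sed -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&quot;/"/g' -e 's/&amp;/\&/g'
# -> Tom & Jerry say "hi" <3
```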

Theme
As much as I try to do a new theme, I always come back to my custom-built "madcow.theme" called "88_madcows.theme". It's the awesome. Screenshot

Announcing Penny Red

This is my first open source project that I've started and maintained, so I'm pretty excited about it. I was upset at the current offerings of Hashcash for the various MUAs, so I set out to do something about it. You've already read on my blog about my solutions for minting and verifying Hashcash tokens with Mutt. Well, a good friend of mine suggested that I start an open source project out of it, put it into revision control and get my rear in gear. So, I did just that.

Announcing Penny Red, a Python solution to integrate Hashcash into Mutt, licensed under the GPLv3. I chose the name "Penny Red" for two reasons:

  1. Penny Red was the second British postal stamp, which debuted in 1841, the first being the Penny Black stamp. The color was changed so the black cancellation mark could be seen easily on a red stamp, unlike on the previous black stamp. Because the goal of the project is to implement a payment system through minted tokens, and a previous solution had existed in Perl (without documentation, mind you), this seemed appropriate.
  2. Unfortunately, in the English language, the words "red" and "read" have the same sound, yet different spelling. After reading your email, you would say "I have read my mail". Because it had postage attached in the headers, you read mail that was paid for. So, "Penny Red" is a play on "Penny Read".

The great thing with the Python scripts in Penny Red is their portability. The "mint_hashcash.py" script reads a file as a passed argument, and then writes to the headers. The "verify_hashcash.py" script reads the mail from STDIN and prints the message and the verified tokens back to STDOUT. Nothing specific about Mutt is in either script! As a result, as long as the MUA supports STDIN and STDOUT with each message, and modifying the headers, these scripts can be used. I guess what I'm saying is, I would like to extend these scripts to Pine, Alpine, Gnus (although not really necessary as Gnus supports Hashcash out of the box), Elm, and others, without forking the project. Penny Red should be able to support multiple MUAs.
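Since "verify_hashcash.py" speaks plain STDIN/STDOUT, wiring it into Mutt can be as simple as a display filter; a hypothetical muttrc fragment (the ~/bin path is my assumption, not from the project docs):

```
# Hypothetical muttrc wiring for Penny Red. The script path is an
# assumption; display_filter pipes each viewed message through the
# command before displaying it, which matches verify_hashcash.py's
# STDIN/STDOUT contract.
set display_filter="~/bin/verify_hashcash.py"
```

"mint_hashcash.py", which takes the draft file as an argument, would instead hook into the send side, for example from a macro that runs it on the message file before delivery.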

At any event, I just wanted to get this post out there, seeing as though I just barely setup the project. I'm pretty excited. Should be fun, and give me something to do after graduation.