Image of the glider from the Game of Life by John Conway
Skip to content

Hashcash and Mutt

Introduction

I wanted to used Hashcash with Mutt, for nothing more than a curiosity to see if it generates any discussion, and to see if people notice. Further, I'm a big crypto advocate, and while Hashcash isn't exactly crypto, it's highly related to it, and uses it. Regardless, I wanted to see if I could seamlessly tie in Hashcash with Mutt, both for minting and verification.

I make the following assumptions:

  • You have Hashcash installed.
  • You have Mutt installed.
  • You have Python 2.5 or later installed.

With that, let's continue.

What is Hashcash?

Hashcash is a proof-of-work protocol, with the motivation of eliminating SPAM. The idea is simple, although we'll cover it in greater detail. The TL;DR version, is using Hashcash is about generating tokens, and attaching them to the header of the message you're sending. The recipient checks the token for verification. If it passes, then you must not be SPAM, as you've put your computer through enough strenuous work to prove it. If the token fails, you could be SPAM, or you just did it wrong. Either way, it's no guarantee that you're not a spammer.

Now, the detailed version.

The premise behind Hashcash is that some certain mathematical results are difficult to discover but easy to verify. Factoring large numbers is one such example. So Hashcash is some sort of challenge for your CPU. For example, I say that you cannot comment on my blog, unless you can find the two prime factors of some large number. Once you've paid the work through CPU cycles, and provide the numbers, I multiply them together to see if it gives the result. The work was difficult for you to find, but simple for me to verify. Hashcash is based on this principle.

Using a lot of CPU cycles to answer a question non-interactively is one way to beat spammers at their own game, which Hashcash hopes to address. Rather than looking for large prime factors though, Hashcash is looking for a SHA1 sum of a unique string where the first set of characters in the sum are zero. Because SHA1 can provide 2^160 possible hashes before a collision is found, it shouldn't take too much work to find such a string. In other words, the search space is large enough for the work to be reasonable. However, the search space is also large enough, that given a certain criteria, it can be difficult to find the requested string.

Let's look at an example. Consider the SHA1 string:

00000528ba02c4da777839db269fde97de56ddf7

The first 20 bits of the string are zero. So, this is one such SHA1 hash that would work. The questions are, how did we find it, and what string did we use to generate it? Well, because the first 20 bits of the string are zero, this means that I should only have to search up to 2^20 possible hashes to find the string I'm looking for. On a moderate system within the last 10 years or so, this should take under a second. Indeed, the SHA1 hash string above was done on an AMD Athlon(tm) XP 1800+ with only 384 MB of PC133 RAM, and it was calculated in 940 milliseconds. The string, or in this case, our "Hashcash token" that generated that SHA1 hash is:

1:20:110321:aaron.toponce@gmail.com::Ci2v7/gsBbYY/dhk:0BTS

You can verify this easily enough:

echo -n 1:20:110321:aaron.toponce@gmail.com::Ci2v7/gsBbYY/dhk:0BTS | sha1sum
00000528ba02c4da777839db269fde97de56ddf7  -

So, if that is indeed a Hashcash token, then what are all the fields, and how are they used when searching for a SHA1 hash that starts with zeros? Well, the breakdown is thus:

  1. The version number.
  2. The claimed bit value.
  3. The date (and time) a stamp was minted.
  4. The resource for which a stamp is minted.
  5. Extensions that a specialized application may want.
  6. A random salt.
  7. The suffix.

Hashcash has undergone two revisions thus far. The version number shows the claimed version of the protocol being used. With the latest release of Hashcash, this is "1". "0" has shown to have some limitations.

The claimed bit value it telling the verifier of the token how many zeros the resulting SHA1 hash should have. The default is 20-bits, but this can be changed. If the bit value is too low, then it becomes trivial for spammers to reproduce with minimal effort. If the bit value is too high, it may take some considerable time for your CPU to mint the token.

The timestamp is the date when the token was minted. We can take advantage of this for expiry. When the verifier downloads the token, he can keep it in a database. This is useful to hopefully prevent someone else from claiming the same token. However, holding on to tokens indefinitely could be cumbersome, so we can set expiration dates on when the tokens expire, to help manage the database. The default expiration is 28 days.

The resource for which the token is minted is generally the email address of the recipients. However, it's flexible enough to be anything. It could be a URL to a webpage, some string of text, or anything that uniquely identifies a certain resource. We'll be using email addresses as our resource.

The fifth field is generally left blank, but the protocol specifies that this is used for extensions that any specialized application may want.

The salt is used just like it is used anywhere else. Here, we wish to make our token unique for our resource. The email address and timestamp might not be unique enough to prevent two minted tokens from being the same. So, we add a random, hopefully unique salt here to the token. Once the salt is chosen, just like the timestamp and the resource, it won't change for that token. The real grunt work is done with the next field. But, the goal with the salt is to just eliminate the possibility that two people send me an email on the same day, and the minted token is the same. If each salt is randomly chosen, then the tokens will differ.

As stated, this last field is where the real grunt work of your CPU lies. Every previous field is statically set upon instantiation. It is only this field that changes until we find the SHA1 sum that produces the correct number of zeros that we've asked for. Fortunately, the suffix is sequential, starting at zero, and working it's way up, base 64, until the appropriate suffix attached to this string gives us the SHA1 we want. As mentioned, if our bit size is 20 bits, by default, then we will only have to work our way through 2^20 possible suffixes, on average, to find the right string that gives us the SHA1 with 20 bits of leading zeros.

The security of the token comes from the fact that there should only be one collision for every 2^160 possible strings in SHA1. Let's not get tied up in the cryptanalysis. Suffice it to say, that SHA1 is still holding well in cryptographic circles, and for all practical purposes, we are interested in the amount of work it takes for the CPU to find the right token. When SHA1 becomes broken, and cryptanalysis shows it's no longer secure enough for today's applications, Hashcash is flexible enough to move to a different cryptographic hashing algorithm by just changing the protocol version. In the meantime, SHA1 works.

Also, given today's multicore CPUs, and Moore's Law, 20 bits might not be enough work for the CPU to crunch through. Remember, the idea is to beat spammers at their own game. If they can produce mass valid tokens for each spam mail they send, then we should make sure that our bit strength is stronger. Right now, 20 bits seems to be strong enough, but maybe moving it to 24 or 28 bits in the future might be necessary. Just remember the amount of time it takes for your CPU to calculate the work needed for the token.

How Hashcash works with email

When the token is minted, we wish to add the token to our mail header. We can add a token for each recipient in the "To:" and "Cc:" lines (with special care on how the headers are modified with Bcc: recipients (generally solved by sending separate emails with individual tokens)). What is added to the mail header is the following:

X-Hashcash: 1:20:110321:aaron.toponce@gmail.com::Ci2v7/gsBbYY/dhk:0BTS

For those using Hashcash, including antispam utilities such as SpamAssassin, looking for the X-Hashcash header in the mail, then verifying that the minted token is valid takes a fraction of a second. But, as we discussed earlier, minting the token takes some time. 20 bits for me takes just under a second. So, now imagine a SPAM ring with control of thousands of zombie computers infected with trojans. There are only 86400 seconds in a day, so if it takes one second to mint a valid token, then a single computer could only send out at most 86400 mails in a day, a far cry from the millions it wishes to send out. So, while it doesn't completely eliminate the zombie nets from sending mass emails, it does slow them down considerably.

Yet, for those of us who only send a couple dozen mails per day, it's trivial for our CPU to do the work, and doesn't slow us down any. Further, those receiving our messages can verify the token is valid in a fraction of the time it took to mint it. As the receiver, you set the price of the token you wish to validate as antispam. Again, 20 bits seems sufficient enough for today, but 24 or 28 bits might need to be your standard in the future.

For those of you who are not using Hashcash, you can simply ignore the header, and move on with your life. It won't slow your mail experience down any, and shouldn't get in the way of reading or replying to it.

Mutt Configuration

Now that we have a decent understanding of how the Hashcash protocol works, let's see how we set this up with Mutt. Thankfully, this is rather straight forward. All you need to add to your ~/.muttrc is:

set edit_headers="yes"                 # required
set editor="/path/to/mint_hashcash.py" # required
set askcc="yes"                        # recommended, but not required

This will allow you to edit the mail headers when composing your message with the editor of your choice. The editor is chosen for you in the Python script. I use Vim, so it's hardcoded to that. Feel free to change it to your preferred editor of choice. I should also mention at this point that I have only found a way to automate minting tokens with Mutt. I haven't found a way yet, although I'm working on it, to automate the process of verifying tokens, and storing tokens in a database, with Mutt. Hopefully, a followup post can come to fruition when I figure it out (after all, what's the point of minting tokens if you can't verify others' tokens yourself?).

The script is here:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
#!/usr/bin/env python
# Licensed under the public domain

import fileinput
import rfc822
import subprocess
import sys

subprocess.call("vim %s" % sys.argv[1], shell=True)

file = open(sys.argv[1], 'r')
headers = rfc822.Message(file)

to_emails = headers.getaddrlist("To")
cc_emails = headers.getaddrlist("Cc")

email_addrs = []
tokens = []

# Harvest all email addresses from the header
for email in to_emails:
    email_addrs.append(email[1])

for email in cc_emails:
    email_addrs.append(email[1])

# Remove duplicate emails from the list, requires Python 2.5 and later
email_addrs = list(set(email_addrs))

# Check if an appropriate token is already generated for the mail
if headers.has_key("X-Hashcash"):
    for list in headers.getheaders("X-Hashcash"):
        email_addrs.remove(list.split(":")[3])
if headers.has_key("Hashcash"):
    for list in headers.getheaders("Hashcash"):
        email_addrs.remove(list.split(":")[3])

# Call the hashcash function from the operating system to mint tokens
for email in email_addrs:
    t = subprocess.Popen("hashcash -m %s -X -Z 2" % email, shell=True, stdout=subprocess.PIPE)
    tokens.append(t.stdout.read())

# Write the newly minted tokens to the header
f = fileinput.FileInput(sys.argv[1], inplace=1)
for line in f:
    line = line.strip()
    if f.lineno() == 1:
        for token in tokens:
            print token,
        print line
        continue
    else:
        print line

file.close()

The code is fairly straight forward. Parse the email headers using RFC 822, grab the necessary email addresses, and mint the tokens as necessary. You'll quickly note that "Bcc:" addresses aren't supported, as minting tokens for those addresses would give them away in the header. I could send separate emails for each Bcc recipient, putting their own token in the header, but I'm not motivated enough to write that code, and I rarely, if ever, use blind carbon copy.

I've been running the above code now for 2-3 days, trying to break it as best I can, and I haven't come across anything. If you do notice a bug, or something sloppy in the code, let me know, and I'll make a best attempt at addressing it. However, it seems to be working quite well. The only thing I would mention, is if you have a lot of emails in the "To:" or "Cc:" fields, it may take some time to mint all those tokens. Just let it do its thing. It will get you back to Mutt. I promise.

🙂

{ 6 } Comments