## SCALE 12x PGP Keysigning Party

This year, at SCALE 12x, I'll be hosting the PGP keysigning party. What is a keysigning party, and why should you attend? Maybe this will clear things up.

## What is a keysigning party?

A PGP keysigning party is an event where PGP users meet together to exchange identity information and PGP fingerprints. Typically, at a conference such as SCALE 12x, PGP parties are common. The whole idea of the party is to expand the global "Web of Trust". In reality, however, you attend a keysigning party, because you share interests in cryptography, or you are interested in communicating privately with others. Usually, you can expect 10-20 people to attend keysigning parties at Linux conferences, sometimes more, sometimes less.

## What is the Web of Trust?

The Web of Trust is just a logical mapping of exchanged identities. When at the keysigning party, you will exchange photo identification with each other as wall as PGP fingerprints. You do this for two reasons, and two reasons only: to verify that you have the correct key and to verify that the owner of that key is who they say they are. That's it. When you leave the party, you will sign every key that you have personally identified. By doing so, you bring that key into your own personal Web of Trust. An arrow from you to them indicates that you have signed their key. If they return the favor, and sign your key, then a return arrow from them to you will result.

It's not too difficult to create a personal Web of Trust that you can view. I have blogged about it in the past at https://pthree.org/2013/03/01/create-your-own-graphical-web-of-trust-updated/. It's interesting to see the large groupings of signatures. It's clear that there was a PGP keysigning party in those large groups.

## What do I bring to a keysigning party?

You really only need three things with you when you come to the party:

1. Something to write with, like a pen or pencil.
2. A government issued photo identification at a minimum. Additional identification is appreciated.

That last item is important. Very important. If you didn't bring item #3, then most PGP keysigning party organizers will not allow you to participate. In order to printout your PGP key fingerprint, run the following command at your terminal:

$gpg -K --fingerprint Print that out on a piece of paper, and bring it with you to the party. Some conferences, such as SCALE 12x, will print your PGP fingerprint on your conference badge. This will allow you to sign keys anywhere anytime. All you need to do is verify that the fingerprint on your personal printout matches the fingerprint on the conference badge. Then you may use your conference badge fingerprint at the party. It's important that you bring your own copy of your PGP key fingerprint, however. The keysigning party organizer will handout a printout of all the PGP key fingerprints for everyone in attendance. This is to verify that the organizer downloaded your correct and current key(s). You will read off your fingerprint from your personal printout, and everyone else will verify the fingerprint on their printout. ## What happens before the party? All that needs to be done, is every attendant must submit their public PGP key to the party organizer. Typically, there is a deadline on when keys can be submitted. It's important that you adhere to that deadline. The party organize then prints out on paper a list of every PGP key fingerprint of those who are attending. If you submit your key late, it will not be on the paper, and as such, many party orgasizers will not let you participate. ## What happens at the party? The key party organizer will pass out sheets of paper with every PGP key fingerprint that has been submitted. Typically, party organizers will also explain the importance of the party, why cryptography, and other things related to crypto. Then, he will explain how the party will proceed, at which point every attendee will read their PGP key fingerprint from their own printout. Everyone else in attendance will verify that the organizer has the correct key by following along on the handout. This is done for everyone in attendance. After fingerprints have been verified, we then get into two equal lines, facing each other. While facing someone in the line opposite if you, you introduce yourself, explain where your key is on the handout, and exchange government issued identification. After identification has been exchanged, everyone in both lines takes 1/2 step to their right. This will put you in front of a new person to repeat the process. Those at the ends turn around, facing the opposite direction, and continue shuffling to their right. Think of the whole process as one large conveyor belt. Once you are facing the person you started with, then everyone should have successfully verified everyone else's identity and key. At this point, the party is typically over. ## What happens after the party? This is the most critical step of the whole process, and for some reason, not everyone does it. Now that you have your handout with all the keys printed on it, you need to sign every key that you have personally identified. What you attended was a KEYSIGNING party. This means that you must SIGN KEYS. I know I'm putting a lot of emphasis on this, but I have personally attended close to a dozen PGP keysigning parties, and I would say the rate of signing keys is about 70%, unless I annoy them week in and week out to sign my key, then I'll get a return close to 90%. It blows my mind that people spent a great amount of time at the PGP keysigning party, then don't actually do anything about it. There are a lot of utilities out there for keysigning party events that make attempts at making the whole process easier. In all reality, the only "difficult" or troublesome part about it is, is converting the fingerprints you have on paper to your computer. Some PGP keysigning organizers will already have a party public keyring, that they will email to those who attended, with only the keys of those that attended. If that's the case, you have it easy. Otherwise, you must do something like the following: $ gpg --recv-keys 0x<KEYID>

Where "<KEYID>" is the last 16 characters of the person's fingerprint. After you have their public key, then you can sign it with the following command for each key:

$gpg --default-cert-level 3 --sign-key 0x<KEYID> It's important that you add the "--default-cert-level 3" as part of the signing process. This certification level says that you have done very careful checking, and you are 100% confident that the key in question belongs to the owner, and that you have personally verified the owner's identity. After you have signed the key, it is courtesy to email them their public key with your signature. As such, you will need to export their key from your public keyring. You should do this for each key: $ gpg --armor --output /tmp/0x<KEYID>.asc --export 0x<KEYID>

Attach "/tmp/0x<KEYID>.asc" to your encrypted email, and send it off.

• Should I bring a computer to the keysigning party? No. It's not necessary, and many party organizers consider it a security liability. Things like swapping around USB sticks could infect viruses or other badware. If secret keys are on the computer, it's possible they could be compromised. Even worse, the physical computer itself could get damaged. It's generally best to just leave the computer in your backback or hotel room, and attend the party without it.
• Why should I care about signing PGP keys? Have you ever stopped to think about certificate authorities? When you create an SSL certificate signing request, you ship the CSR off to the CA, along with $200, and they returned a signed key. You then install your key on your web server, and browsers automatically trust data encrypted with it. All because you paid someone money. PGP keys do not have a centralized authority. As such, signing keys is done in an ad hoc manner. Further, money is not exchanged when signing keys. Instead, signing keys is done in person, with face-to-face contact. When you sign a key, you are ultimately saying that you have verified the owner is who they claim to be, and that the key you just signed belongs to them. As a result, a decentralized Web of Trust is built. • What is the Web of Trust really? The PGP Web of Trust is a decentralized web of connected keys, where the connection is made by cryptographically signing keys. The larger and more connected the Web of Trust is, the stronger the trust becomes for people to send private data to each other within that web. It sounds all geeky and mathematical, but if you just sit back and think about it, it makes sense. No money is involved, like using CAs. No blind trust is involved, such as the behavior of browsers. It's you and me, meeting face-to-face, claiming we are who we say we are, and claiming we own the key we have. Now, after this meeting, I can rest assured that if you send me cryptographically signed data from your key, I know it came from you, and no one else. If you send me encrypted data, I have a copy of your public key, and can decrypt it knowing that you were the one encrypting it. The Web of Trust is just that- establishing trust. • What is the PGP Strong Set? The largest and most connected Web of Trust is called the PGP Strong Set. There are more keys in this Web of Trust than any other. The way you get into the PGP Strong Set is by having someone in the Strong Set sign your key. A great deal of analysis has been done on the Strong Set. You can read about that analysis at http://pgp.cs.uu.nl/plot/. You can get key statistics such as mean signature distance (MSD), and calculate the distance from one key in the strong set to another key, such as yours that may not be in the strong set. My key, 0x8086060F is in the Strong Set. If ever I am at a keysigning party, and I sign your key, your key will also be included in the Strong Set. ## Announcing d-note: A Self Destructing Notes Application I'm pleased to announce something I've been working on, on and off, for over a year. Introducing d-note, a self hosted web application with self destructing notes. d-note is written in Python using the Flask web framework. d-note comes from the idea that sending private information across the Internet can be very insecure. Ask yourself- how often you've sent usernames and passwords in chat or email, either across the global Internet, or even inside of your work's Intranet? Maybe you've sent PINs, SSNs, credit cards, or other sensitive information too. Or maybe you haven't, but someone you know has. d-note aims to solve this problem. With d-note, notes are compressed and encrypted with Blowfish on disk, using either a shared key stored on the server, or a private key which is not stored. When a note is created, the sender is given a random URL which will link to the note. That URL can only be viewed once, and when viewed, it is immediately destroyed on the server. If you try to visit the note again, because it doesn't exist, a standard 404 error will be raised. Here are the current features in this release: • Data is compressed with zlib, then encrypted with Blowfish. Never at any point is the note in plaintext on the filesystem. • Form submission is protected by the browser minting a Hashcash token. This will prevent form spam, and POST denial of service attacks. As such, JavaScript must be enabled. • The note can be encrypted with a separate private key. This private key must be securely communicated to the recipient, so they may decrypt the note. The private key is not stored on the server. • The note can also be protected with a duress key, in case someone is trying to coerce the decryption key out of you. The duress key will immediately and silently destroy the note, without decrypting it, and redirect the browser to a standard 404 error. • Notes are destroyed immediately upon viewing. They are destroyed by securely overwriting the note with random data before removing it from the underlying filesystem. • Notes can be shared with mobile devices, by scanning a QR code. This allows you to share the note via SMS, email, instant message, or some other application installed on your mobile device. • Unread notes are automatically and securely destroyed after 30 days. • d-note tries its best at preventing the browser from caching the session. With that said, the back button is still functional. Because the application uses Hashcash to protect the submission form, JavaScript must be enabled to post a note. Don't worry, there is no identifying or tracking software in the default source code. Further, it may take your browser a few seconds to mint a valid token, before the form is submitted (it may even take longer if using a mobile device to create the note). Due to the nature of this application, some best practices and precautions should be made: • The web application MUST be served over HTTPS. • The server administrator hosting the d-note application could have modified the source code to store private keys, or even the notes in plaintext. As such, don't put all of your eggs in one basket. Give the usernames over one channel, and use d-note for the passwords, or vice versa. Even better, host this on your own web server, where you have full control. • Storing the encrypted notes in a ramdisk would be the most secure. However, if the web server stops, unread notes will be lost. There are still some things that need to be done, such as improving the overall look and feel with much needed CSS, and making the site a bit more dynamic with JavaScript. I'm also debating on creating account support, so you can view which notes you've created, and which ones have not been read. In the long term, I'd like to create an API, so you can create notes from other sources, such as the command line, or mobile device apps. However, for the time being, it works, it's feature complete (for a 0.1 release), and it's mostly bug free. If you would like to try a demo of d-note, you can visit https://ae7.st/d/. It's currently using a self-signed certificate. As such, to verify you have the right server, the certificate fingerprints are: MD5 Fingerprint=5A:E1:4E:B6:31:B8:3D:69:B1:D6:C0:A7:6B:46:FE:67 SHA1 Fingerprint=55:89:CD:C0:D4:85:CC:A5:DE:30:11:5D:9C:C9:12:1C:5C:9D:10:C5 SHA256 Fingerprint=12:91:BB:4C:E8:2F:1C:0C:D9:96:AF:4E:1D:8C:F7:B0:A8:07:70:C5:9C:89:B8:94:EE:E2:2A:D6:19:43:17:A4 ## The Drunken Bishop Cipher Is Weak Well, it turns out that my own hand cipher is incredibly weak. When I initially started designing it, using a chessboard felt a lot like an S-box lookup. There has been a great deal of research into S-boxes since the release of DES, and many ciphers today use them. What plagued me from day one, and I should have listened to my own intuition, is my chessboard key remains static throughout the entire operation. Unlike the Solitaire Cipher, by Bruce Schneier, where the cards in the deck are dynamically changing all the time. To get an idea of S-boxes, check out this AES animation (flash), which I think describes AES very well. With ciphers, you need an ingredient of non-linearity in the system. Otherwise, your cipher can fall victim to linear cryptanalysis. Initially, I had thought, incorrectly I might add, that by using the ciphertext as the starting direction of the bishop's walk, I was introducing the non-linearity that I needed. Turns out this isn't the case. Let's use my board from my initial announcement, and the trivial example of "AAAAAAAAAAAAAAA" as my plaintext. Here's my board: As per the algorithm, the bishop starts in "a1", which is valued at "38". Converted to binary, this gives us "100110", which means his travel is "SW" (no move), "NE", then finally "SW". He's back in the corner from which he started. Okay. No problem. Let's continue with the cipher then. Let's setup our worksheet. The character "A" as the value of "0", so my plaintext is:  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +38 __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ 38 Now we take our ciphertext, 38, and use this as the start of our next bishop walk. As you can immediately see, we have a problem. He stays stuck in the corner, and we get the following result:  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +38 38 38 38 38 38 38 38 38 38 38 38 38 38 38 __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ 38 38 38 38 38 38 38 38 38 38 38 38 38 38 38 Our ciphertext is thus "mmmmm mmmmm mmmmm". While this may not seem like a practical example, it draws up a very real problem: this cipher is subject to a chosen plaintext attack. And, despite my best efforts to ensure that there are no repeated rounds in the cipher, here's a trivial example that shows they still exist. As such, the cipher is terribly linear. The BIG PROBLEM with this cipher, and one that was hanging over my head the entire time I was designing it, is that the board key remains static. However, all is not over. Ultimately, the board is just an 8x8 matrix. As such, we should be able to do some matrix operations, such as multiplicitive inverses, exclusive OR, addition, subtraction, and so forth. This might be a major problem when writing the board down on a piece of paper, but this is trivial for a calculator, such as the HP-48 to do. But then we lose the allure of using a pure hand cipher, as we begin relying on computing tools to manage the internal state of the system. At this point, why not just use a typical computer, and use a cipher that has been tried and true, such as AES? I must admit that I was sloppy when designing the cipher. I didn't take my abstract algebra and linear algebra into account. I should have looked at the math when initially designing it. Peter Maxwell, on the Cryptography Discussion mailing list, pointed out the following: If you view the moving-the-bishop as an s-box lookup, and apply it to itself three times (composition), you end up with another s-box of the same size, lets call it S. Given S doesn't change, things should be rather easy indeed. If your cipher is then roughly akin to C[n] = P[n] + S[ C[n-1] ] with all operations taken modulo 2^6 the problem should now be a little more obvious. Indeed. Basically, the cipher is weak due to two inherent problems with the system: 1. The internal state of the system is static. 2. The cipher does not contain a non-linear component. I really want to use a chessboard for a cipher. I think it would be a fantastic tool that you could basically hide in plain sight. But, given the initial design of this cipher, and how weak it really is, it just doesn't work. The drunken bishop may make pretty ASCII art pictures for SSH server keys, but when in comes to cryptography, it's had just too much wine to be practical. I'll hang this one up as a learning exercise, and move on. Designing hand ciphers is much more difficult than I had initially thought. ## The Drunken Bishop Cipher ## Background Ever since learning Bruce Schneier's Solitaire Cipher, I was interested in creating a hand cipher of my own. Unfortunately, I'm just an amateur cryptographer, and a lousy one at that. So I didn't have any confidence in creating my own hand cipher. However, after learning about the SSH ASCII art, and the drunken bishop, I was confident that I could turn this into a hand cipher. So, that's exactly what I did. Even though this is technically a hand cipher, and I've done my best to address some of the shortcomings internal to the system, I am not a genius mathematician. So, I can look at some of the numbers specifically, and give a broad sense of what the shortcomings might be, but I have not invested into a full scale cryptanalysis of the cipher. If someone stumbles upon this post, and is interested in launching such an attack, I would be interested in your feedback. This post is lengthy, so I'll separate everything with <h2> headers for easier visibility (a standard with my blog posts lately, it seems). ## Cipher Design Overview All that is needed is a standard 8x8 checker or chess board, and a marker, such as the bishop chess piece, or a checker, to make its way around the board. Each square on the board will be assigned a unique value from 0 through 63. The board should be assigned randomly, as this choice of number assignments is your key to encrypting and decrypting messages. However, at the end of this post, I'll give a few ideas on how you can key the board reliably without the need to communicate a random board. This cipher is a stream cipher. As with all stream ciphers, the output of n depends entirely on the accuracy of n-1. If n-1 is incorrect, then the rest of the process will be incorrect, and encrypting the plaintext from that point forward will be incorrect, which means decryption will not be possible. Thankfully, the board number assignments are static, so it shouldn't be difficult to double check your work, unlike the Solitaire Cipher, which requires keeping a backup copy of your keyed deck. Because there are 64 squares on the board, this allows us to use a base64 system for encryption and decryption. As such, we can use uppercase letters, lowercase letters, and digits. This will provide 62 of the characters. So, we can throw in white space, and padding at the end of the message, giving us our full 64 characters. One drawback with most hand ciphers is the lack of numbers support in the message. Either you have to spell out the numbers, lengthening the message, and as a result the time to encrypt and decrypt it, or you need to truncate them out, possibly creating confusion for the person decrypting the message. By having uppercase and lowercase letter support, we can now differentiate between proper names and not. ## The Board First, you must arrange the chess board such that square "a1" is in the lower left corner, as would be standard in a tournament chess match. This square should be black. Instead of referring to this corner as "a1", we'll refer to it as the "southwest corner", or just "SW". The other three corners on the board will be identified analogously: the lower right corner, or "h1" will be identified as the "southeast corner, or just "SE". The upper left corner, or "a8" will be identified as the "northwest corner", or "NW". Finally, the upper right corner, or "h8" will be identified as the "northeast corner", or just "NE". Now that we've identified the corners, we need to identify the edges of the board that are not a corner. On the bottom of the board, we'll identify this as edge "B", and the top of the board as edge "T". The left edge of the board will be identified as edge "L", and the right of the board as edge "R". Every additional square that is not identified as an edge or a corner will be identified as a middle or "M". After making these identifications, our board should have the following layout: ## The Bishop's Movement As in standard chess, the bishop may only move diagonal across the board. In standard chess, if the bishop is on a black square, then he will remain on the black square throughout game play. Our bishop is drunk, unfortunately. So, when our bishop encounters an edge; specifically, "B", "T", "L", or "R", then it's possible our bishop might switch space color from black to white, or from white to black. Any other time, our bishop is always moving diagonal as he would normally. So, we need to accommodate for when our bishop hits the wall or a corner, and still wishes to move. Let's look at the bottom edge first. Suppose our bishop traveled to square "e1" which has the value of "44" in our key. If the bishop wishes to move either diagonally NE or NW, that move in unrestricted. However, if the bishop wishes to move SW from square "e1", then it would step onto the white square "d1". If the bishop wishes to move SE from square "e1", then it would step onto the white square "f1". Similar rules hold for the other three edges. In summary then: • If at "B", and wishes to move SW, then the new square is (n-1,1). • If at "B", and wishes to move SE, then the new square is (n+1,1). • If at "T", and wishes to move NW, then the new square is (n-1,8). • If at "T", and wishes to move NE, then the new square is (n+1,8). • If at "L", and wishes to move NW, then the new square is (a,n+1). • If at "L", and wishes to move SW, then the new square is (a,n-1). • If at "R", and wishes to move NE, then the new square is (h,n+1). • If at "R", and wishes to move SE, then the new square is (h,n-1). If any additional movement is needed from an edge, then this means the bishop wishes to move away from the edge towards the middle of the board, and as such, it would do so in a standard diagonal manner, staying on its same color. Now that we've handled the four edges of the board, we need to handle the four corners. The movement is analogous to the edges, except for one move: • If at "SW", and wishes to move SW, no movement is made. • If at "SE", and wishes to move SE, no movement is made. • If at "NW", and wishes to move NW, no movement is made. • If at "NE", and wishes to move NE, no movement is made. If in the corner, and any other movement needs to be made, then use the previous edge rules. Knowing these rules, we can now describe where our drunk bishop moves when he lands on any square on the board. Now, we just need to generate a random board, which will determine the bishop's movement. When generating a random board, all 64 numbers from 0 through 63 must be assigned to a square. Each chessboard key is one of 64 factorial, or about the same size as a 296-bit symmetric key. Below is one such board that could be generated: . ## Generating the stream Because the board is now a static key, and doesn't change like the cards in the Solitaire Cipher, there is the possibility that the bishop could land on the same square at the end of the algorithm that he did at the start of the algorithm. As such, from that point forward, the same number would be generated in our stream, and our message would fall victim to a frequency analysis attack. If this doesn't happen, it is still possible that the bishop could land on a square at the end of his drunken walk that we've landed on before. This means our bishop will be caught in an infinite loop, generating the same sequence of numbers. To accommodate for both of these shortcomings, rather than use the number he is on for the start of his next walk, we will use the addition of the plaintext number and the stream number to produce an output number. This output number will determine the beginnings of his next walk. Each character in the plaintext will be given a value according to section 3 of RFC 3548. The only adjustments that will be made, is whitespace will be filled with the forward slash "/", and we will both pad the message modulo 5, as is standard with hand ciphers, and replace the full stop period with the plus character "+" at the end. The other punctuation and special characters must be stripped from the plaintext, as our base64 system cannot handle them. So, our plaintext message "Attack at dawn." would be converted to "Attack/at/dawn+" before we begin encrypting. ## The Four Movements The bishop always starts on the SW square, or "a1" at the beginning of each message. In order to know which way to travel, each square on the board describes three movements. This is done by converting the decimal number into binary, zero padded up to 6 bits wide. As such, the decimal number "0" will be the binary number "000000". The decimal number 38 will be the binary number "100110", and so forth. We'll call our 6-bit binary number a "word" for this cipher, even though in computer science, you learn that a binary word is 8 bits. We'll take our binary word, and divide it into 3 sections: the first two bits, the middle two bits, and the last two bits. So, for the case of "38", the binary word divided up would be "10" for the first two bits, "01" for the middle two bits, and "10" for the last two bits. There are four combinations that each of these bit pairs could be: "00", "01", "10", or "11". Because the bishop can move diagonally in one of four directions, each of these bit pairs describes the direction for each of his 3 moves. We will describe our movements as follows: • 00- NW • 01- NE • 10- SW • 11- SE So for the number "38", where our bishop starts in our random key that we chose earlier, the bishop would move "10" or SW, which would mean no movement, "01" or NE to square "b2", and finally "10" or SW, back to "a1". So, already we see that we have an inherent problem with the system, in that starting with "38" prevents the bishop from moving around the board. However, we'll add this to our plaintext number, and use our output number as the direction for the next walk. So, we won't be stuck for long. ## The Drunken Bishop Algorithm The algorithm is defined with the following steps: 1. Convert the previous output number to binary, and move the the bishop three spaces as described above based on this output number. 2. Convert the number of the square the bishop landed on to binary, and again move the bishop three spaces as based on this number. 3. Convert the number of the square the bishop landed on to binary, and again move the bishop three spaces as based on this number. 4. The bishop should have made a total of 9 movements. Note the number the bishop has landed on. This is your stream number. 5. Add the stream number to the plaintext number and to the index number modulo 64. This is your output number. Repeat the algorithm as necessary until you have generated all the output numbers to encrypt your message. To decrypt the message, follow the same algorithm above, but instead of addition modulo 64, use subtraction modulo64 to get back to the plaintext. ## Encryption Example For simplicity sake, we will use the chessboard key generated above, and we will encrypt "We confirm the delivery of 4 packages.". First, let's pad it modulo 5. We end up with "We confirm the delivery of 4 packages.++". Now convert all full stop periods to "+" and all spaces to "/". We now have: We/confirm/the/delivery/of/4/packages+++ Now me must convert each character to their decimal equivalent as found in RFC 3548, section 3. Thus, we end up with the numbers "22 30 63 28 40 39 31 34 43 38 63 45 33 30 63 29 30 37 34 47 30 43 50 63 40 31 63 56 63 41 26 28 36 26 32 30 44 62 62 62". Before the bishop starts the drunken walk around the board, let's setup a workspace, so it will be easy to do our modulo 64 addition. The top line will contain my plaintext numbers, the second line will contain our stream number. Both of these numbers will be added modulo 64 to produce our output number. I will be zero padding the decimal numbers as necessary:  22 30 63 28 40 39 31 34 43 38 63 45 33 30 63 29 30 37 34 47 30 43 50 63 40 31 63 56 63 41 26 28 36 26 32 30 44 62 62 62 + __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ Now our bishop is ready to take his random walk across the board. Let's use our key above, so you can follow along. Our bishop always starts at square SW or "a1". This has a value of "38" which is "100110" in binary. So, the bishop moves "SW", "NE", "SW", placing him back on the same square. We do this two more times, and our bishop has not moved. So, we write the number down as our stream number, and add it to 22 modulo 64:  22 30 63 28 40 39 31 34 43 38 63 45 33 30 63 29 30 37 34 47 30 43 50 63 40 31 63 56 63 41 26 28 36 26 32 30 44 62 62 62 + 38 __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ 60 Our output number is 60, so we convert this to binary, and get "111100". So, the bishop moves "SE", "SE", "NW", placing him on square "b2" with a value of "35". We now convert 35 to binary, and get "100011". So, the bishop now moves "SW", "NW", "SE", placing him on square "a2" with a value of "04". We now convert 04 to binary, and get "000100". So, our bishop make the final move of "NW", "NE", "NW", placing him on square "d1" with a value of "26". We write down 26 in our worksheet, add to to 30 modulo 64, to get our output number of "56":  22 30 63 28 40 39 31 34 43 38 63 45 33 30 63 29 30 37 34 47 30 43 50 63 40 31 63 56 63 41 26 28 36 26 32 30 44 62 62 62 + 38 26 __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ 60 56 Now we convert 56 to binary, and get "111000", and work our way through the algorithm getting our third output number. We continue in like fashion until our worksheet is complete:  22 30 63 28 40 39 31 34 43 38 63 45 33 30 63 29 30 37 34 47 30 43 50 63 40 31 63 56 63 41 26 28 36 26 32 30 44 62 62 62 <--- plaintext + 38 26 04 26 09 31 14 13 59 41 59 41 46 14 13 14 13 35 4 38 31 14 13 59 41 13 35 38 50 39 6 48 16 14 27 31 14 59 56 59 <--- final bishop square __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ 60 56 03 54 49 06 45 47 38 15 58 22 15 44 12 43 13 8 38 21 61 57 63 58 17 44 34 30 49 16 32 12 52 40 59 61 58 57 54 57 <--- output number Which in turn gives us the ciphertext: 84D2x GtvmP 6WPsM rrImV 94/6R siexQ gM0q7 96525 ## Decryption Example To decrypt our ciphertext from above, we must first convert the characters back to their RFC 3548 values. We'll refer to this number as our "input number". Our bishop will start in the SW corner, or square "a1" as he did for our encryption example. Also, like with did with encryption, we'll convert the "a1" number to binary for the first walk. After that, we'll use the ciphertext input numbers to determine his path. Lastly, we need to subtract modulo 64, rather than add, to reverse our work. So, as we did with our encryption example, let's setup our workspace:  60 56 03 54 49 06 45 47 38 15 58 22 15 44 12 43 13 8 38 21 61 57 63 58 17 44 34 30 49 16 32 12 52 40 59 61 58 57 54 57 <--- ciphertext number - 38 <--- final bishop square __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ 22 <--- plaintext Now convert "56" to binary, and have it do it's walk, 3 times, just like you would for encryption. You'll find that it ends up on square value "26". Subtract this value from our 2nd ciphertext number, to get back to our plaintext value of "30". Continue in a like manner, converting the ciphertext number to binary, starting the walk, doing two more binary conversions, to land on the right square. Subtract that number modulo 64, and you'll uncover your plaintext. ## Observations You may have noticed that you are always using the ciphertext number, either when encrypting, or decrypting, to start the bishop's initial walk. After which, the bishop makes two more walks around the board, based on the number he landed on. Because we are using the ciphertext number for the initial walk, we need to lengthen his path for a few reasons: 1. When decrypting, the additional walks places the bishop elsewhere in the board, preventing any knowledge of the key. 2. When both encrypting and decrypting, the 3 walks make it possible for the bishop to reach any number on the board, regardless of his location. This prevents our cipher from being focused on a set of squares based on his initial location. 3. By using the ciphertext to start the initial walk, we prevent the possibility of the bishop from getting stuck walking in a circle. A simple example is with the number "38" in the SW corner, that would normally prevent him from making any movement on the board. IE: his ending the location is the same as his starting location. Setting up an 8x8 key grid might be difficult. Certainly not much more difficult than keying a 52-card deck randomly. There may be creative ways to key the board, such as using previously played chess games, using passphrases, or some other method. I haven't had time to think about those yet. If you have good ideas, I'm very interested. At the moment, the best way to key the board, is to use a computer to randomly create a board assignment, and securely communicate the random assignment to your recipient. This cipher is a stream cipher, as already mentioned. As such, it is absolutely critical that you get every movement of the bishop right, and that your mathematics is exact. If you make a mistake, and continue encrypting the message after the mistake, the ciphertext may not be decipherable to plaintext. Double check your movements. Further, as with all symmetric ciphers, DO NOT USE THE SAME KEY TWICE. If the same key is used for two different messages, say C1 and C2, it's simple mathematics to extract the key. Think of it like this: (A+K) = C1 (B+K) = C2 C1-C2 = (A+K)-(B+K) = A+K-B-K = A-B+K-K = A-B In other words, using the same key, reveals the plaintext messages. This might not be trivial for you to pull off, but it is for a serious cryptographer. Don't share the key across messages. Just generate a new key every time you wish to encrypt. Lastly, I have found this cipher a bit faster to encrypt and decrypt messages than the solitaire cipher, but I make no guarantees to its strength. However, this is real cryptography. Not stenography or obscurity. This cipher is a pseudo random number generator, that you apply to your plaintext to produce a random output. I still have work to do to, such as frequency analysis, and discovering if any bias exists, but upon initial inspection, it seems to hold up well. As with all cryptography, however, only time will tell. ## Drunken Bishop Cipher Recommendations 1. Although a chess board can be use, it's not required. If you can draw an 8x8 grid on a piece of paper, populated with the random key, then you are ready to start encrypting and decrypting. 2. Never share a key when encrypting messages. Always use a different key. 3. Use a number 2 pencil, and not ink, when doing your work. Ink bleeds, and can leave traces of your message or work. 4. Use ungummed cigarette paper when actually doing your work. They burn completely, but slowly. 5. Do as much of the work in your head as possible, and no not write on impressionable surfaces, such as a pad of paper. 6. Work with a candle or a lighter. If someone breaks in while you are encrypting or decrypting messages, calmly burn the papers containing your work. 7. Assume that Big Brother is aware that you are using the Drunken Bishop to encrypt and decrypt your messages. The secret lies in the 8x8 grid key, not in the algorithm. Of course, this doesn't mean you need to advertise that you are using the Drunken Bishop either. The ciphertext should appear as random as possible, with no obvious clues as to how it was created. 8. Practice makes perfect. After practicing the Drunken Bishop, you should be able to immediately convert a number from 0 through 63 to binary without any external tools. This will speed things up. 9. Which reminds me, the Drunken Bishop is slow, although not as slow as Solitaire. It will probably take you about 30 seconds per character. Keep that in mind; you may need a quiet, secure place with several hours. However, you shouldn't have cramped hands while working the Drunken Bishop, like you get with Solitaire. ## Disclaimer I am not a professional cryptographer. I study cryptography as a hobby. As such, I do not make any guarantees about the security of this cipher. However, I have studied RC4, as well as stream ciphers in general, and have a thorough understanding of the "big picture" as to what the internals of the cipher should be doing. The Drunken Bishop does its best to follow these practices. I graduated with a degree in Applied Mathematics, and have taken and studied Number Theory, as applied to cryptography. I have not done any cryptanalysis on this cipher yet. My next goal is to write a Python program that can read a generated 8x8 key, read a text file, and encrypt and decrypt it. Only then will I be able to get more insight into the output that the Drunken Bishop provides, such as biases or other internal problems that I have not addressed. If you go ahead and write a utility to do this testing, and find some interesting results, I would be interested in your feedback, and will publish your results here on this post. ## Cryptanalysis Turns out this cipher is incredibly weak. It suffers from two problems that make it fall victim to linear cryptanalysis and a chosen plaintext attack: 1. The chessboard (an S-box) remains static during the algorithm. 2. There is no non-linear component to the system. It turns out, designing hand ciphers is incredibly difficult. However, I wrote a follow-up post describing how weak it really is. ## ZFS Administration, Appendix D- The True Cost Of Deduplication ## Table of Contents Zpool Administration ZFS Administration Appendices 0. Install ZFS on Debian GNU/Linux 9. Copy-on-write A. Visualizing The ZFS Intent Log (ZIL) 1. VDEVs 10. Creating Filesystems B. Using USB Drives 2. RAIDZ 11. Compression and Deduplication C. Why You Should Use ECC RAM 3. The ZFS Intent Log (ZIL) 12. Snapshots and Clones D. The True Cost Of Deduplication 4. The Adjustable Replacement Cache (ARC) 13. Sending and Receiving Filesystems 5. Exporting and Importing Storage Pools 14. ZVOLs 6. Scrub and Resilver 15. iSCSI, NFS and Samba 7. Getting and Setting Properties 16. Getting and Setting Properties 8. Best Practices and Caveats 17. Best Practices and Caveats This post gets filed under the "budget and planning" part of systems administration. When planning out your ZFS storage pool, you will need to make decision about space efficiency, and the cost required to build out that architecture. We've heard over and over that ZFS block deduplication is expensive, and I've even mentioned it on this blog, but how expensive is it really? What are we looking at out of pocket? That's what this post is about. We'll look at it from two perspectives- enterprise hardware and commodity hardware. We should be able to make some decent conclusions after looking into it. We're only going to address storage, not total cost which would include interconnects, board, CPU, etc. Those costs can be so variable, it can make this post rather complicated. So, let's stick with the basics. We're going to define enterprise hardware as 15k SAS drives and SLC SSDs, and we'll define commodity hardware as 7200 SATA drives and MLC SSDs. In both cases, we'll stick with high quality ECC DDR3 RAM modules. We'll use a base ZFS pool of 10TB. So, without further ado, let's begin. ## Determining Disk Needs Before we go off purchasing hardware, we'll need to know what we're looking at for deduplication, and if it's a good fit for our data needs. This can be a hard puzzle to solve, without actually storing all the data in the pool, and seeing when you end up. However, here are a few ideas for coming to that solution (the "three S tests"): 1. Sample Test: Get a good representative sample of your data. You don't need a lot. Maybe 1/5 of the full data. Just something that represents what will actually be stored. This will be the most accurate test, provided you get a good sample- the more, the better. Store that sample on the deduplicated pool, and see where you end up with your dedupratio. 2. Simulation Test: This will be less accurate than the sample test above, but it can still give a good idea of what you'll be looking at. Run the "zfs -S" command, and see where the cards fall. This will take some time, and may stress your pool, so run this command off hours, if you must do it on a production pool. It won't actually deduplicate your data, just simulate it. Here is actual an actual simulation histogram from my personal ZFS production servers: # zdb -S Simulated DDT histogram: bucket allocated referenced ______ ______________________________ ______________________________ refcnt blocks LSIZE PSIZE DSIZE blocks LSIZE PSIZE DSIZE ------ ------ ----- ----- ----- ------ ----- ----- ----- 1 5.23M 629G 484G 486G 5.23M 629G 484G 486G 2 860K 97.4G 86.3G 86.6G 1.85M 215G 190G 190G 4 47.6K 4.18G 3.03G 3.05G 227K 19.7G 14.2G 14.3G 8 11.8K 931M 496M 504M 109K 8.49G 4.25G 4.33G 16 3.89K 306M 64.3M 68.3M 81.8K 6.64G 1.29G 1.37G 32 5.85K 499M 116M 122M 238K 17.9G 4.64G 4.86G 64 1.28K 43.7M 20.0M 21.0M 115K 3.74G 1.69G 1.79G 128 2.60K 50.2M 20.0M 22.0M 501K 9.22G 3.62G 3.99G 256 526 6.61M 3.18M 3.62M 163K 1.94G 946M 1.06G 512 265 3.25M 2.02M 2.19M 203K 2.67G 1.72G 1.86G 1K 134 1.41M 628K 720K 185K 2.13G 912M 1.02G 2K 75 1.16M 188K 244K 222K 3.37G 550M 716M 4K 51 127K 85.5K 125K 254K 657M 450M 650M 8K 2 1K 1K 2.46K 26.7K 13.3M 13.3M 32.8M 16K 1 512 512 1.94K 31.3K 15.6M 15.6M 60.7M Total 6.15M 732G 574G 576G 9.38M 920G 708G 712G dedup = 1.24, compress = 1.30, copies = 1.01, dedup * compress / copies = 1.60 3. Supposed Test: Basically, just guess. It's by far the least accurate of our testing, but you might understand your data better than you think. For example, is this 10TB server going to be a Debian or RPM package repository? If so, the data is likely highly duplicated, and you could probably get close to 3:1 savings, or better. Maybe this server will store a lot of virtual machine images, in which case the base operating system will be greatly duplicated. Again, your ratios could be very high as a result. But, you know what you are planning on storing, and what to expect. Now you'll have a deduplication ratio number. In my case, it's 1.24:1. This number will help us "oversubscribe" our storage. In order to determine how much disk to purchase, our equation should be: Savings = Need - (Need / Ratio) With my ratio of 1.24:1, which is running about a dozen virtual machines, rather than purchasing the full 10TB of disk, we really only need to purchase 8TB of disk. This is a realistic expectation. So, I can save purchasing 2TB worth of storage for this setup. The question then becomes whether or not those savings are worth it. ## Determining RAM Needs Ok, now that we know how much disk to purchase, we now need to determine how much RAM to purchase. Already, we know that the deduplication table (DDT) will occupy no more than 25% of installed RAM, by default. This is adjustable with the kernel module, but we'll stick with default for this post. So, we just need to determine how large that 25% is, so we can understand exactly how much RAM will be needed to safely store the ARC without spilling over to spinning platter disk. In order to get a handle on this metric, we have two options: 1. Counting Blocks: With the "zdb -b" command, you can count the number of currently used blocks in your pool. As with the "zdb -S" command, this will stress your pool, but it will give you the most accurate picture of what to expect with a deduplication table. Below is an actual counting of block on my production servers: # zdb -b pool Traversing all blocks to verify nothing leaked ... No leaks (block sum matches space maps exactly) bp count: 11975124 bp logical: 1023913523200 avg: 85503 bp physical: 765382441472 avg: 63914 compression: 1.34 bp allocated: 780946764288 avg: 65214 compression: 1.31 bp deduped: 0 ref>1: 0 deduplication: 1.00 SPA allocated: 780946764288 used: 39.19% In this case, I have 11975124 used blocks, and my 2 TB pool is 39.19% full, or 784GB. Thus, each block is about 70KB in size. You might see something different. According to Oracle, each deduplicated block will occupy about 320 bytes in RAM. Thus, 2TB divided by 70KB blocks gives a total storage space of about 30,700,000 total blocks. 30,700,000 blocks multiplied by 320 bytes, is 9,824,000,000 bytes, or 9.8GB of RAM for the DDT. Because the DDT is no more than 25% of ARC, and the ARC is typically 25% of RAM, I need at least 156.8GB, or basically 160GB of installed RAM to prevent the DDT from spilling to spinning platter disk. 2. Rule of Thumb: This is our "rule of thumb" rule that you've read in this series, and elsewhere on the Internet. The rule is to assign 5GB of RAM for every 1TB of disk. This ratio comes from the fact that a deduplicated block seems to occupy about 320 bytes of storage in RAM, and your blocks could occupy anwhere between 512 bytes to 128KB, usually averaging about 64KB in size. So, the ratio sits around 1:208, which is where we come up with the "5GB RAM per 1TB disk" metric. So with a 10TB pool, we can expect to need 50GB of RAM for the DDT, or 200GB of RAM for the ARC. In both cases, these RAM installations might just be physically or cost prohibitive. In my servers, the motherboards do not allow for more than 32GB of physically installed RAM modules. So 40GB isn't doable. As such, is deduplication out of the question? Not necessarily. If you have a fast SSD, something capable of 100k IOPS, or roughly the equivalent of your RAM install, then you can let the DDT spill out of RAM onto the L2ARC, and performance will not be impacted. A 256GB SSD is much more practical than 200GB of physical RAM modules, both in terms of physical limitations and cost prohibition. ## Enterprise Hardware Without SSD 15k SAS drives don't come cheap. Currently, the Seagate Cheetah drives go for about$1 per 3GB, or about $330 per 1TB. So, for the 8TB we would be spending on disk, we would be spending about$2600 for disk. We already determined that we need about 200GB of space for the ARC. If we need to fit everything in RAM, and our motherboard will support the install size, then ECC registered RAM goes for about $320 per 16GB (how convenient). I'll need at least 14 sticks of 16GB RAM modules. This would put my RAM cost at about$4480. Thus my total bill for storage only would be $7080. I'm only saving$670 by not purchasing 2 disks to save on deduplication.

With SSD
Rather than purchasing 14 16GB memory modules, we could easily purchase an enterprise 256GB fast SLC SSD for about $500. A 256GB SSD is attractive, because as an L2ARC, it will be storing more than just the DDT, but other cached pages from disk. The SSD could also be partitioned to store the ZIL, acting as a SLOG. So, we'll only need maybe 16GB installed RAM (2x8GB modules for dual channel), which would put our RAM cost at$320, our SSD cost at $500 and our drive cost at$2600, or $3420 for the total setup. This is half of the initial price using only ECC RAM to fit the DDT. That's significant, IMO. Again, I only saved$670 by not purchasing 2 disks.

Without SSD
7200 SATA drives come cheap these days. ZFS was designed with commodity disk in mind, knowing it's full of failures and silent data corruption. I can purchase a single 2TB disk for $80 right now, brand new. Four of those put my total cost at$320. However, the ECC RAM doesn't change, and if I needed 14 of the 16GB sticks as with my enterprise setup, then I can count on my total cost for this commodity setup at $4800. But, a RAM install of that size does not make sense for a "commodity" setup, so let's reduce the RAM footprint, and add an SSD. With SSD A fast commodity SSD puts us at the MLC SSDs. The 256GB Samsung 840 Pro is going for$180 right now, and can sustain 100k IOPS, which could possibly rival your DDR3 RAM. So, again sticking with 4 2TB drives at $320, 16GB of RAM at$320 and our Samsung SSD at $180 our total cost for this setup is$820, only saving $80 by not purchasing an additional 2TB SATA drive. This is by far the most cost effective solution. ## Additional Hidden Costs & SSD Performance Considerations When we made these plans on purchasing RAM, we were only considering the cost of storing the ARC and the DDT. We were not considering that your operating system will still need room outside of the ARC to operate. Most ZFS administrators I know won't give more than 25% of RAM to the ARC, on memory intensive setups, and no more than 50% on less memory intensive setups. So, for our 200GB ARC requirement, it may be as much as 400GB of RAM, or even 800GB. I have yet to administer a server with that sort of RAM install. So, SSDs all of the sudden become MUCH more attractive. If you decide to go the route of an SSD for an L2ARC, you need to make sure that it performs on par with your installed RAM, otherwise you'll see a performance hit when doing lookups in your DDT. It's expected for DDR3 RAM to have a rate of 100k to 150k sustained sequential read/write IOPS. Getting an SSD to perform similarly means getting the high end SATA connected SSDs, or low end PCIe connected, such as the OCZ RevoDrive. However, suppose you don't purchase an SSD that performs equally with your DDR3 modules. Suppose your DDR3 sustains 100k IOPS, but your SSD only does 20k IOPS. That's 5x as slow as DDR3 (spinning 7200 RPM disk only sustains about 100 IOPS). With as frequently as ZFS will be doing DDT lookups, this is a SIGNIFICANT performance hit. So, it's critical that your L2ARC can match the same bandwidth as your RAM. Further, there's a hidden cost with SSDs, and that's reliability. Typical enterprise SLC SSDs can endure about 10k write cycles, with wear leveling, before the chips begin to wear down. However, for commodity, more "consumer grade" SSDs, they will only sustain about 3k-5k write cycles. Don't fool yourself though. For our 256GB SSD, this means you can write 256GB 10,000 times, or 2.56PB worth of data on a SLC SSD, or 256GB 3,000 times, or 768TB on an MLC SSD. That's a lot of writing, assuming again, that the SSDs have wear leveling algorithms on board. But, the SSD may fail early, which means the DDT spilling to disk, and completely killing performance of the pool. By putting a portion of the DDT on the SSD, the L2ARC becomes much more write intensive, as ZFS expands the DDT table for new deduplicated blocks. Without deduplication, the L2ARC is less write intensive, and should be very read intensive. So, by not using deduplication, you can lengthen the life of the SSD. ## Conclusion So, now when you approach the CFO with your hardware quote, with a dedup ratio of 1.24:1, it's going to be a hard sell. With commodity hardware using an SSD, you're getting close to your savings in disk (10.25:1), as compared to enterprise hardware where you're spending much, much more to get close to those space savings (5.10:1). But, with commodity hardware, you're spending 1/4 of the enterprise equivalent. In my opinion, it's still too costly. However, if you can get your ratio close to 2:1, or better, then it may be a good fit. You really need to know your data, and you really need to be able to demonstrate that you will get solid ratios. A storage server of virtual machines might be a good fit, or where you have redundant data that is nearly exactly the same. For general purpose storage, especially with a ratio of 1.24:1, it doesn't seem to be worth it. However, you're not out of luck, if you wish to save disk. For good space savings on your disk, that you get nearly for free, I strongly encourage compression. Compression doesn't tax the CPU, even for heavy workloads, provides similar space savings (in my example above, I am getting a 1.3:1 compression ratio versus 1.24:1 dedup ratio), doesn't require an expensive DDT, and actually provides enhanced performance. The extra performance comes from the fact that highly compressable data does not need to physically write as much data to slow spinning platter, and also does not need to read as much physical disk. Your spinning disk is the slowest bottleneck in your infrastructure, so anything you can do to optimize the reads and writes could provide large gains. Compression wins here. Hopefully, this post helps you analyze your deduplication plans, and identify the necessary costs. ## ZFS Administration, Appendix C- Why You Should Use ECC RAM ## Table of Contents Zpool Administration ZFS Administration Appendices 0. Install ZFS on Debian GNU/Linux 9. Copy-on-write A. Visualizing The ZFS Intent Log (ZIL) 1. VDEVs 10. Creating Filesystems B. Using USB Drives 2. RAIDZ 11. Compression and Deduplication C. Why You Should Use ECC RAM 3. The ZFS Intent Log (ZIL) 12. Snapshots and Clones D. The True Cost Of Deduplication 4. The Adjustable Replacement Cache (ARC) 13. Sending and Receiving Filesystems 5. Exporting and Importing Storage Pools 14. ZVOLs 6. Scrub and Resilver 15. iSCSI, NFS and Samba 7. Getting and Setting Properties 16. Getting and Setting Properties 8. Best Practices and Caveats 17. Best Practices and Caveats ## Introduction With the proliferation of ZFS into FreeBSD, Linux, FreeNAS, Illumos, and many other operating systems, and with the introduction of OpenZFS to unify all the projects under one collective whole, more and more people are beginning to tinker with ZFS in many different situations. Some install it on their main production servers, others install it on large back-end storage arrays, and even yet, some install it on their workstations or laptops. As ZFS grows in popularity, you'll see more and more ZFS installations on commodity hardware, rather than enterprise hardware. As such, you'll see more and more installations of ZFS on hardware that does not support ECC RAM. The question I pose here: Is this a bad idea? If you spend some time searching the Web, you'll find posts over and over on why you should choose ECC RAM for your ZFS install, with great arguments, and for good reason too. In this post, I wish to reiterate those points, and make the case for ECC RAM. Your chain is only as strong as your weakest link, and if that link is non-ECC RAM, you lose everything ZFS developers have worked so hard to achieve on keeping your data from corruption. ## Good RAM vs Bad RAM vs ECC RAM To begin, let's make a clear distinction between "Good RAM" and "Bad RAM" and how that compares to "ECC RAM": • Good RAM- High quality RAM modules with a low failure rate. • Bad RAM- Low quality RAM modules with a high failure rate. • ECC RAM- RAM modules with error correcting capabilities. "Bad RAM" isn't necessarily non-ECC RAM. I've deployed bad ECC RAM in the past, where even though they are error correcting, they fail frequently, and need to be replaced. Further, ECC RAM isn't necessarily "Good RAM". I've deployed non-ECC RAM that has been in production for years, and have yet to see a corrupted file due to not having error correction in the hardware. The point is, you can have exceptional non-ECC "Good RAM" that will never fail you, and you can have horrid ECC "Bad RAM" that still creates data corruption. What you need to realize is the rate of failure. An ECC RAM module can fail just as frequently as a non-ECC module of the same build quality. Hopefully, the failure rate is such that ECC can fix the errors it detects, and still function without data corruption. But just to beat a dead horse dead, ECC RAM and hardware failure rates are disjointed. Just because it's ECC RAM does not mean that the hardware fails less frequently. All it means is that it detects the failures, and attempts to correct them. ## ECC RAM Failure rates are hard to get a handle on. If you read the Wikipedia article on ECC RAM, it mentions a couple studies that have been attempted to get a handle on how often bit errors occur in DIMM modules: Work published between 2007 and 2009 showed widely varying error rates with over 7 orders of magnitude difference, ranging from 10^(−10) [to] 10^(−17) error/bit-h[ours], roughly one bit error, per hour, per gigabyte of memory to one bit error, per millennium, per gigabyte of memory. A very large-scale study based on Google's very large number of servers was presented at the SIGMETRICS/Performance’09 conference. The actual error rate found was several orders of magnitude higher than previous small-scale or laboratory studies, with 25,000 to 70,000 errors per billion device hours per megabit (about 2.5^(–7) x 10^(−11) error/bit-h[ours])(i.e. about 5 single bit errors in 8 Gigabytes of RAM per hour using the top-end error rate), and more than 8% of DIMM memory modules affected by errors per year. So roughly, from what Google was seeing in their datacenters, 5 bit errors in 8 GB of RAM per hour in 8% of their installed RAM. If you don't think this is significant, you're fooling yourself. Most of these bit errors are caused by background radiation affecting the installed DIMMs, due to neutrons from cosmic rays. But voltage fluctuations, bad circuitry, and just poor build quality can also come in as factors to "bit flips" in your RAM. ECC RAM works by detecting this bad bit by using an extra parity bit per byte. In other words, for every 8 bits, there is a 9th parity bit which operates as the checksum for the previous 8. So, for a DIMM module registering itself as 64 GB to the system, there is actually 72 GB physically installed on the chip to give space for parity. However, it's important to note that ECC RAM can only correct 1 bit flip per byte (8 bits). If you have 2 bit flips per byte, ECC RAM will not be able to recover the data. ZFS was designed to detect silent data errors that happen due to hardware and other factors. ZFS checksums your data from top to bottom to ensure that you do not have data corruption. If you've read this series from the beginning, you'll know how ZFS is architected, and how data integrity is first priority for ZFS. People who use ZFS use it because they cannot stand data corruption anywhere in their filesystem, at any time. However, if your RAM is not ECC RAM, then you do not have the guarantee that your file is not corrupt when stored to disk. If the file was corrupted in RAM, due to a frozen bit in RAM, then when stored to ZFS, it will get checksummed with this bad bit, as ZFS will assume the data it is receiving is good. As such, you'll have corrupted data in your ZFS dataset, and it will be checksummed corrupted, with no way to fix the error internally. This is bad. ## A scenario To drive the point home further about ECC RAM in ZFS, let's create a scenario. Let's suppose that you are not using ECC RAM. Maybe this is installed on your workstation or laptop, because you like the ZFS userspace tools, and you like the idea behind ZFS. So, you want to use it locally. However, let's assume that you have non-ECC "Bad RAM" as defined above. For whatever reason, you have a "frozen bit" in one of your modules. The DIMM is only storing a "0" or a "1" in a specific location. Let's say it always reports a "0" due to the hardware failure, no matter what should be written there. To keep things simple, we'll look at 8 bits, or 1 byte in our example. I'll show the bad bit with a red "0". Your application wishes to write "11001011", but due to your Bad RAM, you end up with "11000011". As a result, "11000011" is sent to ZFS to be stored. ZFS adds a checksum to "11000011" and stores it in the pool. You have data corruption, and ZFS doesn't know any different. ZFS assumes that the data coming out of RAM is intentional, so parity and checksums are calculated based on that result. But what happens when you read the data off disk and store it back in your faulty non-ECC RAM? Things get ugly at this point. So, you read back "11000011" to RAM. However, it's stored in _almost_ the same position before it was sent to disk. Assume it is stored only 4 bits later. Then, you get back "01000011". Not only was your file corrupt on disk, but you've made things worse by storing them back into RAM where the faulty hardware is. But, ZFS is designed to correct this, right? So, we can fix the bad bit back to "11000011", but the problem is that the data is still corrupted! Things go downhill from here. Because this is a physical hardware failure, we actually can't set that first bit to "1". So, any attempt at doing so, will immediately revert it back to "0". So, while the data is stored in our faulty non-ECC RAM, the byte will remain as "01000011". Now, suppose we're ready to flush the data in RAM to disk, we've compounded our errors by storing "01000011" on platter. ZFS calculates a new checksum based on the newly corrupted data, again assuming our DIMM modules are telling us the truth, and we've further corrupted our data. As you can see, the more we read and write data to and from non-ECC RAM, the more we have a chance of corrupting data on the filesystem. ZFS was designed to protect us against this, but our chain is only as strong as the weakest link, which in this case is non-ECC RAM corrupting our data. You might think that backups can save you. If you have a non-corrupted file you can restore from, great. However, if your ZFS snapshot, or rsync(1) copied over the corrupted bit to your backups, then you're sunk. And ZFS scrubbing won't help you here either. As already mentioned, you stored corrupted data on disk correctly, meaning the checksum for the corrupted byte is correct. However, if you have additional bad bits in RAM, scrubbing will actually try to "fix" the bad bits. But, because it's a hardware failure, causing a frozen bit, the scrub will continue, and continue, thrashing your pool. So, scrubbing will actually bring performance to its knees, trying to fix that one bad bit in RAM. Further, scrubbing won't fix bad bits already stored in your pool that have been properly checksummed. No matter how you slice and dice it, you trusted your non-ECC RAM, and your trusty RAM failed you, with no recourse to fall back on. ## ECC Pricing Due to the extra hardware on the DIMM, ECC RAM is certainly more costly than their non-ECC counterparts, but not by much. In fact, because ECC DIMMs have 9/8 additional more hardware, the price pretty closely reflects that. In my experience, 64 GB of ECC RAM is roughly 9/8 more costly than 64 GB of non-ECC RAM. Many general purpose motherboards will support unbuffered ECC RAM also, although you should choose a motherboard that supports active ECC scrubbing, to keep bit corruption minimized. You can get high quality ECC DDR3 SDRAM off of Newegg for about$50 per 4 GB. Non-ECC DDR3 SDRAM retails for almost exactly the same price. To me, it just seems obvious. All you need is a motherboard supporting it, and Supermicro motherboards supporting ECC RAM can also be relatively inexpensive. I know this is subjective, but I recently built a two-node KVM hypervisor shared storage cluster with 32 GB of registered ECC RAM in each box with Tyan motherboards. Total for all 32 GB was certainly more costly than everything else in the system, but I was able to get them at ~ $150 per 16 GB, or$600 for all 64 GB total. The boards were ~$250 each, or$500 total for two boards. So, it total, for two very beefy servers, I spent ~1100, minus CPU, disk, etc. To me, this is a small investment to ensure data integrity, and I would not have saved much going the non-ECC route. The very small extra investment was very much worth it, to make sure I have data integrity from top-to-bottom. ## Conclusion ZFS was built from the ground up with parity, mirroring, checksums and other mechanisms to protect your data. If a checksum fails, ZFS can make attempts at loading good data based on redundancy in the pool, and fix the corrupted bit. But ZFS is assuming that a correct checksum means the bits were correct before the checksum was applied. This is where ECC RAM is so critical. ECC RAM can greatly reduce the risk that your bits are not correct before they get stored into the pool. So, some lessons you should take away from this article: • ZFS checksums assume the data is correct from RAM. • Regular ZFS scrubs will greatly reduce the risk of corrupted bits, but can be your worst enemy with non-ECC RAM hardware failures. • Backups are only as good as the data they store. If the backup is corrupted, it's not a backup. • ZFS parity data, checksums, and physical data all need to match. When they don't, repairs start taking place. If it is corrupted out the gate, due to non-ECC RAM, why are you using ZFS again? Thanks to "cyberjock" on the FreeBSD forums for inspiring this post. ## Hello Morse Code As many of you may know, I am a licensed Amateur Radio operator in the United States. Recently, I've taken up a desire to learn Morse Code at a full 25 WPM using the Koch method. I only started last week, and tonight I copied my first beginners code "A NOTE TO TENNESSEE", and other such silliness. I don't know how many of my readers are hams, and how many of them know their CW. Some equipment that I'm practicing with: • Morse Code Trainer- An Android application for both sending (tapping the screen) and receiving. It's flexible in that you can choose what to listen to, your speed, as well as your tone frequency and volume. Currently, I'm using it to largely just receive. • MFJ-557 code oscillator with key. This is very much a beginners straight key, but it comes with its own speaker, so you can hear how you sound when you transmit. • Morse Code Reader- Another Android application, this time for using the MIC input to listen to outside noise, and translate that to letters. I've found it to be somewhat unreliable, even in a quiet room, with only code to be heard. With that said, during the beginning stages, it seems to be more reliable than me on what it picks up and translates. So, it's good to look back, and see what I got wrong, and where I need improvement. I'm hoping by the end of the year, I can copy code at a full 25 WPM with 90% or better accuracy. I'm not actually on planning to work on transmitting and spacing until next year. ## Goodbye Ubuntu In 1999, I discovered GNU/Linux. Before then, I was a Solaris fanboy. Solaris could do no wrong, and it even took until about 2003 before I finally made the plunge, and removed Solaris off my Sun Ultra 1 (complete with 21" CRT monitor), and put Debian GNU/Linux on it. It was either Debian, or Gentoo that had SPARC support, and compiling software from source didn't sound like a lot of fun. I also had an HP laptop. It ran SUSE, Red Hat Linux, and various other distros, until it too settled on Debian. Then, in October of 2004, while at a local LUG meeting, I learned of this Debian fork called "Ubuntu". I gave it a try. I switched from using Debian to Ubuntu on my laptop. I liked the prospects of using something that had more frequent stable releases. After which, I helped setup the Ubuntu Utah users group. We had install fests, meetings, and other activities. Our group grew fast and strong. Then, I helped to start the Ubuntu US Teams project, getting local state and regional groups, strong like the Utah group was. Eventually, I applied for Ubuntu membership, and in 2006, I got my membership, syndicated my blog to the Ubuntu Planet, and I have been here since. Sometime around 2008, things started changing in the Ubuntu culture, and it was becoming difficult to enjoy working on it. I'm not going to list everything that Canonical has done to Ubuntu, but it's been steady. Not committing patches upstream to Linux mainline. Breaking ties with the Debian project, including rolling their own packages. Group development moved to centralized development. Copyright assignments. Switching from GNOME to Unity. Then Unity lenses and Amazon advertising. Over and over, things began changing, and as a result, the culture changed. I stopped really loving Ubuntu. Eventually, I went back to Debian for my servers, laptops and workstations. Ubuntu isn't Unix anymore. It's Apple, and I'm not sure I like the change. Now, Micah Flee, who works for the EFF, put up a "sucks site" showing how to disable the privacy violations in Unity. Rather than take it in stride, Canonical has decided to abuse trademark law, and issue a cease and desist notice of the Fix Ubuntu site. United States courts have shown over and over than "sucks sites" are free speech, fair use, and do not infringe on the company mark. In fact, no where on the Fix Ubuntu site is the actual Ubuntu trademark. No logo, no marks, nothing. Just text. Yet, Canonical wants to silence their critics using a heavy hand. To be fair, their notice is less grumpy and bullying than most cease and desist notices. However, it doesn't change the principle. I can't be associated with a project like this any longer. Effective immediately, my blog will no longer on the Ubuntu Planet. My Ubuntu Membership will be cancelled. My "UBUNTU" license plates, which have been on my car since August 2006, will be removed, in favor of my Amateur Radio callsign. I wish everyone in the Ubuntu community the best of wishes. I also hope you have the power to change Ubuntu back to what it used to be. I have no ill feelings towards any person in the Ubuntu community. I just wish to now distance myself from Ubuntu, and no longer be associated with the project. Canonical's goals and visions do not align with something I think should be a Unix. Don't worry though- I'll keep blogging. You can't get that out of my blood. Ubuntu just isn't for me any longer. Goodbye Ubuntu. ## Real Life NTP I've been spending a good amount of my spare time recently configuring NTP, reading the documentation, setting up both a stratum 1 and stratum 2 NTP server, and in general, just playing around with NTP. This post is meant to be a set of notes of what I've learned in the process, and hopefully, it can benefit you. It's not meant to be an exhaustive, or authoritative set of instructions on how you should configure your own NTP installation. Strata Before getting into the client configuration, we need to understand how NTP serves time to clients. We need to understand the concept of "strata" or "stratum". An authoritative time source, such as GPS satellites, cesium atomic fountains, WWVB radio waves, and so forth, are referred to as "stratum 0" clocks. They are authoritative, because they have some way of maintaining extremely accurate timekeeping. Any time source will suffice, including a standard quartz oscillating clock. However, knowing that quartz based clocks can gain or lose up to 15 seconds per month, we don't generally use them as time sources. Instead, we're interested in time sources that don't gain or lose a second in 300,000 years, as an example. Computers that connect to these accurate time sources to set their local time are referred to as "stratum 1" time sources. Because there is some inherent latencies involved with connecting to the stratum 0 time source, and the latencies involved with setting the time, as well as the drift that the stratum 1 clocks will exhibit, these stratum 1 computers may not be as accurate as their stratum 0 neighbors. In real life, the clocks on good stratum 1 computers will probably drift enough that their time will be off by a couple microseconds, compared to the stratum 0 source that their are getting their time. Computers that connect to stratum 1 computers to synchronize their clocks are referred to as "stratum 2" time sources. Again, due to many latencies involved, stratum 2 clocks may not be as accurate as their stratum 1 neighbors, and even worse compared to the further upstream stratum 0 time sources. In practice, your stratum 2 server will probably be off from its stratum 1 upstream server by anywhere from a few microseconds to a few milliseconds. Many factors come into play in how this is calculated, but realize that stratum 2 computers, in practice, are probably the furthest time source from stratum 0 that you want to synchronize your clocks with. As you would expect, stratum 3 clocks are connected upstream to stratum 2 clocks. Stratum 4 clocks are connected upstream to stratum 3 clocks, and so forth. Once you reach the lowest level of stratum 16, the clock is now considered to be unsynchronized. So again, in practice, you probably don't want to sync your computers clock with any strata lower than 2, thus making your computer a stratum 3. At this point, you're far enough away from the "true time" source, that your computer could exhibit time offsets anywhere from a few milliseconds to several hundred milliseconds. If your clock is off by 1000 seconds, NTP will refuse to synchronize your clock, and it will require manual intervention. If the upstream stratum from which you are synchronizing your clock is off by 1000 milliseconds, or 1 full second, that time source will not be used in synchronizing your clock, and others will be picked instead (this is to help weed out bad time sources). Client Debian, Ubuntu, Fedora, CentOS, and most operating system vendors, don't package NTP into client and server packages separately. When you install NTP, you've made your computer both a server, and a client simultaneously. If you don't want to serve NTP to the network, then don't open the port in your firewall. In this section, we'll assume that you're not going to use NTP as a server, but wish to use it as a client instead. I'm not going to cover everything in the /etc/ntp.conf configuration file, which is generally the standard installation path. However, there are a few things I do want to cover. First, the "server" lines. You can have multiple server lines in for configuration file. NTP will actively use up to 10. However, how many do you add? Consider the following: 1. If you only have one server configured, and that server begins to drift, then you will blindly follow the drift. If that server consistently gained 5 seconds every month, so would you. 2. If you only have two servers configured, then both will be automatically assigned as "false tickers" by NTP. If one of the servers began to drift, NTP would not be able to tell which upstream server is correct, as there would not be a quorum. 3. If you have three or more servers configured, then you can support "false tickers", and still have an agreement on the exact time. If you have five or six servers, then you can support two false tickers. If you have seven or eight servers, you can support three false tickers, and if you have nine or ten servers configured, then you can support up to four false tickers. NTP Pool Project As a client, rather than pointing your servers to static IP addresses, you may want to consider using the NTP pool project. Various people all over the world have donated their stratum 1 and stratum 2 servers to the pool, Microsoft, XMission, and even myself have offered their servers to the project. As such, clients can point their NTP configuration to the pool, which will round robin and load balance which server you will be connecting to. There are a number of different domains that you can use for the round robin. For example, if you live in the United States, you could use: • 0.us.pool.ntp.org • 1.us.pool.ntp.org • 2.us.pool.ntp.org • 3.us.pool.ntp.org There are round robin domains for each continent, minus Antarctica, and for many countries in each of those continents. There are also round robin servers for projects, such as Ubuntu and Debian: • 0.debian.pool.ntp.org • 1.debian.pool.ntp.org • 2.debian.pool.ntp.org • 3.debian.pool.ntp.org ntpq(1) NTP ships with a good client utility for querying NTP; it's the ntpq(1) utility. However, understanding the output of this utility, as well as its many subcommands, can be daunting. I'll let you read its manpage and documentation online. I do want to discuss its peering output in this blog post though. On my public NTP stratum 2 server, I run the following command to see its status:  ntpq -pn
remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*198.60.22.240   .GPS.            1 u  912 1024  377    0.488   -0.016   0.098
+199.104.120.73  .GPS.            1 u   88 1024  377    0.966    0.014   1.379
-155.98.64.225   .GPS.            1 u   74 1024  377    2.782    0.296   0.158
-137.190.2.4     .GPS.            1 u 1020 1024  377    5.248    0.194   0.371
-131.188.3.221   .DCFp.           1 u  952 1024  377  147.806   -3.160   0.198
-217.34.142.19   .LFa.            1 u  885 1024  377  161.499   -8.044   5.839
-184.22.153.11   .WWVB.           1 u  167 1024  377   65.175   -8.151   0.131
+216.218.192.202 .CDMA.           1 u   66 1024  377   39.293    0.003   0.121
-64.147.116.229  .ACTS.           1 u   62 1024  377   16.606    4.206   0.216

We need to understand each of the columns, so we understand what this is saying:

• remote- The remote server you wish to synchronize your clock with
• refid- The upstream stratum to the remote server. For stratum 1 servers, this will be the stratum 0 source.
• st- The stratum level, 0 through 16.
• t- The type of connection. Can be "u" for unicast or manycast, "b" for broadcast or multicast, "l" for local reference clock, "s" for symmetric peer, "A" for a manycast server, "B" for a broadcast server, or "M" for a multicast server
• when- The last time when the server was queried for the time. Default is seconds, or "m" will be displayed for minutes, "h" for hours and "d" for days.
• poll- How often the server is queried for the time, with a minimum of 16 seconds to a maximum of 36 hours. It's also displayed as a value from a power of two. Typically, it's between 64 seconds and 1024 seconds.
• reach- This is an 8-bit left shift octal value that shows the success and failure rate of communicating with the remote server. Success means the bit is set, failure means the bit is not set. 377 is the highest value.
• delay- This value is displayed in milliseconds, and shows the round trip time (RTT) of your computer communicating with the remote server.
• offset- This value is displayed in milliseconds, using root mean squares, and shows how far off your clock is from the reported time the server gave you. It can be positive or negative.
• jitter- This number is an absolute value in milliseconds, showing the root mean squared deviation of your offsets.

Next to the remote server, you'll notice a single character. This character is referred to as the "tally code", and indicates whether or not NTP is or will be using that remote server in order to synchronize your clock. Here are the possible values:

• " " Discarded as not valid. Could be that you cannot communicate with the remote machine (it's not online), this time source is a ".LOCL." refid time source, it's a high stratum server, or the remote server is using this computer as an NTP server.
• "x" Discarded by the intersection algorithm.
• "." Discarded by table overflow (not used).
• "-" Discarded by the cluster algorithm.
• "+" Included in the combine algorithm. This is a good candidate if the current server we are synchronizing with is discarded for any reason.
• "#" Good remote server to be used as an alternative backup. This is only shown if you have more than 10 remote servers.
• "*" The current system peer. The computer is using this remote server as its time source to synchronize the clock
• "o" Pulse per second (PPS) peer. This is generally used with GPS time sources, although any time source delivering a PPS will do. This tally code and the previous tally code "*" will not be displayed simultaneously.

Lastly, in understanding the output, we need to understand the what is being used as a reference clock in the "refid" column.

• IP address- The IP address of the remote peer or server.
• .ACST.- NTP manycast server.
• .ACTS.- Automated Computer Time Service clock reference from the American National Institute of Standards and Technology.
• .AUTH.- Authentication error.
• .AUTO.- Autokey sequence error.
• .CRYPT.- Autokey protocol error
• .DCFx.- LF radio receiver from station DCF77 operating out of Mainflingen, Germany.
• .GAL.- European Galileo satellite receiver.
• .GOES.- American Geostationary Operational Environmental Satellite receiver.
• .GPS.- American Global Positioning System receiver.
• .HBG.- LF radio receiver from station HBG operating out of Prangins, Switzerland.
• .INIT.- Peer association initialized.
• .IRIG.- Inter Range Instrumentation Group time code.
• .JJY.- LF radio receiver from station JJY operating out of Mount Otakadoya, near Fukushima, and also on Mount Hagane, located on Kyushu Island, Japan.
• .LOCL.- The local clock on the host.
• .MCST.- NTP multicast server.
• .MSF.- National clock reference from Anthorn Radio Station near Anthorn, Cumbria.
• .NIST.- American National Institute of Standards and Technology clock reference.
• .PPS.- Pulse per second clock discipline.
• .PTB.- Physikalisch-Technische Bundesanstalt clock reference operating out of Brunswick and Berlin, Germany.
• .RATE.- NTP polling rate exceeded.
• .STEP.- NTP step time change. The offset is less than 1000 millisecends but more than 125 milliseconds.
• .TDF.- LF radio receiver from station TéléDiffusion de France operating out of Allouis, France.
• .TIME.- NTP association timeout.
• .USNO.- United States Naval Observatory clock reference.
• .WWVH.- HF radio receiver from station WWVH operating out of Kekaha, on the island of Kauai in the state of Hawaii, United States.

Client Best Practice
There seem to be a couple long standing myths out there about NTP configuration. The first is that you should only use stratum 1 NTP servers, because they are closest to the true time source. Well, this isn't always the case. Connecting to stratum 1 time servers that have high RTT latencies could exhibit large jitter and large offsets. Rather, you should find stratum 1 servers that are physically close to your client. Also, many stratum 1 servers might be overloaded, and finding less stressed stratum 2 servers might deliver more accurate results.

The other myth out there is that you should only connect to physically close NTP servers. This isn't necessarily true either. If the closest NTP servers to you only have one physical link, and that link goes down, you're sunk. Further, if the closest NTP servers to you are stratum 4 or 5 servers, you may exhibit high offsets from the upstream stratum 0 sources. There is a reason why the NTP Pool Project only lists public stratum 1 and stratum 2 servers, and there's a reason why stratum 16 is considered unsynchronized.

Point is, there is a balance in configuring NTP. If you have a large infrastructure, it would make sense for you to build and install a stratum 1 or stratum 2 source at each logically different location (geographically or VLAN'd), and have each server and workstation connect to that logically local NTP server. If it's just your personal computer, then it probably makes sense to just use the NTP Pool Project, and use the round robin domain names. You should keep efficiency and redundancy in mind.

So, you should probably consider the following best practices when configuring your NTP client:

• Use at least 3 servers, and don't statically use busy servers.
• Consider using the NTP pool project, if you will be operating as a client only.
• If statically setting IP addresses for your servers, try to keep the following in mind:
• Use servers that are physically close to your computer. These servers should have low ping latencies.
• Use servers that are geographically separated across the globe. Just in case the trans-Atlantic cable is cut, you can still communicate to other servers.
• Use servers that use different time sources. If all of your servers use GPS as their time source, and GPS goes offline, you will not have anything to synchronize your clocks against.
• Consider using all 3 of the above on a single client.

## Open Letter To All GNU/Linux and Unix Operating System Vendors

This is an open letter to all GNU/Linux and Unix operating system vendors.

Please provide some sort of RSS or Atom feed for just new releases. Nothing else. No package updates. No "community" posts. No extra fluff. It shouldn't include news about being included in the Google Summer of Code. It shouldn't provide a list of package security advisories. It shouldn't include why you think dropping one package for a fork is a good idea.

Did you just release "H4x0rz Linux 13.37"? Great! Publish that release news to a central spot, where only releases are posted, and give me a feed to parse. Still confused? Let me give you an example:

Perfect. Includes alpha releases, which are fine. But it focuses only on the new releases. No community news; that's what planets are for. No package updaets; I can figure those out in the OS itself. Just releases.

Here's a list of vendors that I would like to put in my feed reader, that I cannot find any such centralized feed source:

• CentOS
• Debian
• Fedora
• FreeBSD
• Linux Mint
• OpenBSD
• OpenSUSE
• Scientific Linux
• Slackware

I know some projects have web forums, of which there may be a subforum dedicated to releases only. If that forum provides an RSS feed, perfect. I know some mailing list managers also provide RSS feeds for archives. That works too. I don't care where it comes from, just so long as there is a reliable source where I can get reliable, up-to-date news on just the latest release, nothing else.

Thanks!

I just recently acquired a Raspberry Pi at SAINTCON 2013. I already had one, and forgot how much fun these little computers can be. I also forgot what a PITA they can be if you don't have your house hard wired to your switch for Internet access, and have to go into the basement to plug in. Plugging into a monitor and keyboard isn't a big deal for me, it's just the inconvenience of getting to the Internet. So, I downloaded Raspbian, ran through the initial config, including setting up an SSH server. The only thing left to do is get it online, and that will take a little config, which this post is about.

My laptop is connected wirelessly, so my ethernet port is available. So, I should be able to plug the Raspberry Pi into the laptop, and have it use the laptop's wireless connection. In other words, using my laptop as a router and a gateway. So, let's get started. Below is an image of what I am trying to accomplish:

The Raspberry Pi needs to be connected to the laptop via a standard twisted pair ethernet cable. The laptop will be connecting to the Internet wirelessly. So, while I still had my Raspberry Pi connected to the monitor and keyboard, while it is offline, I edit the /etc/network/interfaces file as follows:

# Raspberry Pi
iface eth0 inet static
gateway 172.16.1.1

Then, on my laptop, I gave my ethernet port the address of "172.16.1.1" (mostly because no one ever uses this network- so it shouldn't conflict with your home/office). My laptop must be the gateway to the Internet for the Raspberry Pi. Notice that my laptop does not need a gateway for this interface. Instead, it's going to masquerade off of the "wlan0" interface, which already has a gateway configured:

# Laptop
iface eth0 inet static
netmask 255.255.255.252

Now, I need to make my laptop a router, so it can route packets from one network (172.16.1.0/30) to another (whatever the "wlan0" interface is connected to). As such, run the following command as root:

echo 1 > /proc/sys/net/ipv4/ip_forward

Now, that this point, the "eth0" and "wlan0" interfaces are logically disconnected. Any packets coming into the "eth0" device won't make it any further. So, we need to create a logical pairing, called a "masquerade". This will allow packets going in "eth0" to exit "wlan0", and vice versa. So, as root, pull up a terminal, and type the following:

iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
iptables -A FORWARD -i eth0 -j ACCEPT

If you have any firewall rules in your INPUT chain, you will need to open up access for the 172.16.1.0/30 network.

At this point, plug your Raspberry Pi into your laptop, SSH into the Pi, and see if you can ping out to the Internet.

## NTP Drift File

Many things about NTP are elusive. At the casual user, there are a lot of things to understand: broacast, unicast, multicast, tally codes, servers, peers, stratum, delay, offset, jitter and so much more. Unless you setup your own NTP server, with the intent of providing accurate time keeping for clients, many of those terms can be discarded. However, one term you may want to be familiar with is "drift".

Clock drift is when clocks are either too fast or too slow compared to a reference clock. NTP version 4 has the ability to keep clocks accurate within 233 picoseconds (called "resolution"). Of course, to have this sort of accuracy, you need exceptionally low latency networks with specialized hardware. High volume stock exchanges might keep time accuracy at this level. Generally speaking, for the average NTP server and client on the Internet, comparing time in milliseconds is usually sufficient.

So, where does NTP keep track of the clock drift? For Debian/Ubuntu, you will find this in the /var/lib/ntp/ntp.drift file. In the file, you'll find either a positive or negative number. If it's positive, your clock is fast; if it's negative, your clock is slow. This number however, is not measured in seconds, milliseconds, nanoseconds or picoseconds. Instead, the number is measuring "parts per million", or PPM. It's still related to time, and you can convert this number to seconds, which I'll show you here.

There are 86,400 seconds in one day. If I were to divide that number into one million pieces, then there would be .0864 seconds per piece, or 86.4 milliseconds per piece.

My laptop connects to the standard NTP pool (0.us.pool.ntp.org, etc). I have a number of "3.322" in my drift file. This means that my laptop is fast by 3.322 PPM compared to the time source I am synchronizing my clock with (called the "sys_peer"). If I wanted to convert that to seconds, then:

My laptop is fast by roughly 287 milliseconds compared to my "sys_peer".

I just recently announced an open access NTP server. It was critical for me that this server be as accurate as possible with time keeping. So, all of the stratum 1 time servers that it connected to, had to have a ping latency of less than 10 milliseconds. Thankfully, I was able to find 3 servers with latencies less than 6 milliseconds, one of which is only 500 nanoseconds away. This became the preferred "sys_peer". The contents of its drift file currently is "-0.059". Again, converting this to seconds:

My NTP server is slow by roughly 5 milliseconds compared to the "sys_peer" time source at that specific moment.

Hopefully this clears up the NTP drift file, which I'm sure many of you have noticed. If you connect to NTP servers with very low latencies, then you'll notice that your drift file number approach zero. It's probably best to find 3 or 5 NTP servers that are physically close to you to keep those latencies low. If you travel a lot with your laptop, then connecting to the NTP pool would probably be best, so you don't need to constantly change the servers you're connecting to.

## New Public NTP Server

I just assembled a public access NTP stratum 2 server. Feel free to use it, if you wish. It is considered "Open Access". It has a public webpage at http://jikan.ae7.st. This stratum 2 server has a few advantages over some others online:

• It connects to three stratum 1 GPS time-sourced servers.
• Each stratum 1 server is less than 6 milliseconds away.
• The preferred stratum 1 server is about .5 milliseconds away.
• Stratum 2 peering available- just contact me.
• It has a 100 Mbit connection to the Internet.
• The ISP sits behind four redundant upstream transit providers.
• The ISP also peers on the Seattle Internet Exchange.

It is also available in the NTP pool at http://www.pool.ntp.org/en/. If you want to synchronize your computer with this server, then just add the following line in your /etc/ntp.conf configuration file:

server jikan.ae7.st

Eventually, I'll also offer encrypted NTP for those who wish to have encrypted NTP packets on the wire (only if it's possible to offer both encrypted and unencrypted NTP simultaneously- I think it is). I'm also currently working on finding some other stratum 2 peers that are less than 30 milliseconds away. If you're running an NTP server, and want to peer with me, just let me know.

Hopefully, this will be of some benefit to the community.

## Pthree.org Is Now SSL Enabled

Just a quick update to say that I have enabled SSL, and forced it by default, for this blog. Given all the revelations about the NSA, the straw finally broke the camel's back, and we are now live with SSL. There may be some growing pains, seeing as though this will cost me a bit more on CPU, but I should be able to adjust the load with the growth.

If there are any oddities, or anything of concern that you notice regarding switching this blog to SSL, please let me know in the comments, email me, get me on IRC, or whatever. Thanks.

## Sufficient Paranoia

With all the recent revelations about the NSA violating United States citizen's 4th amendment rights with their warrantless wiretapping, and now the news of Silk Road being taken down, and the NSA trying to crack Tor (it won't happen- I trust the mathematics), I thought now would be a good time to discuss the concept of healthy, or sufficient paranoia.

I am a system administrator by profession. I have certain levels of fears that make sure I don't make a mistake:

• I assume that installing new software will break something.
• I assume upgrading the BIOS will brick the hardware.
• I assume the hardware firewall will fail.
• I assume hard drives will fail.
• I assume the janitors have installed a key logger on my machine.
• I assume walking away from my machine, means my coworkers will want to hack my Gibson.
• I assume backups aren't working

As such, I take the following measures:

• I have a backup of the data.
• I have a disaster recovery plan to take out the old drives, and put them into new hardware.
• I have redundant software firewalls installed on all my boxes.
• I have redundant drives, and I have a backup of the data on those drives.
• I run visual checks to make sure no new hardware has been added.
• I always lock my workstation. Always.
• I test restoring data, even when I don't have to.

There's other paranoia that I have. These things keep me in check. They help me sleep at night. Once, I heard a story from my scout leader about always being prepared. He shared the story like this:

It was at the annual county fair, and farmers from far and near had come to exhibit their harvest and to engage hired hands for the next year. One prosperous farmer came across a husky lad and asked: "What can you do?" The answer: "I can sleep when the wind blows." With such an answer the farmer turned and started to walk away, perturbed at the impudence of the man. But he turned again and asked: What did you say?" "I can sleep when the wind blows." "Well," said the farmer, "I don't know what that means, but I'm going to hire you anyway."

Winter came, followed by the usual spring, and the new hired hand didn't show any particular signs of extra work, but filled the duties of his work as most others would have done. And then one night in early summer the farmer noticed a strong wind rising. He dashed to the hired hand's quarters to arouse him to see that all the stock was properly cared for. There he found the hired hand asleep. He was about to awaken him, when he remembered the boy's strange statement. He went to his barns and there found all his animals in their places, and the doors and windows securely locked. He found the haystack had been crisscrossed with heavy wires, anticipating such a night, and that it would weather the storm.

Then the farmer knew what his hired man meant when he gave as his only qualification, "I can sleep when the wind blows."

I'm sure you've heard similar versions of this story. It has a lot of applications, including sufficient paranoia. The hired assistant kept realized the fear of lost of dead animals. He understood the fear of haystacks blown away with the wind. He knew what flooded barns and stables meant. He had sufficient paranoia, that in the worst of cases, he was prepared. However, not only was he sufficiently paranoid, but his paranoia likely lead to a behavior that most would consider odd.

The same can be said for security. I cryptographically sign all of my emails with my GPG key. I have been doing this since 2005, and I don't see any need to stop now. I've been asked about it many times. My response is always the same: "If you receive an unsigned email from me, then you should question the authenticity of the sender." Of course, it's their duty to verify the signature is valid. I've done my duty by signing them. And what happens when I appear in front of a judge in a court of law, and an email claiming to be sent from me is called into question? I can show with unwavering consistency that I have signed every email since 2005, which would then call into doubt the email in question, if that email is not cryptographically signed. Innocent until proven guilty.

I recently did an audit on all my account passwords. Not only is every account a different, truly random password, but I make sure that the entropy of every passwords exceeds 120 bits, where possible. Further, every account uses a password I know from my password card, as well as a long password I don't know from my Yubikey. So, I have two-factor authentication for every account, where possible. Given what I know about password cracking, this is good security, for very little cost. Not even my wife knows my passwords (which could prove to be difficult if I die).

I even have a different SSH key for every computer, and each SSH key is encrypted with a different password. I encrypt the SSH key with SSL, instead of the default encryption OpenSSH uses, to slow down offline passphrase attacks.

I don't recycle my shredded paper. Instead, I use it as kindling for my parents fireplace during the winter. I've also used it as mulch for our small box garden in the back yard, and our flower garden in the front. If it gets thrown away, I do it in sections- thoroughly mix the shredded paper, and throw away 1/10th of it one month. Then 1/10th the next month, at a different location. Et cetera. I'm paranoid that someone at the land fill is going through the garbage, looking for freebies. The last thing I want is my bank account number found (although improbable given my super awesome paper shredder).

I use Ghostery and AdBlock as necessary extensions for my browsers. When I don't have control of the computer, or the network, I use a browser on a USB thumb drive, in private browsing mode, connected to either an SSH or Tor proxy, including proxying DNS, and I never view Flash media.

Whenever I walk away from my computer, I make sure I lock the screen, pull my Yubikey, and put it in my wallet. Yes, it's trivial for someone to take the contents of the key while I am away, and it's just as trivial for me to take my Yubikey with me when I leave the keyboard.

I run an encrypted filesystem on my computers and servers. For sensitive data, I keep those GPG-encrypted in an eCrypftFS mount, which is also two-factor password protected. I can give law enforcement what I know, without needing to tell them about what I have, without compromising the system.

There are many other things I do, such as not divulging private details of personal things over SMS or IM, or sometimes, even over voice. I always lock my doors, even if I'm occupying the space. When in crowded environments, I put my wallet in my front pocket, under my hand. I could go on and on.

I do these things, because I have what I call "sufficient paranoia". It's just good security practice. Does it make me look crazy, even to my coworkers? Of course. Am I worried that the NSA has bugged my house, or my wife is a secret spy? No. I maintain balance.

We don't know what the future will bring. We don't know if tomorrow, it can be proved that P = NP, and all cryptograhpy falls apart as a result. We don't know the full extent of the NSA illegal spying. We don't know when Google is breached, and all accounts are sold to the highest bidder. We can't control these things. What we can control is how to be prepared for them. We can control a certain level of paranoia that keeps everything in check.

Sufficient paranoia.