With all the news about Heartbleed, passwords, and two-factor authentication, I figured I would blog about exactly how two-factor authentication can work- in this case, TOTP, or Time based one time passwords, as defined by The Initiative for Open Authentication (OATH). TOTP is defined in RFC 6238, and is an open standard, which means anyone can implement it, with no worries about royalty payments, or copyright infringements. In fact, TOTP is actually just an extension of HOTP (HMAC based one time passwords), which is defined in RFC 4226. I'll describe HOTP a little bit in this post, but focus primarily on TOTP.
What is two-factor authentication?
First, let's describe the problem. The point of two-factor authentication is to prevent attackers from getting access to your account. Two-factor authentication requires that two tokens be provided for proof of ownership of the account. The first token is something that we're all familiar with- a username and a password. The second token is a bit more elusive, however. It should be something you have, and only you. No one else should be able to come into possession with the same token. The same should be true for usernames and passwords, but we've seen how easily broken a single-factor authentication system is.
Think of two-factor authentication as layers on an onion. The first layer is your username and password. After peeling away that layer, the second layer is a secret token. Three-factor authentication is also a thing, where you can provide something that you are, such as a fingerprint or retinal scan. That would be the third layer in our onion. You get the idea.
However, the second token should not be public knowledge. It must be kept secret. As such, a shared secret is generated between the client and the server. For both HOTP and TOTP, this is just a base-32 random number. This random number, along with a message turns into an HMAC-SHA1 cryptographic hash (which is defined in RFC 2104; also described at Wikipedia).
Importance of Time
Unfortunately, a "shared secret" is a fairly lame form of authentication. First, the user could memorize the secret, no longer making it something the user has. Second, man in the middle attacks on shared secrets are extremely effective. So, we need the ability to prevent the user from memorizing the shared secret, and we need to make man in the middle attacks exceptionally difficult. As such, we turn our shared secret into a moving target.
HOTP, as already mentioned, is the base from which TOTP comes. HOTP uses the same algorithm as described below in this post, except that rather than using time as the moving factor, an 8-byte counter is changed. We need a moving target, because if the token were static, it would be no different than just a second password. Instead, we need the attacker to be constantly guessing as to what the token could be. So, with HOTP, a token is valid until used. Once used, the counter is incremented, and the HMAC-SHA-1 string is recalculated.
TOTP uses the UNIX epoch as its time scale, in seconds. This means that for TOTP, time starts with January 1, 1970 at 00:00.00, and we count the number of seconds that have elapsed since then. By default, we only look at 30 second intervals, on the minute. This means at the top of the minute (zero seconds past the minute), TOTP is refreshed, and again 30 seconds after the minute. TOTP uses a shared secret between the client and the server, so it's important that both the client and the server clocks are synchronized. However, RFC 6238 does allow for some clock skew and drift. As such, when a code is entered, the server will allow for a 30 second window on either side of the code. This means that the code is actually valid for a maximum of 90 seconds. If using NTP on the server, and a mobile phone for the client, then clock drift isn't a concern, as both will be continuously updated throughout the day to maintain accurate time. If NTP or GSM/CDMA broadcasts are not adjusting the clock, then it should be monitored and adjusted as needed.
Computing the HMAC-SHA-1
Hash-based message authentication codes (HMAC) require two arguments to calculate the hash. First, they require a secret key, and second, they require a message. For TOTP (and HOTP), the secret key is our shared secret between the client and the server. This secret never changes, and is the foundation from which our HMAC is calculated. Our message, is what changes. For TOTP, is the time in seconds, since the UNIX epoch rounded to the nearest 30 seconds. For HOTP, it is the 8-byte counter. This moving target will change our cryptographic hash. You can see this with OpenSSL on your machine:
$ KEY=$(< /dev/random tr -dc 'A-Z0-9' | head -c 16; echo) $ echo $KEY WHDQ9I4W5FZSCCI0 $ echo -n '1397552400' | openssl sha1 -hmac "$KEY" (stdin)= f7702ad6254a06f33f7dcb952000cbffa8b3c72e $ echo -n '1397552430' | openssl sha1 -hmac "$KEY" # increment the time by 30 seconds (stdin)= 70a6492f088785444fc664e1a66189c6f33c2ba4
Suppose that our HMAC-SHA1 string is "0215a7d8c15b492e21116482b6d34fc4e1a9f6ba". We'll use this image of our HMAC-SHA-1 to help us identify a bit more clearly exactly what is happening with our token:
Unfortunately, requiring the user to enter a 40-character hexadecimal string in 30 seconds is a bit unrealistic. So we need a way to convert this string into something a bit more manageable, while still remaining secure. As such, we'll do something called dynamic truncation. Each character occupies 4-bits (16-possible values). So, we'll look at the lower 4-bits (the last character of our string) to determine a starting point from which we'll do the truncating. In our example, the last 4-bits is the character 'a':
The hexadecimal character "a" has the same numerical value as the decimal "10". So, we will read the next 31-bits, starting with offset 10. If you think of your HMAC-SHA-1 as 20 individual 1-byte strings, we want to start looking at the strings, starting with the tenth offset, of course using the number "0" as our zeroth offset:
As such, our dynamically truncated string is "6482b6d3" from our original HMAC-SHA-1 hash of "0215a7d8c15b492e21116482b6d34fc4e1a9f6ba":
The last thing left to do, is to take our hexadecimal numerical value, and convert it to decimal. We can do this easily enough in our command line prompt:
$ echo "ibase=16; 6482B6D3" | bc 1686288083
All we need now are the last 6 digits of the decimal string, zero-padded if necessary. This is easily accomplished by taking the decimal string, modulo 1,000,000. We end up with "288083" as our TOTP code:
The user then types this code into the form requesting the token. The server does the exact same calculation, and verifies if the two codes match (actually, the server does this for the previous 30 seconds, the current 30 seconds, and the next 30 seconds for clock drift). If the code provided by the user matches the code calculated by the server, the token is valid, and the user is authenticated.
TOTP, and alternatively HOTP, is a great way to do two-factor authentication. It's based on open standards, and the calculations for the tokens are done entirely in software, not requiring a proprietary hardware or software solution, such as provided by RSA SecurID and Verisign. Also, the calculations are done entirely offline; there is no need to communicate with an external server for authentication handling at all. No calling home to the mothership, to report someone has just logged in. It's platform independent, obviously, and is already implemented in a number of programming language libraries. You can take this implementation further, by generating QR codes for mobile devices to scan, making it as trivial as installing a mobile application, scanning the code, and using the tokens as needed.