
Use wget(1) To Expand Shortened URLs

I'm a fan of all things microblogging, but let's face it: until URLs become part of the XML, and not part of your character count (which is ridiculous anyway), shortened URLs are going to be a way of life. Unfortunately, those shortened URLs can be problematic. They could host malicious scripts and/or software that could infect your browser and/or system. They could lead you to an inappropriate site, or just something you don't want to see. And because these URLs are a part of our microblogging lives, they've also become a part of our email, SMS, IM, and IRC lives, as well as other online activities.

So, the question is: do you trust the short URL? Well, I've generally gotten into the habit of asking people to expand the shortened URL for me on IRC, email, or IM, and it's worked just fine. But I got curious whether there was a way to do it automagically, and thankfully, you can use wget(1) for this very purpose. Here's a "quick and dirty" approach to expanding shortened URLs:

$ wget --max-redirect=0 -O - http://t.co/LDWqmtDM
--2011-10-18 07:59:53--  http://t.co/LDWqmtDM
Resolving t.co (t.co)... 199.59.148.12
Connecting to t.co (t.co)|199.59.148.12|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://is.gd/jAdSZ3 [following]
0 redirections exceeded.

So, in this case "http://t.co/LDWqmtDM" is pointing to "http://is.gd/jAdSZ3", another shortened URL (thank you Twitter for shortening what is already short (other services are doing this too, and it's annoying- I'm looking at you StatusNet)). So, let's increase our "--max-redirect" (again, emphasis mine):

$ wget --max-redirect=1 -O - http://t.co/LDWqmtDM
--2011-10-18 08:02:12--  http://t.co/LDWqmtDM
Resolving t.co (t.co)... 199.59.148.12
Connecting to t.co (t.co)|199.59.148.12|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://is.gd/jAdSZ3 [following]
--2011-10-18 08:02:13--  http://is.gd/jAdSZ3
Resolving is.gd (is.gd)... 89.200.143.50
Connecting to is.gd (is.gd)|89.200.143.50|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://wiki.ubuntu.com/UbuntuOpenWeek [following]
1 redirections exceeded.

So, in this case, the link finally points to https://wiki.ubuntu.com/UbuntuOpenWeek. I'm familiar enough with the Ubuntu Wiki that I know I should be safe visiting the initial shortened URL. If you want to add this to a script or shell function, you can get a bit more fancy:

$ expandurl() { wget -O - --max-redirect=$2 $1 2>&1 | grep ^Location; }
$ expandurl http://t.co/LDWqmtDM 1
Location: http://is.gd/jAdSZ3 [following]
Location: https://wiki.ubuntu.com/UbuntuOpenWeek [following]

In this case, our "expandurl()" function takes two arguments: the first being the URL you wish to expand, and the second being the maximum number of redirects to follow. You'll notice further that I added "-O -" to print to STDOUT. This is just in case you allow too many redirects: the content of the page's HTML gets printed to the terminal (and filtered out by grep), rather than saved to a file. Because you're grepping for "^Location" anyway, technically you could get rid of "--max-redirect" altogether. But dropping it means wget downloads whatever page sits at the end of the chain, which can seriously increase the time it takes to get the locations. Whatever works for you.
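If you'd rather not guess the redirect count up front, here's a rough sketch of another way to walk the chain: call wget with "--max-redirect=0" once per hop, and feed each Location back in as the next URL. The "urlchain()" name and the loop are just my own sketch built on the output shown above, not anything official from wget(1):

urlchain() {
    # Walk a shortened-URL chain one hop at a time. Each pass asks wget
    # for just the next Location header (--max-redirect=0), then restarts
    # from that URL until no further redirect comes back. Only the final
    # page's body is ever fetched, and it is discarded into /dev/null.
    local url="$1" next
    while true; do
        next=$(wget --max-redirect=0 -O /dev/null "$url" 2>&1 \
            | grep '^Location' | awk '{print $2}')
        [ -z "$next" ] && break
        echo "$next"
        url="$next"
    done
}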

UPDATE (Oct 18, 2011): After some comments came in on the post, and some discussion on IRC, there is a better way to handle this. According to the wget(1) manpage, "-S" or "--server-response" will print the headers sent by HTTP servers and the responses sent by FTP servers. So, here's the updated function, which you might find to be less chatty and faster to execute as well:

$ expandurl() { wget -S $1 2>&1 | grep ^Location; }
$ expandurl http://t.co/LDWqmtDM
Location: http://is.gd/jAdSZ3 [following]
Location: https://wiki.ubuntu.com/UbuntuOpenWeek [following]

Perfect.

{ 14 } Comments

  1. Charles Profitt using Firefox 7.0.1 on GNU/Linux 64 bits | October 18, 2011 at 8:53 am | Permalink

    Thanks for posting this... it certainly looks to be a much better way of trying to determine if I want to click on a shortened link. I never knew about the max redirect on wget so this has been a good lesson there as well.

  2. Me using Firefox 7.0.1 on GNU/Linux 64 bits | October 18, 2011 at 10:30 am | Permalink

    In the expandurl() function, changing "--max-redirect=$2" to "--max-redirect=${2:-0}" lets you skip specifying the second parameter whenever you want max-redirect set to 0.
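    In other words, something like:
    expandurl() { wget -O - --max-redirect=${2:-0} $1 2>&1 | grep ^Location; }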

  3. Aaron Toponce using Debian IceWeasel 7.0.1 on GNU/Linux 64 bits | October 18, 2011 at 10:34 am | Permalink

    @Me- That is loaded with awesome-sauce. Thanks for sharing! I was thinking about if something like that was possible on the way into work this morning. Glad to see it is.

  4. Dexter using Google Chrome 14.0.835.202 on GNU/Linux 64 bits | October 18, 2011 at 11:47 am | Permalink

    expandurl() { wget --spider $1 2>&1 | grep '^Location'; }

  5. Aaron Toponce using Debian IceWeasel 7.0.1 on GNU/Linux 64 bits | October 18, 2011 at 1:11 pm | Permalink

    @Dexter- Awesome! I read about --spider in the manual, but it still downloads the page it lands on, which could be quite large (in the case of a PDF, for example). Instead, it appears that the '-S' switch is superior.

  6. Dexter using Google Chrome 14.0.835.202 on GNU/Linux 64 bits | October 18, 2011 at 1:30 pm | Permalink

    The --spider option causes wget to only make HEAD requests, so you don't have to worry about it downloading the page. The -S just makes it include the headers in the output.

  7. M using Midori on Mac OS | October 18, 2011 at 4:46 pm | Permalink

    Thanks for the post and comments.

  8. Seth using Google Chrome 14.0.835.202 on GNU/Linux 64 bits | October 18, 2011 at 4:50 pm | Permalink

    No discussion about something in wget is complete without the curl version:

    curl -sIL http://t.co/LDWqmtDM | grep '^Location'

  9. toobuntu using Firefox 7.0.1 on Mac OS | October 18, 2011 at 10:45 pm | Permalink

    Well, this will return the expanded URI only:
    curl -sIL http://t.co/LDWqmtDM | grep '^Location' | tail -n 1 | awk '{print $2}'

  10. Weboide using Firefox 7.0.1 on GNU/Linux 64 bits | October 21, 2011 at 7:26 am | Permalink

    This works the best for me. It does not create any file.
    expandurl() { wget --spider -S $1 2>&1 | grep ^Location; }

  11. fRIOUX Schmidt using Firefox 8.0 on GNU/Linux | October 22, 2011 at 5:41 pm | Permalink

    For some reason this doesn't seem to work on zsh; when I run the final updated version here is my output:

    helena [14105] ~/code/WLS «master¹» $ expandurl http://t.co/LDWqmtDM
    proto.h:/* Alarm.cpp */
    proto.h:/* Alarm.cpp */

    Any ideas?

  12. Aaron Toponce using Debian IceWeasel 7.0.1 on GNU/Linux 64 bits | October 22, 2011 at 8:37 pm | Permalink

    @fRIOUX Schmidt- I only use ZSH, and it works fine from here. What is proto.h and Alarm.cpp? I would probably start from there.

  13. Lauri Ranta using Safari 534.57.2 on Mac OS | July 4, 2012 at 3:36 am | Permalink

    `curl -s -o /dev/null --head -w "%{url_effective}\n" -L "https://t.co/6e7LFNBv"`

    - `--head` or `-I` only downloads HTTP headers
    - `-w` or `--write-out` prints the specified string after the output
    - `-L` or `--location` follows location headers
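
    Wrapped as a shell function like the ones above, that might look something like:
    expandurl() { curl -s -o /dev/null --head -w "%{url_effective}\n" -L "$1"; }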

  14. David using Google Chrome 25.0.1364.172 on GNU/Linux 64 bits | December 5, 2013 at 6:39 am | Permalink

    If you just want the URL resolved, you can add "-O /dev/null" to avoid accumulating files with the contents of each URL.
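    For example, applied to the updated wget function above, that would be something like:
    expandurl() { wget -S -O /dev/null $1 2>&1 | grep ^Location; }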

