For those not familiar with TinyURL, it's a simple and popular service that lets a user submit any lengthy URL and receive in return a URL of the format http://tinyurl.com/xxxxxx (e.g. http://tinyurl.com/bwcdzs), where 'x' represents a base 36 (a-z + 0-9) character. While it may at first glance appear that they'd quickly run out of addresses, six base 36 characters actually yield more than 2 billion unique URLs. When a user browses to a TinyURL, the service returns a 301 'Moved Permanently' status code along with a Location header containing the true URL, which the browser then requests. When TinyURL was first launched in 2002, the primary motivation was to enable linking to newsgroup postings, which tend to have lengthy, ugly URLs. Today, however, it's more popular than ever given the increasing use of mobile applications and Twitter, which limits posts to 140 characters.
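The size of that address space is easy to check. Here is a quick sketch in Python, assuming keys of exactly six characters drawn from the 36-character alphabet described above. The `keyspace` helper is a name made up for illustration.

```python
import string

# Alphabet assumed from the description above: 0-9 plus a-z (36 symbols).
ALPHABET = string.digits + string.ascii_lowercase


def keyspace(length: int, alphabet: str = ALPHABET) -> int:
    """Number of distinct keys of exactly `length` characters."""
    return len(alphabet) ** length


print(keyspace(6))  # 2176782336
```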
I personally have always wondered if TinyURL was getting a bad rap. While I agree that it provides a basic form of URL obfuscation by hiding the true destination from an end user, there is certainly no shortage of such techniques, especially when dealing with naive victims who are fooled by a TinyURL. Moreover, the browser is simply redirected - it does ultimately make a separate request for the true destination URL. As a result, browser controls such as blacklists are still effective, as they would catch the 'evil' site on the second request, following the redirection. I've also heard a rumor that TinyURL now actively filters for bad content - if they aren't, they certainly should be.
The code is pretty basic: it simply loops 100K times, converts the count to a base 36 number, requests the associated TinyURL and then records the Location header in the response. For some reason, TinyURL does not implement any verification to ensure that the submitted URL adheres to standards or even exists. Therefore, the resulting list of URLs, not surprisingly, included close to 10K entries that did not represent legitimate URLs. The legitimately formed URLs that were found could be divided into the following protocols:
- HTTP - 90,541
- HTTPS - 1,036
- FTP - 339
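The enumeration and protocol breakdown described above can be sketched roughly as follows. This is not the original script, which isn't shown here; it is a minimal Python illustration. The names `to_base36`, `resolve`, `crawl` and `tally_schemes` are hypothetical. It assumes the service answers a request for `/<key>` with a redirect whose `Location` header holds the destination. Plain HTTP, a `HEAD` request, and the "scheme plus host" test for a well-formed URL are further assumptions.

```python
import http.client
import string
from collections import Counter
from urllib.parse import urlsplit

ALPHABET = string.digits + string.ascii_lowercase  # 0-9 then a-z


def to_base36(n: int) -> str:
    """Convert a non-negative integer to its base 36 representation."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def resolve(key, host="tinyurl.com", timeout=10.0):
    """Request /<key> without following redirects; return Location or None.

    Assumption: HEAD over plain HTTP is enough to see the redirect.
    """
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", "/" + key)
        return conn.getresponse().getheader("Location")
    finally:
        conn.close()


def crawl(count, resolver=resolve):
    """Resolve keys 0..count-1 (in base 36) and collect the destinations."""
    destinations = []
    for i in range(count):
        location = resolver(to_base36(i))
        if location:
            destinations.append(location)
    return destinations


def tally_schemes(urls):
    """Count URLs per protocol; anything malformed or unexpected is 'other'."""
    counts = Counter()
    for url in urls:
        try:
            parts = urlsplit(url)
            scheme, host = parts.scheme.lower(), parts.netloc
        except ValueError:  # e.g. unbalanced brackets in the host part
            scheme, host = "", ""
        known = scheme in ("http", "https", "ftp") and host
        counts[scheme if known else "other"] += 1
    return counts
```

Something like `tally_schemes(crawl(100_000))` would reproduce the general shape of the experiment. Passing a stub as `resolver` lets the logic be exercised without touching the network.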
Now it wasn't hard to spot URLs that suggested a malicious purpose, such as those which attempted to further obfuscate the true domain, redirect to LAN-based resources or even resources on a local machine. While such TinyURLs may have been registered for use in a planned attack, they also suggest a lack of understanding as to how browsers and the web itself actually function. Most posed no threat whatsoever.
I leveraged the IP::Country::Fast Perl module and ran through the cleaned up URLs to determine where the destination pages were hosted. The United States was by far the most common hosting location, accounting for 98.93% of the TinyURLs investigated. After the US, the following countries made up the top 20:
What does this tell us? Very little...but it is a pretty graph.
URL categories are a little more interesting. I ran the clean TinyURLs collected through our classification engine to determine the overall 'type' of content that was showing up. If the service was being abused, I would expect to see 'malicious' categories dominating the list.
Definitely nothing too scandalous here. While there was some volume for 'questionable' categories such as Nudity (1,047), Pornography (964) and Anonymizers (151), the clearly malicious categories - Malware (126) and Phishing (14) - had minimal volume given the overall population.
I would assume that if TinyURL were heavily leveraged in attacks, a sizable portion of content would point to executable files, which attackers try to social engineer victims into downloading. 465 TinyURLs did directly reference *.exe files. However, understandably, not all of the files were still available for download at the time of the test. I was able to retrieve 197 executables; all were run through AV and a grand total of zero were reported to be infected.
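Picking out the direct executable references amounts to a simple filter on the URL path. Here is a minimal sketch in Python with a hypothetical helper name. It assumes that "directly references an *.exe file" means the path component ends in `.exe`, regardless of case and ignoring any query string.

```python
from urllib.parse import urlsplit


def is_exe_link(url: str) -> bool:
    """True if the URL's path (not its query string) ends in .exe."""
    try:
        path = urlsplit(url).path
    except ValueError:  # malformed URL, treat as not an executable link
        return False
    return path.lower().endswith(".exe")
```

A list comprehension such as `[u for u in urls if is_exe_link(u)]` would then yield the candidates to download and scan.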
This is an area where I fully expected to get some solid results. To identify potential phishing URLs, I leveraged PhishTank, a great open collection of human-verified phishing data. Specifically, I took advantage of their check URL functionality, which permits you to submit an individual URL and receive XML data detailing whether or not the page represents a verified phishing site. All of the cleaned up TinyURL data was submitted (~90K URLs) and none represented confirmed phishing URLs.
This time around I took advantage of the Google SafeBrowsing Diagnostic page and once again automated the process. I was only able to get through about half of the full list of URLs, as my script needed to incorporate a delay to ensure that it didn't get blocked; however, the results seem fairly conclusive. While malicious URLs were identified, of ~50K URLs only the following five were found to be blacklisted:
Poor TinyURL...it would appear that you're getting a bum rap after all. Yes, we did identify some malicious content but certainly not enough to justify your lifetime ban from some of the cool web 2.0 parties.