"Maybe I'd learn something I could share with him," he wondered. "What's the harm in allowing a student to have college credit for researching security vulnerabilities?" Well, I decided to forgo my original idea and instead crack the "front page of the internet."
NOTE: I will not be sharing any of my code, for a couple of reasons. The first is that I do love Reddit and would be saddened to see it get overrun by script kiddies and spambots. The second is that, if you're smart enough, you should be able to do this on your own. I'm not here to hold your hand while you do something malicious; I'm here to inform people about how our current online security systems are unsafe.
A Little Background: What's a "Reddit?"
Reddit is the self-described "front page of the internet." It's an extremely popular social media platform where users share, comment on, and judge links to pictures, articles, music, videos, or self-typed posts. Basically, it's what your homepage should be. The site has millions of users and is currently ranked 136 on Alexa, though I imagine its true rank is much better (that is, a smaller number), since Reddit's demographic probably doesn't install the Alexa Toolbar... You don't need an account to look at comments or links, but you do need one to comment, vote on links, "subscribe" to subreddits, and more. Creating an account is easy, as visible here:
All one needs to do is supply a username, a password, and a proper response to their CAPTCHA. Here lies our attack vector: to create an account on one of the most popular websites in the world, you only need to provide a name, password, and the response to a small vision-based puzzle.
A Little More Background: CAPTCHA, Eh?
A CAPTCHA is defined as a "Completely Automated Public Turing test to tell Computers and Humans Apart" (thanks Wikipedia!). Basically, it's the funny-looking jumbled-up letters you have to type in to "prove that you're human" on many sites. The point of these images isn't to annoy you; it's that they're supposed to be impossible (read: very hard) for computers to comprehend. If computers could decipher these images, then scripts could be written to create thousands or millions of spam accounts that would overrun a website. (See, this is why I'm not sharing my code!) Anyways, there are three types of obfuscation in a CAPTCHA:
- Warped letters
- Color-based tricks
- Background noise
As you can see above, Reddit employs all of these. The letters are indeed warped and bend in a way that is pretty abnormal. The purpose of this effect is that a machine would have a hard time recognizing that a slanted 'J' is a normal-form 'J.' They also use a color trick where the background is black, the intermediate level is white/gray, and the foreground letters are white/gray too. This makes it extremely difficult for a computer to tell what exactly is a letter and what's the background. Finally, the distorted grid in the back is obviously noise, there to make it difficult for a machine to understand what's in the image. See, our sight allows us to view the image as a whole and know what we're looking at. Computer software isn't so lucky; it has to go through the image pixel by pixel and judge what it's looking at on a small scale rather than taking a holistic view.
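To make the pixel-by-pixel point concrete, here is a minimal toy sketch. It is not the code used against Reddit and it does not solve anything; the tiny grid, the brightness values, and the name `bright_pixels` are all made up for illustration. It only shows why a naive "keep the bright pixels" rule can't tell letters from noise when both are drawn in the same white/gray.

```python
# Toy grayscale "image": 0 is black, 255 is white. All values are invented.
BACKGROUND = 0
GRID = 200    # the distorted grid lines (the intermediate level)
LETTER = 200  # the letter strokes, deliberately the same brightness

# A vertical letter stroke in the middle column, surrounded by grid noise.
image = [
    [GRID, BACKGROUND, LETTER, BACKGROUND, GRID],
    [GRID, GRID,       LETTER, GRID,       GRID],
    [GRID, BACKGROUND, LETTER, BACKGROUND, GRID],
]
letter_positions = [(0, 2), (1, 2), (2, 2)]


def bright_pixels(img, threshold=128):
    """Visit every pixel in turn and collect the (row, col) of the bright ones."""
    found = []
    for r, row in enumerate(img):
        for c, value in enumerate(row):
            if value > threshold:
                found.append((r, c))
    return found


hits = bright_pixels(image)
# Bright pixels that are actually grid noise, not part of the letter.
false_hits = [p for p in hits if p not in letter_positions]

print("bright pixels found:", len(hits))
print("of which are noise: ", len(false_hits))
```

Every letter pixel is found, but so is every grid pixel, because looking at one pixel at a time gives the program nothing but a brightness value to go on. A person sees a stroke in front of a grid; the loop sees a pile of identical numbers.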
This ends part one of my write-up on how I cracked Reddit's CAPTCHA system. The next part should go over how I separated the letters from the background and then from each other. I hope this series is interesting to some, but if not, at least it's interesting to me!