Jeff Forristal here. I run a couple of personal web sites that host the usual gamut of material found on personal web sites (pictures of my kids and family, my latest favorite LOLcat graphic, etc.). Recently I updated one of my sites and added a new comment/contact submission form so I wouldn’t have to expose my email address for spam harvesters to find. A mere few hours after enabling this new comment submission form on my web server, I started to get some comments; unfortunately they were all comment spam.
Of course, web comment spamming has become yet another fact of life on the Internet. Blogs and forums with unmoderated/open comment submission functionality are quickly getting clogged with random and off-topic spam. Basically the spammers are taking the content they would normally send you in email, and now posting it to web site forums and blogs too. This is why comment moderation and CAPTCHAs are becoming the norm for forum and blog comment posting.
Anyways, there is nothing all that exciting about receiving spam of any variety. However the nature of the comment spam I received caught my eye. Over the course of a few weeks, I received multiple comment spams that all had the same format, but different random values.
Two examples of the comment spam I received were:
Email: [email protected]
Comment:
jJsAdx <a href="http://aereogakjvpd.com/">aereogakjvpd</a>, [url=http://ubfcdpkfggto.com/]ubfcdpkfggto[/url], [link=http://wiogiusmvjcz.com/]wiogiusmvjcz[/link], http://ejmugotxbmqc.com/
Email: [email protected]
Comment:
fHwI3w <a href="http://uwtsayqpclib.com/">uwtsayqpclib</a>, [url=http://wetpnicpwkfd.com/]wetpnicpwkfd[/url], [link=http://bvtyjneqigek.com/]bvtyjneqigek[/link], http://prcghesjscpl.com/
We can make an educated guess that this comment submission isn’t directly about blindly shoving links onto sites in order to bolster incoming link counts (inflating PageRank ratings and aiding in SEO efforts), because all the links are random and don’t reference real sites. There is no practical value in the data itself that was submitted, except as acting as a probe to see what comes out the other side of the submission process. The data contains four different link formats (an actual HTML <A> tag, two flavors of popular bulletin board markup tags, and just a raw URL), and perhaps a fifth if you count the email address too. That makes me believe the end goal was probably to find a way to inject a clickable link of some sort. Had the action been successful, perhaps the same software would have then subsequently tried to inject more meaningful links (back to our SEO theory). Or, perhaps this was just a pre-cursor scouting application compiling a list of URLs known to allow arbitrary link injection (such lists would commonly be paired or sold with spamware/crimeware apps used to inject content onto listed sites already known to be open to receiving the injection).
Of course, being the curious individual that I am, I started to wonder:
- The app managed to put something resembling an email address into the email field. Now, sure, the form field name was ‘email’, so it wouldn’t be that hard to deduce; what would happen if I named that field something less obvious ('mail', 'eml', 'e', 'foo')?
- The links were submitted in a textarea field; will it submit the same type of data in a more constrained input text box?
- Will it try to inject content into other identifiably named fields, such as 'phone', 'address', 'url', etc.?
- Does the app support cookies? Is it a well-behaved web user agent? How smart is it?
- Can the app's injection logic handle multi-step submissions, where you submit to one page/place and the data eventually appears on a different page/place but not within the actual submission response (think: you submit a comment, you get a page that says "thank you", but then have to click one more link before you get to where your comment is actually displayed)?
- And of course, the million dollar question: what would have happened had their injection succeeded and produced a clickable link?
I’m not one to leave such important questions burning on my mind (heh), so I came up with a plan to get the answers I want. I will lay a trap for the comment spammer (or rather, their app). The idea is simple: through the magic of some server-side logic, I will detect when this comment spamming app is making submissions to my forms (assuming they come back in the future; but historically they seem to visit me on a fairly regular basis, so it seems a safe assumption). Rather than display the usual "thank you for your submission" response, I’ll instead feed a specially crafted response to make it seem like the submission actually produced a clickable link (making me look vulnerable). Further, I will then flag that session/IP as a spammer, and all subsequent requests will result in some extra forms being added to the web page responses. Those forms will be designed with certain characteristics meant to gauge the effectiveness and operation of the application and its web crawler; I’m assuming the app will repeat its typical injection testing process for as many forms as it encounters.
So if all goes accordingly to plan, I intend to post a follow-up to this aptly titled article in a few weeks. The follow-up will of course be entitled "Second encounters with a web comment spammer."
Until then!
- Jeff