Earlier I wrote a piece called "First Encounters with a Web Comment Spammer". In that piece I devised a plan to lay a trap of sorts for the web comment spamming application, in order to test the depth of its functionality. Well, it's been a few weeks, and now I have some data to share.
The most interesting thing to note is that a few more comment spam applications/crawlers have made their way to my comment form. These new ones exhibit different behavior than the original one I reported on, thus I believe they are entirely different applications. For now, I’m going to stick to the original application I previously discussed; I'll compare my results to these newer spam apps in a future blog post.
One thing I noticed is that many of these comment spam attempts were coming from systems located on the 18.104.22.168/24 network. A large number of them were also using the User-Agent string "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)". Both of these factors turned out to be good indicators of whether the request was coming from a spam bot.
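To make those two indicators concrete, here is a rough sketch of how a request could be checked against them. This is not the code running on my site: the function name `spam_indicators` is made up for illustration, and the network shown is a placeholder documentation range (192.0.2.0/24) standing in for the real one.

```python
import ipaddress

# Placeholder (assumption): a documentation-only range standing in for the
# network the spam attempts actually came from.
SUSPECT_NETWORK = ipaddress.ip_network("192.0.2.0/24")

# The User-Agent string many of the spam submissions carried.
SUSPECT_USER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"


def spam_indicators(remote_addr: str, user_agent: str) -> list[str]:
    """Return the names of the indicators this request matches."""
    hits = []
    if ipaddress.ip_address(remote_addr) in SUSPECT_NETWORK:
        hits.append("network")
    if user_agent.strip() == SUSPECT_USER_AGENT:
        hits.append("user-agent")
    return hits
```

Neither check is proof on its own, since plenty of real people once used that browser string; I would treat a match as a reason to look closer rather than a reason to block outright.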
Anyway, here is an example of one submission I received. The name at the beginning of each line is the name of the form field; all of the fields are text input fields (as in, "<input type=text>"), except for 'other' and 'comment', which are textarea fields.
The most obvious thing visible in this data is that the application filled in every field with random garbage. It managed to put something that resembled an email address into the 'email' field, but not the 'eml' field (which is the actual email address field shown to the user for data entry). The application also managed to put a URL into the 'url' field, but not the 'link' field. This makes me believe the application is pre-programmed with a few specific field names, for which it submits data in a specific format. Also notable is that the application submitted the same blob of link garbage to both textarea fields ('other' and 'comment'), and to none of the text input fields.
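These observations suggest a simple server-side check, sketched below. It rests on assumptions: I am treating 'email' and 'url' as decoy fields that a human visitor never fills in, and the function name `looks_automated` and the dict-shaped form data are invented for the example. It illustrates the idea and is not the trap code itself.

```python
# Assumption: these fields exist only as bait and are never filled in by a
# human, while 'eml' is the email field a person actually sees.
DECOY_FIELDS = ("email", "url")

# The two textarea fields on the form.
TEXTAREA_FIELDS = ("other", "comment")


def looks_automated(form: dict[str, str]) -> list[str]:
    """Return human-readable reasons why a submission looks like a bot."""
    reasons = []

    # A bot that fills in fields by name will put data into the decoys.
    for name in DECOY_FIELDS:
        if form.get(name, "").strip():
            reasons.append(f"decoy field '{name}' was filled in")

    # The spammer pasted one identical blob into every textarea.
    bodies = [form.get(name, "").strip() for name in TEXTAREA_FIELDS]
    if bodies[0] and all(body == bodies[0] for body in bodies):
        reasons.append("identical text in every textarea")

    return reasons
```

An empty list means nothing tripped; anything else is a list of reasons that can go straight into a log line.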
Another thing I failed to notice before is that the application does actually have the ability to handle multi-step submissions. I recognized the behavior in my logs: whenever the form was submitted, the same user-agent would then go through every link on the resulting page (in exact order of appearance, no less) and request each one. I assume this behavior is meant to deal with web applications that return a "thank you for your submission" page along with a link taking you back to the forum/comment area where the new submission will appear.
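That link-walking pattern is easy to test for once the log is in hand. The sketch below assumes you already have the HTML returned after the form post and the ordered list of paths the same client requested afterwards. The helper names are hypothetical, and real log entries would need their URLs normalized before comparing.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def links_in_order(page_html: str) -> list[str]:
    collector = LinkCollector()
    collector.feed(page_html)
    return collector.links


def followed_every_link(page_html: str, later_requests: list[str]) -> bool:
    """True if the client's next requests match the page's links, in order."""
    expected = links_in_order(page_html)
    return bool(expected) and later_requests[: len(expected)] == expected
```

A person might click one or two of those links, but walking all of them in page order right after a post is a strong hint that software is driving.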
Interesting info, perhaps. But I’ve grown bored with this particular application and its lack of intelligence; the newer bots I’ve been seeing have been doing far more interesting things. I will take a deeper look at these new bots, and how they differ, in my next blog post. After that, I'll share a few effective tricks I've been using to tell these spam bots apart from the humans (without CAPTCHAs!).