Avoiding “Google” spam: do what others aren’t

This article was written when I ran Drupal. I have since switched to WordPress, where Akismet is your friend.

Anybody operating popular CMS or forum software in a publicly accessible venue (like this site, which runs the Drupal CMS) has experienced the effect that Google has had on people’s internet conduct.

Comment-Spam

For those unaware, Google “unbiasedly” ranks websites using its patented PageRank algorithm, the exact workings of which are kept secret. It weighs a number of factors, and it brings the interconnected fabric of internet hyperlinks into its analysis (as opposed to simply looking at the words on the pages). This is what initially separated Google from the pack of search engines back when web search was first getting started.

That’s all well and good, but fast forward 12 years and suddenly Google has become the omnipotent front page of the internet. Those with highly ranked pages earn valuable ad revenue every time someone visits. That creates a tremendous motivation to put links to your page all around the internet. But what is the best way to do it?

Well, it just so happens that if you can figure out a way to put spam onto one type of CMS that is used by hundreds or thousands of websites worldwide, you can artificially inflate your PageRank score. So the brightest minds get to scripting and curling their way through the internet, and figure out how to bypass even the best CAPTCHA algorithms in use by CMS software. For example, phpBB3 has had most of its default CAPTCHA methods (even its reCAPTCHA implementation) broken by spambots. Make no mistake, the fact that the software is entirely open source definitely makes it easier to break. You don’t have to reverse engineer it: you just read the code.

So how do you defeat them? The key is to not do what everybody else is doing. Why? Because if what you’re doing is common enough, the script kiddies will find a way through. Instead, realize that scripts are dumb and then trip them up. For example:

  • Change the name of one of the form inputs. When the spambot sends the old name (because many don’t bother to look at the page and just send a request), throw a random error.
  • Keep one of the form fields on the page, but hide it inside a div with style="display:none" (for example, the “confirm_email” field in phpBB3 registration). A human never sees the field and leaves it empty, so when a “confirm email” is magically posted, throw a random error and fail.
  • Fetch your captcha by AJAX. Most scripts won’t process AJAX.
  • Use an abnormal captcha, such as a required field with a sentence explaining what to put in the field.
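
Here is a rough sketch of how the server side of those traps could look, written in Python rather than in the PHP that Drupal or phpBB3 would actually use. Everything in it is an assumption made up for illustration: the field names, the question and its answer, and the error messages. It only shows the idea, which is to reject any post that sends the old field name, fills in the hidden field, or gets the plain-language question wrong.

```python
import random

# Hypothetical field names, invented for this example.
OLD_FIELD = "email"               # the stock input name that naive bots still post
RENAMED_FIELD = "email_7f3a"      # the new name given to the real e-mail input
HONEYPOT_FIELD = "confirm_email"  # hidden with display:none; humans leave it empty
QUESTION_FIELD = "human_check"    # the "abnormal captcha" text field
EXPECTED_ANSWER = "blue"          # e.g. "Type the colour of a clear daytime sky"

# Vague, unrelated-looking errors so a bot author learns nothing from them.
ERRORS = ["Session expired.", "Database unavailable.", "Unknown error."]


def looks_like_spam(form):
    """Return True when a submitted form (dict of field name -> string) trips a trap."""
    # Trap 1: the bot posted the old input name, or skipped the renamed one.
    if OLD_FIELD in form or RENAMED_FIELD not in form:
        return True
    # Trap 2: the hidden field came back filled in.
    if form.get(HONEYPOT_FIELD, "").strip():
        return True
    # Trap 3: the plain-language question was answered wrongly or not at all.
    answer = form.get(QUESTION_FIELD, "").strip().lower()
    return answer != EXPECTED_ANSWER


def handle_submission(form):
    """Return (accepted, message); spam gets a randomly chosen error."""
    if looks_like_spam(form):
        return False, random.choice(ERRORS)
    return True, "OK"
```

Calling handle_submission({"email": "bot@example.com"}) fails with one of the random errors, while a post carrying the renamed field, an empty hidden field and the right answer is accepted. The AJAX-loaded captcha is left out because it lives mostly in the page’s JavaScript rather than in this check.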

The spammers’ free lunch won’t last forever, though, because Google is actively working to remove such internet spam from its analysis (and thank you Jesus for that). Until Google becomes all-knowing, you’ll still have people spamming your blog. In the meantime, using a number of these tricks, or even coming up with something new, will go a long way toward stopping ridiculous amounts of Google comment spam.
