An Unfinished Symphony

It's about the internet and stuff.

Advanced spam control with mod_rewrite

I can't remember where I first got the idea, but for some time now I've been using mod_rewrite to protect against spam and hack attempts, and it has worked quite well. Essentially, I have a number of rules in my .htaccess file which are designed to block attacks from "users" displaying common traits, one of which is the absence of a user-agent string from the request headers.

As was pointed out to me yesterday, there's no obligation for any user-agent (UA) to send a user-agent string as a part of its request headers. I have no quarrel with that statement at all, except that on this site there is such an obligation. Several thousand times per month, bots, spammers or hackers whose request headers lack a user-agent string attempt to access comment and contact forms directly, or to request non-existent files with random-character file names. In my experience it is very rare for a legitimate visitor not to include a user-agent string, so I decided to make it a requirement for visiting here and used the following code to block anyone who doesn't send one:

The mod_rewrite code used to block visitors without a user-agent string
  1. RewriteEngine On
  2. RewriteCond %{HTTP_USER_AGENT} =""
  3. RewriteRule .* - [F,L]

Line 1 turns the rewrite engine on. Line 2 sets the condition to be checked for, in this case an empty user-agent string (denoted by the absence of content between the double quote marks). Line 3 says what should happen when the condition is met: the F flag states that the request should fail, in which case the server returns a 403 Forbidden error, and the L flag stops any further rules from being processed.
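
For comparison, here is a minimal sketch of an equivalent way to write the same check, using a regular expression rather than the string comparison. It is only an illustration of the technique described above and not a change to the rules used on this site; it assumes mod_rewrite is enabled and that .htaccess overrides are permitted.

```apache
RewriteEngine On
# "^$" is a regular expression that matches only the empty string,
# so this condition is true when the User-Agent header is missing or empty.
RewriteCond %{HTTP_USER_AGENT} ^$
# "-" means "do not substitute the URL"; [F] answers with 403 Forbidden.
RewriteRule ^ - [F]
```

Either form should behave the same way here; which one to use is largely a matter of taste.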

As I said above, that has worked quite well for some time and I've been happy with the effect that it's had on the amount of spam I've experienced. However, when checking my access logs on a couple of occasions recently, I noticed that something had been trying to access a file relating to the Text Link Ads service. In order to check that their adverts are working properly, their server periodically checks publishers' sites to make sure that the adverts are displayed. Whilst this is a reasonable and sensible thing to do, it appears that their server fails to include a user-agent string in its request headers, meaning that every attempt to check my site was being rejected by the server, which isn't so good. Consequently, either I had to stop blocking them, or they had to include a user-agent string in their headers.

As my attempts to explain the situation to their support people seemed to be met with misunderstanding, it turned out that I had to stop blocking them. This wasn't as simple as just removing the code from my .htaccess, though, as that would only result in my being bombarded with spam and hack attempts yet again. Instead I had to check for two conditions rather than one, with the extra condition being that the visitor wasn't them. To do that I also checked whether the visitor's IP address belonged to their server, like so:

  1. RewriteCond %{REMOTE_ADDR} !^12\.34\.567\.89$

That line of code checks to make sure that the visitor's IP is not the one listed (N.B. that is just a dummy IP address rather than their actual one). If both conditions are met (not the listed IP and no user-agent string) then the visitor gets blocked. When added to the previous code we get the following:

The amended mod_rewrite code
  1. RewriteEngine On
  2. RewriteCond %{HTTP_USER_AGENT} =""
  3. RewriteCond %{REMOTE_ADDR} !^12\.34\.567\.89$
  4. RewriteRule .* - [F,L]
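
Should the service ever check from more than one address, the same approach can be extended with further conditions, since consecutive RewriteCond lines are combined with AND by default. The following is only a sketch under that assumption: 192.0.2.10 and 198.51.100.x are placeholder addresses reserved for documentation, not addresses belonging to any real service.

```apache
RewriteEngine On
# Condition 1: the request has no user-agent string.
RewriteCond %{HTTP_USER_AGENT} =""
# Condition 2: the visitor is not the single placeholder address 192.0.2.10.
RewriteCond %{REMOTE_ADDR} !^192\.0\.2\.10$
# Condition 3: the visitor is not anywhere in the placeholder block 198.51.100.x
# (no trailing "$", so any final octet matches).
RewriteCond %{REMOTE_ADDR} !^198\.51\.100\.
# All three conditions must hold before the request is refused with a 403.
RewriteRule .* - [F,L]
```

Each extra negated condition adds one more exception to the block, so the list can grow as needed without touching the rule itself.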

That snippet of code will allow them to access my site even when they have no user-agent string in their request headers. Although there's no obligation for one to be included (as mentioned previously), I personally feel that it would be wiser for them to fix their software so that it identifies itself when accessing remote servers. Not doing so makes it quite easy to confuse them with spammers and hackers who do their best to disguise their actions and methods, and so leaves them liable to be blocked by many other users who might take similar measures. Hopefully the support person who responded to my queries will pass the matter on to someone who will understand the issue and be able to do something about it.
