A moat against spammers
If you’re like me, you probably have eliminated almost all of your daily spam email, by being very diligent.
You, and I, read every privacy section on every beta we apply for, to feel secure that our email address won’t get in the Infernal Spam Database(TM). It pays to be diligent, and safeguarding your email address is a smart thing to do, and you should, but you can’t do it forever, you know.
Some day, on some website, you need to use that precious email address so People can actually contact you, and People generally like to do so, since that is what the internet is all about. These same People don’t like to fill out forms; they like to click links, especially mailto: links. This gives them a warm and fuzzy feeling, and also lets them keep a copy in their Sent folder.
You and I have a strained relationship with mailto: links. So we have devised schemes (like mail-obfuscation) and voodoo-trickery to let People use mailto: links so long as they themselves use voodoo, in the form of JavaScript-enabled browsers. This leaves The Spammers in the dark. At least for now.
The Spammer Never Sleeps
Spammers don’t like to be in the dark, and so They strive for perfection in Their spam-robots. Using all Their knowledge about searching text, and a heap of regex, The Spammers usually manage to suck out all the email addresses They want. According to a report from the Center for Democracy & Technology, published in 2003, email obfuscation seems to work, but for how long? Probably only as long as it takes Them to write a regex that converts obfuscated email addresses back into correct ones. Considering 2009 – 2003 = 6 years, They are probably on to Us.
The goal must be to hide that mailto-link and the email address with it, but still make it possible for the People to click links.
For those about to be spammed, we salute you
We have two requirements for our email-address:
1. It has to be easy to type for an author / contributor. No large codes!
2. We have to be able to use any email on whatever domain.
Seeing as every email-address under a domain is unique, we can simplify emails that belong to the current domain to only the username.
I’ll be using the address ‘iluvspam@3djegrad.net’ for this article. On the demo-site, the link for this email-address will let People click it, open Their favourite email-app, send the email and continue on their Quest to find the End of the Internet.
To do this we’ll use Apache’s mod_rewrite capabilities, to rewrite URLs on the fly, and we will be able to write the email above as:
<a href="contact/iluvspam" rel="nofollow">iluvspam</a>
Open up your favourite text-editor (Notepad or TextEdit will do fine, though I prefer the excellent and gratis TextWrangler from Bare Bones Software), and start a new file called ‘.htaccess’. Save this file in your root directory on the webserver.
If you have used mod_rewrite before, the already existing .htaccess will probably look something like this.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
</IfModule>
This enables the RewriteEngine in Apache and sets the RewriteBase to /, sort of like the <base href> element in HTML, but not exactly the same.
What we need to add is a RewriteRule which fires every time someone clicks the iluvspam link.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
#Email-harvest prevention for any domain
RewriteRule contact/([a-zA-Z0-9-_\.]+)/([a-zA-Z0-9-_\.]+) email.php?email=$1&domain=$2
# Email-harvest for this domain
RewriteRule contact/([a-zA-Z0-9-_\.]+) email.php?email=$1
</IfModule>
Rewrite-email-to-the-what-now?
Before we start, grab a box of concentration and open your head.
All those characters, jumbled together like an imploded anthill, are regex: regular expressions, the same tool Spammers use to work with text. We can also use them for Good.
- ^ (the caret sign) means ‘the start of the text’.
- () parentheses group part of the pattern and capture the matched text in a variable which can be used again ($1, $2 and so on).
- $ means ‘the end of the text’.
- [] defines a set of characters, any one of which may match at that position; a following parameter like ‘+’ or ‘*’ says how many times.
As an example:
- ^abc matches abcdef but not aabcdef, since the text doesn’t start with abc
- abc$ matches aabc, but not abcd, since ‘abcd’ doesn’t end in abc
- ^abc$ only matches abc, seeing as we tell the regex-engine that the whole text must be exactly abc, from start to end.
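If you would rather see the anchors in action than take my word for it, here is a minimal sketch in PHP, whose preg_match() uses the same family of regular expressions. The function name matchTable() is made up for this illustration.

```php
<?php
// Illustration only: try the three anchored patterns from the list
// above against a handful of test strings.
// matchTable() is a hypothetical helper, not a built-in PHP function.
function matchTable(array $patterns, array $subjects): array
{
    $table = [];
    foreach ($patterns as $pattern) {
        foreach ($subjects as $subject) {
            // preg_match() returns 1 when the pattern matches
            $table[$pattern][$subject] = preg_match($pattern, $subject) === 1;
        }
    }
    return $table;
}

$patterns = ['/^abc/', '/abc$/', '/^abc$/'];
$subjects = ['abc', 'abcdef', 'aabc', 'aabcdef', 'abcd'];

$table = matchTable($patterns, $subjects);
foreach ($table as $pattern => $row) {
    foreach ($row as $subject => $matched) {
        printf("%-8s %-8s %s\n", $pattern, $subject, $matched ? 'match' : 'no match');
    }
}
```

The slashes around each pattern are just PHP's way of marking where the regex starts and ends; they are not part of the pattern itself.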
Back to our .htaccess-file:
# Email-harvest for this domain
RewriteRule contact/([a-zA-Z0-9-_\.]+) email.php?email=$1
The RewriteRule above tells Apache to forward all links which refer to ‘contact/(something)’ to the email.php script which parses the URL and returns the correct email to the (hopefully shiny and happy) People.
Phew! Take a deep breath before we dive back in … Ready? Let’s go!
The RewriteRule utilizes regex for pattern recognition. The pattern it looks for is ‘contact/’ followed by any letter between a-z in either lower- or UPPERCASE and/or any digits from 0-9 and/or the characters ‘-’, ‘_’ and ‘.’.
This should cover most everyday email-addresses (though not ones with other characters in them, such as ‘+’). The plus sign at the end tells the Apache-server that we are looking for one or more of these characters. If you’re observant you might have noticed that the period-sign (.) has a backslash in front of it (\.), and that square brackets enclose the characters inside the parentheses. The backslash is an escape-character in regex, and tells the regex-engine to treat ‘.’ as a literal period and not as a wildcard, which is the default behavior for ‘.’ in Regular Expressions. (Inside square brackets the period is already literal, so the backslash is harmless but not strictly needed.) The brackets define the set of allowed characters, to be worked on by parameters like the ‘+’.
# Email-harvest for this domain
RewriteRule contact/([a-zA-Z0-9-_\.]+) email.php?email=$1
The $1 in email.php?email=$1 tells the regex-engine to insert the text captured by the parentheses ([a-zA-Z0-9-_\.]+) as the value of the variable email, which in our case is iluvspam. The rewritten URL will then be: /email.php?email=iluvspam
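To make the $1 substitution less abstract, here is a rough sketch in PHP that mimics what the RewriteRule does. It is not how Apache works internally, and simulateRewrite() is a name I made up for the occasion. The character set is the same as in the rule, with the hyphen moved to the end of the brackets so no regex-engine can mistake it for a range.

```php
<?php
// Hypothetical helper, not part of mod_rewrite: it mimics how a
// RewriteRule swaps the requested path for the substitution string.
function simulateRewrite(string $pattern, string $substitution, string $path): ?string
{
    // '#' is used as the delimiter because the pattern contains '/'
    if (preg_match('#' . $pattern . '#', $path, $groups) !== 1) {
        return null; // the rule does not fire
    }
    // Replace $1, $2, ... with the text captured by the parentheses
    return preg_replace_callback(
        '/\$(\d)/',
        function (array $m) use ($groups): string {
            return $groups[(int) $m[1]] ?? '';
        },
        $substitution
    );
}

$rule = 'contact/([a-zA-Z0-9_.-]+)';
$rewritten = simulateRewrite($rule, 'email.php?email=$1', 'contact/iluvspam');
echo $rewritten, "\n"; // email.php?email=iluvspam
```

A path that doesn't contain ‘contact/’ makes the function return null, which corresponds to Apache leaving the URL alone.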
So when People click ‘iluvspam’ they send the variable ‘iluvspam’ into email.php. email.php in turn is configured to append 3djegrad.net after all email variables, unless something else is defined.
As you’ve seen, we have two RewriteRules in our .htaccess-file, and the other one lets us type and send email to addresses on other domains, besides our own. That line looks like this.
#Email-harvest prevention for any domain
RewriteRule contact/([a-zA-Z0-9-_\.]+)/([a-zA-Z0-9-_\.]+) email.php?email=$1&domain=$2
Using the knowledge you’ve gained above, you should be able to recognize that we now have two variables in the RewriteRule. The first variable is the email and the second is the domain. This RewriteRule lets us write email-addresses to other domains like so:
Contact him via <a href="contact/chrleon/gmail.com" rel="nofollow">his email-address</a>
Just one thing. The more advanced RewriteRule has to be defined before the simpler ones, or the regex-engine will recognize the simpler form before the more advanced one and fire, never activating the more advanced one. So in this case the .htaccess looks like this:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
#Email-harvest prevention for any domain
RewriteRule contact/([a-zA-Z0-9-_\.]+)/([a-zA-Z0-9-_\.]+) email.php?email=$1&domain=$2
# Email-harvest for this domain
RewriteRule contact/([a-zA-Z0-9-_\.]+) email.php?email=$1
</IfModule>
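Why the order matters is easier to see with a small simulation. The PHP sketch below is only an approximation of Apache's behavior, under the assumption that the first rule that matches decides the outcome; firstMatchingRewrite() is a hypothetical helper. As before, the hyphen sits last in the brackets, which keeps the character set the same.

```php
<?php
// Hypothetical simulation: walk through the rules in order and let
// the first one whose pattern matches produce the rewritten URL.
function firstMatchingRewrite(array $rules, string $path): ?string
{
    foreach ($rules as [$pattern, $substitution]) {
        if (preg_match('#' . $pattern . '#', $path, $groups) === 1) {
            $replacements = [];
            foreach ($groups as $index => $text) {
                $replacements['$' . $index] = $text; // $1, $2, ...
            }
            return strtr($substitution, $replacements);
        }
    }
    return null; // no rule fired
}

$name = '[a-zA-Z0-9_.-]+';
$anyDomain  = ["contact/($name)/($name)", 'email.php?email=$1&domain=$2'];
$thisDomain = ["contact/($name)", 'email.php?email=$1'];

$path = 'contact/chrleon/gmail.com';
// Advanced rule first: both parts are captured
$goodOrder = firstMatchingRewrite([$anyDomain, $thisDomain], $path);
// Simple rule first: it fires on 'contact/chrleon' and the domain is lost
$badOrder = firstMatchingRewrite([$thisDomain, $anyDomain], $path);

echo $goodOrder, "\n"; // email.php?email=chrleon&domain=gmail.com
echo $badOrder, "\n";  // email.php?email=chrleon
```

With the simple rule first, the mail would go to chrleon at our own domain, which is not what we wanted.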
You can learn more about Regular Expressions at regular-expressions.info and at weblogtoolscollection. Or do you need to read it again?
.htaccess files are extra configuration-files for the Apache web-server. To learn more about them, go to the Apache documentation website.
So, what does email.php look like?
The file is relatively simple. All it conceptually does is get the user-name from the link contact/iluvspam and append the domain address. Try to also check the email-variable for non-allowed characters, or better yet do a white-list check where you only allow the characters a-z, A-Z, 0-9, ‘.’, ‘-’ and ‘_’. That way you guard against injection-attacks. Any comments on the security issues are most welcome.
Here is the code for email.php, with comments.
// set two variables to hold the generated email-address
// the $domain variable is used for email-addresses on THIS domain
// if no domain is entered in the link like contact/abc123
// then the email will be sent to 'abc123@3djegrad.net'
$recipient = "";
$domain = "3djegrad.net";
// The email-address is gathered from the GET-request
// This GET-request is from the link you clicked
// !empty() checks that the email is set, without raising a notice if it isn't
if (!empty($_GET['email'])) {
    $recipient = $_GET['email'];
}
// The domain is gathered from the GET-request.
// if it is set, we replace the $domain variable with the correct domain
if (!empty($_GET['domain'])) {
    $domain = $_GET['domain'];
}
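The white-list check mentioned above could look something like the sketch below. This is one possible take, not the code running on the demo-site, and the names allowListed() and buildMailto() are invented for the example. The assumption is that anything outside a-z, A-Z, 0-9, ‘.’, ‘_’ and ‘-’ should stop the script from building a mailto: address at all.

```php
<?php
// Hypothetical helper: returns the value only if every character is
// on the white-list (letters, digits, '.', '_' and '-'), else null.
function allowListed($value): ?string
{
    // \A and \z anchor the whole string, so a trailing newline fails too
    if (!is_string($value) || preg_match('/\A[a-zA-Z0-9._-]+\z/', $value) !== 1) {
        return null;
    }
    return $value;
}

// Hypothetical helper: builds the mailto: address from the query
// variables, or returns null if anything looks suspicious.
function buildMailto(array $query, string $defaultDomain): ?string
{
    $recipient = allowListed($query['email'] ?? null);
    if ($recipient === null) {
        return null;
    }
    $domain = isset($query['domain']) ? allowListed($query['domain']) : $defaultDomain;
    if ($domain === null) {
        return null;
    }
    return "mailto:$recipient@$domain";
}

echo buildMailto(['email' => 'iluvspam'], '3djegrad.net'), "\n";
```

In email.php you would call it as buildMailto($_GET, '3djegrad.net') and refuse to redirect when it returns null.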
This is the gist of it, but we still haven’t made any headway on opening our Readers’ email-application. That’s the next step.
We open the Readers’ email-application with a header-change in PHP, like so:
header("Location: mailto:$recipient@$domain");
This will open the mail-app with the correct email-address, already filled in. But there’s just a blank page in the browser.
All the People who want to contact you, don’t like to see blank pages. They’re a tough crowd.
For this to work we must rely on voodoo-trickery, since PHP won’t let us send two header(location)-commands to the browser. If we try to send two header(location), only the last one fires, even with the PHP function ob_start(): a later Location header simply replaces the earlier one.
echo '<script type="text/javascript">history.back()</script>';
$explanation = file_get_contents("spamexplain.html");
echo '<noscript>';
echo $explanation;
echo '</noscript>';
The code above does just that. The first line tells the browser to go back one page. The following lines prepare a page for those Readers who for some reason have JavaScript turned off, or who have no JavaScript available to them. This is our graceful fail.
But we’re not in Kansas yet, Dorothy.

Oi! What about my statistics!?
It seems that Firefox and Opera generate a hit when you use the back-button, and so does using the history.back() function. Safari is the only browser not doing this, as far as I can tell from my tests.
The solution for this is to set a $_SESSION-variable in the email.php script and test for that variable on the article page. If the variable is set, then don’t load the counters. After that test is run, destroy the session-variable. Now the count won’t register.
For this example I just use a date and time-stamp in a textfile:
// session_start() must have been called earlier on the page
if (!isset($_SESSION['url'])) {
    $handle = fopen("counter.txt", "a"); // open the logfile for appending
    // This could be Google Analytics or others
    // like so: include('google-analytics-script.php');
    // write the date and time in the logfile, followed by a newline
    $dateandtime = date("d-m-Y - H:i:s");
    fwrite($handle, $dateandtime . "\n");
    fclose($handle); // close the file
}
unset($_SESSION['url']);
// unset the session-variable after the test, so that the counting
// still works on the next visit. Remember, we only want to drop the
// count on this one page, after People have clicked their happy
// mailto:links.
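The other half of the trick, setting the session-variable in email.php, is not shown above, so here is a sketch of how the two halves could fit together. The function names are made up, and what gets stored under ‘url’ is my assumption; only the key name comes from the code above. The session array is passed in as a parameter so the logic can be tried without a web server.

```php
<?php
// Hypothetical helpers. Both scripts would call session_start()
// first and then pass $_SESSION as the first argument.

// Call from email.php, before redirecting to the mailto: address.
function markMailtoClick(array &$session, string $returnUrl): void
{
    $session['url'] = $returnUrl; // assumption: remember where we came from
}

// Call from the article page. Returns true when the hit should count.
function shouldCountHit(array &$session): bool
{
    $cameFromMailto = isset($session['url']);
    unset($session['url']); // one-shot flag: the next visit counts again
    return !$cameFromMailto;
}

// A pretend session, walking through visit, click and return
$session = [];
$firstVisit = shouldCountHit($session);   // ordinary visit
markMailtoClick($session, '/moat');       // Reader clicks the contact link
$afterClick = shouldCountHit($session);   // history.back() reloads the page
$nextVisit  = shouldCountHit($session);   // a later, ordinary visit

var_dump($firstVisit, $afterClick, $nextVisit);
```

Only the page load right after the click is skipped; every other visit is counted as usual.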
Try it out
Demo-site: 3djegrad.net/moat
So there you have it. Hopefully spam-free and human-readable email-addresses on the internet.
The People will be glad they can click on mailto-links again. Wayhey!
Thanks for reading. Any comments? Submit them below.
Creative Commons photos by: http://www.flickr.com/photos/wheatfields/