More Spam = Less Spam

It might sound like a totally crazy thing to suggest, but it can be true. If you play the “game of spam” properly, your email addresses can receive more spam while much less of it, perhaps none, gets through to your inbox.

For those that aren’t familiar with the process of how we get spam, I’ll run through some of the methods spammers use to get our email addresses:

From a web page.
If a site is displaying your email address on its pages, chances are an automated crawler will scoot over the page and pick up your email address, even if you post it like: me at domain.com
Buying them from dodgy websites
There are plenty of websites that collect your details and won’t think twice about selling them on if they can make some cash from it. It’s unfortunate (for us) but true, nevertheless.
Sites getting hacked
Just as some sites will sell us out, some don’t, but they get hacked and all their user information can then be used to spam. This happened in America, where an AOL database with millions of accounts was stolen and sold off.
Random chance
Lots of spammers will still try to probe domains by writing seemingly safe-looking emails to random accounts. If somebody replies, they go on the list of “good email accounts”. There are other methods for detecting whether an email reaches its target but I won’t go too far into those now.

The sad thing is, once you’re on one of these lists, that’s it. You’ll be getting increasing amounts of spam on that account for life. You could move to a new account, but that means updating your address everywhere you’ve used it, and if one of those providers is selling your details on, you’ll just end up at the beginning again.

Deploy the countermeasures!

Because of the volume of spam, it has become increasingly popular to use spam-lists, where companies that deal with lots of email can report suspected spammers. Blocking with this method alone can cut out around 60-70% of spam.
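To make the spam-list idea a bit more concrete, here is a minimal sketch of how a DNS-based list is commonly queried: the sending server's IPv4 octets are reversed and the list's zone is stuck on the end. The zone `dnsbl.example.org` and both function names are made up for illustration, and the "lookup" is just a check against a local set rather than a real DNS query.

```python
import ipaddress
from typing import Set


def dnsbl_query_name(ip: str, zone: str = "dnsbl.example.org") -> str:
    """Build the DNS name a spam-list lookup would query for an IPv4 address.

    The zone is a placeholder, not a real list.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, listed_names: Set[str], zone: str = "dnsbl.example.org") -> bool:
    """Stand-in for the DNS lookup: a real check would resolve the query name
    and treat any answer as "listed". Here we only consult a local set."""
    return dnsbl_query_name(ip, zone) in listed_names


if __name__ == "__main__":
    reported = {"99.2.0.192.dnsbl.example.org"}
    print(dnsbl_query_name("192.0.2.99"))
    print(is_listed("192.0.2.99", reported))
```

A mail server would run a check like this on the connecting IP before accepting the message at all, which is why it is so cheap compared with analysing the content.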

Bayesian analysis is becoming more and more popular in all fields of spam prevention. It works by comparing messages you receive and ranking them based on how ‘spammy’ they look. The user can say “this is definitely spam” and this teaches the system more things to look out for.

This learning process can be very slow though. You can speed it up by letting it use the spam-list rules at the same time but even together, the system can take a long time to adjust to new spamming regimes.
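For the curious, the ranking-and-learning loop can be sketched as a tiny naive Bayes classifier. This is a toy under stated assumptions, not how any particular filter is implemented: the class name `TinyBayesFilter`, the tokeniser and the add-one smoothing are all my own illustrative choices.

```python
import math
import re
from collections import Counter


class TinyBayesFilter:
    """Toy naive Bayes spam ranker (illustrative only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Lower-case words and numbers; real filters use richer tokens.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Teach the filter that `text` is 'spam' or 'ham' (a real email)."""
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Return a 0..1 'spamminess' score for `text`."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        words = self.tokenize(text)
        log_scores = {}
        for label in ("spam", "ham"):
            # Prior: how common this class is, with add-one smoothing.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                # Smoothed likelihood so unseen words never give log(0).
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Turn the two log scores into a probability, clamped to avoid overflow.
        diff = max(min(log_scores["ham"] - log_scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


if __name__ == "__main__":
    f = TinyBayesFilter()
    f.train("cheap pills buy now", "spam")
    f.train("buy cheap watches now", "spam")
    f.train("lunch meeting tomorrow", "ham")
    f.train("notes from the meeting", "ham")
    print(f.spam_probability("buy cheap pills") > 0.5)         # True
    print(f.spam_probability("meeting notes tomorrow") < 0.5)  # True
```

Every call to `train` is the “this is definitely spam” button: the more labelled messages go in, the better the word counts reflect what spam actually looks like.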

Turn the volume up to 11

I hope you get the reference to Spinal Tap.

The higher the volume of spam your learning system is dealing with, the better its chance of matching a message against a previous spam signature.

If you have 100 emails coming in per day and 90 of them are spam, it’s going to match those against each other, determine which are spam, and gain up to 90 new spam signatures. If you have 1,000,000 coming in with the same ratio of real emails, you’re giving it so much more learning data that it’s going to become a very powerful and clever tool for filtering out your spam.

“I’m with GMail/Hotmail/Yahoo! Mail… I don’t get spam”

For exactly this reason: those companies are dealing with millions of emails every day. They can pool their learning data and effectively spot the real emails in the sea of spam.

Any one of them could end spam forever, for everybody. Seriously. If Google, Microsoft or Yahoo! were to provide a web-service that allowed your email server to use their learning data, you’d find you would get next to 0 spam. Implemented globally, I’m sure we’d see the end of nuisance emailing.

However, they don’t, because they would lose the incentive for people to use their software and, with that, the users who pay and click adverts.

How to stop spam without them

If you’re somebody who has their own email server, away from services like GMail, you have to do what Google does — get tons of spam.

My favourite method is setting up a catch-all email address for my domains. This receives the email sent to every unassigned address on my server. I then tell the filters that every email that comes through that account is spam. Finally, I submit a load of addresses that would land in the catch-all to known spam-collecting sites, even posting them on random websites for the spiders to find.
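The catch-all trick boils down to a very small routing rule, sketched below. The class `CatchAllTrap`, its method names and the example addresses are all hypothetical; a real mail server would do this in its delivery configuration and hand the trapped messages to whatever learning filter it uses.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CatchAllTrap:
    """Treat mail for any unassigned address as guaranteed spam (illustrative)."""

    real_addresses: Set[str]
    spam_corpus: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Compare addresses case-insensitively.
        self.real_addresses = {a.strip().lower() for a in self.real_addresses}

    def route(self, recipient: str, body: str) -> str:
        """Return 'filter' for real mailboxes, 'discard' for trapped mail."""
        if recipient.strip().lower() in self.real_addresses:
            return "filter"  # hand over to the normal spam checks
        # Nobody real lives at this address, so the message is a free,
        # pre-labelled spam sample for the learning system.
        self.spam_corpus.append(body)
        return "discard"


if __name__ == "__main__":
    trap = CatchAllTrap(real_addresses={"me@example.com"})
    print(trap.route("me@example.com", "Lunch on Friday?"))
    print(trap.route("sales1234@example.com", "Cheap watches!!!"))
    print(len(trap.spam_corpus))
```

The point of the design is that no human ever has to label the trapped messages: the address they arrived on does the labelling.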

Now I get roughly 1000 messages coming to the server every day. 30-50 of those will be valid and get filed appropriately. Roughly 700 of them are coming through the catch-all account so they’re analysed and deleted. The other 250-270 coming in on my real email addresses are recognised as spam and get put in my junk mail inbox for a week.

I should note that it is very important that you don’t automatically delete the junk mail on your real accounts. Bayesian analysis isn’t perfect and it WILL catch some emails that are valid. I’d therefore suggest white-listing the addresses that you know are real, adding server-side rules for those people if you can. Even so, be sure to glance down your junk mail, preferably on a weekly basis, to check whether you’ve forgotten anybody.

Another method is weighting spam. If your system thinks something is 100% spam, it’s usually safe to say it is, and you should have it learn from this variation and delete the message. You should also be able to give your checking services different authority scores, which are combined to make up the final spam score. This way, if you use a white-list, you can assign it 100% authority to override any other lists or learning systems.
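One way the authority scores could be combined is a weighted average with an override, sketched here. The function name `combined_spam_score`, the check names and the particular weights are assumptions for illustration; your own filter will have its own way of expressing this.

```python
from typing import Dict, Tuple


def combined_spam_score(verdicts: Dict[str, Tuple[float, float]]) -> float:
    """Combine several checks into one spam score between 0 and 1.

    `verdicts` maps a check's name to (spam_score, authority), both 0..1.
    A check with authority 1.0 overrides everything else; if more than one
    has full authority, the first one in the dict wins.
    """
    for score, authority in verdicts.values():
        if authority >= 1.0:
            return score
    total_authority = sum(authority for _, authority in verdicts.values())
    if total_authority == 0:
        return 0.0  # nothing has an opinion, so assume it is not spam
    weighted = sum(score * authority for score, authority in verdicts.values())
    return weighted / total_authority


if __name__ == "__main__":
    checks = {"spam-list": (1.0, 0.6), "bayes": (0.8, 0.4)}
    print(combined_spam_score(checks))
    # A white-list hit only appears when the sender matched, with full authority.
    checks_with_whitelist = {"white-list": (0.0, 1.0), **checks}
    print(combined_spam_score(checks_with_whitelist))
```

Because the white-list entry carries full authority, a known sender gets through no matter how spammy the other checks think the message looks.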

Intelligent clients

Although my server-side method does slice out 99.something% of spam, I do get the occasional message where enough information has been faked for it to get through the email filters, but even those rarely stay in my inbox for more than a millisecond.

I use Thunderbird, Mozilla’s POP3, IMAP, SMTP and RSS client. It has a genius system built into it that can work with the spam-scores that get assigned to emails (which get added to the headers of emails) and use its own algorithms to determine the likelihood of spam. It also learns from the emails that are automatically placed in my junk-mail folder.
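As a rough idea of what “working with the spam-scores in the headers” involves, here is a small sketch that pulls a numeric score out of a message. The header name `X-Spam-Score` and the function name are assumptions on my part; server-side filters differ in what they add, and this is not a description of Thunderbird's internals.

```python
from email import message_from_string
from typing import Optional


def server_spam_score(raw_message: str, header: str = "X-Spam-Score") -> Optional[float]:
    """Return the server-assigned spam score, or None if absent or unparseable.

    The header name is an assumption; check what your own server adds.
    """
    message = message_from_string(raw_message)
    value = message.get(header)
    if value is None:
        return None
    try:
        return float(value.strip())
    except ValueError:
        return None


if __name__ == "__main__":
    raw = (
        "From: someone@example.com\n"
        "X-Spam-Score: 7.5\n"
        "Subject: You have won\n"
        "\n"
        "Claim your prize now.\n"
    )
    print(server_spam_score(raw))
```

A client can then treat that number as one more piece of evidence alongside whatever its own learning says about the message.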

In conclusion…

I get no* spam, I don’t pay for the privilege (like John C Dvorak) either and I’m very happy with my wash. *By no spam I mean that 1 or 2 get through each week, but I have had some weeks without a single message making an impact.

Your email server does have to do a bit of work though, especially if you’re dealing with millions of emails every day. At those sorts of volumes, you can usually split the jobs out across multiple servers to balance the load.

With a similar system, you should find that you no longer have to choose to go with a company that provides their own spam protection.