The most common (and reliable) forms of anti-spam rely on comparing email content. If you had access to read the email of 1 million email accounts, you could quite quickly and easily see which identical (or near-identical) messages are being sent to a large proportion of those users. This is how Google blocks spam for its Gmail users, just as Yahoo and MSN do.
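To make "near-identical" a bit more concrete, here is a rough sketch of one way two messages could be compared: break each into overlapping three-word chunks and measure how much the two sets of chunks overlap. This is only an illustration. I have no idea what the big providers actually run, and the function names, the chunk size and the 0.5 threshold are all my own assumptions.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in a message."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short message: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 0.0  # two empty messages: nothing to compare
    return len(a & b) / len(a | b)


def is_near_duplicate(msg_a, msg_b, threshold=0.5):
    """Assumed cut-off of 0.5; a real system would tune this on real mail."""
    return jaccard(shingles(msg_a), shingles(msg_b)) >= threshold


if __name__ == "__main__":
    spam_1 = "Buy cheap watches now at our amazing online store today"
    spam_2 = "Buy cheap watches now at our amazing online shop today"
    print(is_near_duplicate(spam_1, spam_2))
```

Comparing every message against every other message would not scale to millions of mailboxes, so a real system would presumably need something cleverer on top of this. The sketch only shows the comparison itself.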
There are many free services available to help you block spam. For example, my email goes through 4 individual scans: Bayesian Filtering; Spam List: ORDB; Spam List: SpamCop; and Spam List: SpamHaus SBL+XBL. Each agent gets up to 10 "points" to express how likely it thinks the email in question is to be spam. If the total score is 10 or less, the email is usually safe. If the score ranges from 11 to 30, the email gets hurled into my junk email box, which I check once a week. If the email scores more than 30, it gets binned immediately.
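The scoring rules above boil down to a few lines of code. This is a sketch of the thresholds as I have described them, not the actual configuration of my mail server. The scanner keys and the `classify` function name are made up for the example.

```python
MAX_POINTS_PER_SCANNER = 10


def classify(scores):
    """Add up the per-scanner points and decide where the email goes.

    `scores` maps a scanner name to the points (0 to 10) it awarded.
    """
    for name, points in scores.items():
        if not 0 <= points <= MAX_POINTS_PER_SCANNER:
            raise ValueError(f"{name} returned an out-of-range score: {points}")

    total = sum(scores.values())
    if total <= 10:
        return "inbox"   # usually safe
    if total <= 30:
        return "junk"    # junk box, checked once a week
    return "delete"      # binned immediately


if __name__ == "__main__":
    # Hypothetical scores for a single incoming message.
    message_scores = {"bayesian": 8, "ordb": 5, "spamcop": 5, "spamhaus": 0}
    print(classify(message_scores))
```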
If Google were to create a webservice that your email server could contact when new messages were received (like these other services), I think they could seriously harm the spammers around the world.
They could also do a very similar service for comments on blogs. There are services like Akismet where you can submit that incoming content to your blog and find out if it’s spam or not. Google, using its (practically infinite) knowledge of what’s happening on the Internet, could easily provide another webservice to query if the content has featured on another site. If it has, there’s a good possibility that the message might be spam.
It could also provide information about any sites linked from the comment body. Nobody wants their site linking to spyware-pushing sites, but when users have the reins, it’s hard to stop them without enforced moderation.
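As a rough idea of what the link-checking half might involve on the blog's side, the sketch below pulls the hostnames out of a comment and compares them with a list of known-bad hosts. The service itself doesn't exist, so the blocklist here is just a stand-in for whatever data it would hold. The function names, the URL pattern and the example domains are all assumptions of mine.

```python
import re
from urllib.parse import urlparse

# Deliberately simple pattern: grabs http(s) URLs up to whitespace or a quote/bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def linked_hosts(comment_body):
    """Return the set of hostnames linked from a comment."""
    hosts = set()
    for url in URL_PATTERN.findall(comment_body):
        host = urlparse(url).hostname
        if host:
            hosts.add(host)
    return hosts


def flagged_hosts(comment_body, blocklist):
    """Return the linked hostnames that also appear in the blocklist."""
    return linked_hosts(comment_body) & set(blocklist)


if __name__ == "__main__":
    comment = (
        'Nice post! See http://spyware.example/free '
        'and <a href="https://good.example/page">this</a>'
    )
    # Hypothetical blocklist; a real one would come from the service.
    print(flagged_hosts(comment, {"spyware.example"}))
```

A comment that comes back with any flagged hosts could then be held for moderation rather than published straight away.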
But there’s a problem…
Google is in this game to make money. You might think giving away gigs of email storage would count against them, but they are overselling to begin with: they assume you’re never going to get near the upper limits of your account. They also get a massive amount of data from being able to read your email. This allows them to target advertising based on your email content, and advertising means a lot to Google as it has made them seriously rich over the years.
Yahoo and Hotmail have exactly the same deal going. If people use their webmail, they get advertising all around the viewport. They also stick adverts at the footers of outgoing emails.
Google/Yahoo/Microsoft also get to look at the contents of your email. This is mostly harmless but they can see where you’re getting email from and what sites you shop at, how much you spend, etc.
To most people this level of trust is fine but, again, Google exists to make money. If they can exploit that data, chances are they will. This could mean placing targeted adverts around your email, or even tracking you out to other pages and targeting you there.
If they did publish a spam webservice, they would be gaining access to billions of emails. That may bother you.
About Oli: I’m a Django and Python programmer, occasional designer, Ubuntu member, Ask Ubuntu moderator and technical blogger. I occasionally like to rant about subjects I should probably learn more about but I usually mean well.