After the tremendous feedback and support for the first implementation of KittenAuth I pushed out, it seems only fair to push this project further and bring it to a stage where everyone can use it.
There have already been several versions made in ASP, my version in .NET and a few versions in PHP with varying features. I’m going to be chatting with their creators in the next few days to work things out and try to come up with a universal best implementation that fulfils as much of the problem domain as possible.
Preventing automated image comparison
The problem with the current version that I’m running on TPCS is that if a bot has an image that I’m using and knows it’s a kitten, it’s really easy for it to compare the two, with almost no processing time (compared to OCR). There are several ways to get around this limitation:
Having a massive number of images in the database and only showing a few at a time to a client makes it much harder for a bot/script to harvest all the images so that they can be handed to humans to identify. Doing this practically is much harder. As soon as you keep the same images and layout for a validation (e.g. everything stays the same until they get it right) the bot can brute-force the grid over and over again until it passes. This can be improved by time-blocking an IP after (say) 3 incorrect attempts.
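As a rough sketch of the time-blocking idea (not what TPCS actually runs), something like the following would do. The class name, the 15-minute block length and the in-memory dictionaries are assumptions made for illustration; a real site would more likely keep this state in its database or cache.

```python
import time
from collections import defaultdict

MAX_FAILURES = 3          # the "(say) 3 incorrect attempts" from the text
BLOCK_SECONDS = 15 * 60   # assumed block length; the post does not give one


class AttemptLimiter:
    """Hypothetical helper: counts failed verifications per IP and time-blocks repeat offenders."""

    def __init__(self, max_failures=MAX_FAILURES, block_seconds=BLOCK_SECONDS,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.block_seconds = block_seconds
        self.clock = clock                 # injectable so the logic can be tested
        self.failures = defaultdict(int)   # ip -> consecutive failed attempts
        self.blocked_until = {}            # ip -> time at which the block expires

    def is_blocked(self, ip):
        expires = self.blocked_until.get(ip)
        if expires is None:
            return False
        if self.clock() >= expires:
            # The block has run out: forget it and let the client try again.
            del self.blocked_until[ip]
            return False
        return True

    def record_failure(self, ip):
        self.failures[ip] += 1
        if self.failures[ip] >= self.max_failures:
            # Too many wrong answers: start the block and reset the counter.
            self.blocked_until[ip] = self.clock() + self.block_seconds
            self.failures[ip] = 0

    def record_success(self, ip):
        self.failures.pop(ip, None)
```

The verification handler would call is_blocked() before serving a grid, and record_failure() or record_success() after checking the answer. Serving a fresh grid after every failure, rather than the same one, is what stops the brute-forcing described above; the block only slows the attacker down.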
Another way of stopping people harvesting all your images is never to use the same image for more than one verification. This would mean you’d need 9 unique images (and I mean unique) for each user that you needed to verify. A system like this could be established by having multiple webcams on multiple animals or in a cattery/kennels/etc. It’s a cute idea but it’s not that practical.
This still doesn’t stop long-term or mass-bot harvesting of all the source images. The next step is to automate the input of images from other sites such as Flickr. By doing a weekly batch download of 500 pictures from Flickr, you should be able to keep the bots on their toes… or at least make it too much effort for a spammer to sort through all the images on a weekly basis.
The real killer move is to stop just using kittens as the "correct" pass. With multiple "win-groups" and a request to click different things each time, the spammers will need to keep on top of several different groups of images. This has the bonus of again making it such hard work that it is no longer viable.
So in practice you would have directories of images of "kittens", "puppies", "hedgehogs", "skylines" and "cars". Each time a verification request comes in for a user, a random group is picked as the win-group. Random images are selected from its directory and placed. The other squares are filled in with other random pictures.
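The selection step could be sketched roughly like this. It is only an illustration: the function name, the dictionary of groups and the image identifiers are hypothetical, and it assumes every image belongs to exactly one group.

```python
import random


def build_challenge(groups, grid_size=9, wins=3, rng=random):
    """Pick a random win-group and lay out a shuffled grid.

    groups maps a group name (e.g. "kittens") to a list of image identifiers.
    Returns (win_group, grid, answer), where answer is the set of grid
    positions holding images from the win-group.
    """
    win_group = rng.choice(sorted(groups))
    winners = rng.sample(groups[win_group], wins)

    # Every image from the other groups is a candidate for the remaining squares.
    decoy_pool = [img for name, imgs in groups.items()
                  if name != win_group for img in imgs]
    decoys = rng.sample(decoy_pool, grid_size - wins)

    grid = winners + decoys
    rng.shuffle(grid)

    winner_set = set(winners)
    answer = {pos for pos, img in enumerate(grid) if img in winner_set}
    return win_group, grid, answer
```

The server would keep the answer set in the session and only send the grid and the group name ("click all the hedgehogs") to the client. In this sketch the images would be listed from the directories described above; how they get there is left open.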
This group system would be easy to update from things like Flickr thanks to the tag search and API they provide, but it does divert away from the cute.
Preventing verification via free porn "users"
Something I was unaware of 20 days ago when I wrote the original article was the way that some CAPTCHAs are circumvented. It’s incredibly simple but seems like it really would work. Spammers offering services to enlarge certain bodily organs would of course probably be running free-porn websites with their advertising all over. The two realms of marketing go hand-in-hand.
The current hack goes something like this:
- The porn user hits the free porn site
- The spammer’s bot gets the request and loads up the registration page on a free-email site
- The bot takes a screen shot of the CAPTCHA on the page and outputs it on the porn page saying "Enter this code for free porn!!!"
- The starry-eyed user enters the code and this gets passed through the bot and back into the registration form. If they’re not stupid, registration completes and the spammer has a new email account. The user might even have his free porn.
As far as I can see there are only a few ways to defeat this, the first of which I think I’ve already covered.
1. Use a set of pictures too cute for porn. "Every time you masturbate, God kills a kitten." A message like that cannot be ignored, but it’s not foolproof.
2. Stamp the images with text telling people not to use the system if they’re on a porn site. It’s a bit complex and a lot to fit into an image but it might work for some users.
3. Changing what’s being looked for. The same system as I suggested before. The bot needs to ask the porn user what to look for. If this keeps changing, they’re going to have to beef up to stay on top of things.
1-in-84 is too easy to brute-force
That’s a very fair statement. The current 3x3 grid with three selections makes it really easy if there is a spammer with lots of computers that can spam the site. Again there are a couple of ways to improve this and drag the number of combinations up.
Firstly, by asking the user to click "all the kittens" or "all the foxes" and changing how many are in the grid, you end up with a 2^9 combinations situation: that’s a 1-in-512 chance now.
If we increase the size of the grid slightly to 4x4, we get 2^16 and we’re suddenly at 65,536 combinations! That’s much more acceptable… You could take it even further (5x5 gives 1-in-33,554,432) but it makes the user have to concentrate harder on the task of verifying their existence. I don’t want to require effort from the user.
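For anyone who wants to check the arithmetic, here is a quick sketch. The function names are made up for illustration; math.comb needs Python 3.8 or later.

```python
from math import comb


def fixed_pick_combinations(cells, picks):
    """Ways to choose exactly `picks` squares out of `cells` (the current scheme)."""
    return comb(cells, picks)


def any_subset_combinations(cells):
    """Ways to select any subset of `cells` squares (the "click all the kittens" scheme)."""
    return 2 ** cells


print(fixed_pick_combinations(9, 3))   # 84       (3x3 grid, exactly three kittens)
print(any_subset_combinations(9))      # 512      (3x3 grid, unknown number of kittens)
print(any_subset_combinations(16))     # 65536    (4x4 grid)
print(any_subset_combinations(25))     # 33554432 (5x5 grid)
```

Note that 2^n counts the empty selection too, so if a grid always contains at least one correct image the real figure is one lower (511 for 3x3).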
What about visually impaired users?
This problem has had me banging my head against a brick wall. There’s just no easy way for this system to handle people who cannot see the images. The way other systems get around this problem is by offering an alternative audio CAPTCHA that speaks letters to the user based on a massive sound-bank of voices. Some distortion is added to the audio to make it slightly harder to hear.
The only thing I can say on this is: Scrambled-Text-CAPTCHAs do not cover this. Neither does KittenAuth. It is a replacement for visual CAPTCHAs. Nothing more.
There is long-term scope for offering different types of sounds (e.g. of cats or planes, asking the user to identify them) but nothing as simple as KittenAuth. I’m not telling you to lock your blind users out of your website, I’m telling you to look into the audio-CAPTCHA methods. Just remember, the spammers will go for the weakest point, which could well be the audio.
When will this be available?
I’m still doing a lot of personal research on this and I really hate PHP with a passion (I like my programming languages to have a bit of backbone) so it’s taking me longer than I’d like to get things sorted out. I’ve also got the upbringing of KittenAuth.com coming along and I’m fighting various prefab CMSs against each other to see which is the best for my purposes. I’ll probably settle on a hacked WordPress install.
When there’s a site up there, I’ll start the process of gathering people together and putting out some code for people to use. Hopefully it won’t be long after that and there’ll be plugins for software. There’s already a phpBB version (which looks super by the way) but it needs to come some distance to cover people’s hesitations with the process.
About Oli: I’m a Django and Python programmer, occasional designer, Ubuntu member, Ask Ubuntu moderator and technical blogger. I occasionally like to rant about subjects I should probably learn more about but I usually mean well.