Improving KittenAuth

By Oli on Tuesday, 25th April 2006.

Having had tremendous feedback and support on the first implementation of KittenAuth I had pushed out, it seems only fair to push this project further and bring it to a stage where everyone can use it.

There have already been several versions made in ASP, my version in .NET and a few versions in PHP with varying features. I'm going to be chatting with their creators in the next few days to work things out and try to come up with a universal best-implementation that forefils as much of the problem domain as possible.

Article Links
KittenAuth v1 Test
KittenAuth v1 Article

Preventing automated image comparison

The problem with the current version that I'm running on TPCS is that if a bot has an image that I'm using and knows it's a kitten, it's really easy for it to compare it, with almost no processing time (compared to OCR). There are several ways to get around this limitation:

Having a massive number of images in the database and only showing a few images at a time to a client makes it much harder for a bot/script to harvest all the images before being thrown to humans to identify them. Doing this practically is much harder. As soon as you keep the same images and layout for a validation (eg everything stays the same until they get it right) the bot can brute the grid over and over again until it passes. This can be improved by IP time-blocking after (say) 3 incorrect attempts.
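The IP time-blocking idea can be sketched roughly as follows. This is only an illustration in Python, not code from any KittenAuth version: the class name `AttemptLimiter`, the 15-minute block length and the in-memory dictionaries are all assumptions made for the example (the article only suggests "(say) 3 incorrect attempts").

```python
import time

MAX_FAILURES = 3         # "(say) 3 incorrect attempts", as suggested in the text
BLOCK_SECONDS = 15 * 60  # assumed block length; the article does not give one


class AttemptLimiter:
    """Hypothetical helper: blocks an IP for a while after repeated failures."""

    def __init__(self, max_failures=MAX_FAILURES, block_seconds=BLOCK_SECONDS,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.block_seconds = block_seconds
        self.clock = clock            # injectable so the class can be tested
        self._failures = {}           # ip -> failures since last block/success
        self._blocked_until = {}      # ip -> time at which the block expires

    def is_blocked(self, ip):
        until = self._blocked_until.get(ip)
        if until is None:
            return False
        if self.clock() >= until:
            # Block has expired, so forget it.
            del self._blocked_until[ip]
            return False
        return True

    def record_failure(self, ip):
        count = self._failures.get(ip, 0) + 1
        if count >= self.max_failures:
            # Too many wrong answers: start a time block and reset the counter.
            self._blocked_until[ip] = self.clock() + self.block_seconds
            count = 0
        self._failures[ip] = count

    def record_success(self, ip):
        self._failures.pop(ip, None)
```

A real deployment would probably keep these counters in a database or cache rather than in memory, and a spammer with many IP addresses can still spread attempts across them.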

Another way of stopping people harvesting all your images is never to use the same image for more than one verification. This would mean you'd need 9 unique images (and I mean unique) for each user that you needed to verify. A system like this could be established by having multiple webcams on multiple animals or in a cattery/kennels/etc. It's a cute idea but it's not that practical.

This still doesn't stop long-term or mass-bot harvesting of all the source images. The next step is to automate the input of images from other sites such as Flickr. By doing a weekly batch download of 500 pictures from Flickr, you should be able to keep the bots on their toes... or at least make it too much effort for a spammer to sort through all the images on a weekly basis.

The real killer move is to stop just using kittens as the "correct" pass. By using multiple "win-groups" and asking the user to click different things each time, this means the spammers will need to keep on top of several different groups of images. This has the bonus of again making it such hard work that it no longer becomes viable.

So in practice you would have directories of images of "kittens", "puppies", "hedgehogs", "skylines" and "cars". Each time a verification request comes in for a user, a random group is picked as the win-group. Random images are selected from its directory and placed. The other squares are filled in with other random pictures.
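As a rough sketch of the win-group selection described above, something like the following Python would do. The function name `build_challenge`, the dictionary of image lists standing in for directories, and the default of three winning images are assumptions for illustration, not part of any existing KittenAuth code.

```python
import random


def build_challenge(groups, grid_size=9, n_winners=3, rng=random):
    """Pick a random win-group and lay out a shuffled grid.

    groups: dict mapping a group name ("kittens", "cars", ...) to a list of
            unique image identifiers (standing in for files in a directory).
    Returns (win_group, grid, answer) where answer is the set of grid
    positions holding images from the win-group.
    """
    win_group = rng.choice(sorted(groups))

    # Random images from the winning directory.
    winners = rng.sample(groups[win_group], n_winners)

    # The other squares are filled with random pictures from the other groups.
    decoy_pool = [img for name, imgs in groups.items()
                  if name != win_group for img in imgs]
    decoys = rng.sample(decoy_pool, grid_size - n_winners)

    grid = winners + decoys
    rng.shuffle(grid)

    answer = {i for i, img in enumerate(grid) if img in winners}
    return win_group, grid, answer
```

The server would keep `answer` in the session and only send the grid and the name of the win-group ("click the hedgehogs") to the browser.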

This group system would be easy to update from things like Flickr due to the tag search and API provided by them — But it does divert away from the cute.

Preventing verification via free porn "users"

Something I was unaware of 20 days ago when I wrote the original article was the way that some CAPTCHAs are circumvented. It's incredibly simple but seems like it really would work. Spammers offering services to enlarge certain bodily organs would of course probably be running free-porn websites with their advertising all over. The two realms of marketing go hand-in-hand.

The current hack goes something like this:

  • The porn user hits the free porn site
  • The spammer's bot gets the request and loads up the registration page on a free-email site
  • The bot takes a screen shot of the CAPTCHA on the page and outputs it on the porn page saying "Enter this code for free porn!!!"
  • The starry-eyed user enters in the code and this gets passed through the bot and back into the registration form. If they're not stupid, registration completes and the spammer has a new email account -- The user might even have his free porn.

As far as I can see there are only a few ways to defeat this. The first of which I think I've already combated.

1. Use a set of pictures too cute for porn. "Every time you masturbate, God kills a kitten." A message like that cannot be ignored, but it's not foolproof.

2. Stamp the images with text telling people not to use the system if they're on a porn site. It's a bit complex and a lot to fit into an image but it might work for some users.

3. Changing what's being looked for. The same system as I suggested before. The bot needs to ask the porn user what to look for. If this keeps changing, they're going to have to beef up to stay on top of things.

1-in-84 is too easy to brute-force

That's a very fair statement. The current 3x3 grid with three selections makes it really easy if there is a spammer with lots of computers that can spam the site. Again there are a couple of ways to improve this and drag the number of combinations up.

Firstly, by asking the user to click "all the kittens" or "all the foxes" and changing how many are in the grid, you end up with 2^9 possible selections. That's a 1-in-512 chance now.

If we increase the size of the grid slightly to 4x4, we get 2^16 and we're suddenly at 65,536 combinations! That's much more acceptable... You could take it even further (5x5 gives 1-in-33,554,432) but it makes the user have to concentrate harder on the task of verifying their existence. I don't want to require effort from the user.
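The numbers in this section can be checked with a few lines of Python. The two function names are made up for the example: the first counts the ways to pick a fixed number of squares (the current "three out of nine" scheme), the second counts every possible subset of squares (the "click all the kittens" scheme with a varying number of kittens).

```python
from math import comb


def fixed_pick_combinations(cells, picks):
    """Ways to choose exactly `picks` squares out of `cells`."""
    return comb(cells, picks)


def any_subset_combinations(cells):
    """Ways to tick any subset of `cells` squares (each one on or off)."""
    return 2 ** cells


print(fixed_pick_combinations(9, 3))   # 84        current 3x3, three selections
print(any_subset_combinations(9))      # 512       3x3, unknown number of kittens
print(any_subset_combinations(16))     # 65536     4x4
print(any_subset_combinations(25))     # 33554432  5x5
```

Note that 2^n counts the empty selection and the "everything ticked" selection as well, so if the number of correct images is limited to a range, a bot that knows that range faces somewhat fewer possibilities than the headline figure.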

One thing this does make a difference in is the JavaScript. The current model "knows" there are going to be 3 selections and thus auto-submits after it's been displayed. For a random number, there are going to have to be some changes, including but not limited to actually just stripping out the JS and having checkboxes for each picture and a submit button. It's not as crazy as it sounds. There could be some intermediary JS to check the box or change the background but no auto-submitting.
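With checkboxes and a submit button, the server-side check boils down to comparing the set of ticked squares with the set it expected. A minimal Python sketch, assuming each checkbox submits the index of its square as a string and that the expected set was stored in the session when the grid was generated (`verify_selection` is a hypothetical name):

```python
def verify_selection(expected_cells, submitted_values):
    """Return True only if the ticked squares match the expected ones exactly.

    expected_cells:   set of grid positions that hold a "correct" image.
    submitted_values: the raw checkbox values from the form, e.g. ["0", "4"].
    """
    try:
        submitted = {int(value) for value in submitted_values}
    except (TypeError, ValueError):
        # Anything that is not a number is treated as a failed attempt.
        return False
    # Missing a kitten or ticking a non-kitten both count as wrong.
    return submitted == set(expected_cells)
```

For example, `verify_selection({0, 4, 7}, ["4", "0", "7"])` passes, while ticking only two of the three squares, or ticking an extra one, fails.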

What about visually impaired users?

This problem has had me banging my head against a brick wall. There's just no easy way for this system to handle people that cannot see the images. The way other systems get around this problem is by offering an alternative audio CAPTCHA that speaks letters to the user based on a massive sound-bank of voices. There is some distortion added to the audio to make it slightly harder to hear.

The only thing I can say on this is: Scrambled-Text-CAPTCHAs do not cover this. Neither does KittenAuth. It is a replacement for visual CAPTCHAs. Nothing more.

There is long term scope for offering different types of sounds (eg of cats or planes and asking the user to identify them) but nothing as simple as KittenAuth. I'm not telling you to lock your blind users off your website, I'm telling you to look into the audio-CAPTCHA methods. Just remember, the spammers will go for the weakest point which could well be the audio.

When will this be available?

I'm still doing a lot of personal research on this and I really hate PHP with a passion (I like my programming languages to have a bit of backbone) so it's taking me longer than I'd like getting things sorted out. I've also got the setting-up of KittenAuth.com coming along and I'm fighting various prefab CMS systems against each other to see which is the best for my purposes. I'll probably settle with a hacked WordPress install.

When there's a site up there, I'll start the process of gathering people together and putting out some code for people to use. Hopefully it won't be long after that and there'll be plugins for software. There's already a phpBB version (which looks super by the way) but it needs to come some distance to cover people's hesitations with the process.

Written by Oli on Tuesday, 25 April 2006. Tagged with kittenauth, security.

#1 /* 4 years, 11 months ago */
I would go for the checkboxes.
Also, as suggested in the first edition: use a 4x4 system and throttle the number of animals shown to a range with a minimum, so that 3 to 7 kittens could be shown.
Not only do you have 2^16 possibilities because of the minimum, but the variation in the total number of visible images adds another power to the number of possibilities.
The checkboxes could add another if you randomly switched the numbers (never have them lined up from 1 to 12, but shuffle them each session).

As for the visually impaired, well, you can always use audio and add generated noise to it, or use a speech synthesis algorithm that constantly changes pitch and speed, but a good FFT (Fourier) analysis routine could break that easily.

A blogmaster could always create a shadow database where a visually impaired person could submit his comment for manual checkup. The comment will never be shown automatically and the only thing a comment spammer would reach by posting in that database is nothing more but butchering the webmaster rather than benefit from the system.
Then again, this last part requires the webmaster to do lots of coding on his own.

Regards,

Vince.
#2 — Author comment /* 4 years, 11 months ago */
>> A blogmaster could always create a shadow database where a visually impaired person could submit his comment for manual checkup.

Yeah I've seen that done elsewhere and it looks like the best method for people that run low-end websites that don't get huge traffic. Audio CAPTCHAs are a lot of effort to make and nigh on impossible for Joe Everyman's server to create on the fly.

On the other hand the use of KittenAuth can also determine how much of a problem accessibility is. I only use KA for anonymous posting here. That's why I invented it.

People that have registered accounts require an email address. Now I know that's not spam proof but it's something...

Keep an eye on KittenAuth.com for more news =)
#3 /* 4 years, 11 months ago */
One more thing... why offer a matrix of #x# framed images instead of putting them inside one frame and offering them as one image?
Making this random (total amount and different amounts of images to guess) makes it harder for image analysers to figure out what's on it.

(This was my zoologic concept: 7 to 15 different animals in one frame and 3 to 5 animals that had to be selected ("Click all below matching animal names you see in the image") out of 15 checkboxed answers, having at least 8 bogus answers.)

Cheers,

Vince.
#4 /* 4 years, 9 months ago */
I once saw a bo-zo write "feral intense impurposes" because he phonetically misunderstood "for all intents and purposes"

Is this the same sort of thing as "forefils" (should read "fulfills")? Anyway, thanks for the chuckle.
#5 /* 3 years, 8 months ago */
I see a few problems.

First, training a computer to recognize a whole ton of images by hash (or by pixel-by-pixel comparison) really is quite easy.

Once a small fraction of images are categorized manually, brute forcing becomes progressively easier. A human trainer wouldn't need to categorize many images relative to your total image count before brute-force attempts begin to see rapidly increasing returns and rapidly increasing rates of learning.

Automatically retrieving them from flickr by tag just makes it easier for a bot (or network of bots) to do the same without human intervention. If you want to cause a malicious agent to "do work" then shortcuts need to be avoided in sourcing images.

Further, sourcing (automatically or otherwise) new batches of images _without_ disposing of or modifying your old images also yields diminishing returns, under the same premise that training a fraction of images reduces the strength of the captcha by that fraction. If you have 500 "trained" images and you only add 500 new images, the chance of any image being known is 50%. And so on, to the point where adding 500 images in a week would be less than 1% of your total image library, once you have a library of 50,000 images (in 2 years of 500 a week).

Once your number of "unknown" images gets low enough that many captchas only have one or two unknown images, brute forcing barely merits the name.

There are only a few ways that I can think of for an image-library based captcha to be "strong" in the long term -- effectively infinite image supply, such that "learning" has little value, or the availability of effectively infinite image masking/distorting algorithms (which turns the entire problem back into one of parsing ugly messed up graphics), again making any machine learning less valuable.

Another idea that would make attacks "harder" would be to use image processing to create a single composite image with randomly placed/sized images and use an image-map (or javascript) to track the location of clicks. It would at least require some visual AI techniques to find edges and such to automate an attack on something like that.

Regards,
Sam
#6 /* 2 years, 4 months ago */
er e er wr rewer rer ewr ggggggggggggg
gg
#7 /* 2 years, 1 month ago */
I think a large number of small categorised images would work, using a backend (ImageMagick?) program to stitch them together into a large grid like you're describing. I would include a slight blur over the edge of each to prevent straight up panel comparison. Having different orders etc prevents image hashing and makes recognition difficult. Now implement something along the lines of "click all the foxes" and vary the type of animal.

Now have javascript do an xml post of the co-ords of every click and do everything on the server, aggregating the results server-side until the user clicks submit. So now any bot has to emulate your javascript posting xml calls with co-ords of clicks on the correct places before submitting.

I can't think of any way this could be easily scammed - a bot would have to download a large single (unique!) image, process it, emulate a series of xml posts with co-ords of clicks in the appropriate places, and then post the form with the same session cookie. The use of a porn site to get one handed users to do the clicks for you would be complicated by the intermediate layers of script but of course wouldn't be impossible - still more difficult than the original idea though which is good.

It wouldn't be too hard to process the co-ords of the kittehs etc server side, and session vars are already available to take care of the aggregation of correct/incorrect clicks. Stitching the image together would be slightly CPU intensive, but no more than generating captchas in the first place.

Win!
#8 /* 2 years, 0 months ago */
I had the same idea as Sam did as I read your article, but his comment provoked me to think of this: Instead of a static image, how about providing a "game" to do your checking, for example written in flash.

I'm not a technical expert on flash, but could it not be used to provide a moving image of, for example, a kitten, which the user clicks on, and only if they succeed in a certain number of tries does the flash game submit a request to the server, perhaps using an internally stored hash?

I can see a few problems with my suggestion, the main one being that Flash isn't (quite) standard software. I'm also unsure as to how secure flash animations are, and how easy it would be to decode them. However, I'd hope the animation would be easy to make compared to the effort it would take to circumvent one, so you could have a bank of different games to play.

Does my idea have any worth at all?

(P.s. I love the idea of kittenauth)
#9 /* 2 years, 11 months ago */
good stuff
#10 /* 2 years, 11 months ago */
good stuff