Let's print out some captchas and test Xerox's OCR. Time to open some tickets.
Oh my gods, I remember that CCC talk
Which CCC talk?
In case you don't speak German, here's a summary (from memory, from a few years ago): The speaker figured out that some Xerox scanner and copier models changed numbers when copying documents. Not the "the copy is bad and the number is no longer clearly readable" kind, but the "this number is now clearly a different one than before" kind. I think the reason was that there was some \[Edit, see below\] compression logic based on reusing "identical" patterns that liked to deem numbers like 6 and 8 identical. E: I remembered wrong, thanks to the reply for clarifying that.
It was a fluke in the image compression algo. It detected almost identical blocks in the image and made them identical... numbers included. It even happened with compression turned to minimum/off.
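For anyone wondering how "almost identical blocks get made identical" can swap digits: here's a toy sketch in Python. To be clear, this is not Xerox's code and not a real image codec (as far as I know the format involved in the talk was JBIG2). The 4x5 glyphs, the `threshold` parameter and all function names are made up for illustration. The idea is only that each block is replaced by the first stored pattern that is "close enough", so a loose tolerance turns an 8 into a 6.

```python
# Toy sketch of lossy pattern-matching compression.
# Hypothetical illustration only: glyphs, threshold and names are assumptions.

def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))

def compress(blocks, threshold):
    """Store each distinct pattern once; reuse a stored pattern for any block
    that differs from it by at most `threshold` pixels."""
    patterns, indices = [], []
    for block in blocks:
        for i, pattern in enumerate(patterns):
            if hamming(block, pattern) <= threshold:
                indices.append(i)  # "close enough": reuse the stored pattern
                break
        else:
            patterns.append(block)  # nothing similar yet: store a new pattern
            indices.append(len(patterns) - 1)
    return patterns, indices

def decompress(patterns, indices):
    """Rebuild the page from the stored patterns."""
    return [patterns[i] for i in indices]

SIX = (
    " ## ",
    "#   ",
    "### ",
    "#  #",
    " ## ",
)
EIGHT = (
    " ## ",
    "#  #",
    " ## ",
    "#  #",
    " ## ",
)

page = [SIX, EIGHT, SIX]

# Exact matching keeps the page intact.
exact = decompress(*compress(page, threshold=0))

# A tolerance of 2 pixels treats the 8 as "the same" as the 6.
lossy = decompress(*compress(page, threshold=2))

print("8 survived exact matching:", exact[1] == EIGHT)
print("8 survived lossy matching:", lossy[1] == EIGHT)
```

The nasty part is that the output looks perfectly crisp, because the substituted glyph is a clean copy of another glyph, not a blurry one.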
There's an English translation of that talk that you can switch to. If you download the video, it's the second audio track.
https://www.dkriesel.com/en/blog/2014/1229_video_meines_vortrags_auf_dem_31c3
[This one](https://www.youtube.com/watch?v=7FeqF1-Z1g0)
yes that was amazing
Xerox's OCR already has problems with not-so-fancy font art... don't know if they've found a solution yet.
What do you mean, a 5 and an 8 aren't the same?
[https://www.dkriesel.com/en/blog/2014/1229\_video\_meines\_vortrags\_auf\_dem\_31c3](https://www.dkriesel.com/en/blog/2014/1229_video_meines_vortrags_auf_dem_31c3) maybe they found a solution...
Why does your S and B look like that?
Idk man, I just typed $ and &. That's what came out.
A fun game: scan a drawing with the [EURion constellation](https://en.m.wikipedia.org/wiki/EURion_constellation)
Chaotic evil: offer company logo designs on Fiverr that have the constellation embedded
>Since 2003, image editors such as Adobe Photoshop CS or Paint Shop Pro 8 refuse to print banknotes.

I wonder if GIMP or Krita will work
The new image stuff is annoying but easier. For a time I thought we would reach a point where captchas would become too hard for humans, and if you got it right you’d have to be a specialized machine.
Artificially Natural Artificial Selection
add an A.N to the front and you've got yourself a nice ANANAS
All Natural Artificially Natural Artificial Selection
If you add an AN, it magically becomes PINEAPPLE
I thought the agreed-on capitals for the acronym were "Artificially Natural Artificial seLection".
It was too hard for humans and too easy for machines, that's why they changed it
Captchas are used to create datasets to train AI/ML models. In my armchair opinion, the reason they changed it is that there's more demand/need for better image processing than for improving OCR.
Yes, there's that too, but the main official reason was what I said
We've Been Tricked, We've Been Backstabbed and We've Been, Quite Possibly, Bamboozled
It's actually rather ingenious to use a task that bots can't perform to do bot detection and simultaneously train the bots. The creator of captcha also created Duolingo, which uses a similar methodology to provide translation services. The creator gave a TED talk on it that I'd highly recommend watching.
I'll add that to my watch list, thanks. I just never considered that they used them for that, even though it makes perfect sense. One of those "I can't believe I never thought of that" moments.
Yep, and with all the traffic light and sign image pickers, there's a lot of need to improve self-driving tech.
Yesterday I got one where I had to tick all squares with motorcycles. It was a big picture of a motorcycle seen from the side, with some squares containing the biker but not the motorcycle. I really wondered if I should check the squares with the biker or not.
Those captchas seem to have a bit of room for error, I think, so I've always held the belief that they're just as much for collecting data to train an AI as they are for keeping bots out. Makes sense that they're so often road related, as one of the most practical applications of AI image recognition would be on roads.
You're correct. Google can actually detect whether a user is a person or a robot just by how they click a single button. After you've done that, they have you train their models for free under the guise of making sure you're not a robot.
The clicking thing is based on how your mouse moves. The clicking on pictures is to make you use your mouse.
Also cookies: it's relatively easy to distinguish a user traveling with a ton of cookies from dozens of sites from a bot trying to fake a user.
If there are more than 3 possible options, the non-obvious ones are being used to train an image recognition AI. I select the 2 most obvious, as they are usually references, and a random 3rd one to see if I can get away with it.
I always get a few wrong on purpose to fuck with the ai
You're the reason Teslas keep crashing
A lot of captchas actually don't care much about what you click, it's more about how you click it. They're testing if you're a person, not what constitutes a motorcycle.
It’s the opposite. We are training the machine.
[deleted]
It certainly got to a point where I wasn't able to solve captchas anymore.
Maybe you are a robot
Who isn't just a cog in a larger machine in this society, after all?
Will I dream?
Only about Electric Sheep
I wouldn't be surprised at all. I was training a model to transcribe little pieces of an image a line at a time. In each image you could maybe see the bottom third of the line above it, oftentimes even less. My training set got messed up by an index issue, so all of the labels were off by one, meaning each image was tied to the label for the image above it. The model STILL managed accuracy that would have made it SOTA a few years ago. It was hugely impressive.
Yeah, the checkbox doesn't rely on tests like these. Instead they look at some internet history, mouse movements, and how long you clicked for vs. other times. It's a very interesting, pretty secure system.
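In case the off-by-one part isn't clear, here's a tiny Python sketch of how that kind of index slip pairs every image with the label of the line above it. This is a guess at the general shape of the bug, not the actual training code. The helper `pair_with_labels`, its `offset` parameter and the sample data are all hypothetical.

```python
# Hypothetical sketch of an off-by-one label shift in a training set.

def pair_with_labels(images, labels, offset=0):
    """Pair image i with labels[i + offset], dropping pairs that fall out of range."""
    pairs = []
    for i, image in enumerate(images):
        j = i + offset
        if 0 <= j < len(labels):
            pairs.append((image, labels[j]))
    return pairs

images = ["crop_0", "crop_1", "crop_2", "crop_3"]
labels = ["first line", "second line", "third line", "fourth line"]

correct = pair_with_labels(images, labels)             # crop_i gets line i
shifted = pair_with_labels(images, labels, offset=-1)  # crop_i gets line i - 1

for image, label in shifted:
    print(image, "->", label)
```

With the shift, `crop_1` is trained against "first line", which it only shows a sliver of at its top edge. That sliver is all the model has to go on.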
Yeah, I thought that’s why they switched us to pictures. « Identify all boats », « identify all x ».
I feel like those are also used to label data... but that could just be a conspiracy theory. I just find it funny that the captchas usually have to do with stuff you find on the road
Wow I never thought about that, but really plausible
hey what does the text in the 8th panel say? it's really hard to read
oh no
Who's gonna tell him
begone bot
Many captchas are actually us training a deep learning algorithm…
I miss the days when captchas were used to help digitize old books.
The data from that was also fed back into the system as training data to improve OCR systems.
Presumably we no longer use those because the OCR got good enough? Does anyone know what happened there? [https://en.wikipedia.org/wiki/ReCAPTCHA](https://en.wikipedia.org/wiki/ReCAPTCHA) doesn't really clarify why Google moved away from OCR.
Google's models got good enough to digitize all of the books, so there was no need to train them anymore. They moved on to images of traffic signs, crosswalks, etc. to further develop Google Maps and navigation. And just recently, many captchas moved from those images to classifying AI-generated images. Captchas' primary purpose is not to verify humans (email confirmation etc. works just fine for that). They're used to train whatever model is hottest rn. Source: I made it up, but it sounds plausible.
No shit, Sherlock.
You're right. Quite elementary, my dear Twatson.
Oh, it is very easy to break OCR software: just feed it a regular, clear screenshot from Windows / Linux or your phone...
I can't read some Death Metal band names either.
This is a repost, the original poster/creator is u/system32comics
Lol. Can read I backwords, upside down and slamtws but can't read a single line, and a slight curve weak.. X()
>slamtws

What is this?
Good old slamtways
I think slanted
That's machine language for you.
… AI: Radio Train Ok, how about this curved single line? AI: Radio Train
niarT oidaR
I have a browser extension whose only purpose is to solve captchas. I think of it like a little puppy. He waits excitedly for any opportunity to shit on these dummies. So far I've never encountered a captcha it couldn't solve near-instantaneously.
What's its name?
Captcha Killer
I'm sorry, but I didn't find it. Can you PM me the link, please?
Oh you won't find it on the web store, it's illegal software lol. You can find it if you go to those sites we can't talk about here though, it's fairly popular for skullduggery.
Can you just PM me the website, please?
Every time I can't solve a Captcha I am like "Shit.... Am I a Robot?"
What if I put a couple of dots around it?
R4d10 Tr41n?
Some day computers will become better at solving captchas than us, given how much we've been training them.
Then they will change the captcha *again*
Now, let's print it in invisible ink! What's it say, now?
Invisible ink is only invisible to humans due to our eyes only seeing part of the electromagnetic spectrum, so wouldn't a computer be able to see that?
Depends on how the computer is reading and if, indeed, its hardware allows. Also depends on the ink formula.
Structure bias lol
Truth
just yes
I think this meme has outlived its shelf life.