[HN Gopher] Breaking the 4Chan CAPTCHA
___________________________________________________________________
Breaking the 4Chan CAPTCHA
Author : hazebooth
Score : 94 points
Date : 2024-11-29 20:32 UTC (2 hours ago)
(HTM) web link (www.nullpt.rs)
| anigbrowl wrote:
| Congratulations, now it will get upgraded and become more work
| for humans to solve, increasing the burden on every non-malicious
| user.
| jeroenhd wrote:
| It's not like bots aren't already bypassing these CAPTCHAs. One
| author writing a blog post about how they accomplished what
| spammers and bots have been doing for ages isn't going to
| change anything.
|
| I just opened 4chan and after the initial Cloudflare bot
| detection I was told to register an email or wait 15 minutes
| before I was allowed to even obtain a CAPTCHA. Looks like
| they're already taking a layered approach to combat bots.
| blackjackfoe wrote:
| (author here) Interestingly, the email registration/time-
| limit was added after I started this project, but before I
| told anyone about it.
| sunaookami wrote:
| There are already loads of extensions and scripts out there
| that can solve these captchas with a great success rate.
| tumsfestival wrote:
| I can only imagine how much worse they'll make the captcha once
| stuff like this catches on with users, while it stays just as
| ineffective against the bots.
| rany_ wrote:
| I really doubt that they're the first to do this.
| cchance wrote:
| I mean at some point ... the average visitor is dumber than the
| AI and you're now just blocking dumb people
| OmarShehata wrote:
| yes, we're creating websites that are gated by IQ tests. This
| isn't the way
| OmarShehata wrote:
| captchas are broken, forever. There is no way to prevent bots
| without also preventing a bottom tier of human users (visually
| impaired people, old people, or just impatient people). Like
| this xkcd [1] comic suggests, we need to just focus on
| rewarding and punishing specific behavior, regardless of
| whether the agent is human or not
|
| [1] https://xkcd.com/810/
| lofenfew wrote:
| It might be worth noting that neither this captcha nor the harder
| version the OP encountered is the hardest that 4chan can serve.
| There is a still harder version which is sent to less
| trustworthy IPs. I imagine it would still be tractable with
| computer vision. This in part misses the point though, since
| 4chan has been continuously altering their captcha since it was
| released, making it difficult to create a permanent solution that
| won't be broken down the road.
| blackjackfoe wrote:
| Yeah, I encountered those as well in my data gathering. I threw
| them out from the training set, but I kept them for possible
| future experimentation.
| Shank wrote:
| Can you upload a few of these samples somewhere?
| blackjackfoe wrote:
| I need to manipulate the data a bit, because right now it's
| just raw, unaligned foreground/background images with
| solutions. I need to do the alignment and save them as
| images rather than JSON files. I'll do that when I have the
| time.
| chatmasta wrote:
| Datacenter IPs can't even post at all, never mind needing to
| solve a CAPTCHA. That's why the accusations of "VPN shill" are
| usually wrong, as is the assumption of anonymity - 4chan is in
| fact one of the least anonymous sites on the internet. The
| optional username feature gives it a veneer of anonymity, but
| the strict IP requirements ensure almost every post is
| attributable to a residential internet connection, and reliably
| associable with other posts from that same connection.
| blackjackfoe wrote:
| Some datacenter IPs can post fine, mostly just not those
| belonging to any large hosting company. I would mention a
| list of ones I know aren't blocked, but, well, that might get
| them blocked.
| chatmasta wrote:
| That's surprising to me. I assumed they were using some
| service (like Cloudflare) with an updated list of non-
| residential IP addresses.
|
| I've only ever tried to post through Cloudflare WARP (or
| Apple Private Relay, which is also Cloudflare but different
| exit IP range). Once I realized that didn't work, I thought
| maybe it wasn't worth posting at all :) I don't like the
| idea of my ISP having any suspicion I posted to 4Chan (even
| if it's technically https yadda yadda...)
| gruez wrote:
| What about users behind CGNAT, like mobile users?
| chatmasta wrote:
| That's attributable with the right warrant and correlation
| with other data available to the ISP.
|
| CGNAT is not an anonymity mechanism - at best it may be a
| very crude one, but the carriers will make extra effort to
| remove that anonymity through logging, retention, and
| segmentation.
| BlueTemplar wrote:
| "Attributable" means by law enforcement, and mobile
| carriers, like all ISPs, must keep logs. In this case, for
| who had which IP address when.
|
| (Otherwise, it's akin to the usual confusion between
| anonymity and pseudonymity.)
| chatmasta wrote:
| That's true, but to be fair my original comment also said
| posts would be reliably associable with other posts from
| the same IP. With CGNAT, that association will be
| slightly less reliable, but not meaningfully so. The
| segment of the population who posts on 4chan is so small
| that there is negligible chance of two 4chan users
| sharing an exit IP and time window. Even with non-
| overlapping time windows, the population will be low
| enough for stylometry (and other factors) to remove any
| remaining ambiguity.
| Hamuko wrote:
| Some mobile users can post but I think they've gone so far
| as to ban entire ISP mobile IP ranges to prevent people
| from constantly rolling new IPs on their phone.
| antirez wrote:
| The appropriate response by 4Chan to this: simplify the work for
| humans, given that it's simple to solve via NNs anyway. We are at
| a point where designing very hard captchas is highly likely to
| increase human annoyance without decreasing machine
| solvability.
| hackernewds wrote:
| Just use Worldcoin retina scans next
| dmitrygr wrote:
| > The official TensorFlow-to-TFJS model converter doesn't work on
| Python 3.12. This doesn't seem to really be documented, and the
| error messages thrown when you try to use it on Python 3.12 are
| non-obvious. I tried an older version of Python (3.10) on a
| hunch, using PyEnv, and it worked like a charm.
|
| Amazing. And then people wonder why "just use python 2" is still
| a thing.
| orhmeh09 wrote:
| Do you have examples of "just use python 2" still being a thing
| in 2024?
| dmitrygr wrote:
| Yeah, whenever I need to write a quick script and have no
| time to suffer "$library needs python 3.x, where x must be >
| $value and <= $value2, and not a prime except when that ends
| in a 3, except on leap days"
|
| 2 is stable and does not change from under you. Which is what
| you want in a programming language.
| ChrisMarshallNY wrote:
| That's like spending a few hours, learning to take the lid off
| your septic tank.
| blackjackfoe wrote:
| Little bit, but at least you learned something :)
| morkalork wrote:
| Following the links to the captcha solving service, you can read
| profiles of the humans doing the work, where it's pitched as more
| ethical than them working in hazardous factories!
| cherryteastain wrote:
| The part about bad Keras<->Tensorflow.js interop is classic
| Tensorflow. Using TF always felt like using a bunch of vaguely
| related tools put under the same umbrella rather than an
| integrated, streamlined product.
|
| Actually, I'll extend that to saying every open source Google
| library/tool feels like that.
| Retr0id wrote:
| something something Conway's law
| cchance wrote:
| Jesus, looking at both example captchas... as a human... I have no
| fucking clue what the answer is lol
| makifoxgirl wrote:
| This project also solves the 4chan captcha
| https://github.com/moffatman/chan
| chad1n wrote:
| I've built 3 iterations of captcha solvers for that crappy
| website based on
| https://github.com/drunohazarb/4chan-captcha-solver/issues/1 .
| The only thing I've learned along the way is that it's mostly
| pointless outside of a "learning" exercise, since they'll change
| the captcha (in terms of letter count or the background entropy).
| Initially, it was 4 characters with a pretty obvious background,
| then it turned to 5, then it was both 4 and 5, and the current
| iteration is also either 4 or 5, but with a lot of entropy
| surrounding the characters.
| bryan0 wrote:
| In the article it mentions they changed the number of
| characters in the captcha after he trained the model, and the
| model could still solve it
| blackjackfoe wrote:
| This project was really my first decent introduction to
| computer vision and machine learning (along with that of those
| who helped me in various ways; none of them desired to be
| credited here other than the guy who collected some of the data
| for me.)
|
| It was definitely a successful learning exercise, and it's made
| me more confident tackling some other problems I've had in mind
| for a while.
___________________________________________________________________
(page generated 2024-11-29 23:00 UTC)