[HN Gopher] Breaking the 4Chan CAPTCHA
       ___________________________________________________________________
        
       Breaking the 4Chan CAPTCHA
        
       Author : hazebooth
       Score  : 94 points
       Date   : 2024-11-29 20:32 UTC (2 hours ago)
        
 (HTM) web link (www.nullpt.rs)
 (TXT) w3m dump (www.nullpt.rs)
        
       | anigbrowl wrote:
       | Congratulations, now it will get upgraded and become more work
       | for humans to solve, increasing the burden on every non-malicious
       | user.
        
         | jeroenhd wrote:
         | It's not like bots aren't already bypassing these CAPTCHAs. One
         | author writing a blog post about how they accomplished what
         | spammers and bots have been doing for ages isn't going to
         | change anything.
         | 
         | I just opened 4chan and after the initial Cloudflare bot
         | detection I was told to register an email or wait 15 minutes
         | before I was allowed to even obtain a CAPTCHA. Looks like
         | they're already taking a layered approach to combat bots.
        
           | blackjackfoe wrote:
           | (author here) Interestingly, the email registration/time-
           | limit was added after I started this project, but before I
           | told anyone about it.
        
         | sunaookami wrote:
         | There are already loads of extensions and scripts out there
         | that can solve these captchas with a great success rate.
        
       | tumsfestival wrote:
       | I can only imagine how much worse they'll make the captcha after
       | stuff like this picks up speed with users, all the while being
       | ineffective against the bots.
        
         | rany_ wrote:
         | I really doubt that they're the first to do this.
        
         | cchance wrote:
         | I mean at some point ... the average visitor is dumber than the
         | AI and you're now just blocking dumb people
        
           | OmarShehata wrote:
           | yes, we're creating websites that are gated by IQ tests. This
           | isn't the way
        
         | OmarShehata wrote:
         | captchas are broken, forever. There is no way to prevent bots
         | without also preventing a bottom tier of human users (visually
         | impaired people, old people, or just impatient people). As
         | this xkcd comic [1] suggests, we need to just focus on
         | rewarding and punishing specific behavior, regardless of
         | whether the agent is human or not.
         | 
         | [1] https://xkcd.com/810/
        
       | lofenfew wrote:
       | It might be worth noting that these, including the harder version
       | the OP encountered, are not the hardest captchas that 4chan can
       | serve. There is a still harder version which is sent to less
       | trustworthy IPs. I imagine it would still be tractable with
       | computer vision. This in part misses the point though, since
       | 4chan has been continuously altering their captcha since it
       | was released, making it difficult to create a permanent solution
       | that won't be broken down the road.
        
         | blackjackfoe wrote:
         | Yeah, I encountered those as well in my data gathering. I threw
         | them out from the training set, but I kept them for possible
         | future experimentation.
        
           | Shank wrote:
           | Can you upload a few of these samples somewhere?
        
             | blackjackfoe wrote:
             | I need to manipulate the data a bit, because right now it's
             | just raw, unaligned foreground/background images with
             | solutions. I need to do the alignment and save them as
             | images rather than JSON files. I'll do that when I have the
             | time.
        
         | chatmasta wrote:
         | Datacenter IPs can't even post at all, never mind needing to
         | solve a CAPTCHA. That's why the accusations of "VPN shill" are
         | usually wrong, as is the assumption of anonymity - 4chan is in
         | fact one of the least anonymous sites on the internet. The
         | optional username feature gives it a veneer of anonymity, but
         | the strict IP requirements ensure almost every post is
         | attributable to a residential internet connection, and reliably
         | associable with other posts from that same connection.
        
           | blackjackfoe wrote:
           | Some datacenter IPs can post fine, mostly just not those
           | belonging to any large hosting company. I would mention a
           | list of ones I know aren't blocked, but, well, that might get
           | them blocked.
        
             | chatmasta wrote:
             | That's surprising to me. I assumed they were using some
             | service (like Cloudflare) with an updated list of non-
             | residential IP addresses.
             | 
             | I've only ever tried to post through Cloudflare WARP (or
             | Apple Private Relay, which is also Cloudflare but different
             | exit IP range). Once I realized that didn't work, I thought
             | maybe it wasn't worth posting at all :) I don't like the
             | idea of my ISP having any suspicion I posted to 4Chan (even
             | if it's technically https yadda yadda...)
        
           | gruez wrote:
           | What about users behind CGNAT, like mobile users?
        
             | chatmasta wrote:
             | That's attributable with the right warrant and correlation
             | with other data available to the ISP.
             | 
             | CGNAT is not an anonymity mechanism - at best it may be a
             | very crude one, but the carriers will make extra effort to
             | remove that anonymity through logging, retention, and
             | segmentation.
        
             | BlueTemplar wrote:
             | "Attributable" means by law enforcement, and mobile
             | carriers, like all ISPs, must keep logs. In this case, for
             | who had which IP address when.
             | 
             | (Otherwise, it's akin to the usual confusion between
             | anonymity and pseudonymity.)
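The "who had which IP address when" lookup described above can be sketched as a search over carrier log entries. This is only an illustration: the `NatLease` record, the `attribute` function, the port-range field (an assumption about how a CGNAT deployment might share one address), and all sample values are hypothetical, not any carrier's actual log format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class NatLease:
    """Hypothetical CGNAT log entry: one subscriber's slice of a shared IP."""
    public_ip: str
    port_start: int
    port_end: int
    start: datetime
    end: datetime
    subscriber: str


def attribute(leases: Iterable[NatLease], public_ip: str, port: int,
              when: datetime) -> Optional[str]:
    """Return the subscriber holding (public_ip, port) at time `when`, if any."""
    for lease in leases:
        if (lease.public_ip == public_ip
                and lease.port_start <= port <= lease.port_end
                and lease.start <= when < lease.end):
            return lease.subscriber
    return None


# Made-up sample data; 203.0.113.0/24 is a documentation-only address range.
LOG = [
    NatLease("203.0.113.7", 1024, 2047,
             datetime(2024, 11, 29, 20, 0), datetime(2024, 11, 29, 21, 0),
             "subscriber-A"),
    NatLease("203.0.113.7", 2048, 3071,
             datetime(2024, 11, 29, 20, 0), datetime(2024, 11, 29, 21, 0),
             "subscriber-B"),
]
```

With the same public IP and timestamp, the source port alone separates the two subscribers, which is why a shared address does not by itself prevent attribution by whoever holds the logs.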
        
               | chatmasta wrote:
               | That's true, but to be fair my original comment also said
               | posts would be reliably associable with other posts from
               | the same IP. With CGNAT, that association will be
               | slightly less reliable, but not meaningfully so. The
               | segment of the population who posts on 4chan is so small
               | that there is a negligible chance of two 4chan users
               | sharing an exit IP and time window. Even with non-
               | overlapping time windows, the population will be small
               | enough for stylometry (and other factors) to remove any
               | remaining ambiguity.
        
             | Hamuko wrote:
             | Some mobile users can post but I think they've gone so far
             | as to ban entire ISP mobile IP ranges to prevent people
             | from constantly rolling new IPs on their phone.
        
       | antirez wrote:
       | An appropriate response by 4Chan to this: simplify the human
       | work, given that it's simple to solve via NNs anyway. We are at a
       | point where designing very hard captchas is highly likely to
       | increase human annoyance without decreasing machine
       | solvability.
        
         | hackernewds wrote:
         | Just use Worldcoin retina scans next
        
       | dmitrygr wrote:
       | > The official TensorFlow-to-TFJS model converter doesn't work on
       | Python 3.12. This doesn't seem to really be documented, and the
       | error messages thrown when you try to use it on Python 3.12 are
       | non-obvious. I tried an older version of Python (3.10) on a
       | hunch, using PyEnv, and it worked like a charm.
       | 
       | Amazing. And then people wonder why "just use python 2" is still
       | a thing.
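The version problem in the quoted passage can be caught before the converter is run at all. A minimal sketch, assuming only what the quote reports (failure on Python 3.12, success on 3.10); the name `converter_supported` is hypothetical, and the strict 3.10-only check is a conservative assumption, not a documented requirement of the converter.

```python
import sys

# Assumption taken from the quoted passage only: 3.10 worked, 3.12 did not.
# 3.11 is not mentioned in the thread, so this sketch conservatively rejects it.
KNOWN_GOOD = (3, 10)


def converter_supported(version=None):
    """Return True if the interpreter is the one version reported to work."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) == KNOWN_GOOD


if __name__ == "__main__":
    if not converter_supported():
        major, minor = sys.version_info[:2]
        print(f"Python {major}.{minor} is untested here; "
              "the thread reports success only on 3.10")
```

Failing early with a readable message avoids the non-obvious errors the quote describes.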
        
         | orhmeh09 wrote:
         | Do you have examples of "just use python 2" still being a thing
         | in 2024?
        
           | dmitrygr wrote:
           | Yeah, whenever I need to write a quick script and have no
           | time to suffer "$library needs python 3.x, where x must be >
           | $value and <= $value2, and not a prime except when that ends
           | in a 3, except on leap days"
           | 
           | 2 is stable and does not change from under you. Which is what
           | you want in a programming language.
        
       | ChrisMarshallNY wrote:
       | That's like spending a few hours learning to take the lid off
       | your septic tank.
        
         | blackjackfoe wrote:
         | Little bit, but at least you learned something :)
        
       | morkalork wrote:
       | Following the links to the captcha solving service, you can read
       | profiles of the humans doing the work, where it's pitched as more
       | ethical than them working in hazardous factories!
        
       | cherryteastain wrote:
       | The part about bad Keras<->TensorFlow.js interop is classic
       | TensorFlow. Using TF always felt like using a bunch of vaguely
       | related tools put under the same umbrella rather than an
       | integrated, streamlined product.
       | 
       | Actually, I'll extend that to saying every open source Google
       | library/tool feels like that.
        
         | Retr0id wrote:
         | something something Conway's law
        
       | cchance wrote:
       | Jesus looking at both example captchas... as a human... i have no
       | fucking clue the answer lol
        
       | makifoxgirl wrote:
       | This project also solves the 4chan captcha
       | https://github.com/moffatman/chan
        
       | chad1n wrote:
       | I've built 3 iterations of captcha solvers for that crappy
       | website based on
       | https://github.com/drunohazarb/4chan-captcha-solver/issues/1 .
       | The only thing I've learned along the way is that it's mostly
       | pointless outside of a "learning" exercise, since they'll change
       | the captcha (in terms of letter count or the background entropy).
       | Initially, it was 4 characters with a pretty obvious background,
       | then it turned to 5, then it was both 4 and 5, and the current
       | iteration is also either 4 or 5, but with a lot of entropy
       | surrounding the characters.
        
         | bryan0 wrote:
         | The article mentions that they changed the number of
         | characters in the captcha after he trained the model, and the
         | model could still solve it.
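One way a model can survive a change in character count is to emit one label per horizontal slice of the image and collapse the result afterwards, as in CTC-style greedy decoding. The thread does not say which decoding the author's model uses, so this is only a sketch under that assumption; the alphabet, the blank index, and the name `greedy_ctc_decode` are hypothetical.

```python
BLANK = 0
# Hypothetical alphabet; index 0 is a placeholder for the CTC blank label.
ALPHABET = "_ADGHJK"


def greedy_ctc_decode(frame_labels, alphabet=ALPHABET, blank=BLANK):
    """Collapse per-frame labels: merge adjacent repeats, then drop blanks."""
    out = []
    prev = blank
    for label in frame_labels:
        # Emit a character only when it is not blank and differs from the
        # previous frame's label (a blank between repeats keeps both).
        if label != blank and label != prev:
            out.append(alphabet[label])
        prev = label
    return "".join(out)


# The same decoder yields 4 or 5 characters depending only on the frames.
four = greedy_ctc_decode([1, 1, 0, 2, 0, 3, 3, 4])
five = greedy_ctc_decode([1, 0, 2, 2, 3, 0, 4, 5])
```

Because the output length falls out of the frame labels, nothing in the decoder is tied to a fixed number of characters.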
        
         | blackjackfoe wrote:
         | This project was really my first decent introduction to
         | computer vision and machine learning (along with that of those
         | who helped me in various ways; none of them desired to be
         | credited here other than the guy who collected some of the data
         | for me.)
         | 
         | It was definitely a successful learning exercise, and it's made
         | me more confident tackling some other problems I've had in mind
         | for a while.
        
       ___________________________________________________________________
       (page generated 2024-11-29 23:00 UTC)