[HN Gopher] Lightweight open source reCaptcha alternative
___________________________________________________________________
Lightweight open source reCaptcha alternative
Author : michalpleban
Score : 95 points
Date : 2025-05-13 09:19 UTC (2 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| rahimnathwani wrote:
| Is there a live demo anywhere? I didn't see one linked in the
| README or on the official site's home page.
| unixfox wrote:
| Yes found one: https://altcha.org/captcha/
| immibis wrote:
| The purpose of reCaptcha is to enhance your Google user profile
| and to deny legitimate users. How does this alternative
| accomplish those things?
|
| This appears to be a proof-of-work, like Anubis. Real captchas
| collect much more fingerprinting data to ensure that only users
| with the latest version of Chrome, the latest version of Windows,
| and an Nvidia graphics card, can use the site.
| literalAardvark wrote:
| Yeah, this fails at its most important task, making those
| filthy, dirty, Firefox users click on bridges for 1 hour a day.
|
| On topic though, how does this improve on hCaptcha?
| akimbostrawman wrote:
| Don't forget those criminals with non-residential IPs
| Semaphor wrote:
| > On topic though, how does this improve on hCaptcha?
|
| Cloud vs self-hosted, click annoying things challenge vs
| automatic proof of work. Or are there other hCaptcha versions
| and I just never realized it?
| Imustaskforhelp wrote:
| oh yes. as a firefox (now librewolf) user, it deeply saddens
| me.
| g-b-r wrote:
| I know some people who quickly give up and renounce using the
| service, when they run into hCaptcha puzzles.
|
| I've been bewildered for some time as well, honestly, it took
| me a while to figure out the first I ran into.
|
| And trying one now, fully knowing that I'd have to solve one,
| I was dumbfounded by the puzzle I've gotten, it took me a few
| seconds to understand it.
|
| Cloudflare's ones are horrible and a plague (although they
| might have slightly improved recently), but I'm not certain
| I'd prefer hCaptchas over them.
| Semaphor wrote:
| I know this is partially a joke, but I'd like to mention that
| as a Firefox user with uBo and uMatrix, I almost never have to
| solve challenges with ReCaptcha.
| hexagonwin wrote:
| how? do you just allow all cookies/scripts/xhr on your
| umatrix? i'm on a similar config and I get captchas far more
| often than any other users on the same network for some reason.
| Semaphor wrote:
| I use the uMatrix plugin to automatically allow what's
| required for ReCaptcha, that tends to work. I do get the
| annoying picker sometimes and have an AI extension to solve
| it for me, but it's relatively rare, like 1 out of 5 times
| max (I generally don't see RC that often).
|
| No idea how I'd compare to others on my Network, that'd
| only be my wife and as a Linux user she'd probably get more
| than me with Windows ;)
| pabs3 wrote:
| I thought uMatrix got abandoned?
| piva00 wrote:
| On the other hand as a user of Firefox I simply cannot pass
| Cloudflare's verification at all, I always end up in a loop.
| It's been like that for more than a year... Sometimes it does
| work on a private window, no idea why or how since I have the
| same extensions enabled.
| xena wrote:
| Do you store cookies?
| serhack_ wrote:
| The real secret of an effective captcha-like system is to
| identify/collect lots of data, identify suspicious patterns,
| validate them (checking what kind of data exposes a bot-like
| system) and then use this for serving dynamic challenges based on
| a few pieces of information.
|
| Example: if the system identifies the user as a bot, it tries to
| give a less performant solution in terms of PoW.
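
A minimal sketch of the "harder PoW for suspicious clients" idea above. The suspicion score, the bit counts, and both function names are hypothetical; how a real system would compute the score is not described in the thread.

```python
def pow_difficulty(suspicion: float, base_bits: int = 12, max_extra_bits: int = 8) -> int:
    """Map a suspicion score in [0, 1] to a required number of leading zero bits.

    Hypothetical policy: every client pays base_bits, and the most
    suspicious clients pay up to max_extra_bits more.
    """
    if not 0.0 <= suspicion <= 1.0:
        raise ValueError("suspicion must be between 0 and 1")
    return base_bits + round(suspicion * max_extra_bits)


def expected_hashes(bits: int) -> int:
    """Average number of hash attempts needed to find `bits` leading zero bits."""
    return 2 ** bits


# A fully suspicious client does 2**8 = 256 times the work of a trusted one.
ratio = expected_hashes(pow_difficulty(1.0)) // expected_hashes(pow_difficulty(0.0))
```

Each extra bit doubles the expected work, so a small range of bits already spans a large range of cost.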
| Imustaskforhelp wrote:
| Maybe somebody could explain me why your comment is in
| different contrast of grey?
|
| I think somebody might have flagged your comment, but it is a
| real fact.
|
| This is one of the reasons why people say cloudflare owns the
| majority of internet but I think I am okay with that since
| cloudflare is pretty chill. And they provide the best services
| but still it just shows that the internet isn't that
| decentralized.
|
| But google captcha is literally tracking you IIRC, I would
| personally prefer hcaptcha if you want centralized solution or
| anubis if you want to self host (I Prefer anubis I guess)
| ArinaS wrote:
| Cloudflare is not chill because they, either ignorantly or
| purposefully, block everything that's not Chromium or
| Firefox[1].
|
| Or sometimes everything that's not just Chromium[2].
|
| [1] - https://www.theregister.com/2025/03/04/cloudflare_blocking_n...
|
| [2] - https://www.techradar.com/pro/cloudflare-admits-security-too...
| Zak wrote:
| > _Maybe somebody could explain me why your comment is in
| different contrast of grey?_
|
| Downvotes. Comments with negative scores are shown with lower
| contrast. The more negative the score, the less contrast they
| get.
| Jleagle wrote:
| Can someone explain why a robot would not be able to calculate
| the PoW?
| jsheard wrote:
| It could, the idea is just to tip the economics such that it's
| not worth it for the bot operator. That kind of abuse typically
| happens at a vast scale where the cost of solving the
| challenges adds up fast.
| hombre_fatal wrote:
| Botnets don't even use their own hardware.
|
| Why would someone renting dirt cheap botnet time care if the
| requests take a few seconds longer to your site?
|
| Plus, the requests are still getting through after waiting a
| few seconds, so it does nothing for the website operator and
| just burns battery for legit users.
| jsheard wrote:
| Botnets just shift the bottleneck from "how much compute
| can they afford to buy legit" to "how many machines can
| they compromise or afford to buy on the black market".
| Either way it's a finite resource, so making each abusive
| request >10,000x more expensive still severely limits how
| much damage they can do, especially when a lot of botnet
| nodes are IoT junk with barely any CPU power to speak of.
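
To make the "finite resource" argument concrete, here is a rough back-of-the-envelope sketch. Every number in it (the size of the CPU budget, the per-request costs) is an assumption chosen so that the ratio matches the 10,000x figure mentioned above; none of them come from measurements.

```python
MICROSECONDS_PER_DAY = 86_400 * 1_000_000


def requests_per_day(cores: int, cpu_microseconds_per_request: int) -> int:
    """How many requests a fixed pool of cores can issue in one day."""
    return cores * MICROSECONDS_PER_DAY // cpu_microseconds_per_request


CORES = 1_000  # hypothetical botnet size, in always-busy cores

# Assumed client-side cost per request: 100 microseconds without PoW,
# one full second of hashing with PoW.
without_pow = requests_per_day(CORES, 100)
with_pow = requests_per_day(CORES, 1_000_000)

slowdown = without_pow // with_pow
```

Under these assumptions the same hardware goes from 864 billion to 86.4 million requests per day. Whether that is enough to deter a given attacker is exactly what the replies below dispute.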
| victorbjorklund wrote:
| There is still an opportunity cost. They can scrape just
| your site or they can scrape 100 other sites without POW
| (no idea if it is 10, 100 etc)
| Jleagle wrote:
| So it's the same as a sleep()
| ahofmann wrote:
| No, because the bot can just also sleep and scrape other
| sources in that time. With pow, you waste their CPU
| cycles and block them from doing other work.
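
The difference from sleep() is easier to see in code. Below is a minimal sketch of a generic SHA-256 proof of work, assuming a scheme where the server hashes a salt plus a secret number and the client must find that number by brute force. The field names and the default `max_number` are illustrative, and this is not necessarily ALTCHA's exact protocol (a real server would also sign or store the challenge).

```python
import hashlib
import secrets
from typing import Optional


def _digest(salt: str, number: int) -> str:
    """SHA-256 of the salt followed by the decimal number, as hex."""
    return hashlib.sha256(f"{salt}{number}".encode()).hexdigest()


def make_challenge(max_number: int = 50_000) -> dict:
    """Server side: pick a secret number and publish only its salted hash."""
    salt = secrets.token_hex(12)
    secret_number = secrets.randbelow(max_number + 1)
    return {"salt": salt, "target": _digest(salt, secret_number), "max_number": max_number}


def solve(challenge: dict) -> Optional[int]:
    """Client side: try every candidate; this loop is the CPU work that cannot be slept through."""
    for candidate in range(challenge["max_number"] + 1):
        if _digest(challenge["salt"], candidate) == challenge["target"]:
            return candidate
    return None


def verify(challenge: dict, number: int) -> bool:
    """Server side: checking a submitted answer costs a single hash."""
    return _digest(challenge["salt"], number) == challenge["target"]
```

Solving takes about `max_number / 2` hashes on average while verifying takes one, which is the asymmetry the whole approach relies on.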
| hombre_fatal wrote:
| Websites aren't really fungible like that, and where they
| are (like general search indexing for example), that's
| usually the least hostile sort of automated traffic. But
| if that's all you care about, I'll cede the point.
|
| Usually if you're going to go through the trouble of
| integrating a captcha, you want to protect against
| targeted attacks like a forum spammer where you don't
| want to let the abusive requests through at all, not just
| let it through after 5000ms.
| bityard wrote:
| If you're the operator of a botnet that normally
| scraped a few dozen pages per second and then noticed a
| site suddenly taking multiple seconds per page, that's at
| least an order of magnitude (or two) decrease in
| performance. If you care at all about your efficiency, you
| step in and put that site on your blacklist.
|
| Even if the bot owner doesn't watch (or care about) their
| crawling metrics, at least the botnet is not DDoSing
| the site in the meantime.
|
| This is essentially a client-side tarpit, which are
| actually pretty effective against all forms of bot traffic
| while not impacting legitimate users very much if at all.
| remram wrote:
| A tarpit is selective. You throw bad clients in the
| tarpit.
|
| This is something you throw everyone through: both your
| abusive clients (running on stolen or datacenter
| hardware) and your real clients (running on battery-
| powered laptops and phones). More like a tar-checkpoint.
| jrochkind1 wrote:
| That's definitely the idea.
|
| So the crazy decentralized mystery botnet(s) that are
| affecting many of us -- don't seem to be that worried about
| cost. They are making millions of duplicate requests for
| duplicate useless content, it's pretty wild.
|
| On the other hand, they ALSO don't seem to be running
| user-agents that execute javascript.
|
| This is in the findings of a group of some of my colleagues
| at peer non-profits that have been sharing notes to try to
| understand what's going on.
|
| So the fact that they don't run JS at present means that PoW
| would stop them -- but so would something much simpler and
| cheaper relying on JS.
|
| If this becomes popular, could they afford to run JS and to
| calculate the PoW?
|
| It's really unclear. The behavior of these things does not
| make sense to me enough to have much of a theory about what
| their cost/benefits or budgets are, it's all a mystery to me.
|
| Definitely hoping someone manages to figure out who's really
| behind this and why at some point. (i am definitely not
| assuming it's a single entity either).
| dpassens wrote:
| I think the general idea isn't that they can't but that they
| either won't, because they're not executing JS, or that it
| would slow them down enough to effectively cripple them.
| jrochkind1 wrote:
| As long as they're not executing JS, they don't really need a
| PoW to stop them, though. Something much simpler that
| requires executing JS would do.
|
| i might at any rate set my PoW to be relatively cheap, which
| would do for anyone not executing JS.
| diggan wrote:
| I think calling this a "recaptcha alternative" is slightly
| misleading.
|
| There are two problems some website hosters encounter:
|
| A) How do I ensure no one DDoSes me (deliberately or inadvertently)?
|
| B) How can I ensure this client is actually a human, not a
| robot?
|
| Things like ReCaptcha aimed to solve B, not A. But the
| submitted solution seems to be more for A, as a PoW can be
| (probably _must_ be, actually) calculated by a machine, not a
| human. ReCaptcha is supposed to be the opposite: solvable only
| by a human.
| progx wrote:
| In the AI century, how would you detect a real person or an AI?
| ArinaS wrote:
| This thing, despite using "captcha" in its name, is not your
| typical captcha like hCaptcha or Google's one, because it uses
| a proof-of-work mechanism instead of writing answers in
| textboxes/clicking on images/other means of verification
| requiring user input.
|
| AI bots can't solve proof-of-work challenges because browsers
| they use for scraping don't support features needed to solve
| them. This is highlighted by existence of other proof-of-work
| solutions designed to specifically filter out AI bots, like go-
| away[1] or Anubis[2].
|
| And yes, they work - once GNOME deployed one of these proof-of-
| work challenges on their gitlab instance, traffic on it fell by
| 97%[3].
|
| [1] - https://git.gammaspectra.live/git/go-away
|
| [2] - https://github.com/TecharoHQ/anubis
|
| [3] - https://thelibre.news/foss-infrastructure-is-under-attack-by...:
| " _According to Bart Piotrowski, in around two hours and a half
| they received 81k total requests, and out of those only 3% passed
| Anubis's proof of work, hinting at 97% of the traffic being
| bots._"
| graemep wrote:
| > AI bots can't solve proof-of-work challenges because
| browsers they use for scraping don't support features needed
| to solve them.
|
| At least sometimes. I do not know about AI scraping but there
| are plenty of scraping solutions that do run JS.
|
| It also puts off some genuine users like me who prefer to keep
| JS off.
|
| The 97% is only accurate if you assume a zero false positive
| rate.
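
The point about false positives can be put in numbers. This is a rough sketch using the figures quoted above (81k requests, 3% passing) plus two assumptions that are not in the source: that every request which passed was human, and that some fraction of human requests failed the challenge (for example because JS was off).

```python
def bot_share(total: int, passed: int, human_fail_rate: float) -> float:
    """Estimated share of bot requests.

    Assumes all passing requests were human and that human_fail_rate of
    human requests were wrongly rejected (false positives).
    """
    humans = passed / (1.0 - human_fail_rate)
    return (total - humans) / total


TOTAL = 81_000
PASSED = round(TOTAL * 0.03)  # 2,430 requests passed the proof of work

no_false_positives = bot_share(TOTAL, PASSED, 0.0)    # the quoted 97%
half_of_humans_fail = bot_share(TOTAL, PASSED, 0.5)   # an extreme assumption
```

Even if half of all human requests were rejected, the bot share would still be about 94%. The headline figure is not very sensitive to false positives, but the experience of the rejected humans is; bots that do solve the challenge would push the estimate the other way.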
| ArinaS wrote:
| > " _It also puts off some genuine users like me who prefer
| to keep JS off._ "
|
| Non-javascript challenges are also available[1].
|
| > " _The 97% is only accurate if you assume a zero false
| positive rate._ "
|
| GNOME's gitlab instance is not something people visit daily
| like Wikipedia, so it's a negligible amount of false
| positives.
|
| [1] - https://git.gammaspectra.live/git/go-away/wiki/Challenges#no...
| graemep wrote:
| > Non-javascript challenges are also available
|
| Did not know that. Good news
|
| > GNOME's gitlab instance is not something people visit
| daily like Wikipedia, so it's a negligible amount of
| false positives.
|
| As an absolute number, yes, but as a proportion?
| diggan wrote:
| > AI bots can't solve proof-of-work challenges because
| browsers they use for scraping don't support features needed
| to solve them. This is highlighted by existence of other
| proof-of-work solutions designed to specifically filter out
| AI bots, like go-away[1] or Anubis[2].
|
| Huh, they definitely can?
|
| go-away and Anubis reduce the load on your servers as bot
| operators cannot just scrape N pages per second without any
| drawbacks. Instead it gets really expensive to make 1000s of
| requests, as they're all really slow.
|
| But for a user who uses their own AI agent that browses the web,
| things like anubis and go-away aren't meant to stop them from
| accessing the websites at all (nor do they); it'll just be a
| tiny bit slower.
|
| Those tools are meant to stop site-wide scraping, not
| individual automatic user-agents.
| Jleagle wrote:
| AIs scrape data from web pages just like anything else does. I
| don't think their existence makes a difference.
| immibis wrote:
| AIs don't. AI companies do.
|
| Well, maybe. As far as I can see, the overt ones are using
| pretty reasonable rate limits, even though they're scraping
| in useless ways (every combination of git hash and file path
| on gitea). Rather, it seems like the anonymous ones are the
| problem - and since they're anonymous, we have zero reason to
| believe they're AI companies. Some of them are running on
| Huawei Cloud. I doubt OpenAI is using Huawei Cloud.
| dvh wrote:
| Certainly! Distinguishing between a real person and an AI in
| the AI century can be tricky, but some key signs include
| emotional depth, unpredictable creativity, personal
| experiences, and complex human intuition. AI, on the other
| hand, tends to rely on data patterns, structured reasoning, and
| lacks genuine lived experiences.
| igorbark wrote:
| i enjoy that i cannot tell whether this is written by an AI,
| or by a human pretending to be an AI. my guess is human
| pretender!
| chrismorgan wrote:
| CAPTCHA stood for "Completely Automated Public Turing Test to
| tell Computers and Humans Apart".
|
| By this point, it's obvious that that has failed, and even that
| no general solution is possible any more.
|
| ALTCHA... telling Computers and Humans Apart? No, this is proof
| of work, meaning it's just about making things expensive--abuse
| control, not actually distinguishing between computers and
| humans.
|
| In fact, in https://altcha.org/captcha/ one of the headings is
| _Inclusive to Robots_! This is _so_ far the opposite of
| traditional CAPTCHA, on the technical side, that it's mildly
| hilarious. (Socially, they largely amount to the same thing--
| people never did actually care about _computers_ , just abusive
| bots.)
|
| Then the question is: what is the proof of work mechanism? How
| robust are things going to be, and can you ensure attacking will
| remain expensive, without burdening users too much?
|
| https://altcha.org/docs/proof-of-work/ indicates it's SHA
| hashing, not something like scrypt. Uh oh. The best specialised
| hardware is several million times as good as good laptops[1], let
| alone cheap phones. If this were to become popular, bots would
| switch to such hardware, probably making the cost of attacking
| practically negligible. https://altcha.org/docs/complexity/ shows
| they've thought about these things, but I feel that although it
| will work for a while, it's ultimately a doomed game. And in the
| mean time, you can normally go _waaaay_ simpler and less
| intrusive: most bots are extremely dumb.
|
| Is "captcha" heading in the direction of meaning "bad rate
| limiting"?
|
| Because really that's what this stuff is: rate limiting that
| trusts that clients don't have lots of compute power conveniently
| available, but will get vaporised by powerful and intentional
| adversaries.
|
| --***--
|
| [1] On the https://altcha.org/docs/complexity/ test, a
| comparatively ideal browser on my 5800HS laptop might reach
| 500,000 SHA-256 hashes per second at a cost of at least 25W.
| (Chromium gets half this with ~50% CPU usage; Firefox one tenth,
| altogether failing to load the cores for some reason.) The most
| energy-efficient commercial Bitcoin miners seem to be doing
| around 80 _billion_ of these hashes per watt-second. That's _four
| million_ times as good. You cannot bridge such a divide.
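
The footnote's arithmetic, spelled out as a quick check. The three input figures are the commenter's own estimates, taken as given; one watt-second is one joule.

```python
# Figures quoted in the footnote above.
LAPTOP_HASHES_PER_SECOND = 500_000      # idealised browser on a 5800HS laptop
LAPTOP_WATTS = 25                       # power drawn while hashing
MINER_HASHES_PER_JOULE = 80_000_000_000 # efficient commercial Bitcoin miner

# Hashes per second divided by joules per second gives hashes per joule.
laptop_hashes_per_joule = LAPTOP_HASHES_PER_SECOND / LAPTOP_WATTS

# How many times more hashes the miner gets out of the same energy.
efficiency_gap = MINER_HASHES_PER_JOULE / laptop_hashes_per_joule
```

The laptop manages 20,000 hashes per joule, so the gap comes out at 4,000,000, matching the "four million times" claim.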
| binary132 wrote:
| It's crazy how much of the internet and our app stacks depend on
| proprietary hosted service integrations that will almost
| certainly disappear or break in time. Sure it's convenient to get
| off the ground with but it doesn't make sense to me to gate your
| functionality on a third party that can easily break or slip out
| from under you. It would be one thing if proprietary software was
| distributed in a form you could keep operating and using on your
| own, but even that is obviously inferior to being able to "repair
| your own equipment".
| IgorPartola wrote:
| reCaptcha is routinely broken for me. Almost every time I see
| it I have to solve it about a dozen times, then it decides I'm
| not human. After 2-3 page refreshes it does let up but it's
| frustrating as hell.
| wkat4242 wrote:
| Are you on Linux by any chance? For some reason this is now
| deemed 'suspicious' by recaptcha and cloudflare :( Especially
| if you use Firefox. It's driving me crazy getting bombarded
| by these.
| worldsavior wrote:
| Did you try faking user agent?
| wkat4242 wrote:
| Yes but that made it worse.
|
| In fact I used to fake user agent all the time because
| Microsoft 365 is so retarded. With the Firefox/Linux user
| agent a lot of features don't work. When it pretends to
| be MS Edge it works fine. Clearly trying to force people
| to use the 'invented here' browser :(
|
| But as I was getting captchas I moved to using it only
| for the MS365 sites and nowhere else. It seems to have
| reduced the captchas somewhat, especially the ones that
| never end (keep looping). But I still get a ton of "Your
| browser is suspicious, here's an extra check" nonsense
| from Cloudflare in particular.
| godzillabrennus wrote:
| In the startup world it is a huge economic advantage if you can
| prototype an idea in days that would have taken months or
| years. The tradeoffs are acquiring technical debt but we seem
| capable of resolving that after the concept has found product
| market fit.
| graemep wrote:
| Yes, but it's not just startups and people do not seem to
| actually resolve it.
|
| Lots of big businesses use recaptcha. Quite often
| unnecessarily. If I need to log in with 2FA to use a service
| does it really need recaptcha?
|
| Similarly, cloudflare sends you emails telling you how many
| bots and attacks it has stopped - but you do not know how
| many false positives there were.
| theappsecguy wrote:
| Yes you still need recaptcha simply to avoid password
| stuffing attacks.
| damsalor wrote:
| Certainly not in the mentioned 2fa scenario.
|
| I would guess that simple rate limiting would do the
| trick for the rest
| Zak wrote:
| Rate limiting does not solve this problem because botnets
| often don't make repeated requests from the same IP
| address. 2FA does solve it.
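
A small simulation of why per-IP rate limiting alone falls short against a distributed attacker. The limiter class, the limit of 5, and the address ranges are all made up for illustration; real limiters also use time windows, which are left out here.

```python
from collections import Counter
from typing import Iterable


class PerIpLimiter:
    """Allow at most max_per_window attempts per IP (single window, no expiry)."""

    def __init__(self, max_per_window: int) -> None:
        self.max_per_window = max_per_window
        self._attempts: Counter = Counter()

    def allow(self, ip: str) -> bool:
        if self._attempts[ip] >= self.max_per_window:
            return False
        self._attempts[ip] += 1
        return True


def allowed_attempts(ips: Iterable[str], limit: int = 5) -> int:
    """Count how many attempts from the given source IPs get through."""
    limiter = PerIpLimiter(limit)
    return sum(limiter.allow(ip) for ip in ips)


# 1,000 password guesses from one address versus one guess each from 1,000 addresses.
single_source = ["203.0.113.7"] * 1000
botnet = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)]
```

With these inputs the single source gets 5 guesses through and the botnet gets all 1,000, since no individual address ever reaches the limit.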
| dsr_ wrote:
| Citation, as they say, is needed.
|
| As far as I can tell, most startups resolve their technical
| debt by failing, and the majority of the rest resolve their
| debt by being acquired by a company which replaces the
| original service entirely in 1-3 years because it's too hard
| to integrate as-is.
| binary132 wrote:
| Yes, and I certainly was not saying startups should roll
| their own fraud prevention
| palmotea wrote:
| > It's crazy how much of the internet and our app stacks depend
| on proprietary hosted service integrations that will almost
| certainly disappear or break in time. Sure it's convenient to
| get off the ground with but it doesn't make sense to me to gate
| your functionality on a third party that can easily break or
| slip out from under you.
|
| At least with captchas, it's somewhat understandable with the
| arms-race aspect. The third party does the work of engaging in
| the arms race, so you don't have to, but the tradeoff is what
| you describe.
| rendx wrote:
| Not only that, but it's also totally acceptable now to
| broadcast your user's data to a megaton of external services
| for no good reason. If people had some grasp of what is going
| on and it was visible to them, they would complain very loudly
| about it in your face.
| pabs3 wrote:
| Hmm, how do they know you have calculated the PoW without setting
| a cookie? Or do you have to calculate it on every page load?
| jrochkind1 wrote:
| yeah, I need more info to understand what's up.
|
| Maybe it's only used on individual form submit (like the
| classic captcha use-case), and not on a page load, and it does
| have to be recalculated on every form submit?
| alamsterdam wrote:
| Yes, I was wondering what is to stop you replaying the same PoW
| multiple times. All I can find is:
|
| _To prevent the vulnerability of "replay attacks," where a
| client resubmits the same solution multiple times, the server
| should implement measures that invalidate previously solved
| challenges.
|
| The server should maintain a registry of solved challenges and
| reject any submissions that attempt to reuse a challenge that
| has already been successfully solved._
|
| This doesn't seem very scalable? Or am I missing something?
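
One way the registry could stay small is if every challenge carries an expiry time, so the server only needs to remember solved challenges until they would have expired anyway. The sketch below assumes that design; the class, its method, and the in-memory dict are hypothetical, and a multi-server deployment would need a shared store with per-key TTLs instead.

```python
import time
from typing import Dict, Optional


class SolvedChallengeRegistry:
    """Remember solved challenges only until their expiry time passes."""

    def __init__(self) -> None:
        self._expiry_by_challenge: Dict[str, float] = {}

    def accept(self, challenge_id: str, expires_at: float, now: Optional[float] = None) -> bool:
        """Return True the first time an unexpired challenge is submitted, else False."""
        now = time.time() if now is None else now

        # Drop entries whose challenges have expired; they can no longer be replayed.
        self._expiry_by_challenge = {
            cid: exp for cid, exp in self._expiry_by_challenge.items() if exp > now
        }

        if expires_at <= now:
            return False  # expired challenges are rejected outright
        if challenge_id in self._expiry_by_challenge:
            return False  # replay of an already-solved challenge
        self._expiry_by_challenge[challenge_id] = expires_at
        return True


registry = SolvedChallengeRegistry()
first = registry.accept("abc", expires_at=100.0, now=10.0)
replay = registry.accept("abc", expires_at=100.0, now=20.0)
late = registry.accept("abc", expires_at=100.0, now=200.0)
```

The registry then holds at most as many entries as were solved within one expiry window. Rebuilding the dict on every call is linear in its size, which is fine for a sketch but not for a busy server.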
| dankobgd wrote:
| recaptcha is useless, only annoys actual users. I lost 15 minutes
| last week on miui site with their trash recaptcha. The point is
| to steal more data from you
| remram wrote:
| Why call it a CAPTCHA if it is not even trying to tell Computers
| and Humans Apart (CHA)?
|
| This is only trying to tell human browsers and bot browsers
| apart. Not even that, it seems all it does is slow all browsers
| down equally.
| immibis wrote:
| Because <s>human</s> western society is in its post-competence
| era. It doesn't matter whether you can do your job, only
| whether your manager thinks you can, and they don't understand
| your job so they use all the wrong metrics.
|
| Like whether there's a checkbox you have to click, and whether
| it spins for a while when you click it. That's a CAPTCHA now.
| And working is when your butt is in the chair. And investing is
| when you give someone money and they promise to give more back
| later. And food is things that fit in your mouth and don't kill
| you. And free speech is when you get turned away at the border
| for disliking the president on social media. And top-of-the-
| line CPUs are ones that die within 24 months. Meanwhile the
| totalitarian dictatorship across the pond actually does all
| these things better somehow (except the politics).
| https://en.wikipedia.org/wiki/HyperNormalisation#Etymology
___________________________________________________________________
(page generated 2025-05-15 23:01 UTC)