[HN Gopher] mCaptcha - Proof of work based, privacy respecting C...
       ___________________________________________________________________
        
       mCaptcha - Proof of work based, privacy respecting CAPTCHA system
        
       Author : vincent_s
       Score  : 99 points
       Date   : 2022-08-04 07:51 UTC (15 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | latchkey wrote:
       | Consider using RandomX instead.
       | 
       | https://github.com/tevador/RandomX
       | 
       | This would tie the algo to CPU based compute and thus knock out
       | ASIC and GPU solutions.
        
       | _448 wrote:
        | > they will have to generate proof-of-work (a bunch of math
        | that will take time to compute)
        | 
        | This was tried before and failed! The reason it failed was the
        | maths: most users favour clicking simple images or rearranging
        | visual elements over solving maths problems.
        
         | realaravinth wrote:
         | The math is performed by the browser. All the user will have to
         | do is tick a checkbox.
         | 
         | Here's a demo:
         | https://demo.mcaptcha.org/widget/?sitekey=pHy0AktWyOKuxZDzFf...
         | 
         | disclosure: I'm the author of mCaptcha
        
           | _448 wrote:
            | Commercial spammers use high-end resources, whereas users
            | don't have that luxury. So won't it be easy for spammers to
            | scale up their resources to overcome this?
        
             | web007 wrote:
             | It's not that they can't scale, it's that you want them to
             | HAVE to scale, to make it more expensive to attack your
             | site than another one.
        
       | turbocon wrote:
       | This is a really intriguing idea. What would stop a spammer from
       | taking the time to get a single token and sharing it among a pool
       | of bots?
        
         | realaravinth wrote:
          | Glad you asked!
          | 
          | There are protections against replay attacks within mCaptcha:
          | tokens are single-use and also have a lifetime beyond which
          | they are invalid.
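          | 
          | For the curious, that replay protection can be sketched in a
          | few lines (a toy in-memory sketch assuming single-use tokens
          | with a TTL; illustrative only, not mCaptcha's actual code):
          | 
          |   use std::collections::HashMap;
          |   use std::time::{Duration, Instant};
          |   
          |   /// Toy token store: a token is valid only if it exists, has
          |   /// not been redeemed yet, and has not outlived its TTL.
          |   struct TokenStore {
          |       ttl: Duration,
          |       issued: HashMap<String, Instant>,
          |   }
          |   
          |   impl TokenStore {
          |       fn new(ttl: Duration) -> Self {
          |           Self { ttl, issued: HashMap::new() }
          |       }
          |   
          |       /// Record a token handed out after a valid PoW.
          |       fn issue(&mut self, token: String) {
          |           self.issued.insert(token, Instant::now());
          |       }
          |   
          |       /// Removing the entry makes the token single-use; the
          |       /// TTL check expires stale tokens.
          |       fn redeem(&mut self, token: &str) -> bool {
          |           match self.issued.remove(token) {
          |               Some(at) => at.elapsed() <= self.ttl,
          |               None => false,
          |           }
          |       }
          |   }
          |   
          |   fn main() {
          |       let mut store = TokenStore::new(Duration::from_secs(30));
          |       store.issue("abc123".to_string());
          |       assert!(store.redeem("abc123"));  // first use succeeds
          |       assert!(!store.redeem("abc123")); // replay is rejected
          |   }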
         | 
         | Disclosure: author of mCaptcha
        
       | aeyes wrote:
        | As a spammer I wouldn't care much about this: spammers run
        | thousands of requests in parallel and don't care whether each
        | one takes 1 second or 10 seconds to finish. After all, their
        | machines run all day long.
        | 
        | Many of the more professional spammers are running on hacked
        | botnet machines anyway, so they don't really care about maxing
        | out the CPUs.
       | 
       | Compute is cheap and the only loser here is the end user with a
       | low end device.
        
       | mabbo wrote:
        | I think calling this a Captcha isn't a great way to explain what
        | this is about.
        | 
        | Bots spamming your sign-up page are playing an economics game:
        | can I do whatever this bot action is quickly and cheaply enough
        | that the end result is worth the cost? Captchas were good
        | because you needed humans to solve them, which raised the cost
        | to around $0.05 per action. But now ML breaks most captchas
        | cheaply and easily.
       | 
       | Proof of work based 'captchas' are another solution. They slow
       | down the bots, and cost compute time, reducing the economic
       | efficiency of the attacker. They make it not worth the effort.
       | And if the attacks continue, you can autoscale the difficulty.
       | 
       | And for humans, it's a minor slowdown without requiring any
       | mental effort from them.
       | 
       | It's a nice solution.
        
       | realaravinth wrote:
       | Hello,
       | 
       | I'm the author of mCaptcha. I'll be happy to answer any questions
       | that you might have :)
        
         | tromp wrote:
         | Have you considered using a memory hard PoW instead of a
         | computation bound one?
        
           | realaravinth wrote:
           | Only recently, yes. WASM performance is tricky. A memory-
           | heavy algorithm will DoS visitors.
           | 
            | That said, mCaptcha does have protections against
            | ASICs (PoW results expire, and the difficulty scales
            | dynamically), but they are yet to be validated. If they
            | prove to be insufficient, I'll try a different approach
            | with memory-heavy algorithms.
           | 
           | disclosure: author of mcaptcha
        
         | technion wrote:
          | I just want to say - people critique every service out there
          | that slows spam and bots. Those critiques are valid from the
          | "it won't stop everything" view, but it clearly stops a
          | proportion, and the wider the variety of products out there,
          | the less likely a spammer is to have a canned answer for a
          | particular site.
        
           | realaravinth wrote:
            | Agreed, it's been an interesting discussion so far with lots
            | of interesting ideas. mCaptcha is a very niche piece of
            | software that will only work for some use cases, but that's
            | okay as long as whoever deploys it is aware of its
            | drawbacks. :)
        
         | zazaulola wrote:
         | Check out the latest revision of the gist. I have added some
         | explanations. Do you think this implementation will work more
         | efficiently or less efficiently? What kind of statistics do you
         | collect in the database? Is there anything interesting in these
         | statistics? Is collecting statistics worth the performance
         | slowdown that occurs? How effective is banning a client by
         | IP/IPv6?
        
       | rufusroflpunch wrote:
        | I think the Lightning Network will be useful for this role
        | someday. The only proof of work that ever needs to be run is on
        | the Bitcoin network, and you can transfer Bitcoin, satoshis, or
        | fractions of a satoshi as the representation of work instead of
        | doing the work directly.
        
       | fabiospampinato wrote:
       | Nice! I had written a little algorithm that one could use to
       | implement something like this (maybe interesting if you want to
       | understand how it could work):
       | https://github.com/fabiospampinato/crypto-puzzle
       | 
        | I think there's something to this: it costs you next to nothing
        | to generate these puzzles, and you get a guaranteed, tunable
        | slowdown factor on attackers (or a cost increase for them, I
        | guess).
        
       | dustinmoris wrote:
       | This looks like a rate limiting API rather than a Captcha.
       | Captcha is a tool to distinguish automated bots from real users.
       | Let's not confuse the term like we already did with "crypto"
       | please.
        
       | red0point wrote:
        | I couldn't figure out how this scheme works - the only detail I
        | found is that there is "a bunch of maths" and replay protection
        | of some sort.
       | 
       | Is there a technical protocol description somewhere? I'd be
       | interested in reading this.
        
         | realaravinth wrote:
         | Apologies, the project isn't ready to be showcased yet. I
         | literally woke up to a message from a friend that said it was
         | on HN. I wish I could explain it on here, but I'm afraid it
         | isn't that easy. Here's the high level overview:
         | 
          | 1. mCaptcha sends a PoW configuration (the first XHR request
          | in the demo widget[0]), which includes a challenge text
          | ("string"), a salt and a difficulty factor.
          | 
          | 2. The client generates the proof of work (PoW) by hashing
          | "string" + salt + nonce, incrementing the nonce until the
          | difficulty factor is satisfied.
          | 
          | 3. The client sends the PoW to mCaptcha, which includes the
          | nonce, the original salt and "string" (the second XHR request
          | in the demo widget).
          | 
          | 4. mCaptcha computes the hash of "string" + salt + nonce. If
          | the difficulty factor is met (i.e. the resultant hash >
          | difficulty factor), mCaptcha responds with an access token.
          | 
          | 5. The client sends the access token to the web service.
          | 
          | 6. The web service authenticates the access token with
          | mCaptcha and only grants access to the protected resource if
          | the token checks out.
          | 
          | I will work on a more detailed specification and report back
          | when it is ready (3 weeks, I think).
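          | 
          | In the meantime, here is a minimal sketch of steps 2 and 4
          | (client solve, server verify). It uses the sha2 crate and a
          | leading-zero-byte difficulty check, which is a simplification
          | of mCaptcha's actual scoring rule, so treat it as illustrative
          | rather than the real implementation:
          | 
          |   use sha2::{Digest, Sha256};
          |   
          |   /// Hash "string" + salt + nonce and test it against the
          |   /// difficulty (here: number of leading zero bytes).
          |   fn meets_difficulty(
          |       challenge: &str, salt: &str, nonce: u64, difficulty: usize,
          |   ) -> bool {
          |       let mut hasher = Sha256::new();
          |       hasher.update(challenge.as_bytes());
          |       hasher.update(salt.as_bytes());
          |       hasher.update(nonce.to_le_bytes());
          |       let digest = hasher.finalize();
          |       digest.iter().take(difficulty).all(|&b| b == 0)
          |   }
          |   
          |   /// Step 2: the client increments a nonce until the
          |   /// difficulty factor is satisfied.
          |   fn solve(challenge: &str, salt: &str, difficulty: usize) -> u64 {
          |       (0u64..)
          |           .find(|&n| meets_difficulty(challenge, salt, n, difficulty))
          |           .expect("u64 nonce space exhausted")
          |   }
          |   
          |   /// Step 4: the server recomputes a single hash to verify
          |   /// the submitted nonce before issuing an access token.
          |   fn verify(
          |       challenge: &str, salt: &str, nonce: u64, difficulty: usize,
          |   ) -> bool {
          |       meets_difficulty(challenge, salt, nonce, difficulty)
          |   }
          |   
          |   fn main() {
          |       let (challenge, salt, difficulty) = ("string", "salt", 2);
          |       let nonce = solve(challenge, salt, difficulty);
          |       assert!(verify(challenge, salt, nonce, difficulty));
          |       println!("nonce {} meets difficulty {}", nonce, difficulty);
          |   }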
         | 
         | [0]:
         | https://demo.mcaptcha.org/widget/?sitekey=pHy0AktWyOKuxZDzFf...
         | 
         | disclosure: author of mCaptcha
        
       | oefrha wrote:
       | Just checked a commercial captcha solving service, Recaptcha rate
       | is currently at $1-$2 per thousand. Looks like this is only going
       | to be cheaper to operate commercially.
        
         | zhfliz wrote:
          | The cheapest offers go down to $0.55-$0.60 per 1k reCAPTCHA v2
          | solves, with solving times from close to a second up to 3
          | minutes depending on provider and luck.
        
       | nialv7 wrote:
       | If you are going to waste people's CPU time anyway, might just as
       | well mine some bitcoins with it I guess?
       | 
       | Using users' browsers to mine bitcoins is doing almost exactly
       | the same thing as this.
        
       | AbacusAvenger wrote:
       | I looked into doing something like this once and decided it
       | wasn't going to be very effective, for a few different reasons.
       | 
       | JS engines (or even WASM) aren't going to be as fast at this kind
       | of work as native machine code would be. Especially when you
       | consider that libraries like OpenSSL have heavily tuned
       | implementations of the SHA algorithms. Any bot solving a SHA-
       | based challenge would be able to extract the challenge from the
       | page and execute it using native machine code faster than any
       | legitimate user's browser could. And if you increase the
       | difficulty of the challenge, it's just going to punish real users
       | running the challenge in their browser more than it would the
       | bots.
       | 
       | It's also based on the assumption that proof-of-work is going to
       | increase the cost of doing business for the bots in some way and
       | discourage their behavior. Many of the bots I was dealing with in
       | my case were either using cloud compute services fraudulently or
       | were running on compromised machines of unknowing people. And
        | they tended not to care about how long it took or how
        | high-effort the challenge was; they were very dedicated to
        | getting past it and continuing their malicious behavior.
       | 
       | There's also the risk that any challenge that's sufficiently
       | difficult may also make the user's browser angry that a script is
       | either going unresponsive or eating tons of CPU, which isn't much
       | different from cryptocurrency miner behavior.
        
         | jsnell wrote:
         | Yes, the range of applications where proof of work is viable is
         | really narrow. (In fact, so narrow that I suspect it can't work
         | for anything where the abuse has a monetary motive.)
         | 
         | One way to think about this is by comparing the cost of passing
         | the POW to the money the same compute resources would make when
         | mining a cryptocurrency. I believe that a low-end phone used
         | for mining a CPU-based cryptocurrency would be making O(1 cent)
         | per day. Let's say that you're willing to cause 1 minute of
         | friction for legit users on low-end devices (already something
         | that I'd expect will be unacceptable from a product
         | perspective). Congratulations: you just cost the attacker
          | 1/1500th of a cent. That's orders of magnitude too low to have
         | any impact on the economics of spam, credential stuffing,
         | scraping, or other typical bulk abuse.
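          | 
          | (Worked out: ~1 cent/day divided by 1440 minutes/day is about
          | 0.0007 cents per minute of compute, i.e. roughly 1/1500th of a
          | cent per one-minute challenge.)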
        
         | vintermann wrote:
         | Yes, the spam doesn't actually have to be profitable for the
         | seller of the spammed product. Not as long as he thinks it is,
         | and is willing to pay the person spamming on his behalf.
         | 
         | And as you say, it's often stolen resources. There may even be
         | another link in the "think it pays" chain: the spammer may buy
         | hacked instances out of a mistaken belief that he can make
         | money on it, by selling spamming services to merchants who also
          | only _think_ it pays. There's a certain "crime premium" where
          | some people seem willing to pay extra (in money or effort) for
          | the feeling that they're fooling someone.
        
         | realaravinth wrote:
         | Thank you for your detailed response, you raise some very
         | interesting and valid points!
         | 
         | > JS engines (or even WASM) aren't going to be as fast at this
         | kind of work as native machine code would be
         | 
          | You are right. mCaptcha has WASM and JS polyfill
          | implementations. Native code will definitely be faster than
          | WASM, but in an experiment I ran for fun[0], I discovered that
          | the WASM version was roughly 2s slower than the native
          | implementation.
         | 
         | > It's also based on the assumption that proof-of-work is going
         | to increase the cost of doing business
         | 
          | mCaptcha is basically a rate-limiter. If an expensive
          | endpoint (say registration, where hashing + other validation
          | is expensive) can handle 4k requests/second and has mCaptcha
          | installed, then the webmaster can force the attacker to slow
          | down to 1 request/second, significantly reducing the load on
          | their server. That isn't to say that the webmaster will be
          | able to protect themselves against a sufficiently motivated
          | attacker who has botnets. :)
         | 
         | > There's also the risk that any challenge that's sufficiently
         | difficult may also make the user's browser angry that a script
         | is either going unresponsive or eating tons of CPU, which isn't
         | much different from cryptocurrency miner behavior.
         | 
          | Also correct. The trick is in finding the optimum difficulty,
          | one which will work for the majority of devices. A survey to
          | benchmark PoW performance of devices in the wild is a WIP[1],
          | which will help webmasters configure their CAPTCHA better.
         | 
         | [0]: https://mcaptcha.org/blog/pow-performance Benchmarking
         | platforms weren't optimised for running benchmarks, kindly take
         | it with a grain of salt. It was a bored Sunday afternoon
         | experiment.
         | 
         | [1]: https://github.com/mcaptcha/survey
         | 
         | Full disclosure: I'm the author of mCaptcha
        
           | aeternum wrote:
            | I'd suggest you consider a new name. CAPTCHA stands for
            | "Completely Automated Public Turing test to tell Computers
            | and Humans Apart", and this implementation has little to do
            | with that.
        
           | tusharsoni wrote:
           | > mCaptcha is basically a rate-limiter
           | 
           | This is a much better explanation of what it does than
           | captcha where I expect "proof-of-human". A PoW based rate-
           | limiter is a really interesting idea! Usually, the challenge
           | with unauthenticated endpoints (ex. signups) is that the
           | server has to do more work (db queries) than the client (make
           | an http request) so it is really easy for the client to bring
           | the server down. With PoW, we're essentially flipping that
           | model where the client has to do more work than the server.
           | Good work!
        
           | AbacusAvenger wrote:
           | About the benchmark data:
           | 
            | It looks like your pow_sha256 library is using the "sha2"
            | crate, which is a pure Rust implementation of SHA2. So your
            | benchmark measures the delta between your library compiled
            | to native code and your library compiled to WASM, which is
            | an interesting benchmark but I don't think it answers the
            | right question.
           | 
           | A more interesting benchmark would probably answer the
           | question "what would those looking to defeat mCaptcha use and
           | how does that performance compare?" So perhaps an
           | implementation of an mCaptcha challenge solver using OpenSSL
           | would be warranted for that.
        
             | realaravinth wrote:
             | That's a good idea, I'll be sure to do that!
        
               | Retr0id wrote:
               | You should compare against a GPU implementation of SHA256
               | (hashcat has a good one).
               | 
               | You may also want to consider ASICs - although you won't
               | be able to build your own ASIC, you can look at the
               | hashrates offered by bitcoin mining hardware, and
               | extrapolate.
        
               | AbacusAvenger wrote:
               | Also remember that native code can use multithreading, so
               | if your challenge is something that could be farmed out
               | to multiple CPU threads until one finds the solution,
               | that's another factor in favor of native code
               | performance.
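                | 
                | For illustration, a rough sketch of that kind of
                | parallel search (plain std threads striding over the
                | nonce space; the leading-zero difficulty test is an
                | assumption, not mCaptcha's exact rule):
                | 
                |   use std::sync::atomic::{AtomicBool, Ordering};
                |   use std::sync::Arc;
                |   use std::thread;
                |   
                |   use sha2::{Digest, Sha256};
                |   
                |   fn meets_difficulty(
                |       challenge: &str,
                |       salt: &str,
                |       nonce: u64,
                |       difficulty: usize,
                |   ) -> bool {
                |       let mut h = Sha256::new();
                |       h.update(challenge.as_bytes());
                |       h.update(salt.as_bytes());
                |       h.update(nonce.to_le_bytes());
                |       h.finalize().iter().take(difficulty).all(|&b| b == 0)
                |   }
                |   
                |   /// Thread i checks nonces i, i+n, i+2n, ...; the first
                |   /// thread to find a solution tells the others to stop.
                |   fn solve_parallel(
                |       challenge: &str,
                |       salt: &str,
                |       difficulty: usize,
                |       threads: u64,
                |   ) -> u64 {
                |       let found = Arc::new(AtomicBool::new(false));
                |       let handles: Vec<_> = (0..threads)
                |           .map(|start| {
                |               let c = challenge.to_string();
                |               let s = salt.to_string();
                |               let found = Arc::clone(&found);
                |               thread::spawn(move || {
                |                   let mut nonce = start;
                |                   while !found.load(Ordering::Relaxed) {
                |                       if meets_difficulty(&c, &s, nonce, difficulty) {
                |                           found.store(true, Ordering::Relaxed);
                |                           return Some(nonce);
                |                       }
                |                       nonce += threads;
                |                   }
                |                   None
                |               })
                |           })
                |           .collect();
                |       handles
                |           .into_iter()
                |           .filter_map(|h| h.join().unwrap())
                |           .next()
                |           .expect("a thread should have found a nonce")
                |   }
                |   
                |   fn main() {
                |       let nonce = solve_parallel("string", "salt", 2, 8);
                |       println!("found nonce {}", nonce);
                |   }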
        
               | paranoidrobot wrote:
                | Let's just assume that you solve the "it takes a while
                | to run" thing through some clever bit of hard-to-optimise
                | math that's difficult to parallelise or multithread or
                | whatever.
               | 
               | If all it takes is computer time, then that's a cheap
               | thing for the botnet operator to solve. They can just
               | spin up another instance, and split up the crawling task
               | to another computer (or 200).
        
               | Retr0id wrote:
               | A captcha can _always_ be parallelised. All the attacker
               | has to do is attempt n different captchas in parallel.
        
               | realaravinth wrote:
                | I haven't encountered a case where multithreading would
                | make the algorithm weaker, but I do have a variation of
                | the benchmark code (on disk, at the moment) that will
                | spin up multiple worker threads to compute the PoW.
        
           | mort96 wrote:
           | > mCaptcha is basically a rate-limiter.
           | 
           | Hmm, is it a better rate limiter than others? I know that
           | nginx, for example, makes it pretty easy to rate limit based
           | on IP address with the `limit_req` and `limit_req_zone`
           | directives.
           | 
            | In essence, nginx's rate limiter also works by making each
           | request consume a resource, but it makes the resource
           | consumed an IP address (or range) rather than compute
           | resources. It seems intuitive that a malicious actor would
           | have an easier time scaling compute than IP addresses, while
           | a legitimate user will _always_ have an IP address but might
           | be on a machine with 1/100000th the compute resources of the
           | malicious actor.
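            | 
            | A toy illustration of that model (a fixed-window per-IP
            | counter in plain Rust; nginx's limit_req actually uses a
            | leaky-bucket algorithm, so this is only the concept, not its
            | implementation):
            | 
            |   use std::collections::HashMap;
            |   use std::net::IpAddr;
            |   use std::time::{Duration, Instant};
            |   
            |   /// Fixed-window counter: each IP gets `limit` requests
            |   /// per `window`; the consumed resource is the address.
            |   struct IpRateLimiter {
            |       limit: u32,
            |       window: Duration,
            |       counters: HashMap<IpAddr, (Instant, u32)>,
            |   }
            |   
            |   impl IpRateLimiter {
            |       fn new(limit: u32, window: Duration) -> Self {
            |           Self { limit, window, counters: HashMap::new() }
            |       }
            |   
            |       fn allow(&mut self, ip: IpAddr) -> bool {
            |           let now = Instant::now();
            |           let entry = self.counters.entry(ip).or_insert((now, 0));
            |           if now.duration_since(entry.0) > self.window {
            |               *entry = (now, 0); // start a new window
            |           }
            |           entry.1 += 1;
            |           entry.1 <= self.limit
            |       }
            |   }
            |   
            |   fn main() {
            |       let mut rl = IpRateLimiter::new(2, Duration::from_secs(1));
            |       let ip: IpAddr = "203.0.113.7".parse().unwrap();
            |       assert!(rl.allow(ip));
            |       assert!(rl.allow(ip));
            |       assert!(!rl.allow(ip)); // third request is rejected
            |   }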
        
             | nextaccountic wrote:
             | You can and should use multiple kinds of rate limiters
        
               | mort96 wrote:
               | "Can" is true, and a good point. "Should" is a bit more
               | dubious though; if IP-range-based rate limiting is
               | enough, not wasting your users' battery with PoW-based
               | rate limiting seems like a good thing. It seems like a
               | potentially useful tool in your tool belt, which you
               | should probably only deploy if IP-address-based rate
               | limiting proves to be not enough.
        
           | operator-name wrote:
           | Can you elaborate on why you chose SHA256 as the hash
           | function?
           | 
            | Attackers aren't exactly limited to web APIs, and SHA256 is
            | known to be trivially parallelizable on a GPU. RandomX[0] is
            | one example of a hash designed to resist that, which reminds
            | me of a similar initiative called RPC-Pay[1].
            | 
            | [0]: https://github.com/tevador/RandomX
            | [1]: https://www.monerooutreach.org/stories/RPC-Pay.html
        
         | jchw wrote:
         | Yep, I have also done this and come to a similar conclusion.
         | The best performance I got was with the WebCrypto API, where I
         | got, IIRC, 100,000 SHA512 hashes a second, with a very
         | optimized routine for iterating the hashcash string. SHA512
         | probably makes the most sense since AFAIK it's the most
         | intensive hash supported by WebCrypto, and the WebCrypto digest
         | API is async, so a whole lot of time is going to be spent
         | awaiting no matter what you do.
         | 
         | I think WebCrypto is available in workers, so you could
         | probably use numerous workers to get a better hashrate. Still,
         | I don't suspect that would jump it past a million, which seems
         | pretty bad for a desktop computer, and it would be a lot worse
         | on a mobile phone.
         | 
         | It might still be a meaningful impediment when combined with
         | other measures, but with the low bar you'd have to set for
         | preimage bits for mobile devices, it's a little questionable.
        
       | progx wrote:
        | This fights spammers that use their own servers, but it will not
        | protect against hostile takeovers of computers, which can use
        | their hash power to solve the captcha.
        
         | tmikaeld wrote:
          | It does: why spend >4 seconds on a single spam message when
          | you can move to a new target and send >40 in the same time?
        
       | fanf2 wrote:
       | Proof of work proves not to work
       | https://www.cl.cam.ac.uk/~rnc1/proofwork2.pdf (as an anti-spam
       | tactic)
        
       | redox99 wrote:
       | Wouldn't a spammer almost always be more able and more willing to
       | dedicate resources than an actual user?
       | 
        | Let's say it takes a legitimate user 60 seconds of PoW on their
        | phone. That's a dealbreaker. For a spammer it might take 15
        | seconds on their server, and they will still spam.
       | 
       | I guess it could rate limit against high throughput of spam (more
       | like DoS prevention than antispam really), but an IP rate limit
       | would probably work as well if not better. An attacker with a
       | large pool of IPs will probably have a large pool of CPUs anyway.
        
       | Jenk wrote:
       | I've used SHA-256 HashCash before, for PoW authentication on an
       | API built to support a mobile app. The nice bit about using
       | HashCash is increasing the difficulty for successive failed
       | attempts.
       | 
        | I implemented this in response to our API being hammered by
        | exploratory bots/scripts sending malformed requests. Adding it
        | dropped failed requests by 99%.
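        | 
        | The escalation can be as simple as something like this (a
        | hedged sketch assuming a hashcash-style difficulty measured in
        | leading zero bits; the numbers are made up):
        | 
        |   /// Toy difficulty schedule: each recent failure adds one
        |   /// leading zero bit (doubling the expected work), capped so
        |   /// legitimate clients can still get through.
        |   fn difficulty_bits(recent_failures: u32) -> u32 {
        |       const BASE_BITS: u32 = 16; // ~65k hashes on average
        |       const MAX_BITS: u32 = 26;  // keep the worst case bounded
        |       (BASE_BITS + recent_failures).min(MAX_BITS)
        |   }
        |   
        |   fn main() {
        |       for failures in [0, 1, 4, 20] {
        |           println!(
        |               "{} failures -> {} leading zero bits",
        |               failures,
        |               difficulty_bits(failures)
        |           );
        |       }
        |   }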
        
       | zazaulola wrote:
        | Inspired by your idea, I threw together some simple static code,
        | compatible with any web engine running on Node.js.
       | https://gist.github.com/zazaulola/6742e41611b85fc48931a79bac...
        
       | jqpabc123 wrote:
       | Much, much simpler solution --- an actual time based rate
       | limiter:
       | 
        | 1) Include a hidden, unique token when the login screen is
        | served to the client.
       | 
       | 2) On the client, enforce a minimum 3 sec delay from time of
       | screen load before the login will be submitted with the hidden
       | token included.
       | 
       | 3) On the server, if the hidden token isn't returned or is
       | unknown/not found or if the delay from the time of issue is less
       | than 3 sec., then reject the login and optionally ban the IP
       | address after too many failed attempts.
       | 
       | The 3 sec delay is more than enough to discourage/prevent brute
       | force attacks but not enough to annoy legitimate users.
        
         | orheep wrote:
          | Wouldn't this exhaust server resources by storing all the
          | tokens, which you would need to keep significantly longer than
          | 3 secs? Banning the IP may help, but then you could just use
          | plain fail2ban instead?
        
           | jqpabc123 wrote:
           | _Wouldn't this exhaust the server resources with storing all
           | the tokens?_
           | 
           | Not really --- only 8 bytes per token and they can be
           | discarded on successful login. Tokens older than X minutes or
           | with more than X attempts can be discarded/rejected too.
           | 
           | How many users are legitimately attempting to log in to your
           | server at the same time?
           | 
            | If you're worried about this, encode the current time into the
           | token using a hashing/encryption/checksum method of your
           | choice. This way, everything needed to validate the attempt
           | is submitted along with the credentials.
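            | 
            | A sketch of that stateless variant (the issue time is
            | encoded in the token and authenticated with a keyed hash;
            | this uses the sha2 crate and a made-up server secret, and a
            | real deployment would use a proper HMAC):
            | 
            |   use std::time::{SystemTime, UNIX_EPOCH};
            |   
            |   use sha2::{Digest, Sha256};
            |   
            |   const MIN_DELAY_SECS: u64 = 3;
            |   const MAX_AGE_SECS: u64 = 600;
            |   
            |   fn now_secs() -> u64 {
            |       SystemTime::now()
            |           .duration_since(UNIX_EPOCH)
            |           .unwrap()
            |           .as_secs()
            |   }
            |   
            |   /// Tag = SHA-256(secret || timestamp), hex-encoded;
            |   /// the token is "timestamp:tag".
            |   fn tag(secret: &[u8], timestamp: u64) -> String {
            |       let mut hasher = Sha256::new();
            |       hasher.update(secret);
            |       hasher.update(timestamp.to_le_bytes());
            |       hasher
            |           .finalize()
            |           .iter()
            |           .map(|b| format!("{:02x}", b))
            |           .collect()
            |   }
            |   
            |   fn issue_token(secret: &[u8]) -> String {
            |       let ts = now_secs();
            |       format!("{}:{}", ts, tag(secret, ts))
            |   }
            |   
            |   /// Reject if the tag doesn't match, if less than 3s have
            |   /// elapsed, or if the token has expired.
            |   fn validate_token(secret: &[u8], token: &str) -> bool {
            |       let Some((ts_str, t)) = token.split_once(':') else {
            |           return false;
            |       };
            |       let Ok(ts) = ts_str.parse::<u64>() else {
            |           return false;
            |       };
            |       let age = now_secs().saturating_sub(ts);
            |       t == tag(secret, ts)
            |           && age >= MIN_DELAY_SECS
            |           && age <= MAX_AGE_SECS
            |   }
            |   
            |   fn main() {
            |       let secret = b"server-side secret"; // hypothetical
            |       let token = issue_token(secret);
            |       // Right after issue the minimum delay hasn't passed.
            |       assert!(!validate_token(secret, &token));
            |   }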
        
         | nneonneo wrote:
         | But this is totally useless - it doesn't consume any resources
         | of the spammer! They can just launch a large number of requests
         | (possibly from their army of bot machines to defeat IP
         | limiting) and have them each wait 3s while doing other
         | nefarious things in parallel.
         | 
         | A PoW has at least the redeeming feature of consuming some
         | resources for those 3s (or so...) making it self-limiting;
         | there's only so many such computations a computer can do in
         | parallel.
        
           | jqpabc123 wrote:
            | _...it doesn't consume any resources of the spammer!_
           | 
           | It consumes the most valuable resource in the world --- time.
        
             | nneonneo wrote:
             | But it doesn't though! A typical laptop can easily hold
             | tens of thousands of I/O connections open at once in your
             | favorite async I/O environment - that number can be in the
             | millions with careful optimizations applied. Each
             | connection just needs a sleep(3) applied between the
             | initial form request and the submission.
             | 
             | A 3 second form delay just means the difference between a
             | spammer launching 1000000 requests and posting them
             | immediately to your database, vs them launching 1000000
             | requests and posting them to your database 3 seconds later.
        
               | jqpabc123 wrote:
               | Who are we kidding here --- most likely, your server
               | can't handle 1000000 simultaneous requests.
               | 
               | My servers don't have enough bandwidth for that. Most of
               | the connections are going to get dropped one way or the
               | other. In my case, they will be intentionally dropped as
               | being a likely denial of service attack.
        
       | figmaheart255 wrote:
       | is this built on top of HashCash[1]?
       | 
       | [1]: https://en.wikipedia.org/wiki/Hashcash
        
       | stuaxo wrote:
       | I've come to think PoW stands for "proof of waste", it's just
       | wasting electricity.
        
       | ranger_danger wrote:
       | > AGPL
       | 
       | Straight into the trash.
        
       | cush wrote:
       | This is a pro-bot rate-limiter. In reality, most humans use low-
       | spec feature phones with slow CPUs. This punishes those humans
       | and promotes bots running on GPUs and highjacked cloud infra.
        
         | ranger_danger wrote:
         | Why not just start out rate-limiting everyone equally with a
         | timer or something?
        
       | web007 wrote:
        | Looks like a decent implementation of HashCash - great! They
        | should implement TOTP-based nonce generation to avoid the need
        | for a database entirely; that would make the system much more
        | lightweight to deploy.
       | 
       | Some people miss the point of this. If you and another person are
       | running away from a cheetah (fastest land animal) then you don't
       | have to outrun the cheetah to survive, just the other person. The
       | same is true for sites getting away from spammers. They will go
       | for sites that are easy to attack before yours, so as long as the
       | effort outweighs the outcome you're safe. It doesn't matter if
       | the solution isn't perfect, it's better than nothing, and if it's
       | not widely adopted it's not worth the effort to bother cracking.
        
       | blackoil wrote:
        | If it becomes popular enough, people with the weakest machines
        | and mobiles will suffer, while a spam bot can run optimized code
        | or an ASIC.
        
         | [deleted]
        
         | [deleted]
        
         | realaravinth wrote:
         | Very good point!
         | 
          | Accessibility is critical to mCaptcha. In fact, Codeberg is
          | trying out mCaptcha purely because it is more accessible[0].
          | That said, it is possible to choose a difficulty factor so
          | high that it denies access to folks with older, slower
          | devices. A survey to benchmark mCaptcha performance on devices
          | in the wild is a WIP[1]. I hope it will provide insights that
          | help webmasters integrating mCaptcha select difficulty factors
          | that work for their visitors.
         | 
         | [0]:
         | https://codeberg.org/Codeberg/Community/issues/479#issuecomm...
         | 
         | [1]: https://github.com/mCaptcha/survey
        
           | arsome wrote:
            | The problem is that mCaptcha allows you to deny access to
            | older, slower devices, but does nothing to deny access to
            | bots, which can likely run optimized implementations of the
            | PoW hundreds of times faster than the web-based version.
        
       | tuetuopay wrote:
       | Oh god no please no. The web is already heavy enough to browse.
       | 
       | Sincerely,
       | 
       | A user of a 7 year old smartphone.
        
         | paulnpace wrote:
          | It took less time for me to calculate the hash (~2 seconds)
          | than to do a stupid puzzle game.
          | 
          | Sincerely,
          | 
          | A user of an 8 year old smartphone.
        
       | rogers18445 wrote:
        | This would only make sense if the browser exposed an optimized
        | Argon2 API to web apps. And it would have to be Argon2 or some
        | other memory-hard hash for it not to be a joke.
        
       | supernes wrote:
       | The goal of CAPTCHA is to tell computers and humans apart. I
       | appreciate the effort towards better UX, but there are already
       | "invisible" CAPTCHAs like Botpoison that discriminate better than
       | this. PoW solutions are just more energy-intensive rate limits.
        
         | xnorswap wrote:
         | Yes, this feels like an ANTI-CAPTCHA. It's a task that
         | computers can perform dramatically better than humans.
        
         | realaravinth wrote:
         | > I appreciate the effort towards better UX, but there are
         | already "invisible" CAPTCHAs like Botpoison that discriminate
         | better than this.
         | 
         | Interesting project, thank you for sharing! From Botpoison's
         | website[0] under FAQ:
         | 
          | > Botpoison combines:
          | > - Hashcash, a cryptographic hash-based proof-of-work
          | algorithm.
          | > - IP reputation checks, cross-referencing proprietary and
          | 3rd party data sets.
          | > - IP rate-limits.
          | > - Session and request analysis.
         | 
          | Seems like it is PoW + IP rate-limits. IP rate-limits, though
          | very effective at immediately identifying spam, hurt folks
          | using Tor and those behind CG-NAT[1].
          | 
          | And as for invisibility, CAPTCHA solves in mCaptcha have a
          | lifetime beyond which they are invalid, so generating the PoW
          | when the checkbox is ticked gives optimum results. But should
          | the webmaster choose to hide the widget, they can always hook
          | it to a form submit event.
         | 
         | [0]: https://botpoison.com/ [1]:
         | https://en.wikipedia.org/wiki/Carrier-grade_NAT
         | 
         | full disclosure: I'm the author of mCaptcha
        
           | dspillett wrote:
           | _> and those behind CG-NAT_
           | 
            | I'm in two minds about inconveniencing those behind CG-NAT.
            | I don't want to punish the innocent, but the ISPs aren't
            | going to move towards better solutions (IPv6) without a push
            | from their paying customers, and they'll never get that push
            | with sufficient strength if we work tirelessly to make the
            | problem affect us rather than those subscribers.
        
             | 2000UltraDeluxe wrote:
             | In the real world, on the other hand, customers will simply
             | conclude that one site doesn't work while others do, and
             | that the problem must be with the one site that doesn't
             | work.
        
           | supernes wrote:
           | Thinking about it a bit more, systems like mCaptcha and
           | Botpoison aren't really CAPTCHA in the strict sense - they
           | solve a somewhat different problem than telling if there's a
           | human at the other end, and IMO that's an important
           | distinction to make (and doesn't necessarily make them
           | inferior to other solutions.)
           | 
           | I still think PoW alone is not enough as it can be automated,
           | albeit at a slower rate. Most of the time I worry more about
           | low-volume automated submissions than high-frequency garbage.
           | The real value is in the combination of factors, especially
           | what BP call the "session and request analysis" and other
           | fingerprinting solutions.
        
             | ivanhoe wrote:
             | strictly speaking it's a rate limiter, not captcha... but
             | frankly it's probably closer to what most ppl use captchas
             | for these days...
        
             | realaravinth wrote:
             | > Thinking about it a bit more, systems like mCaptcha and
             | Botpoison aren't really CAPTCHA in the strict sense
             | 
              | Very true! I chose to use "captcha" because it's easier to
              | convey what it does than, say, calling it a PoW-powered
              | rate-limiter.
             | 
             | > The real value is in the combination of factors,
             | especially what BP call the "session and request analysis"
             | and other fingerprinting solutions.
             | 
              | Also true. I'm not sure if it is possible to implement
              | fingerprinting without tracking activity across the
              | internet --- something that privacy-focused software can't
              | do.
             | 
             | I have been investigating privacy-focused, hash-based spam
             | detection that uses peer reputation[0] but the hash-based
             | mechanism can be broken with a slight modification to the
             | spam text.
             | 
             | I would love to implement spam detection but it shouldn't
             | compromise the visitor's privacy :)
             | 
             | [0]: please see "kavasam" under "Projects that I'm
             | currently working on". I should set up a website for the
             | project soon. https://batsense.net/about
             | 
             | Disclosure: author of mCaptcha.
        
               | supernes wrote:
                | Client-local fingerprinting is not inherently evil; it
                | only becomes so when you combine it with additional
                | signals and decide to use it in violation of users'
                | privacy. AFAIK it's the most reliable way to distinguish
                | unique visitor agents, and under that use case it's far
                | more respectful of personal information than an IP
                | address.
        
         | [deleted]
        
       | jarrell_mark wrote:
       | AGPL seems problematic here. If a company uses this as their
       | captcha system does it force them to open source their entire
       | code base?
        
       | bradhe wrote:
       | Sign a token, verify the signature, provide a nonce used in
       | subsequent requests.
       | 
        | Did I miss something, or is that description just not
        | sufficiently crypto-y?
        
       | beardyw wrote:
       | It's a good idea. Hopefully it doesn't end in huge server farms
       | wasting energy just for spam!
        
         | RektBoy wrote:
         | It will, or zombies.
        
         | paulnpace wrote:
         | Those farms will spend the energy anyway. It's a question of
         | what they spend the energy on.
        
       | password4321 wrote:
       | Or use one of the Monero miners typically injected by malicious
       | ads!
        
       | nneonneo wrote:
       | This isn't a CAPTCHA. A CAPTCHA should be something that can tell
       | a human user apart from a non-human user with a high degree of
       | accuracy, such that it should be hard for a bot to pass even a
       | single instance.
       | 
       | As noted elsewhere in the thread, this is a rate-limiting system.
       | It cannot protect low-rate critical resources like registrations,
       | but would be useful in mitigating large-scale spam attacks.
        
         | csense wrote:
         | According to Wikipedia, CAPTCHA stands for Completely Automated
         | Public Turing test to tell Computers and Humans Apart. This
         | code shouldn't be called CAPTCHA as it makes no attempt to tell
         | humans from machines. (If anything, SHA256 proof-of-work is a
         | problem that's easy for a computer but hard for a human, the
          | _opposite_ of a Turing test.) I'd call it "Proof-Of-Work
          | Client-side RAte Protection" -- it's a POWCRAP, not a CAPTCHA.
         | 
         | (I invented the acronym just now in this post; it could perhaps
         | use some work.)
        
           | waynesonfire wrote:
           | CRAP is sufficient
        
       | bArray wrote:
       | From what I understand, this would allow somebody with reasonable
       | resources to easily crack captchas with a high-end GPU. You could
       | increase the complexity of the PoW, but ultimately you then just
       | end up in a resource race.
       | 
       | Ideally you want an attacker to spend as long as possible using
       | up computation resources, whilst your normal clients spend on
       | average little. I believe you would want a multi-stage approach
       | that involves some human interaction, which is statistically
       | failed more often by computers, and keeps them busy for a long
       | time doing pointless math.
       | 
       | The reason it would have to be multi-staged is that you don't
       | want the attacker doing the computing to realize they are solving
       | a pointless list of problems with no hope of getting a token. A
        | normal user might on average do a -> b -> c, whereas you would
        | make an attacking machine do a -> b -> c -> d -> e -> f and then
        | not get a token, for example. (It would have to be set up
        | statistically so as not to encourage the attacker to bail
        | early.)
        | 
        | I think I would make it a slightly RAM-heavy problem that limits
        | how many of these problems could be solved in parallel. You
        | would obviously set up the problem in such a way that networking
        | is minimal.
       | 
       | For example, you could send a variation of a Langton's Ant map
       | pre-seeded with random patches for a significantly large map, and
       | then tell the client to run for X time steps, then give you the
       | result in some specific area.
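        | 
        | For what it's worth, the core of such a challenge is tiny; a
        | minimal Langton's Ant stepper might look like this (plain Rust,
        | with the pre-seeded random patches and the "report this area"
        | readout left out):
        | 
        |   /// Langton's Ant: on a white cell turn right, on a black
        |   /// cell turn left, flip the cell, then move forward.
        |   struct Ant {
        |       x: usize,
        |       y: usize,
        |       dir: usize, // 0 = up, 1 = right, 2 = down, 3 = left
        |   }
        |   
        |   fn step(grid: &mut Vec<Vec<bool>>, ant: &mut Ant) {
        |       let (w, h) = (grid[0].len(), grid.len());
        |       let cell = &mut grid[ant.y][ant.x];
        |       ant.dir = if *cell {
        |           (ant.dir + 3) % 4 // black: turn left
        |       } else {
        |           (ant.dir + 1) % 4 // white: turn right
        |       };
        |       *cell = !*cell; // flip the cell the ant is leaving
        |       match ant.dir {
        |           0 => ant.y = (ant.y + h - 1) % h, // wrap at edges
        |           1 => ant.x = (ant.x + 1) % w,
        |           2 => ant.y = (ant.y + 1) % h,
        |           3 => ant.x = (ant.x + w - 1) % w,
        |           _ => unreachable!(),
        |       }
        |   }
        |   
        |   fn main() {
        |       let mut grid = vec![vec![false; 64]; 64];
        |       let mut ant = Ant { x: 32, y: 32, dir: 0 };
        |       for _ in 0..10_000 {
        |           step(&mut grid, &mut ant);
        |       }
        |       let black = grid.iter().flatten().filter(|&&c| c).count();
        |       println!("black cells after 10000 steps: {}", black);
        |   }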
        
         | operator-name wrote:
          | The term for this is memory-hard functions. RandomX[0] is one
          | such example, where GPU parallelism does not net attackers a
          | large advantage over CPUs.
         | 
         | [0]: https://github.com/tevador/RandomX
        
           | bArray wrote:
           | Thinking about it, this could be the way forwards. Memory
           | offers several natural bottlenecks:
           | 
           | 1. Memory size - Memory is somewhat costly (even now), with
           | most entry laptops being stuck in the range of 8GB.
           | 
            | 2. Access bandwidth - Getting a CPU to communicate with the
            | RAM takes some time; improvements are incremental and
            | fundamentally limited.
           | 
           | 3. Thread access - Threads compete for bandwidth, as long as
           | cache hits are low.
        
       | quickthrower2 wrote:
       | Has to be easy enough for an old laptop to prove in under 5
       | seconds but hard enough that (if this becomes popular) someone
       | with a mining rig can't be cracking millions of these a day.
       | 
       | Also the idea ain't so new!: http://www.hashcash.org/hashcash.pdf
        
       | BiteCode_dev wrote:
       | I'm going to implement a captcha that requires proof of life.
       | 
       | You need to send a severed finger by mail to pass it.
        
       ___________________________________________________________________
       (page generated 2022-08-04 23:02 UTC)