[HN Gopher] mCaptcha - Proof of work based, privacy respecting C...
___________________________________________________________________
mCaptcha - Proof of work based, privacy respecting CAPTCHA system
Author : vincent_s
Score : 99 points
Date : 2022-08-04 07:51 UTC (15 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| latchkey wrote:
| Consider using RandomX instead.
|
| https://github.com/tevador/RandomX
|
| This would tie the algo to CPU based compute and thus knock out
| ASIC and GPU solutions.
| _448 wrote:
| > they will have to generate proof-of-work(a bunch of math that
| will takes time to compute)
|
| This was tried before and failed! The reason it failed was
| maths! Most people favour clicking simple images or
| rearranging visual elements over solving maths problems.
| realaravinth wrote:
| The math is performed by the browser. All the user will have to
| do is tick a checkbox.
|
| Here's a demo:
| https://demo.mcaptcha.org/widget/?sitekey=pHy0AktWyOKuxZDzFf...
|
| disclosure: I'm the author of mCaptcha
| _448 wrote:
| Commercial spammers use high-end resources, whereas users
| don't have that luxury. So won't it be easy for the
| spammers to scale their resources to overcome this?
| web007 wrote:
| It's not that they can't scale, it's that you want them to
| HAVE to scale, to make it more expensive to attack your
| site than another one.
| turbocon wrote:
| This is a really intriguing idea. What would stop a spammer from
| taking the time to get a single token and sharing it among a pool
| of bots?
| realaravinth wrote:
| Glad you asked!
|
| There are protections against replay attacks within mCaptcha:
| tokens are single-use and also have a lifetime beyond which
| they are invalid.
|
| Disclosure: author of mCaptcha
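| That scheme could be sketched like this (the names and the
| lifetime value are illustrative, not mCaptcha's actual code):

```python
import secrets
import time

# Illustrative replay protection: tokens are single-use and
# expire after a fixed lifetime. The 300s value is made up.
TOKEN_LIFETIME = 300  # seconds

_issued = {}  # token -> issue timestamp

def issue_token():
    token = secrets.token_hex(16)
    _issued[token] = time.time()
    return token

def redeem_token(token):
    issued_at = _issued.pop(token, None)  # pop makes the token single-use
    if issued_at is None:
        return False  # unknown, expired-and-purged, or already redeemed
    return time.time() - issued_at <= TOKEN_LIFETIME
```

| A second redemption of the same token fails, which is what
| stops a solved CAPTCHA from being shared among a pool of bots.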
| aeyes wrote:
| If I were a spammer I wouldn't care much about this. Spammers
| run thousands of requests in parallel and don't care whether
| they take 1 second or 10 seconds to finish. After all, their
| machines run all day long.
|
| Many of the more professional spammers run on hacked botnet
| machines anyway, so they don't really care about maxing out
| the CPUs.
|
| Compute is cheap and the only loser here is the end user with a
| low end device.
| mabbo wrote:
| I think calling this a Captcha isn't a great way to explain
| what this is about.
|
| Bots spamming your sign up page are playing an economics game.
| Can I quickly and cheaply enough do whatever this bot action is
| so that the end result is worth the cost? Captchas were good
| because you need humans to verify them, which raised the cost to
| around $0.05 per action. But now ML breaks most captchas cheaply
| and easily.
|
| Proof of work based 'captchas' are another solution. They slow
| down the bots, and cost compute time, reducing the economic
| efficiency of the attacker. They make it not worth the effort.
| And if the attacks continue, you can autoscale the difficulty.
|
| And for humans, it's a minor slowdown without requiring any
| mental effort from them.
|
| It's a nice solution.
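| The difficulty autoscaling mentioned above could look
| something like this (a toy policy with made-up numbers, not
| mCaptcha's actual behaviour):

```python
# Toy difficulty autoscaling: raise the PoW difficulty as the
# observed request rate climbs, so attacks get more expensive.
def scale_difficulty(base_difficulty, requests_per_second, threshold=50):
    """Multiply the difficulty by each whole multiple of the
    threshold that the current request rate exceeds."""
    factor = max(1, requests_per_second // threshold)
    return base_difficulty * factor
```

| Under normal load the difficulty stays at its base value, so
| legitimate visitors only pay the higher cost during an attack.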
| realaravinth wrote:
| Hello,
|
| I'm the author of mCaptcha. I'll be happy to answer any questions
| that you might have :)
| tromp wrote:
| Have you considered using a memory hard PoW instead of a
| computation bound one?
| realaravinth wrote:
| Only recently, yes. WASM performance is tricky: a memory-
| heavy algorithm will DoS visitors.
|
| That said, there are protections within mCaptcha against
| ASICs (PoW results expire, and difficulty scaling is
| variable), but they are yet to be validated. If they prove
| to be insufficient, then I'll try a different approach with
| memory-heavy algorithms.
|
| disclosure: author of mcaptcha
| technion wrote:
| I just want to say - people critique every service out there
| that slows spam and bots. Those critiques are valid from the
| "it won't stop everything" view, but it clearly stops a
| proportion, and the wider the variety of products out there,
| the less likely a spammer will have a canned answer for a
| particular site.
| realaravinth wrote:
| Agreed, it's been an interesting discussion so far with lots
| of interesting ideas. mCaptcha is a very niche software, that
| will only work some use cases, but that's okay as long as
| whoever's deploying it is aware of its drawbacks. :)
| zazaulola wrote:
| Check out the latest revision of the gist. I have added some
| explanations. Do you think this implementation will work more
| efficiently or less efficiently? What kind of statistics do you
| collect in the database? Is there anything interesting in these
| statistics? Is collecting statistics worth the performance
| slowdown that occurs? How effective is banning a client by
| IP/IPv6?
| rufusroflpunch wrote:
| I think the Lightning Network will be useful for this role
| someday. The only proof of work that ever needs to run is on
| the Bitcoin network; you can transfer Bitcoin, satoshis, or
| fractions of a satoshi as the representation of work, instead
| of doing the work directly.
| fabiospampinato wrote:
| Nice! I had written a little algorithm that one could use to
| implement something like this (maybe interesting if you want to
| understand how it could work):
| https://github.com/fabiospampinato/crypto-puzzle
|
| I think there's something to this: it costs you next to
| nothing to generate these puzzles, and you get a guaranteed,
| tunable slowdown factor on attackers (or a cost increase for
| them, I guess).
| dustinmoris wrote:
| This looks like a rate limiting API rather than a Captcha.
| Captcha is a tool to distinguish automated bots from real users.
| Let's not confuse the term like we already did with "crypto"
| please.
| red0point wrote:
| I couldn't figure out how this scheme works - the only detail
| I found is that there is "a bunch of maths" and replay
| protection of some sort.
|
| Is there a technical protocol description somewhere? I'd be
| interested in reading this.
| realaravinth wrote:
| Apologies, the project isn't ready to be showcased yet. I
| literally woke up to a message from a friend that said it was
| on HN. I wish I could explain it on here, but I'm afraid it
| isn't that easy. Here's the high level overview:
|
| 1. mCaptcha sends a PoW configuration (first XHR request in
| the demo widget[0]) which includes a challenge text
| ("string"), a salt and a difficulty factor
|
| 2. Client generates proof of work by hashing "string" + salt.
| If the difficulty factor isn't satisfied, it continues trying
| to generate the Proof of Work (PoW) by appending a nonce and
| incrementing it until the difficulty factor is met.
|
| 3. Client sends the PoW to mCaptcha, which includes the
| nonce, the original salt and "string" (second XHR request in
| the demo widget)
|
| 4. mCaptcha computes the hash for "string" + salt + nonce. If
| the difficulty factor is met (i.e. the resultant hash >
| difficulty factor), then mCaptcha responds with an access
| token.
|
| 5. Client sends the access token to the web service.
|
| 6. The web service authenticates the access token with
| mCaptcha and only grants access to the protected resource if
| the token checks out.
|
| I will work on a more detailed specification and report back
| when it is ready (3 weeks, I think)
|
| [0]:
| https://demo.mcaptcha.org/widget/?sitekey=pHy0AktWyOKuxZDzFf...
|
| disclosure: author of mCaptcha
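| The hashing loop in steps 2 and 4 can be sketched roughly
| like this (the exact encoding, hash width and comparison that
| mCaptcha uses may differ; treat the function names and the
| ">= difficulty" check as illustrative assumptions):

```python
import hashlib
from itertools import count

def hash_value(string, salt, nonce):
    # Hash "string" + salt + nonce and read a prefix of the
    # digest as an integer so it can be compared to a difficulty.
    digest = hashlib.sha256(f"{salt}{string}{nonce}".encode()).digest()
    return int.from_bytes(digest[:16], "big")

def generate_pow(string, salt, difficulty):
    """Client side: increment the nonce until the difficulty is met."""
    for nonce in count():
        if hash_value(string, salt, nonce) >= difficulty:
            return nonce

def verify_pow(string, salt, nonce, difficulty):
    """Server side: one hash recomputation checks the client's work."""
    return hash_value(string, salt, nonce) >= difficulty
```

| The asymmetry is the point: the client may need many hash
| attempts, while the server verifies with a single one.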
| oefrha wrote:
| Just checked a commercial captcha-solving service; the
| Recaptcha rate is currently at $1-$2 per thousand. Looks like
| solving this is only going to be cheaper commercially.
| zhfliz wrote:
| cheapest offers go down to $0.55-$0.6 per 1k recaptcha v2,
| with solving times from close to a second up to 3 minutes
| depending on provider and luck.
| nialv7 wrote:
| If you are going to waste people's CPU time anyway, might just as
| well mine some bitcoins with it I guess?
|
| Using users' browsers to mine bitcoins is doing almost exactly
| the same thing as this.
| AbacusAvenger wrote:
| I looked into doing something like this once and decided it
| wasn't going to be very effective, for a few different reasons.
|
| JS engines (or even WASM) aren't going to be as fast at this kind
| of work as native machine code would be. Especially when you
| consider that libraries like OpenSSL have heavily tuned
| implementations of the SHA algorithms. Any bot solving a SHA-
| based challenge would be able to extract the challenge from the
| page and execute it using native machine code faster than any
| legitimate user's browser could. And if you increase the
| difficulty of the challenge, it's just going to punish real users
| running the challenge in their browser more than it would the
| bots.
|
| It's also based on the assumption that proof-of-work is going to
| increase the cost of doing business for the bots in some way and
| discourage their behavior. Many of the bots I was dealing with in
| my case were either using cloud compute services fraudulently or
| were running on compromised machines of unknowing people. And
| they tended not to care about how long it took or how high-effort
| the challenge was, they were very dedicated at getting past it
| and continuing their malicious behavior.
|
| There's also the risk that any challenge that's sufficiently
| difficult may also make the user's browser angry that a script is
| either going unresponsive or eating tons of CPU, which isn't much
| different from cryptocurrency miner behavior.
| jsnell wrote:
| Yes, the range of applications where proof of work is viable is
| really narrow. (In fact, so narrow that I suspect it can't work
| for anything where the abuse has a monetary motive.)
|
| One way to think about this is by comparing the cost of
| passing the PoW to the money the same compute resources would
| make when mining a cryptocurrency. I believe that a low-end
| phone used for mining a CPU-based cryptocurrency would be
| making O(1 cent) per day. Let's say that you're willing to
| cause 1 minute of friction for legit users on low-end devices
| (already something that I'd expect will be unacceptable from
| a product perspective). Congratulations: you just cost the
| attacker 1/1500th of a cent. That's orders of magnitude too
| low to have any impact on the economics of spam, credential
| stuffing, scraping, or other typical bulk abuse.
| vintermann wrote:
| Yes, the spam doesn't actually have to be profitable for the
| seller of the spammed product. Not as long as he thinks it is,
| and is willing to pay the person spamming on his behalf.
|
| And as you say, it's often stolen resources. There may even be
| another link in the "think it pays" chain: the spammer may buy
| hacked instances out of a mistaken belief that he can make
| money on it, by selling spamming services to merchants who also
| only _think_ it pays. There's a certain "crime premium" where
| some people seem willing to pay extra (in money or effort) for
| the feeling that they're fooling someone.
| realaravinth wrote:
| Thank you for your detailed response, you raise some very
| interesting and valid points!
|
| > JS engines (or even WASM) aren't going to be as fast at this
| kind of work as native machine code would be
|
| You are right. mCaptcha has WASM and JS polyfill
| implementations. Native code will definitely be faster than
| WASM, but in an experiment I ran for fun[0], I discovered
| that the WASM was roughly 2s slower than the native
| implementation.
|
| > It's also based on the assumption that proof-of-work is going
| to increase the cost of doing business
|
| mCaptcha is basically a rate-limiter. If an expensive
| endpoint (say registration: hashing + other validation is
| expensive) can handle 4k requests/second and has mCaptcha
| installed, then the webmaster can force the attacker to slow
| down to 1 request/second, significantly reducing the load on
| their server. That isn't to say that the webmaster will be
| able to protect themselves against a sufficiently motivated
| attacker who has botnets. :)
|
| > There's also the risk that any challenge that's sufficiently
| difficult may also make the user's browser angry that a script
| is either going unresponsive or eating tons of CPU, which isn't
| much different from cryptocurrency miner behavior.
|
| Also correct. The trick is in finding the optimum difficulty
| that will work for the majority of devices. A survey to
| benchmark PoW performance of devices in the wild is a WIP[1],
| which will help webmasters configure their CAPTCHA better.
|
| [0]: https://mcaptcha.org/blog/pow-performance Benchmarking
| platforms weren't optimised for running benchmarks, kindly take
| it with a grain of salt. It was a bored Sunday afternoon
| experiment.
|
| [1]: https://github.com/mcaptcha/survey
|
| Full disclosure: I'm the author of mCaptcha
| aeternum wrote:
| I'd suggest you consider a new name. Captcha stands for
| Completely Automated Public Turing Test and this
| implementation has little to do with that.
| tusharsoni wrote:
| > mCaptcha is basically a rate-limiter
|
| This is a much better explanation of what it does than
| captcha where I expect "proof-of-human". A PoW based rate-
| limiter is a really interesting idea! Usually, the challenge
| with unauthenticated endpoints (ex. signups) is that the
| server has to do more work (db queries) than the client (make
| an http request) so it is really easy for the client to bring
| the server down. With PoW, we're essentially flipping that
| model where the client has to do more work than the server.
| Good work!
| AbacusAvenger wrote:
| About the benchmark data:
|
| It looks like your pow_sha256 library is using the "sha2"
| crate, which is a pure Rust implementation of SHA-2. So your
| benchmark is around the delta of your library compiled to
| native code vs. your library compiled to WASM, which is an
| interesting benchmark but I don't think it answers the right
| question.
|
| A more interesting benchmark would probably answer the
| question "what would those looking to defeat mCaptcha use and
| how does that performance compare?" So perhaps an
| implementation of an mCaptcha challenge solver using OpenSSL
| would be warranted for that.
| realaravinth wrote:
| That's a good idea, I'll be sure to do that!
| Retr0id wrote:
| You should compare against a GPU implementation of SHA256
| (hashcat has a good one).
|
| You may also want to consider ASICs - although you won't
| be able to build your own ASIC, you can look at the
| hashrates offered by bitcoin mining hardware, and
| extrapolate.
| AbacusAvenger wrote:
| Also remember that native code can use multithreading, so
| if your challenge is something that could be farmed out
| to multiple CPU threads until one finds the solution,
| that's another factor in favor of native code
| performance.
| paranoidrobot wrote:
| Lets just assume that you solve the "it takes a while to
| run" thing through some clever bits of hard-to-optimise
| math, that's difficult to parallelise or multithread or
| whatever.
|
| If all it takes is computer time, then that's a cheap
| thing for the botnet operator to solve. They can just
| spin up another instance, and split up the crawling task
| to another computer (or 200).
| Retr0id wrote:
| A captcha can _always_ be parallelised. All the attacker
| has to do is attempt n different captchas in parallel.
| realaravinth wrote:
| I haven't encountered a case where multithreading will
| make the algorithm weaker, but I do have a variation of
| the benchmark code(on disk, at the moment) that will spin
| up multiple worker threads to compute PoW.
| mort96 wrote:
| > mCaptcha is basically a rate-limiter.
|
| Hmm, is it a better rate limiter than others? I know that
| nginx, for example, makes it pretty easy to rate limit based
| on IP address with the `limit_req` and `limit_req_zone`
| directives.
|
| In essence, nginx's rate limiter also works by making each
| request consume a resource, but it makes the resource
| consumed an IP address (or range) rather than compute
| resources. It seems intuitive that a malicious actor would
| have an easier time scaling compute than IP addresses, while
| a legitimate user will _always_ have an IP address but might
| be on a machine with 1/100000th the compute resources of the
| malicious actor.
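| For contrast, the IP-keyed approach can be sketched as a
| token bucket in the spirit of nginx's limit_req directives
| (the rate and burst numbers here are arbitrary examples):

```python
import time
from collections import defaultdict

# Token-bucket limiter keyed by client IP: each request spends
# one token; tokens refill at a fixed rate per IP.
RATE = 5.0    # refill: requests per second per IP
BURST = 10.0  # bucket capacity (allowed burst size)

_buckets = defaultdict(lambda: {"tokens": BURST, "last": time.monotonic()})

def allow(ip):
    b = _buckets[ip]
    now = time.monotonic()
    b["tokens"] = min(BURST, b["tokens"] + (now - b["last"]) * RATE)
    b["last"] = now
    if b["tokens"] >= 1.0:
        b["tokens"] -= 1.0
        return True
    return False
```

| Here the scarce resource is the IP address itself rather than
| compute, which is the trade-off the comment above describes.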
| nextaccountic wrote:
| You can and should use multiple kinds of rate limiters
| mort96 wrote:
| "Can" is true, and a good point. "Should" is a bit more
| dubious though; if IP-range-based rate limiting is
| enough, not wasting your users' battery with PoW-based
| rate limiting seems like a good thing. It seems like a
| potentially useful tool in your tool belt, which you
| should probably only deploy if IP-address-based rate
| limiting proves to be not enough.
| operator-name wrote:
| Can you elaborate on why you chose SHA256 as the hash
| function?
|
| Attackers aren't exactly limited to web APIs, and SHA-256 is
| known to be trivially parallelizable on a GPU. RandomX[0] is
| one alternative designed to resist this, which reminds me of
| a similar initiative called RPC-Pay[1].
|
| [0]: https://github.com/tevador/RandomX [1]:
| https://www.monerooutreach.org/stories/RPC-Pay.html
| jchw wrote:
| Yep, I have also done this and come to a similar conclusion.
| The best performance I got was with the WebCrypto API, where I
| got, IIRC, 100,000 SHA512 hashes a second, with a very
| optimized routine for iterating the hashcash string. SHA512
| probably makes the most sense since AFAIK it's the most
| intensive hash supported by WebCrypto, and the WebCrypto digest
| API is async, so a whole lot of time is going to be spent
| awaiting no matter what you do.
|
| I think WebCrypto is available in workers, so you could
| probably use numerous workers to get a better hashrate. Still,
| I don't suspect that would jump it past a million, which seems
| pretty bad for a desktop computer, and it would be a lot worse
| on a mobile phone.
|
| It might still be a meaningful impediment when combined with
| other measures, but with the low bar you'd have to set for
| preimage bits for mobile devices, it's a little questionable.
| progx wrote:
| This fights spammers that use their own servers, but it will
| not protect against hostilely taken-over computers, which can
| use their hash power to solve the captcha.
| tmikaeld wrote:
| It does. Why spend >4 seconds on a single spam when you can
| move to a new target and send >40 spams in the same time?
| fanf2 wrote:
| Proof of work proves not to work
| https://www.cl.cam.ac.uk/~rnc1/proofwork2.pdf (as an anti-spam
| tactic)
| redox99 wrote:
| Wouldn't a spammer almost always be more able and more willing to
| dedicate resources than an actual user?
|
| Let's say it takes a legitimate user 60 seconds of PoW on
| their phone. That's a dealbreaker. Now take a spammer for
| whom it takes 15 seconds on their server: they will still
| spam.
|
| I guess it could rate limit against high throughput of spam (more
| like DoS prevention than antispam really), but an IP rate limit
| would probably work as well if not better. An attacker with a
| large pool of IPs will probably have a large pool of CPUs anyway.
| Jenk wrote:
| I've used SHA-256 HashCash before, for PoW authentication on an
| API built to support a mobile app. The nice bit about using
| HashCash is increasing the difficulty for successive failed
| attempts.
|
| I implemented this in response to our API being hammered by
| exploratory bots/scripts sending malformed requests. By adding
| this, it dropped failed requests by 99%.
| zazaulola wrote:
| Inspired by your idea, I threw a simple static code, compatible
| with any web engine running on nodejs.
| https://gist.github.com/zazaulola/6742e41611b85fc48931a79bac...
| jqpabc123 wrote:
| Much, much simpler solution --- an actual time based rate
| limiter:
|
| 1) Include a hidden, unique token when the login screen is served
| to client.
|
| 2) On the client, enforce a minimum 3 sec delay from time of
| screen load before the login will be submitted with the hidden
| token included.
|
| 3) On the server, if the hidden token isn't returned or is
| unknown/not found or if the delay from the time of issue is less
| than 3 sec., then reject the login and optionally ban the IP
| address after too many failed attempts.
|
| The 3 sec delay is more than enough to discourage/prevent brute
| force attacks but not enough to annoy legitimate users.
| orheep wrote:
| Wouldn't this exhaust the server resources with storing all the
| tokens? Which you would need to do significantly longer than 3
| secs? Banning the IP may help but then you could just do a
| simple fail2ban instead?
| jqpabc123 wrote:
| _Wouldn't this exhaust the server resources with storing all
| the tokens?_
|
| Not really --- only 8 bytes per token and they can be
| discarded on successful login. Tokens older than X minutes or
| with more than X attempts can be discarded/rejected too.
|
| How many users are legitimately attempting to log in to your
| server at the same time?
|
| If you worried about this, encode the current time into the
| token using a hashing/encryption/checksum method of your
| choice. This way, everything needed to validate the attempt
| is submitted along with the credentials.
| nneonneo wrote:
| But this is totally useless - it doesn't consume any resources
| of the spammer! They can just launch a large number of requests
| (possibly from their army of bot machines to defeat IP
| limiting) and have them each wait 3s while doing other
| nefarious things in parallel.
|
| A PoW has at least the redeeming feature of consuming some
| resources for those 3s (or so...) making it self-limiting;
| there's only so many such computations a computer can do in
| parallel.
| jqpabc123 wrote:
| _...it doesn 't consume any resources of the spammer!_
|
| It consumes the most valuable resource in the world --- time.
| nneonneo wrote:
| But it doesn't though! A typical laptop can easily hold
| tens of thousands of I/O connections open at once in your
| favorite async I/O environment - that number can be in the
| millions with careful optimizations applied. Each
| connection just needs a sleep(3) applied between the
| initial form request and the submission.
|
| A 3 second form delay just means the difference between a
| spammer launching 1000000 requests and posting them
| immediately to your database, vs them launching 1000000
| requests and posting them to your database 3 seconds later.
| jqpabc123 wrote:
| Who are we kidding here --- most likely, your server
| can't handle 1000000 simultaneous requests.
|
| My servers don't have enough bandwidth for that. Most of
| the connections are going to get dropped one way or the
| other. In my case, they will be intentionally dropped as
| being a likely denial of service attack.
| figmaheart255 wrote:
| is this built on top of HashCash[1]?
|
| [1]: https://en.wikipedia.org/wiki/Hashcash
| stuaxo wrote:
| I've come to think PoW stands for "proof of waste", it's just
| wasting electricity.
| ranger_danger wrote:
| > AGPL
|
| Straight into the trash.
| cush wrote:
| This is a pro-bot rate-limiter. In reality, most humans use low-
| spec feature phones with slow CPUs. This punishes those humans
| and promotes bots running on GPUs and highjacked cloud infra.
| ranger_danger wrote:
| Why not just start out rate-limiting everyone equally with a
| timer or something?
| web007 wrote:
| Looks like a decent implementation of HashCash - great! They
| should implement TOTP-based nonce generation to avoid the need
| for a database entirely, that would make the system much more
| lightweight to deploy.
|
| Some people miss the point of this. If you and another person are
| running away from a cheetah (fastest land animal) then you don't
| have to outrun the cheetah to survive, just the other person. The
| same is true for sites getting away from spammers. They will go
| for sites that are easy to attack before yours, so as long as the
| effort outweighs the outcome you're safe. It doesn't matter if
| the solution isn't perfect, it's better than nothing, and if it's
| not widely adopted it's not worth the effort to bother cracking.
| blackoil wrote:
| If it became popular enough, people with weakest machines and
| mobiles will suffer, while a spam bot can ruin optimized code or
| ASIC.
| [deleted]
| [deleted]
| realaravinth wrote:
| Very good point!
|
| Accessibility is a critical to mCaptcha. In fact, Codeberg is
| trying out mCaptcha purely because of its more accessible[0].
| That said, it is possible to choose a difficulty factor very
| high to deny access to folks with older, slower devices. A
| survey to benchmark mCaptcha performance on devices in the wild
| is WIP[1]. I hope it will provide insights to help webmasters
| integrating mCaptcha to select difficulty factors that work for
| their visitors.
|
| [0]:
| https://codeberg.org/Codeberg/Community/issues/479#issuecomm...
|
| [1]: https://github.com/mCaptcha/survey
| arsome wrote:
| The problem is that mCaptcha allows you to deny access to
| older, slower devices, but does nothing to deny access to
| bots. Which can likely run optimized implementations of the
| PoW which run hundreds of times faster than the web based
| version.
| tuetuopay wrote:
| Oh god no please no. The web is already heavy enough to browse.
|
| Sincerely,
|
| A user of a 7 year old smartphone.
| paulnpace wrote:
| It took less time for me calculate the hash (~2 seconds) than
| to do a stupid puzzle game.
|
| Sincerely,
|
| A user of a 8 year old smartphone.
| rogers18445 wrote:
| Would only make sense to do if the browser exposes an optimized
| argon2 API to web apps. And it would have to be argon2 or some
| other memory hard hash for it to not be a joke.
| supernes wrote:
| The goal of CAPTCHA is to tell computers and humans apart. I
| appreciate the effort towards better UX, but there are already
| "invisible" CAPTCHAs like Botpoison that discriminate better than
| this. PoW solutions are just more energy-intensive rate limits.
| xnorswap wrote:
| Yes, this feels like an ANTI-CAPTCHA. It's a task that
| computers can perform dramatically better than humans.
| realaravinth wrote:
| > I appreciate the effort towards better UX, but there are
| already "invisible" CAPTCHAs like Botpoison that discriminate
| better than this.
|
| Interesting project, thank you for sharing! From Botpoison's
| website[0] under FAQ:
|
| > Botpoison combines: > - Hashcash , a cryptographic hash-based
| proof-of-work algorithm. > - IP reputation checks, cross-
| referencing proprietary and 3rd party data sets. > - IP rate-
| limits. > - Session and request analysis.
|
| Seems like it is PoW + IP rate-limits. IP rate-limits. though
| very effective at immediately identifying spam, it hurts folks
| using Tor and those behind CG-NAT[1].
|
| And as for invisibility, CAPTCHA solves in mCaptcha have a
| lifetime, beyond which they are invalid. So generating PoW when
| the checkbox is ticked gives optimum results. But should the
| webmaster choose to hide, the widget, they can always choose to
| hook the widget to a form submit event.
|
| [0]: https://botpoison.com/ [1]:
| https://en.wikipedia.org/wiki/Carrier-grade_NAT
|
| full disclosure: I'm the author of mCaptcha
| dspillett wrote:
| _> and those behind CG-NAT_
|
| I'm in two minds about how I feel inconveniencing those
| behind CG-NAT. I don't want to punish the innocent, but the
| ISPs aren't going to move towards better solutions (IPv6)
| without a push from their paying customers, and they'll never
| get that push with sufficient strength if we work tirelessly
| to make the problem affect us and not affect those
| subscribers.
| 2000UltraDeluxe wrote:
| In the real world, on the other hand, customers will simply
| conclude that one site doesn't work while others do, and
| that the problem must be with the one site that doesn't
| work.
| supernes wrote:
| Thinking about it a bit more, systems like mCaptcha and
| Botpoison aren't really CAPTCHA in the strict sense - they
| solve a somewhat different problem than telling if there's a
| human at the other end, and IMO that's an important
| distinction to make (and doesn't necessarily make them
| inferior to other solutions.)
|
| I still think PoW alone is not enough as it can be automated,
| albeit at a slower rate. Most of the time I worry more about
| low-volume automated submissions than high-frequency garbage.
| The real value is in the combination of factors, especially
| what BP call the "session and request analysis" and other
| fingerprinting solutions.
| ivanhoe wrote:
| strictly speaking it's a rate limiter, not captcha... but
| frankly it's probably closer to what most ppl use captchas
| for these days...
| realaravinth wrote:
| > Thinking about it a bit more, systems like mCaptcha and
| Botpoison aren't really CAPTCHA in the strict sense
|
| Very true! I chose to use "captcha" because it's easier to
| convey what it does than, say, calling it a PoW-powered
| rate-limter.
|
| > The real value is in the combination of factors,
| especially what BP call the "session and request analysis"
| and other fingerprinting solutions.
|
| Also true. I'm not sure if it is possible to implement
| fingerprinting without tracking activity across the
| internet --- something that a privacy-focused software
| can't do.
|
| I have been investigating privacy-focused, hash-based spam
| detection that uses peer reputation[0] but the hash-based
| mechanism can be broken with a slight modification to the
| spam text.
|
| I would love to implement spam detection but it shouldn't
| compromise the visitor's privacy :)
|
| [0]: please see "kavasam" under "Projects that I'm
| currently working on". I should set up a website for the
| project soon. https://batsense.net/about
|
| Disclosure: author of mCaptcha.
| supernes wrote:
| Client-local fingerprinting is not inherently evil, just
| when you combine it with additional signals and decide to
| use it in violation of users' privacy. AFAIK it's the
| most reliable way to distinguish unique visitor agents,
| and under that use case it's far more respectful of
| personal information than an IP address.
| [deleted]
| jarrell_mark wrote:
| AGPL seems problematic here. If a company uses this as their
| captcha system does it force them to open source their entire
| code base?
| bradhe wrote:
| Sign a token, verify the signature, provide a nonce used in
| subsequent requests.
|
| Did I miss something or is that description not sufficiently
| crypto-y enough?
| beardyw wrote:
| It's a good idea. Hopefully it doesn't end in huge server farms
| wasting energy just for spam!
| RektBoy wrote:
| It will, or zombies.
| paulnpace wrote:
| Those farms will spend the energy anyway. It's a question of
| what they spend the energy on.
| password4321 wrote:
| Or use one of the Monero miners typically injected by malicious
| ads!
| nneonneo wrote:
| This isn't a CAPTCHA. A CAPTCHA should be something that can tell
| a human user apart from a non-human user with a high degree of
| accuracy, such that it should be hard for a bot to pass even a
| single instance.
|
| As noted elsewhere in the thread, this is a rate-limiting system.
| It cannot protect low-rate critical resources like registrations,
| but would be useful in mitigating large-scale spam attacks.
| csense wrote:
| According to Wikipedia, CAPTCHA stands for Completely Automated
| Public Turing test to tell Computers and Humans Apart. This
| code shouldn't be called CAPTCHA as it makes no attempt to tell
| humans from machines. (If anything, SHA256 proof-of-work is a
| problem that's easy for a computer but hard for a human, the
| _opposite_ of a Turing test.) I 'd call it "Proof-Of-Work
| Client-side RAte Protection" -- it's a POWCRAP, not a CAPTCHA.
|
| (I invented the acronym just now in this post; it could perhaps
| use some work.)
| waynesonfire wrote:
| CRAP is sufficient
| bArray wrote:
| From what I understand, this would allow somebody with reasonable
| resources to easily crack captchas with a high-end GPU. You could
| increase the complexity of the PoW, but ultimately you then just
| end up in a resource race.
|
| Ideally you want an attacker to spend as long as possible using
| up computation resources, whilst your normal clients spend on
| average little. I believe you would want a multi-stage approach
| that involves some human interaction, which is statistically
| failed more often by computers, and keeps them busy for a long
| time doing pointless math.
|
| The reason it would have to be multi-staged is that you don't
| want the attacker doing the computing to realize they are solving
| a pointless list of problems with no hope of getting a token. A
| normal user might on average do a -> b -> c, whereas you would
| make an attacking machine do a -> b -> c -> d -> e -> f, and the
| not get a token for example. (It would have to be statistically
| setup as not to encourage the attacker to bail early.)
|
| I think I would make it a slightly RAM heavy problem that limits
| how many of these problems could be solved in parallel. You would
| obviously setup the problem in such a way that networking is
| minimal.
|
| For example, you could send a variation of a Langton's Ant map
| pre-seeded with random patches for a significantly large map, and
| then tell the client to run for X time steps, then give you the
| result in some specific area.
| operator-name wrote:
| The term for describing is memory hard functions. RandomX[0] is
| one such example where GPU parallelism does not net them a
| large advantage over CPUs.
|
| [0]: https://github.com/tevador/RandomX
| bArray wrote:
| Thinking about it, this could be the way forwards. Memory
| offers several natural bottlenecks:
|
| 1. Memory size - Memory is somewhat costly (even now), with
| most entry laptops being stuck in the range of 8GB.
|
| 2. Access bandwidth - Getting a CPU to communicate with the
| RAM takes some time, improvements are incremental and are
| fundamentally limited.
|
| 3. Thread access - Threads compete for bandwidth, as long as
| cache hits are low.
| quickthrower2 wrote:
| Has to be easy enough for an old laptop to prove in under 5
| seconds but hard enough that (if this becomes popular) someone
| with a mining rig can't be cracking millions of these a day.
|
| Also the idea ain't so new!: http://www.hashcash.org/hashcash.pdf
| BiteCode_dev wrote:
| I'm going to implement a captcha that requires proof of life.
|
| You need to send a severed finger by mail to pass it.
___________________________________________________________________
(page generated 2022-08-04 23:02 UTC)