[HN Gopher] mCaptcha: Open-source proof-of-work captcha for webs...
       ___________________________________________________________________
        
       mCaptcha: Open-source proof-of-work captcha for websites
        
       Author : notpushkin
       Score  : 61 points
       Date   : 2023-08-08 19:59 UTC (3 hours ago)
        
 (HTM) web link (mcaptcha.org)
 (TXT) w3m dump (mcaptcha.org)
        
       | tromp wrote:
       | SHA256-based Hashcash seems like a poor choice of PoW for a
       | captcha that's supposed to incur a nontrivial cost for spammers.
       | They can simply employ a SHA256 ASIC to crack the captcha at
       | practically no cost.
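        | 
        | To make that concrete, here's a minimal hashcash-style loop (an
        | illustrative sketch, not necessarily mCaptcha's exact scheme).
        | The hot path is one SHA-256 per attempt, which is exactly the
        | operation mining ASICs perform by the trillions per second:
        | 
        |   // Find a nonce whose SHA-256 digest has `difficulty`
        |   // leading zero bits; the server re-hashes once to verify.
        |   async function solve(challenge: string,
        |                        difficulty: number): Promise<number> {
        |     const enc = new TextEncoder();
        |     for (let nonce = 0; ; nonce++) {
        |       const buf = await crypto.subtle.digest(
        |         "SHA-256", enc.encode(challenge + nonce));
        |       const hash = new Uint8Array(buf);
        |       let zeros = 0;
        |       for (const b of hash) {
        |         if (b === 0) { zeros += 8; continue; }
        |         zeros += Math.clz32(b) - 24; // clz32 counts from bit 31
        |         break;
        |       }
        |       if (zeros >= difficulty) return nonce;
        |     }
        |   }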
        
         | lucb1e wrote:
         | I agree, but you literally have no other option:
         | https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypt...
         | 
         | Web cryptography is stuck in the 1990s. PBKDF2 is the only
         | available algorithm, which gives an attacker's GPUs a big
         | advantage over honest users, let alone ASICs.
         | 
          | Maybe a webassembly-based solution implementing something like
          | Bcrypt or Scrypt/Argon2 is comparable to a browser-native
          | implementation, but that would need to be benchmarked rather
          | than taken on my word. These algorithms provide varying amounts
          | of memory-hardness (Bcrypt uses only 4KB, but even that proved
          | surprisingly effective), which causes contention on the GPU
          | memory bus (it's only a bit faster than the CPU's memory bus,
          | so the GPU ends up with only a small advantage, on the order of
          | 5x instead of 100x) and forces larger ASIC die sizes (which the
          | Argon2 paper argues is what drives up the attacker's cost).
         | 
         | Source for the latter: https://github.com/P-H-C/phc-winner-
         | argon2/blob/16d3df698db2... section 2.1
         | 
         | > We aim to maximize the cost of password cracking on ASICs.
         | There can be different approaches to measure this cost, but we
         | turn to one of the most popular - the time-area product [4,
         | 16]. [...]
         | 
         | > - The 50-nm DRAM implementation [10] takes 550 mm2 per GByte;
         | 
         | > - The Blake2b implementation in the 65-nm process should take
         | about 0.1 mm2 (using Blake-512 implementation in [11]);
         | 
         | I understand from this that adding more memory on the ASIC/chip
         | is more expensive than adding more computing power.
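          | 
          | For reference, the PBKDF2-as-work you can get out of
          | SubtleCrypto looks roughly like this (illustrative sketch; the
          | iteration count is the only real cost knob):
          | 
          |   async function pbkdf2Work(
          |     input: string, salt: Uint8Array, iterations: number
          |   ): Promise<ArrayBuffer> {
          |     const key = await crypto.subtle.importKey(
          |       "raw", new TextEncoder().encode(input),
          |       "PBKDF2", false, ["deriveBits"]);
          |     // GPUs/ASICs chew through SHA-256-based PBKDF2 far
          |     // faster than a browser, hence the concern above.
          |     return crypto.subtle.deriveBits(
          |       { name: "PBKDF2", hash: "SHA-256", salt, iterations },
          |       key, 256);
          |   }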
        
           | notpushkin wrote:
            | Ohhh, actually mCaptcha seems to be using WASM already:
           | https://mcaptcha.org/docs/api/browser
           | 
           | So I think it should be possible to add another algo?
        
       | notpushkin wrote:
       | Related:
       | 
       |  _mCaptcha - Proof of work based, privacy respecting CAPTCHA
       | system_ - https://news.ycombinator.com/item?id=32340305 - Aug
       | 2022 (96 comments)
       | 
       |  _MCaptcha: FOSS privacy focused captcha system using proof-of-
       | work_ - https://news.ycombinator.com/item?id=32340590 - Aug 2022
       | (5 comments)
        
       | remram wrote:
        | I don't understand the premise. The point of a CAPTCHA is to tell
        | Computers and Humans Apart; that's what the CHA stands for. You
        | cannot hope to run this test using a proof-of-work system where
        | the work is computer work.
       | 
       | Call this a client rate-limiter, or whatever else, but it is
       | _obviously not_ a CAPTCHA and cannot function in this way.
       | 
       | Another obvious problem is that server hardware is vastly more
       | powerful than the average user's device. If you set your
       | challenge to an amount of work that doesn't meaningfully drive
       | users away and/or drain their batteries, you are allowing a
       | malicious server to pass your challenge tens of thousands of
       | times an hour.
        
         | tracker1 wrote:
          | Is server hardware vastly more powerful? If you use a hashing
          | algorithm that isn't easily parallelized, then you're
          | dedicating a single CPU core to that exercise. Now a server may
          | have more cores, but they are often slower per-core than a
          | client machine's. And dedicating server resources has a cost.
          | You'd slow a brute-force attack to a relative crawl, especially
          | if the target has a large volume of pre-defined work and
          | answers.
         | 
          | PBKDF2 at 100k iterations, as an example, can easily pin a CPU
          | core for a few seconds. This is part of why I always keep my
          | authentication services separate from my applications; it
          | reduces the DDoS vector. And you can shift work to the client
          | as a kind of inverse-DDoS rate limiter.
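          | 
          | If you want to see what that actually costs on your own
          | hardware, a quick Node one-off like this does it (numbers and
          | parameters purely illustrative):
          | 
          |   import { pbkdf2Sync, randomBytes } from "node:crypto";
          | 
          |   const salt = randomBytes(16);
          |   const start = process.hrtime.bigint();
          |   pbkdf2Sync("example-input", salt, 100_000, 32, "sha256");
          |   const ms = Number(process.hrtime.bigint() - start) / 1e6;
          |   console.log(`100k iterations took ${ms.toFixed(1)} ms`);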
         | 
          | Combine that with a websocket connection where the browser
          | sends user events like mouse movement, touch, scroll,
          | focus/blur and input/paste... those events, combined with
          | timing analysis, can give you a pretty good guess as to whether
          | something is a real user. And if it isn't, you're at least
          | slowing the bots down.
        
       | voytec wrote:
       | > Try mCaptcha without joining
       | 
       | > user: aaronsw password: password
       | 
       | mixed feelings about this.
        
       | thih9 wrote:
        | Note: there are credentials for a test account, so you can try
        | it without signing up.
        | 
        | They're listed on the sign-in page [1], just not very prominently.
       | 
       | > user: aaronsw password: password
       | 
       | [1]: https://demo.mcaptcha.org/login
        
         | smusamashah wrote:
         | It says 'Account not found'
        
         | zb3 wrote:
         | "Account not found" :<
        
       | abetusk wrote:
       | Note that this was the original intent of proof of work (or very
       | near it) [0].
       | 
        | If you want to visit a site that requires proof of work but
        | allows the work to be done offline/deferred/transferred, then
        | you've essentially re-invented some of the major aspects of
        | cryptocurrency.
       | 
       | [0] https://en.wikipedia.org/wiki/Proof_of_work#cite_note-
       | DwoNao...
        
       | RcouF1uZ4gsC wrote:
       | Proof of work = waste electricity.
       | 
       | Basically, we are incentivizing people to waste electricity. The
       | proof of work is basically proving you wasted electricity doing
       | something useless.
       | 
       | While I appreciate the goals behind this, I think proof of work
       | is unethical in our current energy situation.
        
         | fruitreunion1 wrote:
         | When the alternative is sacrificing privacy or anonymity, I
         | think it's at least useful, even if not ideal.
        
         | codetrotter wrote:
         | > proof of work is unethical in our current energy situation
         | 
         | Stop generating electricity using coal etc.
         | 
         | Generate electricity only from:
         | 
         | - Solar
         | 
         | - Wind
         | 
         | - Hydro
         | 
         | - Nuclear
         | 
         | And other non-fossil sources.
         | 
         | Using fossil sources for electricity is unethical, full stop.
         | Doesn't matter if you are using the electricity for PoW, or for
         | baking cookies or for feeding kittens, or what-have-you.
         | 
         | Fossil energy is the problem. Not PoW.
        
         | pikrzyszto wrote:
          | Other captchas also waste electricity (yours and the captcha
          | provider's). For example, reCAPTCHA requires tons of resources
          | to track your behavior in order to ensure you're "not a
          | robot". Sure, the data is also used to serve you ads, but
          | resources are still wasted.
        
         | manmal wrote:
          | Do you think this needs more electricity than the whole
          | reCAPTCHA cloud? I seriously doubt it. My laptop would waste
          | something like 0.006 watt-hours per proof (0.5 seconds at 40
          | watts). Also, with the default settings, the proof complexity
          | is lowered to almost zero when the server is under normal load.
        
       | f_devd wrote:
        | Funny coincidence: I made a similar thing into a web component a
        | few weeks ago for a different purpose: click-to-reveal buttons
        | to prevent scraping of public emails on static websites. It
        | works by encrypting the content using the same deriveKey
        | method, varying the iteration count to set the time 'cost'.
        | 
        | Imo it's not really fit for most captcha situations, since you
        | can easily parallelize the execution by simply running multiple
        | tabs/browsers, or even hooking crypto.subtle up to a GPU/ASIC
        | with a bit of hackery.
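        | 
        | Roughly what that looks like (a simplified sketch, not the
        | actual component; names and parameters are illustrative):
        | 
        |   // Payload is AES-GCM-encrypted at build time; the PBKDF2
        |   // iteration count is the "cost" paid to reveal it.
        |   async function reveal(
        |     cipher: Uint8Array, iv: Uint8Array, salt: Uint8Array,
        |     iterations: number
        |   ): Promise<string> {
        |     const base = await crypto.subtle.importKey(
        |       "raw", new TextEncoder().encode("public-constant"),
        |       "PBKDF2", false, ["deriveKey"]);
        |     const key = await crypto.subtle.deriveKey(
        |       { name: "PBKDF2", hash: "SHA-256", salt, iterations },
        |       base, { name: "AES-GCM", length: 256 },
        |       false, ["decrypt"]);
        |     const plain = await crypto.subtle.decrypt(
        |       { name: "AES-GCM", iv }, key, cipher);
        |     return new TextDecoder().decode(plain);
        |   }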
        
       | fanf2 wrote:
       | Proof of work proves not to work (2004)
       | https://www.cl.cam.ac.uk/~rnc1/proofwork.pdf
        
         | xtracto wrote:
          | PDF download warning.
        
           | jeroenhd wrote:
           | I respect your dislike of direct download links, but what
           | browsers still download PDFs these days?
        
             | notpushkin wrote:
             | Firefox on Android, for example. I guess they just didn't
             | want to add mobile UI to pdf.js?
        
             | xtracto wrote:
              | I'm on Brave mobile and it downloaded the PDF automatically
              | when I clicked the link.
        
         | lucb1e wrote:
         | TL;DR /
         | TooAnnoyingPdf;Didn'tDownloadAndTryToReadOnAPhoneScreen:
         | 
         | > We analyse [anti-email-spam PoW] both from an economic
         | perspective, "how can we stop it being cost-effective to send
         | spam", and from a security perspective, "spammers can access
         | insecure end-user machines and will steal processing cycles to
         | solve puzzles". Both analyses lead to similar values of puzzle
         | difficulty. Unfortunately, real-world data from a large ISP
         | shows that these difficulty levels would mean that significant
         | numbers of senders of legitimate email would be unable to
         | continue their current levels of activity.
         | 
         | So it wouldn't work for mass senders, I think this means in the
         | abstract? Reading into the details, page 6 says:
         | 
         | > We examined logging data from the large UK ISP for those
         | customers who use the outbound "smarthost" (about 50,000 on
         | this particular weekday).
         | 
          | Not sure I agree with the conclusion if this is their premise.
          | This smarthost (an SMTP server sitting on the edge doing
          | $magic, preventing client PCs from directly sending email to
          | any old internet destination) is handling a ton of emails for
          | free. Why should _it_ solve the PoW? The residential client
          | that actually wants to send the email should attach the PoW
          | before handing it on to the relaying server.
          | 
          | I do agree it is probably undesirable to require that honest
          | senders outcompete attackers on CPU power (= electricity =
          | CO2, at least in the immediate future) to get any email
          | delivered.
        
       | dimmke wrote:
        | I recently conducted an experiment: a form protected by
        | reCAPTCHA v2 was still letting a ton of spam through, so I
        | removed the client-side CAPTCHA and instead sent the content to
        | Akismet for scanning. It cut the spam getting through to 0.
       | 
        | It made me think: are client-side CAPTCHAs really worth it? They
        | add so much friction (and page weight - reCAPTCHA v3 adds
        | several hundred KB) to the experience (especially when you have
        | to solve puzzles or identify objects), and they are gamed
        | heavily. I know these get used for more than form submissions,
        | to stop bot sign-ups etc...
       | 
        | I feel like it'd be just as effective, or more so, to use other
        | heuristics on the backend: IP address, blacklisting certain
        | email domains, requiring email or phone validation, scanning
        | logs, and analyzing the content submitted through forms.
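        | 
        | For what it's worth, the server-side check is just a form POST
        | to Akismet; roughly like this (endpoint and field names from
        | memory, so verify against Akismet's docs before relying on it):
        | 
        |   async function isSpam(content: string, ip: string,
        |                         ua: string): Promise<boolean> {
        |     const body = new URLSearchParams({
        |       blog: "https://example.com", // your site URL
        |       user_ip: ip,
        |       user_agent: ua,
        |       comment_type: "contact-form",
        |       comment_content: content,
        |     });
        |     const res = await fetch(
        |       "https://YOUR_KEY.rest.akismet.com/1.1/comment-check",
        |       { method: "POST", body });
        |     return (await res.text()) === "true"; // "true" = spam
        |   }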
        
         | base wrote:
          | Akismet is a paid service and its APIs are tailored for
          | comments. An advantage with comments is that you can just mark
          | them as spam if the content has dubious links or keywords.
          | 
          | An issue you have with many forms (e.g. a login form) is that
          | there is limited data to decide if it's a real user or a bot.
        
           | dimmke wrote:
           | Agreed on all points. That's why I said in my original
           | comment: "I know these get used for more than [contact] form
           | submissions, to stop bot sign ups etc..."
           | 
           | I picked up Akismet because it's been around forever, and
           | while it is paid, it is very cheap for my use case.
           | 
           | This is a bit of an aside, but I feel like Automattic is
           | sitting on several companies/products and not doing a whole
           | lot with them.
           | 
            | Akismet could be expanded into a more fully featured server-
            | side spam detection SaaS with a flexible API etc...
           | 
           | Gravatar could be expanded into something like OpenID.
           | 
           | Just seems like a waste to me.
        
         | hackermatic wrote:
         | The answer is something like "yes, and..." because reCAPTCHA
         | already decides whether and how to challenge the user based on
         | its own internal risk score.
        
           | dimmke wrote:
            | But if a server-side-only solution seems to work fine, why
            | add a client-side element?
           | 
            | I ran into this because I was doing some freelance work on a
            | website that had worked its ass off to cut its page size
            | down as small as possible. For reCAPTCHA to develop its risk
            | score, you have to load it far in advance of the form - you
            | can't lazy-load it; they specifically say not to:
            | https://developers.google.com/recaptcha/docs/loading.
           | 
            | It also spawned its own service worker on top of adding
            | something like 300 KB of page weight. The API documentation
            | is garbage, and you now have to fuck around with Google
            | Cloud to get API keys, which is confusing. It also pollutes
            | the global scope of the page. It's all-around terrible to
            | work with.
        
         | kccqzy wrote:
          | Then you'd just give visitors of your website no recourse and
          | no information whatsoever on how to fix the problem. The
          | benefit of a client-side CAPTCHA is that humans can at least
          | pass it and fix the problem even if something they don't
          | control (such as their IP address having a bad reputation due
          | to a shitty ISP) is causing problems.
          | 
          | As a website operator it's easy to look at the spam that is
          | getting through and be happy that it's zero. But do you have
          | any idea how many actual humans you have incorrectly rejected?
          | You don't have that data, and it's really easy to screw up
          | there.
         | 
         | Of course if your website is small nobody cares. If you are
         | bigger like Stripe you simply get bad publicity on HN. People
         | on HN love to hate on mysterious bans and blocks just because
         | they do something slightly unusual and your backend-only
         | analysis flags them as suspicious.
         | 
         | Abuse fighting is hard.
        
           | dimmke wrote:
           | >Then you'd just give visitors of your websites no recourse
           | and no information whatsoever on how to fix the problem.
           | 
           | This is a weird assumption. What's preventing a backend
           | system from saying "Hey, we think you're a bot. Here's an
           | alternative way to contact us."
           | 
            | You obviously don't want to give away enough to help bot
            | developers get through your system, but that's not the same
            | as no recourse and no information.
           | 
           | >But do you get any idea how many actual humans that you have
           | incorrectly rejected?
           | 
           | Yes - like I said in my other comment, this new system
           | actually logs all submissions. It just puts the ones it
           | identifies as spam into a separate folder. Akismet also has
           | the ability to mark things as false positives or false
           | negatives.
           | 
            | I think that automated form submissions are very context-
            | specific. So, the example I wrote about is for a marketing
            | site, and it's a business that primarily targets other
            | businesses. Most of the spam it gets is for scummy SaaS
            | software, SEO optimization, etc...
            | 
            | But my personal website has a very simple subscribe-by-email
            | form. There were definitely a few spam submissions - someone
            | just blasting out an email address and signing it up to
            | whatever form would accept it. When I implemented double
            | opt-in, they were gone entirely.
           | 
           | My larger point was that as an industry, we seem to have just
           | capitulated to client side CAPTCHAs. And it sucks. It's one
           | of the many shitty things about the modern web. But I think
           | it's become just an assumption that it's needed, and we
           | haven't reexamined that assumption in a while.
           | 
            | I think it'd almost be better for there to be something you
            | could spin up in a container that has a base machine-
            | learning model, but that can "learn" as you manually flag
            | messages etc... and then you can also choose a threshold
            | based on your comfort level.
        
             | kccqzy wrote:
             | > This is a weird assumption. What's preventing a backend
             | system from saying "Hey, we think you're a bot. Here's an
             | alternative way to contact us."
             | 
             | Not a weird assumption, but a necessary assumption based on
             | considerations of scale.
             | 
              | A small-scale website that doesn't receive too many spam
              | attempts can have human agents classify spam manually. A
              | medium-scale website can use a CAPTCHA to let through some
              | visitors while the rest go to human verification. You
              | appear to be in this bucket. When the scale is huge, _no
              | other alternative way_ to contact exists; CAPTCHA becomes
              | your only tool.
             | 
             | In other words, CAPTCHA is only necessary because of scale;
             | what do you think the first A stands for? But because of
             | scale, alternate ways stop working.
        
               | dimmke wrote:
               | >When the scale is huge, no other alternative way to
               | contact exists
               | 
               | 1. This still doesn't preclude giving a blocked user
               | recourse or information. Like how a streaming website
               | will say "Hey, you're using a VPN. We don't allow that" -
               | the user's recourse is to turn off the VPN, or find a new
               | VPN that their service won't detect.
               | 
                | 2. The case you're outlining is different from the
                | scenario in which most users are presented with a
                | CAPTCHA. I
               | encounter it when I am using a VPN and Googling something
               | with Incognito mode. That means Google has already
               | applied some heuristics and thinks that chances are
               | higher than normal that I'm a bot (not logged in, no
               | cookies allowed, masking IP address) before presenting
               | the challenge. In those cases, you're probably correct
               | that presenting a CAPTCHA is a reasonable option. I just
               | think it's weird to have CAPTCHA be the default/first
               | line in many cases. Especially with the focus on things
               | like converting users.
        
             | digging wrote:
             | > What's preventing a backend system from saying "Hey, we
             | think you're a bot. Here's an alternative way to contact
             | us."
             | 
             | In what way? If I got flagged and had to take additional
             | steps to remedy a form submission I would probably just
             | never go back to the site. The only way this could work is
             | by identifying the issue in real-time and then sending a
             | CAPTCHA to be completed by the user client-side while
             | they're still handling the form.
        
           | jsnell wrote:
            | And just to be clear, it doesn't need to be either captchas
            | or heuristic abuse detection on the backend. In the ideal
            | case you're making a decision in the backend using all these
            | heuristics and signals, but the outcome is not binary.
            | Instead, the abuse-detection logic chooses from a range of
            | options, like blocking the request entirely, allowing it
            | through, using a captcha, or, if you're sophisticated
            | enough, issuing other kinds of challenges.
            | 
            | But proof of work has basically no value as an abuse
            | challenge even in this kind of setup; the economics just
            | can't work.
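            | 
            | Concretely, the decision step is something like this
            | (thresholds and actions purely illustrative):
            | 
            |   type Action = "allow" | "captcha" | "challenge" | "block";
            | 
            |   function decide(riskScore: number): Action {
            |     if (riskScore < 0.2) return "allow";
            |     if (riskScore < 0.6) return "captcha";
            |     if (riskScore < 0.9) return "challenge";
            |     return "block";
            |   }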
        
             | [deleted]
        
         | nikvaes wrote:
          | Did you also look at the false positives, e.g. how much non-
          | spam content was filtered by Akismet?
        
           | dimmke wrote:
           | Of course. It would be bonkers not to. It just doesn't send a
           | notification if the submission is flagged as spam and puts it
           | in a separate folder. So I have the ability to look at every
           | submission.
           | 
           | I put the system into effect on August 1st. There have not
           | been any false positives. There was even a submission to the
           | form that was clearly a B2B sales pitch, but because it was
           | an actual person submitting the form and not an automated
           | system it went into the "real" entries list (I think this is
            | reasonable. Any business is going to have to field B2B sales
            | solicitations.)
           | 
           | I put together a few rows in a spreadsheet of legitimate
           | submissions (with info blocked out):
           | https://imgur.com/a/stxja1Z
           | 
           | Here's an example of one flagged as spam by Akismet that was
           | submitted about an hour ago: https://imgur.com/a/PmN3t80
           | 
            | Overall, removing reCAPTCHA has increased the total number
            | of submissions to the form, but the number of submissions
            | actually seen by a real person - who then has to waste time
            | reading them, identifying them as spam, and discarding them
            | - has dropped to 0.
        
       | emurlin wrote:
       | Exploring Proof of Work (PoW) as a substitute for CAPTCHAs is an
       | interesting idea (PoW was originally conceived as a spam
       | deterrent, after all), and one that I have considered (and use)
       | in some web properties I manage. Not only does it obviate
       | 'trusted' third parties, but it also has the potential to reduce
       | the risk of accessibility issues often associated with
       | traditional CAPTCHA. It also seems like a solution that scales
       | nicely, as each 'proof' is made by the client and verification is
       | cheap, _and_ like a solution that finally ends the arms race
       | against malicious traffic by bypassing the need to  'prove
       | humanity'.
       | 
       | However, it's one of those solutions that look good on paper, but
       | upon close inspection break down entirely or come with rather
       | substantial tradeoffs. Ignore the environmental discussion about
       | energy consumption for a moment, and let's face the reality that
       | computational power is ridiculously inexpensive.
       | 
       | As a thought exercise, imagine you're trying to use PoW to ward
       | off spammers (or the attack du jour), and you decide that a
       | 1-cent expenditure on computation would be a sufficient
       | deterrent. Let's say that renting a server costs $100/month (a
       | bit on the higher end), or 0.004 cents per second.
       | 
       | So, if you wanted a PoW system that would cost the spammer 1
       | cent, you'd need to come up with a computational task that takes
       | about 250 seconds, or over 4 minutes, to solve. That kind of
       | latency just isn't practical in real-world applications. And that
       | ignores that 1 cent is probably a ridiculously low price for
       | protecting anything valuable.
       | 
        | Of course, you may consider this as an alternative to a regular
        | CAPTCHA service. A quick search suggests that having a regular
        | CAPTCHA solved costs something like $3 per 1000 CAPTCHAs, or 0.3
        | cents per CAPTCHA. That changes the above calculation to about a
        | minute of compute, which still seems rather unacceptable
        | considering that you might, e.g., drain your users' batteries.
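        | 
        | Sketching that arithmetic (purely illustrative numbers, as
        | above):
        | 
        |   const dollarsPerMonth = 100;              // rented server
        |   const centsPerSecond =
        |     (dollarsPerMonth * 100) / (30 * 24 * 3600); // ~0.004
        | 
        |   function requiredSeconds(targetCents: number): number {
        |     return targetCents / centsPerSecond;
        |   }
        | 
        |   console.log(requiredSeconds(1));   // ~259 s to cost 1 cent
        |   console.log(requiredSeconds(0.3)); // ~78 s at 0.3 cents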
       | 
        | So, overall, while I'd like for something like this to work, it
        | probably only acts as a deterrent against attackers who aren't
        | running a full browser and who also aren't targeting you in
        | particular.
        
       ___________________________________________________________________
       (page generated 2023-08-08 23:00 UTC)