[HN Gopher] mCaptcha: Open-source proof-of-work captcha for websites
___________________________________________________________________
mCaptcha: Open-source proof-of-work captcha for websites
Author : notpushkin
Score : 61 points
Date : 2023-08-08 19:59 UTC (3 hours ago)
(HTM) web link (mcaptcha.org)
(TXT) w3m dump (mcaptcha.org)
| tromp wrote:
| SHA256-based Hashcash seems like a poor choice of PoW for a
| captcha that's supposed to incur a nontrivial cost for spammers.
| They can simply employ a SHA256 ASIC to crack the captcha at
| practically no cost.
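The Hashcash-style construction under discussion boils down to a partial-preimage search. A minimal Python sketch (the challenge string and difficulty are arbitrary illustrations, not mCaptcha's actual parameters); an SHA-256 ASIC doing terahashes per second clears even large difficulties at negligible cost, which is the crux of the objection:

```python
import hashlib
import itertools

def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Brute-force a nonce so that SHA-256(challenge || nonce) starts
    with `difficulty_bits` zero bits -- the Hashcash-style puzzle."""
    target = 1 << (256 - difficulty_bits)
    for nonce in itertools.count():
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verification is a single hash -- cheap for the server."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

nonce = solve(b"example-challenge", 16)  # ~65k hashes on average
print(verify(b"example-challenge", nonce, 16))  # True
```

Note the asymmetry: solving takes on average 2^difficulty hashes, verifying takes one, and nothing about the work is specific to a human or a CPU.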
| lucb1e wrote:
| I agree, but you literally have no other option:
| https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypt...
|
| Web cryptography is stuck in the 1990s. PBKDF2 is the only
| available algorithm, which gives an attacker's GPUs a big
| advantage over honest users, let alone ASICs.
|
| Maybe a webassembly-based solution implementing something like
| bcrypt or scrypt/Argon2 is comparable to a browser-native
| implementation, but that would have to be verified; don't just
| take my word for it. These algorithms provide varying amounts
| of memory-hardness (bcrypt uses only 4KB, but even that proved
| surprisingly effective), which causes contention on the GPU
| memory bus (it's only a bit faster than the CPU's memory
| bus, leaving the GPU with only a small advantage, on the order
| of 5x instead of 100x) and forces larger ASIC die sizes (which
| the Argon2 paper argues is what creates cost for the attacker).
|
| Source for the latter: https://github.com/P-H-C/phc-winner-
| argon2/blob/16d3df698db2... section 2.1
|
| > We aim to maximize the cost of password cracking on ASICs.
| There can be different approaches to measure this cost, but we
| turn to one of the most popular - the time-area product [4,
| 16]. [...]
|
| > - The 50-nm DRAM implementation [10] takes 550 mm2 per GByte;
|
| > - The Blake2b implementation in the 65-nm process should take
| about 0.1 mm2 (using Blake-512 implementation in [11]);
|
| I understand from this that adding more memory on the ASIC/chip
| is more expensive than adding more computing power.
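Putting numbers to that reading, using the paper's figures quoted above:

```python
# Figures quoted from the Argon2 paper (section 2.1):
dram_area_per_gb = 550.0   # mm^2 per GByte of 50-nm DRAM
blake2b_core_area = 0.1    # mm^2 per Blake2b core in a 65-nm process

# One gigabyte of memory on the die costs as much area as thousands
# of hashing cores, which is why memory-hardness drives up the
# attacker's time-area product:
cores_equivalent = dram_area_per_gb / blake2b_core_area
print(round(cores_equivalent))  # 5500
```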
| notpushkin wrote:
| Ohhh, actually mCaptcha seems to be using WASM already:
| https://mcaptcha.org/docs/api/browser
|
| So I think it should be possible to add another algo?
| notpushkin wrote:
| Related:
|
| _mCaptcha - Proof of work based, privacy respecting CAPTCHA
| system_ - https://news.ycombinator.com/item?id=32340305 - Aug
| 2022 (96 comments)
|
| _MCaptcha: FOSS privacy focused captcha system using proof-of-
| work_ - https://news.ycombinator.com/item?id=32340590 - Aug 2022
| (5 comments)
| remram wrote:
| I don't understand the premise. The point of a CAPTCHA is to tell
| Computers and Humans Apart, that's what the CHA stands for. You
| cannot hope to do this test using a proof-of-work system where
| the work is computer work.
|
| Call this a client rate-limiter, or whatever else, but it is
| _obviously not_ a CAPTCHA and cannot function in this way.
|
| Another obvious problem is that server hardware is vastly more
| powerful than the average user's device. If you set your
| challenge to an amount of work that doesn't meaningfully drive
| users away and/or drain their batteries, you are allowing a
| malicious server to pass your challenge tens of thousands of
| times an hour.
| tracker1 wrote:
| Is server hardware vastly more powerful? If you use a hashing
| algorithm that isn't easily parallelized, then you're dedicating
| a single CPU core to that exercise. Now a server may have more
| cores, but they are often slower per-core than a client
| machine. And dedicating server resources has a cost. You'd slow
| a brute-force attack to a relative crawl, especially if the
| target has a large volume of pre-defined work and answers.
|
| PBKDF2, as an example, at 100k iterations can easily pin a CPU
| core for a few seconds. This is part of why I always keep my
| authentication services separate from my applications; it
| reduces the DDoS vector. Now, you can shift work to the client
| as kind of an inverse-DDoS rate limiter.
|
| Combine that with a websocket connection, where the browser is
| sending user events like mouse movement, touch, scroll,
| focus/blur and input/paste... those events, combined with
| timing analysis, can give you a pretty good guess whether
| something is a real user. And if it isn't, you're definitely
| slowing down bots.
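A quick single-core illustration of the PBKDF2 point (a sketch only; actual timings vary enormously between an OpenSSL-backed runtime like this one and a pure-JS implementation in a browser, so "a few seconds" depends on the stack and iteration count):

```python
import hashlib
import time

start = time.perf_counter()
# 100k iterations of PBKDF2-HMAC-SHA256: each iteration feeds the
# previous output back in, so the work cannot be parallelized and
# a single CPU core is pinned for the whole run.
key = hashlib.pbkdf2_hmac("sha256", b"password", b"salt", 100_000)
elapsed = time.perf_counter() - start
print(f"{len(key)}-byte key in {elapsed:.3f}s")
```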
| voytec wrote:
| > Try mCaptcha without joining
|
| > user: aaronsw password: password
|
| mixed feelings about this.
| thih9 wrote:
| Note, there are credentials for a test account, to try it without
| signing up.
|
| This is listed on the sign in page [1], just not very visible.
|
| > user: aaronsw password: password
|
| [1]: https://demo.mcaptcha.org/login
| smusamashah wrote:
| It says 'Account not found'
| zb3 wrote:
| "Account not found" :<
| abetusk wrote:
| Note that this was the original intent of proof of work (or very
| near it) [0].
|
| Should you want to visit a site that has proof of work as a
| requirement but allows it to be done
| offline/deferred/transferred, then you've essentially re-invented
| some of the major aspects of cryptocurrency.
|
| [0] https://en.wikipedia.org/wiki/Proof_of_work#cite_note-
| DwoNao...
| RcouF1uZ4gsC wrote:
| Proof of work = waste electricity.
|
| Basically, we are incentivizing people to waste electricity. The
| proof of work is basically proving you wasted electricity doing
| something useless.
|
| While I appreciate the goals behind this, I think proof of work
| is unethical in our current energy situation.
| fruitreunion1 wrote:
| When the alternative is sacrificing privacy or anonymity, I
| think it's at least useful, even if not ideal.
| codetrotter wrote:
| > proof of work is unethical in our current energy situation
|
| Stop generating electricity using coal etc.
|
| Generate electricity only from:
|
| - Solar
|
| - Wind
|
| - Hydro
|
| - Nuclear
|
| And other non-fossil sources.
|
| Using fossil sources for electricity is unethical, full stop.
| Doesn't matter if you are using the electricity for PoW, or for
| baking cookies or for feeding kittens, or what-have-you.
|
| Fossil energy is the problem. Not PoW.
| pikrzyszto wrote:
| Other captchas also waste electricity (yours and the captcha
| provider's). For example, reCaptcha requires tons of resources
| to track your moves to ensure you're "not a robot". Sure, the
| data is also used to serve you ads, but resources are still
| wasted.
| manmal wrote:
| Do you think this needs more electricity than the whole
| recaptcha cloud? I seriously doubt it. My laptop would waste
| something like 0.006 watt-hours per proof (0.5 seconds at 40
| watts). Also, per the default setting, the proof complexity is
| lowered to almost zero when the server is at normal load.
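The back-of-envelope energy figure checks out:

```python
watts = 40      # laptop power draw while hashing
seconds = 0.5   # time spent per proof
joules = watts * seconds       # 20 J of work per proof
watt_hours = joules / 3600     # convert J -> Wh
print(round(watt_hours, 4))    # 0.0056, i.e. roughly 0.006 Wh
```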
| f_devd wrote:
| Funny coincidence: I made a similar thing as a webcomponent a
| few weeks ago for a different purpose: click-to-reveal buttons to
| prevent scraping of public emails on static websites. It works by
| encrypting the content using the same deriveKey method, varying
| the iterations to determine the time 'cost'.
|
| Imo it's not really fit for most captcha situations, since you
| can easily parallelize the execution by simply running multiple
| tabs/browsers, or even hook crypto.subtle up to a GPU/ASIC
| with a bit of hackery.
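The described webcomponent uses SubtleCrypto's deriveKey plus authenticated encryption in the browser; here is a stdlib-only Python sketch of the same idea, with a toy XOR stream standing in for real AES-GCM (illustration only, not secure, and all names are hypothetical):

```python
import hashlib

def derive_key(passphrase: bytes, salt: bytes, iterations: int) -> bytes:
    # The iteration count is the knob: higher -> more CPU time
    # before the hidden content can be revealed.
    return hashlib.pbkdf2_hmac("sha256", passphrase, salt, iterations)

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Toy stand-in for AES-GCM, purely for illustration.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

salt = b"per-page-salt"
key = derive_key(b"public-passphrase", salt, 200_000)
hidden = xor_cipher(b"user@example.com", key)  # what the page ships

# A client (or scraper) must redo the full derivation to reveal it:
revealed = xor_cipher(hidden, derive_key(b"public-passphrase", salt, 200_000))
print(revealed)  # b'user@example.com'
```

As the comment notes, nothing stops an attacker from running many of these derivations in parallel, which is why it deters casual scraping rather than a determined adversary.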
| fanf2 wrote:
| Proof of work proves not to work (2004)
| https://www.cl.cam.ac.uk/~rnc1/proofwork.pdf
| xtracto wrote:
| Pdf download warning.
| jeroenhd wrote:
| I respect your dislike of direct download links, but what
| browsers still download PDFs these days?
| notpushkin wrote:
| Firefox on Android, for example. I guess they just didn't
| want to add mobile UI to pdf.js?
| xtracto wrote:
| I'm on Brave mobile and it downloaded the PDF automatically
| when I clicked the link.
| lucb1e wrote:
| TL;DR /
| TooAnnoyingPdf;Didn'tDownloadAndTryToReadOnAPhoneScreen:
|
| > We analyse [anti-email-spam PoW] both from an economic
| perspective, "how can we stop it being cost-effective to send
| spam", and from a security perspective, "spammers can access
| insecure end-user machines and will steal processing cycles to
| solve puzzles". Both analyses lead to similar values of puzzle
| difficulty. Unfortunately, real-world data from a large ISP
| shows that these difficulty levels would mean that significant
| numbers of senders of legitimate email would be unable to
| continue their current levels of activity.
|
| So it wouldn't work for mass senders - I think that's what the
| abstract is saying? Reading into the details, page 6 says:
|
| > We examined logging data from the large UK ISP for those
| customers who use the outbound "smarthost" (about 50,000 on
| this particular weekday).
|
| Not sure I agree with the conclusion if this is their premise.
| This smarthost (an SMTP server sitting on the edge doing
| $magic, preventing client PCs from directly sending email to
| any which internet destination) is handling a ton of emails for
| free. Why should _it_ solve the PoW? The residential client
| that is actually trying to send the email should attach the
| PoW before handing it off to the relaying server.
|
| I do agree it is probably undesirable to require that honest
| senders outcompete attackers on CPU power (=electricity =CO2,
| at least in the immediate future) to get any email delivered
| dimmke wrote:
| I recently conducted an experiment: I removed the client-side
| CAPTCHA from a form that had reCAPTCHA v2 (and was still letting
| a ton of spam through) and instead sent the content to Akismet
| for scanning. It cut the spam getting through to 0.
|
| It made me think: are client-side CAPTCHAs really worth it? They
| add so much friction (and page weight - reCAPTCHA v3 adds several
| hundred KB) to the experience (especially when you have to
| solve puzzles or identify objects), and they're gamed heavily. I
| know these get used for more than form submissions, to stop bot
| sign-ups etc...
|
| I feel like it'd be just as/more effective to use other
| heuristics on the backend: IP Address, blacklisting certain email
| domains, requiring email validation or phone validation, scanning
| logs, analyzing content submitted through forms
| base wrote:
| Akismet is a paid service and their APIs are tailored for
| comments. An advantage with comments is that you can just mark
| them as spam if the contents have dubious links or keywords.
|
| An issue you have in many forms (e.g.: login form) is that
| there is limited data to decide if it's a real user or a bot.
| dimmke wrote:
| Agreed on all points. That's why I said in my original
| comment: "I know these get used for more than [contact] form
| submissions, to stop bot sign ups etc..."
|
| I picked up Akismet because it's been around forever, and
| while it is paid, it is very cheap for my use case.
|
| This is a bit of an aside, but I feel like Automattic is
| sitting on several companies/products and not doing a whole
| lot with them.
|
| Akismet could be expanded into a more fully featured server
| side spam detection SAAS with a flexible API etc...
|
| Gravatar could be expanded into something like OpenID.
|
| Just seems like a waste to me.
| hackermatic wrote:
| The answer is something like "yes, and..." because reCAPTCHA
| already decides whether and how to challenge the user based on
| its own internal risk score.
| dimmke wrote:
| But if a server side only solution seems to work fine, why
| add a client side element?
|
| I ran into this because I was doing some freelance work on a
| website that had worked its ass off to cut loading size as
| small as possible. For reCaptcha to develop its risk score,
| you have to load it far in advance of a form - you can't lazy
| load it, they specifically say not to:
| https://developers.google.com/recaptcha/docs/loading.
|
| It also spawned its own service worker on top of adding like
| 300kb of page weight. The API documentation is garbage, you
| have to fuck around with Google Cloud to get API keys now too
| which is confusing. It also pollutes the global scope of the
| page. It's all around terrible to work with.
| kccqzy wrote:
| Then you'd just give visitors of your websites no recourse and
| no information whatsoever on how to fix the problem. The
| benefit of client-side CAPTCHA is that humans at least can pass
| it and fix the problem even if something they don't control
| (such as their IP address having bad reputation due to shitty
| ISP) is causing problems.
|
| As a website operator it's easy to look at the spam that is
| getting through and be happy that it's zero. But do you have
| any idea how many actual humans you have incorrectly
| rejected? You don't have that data and it's really easy to
| screw up there.
|
| Of course if your website is small nobody cares. If you are
| bigger like Stripe you simply get bad publicity on HN. People
| on HN love to hate on mysterious bans and blocks just because
| they do something slightly unusual and your backend-only
| analysis flags them as suspicious.
|
| Abuse fighting is hard.
| dimmke wrote:
| >Then you'd just give visitors of your websites no recourse
| and no information whatsoever on how to fix the problem.
|
| This is a weird assumption. What's preventing a backend
| system from saying "Hey, we think you're a bot. Here's an
| alternative way to contact us."
|
| You obviously don't want to give away enough to help bot
| developers get through your system, but that's not the same
| as no resource and no information.
|
| >But do you get any idea how many actual humans that you have
| incorrectly rejected?
|
| Yes - like I said in my other comment, this new system
| actually logs all submissions. It just puts the ones it
| identifies as spam into a separate folder. Akismet also has
| the ability to mark things as false positives or false
| negatives.
|
| I think that automated form submissions are very context
| specific. So, the example I wrote about is for a marketing
| site, and it's a business that primarily targets other
| businesses. Most of the spam it gets is for scummy SAAS
| software, SEO optimization, etc...
|
| But my personal website has a very simple subscribe by email
| form. There were definitely a few spam submissions - someone
| just blasting out an email address and signing it up to
| whatever form would accept it. When I implemented double opt
| in - gone entirely.
|
| My larger point was that as an industry, we seem to have just
| capitulated to client side CAPTCHAs. And it sucks. It's one
| of the many shitty things about the modern web. But I think
| it's become just an assumption that it's needed, and we
| haven't reexamined that assumption in a while.
|
| I think it'd almost be better for there to be something you
| could spin up in a container that has a base machine-learning
| model, but can "learn" as you manually flag messages
| etc... and then you can also choose a threshold based on
| your comfort level.
| kccqzy wrote:
| > This is a weird assumption. What's preventing a backend
| system from saying "Hey, we think you're a bot. Here's an
| alternative way to contact us."
|
| Not a weird assumption, but a necessary assumption based on
| considerations of scale.
|
| A small-scale website that doesn't receive too many spam
| attempts can have spam classified manually by human agents. A
| medium-scale website can use CAPTCHA to let through some
| visitors while the rest go to human verification. You
| appear to be in this bucket. When the scale is huge, _no
| other alternative way_ to contact exists. CAPTCHA becomes
| your only tool.
|
| In other words, CAPTCHA is only necessary because of scale;
| what do you think the first A stands for? But because of
| scale, alternate ways stop working.
| dimmke wrote:
| >When the scale is huge, no other alternative way to
| contact exists
|
| 1. This still doesn't preclude giving a blocked user
| recourse or information. Like how a streaming website
| will say "Hey, you're using a VPN. We don't allow that" -
| the user's recourse is to turn off the VPN, or find a new
| VPN that their service won't detect.
|
| 2. The case you're outlining is different from the
| scenario that most users are presented with a CAPTCHA. I
| encounter it when I am using a VPN and Googling something
| with Incognito mode. That means Google has already
| applied some heuristics and thinks that chances are
| higher than normal that I'm a bot (not logged in, no
| cookies allowed, masking IP address) before presenting
| the challenge. In those cases, you're probably correct
| that presenting a CAPTCHA is a reasonable option. I just
| think it's weird to have CAPTCHA be the default/first
| line in many cases. Especially with the focus on things
| like converting users.
| digging wrote:
| > What's preventing a backend system from saying "Hey, we
| think you're a bot. Here's an alternative way to contact
| us."
|
| In what way? If I got flagged and had to take additional
| steps to remedy a form submission I would probably just
| never go back to the site. The only way this could work is
| by identifying the issue in real-time and then sending a
| CAPTCHA to be completed by the user client-side while
| they're still handling the form.
| jsnell wrote:
| And just to be clear, it doesn't need to be either captchas
| or doing heuristic abuse detection on the backend. In the
| ideal case you're making a decision in the backend using all
| these heuristics and signals, but the outcome is not binary.
| Instead the outcome of the abuse detection logic is to choose
| from a range of options like blocking the request entirely,
| allowing it through, using a captcha, or if you're
| sophisticated enough doing other kinds of challenges.
|
| But proof of work has basically no value as an abuse
| challenge even in this kind of setup; the economics just
| can't work.
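The graduated-outcome idea above can be pictured as a small decision function (the thresholds and action names here are entirely hypothetical; real systems tune them on live traffic):

```python
def choose_action(risk_score: float) -> str:
    """Map an abuse-detection risk score (0 = clean, 1 = certain
    abuse) to a graduated response instead of a binary allow/deny."""
    if risk_score < 0.2:
        return "allow"
    if risk_score < 0.6:
        return "captcha"           # challenge the borderline cases
    if risk_score < 0.9:
        return "extra-challenge"   # e.g. email or phone verification
    return "block"

print(choose_action(0.1))   # allow
print(choose_action(0.4))   # captcha
print(choose_action(0.95))  # block
```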
| [deleted]
| nikvaes wrote:
| Did you also look at the false positives, e.g., how much non-
| spam content was filtered by Akismet?
| dimmke wrote:
| Of course. It would be bonkers not to. It just doesn't send a
| notification if the submission is flagged as spam and puts it
| in a separate folder. So I have the ability to look at every
| submission.
|
| I put the system into effect on August 1st. There have not
| been any false positives. There was even a submission to the
| form that was clearly a B2B sales pitch, but because it was
| an actual person submitting the form and not an automated
| system it went into the "real" entries list (I think this is
| reasonable. Any business is going to have to field B2B sales
| solicitations)
|
| I put together a few rows in a spreadsheet of legitimate
| submissions (with info blocked out):
| https://imgur.com/a/stxja1Z
|
| Here's an example of one flagged as spam by Akismet that was
| submitted about an hour ago: https://imgur.com/a/PmN3t80
|
| Overall, removing reCAPTCHA has increased the total amount of
| submissions to the form, but the amount of submissions
| actually being seen by a real person who then has to waste
| time reading it, identifying that it's spam and discarding it
| has dropped to 0.
| emurlin wrote:
| Exploring Proof of Work (PoW) as a substitute for CAPTCHAs is an
| interesting idea (PoW was originally conceived as a spam
| deterrent, after all), and one that I have considered (and use)
| in some web properties I manage. Not only does it obviate
| 'trusted' third parties, but it also has the potential to reduce
| the risk of accessibility issues often associated with
| traditional CAPTCHA. It also seems like a solution that scales
| nicely, as each 'proof' is made by the client and verification is
| cheap, _and_ like a solution that finally ends the arms race
| against malicious traffic by bypassing the need to 'prove
| humanity'.
|
| However, it's one of those solutions that look good on paper, but
| upon close inspection break down entirely or come with rather
| substantial tradeoffs. Ignore the environmental discussion about
| energy consumption for a moment, and let's face the reality that
| computational power is ridiculously inexpensive.
|
| As a thought exercise, imagine you're trying to use PoW to ward
| off spammers (or the attack du jour), and you decide that a
| 1-cent expenditure on computation would be a sufficient
| deterrent. Let's say that renting a server costs $100/month (a
| bit on the higher end), or 0.004 cents per second.
|
| So, if you wanted a PoW system that would cost the spammer 1
| cent, you'd need to come up with a computational task that takes
| about 250 seconds, or over 4 minutes, to solve. That kind of
| latency just isn't practical in real-world applications. And that
| ignores that 1 cent is probably a ridiculously low price for
| protecting anything valuable.
|
| Of course, you may consider this as an alternative to regular
| CAPTCHA-solving services. A quick search suggests these cost
| something like $3 per 1000 CAPTCHAs solved, or 0.3 cents per
| CAPTCHA. This changes the above calculation to about a minute of
| compute, which still seems rather unacceptable considering that
| you might, e.g., drain your users' battery.
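Redoing the comment's arithmetic with the stated figures (the comment's 0.004 cents/s and 250 s are the same numbers, rounded):

```python
server_dollars_per_month = 100.0
seconds_per_month = 30 * 24 * 3600              # 2,592,000 s
cents_per_second = server_dollars_per_month * 100 / seconds_per_month
print(round(cents_per_second, 4))               # 0.0039 cents/s

# Seconds of compute needed to impose a given cost on the attacker:
print(round(1.0 / cents_per_second))            # 259 s for a 1-cent proof
print(round(0.3 / cents_per_second))            # 78 s at 0.3 cents
```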
|
| So, overall, while I'd like for something like this to work, it
| probably only acts as a deterrent against attackers not running a
| full browser and who also aren't targeting you in particular.
___________________________________________________________________
(page generated 2023-08-08 23:00 UTC)