[HN Gopher] CAPTCHAs: 'a tracking cookie farm for profit masquer...
___________________________________________________________________
CAPTCHAs: 'a tracking cookie farm for profit masquerading as a
security service'
Author : ghuroo1
Score : 99 points
Date : 2025-02-10 16:59 UTC (6 hours ago)
(HTM) web link (www.pcgamer.com)
(TXT) w3m dump (www.pcgamer.com)
| ghuroo1 wrote:
| That made us spend 819 million hours clicking on traffic lights
| to generate nearly $1 trillion for Google.
| voisin wrote:
| At approximately 750,000 hours in a human lifespan, they wasted
| roughly 1,100 human lives in total. Unbelievable.
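The arithmetic behind that estimate can be checked quickly. This is only a back-of-the-envelope sketch using the two figures quoted in the comments above; the variable names are illustrative.

```python
# Figures taken from the two comments above.
HOURS_SPENT = 819_000_000      # hours spent solving CAPTCHA challenges
HOURS_PER_LIFESPAN = 750_000   # approximate hours in one human lifespan

lifespans = HOURS_SPENT / HOURS_PER_LIFESPAN
# Convert the lifespan figure to years to see what it implies.
years_per_lifespan = HOURS_PER_LIFESPAN / (24 * 365.25)

print(f"{lifespans:.0f} lifespans of about {years_per_lifespan:.1f} years each")
```

The division gives about 1,092 lifespans, which the comment rounds up to 1,100.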
| thechao wrote:
| There's a dystopian short story in your comment about AI that
| can't self-bootstrap without ground-truth from humans, so
| they keep us around just to mark images, music, etc. Lives
| wasted annotating things. I like to think they'd drag us from
| solar system to solar system for this purpose.
| taftster wrote:
| Gosh. This is too perfect. I feel like you've just captured
| the exact moment we're living in.
| nonrandomstring wrote:
| You can get people to do almost anything if you lie to them that
| it's for "security".
| catlikesshrimp wrote:
| Except captcha is not supposed to be security for the user, but
| security for the website.
|
| But in the end it is not (effective) security for a website, is
| an antifeature for users and is profit for google.
| jisnsm wrote:
| As a website developer and host, I can assure you recaptcha
| works very well to stop spam and automated login requests. It
| is not perfect, but no system is.
| internetter wrote:
| yeah, a sufficiently motivated attacker can deploy some
| countermeasures to bypass it, but only really worth it for
| targeted attacks. Anyone who has a form on the internet
| knows that without any sort of captcha, you get lots of
| stupid bots just typing in jumbo. Likely you could tone
| back the captchas and still get a similar result in
| stopping the dumb bots[0]
|
| [0] on my contact page my email is protected via a custom
| cypher. if the bots execute javascript and wait 0.5s they
| can read it, but most don't. It's the dumbest PoW
| imaginable, but it works
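A rough sketch of the idea in that footnote, under loud assumptions: the commenter does not say which cipher they use, so the XOR-plus-base64 scheme, the key, and the function names below are all hypothetical. The page would embed only the obfuscated token, and a small client-side script would run the equivalent of `deobfuscate` after a short delay.

```python
import base64

KEY = b"not-a-secret"  # hypothetical key; it ships to the browser anyway


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def obfuscate(email: str) -> str:
    """Produce the token that would be embedded in the page source."""
    scrambled = xor_bytes(email.encode("utf-8"), KEY)
    return base64.urlsafe_b64encode(scrambled).decode("ascii")


def deobfuscate(token: str) -> str:
    """Mirror of what the client-side script would do before display."""
    return xor_bytes(base64.urlsafe_b64decode(token), KEY).decode("utf-8")
```

This is obfuscation, not encryption: anything that executes the page's script can recover the address, which matches the commenter's point that it only filters out bots that do not bother.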
| nonrandomstring wrote:
| > It's the dumbest PoW imaginable, but it works
|
| Nice one! I guess you mainly need to get above a certain
| novelty threshold, because all ML is based on what has
| already been seen/learned rather than actually
| outsmarting the defence.
| johnmaguire wrote:
| > Anyone who has a form on the internet knows that
| without any sort of captcha, you get lots of stupid bots
| just typing in jumbo.
|
| I recall a form of "CAPTCHA" that involved a text input
| which was hidden via CSS, but which bots would fill in
| anyway. Any text in the input caused the entire form to
| be rejected. I wonder if that style still works today.
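The honeypot check described above can be sketched on the server side as follows. The field name `website` and the function name are assumptions for illustration; the form would carry an extra input hidden via CSS that a human never sees or fills in.

```python
HONEYPOT_FIELD = "website"  # hypothetical name of the CSS-hidden input


def is_probable_bot(form: dict) -> bool:
    """Return True when the hidden field contains any non-whitespace text."""
    return bool(str(form.get(HONEYPOT_FIELD, "")).strip())
```

A handler would call `is_probable_bot(submitted_fields)` first and reject the whole submission when it returns True.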
| phoronixrly wrote:
| I've had an issue with this approach -- many browsers
| (via autofill/autocomplete) and many password managers
| (when filling in password, e-mail, etc.) tend to also get
| trapped in this honeypot... The spam _does_ still get
| stopped though.
| nonrandomstring wrote:
| I had a great conversation about this last week. I'll just
| casually leave this [0] here for anyone who has time (50
| mins - audio only) for a deep-dive into machine learning to
| protect sites (APIs). TLDR - a lot of serious defenders
| have given up on PoW/CAPTCHA human filters because the cost
| to AI solve them has dropped to almost nothing. YMMV.
|
| [0] https://cybershow.uk/episodes.php?id=39
| phoronixrly wrote:
| As a website developer and host can you compare running
| your own CAPTCHA in place of any CAPTCHA-as-a-service? In
| my experience even a simple _static_ how much is 3 + 39
| stops the flood of spam in a form... It is also not
| perfect, but as you say no system is, and it does not
| pilfer my users' data...
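A minimal sketch of such a static question check, assuming the question is rendered next to a plain text input. The constant and function names are illustrative, and the lenient parsing is a design choice of this sketch rather than something the commenter describes.

```python
QUESTION = "How much is 3 + 39?"  # the static question quoted above
EXPECTED_ANSWER = 42


def is_correct(raw_answer: str) -> bool:
    """Parse the submitted text as an integer and compare it to the answer."""
    try:
        return int(raw_answer.strip()) == EXPECTED_ANSWER
    except ValueError:
        # Anything that is not a number (including an empty field) fails.
        return False
```

Because the question never changes, a bot written for this specific form would pass it trivially; the claim in the comment is only that generic spam bots do not try.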
| pupppet wrote:
| What's the alternative?
| atoav wrote:
| Building your own captcha or running one that doesn't sell your
| users' data to the highest bidder?
|
| What a time where people on a site called "Hacker News" ask
| such a question..
| phoronixrly wrote:
| And _if_ you ever get so big that people start writing
| bespoke software to break your CAPTCHA, then investing some
| more engineering effort into it will quite likely not be a
| problem.
|
| Of course reCAPTCHA is also still vulnerable to the use of a
| mechanical turk so even giving away your users' data won't
| save you.
| nonchalantsui wrote:
| Since this was focused on v2 and other interactive captcha, the
| alternative is to upgrade to new versions that don't do so.
| Still some downsides (and the study does address very briefly
| the use of AI to trick v3), but at the very least it does
| address some of the concerns.
|
| Important to note though that as AI gets more accessible then
| the downsides of v3 start to weigh more.
| e2le wrote:
| There are two alternatives I'm aware of, one is Attestation of
| Personhood[1] proposed by Cloudflare, the other is a proof-of-
| work[2] which the Tor project have themselves introduced[3].
|
| [1]: https://blog.cloudflare.com/introducing-cryptographic-
| attest...
|
| [2]: https://github.com/mCaptcha/mCaptcha
|
| [3]: https://blog.torproject.org/introducing-proof-of-work-
| defens...
| jszymborski wrote:
| While I get the draw, I never understood how PoW is ever
| supposed to work practically.
|
| PoW tasks are meant to work on a wide range of mobile phones,
| desktops, single-board computers, etc... you have vastly
| different compute budgets in every environment. For a PoW
| task that is usable on a five year old mobile phone, an
| adversary with a consumer RTX 50 series card (or potentially
| even an ASIC) can easily perform it many, many, many orders
| of magnitude faster.
|
| Am I missing something?
| johnmaguire wrote:
| PoW isn't meant to make something impossible, it's meant to
| attach a cost to it. Now you need to extract a value higher
| than the cost.
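The "attach a cost" idea can be illustrated with a hashcash-style puzzle. This is a generic sketch, not the algorithm used by mCaptcha or Tor; the challenge string, the difficulty value, and the function names are assumptions.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash checks the client's claimed work."""
    digest = hashlib.sha256(challenge + str(nonce).encode("ascii")).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force nonces until the digest is small enough."""
    for nonce in itertools.count():
        if verify(challenge, nonce, difficulty):
            return nonce
```

The solver needs about 2^difficulty hashes on average while the verifier needs one, so each extra bit of difficulty doubles the expected client cost. The same number of hashes takes far less time on fast hardware than on an old phone.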
| jszymborski wrote:
| I understand that, but what I'm saying is that due to the
| wide gulf between the compute budget of the slowest
| device one is meant to support and a couple commodity
| VPSs adversaries need anyway to conduct a DDoS or to
| spam, there is ostensibly no extra cost.
|
| In fact, all you are doing is slowing down legitimate
| clients with old equipment and doing nothing against
| adversaries.
| phoronixrly wrote:
| I've seen a PoW CAPTCHA
| https://github.com/mCaptcha/mCaptcha and at the time it
| did not make any sense to me. I would still get spam,
| just a tiny bit slower, and spammers would have to expend
| more resources for just my site, which would barely
| register on their bill.
|
| I bet that requiring JS stops more spam than the PoW
| itself. Can anyone who tried it chime in?
|
| Oh, I see, it's effective against 'someone [who] wants to
| hammer your site'. That is usually never the case with my
| sites. I do get a steady stream of spam, but it is quite
| gentle as to not trigger any WAFs. The load comes from
| LLMs scraping the everliving shit out of my sites and
| fortunately they don't seem to bother with filling in
| forms...
| lq9AJ8yrfs wrote:
| You are not missing something, you are finding it: the game
| theory of bots vs anti-bots is subtle and somewhat
| different from regular software engineering and cyber
| security.
|
| For the most part bots wish to be hidden and sites wish to
| reveal them, and this plays out over repeat games on small
| and large scales. Can be near-constantly or intermittently.
|
| The bot usually gets to make the first move against a
| backdrop that the anti-bot may or may not have a hand in.
| jszymborski wrote:
| Are you suggesting that ultra-quick solves would be a
| signal that a user-agent is malicious? That's
| interesting...
| Zak wrote:
| For a lot of places where I've encountered captchas, they could
| just _do nothing_. Simple rate limiting should probably be the
| next step. It's not one-size-fits-all of course.
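One common shape for "simple rate limiting" is a per-client sliding window. The sketch below is an in-memory illustration only; the class name, the limits, and the choice of client identifier (for example an IP address) are assumptions.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> recent timestamps

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A form handler would call `limiter.allow(client_ip)` and return an error such as HTTP 429 when it yields False. The optional `now` argument exists only to make the behaviour easy to test.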
| josefresco wrote:
| I can tell you on the small level asking a simple question to
| activate the form action stops 99% of spam. Something like
| "What color is snow?" Granted, with a well trained "AI" system
| solving these questions would be trivial but I have yet to see
| it in practice.
| phoronixrly wrote:
| Sorry for nitpicking but you need a puzzle that is knowledge-
| agnostic (be it cultural or scientific), otherwise you're
| guarding your site from both bots and people unfamiliar with
| the concept of or lacking the pre-existing knowledge
| necessary to solve the puzzle.
|
| What colour is snow is close but you can't assume that
| everyone knows what snow is, let alone what colour it is.
| This includes both people with disabilities and in parts of
| the world where there is no snow...
| idunnoman1222 wrote:
| There are no humans who know the word snow who don't know
| what colour snow is
| phoronixrly wrote:
| > There are no humans who know the word snow who don't
| know what colour snow is
|
| Sorry, I don't follow, English is a second language to
| me, but how does this stand against my statement that
| 'many people don't know the concept of snow, let alone
| what colour it is'?
| harshreality wrote:
| There's no reason for an English language website to
| cater to people who don't know what snow is. How can it
| be discriminatory to have a question a user can't
| comprehend, when they won't be able to comprehend the
| rest of the website either? Even blind people who can
| read English Braille and input text in English know that
| snow is white, even if they've never seen it.
|
| If a website is multilingual, it can offer
| language/region selection and add appropriate questions
| for each of them.
| phoronixrly wrote:
| I did not say it was discriminatory -- I stick to basic
| terms -- you may inadvertently be guarding against people
| who for one reason or another don't possess the knowledge
| to solve the puzzle. For example I could copy over an
| integral from one of my undergrad exams. 'Please
| calculate the value of the integral and enter it in the
| field below' (completely accessible to screen readers as
| well). This would effectively ban not only people who
| have not taken a calculus class, but many of my uni
| colleagues who have happily forgotten everything about
| calculus after they took their exams 10 years ago...
|
| Another example for an inadvertently hard puzzle, this
| time due to a lack of knowledge as a consequence of being
| part of a different culture, would be asking US people
| what colour is the edelweiss. In my country children
| learn about it in first grade if not in kindergarten.
| Another -- asking Europeans/US people what colour is
| romduol... I don't consider this discriminatory, I don't
| consider people in the US or Europe uneducated because
| they cannot solve such a simple puzzle... It is just
| poor/lazy/stupid design that _fails the single
| requirement to block bots and only bots_.
|
| You _would_ indeed be fine with the 'snow' question if
| your site _must_ only be visited and used by fellow
| citizens of your country (where _citizens_ implies
| similar education -- both cultural and scientific). You
| _would_ indeed be fine if you can _make sure_ the puzzle
| will be translated intelligently (including the solution)
| if your site _may_ be used in a foreign country or by
| users speaking the language in your own country.
|
| I usually cannot make any of these assumptions for any of
| the projects I work on. The site's audience is but a whim
| of the Product team, and I18n is outsourced to (once)
| translation agencies and now directly to an LLM... This
| can even be done (and frankly should be done) without the
| knowledge or input of the dev team. Also, neither
| translators nor LLMs can be expected to understand that
| they must come up with basically a new puzzle that will
| not be hard for people that use the specific language.
| And I as a developer that does not speak the specific
| foreign language while I can roughly validate their
| translation (if by any chance it passes by me for review
| and I go above and beyond what is expected of me and pass
| it through a translation service) and return it with
| feedback for fixes, I cannot rely that they will abide by
| the feedback, or how long it would take... Those are a
| lot of unknowns to consider these assumptions reliable, and
| it seems much less effort to come up with a simpler
| puzzle that contains the answer in itself... Its
| effectiveness against spam will be exactly the same.
|
| Also, you will definitely not be fine if your puzzle
| contains a concept foreign for a considerable part of
| people who can't for example see or hear. You would also
| not be fine if your puzzle's technical implementation
| makes it impossible to be perceived by them. The latter
| part is very simple to get wrong. For example, one of the
| best ways to protect any site from blind people is to
| implement a hero image slideshow that steals the focus on
| each slide. Their screen readers' focus gets moved each
| second and they literally cannot perceive, let alone
| navigate the site...
| josefresco wrote:
| I agree, and thankfully we're dealing with mostly regional
| visitors to small local business/organization websites. Not
| a global audience. That being said, it's hard to think of a
| simple question, with little to no ambiguity.
|
| One example is for a landscaper: What is the color of
| healthy grass?
|
| The answer is "green" of course, but grass is common in our
| region. That question would not work in a culture or region
| unfamiliar with "lawn grass".
| phoronixrly wrote:
| Yes, I would go for simpler stuff (word or digit puzzles)
| and package it in a way that is friendly for screen
| readers. So... No images or video, or at least one
| alternative to them that at the same time does not make
| it easy for the bots...
|
| This has the added benefit that translators will be
| forced to come up with a translation that makes sense
| when your project gets to the point that it needs i18n.
| dewey wrote:
| Sounds easy, but at this point everyone is trained to solve
| these captchas and implementing the questions is not a quick
| thing either on a bigger scale (Translations, cultural
| differences, bots easily bypassing them etc.). I've used
| captchas on my sites before because bots were just hammering
| the login form, checking checkboxes and causing me to rack up
| email sending bills.
| gtsop wrote:
| I think we need to critically re-evaluate what is it exactly we
| are doing on the internet, how we do it, and examine existing
| assumptions. For instance, do we really need all services to be
| centralised? Do we really need services to be "free" (part of
| the payment is selling your data ok). A server serving static
| files doesn't care about bot users, but apps... why would you
| let a stranger use your cpu/ram over the internet? I know I am
| not providing an answer but I believe we need to take another
| look at all of these before we try to come up with an answer
| cccbbbaaa wrote:
| I've heard about form fields hidden with CSS multiple times. No
| idea how effective this is though.
| jdietrich wrote:
| -
| loloquwowndueo wrote:
| It's not three years, it's thirteen.
|
| > A lifetime value of $888 billion for all of reCAPTCHAv2's
| tracking cookies produced between 2010 and 2023.
| phoronixrly wrote:
| jdietrich, I feel your pain, I am also completely convinced
| that 2010 was 3 years ago :(
| jp191919 wrote:
| I'm at the point now that if I get a CAPTCHA, I'm just going to
| leave the site. I'll spend my money elsewhere or find an
| alternative
| a2128 wrote:
| My government's websites require solving a reCAPTCHA for basic
| services, which is horrifying. They also use Cloudflare which
| blocks me sometimes. This is in the EU
| phoronixrly wrote:
| Confirming this. I am also completely certain that gratuitous
| CAPTCHA use is banned for government systems by my country's
| set of laws governing their implementation. The judicial
| system and the community have not matured enough to consider
| this a breach of law worthy of fighting against...
| openplatypus wrote:
| Name and shame, please!!
|
| reCAPTCHA, due to its lack of an opt-out, is effectively illegal
| in the EU.
| phoronixrly wrote:
| reCAPTCHA (and others based outside of the EU) is illegal on
| privacy grounds (in _any site_, not just those owned by EU
| entities). Homebrew CAPTCHAs are illegal due to their
| general lack of accessibility (in any site owned by an EU
| entity), and in Bulgaria their gratuitous use is banned in
| government sites on account of them being poor UX (not
| enforced unless caught during the acceptance phase of a
| project).
| cyberax wrote:
| This automatically means that you're penalizing smaller
| websites. And killing off the independent alternatives to
| Reddit/Disqus. Do you want this?
|
| Large sites like Amazon or CNN can afford to eat the bot
| traffic. Smaller sites can't.
| noah_buddy wrote:
| Sounds a heck of a lot like the bots are killing off these
| websites. Gross overuse of automated scraping is a fact of
| life but individual choice is intolerable. What if I told you
| they were the same thing?
| cyberax wrote:
| Yes, bot traffic is killing the open web. What's your
| point?
| mouse_ wrote:
| Did you read the article? What you said directly goes against
| the study's conclusion.
| cyberax wrote:
| I'm helping a neighbor to run a small e-commerce website
| with reviews. Review forms are being spammed by bots that
| get even through CAPTCHAs, and the owner needs to clean
| them up constantly. Without CAPTCHAs, it becomes
| unsustainable.
|
| They don't get a lot of bots trying stolen credit cards,
| but mostly because they are pretty niche.
| cryptoegorophy wrote:
| The problem isn't bot traffic. I run an e-commerce site and
| scammers run python scripts to test 1000s of cards per hour
| if there is no captcha. I hate it, my customers hate it,
| scammers hate it, but it is the only thing that keeps my
| merchant account running. Any advice is welcome!
| technion wrote:
| Logon forms are another whole issue. "Lock out the account"
| is just a DoS vector. People are quick to talk about
| systems that can defeat a captcha but if the brute force
| goes from 50 passwords/sec to one password/10 sec it's
| mission accomplished.
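To put numbers on that slowdown: the two rates come from the comment above, while the password-list size is a hypothetical figure chosen only for illustration.

```python
GUESSES = 10_000                 # hypothetical size of the attacker's list
GUESSES_PER_SECOND_BEFORE = 50   # brute-force rate without a CAPTCHA
SECONDS_PER_GUESS_AFTER = 10     # one guess every 10 seconds with one

before_seconds = GUESSES / GUESSES_PER_SECOND_BEFORE
after_hours = GUESSES * SECONDS_PER_GUESS_AFTER / 3600
slowdown = GUESSES_PER_SECOND_BEFORE * SECONDS_PER_GUESS_AFTER

print(f"before: {before_seconds:.0f} s, after: {after_hours:.1f} h, "
      f"slowdown: {slowdown}x")
```

Under these assumptions the same list goes from a few minutes of work to more than a day, a 500-fold slowdown, without ever locking the account.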
| unethical_ban wrote:
| What proof of humanity is sufficient? Today it is a phone call,
| or a verification sent to a real address (limit one registration
| per household), or a video call. How will we verify humanity in
| 20 years when audio and video emulation is foolproof?
|
| We'll have to have in-person attestation or make all services
| paid, perhaps.
| thatguy0900 wrote:
| Realistically it will be a government or private service that
| everyone will have to have to verify that it is a real person.
| Or at least tied to a real person so that banning will be more
| sticky.
| phoronixrly wrote:
| I would wager all services will be linked to a verified credit
| or debit (non-temporary) card. Most of them are now...
|
| How are you going to connect the physical person with an
| identity with in-person attestation? Many countries (several of
| them major English-speaking ones) don't have mandatory
| government IDs...
|
| A commenter below suggests that government eIDs could be used.
| I bet this will be harder to implement and will have much worse
| conversion rates than (the already terrible) mandatory
| credit/debit cards... Not to mention the hell that we as non-US
| citizens will have to endure if anyone tries to impose any form
| of mandatory ID there... One can only take so much complaining
| about government overreach about something that is basic
| necessity here in the EU...
| eykanal wrote:
| The problem with this paper is that, while technically true,
| there are many website owners who have found that CAPTCHAs have
| effectively reduced the spam on their site to zero. The fact that
| a CAPTCHA _can_ be bypassed doesn't mean that it _will_, and most
| spam bots are not using cutting-edge tech because that's
| expensive.
|
| To say "it's worthless from a security perspective" is a pretty
| harsh and largely inaccurate representation. It's been
| tremendously useful to those who have used it. If it wasn't
| valuable, it wouldn't be so widely used.
|
| Definitely agree with the whole "tons of free $$$ for Google",
| but that's kind of their business model, so yeah, Google is being
| Google. In other breaking news, water is still wet.
| chrbr wrote:
| Yeah, we've used CAPTCHAs to great effect as gracefully-
| degraded service protection for unauthenticated form
| submissions. When we detect that a particular form is being
| spammed, we automatically flip on a feature flag for it to
| require CAPTCHAs to submit, and the flood immediately stops.
| Definitely saves our databases from being pummeled, and I
| haven't seen a scenario since we implemented it a few years ago
| where the CAPTCHA didn't help immediately.
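The "flip a flag when a form is being spammed" approach might look roughly like the sketch below. The comment does not describe the detection logic, so the threshold, the time window, the class name, and the fact that the flag stays on until someone resets it are all assumptions.

```python
from collections import deque


class FormSpamGuard:
    """Turn on a CAPTCHA requirement once a form sees a burst of traffic."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.captcha_required = False  # the "feature flag"
        self._recent = deque()         # timestamps of recent submissions

    def record_submission(self, now):
        """Record one submission; return whether a CAPTCHA is now required."""
        self._recent.append(now)
        # Drop submissions that are older than the window.
        while self._recent and now - self._recent[0] >= self.window_seconds:
            self._recent.popleft()
        if len(self._recent) > self.threshold:
            self.captcha_required = True
        return self.captcha_required
```

Normal traffic never sees a challenge; only once the burst threshold is crossed does the form start asking for one, which is the graceful-degradation property the comment describes.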
|
| Reminds me of the advice around the deadbolt on your house - it
| won't stop a determined attacker, but it will deter less-
| determined ones.
| Scaevolus wrote:
| Far too many people talk about security as if it's a simple
| binary and not about effort levels and dissuading attackers.
| rachofsunshine wrote:
| People _really_ struggle with things that have measurable,
| probabilistic effects. You see it with healthcare ("Steve
| smoked his whole life and never got cancer, so cigarettes
| aren't bad for you!"), environmental effects ("Alice was poor
| and she didn't rob anyone, so poverty is no excuse!"), hiring
| ("Charlie is a great employee and he had no experience, so
| you should never look at backgrounds!"), etc.
|
| It should be a general standard of proof for any sort of
| sociological claim that you look at _rates_, not just
| examples, but it usually isn't.
| ChrisArchitect wrote:
| [dupe] Earlier: https://news.ycombinator.com/item?id=42997755
|
| https://news.ycombinator.com/item?id=42970780
| darkwater wrote:
| Naive question: how can clicking on the motorbike or traffic
| light image help to train an ML algorithm if they already know
| which image has a motorbike in it? Otherwise the captcha would
| not make sense. Maybe they show 3 images that already have a
| score of >0.90 and one that is at just 0.40?
| mbb70 wrote:
| Yes, known images are used for validation, unknown images are
| used for training.
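The known/unknown split can be sketched as follows. This only illustrates the scheme as described in the comment, not how reCAPTCHA is actually implemented; the image identifiers, the vote threshold, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical ground truth: image id -> "contains a traffic light".
KNOWN_LABELS = {"img-known-1": True}

# Votes collected for images whose label is not known yet.
votes = defaultdict(Counter)


def grade(known_id, known_answer, unknown_id, unknown_answer):
    """Pass or fail on the known image; record the unknown one if passed."""
    passed = KNOWN_LABELS[known_id] == known_answer
    if passed:
        votes[unknown_id][unknown_answer] += 1
    return passed


def consensus(unknown_id, min_votes=3):
    """Return the majority label once enough trusted votes exist."""
    tally = votes[unknown_id]
    if sum(tally.values()) < min_votes:
        return None
    return tally.most_common(1)[0][0]
```

Only answers from users who got the known image right are counted, so the validation image doubles as a filter on the quality of the training labels.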
| jsheard wrote:
| That used to be really obvious back when they focused on
| transcribing books, they would give you two words to type in
| and one would often be conspicuously more difficult to read
| than the other. The easy one was for validation and the
| difficult one was unknown, so you could half-ass it by
| entering whatever for that one.
| woleium wrote:
| they ask you to solve two. one they know, the other they don't
| michaelt wrote:
| Hypothetically speaking, if they've got a 97% good ML model,
| they could implement a captcha where if you disagree with their
| model you have to do a second image, and a third image and so
| on. Then they could show each image to several different
| humans, and only if a bunch of people disagree with the model
| do they take a closer look.
|
| Frankly a lot of the images I get are... kinda easy? This isn't
| the classic book-reading recaptcha where you could see why the
| text had confused the OCR.
| breppp wrote:
| I get that people are here to hate on Google, but I am just here
| to say that reCAPTCHA, albeit acquired, is an absolutely brilliant
| idea. The kind that solves two (three? if you count tracking)
| problems so elegantly
| phoronixrly wrote:
| Absolutely agreed on the 'very elegant solution for global-
| scale tracking' part!
| therein wrote:
| Multi-purpose trojan horse. Not only will it look beautiful in
| your city but you can use it as scaffolding to repair tall
| buildings or children in your community could use it as a play
| gym.
___________________________________________________________________
(page generated 2025-02-10 23:00 UTC)