[HN Gopher] CAPTCHAs: 'a tracking cookie farm for profit masquer...
       ___________________________________________________________________
        
       CAPTCHAs: 'a tracking cookie farm for profit masquerading as a
       security service'
        
       Author : ghuroo1
       Score  : 99 points
       Date   : 2025-02-10 16:59 UTC (6 hours ago)
        
 (HTM) web link (www.pcgamer.com)
 (TXT) w3m dump (www.pcgamer.com)
        
       | ghuroo1 wrote:
       | That made us spend 819 million hours clicking on traffic lights
       | to generate nearly $1 trillion for Google.
        
         | voisin wrote:
         | At an approx 750,000 hours in a human lifespan, they wasted
         | 1100 human lives in totality. Unbelievable.
        
           | thechao wrote:
           | There's a dystopian short story in your comment about AI that
           | can't self-bootstrap without ground-truth from humans, so
           | they keep us around just to mark images, music, etc. Lives
           | wasted annotating things. I like to think they'd drag us from
           | solar system to solar system for this purpose.
        
             | taftster wrote:
             | Gosh. This is too perfect. I feel like you've just captured
             | the exact moment we're living in.
        
       | nonrandomstring wrote:
       | You can get people to do almost anything if you lie to them that
       | it's for "security".
        
         | catlikesshrimp wrote:
         | Except captcha is not supposed to be security for the user, but
         | security for the website.
         | 
         | But in the end it is not (effective) security for a website, is
         | an antifeature for users and is profit for google.
        
           | jisnsm wrote:
           | As a website developer and host, I can assure you recaptcha
           | works very well to stop spam and automated login requests. It
           | is not perfect, but no system is.
        
             | internetter wrote:
             | yeah, a sufficiently motivated attacker can deploy some
             | countermeasures to bypass it, but only really worth it for
             | targeted attacks. Anyone who has a form on the internet
             | knows that without any sort of captcha, you get lots of
             | stupid bots just typing in jumbo. Likely you could tone
             | back the captchas and still get a similar result in
             | stopping the dumb bots[0]
             | 
             | [0] on my contact page my email is protected via a custom
             | cypher. if the bots execute javascript and wait 0.5s they
             | can read it, but most don't. It's the dumbest PoW
             | imaginable, but it works
        
               | nonrandomstring wrote:
               | > It's the dumbest PoW imaginable, but it works
               | 
               | Nice one! I guess you mainly need to get above a certain
               | novelty threshold, because all ML is based on what has
               | already been seen/learned rather than actually
               | outsmarting the defence.
        
               | johnmaguire wrote:
               | > Anyone who has a form on the internet knows that
               | without any sort of captcha, you get lots of stupid bots
               | just typing in jumbo.
               | 
               | I recall a form of "CAPTCHA" that involved a text input
               | which was hidden via CSS, but which bots would fill in
               | anyway. Any text in the input caused the entire form to
               | be rejected. I wonder if that style still works today.
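
A minimal sketch of the hidden-field ("honeypot") check described above, assuming the form arrives as a plain dictionary and that the CSS-hidden input is named `website` (a hypothetical name, not taken from any particular site). Any text in that field causes the whole submission to be rejected.

```python
def is_probable_bot(form: dict[str, str], honeypot_field: str = "website") -> bool:
    """Return True when the hidden honeypot field contains any text.

    The field name "website" is a hypothetical choice for this sketch.
    Real users never see the field (it is hidden via CSS), so it should
    arrive empty or be missing entirely.
    """
    return bool(form.get(honeypot_field, "").strip())


def handle_submission(form: dict[str, str]) -> str:
    # Reject the entire form if the hidden field was filled in.
    if is_probable_bot(form):
        return "rejected"
    return "accepted"
```

Since autofill and password managers can also populate hidden inputs, a rejection here is only a hint that the client is a bot, not proof.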
        
               | phoronixrly wrote:
               | I've had an issue with this approach -- many browsers
               | (via autofill/autocomplete) and many password managers
               | (when filling in password, e-mail, etc.) tend to also get
               | trapped in this honeypot... The spam _does_ still get
               | stopped though.
        
             | nonrandomstring wrote:
             | I had a great conversation about this last week. I'll just
             | casually leave this [0] here for anyone who has time (50
             | mins - audio only) for a deep-dive into machine learning to
             | protect sites (APIs). TLDR - a lot of serious defenders
             | have given up on PoW/CAPTCHA human filters because the cost
             | to AI solve them has dropped to almost nothing. YMMV.
             | 
             | [0] https://cybershow.uk/episodes.php?id=39
        
             | phoronixrly wrote:
             | As a website developer and host can you compare running
             | your own CAPTCHA in place of any CAPTCHA-as-a-service? In
             | my experience even a simple _static_ how much is 3 + 39
             | stops the flood of spam in a form... It is also not
             | perfect, but as you say no system is, and it does not
             | pilfer my users' data...
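
A rough sketch of what such a static question check might look like on the server side. The function name and the whitespace and integer handling are assumptions made for illustration; only the question itself comes from the comment.

```python
QUESTION = "How much is 3 + 39?"
EXPECTED_ANSWER = 3 + 39


def check_answer(submitted: str) -> bool:
    """Accept the submission only if the visitor typed the right number."""
    try:
        return int(submitted.strip()) == EXPECTED_ANSWER
    except ValueError:
        # Non-numeric or empty input is treated as a wrong answer.
        return False
```

No third-party service is involved, so nothing about the visitor leaves the site.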
        
       | pupppet wrote:
       | What's the alternative?
        
         | atoav wrote:
         | Building your own captcha or running one that doesn't sell your
         | users data to the highest bidder?
         | 
         | What a time where people on a site called "Hacker News" ask
         | such a question..
        
           | phoronixrly wrote:
           | And _if_ you ever get so big that people start writing
           | bespoke software to break your CAPTCHA, then investing some
           | more engineering effort into it will quite likely not be a
           | problem.
           | 
           | Of course reCAPTCHA is also still vulnerable to the use of a
           | mechanical turk so even giving away your users' data won't
           | save you.
        
         | nonchalantsui wrote:
         | Since this was focused on v2 and other interactive captcha, the
         | alternative is to upgrade to new versions that don't do so.
         | Still some downsides (and the study does address very briefly
         | the use of AI to trick v3), but at the very least it does
         | address some of the concerns.
         | 
         | Important to note though that as AI gets more accessible then
         | the downsides of v3 start to weigh more.
        
         | e2le wrote:
         | There are two alternatives I'm aware of, one is Attestation of
         | Personhood[1] proposed by Cloudflare, the other is a proof-of-
         | work[2] which the Tor project have themselves introduced[3].
         | 
         | [1]: https://blog.cloudflare.com/introducing-cryptographic-
         | attest...
         | 
         | [2]: https://github.com/mCaptcha/mCaptcha
         | 
         | [3]: https://blog.torproject.org/introducing-proof-of-work-
         | defens...
        
           | jszymborski wrote:
           | While I get the draw, I never understood how PoW is ever
           | supposed to work practically.
           | 
           | PoW tasks are meant to work on a wide range of mobile phones,
           | desktops, single-board computers, etc... you have vastly
           | different compute budgets in every environment. For a PoW
           | task that is usable on a five year old mobile phone, an
           | adversary with a consumer RTX 50 series card (or potentially
           | even an ASIC) can easily perform it many, many, many orders
           | of magnitude faster.
           | 
           | Am I missing something?
        
             | johnmaguire wrote:
             | PoW isn't meant to make something impossible, it's meant to
             | attach a cost to it. Now you need to extract a value higher
             | than the cost.
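
To make the "attach a cost" idea concrete, here is a hashcash-style sketch. It is not the scheme used by mCaptcha or the Tor project; the challenge format and the SHA-256 leading-zero-bits rule are assumptions chosen for illustration. The client searches for a nonce, and the server verifies it with a single hash.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash, regardless of difficulty."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force search, about 2**difficulty hashes on average."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce
```

Each extra bit of difficulty doubles the expected client work, which is also where the concern raised in this subthread comes from: the same difficulty costs a fast GPU far less wall-clock time than an old phone.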
        
               | jszymborski wrote:
               | I understand that, but what I'm saying is that due to the
               | wide gulf between the compute budget of the slowest
               | device one is meant to support and a couple commodity
               | VPSs adversaries need anyway to conduct a DDoS or to
               | spam, there is ostensibly no extra cost.
               | 
               | In fact, all you are doing is slowing down legitimate
               | clients with old equipment and doing nothing against
               | adversaries.
        
               | phoronixrly wrote:
               | I've seen a PoW CAPTCHA
               | https://github.com/mCaptcha/mCaptcha and at the time it
               | did not make any sense to me. I would still get spam,
               | just a tiny bit slower, and spammers would have to expend
               | more resources for just my site, which would barely
               | register on their bill.
               | 
               | I bet that requiring JS stops more spam than the PoW
               | itself. Can anyone who tried it chime in?
               | 
               | Oh, I see, it's effective against 'someone [who] wants to
               | hammer your site'. That is usually never the case with my
               | sites. I do get a steady stream of spam, but it is quite
               | gentle as to not trigger any WAFs. The load comes from
               | LLMs scraping the everliving shit out of my sites and
               | fortunately they don't seem to bother with filling in
               | forms...
        
             | lq9AJ8yrfs wrote:
             | You are not missing something, you are finding it: the game
             | theory of bots vs anti-bots is subtle and somewhat
             | different from regular software engineering and cyber
             | security.
             | 
             | For the most part bots wish to be hidden and sites wish to
             | reveal them, and this plays out over repeat games on small
             | and large scales. Can be near-constantly or intermittently.
             | 
             | The bot usually gets to make the first move against a
             | backdrop that the anti-bot may or may not have a hand in.
        
               | jszymborski wrote:
               | Are you suggesting that ultra-quick solves would be a
               | signal that a user-agent is malicious? That's
               | interesting...
        
         | Zak wrote:
         | For a lot of places where I've encountered captchas, they could
         | just _do nothing_. Simple rate limiting should probably be the
         | next step. It's not one-size-fits-all of course.
        
         | josefresco wrote:
         | I can tell you on the small level asking a simple question to
         | activate the form action stops 99% of spam. Something like
         | "What color is snow?" Granted, with a well trained "AI" system
         | solving these questions would be trivial but I have yet to see
         | it in practice.
        
           | phoronixrly wrote:
           | Sorry for nitpicking but you need a puzzle that is knowledge-
           | agnostic (be it cultural or scientific), otherwise you're
           | guarding your site from both bots and people unfamiliar with
           | the concept of or lacking the pre-existing knowledge
           | necessary to solve the puzzle.
           | 
           | What colour is snow is close but you can't assume that
           | everyone knows what snow is, let alone what colour it is.
           | This includes both people with disabilities and in parts of
           | the world where there is no snow...
        
             | idunnoman1222 wrote:
             | There are no humans that know the word snow who don't know
             | what colours Snow is
        
               | phoronixrly wrote:
               | > There are no humans that know the word snow who don't
               | know what colours Snow is
               | 
               | Sorry, I don't follow, English is a second language to
               | me, but how does this stand against my statement that
               | 'many people don't know the concept of snow, let alone
               | what colour it is'?
        
               | harshreality wrote:
               | There's no reason for an English language website to
               | cater to people who don't know what snow is. How can it
               | be discriminatory to have a question a user can't
               | comprehend, when they won't be able to comprehend the
               | rest of the website either? Even blind people who can
               | read English Braille and input text in English know that
               | snow is white, even if they've never seen it.
               | 
               | If a website is multilingual, it can offer
               | language/region selection and add appropriate questions
               | for each of them.
        
               | phoronixrly wrote:
               | I did not say it was discriminatory -- I stick to basic
               | terms -- you may inadvertently be guarding against people
               | who for one reason or another don't possess the knowledge
               | to solve the puzzle. For example I could copy over an
               | integral from one of my undergrad exams. 'Please
               | calculate the value of the integral and enter it in the
               | field below' (completely accessible to screen readers as
               | well). This would effectively ban not only people who
               | have not taken a calculus class, but many of my uni
               | colleagues who have happily forgotten everything about
               | calculus after they took their exams 10 years ago...
               | 
               | Another example for an inadvertently hard puzzle, this
               | time due to a lack of knowledge as a consequence of being
               | part of a different culture, would be asking US people
               | what colour is the edelweiss. In my country children
               | learn about it in first grade if not in kindergarten.
               | Another -- asking Europeans/US people what colour is
               | romduol... I don't consider this discriminatory, I don't
               | consider people in the US or Europe uneducated because
               | they cannot solve such a simple puzzle... It is just
               | poor/lazy/stupid design that _fails the single
               | requirement to block bots and only bots_.
               | 
               | You _would_ indeed be fine with the 'snow' question if
               | your site _must_ only be visited and used by fellow
               | citizens of your country (where _citizens_ implies
               | similar education -- both cultural and scientific). You
               | _would_ indeed be fine if you can _make sure_ the puzzle
               | will be translated intelligently (including the solution)
               | if your site _may_ be used in a foreign country or by
               | users speaking the language in your own country.
               | 
               | I usually cannot make any of these assumptions for any of
               | the projects I work on. The site's audience is but a whim
               | of the Product team, and I18n is outsourced to (once)
               | translation agencies and now directly to an LLM... This
               | can even be done (and frankly should be done) without the
               | knowledge or input of the dev team. Also, neither
               | translators nor LLMs can be expected to understand that
               | they must come up with basically a new puzzle that will
               | not be hard for people that use the specific language.
               | And I as a developer that does not speak the specific
               | foreign language while I can roughly validate their
               | translation (if by any chance it passes by me for review
               | and I go above and beyond what is expected of me and pass
               | it through a translation service) and return it with
               | feedback for fixes, I cannot rely that they will abide by
               | the feedback, or how long it would take... Those are a
               | lot of unknowns to consider these assumptions reliable, and
               | it seems much less effort to come up with a simpler
               | puzzle that contains the answer in itself... Its
               | effectiveness against spam will be exactly the same.
               | 
               | Also, you will definitely not be fine if your puzzle
               | contains a concept foreign for a considerable part of
               | people who can't for example see or hear. You would also
               | not be fine if your puzzle's technical implementation
               | makes it impossible to be perceived by them. The latter
               | part is very simple to get wrong. For example, one of the
               | best ways to protect any site from blind people is to
               | implement a hero image slideshow that steals the focus on
               | each slide. Their screen readers' focus gets moved each
               | second and they literally cannot perceive, let alone
               | navigate the site...
        
             | josefresco wrote:
             | I agree, and thankfully we're dealing with mostly regional
             | visitors to small local business/organization websites. Not
             | a global audience. That being said, it's hard to think of a
             | simple question, with little to no ambiguity.
             | 
             | One example is for a landscaper: What is the color of
             | healthy grass?
             | 
             | The answer is "green" of course, but grass is common in our
             | region. That question would not work in a culture or region
             | unfamiliar with "lawn grass".
        
               | phoronixrly wrote:
               | Yes, I would go for simpler stuff (word or digit puzzles)
               | and package it in a way that is friendly for screen
               | readers. So... No images or video, or at least one
               | alternative to them that at the same time does not make
               | it easy for the bots...
               | 
               | This has the added benefit that translators will be
               | forced to come up with a translation that makes sense
               | when your project gets to a point that it needs i18n.
        
           | dewey wrote:
           | Sounds easy, but at this point everyone is trained to solve
           | these captchas and implementing the questions is not a quick
           | thing either on a bigger scale (Translations, cultural
           | differences, bots easily bypassing them etc.). I've used
           | captchas on my sites before because bots were just hammering
           | the login form, checking checkboxes and causing me to rack up
           | email sending bills.
        
         | gtsop wrote:
         | I think we need to critically re-evaluate what is it exactly we
         | are doing on the internet, how we do it, and examine existing
         | assumptions. For instance, do we really need all services to be
         | centralised? Do we really need services to be "free" (part of
         | the payment is selling your data ok). A server serving static
         | files doesn't care about bot users, but apps... why would you
         | let a stranger use your cpu/ram over the internet? I know i am
         | not providing an answer but i believe we need to take a look
         | again at all of these before we try to come up with an answer
        
         | cccbbbaaa wrote:
         | I've heard about form fields hidden with CSS multiple times. No
         | idea how effective this is though.
        
       | jdietrich wrote:
       | -
        
         | loloquwowndueo wrote:
         | It's not three years, it's thirteen.
         | 
         | > A lifetime value of $888 billion for all of reCAPTCHAv2's
         | tracking cookies produced between 2010 and 2023.
        
           | phoronixrly wrote:
           | jdietrich, I feel your pain, I am also completely convinced
           | that 2010 was 3 years ago :(
        
       | jp191919 wrote:
       | I'm at the point now that if I get a CAPTCHA, I'm just going to
       | leave the site. I'll spend my money elsewhere or find an
       | alternative
        
         | a2128 wrote:
         | My government's websites require solving a reCAPTCHA for basic
         | services, which is horrifying. They also use Cloudflare which
         | blocks me sometimes. This is in the EU
        
           | phoronixrly wrote:
           | Confirming this. I am also completely certain that gratuitous
           | CAPTCHA use is banned for government systems by my country's
           | set of laws governing their implementation. The judicial
           | system and the community have not matured enough to consider
           | this a breach of law worthy of fighting against...
        
           | openplatypus wrote:
           | Name and shame, please!!
           | 
           | ReCAPTCHA, due to lack of opt-out, is effectively illegal in the
           | EU.
        
             | phoronixrly wrote:
             | reCAPTCHA (and others based outside of EU) is illegal on
             | privacy grounds (in _any site_, not just owned by EU
             | entities). Homebrew CAPTCHAs are illegal due to their
             | general lack of accessibility (in any site owned by an EU
             | entity), and in Bulgaria their gratuitous use is banned in
             | government sites on account of them being poor UX (not
             | enforced unless caught during the acceptance phase of a
             | project).
        
         | cyberax wrote:
         | This automatically means that you're penalizing smaller
         | websites. And killing off the independent alternatives to
         | Reddit/Disqus. Do you want this?
         | 
         | Large sites like Amazon or CNN can afford to eat the bot
         | traffic. Smaller sites can't.
        
           | noah_buddy wrote:
           | Sounds a heck of a lot like the bots are killing off these
           | websites. Gross overuse of automated scraping is a fact of
           | life but individual choice is intolerable. What if I told you
           | they were the same thing?
        
             | cyberax wrote:
             | Yes, bot traffic is killing the open web. What's your
             | point?
        
           | mouse_ wrote:
           | Did you read the article? What you said directly goes against
           | the study's conclusion.
        
             | cyberax wrote:
             | I'm helping a neighbor to run a small e-commerce website
             | with reviews. Review forms are being spammed by bots that
             | get even through CAPTCHAs, and the owner needs to clean
             | them up constantly. Without CAPTCHAs, it becomes
             | unsustainable.
             | 
             | They don't get a lot of bots trying stolen credit cards,
             | but mostly because they are pretty niche.
        
           | cryptoegorophy wrote:
           | The problem isn't bot traffic. I run an Ecommerce site and
           | scammers run python scripts to test 1000s of cards per hour
           | if there is no captcha. I hate it, my customers hate it,
           | scammers hate it, but it is the only thing that keeps my
           | merchant account running. Any advice is welcome!
        
             | technion wrote:
             | Logon forms are another whole issue. "Lock out the account"
             | is just a DoS vector. People are quick to talk about
             | systems that can defeat a captcha but if the brute force
             | goes from 50 passwords/sec to one password/10 sec it's
             | mission accomplished.
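
One way to read "one password/10 sec" is a minimum interval between attempts rather than a lockout. The sketch below assumes an in-memory store and a caller-chosen key (for example account name plus client address); both are illustrative choices, not something described in the thread.

```python
class AttemptThrottle:
    """Allow at most one attempt per key every `min_interval` seconds."""

    def __init__(self, min_interval: float = 10.0) -> None:
        self.min_interval = min_interval
        self._last_allowed: dict[str, float] = {}

    def allow(self, key: str, now: float) -> bool:
        """Return True if an attempt for `key` may proceed at time `now`.

        `now` is passed in explicitly (e.g. time.monotonic()) so the
        behaviour is easy to test. Rejected attempts do not reset the
        timer, so a steady flood still gets one try per interval.
        """
        last = self._last_allowed.get(key)
        if last is not None and now - last < self.min_interval:
            return False
        self._last_allowed[key] = now
        return True
```

Nothing is ever locked out permanently; a brute-force run is simply slowed to the configured rate. Keying by account name alone would still let an attacker delay the legitimate owner, which is why the key choice matters.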
        
       | unethical_ban wrote:
       | What proof of humanity is sufficient? Today it is a phone call,
       | or a verification sent to a real address (limit one registration
       | per household), or a video call. How will we verify humanity in
       | 20 years when audio and video emulation is foolproof?
       | 
       | We'll have to have in-person attestation or make all services
       | paid, perhaps.
        
         | thatguy0900 wrote:
         | Realistically it will be a government or private service that
         | everyone will have to have to verify that it is a real person.
         | Or at least tied to a real person so that banning will be more
         | sticky.
        
         | phoronixrly wrote:
         | I would wager all services will be linked to a verified credit
         | or debit (non-temporary) card. Most of them are now...
         | 
         | How are you going to connect the physical person with an
         | identity with in-person attestation? Many countries (several of
         | them major English-speaking ones) don't have mandatory
         | government IDs...
         | 
         | A commenter below suggests that government eIDs could be used.
         | I bet this will be harder to implement and will have much worse
         | conversion rates than (the already terrible) mandatory
         | credit/debit cards... Not to mention the hell that we as non-US
         | citizens will have to endure if anyone tries to impose any form
         | of mandatory ID there... One can only take so much complaining
         | about government overreach about something that is basic
         | necessity here in the EU...
        
       | eykanal wrote:
       | The problem with this paper is that, while technically true,
       | there are many website owners who have found that CAPTCHAs have
       | effectively reduced the spam on their site to zero. The fact that
       | a CAPTCHA _can_ be bypassed doesn't mean that it _will_, and most
       | spam bots are not using cutting-edge tech because that's
       | expensive.
       | 
       | To say "it's worthless from a security perspective" is a pretty
       | harsh and largely inaccurate representation. It's been
       | tremendously useful to those who have used it. If it wasn't
       | valuable, it wouldn't be so widely used.
       | 
       | Definitely agree with the whole "tons of free $$$ for Google",
       | but that's kind of their business model, so yeah, Google is being
       | Google. In other breaking news, water is still wet.
        
         | chrbr wrote:
         | Yeah, we've used CAPTCHAs to great effect as gracefully-
         | degraded service protection for unauthenticated form
         | submissions. When we detect that a particular form is being
         | spammed, we automatically flip on a feature flag for it to
         | require CAPTCHAs to submit, and the flood immediately stops.
         | Definitely saves our databases from being pummeled, and I
         | haven't seen a scenario since we implemented it a few years ago
         | where the CAPTCHA didn't help immediately.
         | 
         | Reminds me of the advice around the deadbolt on your house - it
         | won't stop a determined attacker, but it will deter less-
         | determined ones.
        
         | Scaevolus wrote:
         | Far too many people talk about security as if it's a simple
         | binary and not about effort levels and dissuading attackers.
        
           | rachofsunshine wrote:
           | People _really_ struggle with things that have measurable,
           | probabilistic effects. You see it with healthcare ("Steve
           | smoked his whole life and never got cancer, so cigarettes
           | aren't bad for you!"), environmental effects ("Alice was poor
           | and she didn't rob anyone, so poverty is no excuse!"), hiring
           | ("Charlie is a great employee and he had no experience, so
           | you should never look at backgrounds!"), etc.
           | 
           | It should be a general standard of proof for any sort of
           | sociological claim that you look at _rates_, not just
           | examples, but it usually isn't.
        
       | ChrisArchitect wrote:
       | [dupe] Earlier: https://news.ycombinator.com/item?id=42997755
       | 
       | https://news.ycombinator.com/item?id=42970780
        
       | darkwater wrote:
       | Naive question: how can clicking on the motorbike or traffic
       | light image help to train an ML algorithm if they already know
       | what image has a motorbike in it, or otherwise the captcha would
       | not make sense. Maybe they put 3 images which already have a
       | score of >0.90 and one which is just 0.40?
        
         | mbb70 wrote:
         | Yes, known images are used for validation, unknown images are
         | used for training.
        
           | jsheard wrote:
           | That used to be really obvious back when they focused on
           | transcribing books, they would give you two words to type in
           | and one would often be conspicuously more difficult to read
           | than the other. The easy one was for validation and the
           | difficult one was unknown, so you could half-ass it by
           | entering whatever for that one.
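
The two-word scheme described above can be sketched as follows. This is an illustration of the idea only, not reCAPTCHA's actual implementation; the function name and the vote counter are assumptions. Only the known (control) word decides pass or fail, while the answer for the unknown word is recorded as one vote toward its transcription.

```python
from collections import Counter


def grade(control_truth: str, control_answer: str,
          unknown_answer: str, votes: Counter) -> bool:
    """Pass or fail on the control word; collect the unknown word as a vote."""
    passed = control_answer.strip().lower() == control_truth.lower()
    if passed:
        # The unknown word cannot be checked, so it is only tallied.
        votes[unknown_answer.strip().lower()] += 1
    return passed
```

Because the unknown word is never checked, typing anything for it still passes, which matches the half-assing trick mentioned above. Agreement across many users is what would turn the votes into a label.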
        
         | woleium wrote:
         | they ask you to solve two. one they know, the other they don't
        
         | michaelt wrote:
         | Hypothetically speaking, if they've got a 97% good ML model,
         | they could implement a captcha where if you disagree with their
         | model you have to do a second image, and a third image and so
         | on. Then they could show each image to several different
         | humans, and only if a bunch of people disagree with the model
         | do they take a closer look.
         | 
         | Frankly a lot of the images I get are... kinda easy? This isn't
         | the classic book-reading recaptcha where you could see why the
         | text had confused the OCR.
        
       | breppp wrote:
       | I get that people are here to hate on Google, but I am just here
       | to say that reCAPTCHA, albeit acquired, is an absolutely brilliant
       | idea. The kind that solves two (three? if you count tracking)
       | problems so elegantly
        
         | phoronixrly wrote:
         | Absolutely agreed on the 'very elegant solution for global-
         | scale tracking' part!
        
         | therein wrote:
         | Multi-purpose trojan horse. Not only will it look beautiful in
         | your city but you can use it as scaffolding to repair tall
         | buildings or children in your community could use it as a play
         | gym.
        
       ___________________________________________________________________
       (page generated 2025-02-10 23:00 UTC)