[HN Gopher] We Do Not Support Opt-Out Forms (2025)
___________________________________________________________________
We Do Not Support Opt-Out Forms (2025)
Author : mefengl
Score : 79 points
Date : 2026-01-27 09:35 UTC (21 hours ago)
(HTM) web link (consciousdigital.org)
(TXT) w3m dump (consciousdigital.org)
| drcongo wrote:
| That site doesn't seem to support pages loading either.
|
| edit: I feel their pain - I've spent the past week fighting AI
| scrapers on multiple sites hitting routes that somehow bypass
| Cloudflare's cache. Thousands of requests per minute, often to
| URLs that have _never_ even existed. Baidu and OpenAI, I'm
| looking at you.
| jen729w wrote:
| > often to URLs that have never even existed
|
| Oh you're _so_ deterministic.
| trollbridge wrote:
| There is currently some AI scraper that uses residential IP
| addresses and a variety of techniques to conceal itself that
| likes downloading Swagger generated docs over... and over...
| and over.
|
| Plus hitting the endpoints for authentication that return 403
| over and over.
| comrade1234 wrote:
| Are they hitting non-existent pages? I had ip addresses
| scanning my personal server including hitting pages that don't
| exist. I had fail2ban running already so I just turned on the
| nginx filters (and had to modify the regexes a bit to get them
| working). I turned on the recidive jail too. It's been working
| great.
| tommek4077 wrote:
| Why are "thousands" of requests noticeable in any way?
| Webservers are so powerful nowadays.
| drcongo wrote:
| It's not just one scraper.
| SoftTalker wrote:
| Small, cheap VPSs that are ideal for running a small niche-
| interest blog or forum will easily fall over if they suddenly
| get thousands of requests in a short time.
|
| Look at how many sites still get "HN hugged" (formerly known
| as "slashdotted").
| ronsor wrote:
| I remember my first project posted to HN was hosted on a
| router with 32MB of RAM and a puny MIPS CPU; despite
| hitting the front page, it did not crash.
|
| At this point, I have to assume that most software is too
| inefficient to be exposed to the Internet, and that becomes
| obvious with any real load.
| SoftTalker wrote:
| While true, it's also true that it was (presumably) able
| to run and serve its intended audience until the scrapers
| came along.
| ndriscoll wrote:
| My n100 minipc can serve over 20k requests per second with
| nginx (well, it could, if not for the gigabit NIC limiting it).
| Actually IIRC it can (again, modulo uplink) do more like 40k
| rps for 404 or 304s.
| mystraline wrote:
| IP blocking Asia took my abusive scans down 95%.
|
| I also do not have a robots.txt so google doesn't index.
|
| Got some scanners who left a message on how to index or deindex,
| but it was like 3 lines total in my log (that's not abusive).
|
| But yeah, blocking the whole of Asia stopped soooo much of the
| net-shit.
| blenderob wrote:
| > I also do not have a robots.txt so google doesn't index.
|
| That doesn't sound right. I don't have a robots.txt either but
| Google indexes everything for me.
| mystraline wrote:
| https://news.ycombinator.com/item?id=46681454
|
| I think this is a recent change.
| daveoc64 wrote:
| All the comments there seem to suggest that there has
| been no change and that robots.txt isn't required.
| Citizen_Lame wrote:
| How did you block Asia, cloudflare or something else?
| mystraline wrote:
| You can download weekly IP blocks of regions.
|
| I import them into iptables and wholesale block them all.
|
| I don't deal with eastdakota's pile of shit.
| kjs3 wrote:
| You can block at your gateway/router. Lots of places have
| country IP ranges[1], and there are even more or less
| frequently updated lists of 'malicious' IP ranges[2]. Some
| gateway providers include 'block by country' and/or
| 'download blocklists automatically' as a feature.
|
| [1] e.g. https://github.com/ipverse/geo-ip-blocks
|
| [2] e.g. https://github.com/bitwire-it/ipblocklist
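|
| A minimal sketch of the matching that such a blocklist implies,
| assuming the list is plain text with one CIDR range per line. It
| does not touch iptables; `load_networks` and `is_blocked` are
| hypothetical names, and the sample ranges are RFC 5737
| documentation addresses, not real country data.
|

```python
import ipaddress


def load_networks(lines):
    """Parse CIDR strings, skipping blank lines and '#' comments."""
    networks = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if line:
            # strict=False tolerates entries with host bits set.
            networks.append(ipaddress.ip_network(line, strict=False))
    return networks


def is_blocked(addr, networks):
    """Return True if addr falls inside any listed network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)


# Stand-in for a downloaded list (assumed format, placeholder ranges).
SAMPLE = ["# hypothetical list", "203.0.113.0/24", "198.51.100.0/25", ""]
BLOCKED = load_networks(SAMPLE)

print(is_blocked("203.0.113.7", BLOCKED))  # True
print(is_blocked("192.0.2.1", BLOCKED))    # False
```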
| storystarling wrote:
| Might be worth checking if they are appending random query
| strings to force cache misses. Usually you can normalize the
| request at the edge to strip those out and protect the origin.
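|
| A rough sketch of that normalization, assuming an allowlist of
| query parameters the site really uses. `ALLOWED_PARAMS` and
| `normalize_url` are hypothetical; a real edge rule would live in
| the CDN's own configuration rather than in Python.
|

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allowlist: only these parameters affect the response.
ALLOWED_PARAMS = {"page", "q"}


def normalize_url(url, allowed=ALLOWED_PARAMS):
    """Drop unknown query parameters and sort the rest, so that
    cache-busting variants collapse to a single cache key."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in allowed
    ]
    kept.sort()
    # The fragment is never sent to the server, so it is dropped too.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


print(normalize_url("https://example.org/docs?page=2&cb=8f3a1"))
# https://example.org/docs?page=2
```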
| lambdaone wrote:
| Archive link:
|
| https://web.archive.org/web/20251009081648/https://conscious...
| dcminter wrote:
| That wasn't working for me, but this one was:
| https://archive.ph/QCMjJ
| rubinlinux wrote:
| | Since emails are sent from the individual's email account, they
| are already verified.
|
| This is not how email works, though.
| blenderob wrote:
| This.
|
| I wonder if it is a generation gap thing. The young folks these
| days have probably used only Gmail, Proton or one of these big
| email services that abstract away all the technical details of
| sending and receiving emails. Without some visibility into the
| technical details of how emails are composed and sent they
| might not have ever known that the email headers are not some
| definite source of truth but totally user defined and can be
| set to anything.
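|
| To illustrate the point about headers, a small sketch using
| Python's standard `email` package. Nothing here is sent; the
| addresses are placeholders and `build_message` is a hypothetical
| helper. The library accepts whatever sender the caller claims.
|

```python
from email.message import EmailMessage


def build_message(claimed_sender, recipient, body):
    """Build a message whose From header is whatever the caller says."""
    msg = EmailMessage()
    msg["From"] = claimed_sender  # not checked against any account
    msg["To"] = recipient
    msg["Subject"] = "Opt-out request"
    msg.set_content(body)
    return msg


msg = build_message(
    "someone.else@example.com",  # placeholder address
    "privacy@example.org",       # placeholder address
    "Please delete my data.",
)
print(msg["From"])  # someone.else@example.com
```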
| pif wrote:
| Eh, nice times, when you could type an email just by
| telnetting to port 25...
| bradleyy wrote:
| I've certainly sent thousands of emails this way. It was a
| simpler time.
| SoftTalker wrote:
| 98% of email users of any generation don't have the first
| clue how the protocol works.
| kro wrote:
| +1. Even if they validate DKIM/SPF+alignment (aka DMARC), that
| would only verify the domain. There is no local part
| verification possible for the receiver; the sending server
| needs to be trusted with proper auth.
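|
| A simplified sketch of why alignment says nothing about the
| local part. It models only strict alignment (exact domain match);
| real DMARC also has a relaxed mode based on organizational
| domains, which is omitted here. `from_domain` and
| `strictly_aligned` are hypothetical names.
|

```python
from email.utils import parseaddr


def from_domain(from_header):
    """Extract the lowercased domain from a From header value."""
    _, addr = parseaddr(from_header)
    if "@" not in addr:
        return ""
    return addr.rpartition("@")[2].lower()


def strictly_aligned(from_header, authenticated_domain):
    """Compare only the domain; the local part is never examined."""
    domain = from_domain(from_header)
    return domain != "" and domain == authenticated_domain.lower()


# Two different mailboxes pass the same check: the receiver learns
# the domain is genuine, not which user actually wrote the message.
print(strictly_aligned("Alice <alice@example.com>", "example.com"))  # True
print(strictly_aligned("bob@example.com", "example.com"))            # True
print(strictly_aligned("alice@example.net", "example.com"))          # False
```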
| veverkap wrote:
| https://archive.ph/QCMjJ if it helps
| augusteo wrote:
| The irony of a site about AI opt-outs getting hammered by AI
| scrapers is almost too on the nose.
|
| trollbridge's point about scrapers using residential IPs and
| targeting authentication endpoints matches what we've seen. The
| scrapers have gotten sophisticated. They're not just crawling,
| they're probing.
|
| The economics are broken. Running a small site used to cost
| almost nothing. Now you need to either pay for CDN/protection or
| spend time playing whack-a-mole with bad actors.
|
| ronsor hosting a front-page HN project on 32MB RAM is impressive
| and also highlights how much bloat we've normalized. The scraper
| problem is real, but so is the software efficiency problem.
| wincy wrote:
| It's wild when I read a professional looking website like this
| and Conscious Digital misspells their own org name as "Consious
| Digital" in the first paragraph. I'm glad they're fighting
| against email spam but it just raises all sorts of red flags in
| my mind, or at least it used to.
|
| Funny enough, these days it indicates the article was written by
| a human. I had a dev join my team who made a few typos, and it
| gave me a chuckle, as it's a whole class of mistake I hadn't seen
| in a while.
| nabbed wrote:
| The "required login" pattern is particularly a problem. I seem to
| have namesakes around the US and UK that use my email address as
| their own when signing up for various services (mobile phone
| services, Shopify, Uber, various banks and investment firms,
| landscaper services, real estate services, home and car
| insurance, car repair shops, even _Silver Daddies_!!).
|
| I can't open an issue (to ask the service to remove my email)
| without logging in to an account I don't have control over.
|
| I don't want to use "forgot my password", because I don't want my
| IP address to be associated with a login to the account, because
| in some cases (particularly Shopify), the services were obviously
| used for fraud.
| Mordisquitos wrote:
| > _I can't open an issue (to ask the service to remove my
| email) without logging in to an account I don't have control
| over._
|
| > _I don't want to use "forgot my password", because I don't
| want my IP address to be associated with a login to the
| account_
|
| As a fellow victim of worldwide technically-illiterate
| namesakes, I used to do this using the TOR browser until I had
| a paid VPN service which is what I use now. Out of sheer
| paranoia, I always use a secondary browser profile while using
| a false userAgent extension.
| hilsdev wrote:
| I was pretty early to Gmail, I paid $5 for an invite to the
| beta, and secured my first(.)last@gmail.com. But now I pay for
| my own domain and my own hosted email just to avoid any
| collisions
| burnte wrote:
| So, they're trying to be an online privacy service for users but
| they require companies work in the way THEY want the companies to
| operate. This is not a serious organization I need to care about
| as a user or a service provider. They're just setting themselves
| up for failure by requiring the world around them to change.
| aklemm wrote:
| Their detailed explanation of compliance issues in the space is
| interesting and enlightening.
___________________________________________________________________
(page generated 2026-01-28 07:01 UTC)