[HN Gopher] The Web Is Broken - Botnet Part 2
       ___________________________________________________________________
        
       The Web Is Broken - Botnet Part 2
        
       Author : todsacerdoti
       Score  : 387 points
       Date   : 2025-04-19 18:59 UTC (1 day ago)
        
 (HTM) web link (jan.wildeboer.net)
 (TXT) w3m dump (jan.wildeboer.net)
        
       | api wrote:
       | This is nasty in other ways too. What happens when someone uses
       | these B2P residential proxies to commit crimes that get traced
       | back to you?
       | 
       | Anything incorporating anything like this is malware.
        
         | reconnecting wrote:
         | Many years ago, cybercriminals used to hack computers to use
         | them as residential proxies; now they simply purchase them
         | online as a service.
         | 
         | In most cases these proxies are used for conducting real
         | financial crimes, and police investigators are well aware
         | that there is a very low chance that sophisticated fraud is
         | being committed directly from a residential IP address.
        
       | kastden wrote:
       | Are there any lists with known c&c servers for these services
       | that can be added to Pihole/etc?
        
         | udev4096 wrote:
         | You can use one of the lists from here:
         | https://github.com/hagezi/dns-blocklists
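         | 
         | As a rough sanity check (a sketch in Python, assuming you have
         | already downloaded one of the plain-domain lists from that
         | repo as blocklist.txt), you can test hostnames from your own
         | logs against it:
         | 
         |     import sys
         | 
         |     # Load a hagezi plain-domain list (downloaded beforehand).
         |     with open("blocklist.txt") as f:
         |         blocked = {ln.strip().lower() for ln in f
         |                    if ln.strip() and not ln.startswith("#")}
         | 
         |     def is_blocked(host):
         |         # Match the host itself or any parent domain.
         |         parts = host.lower().rstrip(".").split(".")
         |         return any(".".join(parts[i:]) in blocked
         |                    for i in range(len(parts)))
         | 
         |     for host in sys.argv[1:]:
         |         print(host, is_blocked(host))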
        
       | Liftyee wrote:
       | I don't know if I should be surprised about what's described in
       | this article, given the current state of the world. Certainly I
       | didn't know about it before, and I agree with the article's
       | conclusion.
       | 
       | Personally, I think the "network sharing" software bundled with
       | apps should fall into the category of potentially unwanted
       | applications along with adware and spyware. All of the above "tag
       | along" with something the user DID want to install, and quietly
       | misuse the user's resources. Proxies like this definitely have an
       | impact for metered/slow connections - I'm tempted to start
       | Wireshark'ing my devices now to look for suspicious activity.
       | 
       | There should be a public repository of apps known to have these
       | shady behaviours. Having done some light web scraping for
       | archival/automation before, it's a pity that it'll become
       | collateral damage in the anti-AI-botfarm fight.
        
         | zzo38computer wrote:
         | I agree, this should be called spyware and malware. There are
         | many other kinds of software that should be as well, though
         | netcat and ncat (probably) aren't malware.
        
         | akoboldfrying wrote:
         | I agree, but the harm done to the users is only one part of the
         | total harm. I think it's quite plausible that many users
         | wouldn't mind some small amount of their bandwidth being used,
         | if it meant being able to use a handy browser extension that
         | they would otherwise have to pay actual dollars for -- but the
         | harm done to those running the servers remains.
        
       | arewethereyeta wrote:
       | I've had some success catching most of them at
       | https://visitorquery.com
        
         | lq9AJ8yrfs wrote:
         | I went to your website.
         | 
         | Is the premise that users should not be allowed to use vpns in
         | order to participate in ecommerce?
        
           | arewethereyeta wrote:
           | Nobody said that; it's your choice to take whatever action
           | fits your scenario. I have clients where VPNs are blocked,
           | yes - it depends on the industry, fraud rate, chargeback
           | rates, etc.
        
         | ivas wrote:
         | Checked my connection via VPN by Google/Cloudflare WARP:
         | "Proxy/VPN not detected"
        
           | arewethereyeta wrote:
           | Could be, I don't claim 100% success rate. I'll have a look
           | at one of those and see why I missed it. Thank you for
           | letting me know.
        
             | nickphx wrote:
             | Measuring latency between different endpoints? I see the
             | WebRTC TURN relay request...
        
       | karmanGO wrote:
       | Has anyone tried to compile a list of software that uses these
       | libraries? It would be great to know what apps to avoid
        
         | arewethereyeta wrote:
         | No, but here's the thing: having been in the industry for many
         | years, I know they are required to mention it in the TOS when
         | using these SDKs. A crawler pulling app TOSs and parsing them
         | could be a thing. List or not, it won't be too useful outside
         | this tech community.
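         | 
         | A rough sketch of that crawler idea (Python; the keyword list
         | and the example URL are placeholders, not a vetted signature
         | set):
         | 
         |     import re
         |     import requests  # third-party, pip install requests
         | 
         |     # Illustrative keywords only -- a real list needs curation.
         |     KEYWORDS = [
         |         r"bandwidth\s+sharing",
         |         r"network\s+sharing",
         |         r"residential\s+prox(y|ies)",
         |         r"peer[-\s]to[-\s]business",
         |         r"infatica",
         |     ]
         |     PATTERN = re.compile("|".join(KEYWORDS), re.IGNORECASE)
         | 
         |     def flag_tos(app_name, tos_url):
         |         text = requests.get(tos_url, timeout=30).text
         |         matches = PATTERN.finditer(text)
         |         hits = {m.group(0).lower() for m in matches}
         |         if hits:
         |             print(app_name, "-> possible proxy SDK:", hits)
         | 
         |     # flag_tos("SomeFreeApp", "https://example.com/terms")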
        
         | mzajc wrote:
         | In the case of Android, exodus has one[1], though I couldn't
         | find the malware library listed in TFA. Aurora Store[2], a FOSS
         | Google Play Store client, also integrates it.
         | 
         | [1] https://reports.exodus-privacy.eu.org/en/trackers/ [2]
         | https://f-droid.org/packages/com.aurora.store/
        
           | takluyver wrote:
           | That seems to be looking at tracking and data collection
           | libraries, though, for things like advertising and crash
           | reporting. I don't see any mention of the kind of 'network
           | sharing' libraries that this article is about. Have I missed
           | it?
        
         | lelanthran wrote:
         | > Has anyone tried to compile a list of software that uses
         | these libraries? It would be great to know what apps to avoid
         | 
         | I wouldn't mind reading a comprehensive report on SOTA with
         | regard to bot-blocking.
         | 
         | Sure, there's Anubis (although someone elsethread called it a
         | half-measure, and I'd like to know why), there are captchas,
         | there's relying on a monopoly (Cloudflare, etc.) that probably
         | also wants to run its own bots at some point, but what else is
         | there?
        
         | il-b wrote:
         | A good portion of free VPN apps sell their traffic. This was
         | a thing even before the AI bot explosion.
        
       | amiga-workbench wrote:
       | What is the point of app stores holding up releases for review if
       | they don't even catch obvious malware like this?
        
         | SoftTalker wrote:
         | Money
        
         | _Algernon_ wrote:
         | They pretend to do a review to justify their 30% cartel tax.
        
           | klabb3 wrote:
           | Oh no, they review thoroughly, to make sure you don't try to
           | avoid the tax.
        
         | politelemon wrote:
         | Their marketing tells you it's for protection. What they fail
         | to mention is that it's for _their_ revenue protection -
         | observe that as long as you do not threaten their revenue
         | models, or the revenue models of their partners, you are
         | allowed through. It has never been about the users or
         | developers.
        
         | charcircuit wrote:
         | The definition of malware is fuzzy.
        
         | wyck wrote:
         | This isn't obvious: 99% of apps make multiple calls to multiple
         | services, and these SDKs are embedded into the app. How can
         | you tell what's legit outbound/inbound? Doing a fingerprint
         | search for the worst culprits might help catch some, but it
         | would likely be a game of cat and mouse.
        
           | nottorp wrote:
           | > How can you tell what's legit outbound/inbound?
           | 
           | If the app isn't a web browser, none are legit?
        
       | vlan121 wrote:
       | when the shit hits the fan, this seems like the product.
        
       | ChrisMarshallNY wrote:
       | _> So if you as an app developer include such a 3rd party SDK in
       | your app to make some money -- you are part of the problem and I
       | think you should be held responsible for delivering malware to
       | your users, making them botnet members._
       | 
       | I suspect that this goes for _many_ different SDKs. Personally, I
       | am really, _really_ sick of hearing "That's a _solved_
       | problem!", whenever I mention that I tend to "roll my own," as
       | opposed to
       | including some dependency, recommended by some jargon-addled
       | dependency addict.
       | 
       | Bad actors _love_ the dependency addiction of modern developers,
       | and have learned to set some pretty clever traps.
        
         | duskwuff wrote:
         | That may be true but I think you're missing the point here.
         | 
         | The "network sharing" behavior in these SDKs is the sole
         | purpose of the SDK. It isn't being included as a surprise along
         | with some other desirable behavior. What needs to stop is
         | developers including these SDKs as a secondary revenue source
         | in free or ad-supported apps.
        
           | ChrisMarshallNY wrote:
           | _> I think you're missing the point here_
           | 
           | Doubt it. This is just one -of many- carrots that are used to
           | entice developers to include dodgy software into their apps.
           | 
           | The problem is a _lot_ bigger than these libraries. It's an
           | endemic cultural issue. Much more difficult to quantify or
           | fix.
        
         | sixtyj wrote:
         | Malware, botnets... it is all very similar. And people,
         | including developers, are - in 80 per cent of cases - eager to
         | make money, because... Is greed good? No, it isn't. It is a
         | plague.
        
           | II2II wrote:
           | You're a developer who devoted time to develop a piece of
           | software. You discover that you are not generating any income
           | from it: few people can even find it in the sea of similar
           | apps, few of those are willing to pay for it, and those who
           | are willing to pay for it are not willing to pay much. To
           | make matters worse, you're going to lose a cut of what is
           | paid to the middlemen who facilitate the transaction.
           | 
           | Is that greed?
           | 
           | I can find many reasons to be critical of that developer,
           | things like creating a product for a market segment that is
           | saturated, and likely doing so because it is low hanging
           | fruit (both conceptually and in terms of complexity). I can
           | be critical of their moral judgement for how they decided to
           | generate income from their poor business judgment. But I
           | don't think it's right to automatically label them as
           | greedy. They _may_ be greedy, but they may also be trying to
           | generate income from their work.
        
             | andelink wrote:
             | > Is that greed?
             | 
             | Umm, yes? You are not owed anything in this life, certainly
             | not income for your choice to spend your time on building a
             | software product no one asked for. Not making money on it
             | is a perfectly fine outcome. If you desperately need
             | guaranteed money, don't build an app expecting it to sell;
             | get a job.
        
               | klabb3 wrote:
               | > If you desperately need guaranteed money, don't build
               | an app expecting it to sell; get a job.
               | 
               | Technically true but a bit of perspective might help. The
               | consumer market is distorted by free (as in beer) apps
               | that do a bunch of shitty things that should in many
               | cases be illegal or require much more informed consent
               | than today, like tracking everything they can. Then you
               | have VC funded "free" as well, where the end game is to
               | raise prices slowly to boil the frog. Then you have loss
               | leaders from megacorps, and a general anti-competitive
               | business culture.
               | 
               | Plus, this is not just in the Wild West shady places,
               | like the old piratebay ads. The top result for "timer" on
               | the App Store (for me) is indeed a timer app, but with
               | IAP of $800/y subscription... facilitated by Apple Inc,
               | who gets 15-30% of the bounty.
               | 
               | Look, the point is it's almost impossible to break into
               | consumer markets because everyone else is a predator.
               | It's a race to the bottom, ripping off clueless
               | customers. Everyone would benefit from a fairer market.
               | Especially honest developers.
        
               | what wrote:
               | >$800/year IAP
               | 
               | That's got to be money laundering or something else
               | illicit? No one is actually paying that for a timer app?
        
               | klabb3 wrote:
               | No I think it's designed to catch misclicks and children
               | operating the phone and such, sold as $17/week possibly
               | masquerading as one-time payment. They pay for App Store
               | ads for it too.
        
               | econ wrote:
               | I prefer to focus on the technical shortcomings.
               | 
               | We could have people ask for software in a more
               | convenient way.
               | 
               | Not making money could be an indication the software
               | isn't useful, but what if it is? What can the collective
               | do in that zone?
               | 
               | I imagine one could ask and pay for unwritten software
               | then get a refund if it doesn't materialize before your
               | deadline.
               | 
               | Why is discovery (of many creations) willingly handed over
               | to a handful of megacorps? They seem to think I want
               | to watch and read about Trump and Elon every day.
               | 
               | Promoting something because it is good is a great example
               | of a good thing that shouldn't pay.
        
           | hliyan wrote:
           | There was an earlier discussion on HN about whether
           | advertising should be more heavily regulated (or even banned
           | outright). I'm starting to wonder whether most of the
           | problems on the Web are negative side effects of the
           | incentives created by ads (including all botnets, except
           | those that enable ransomware and espionage). Even the
           | current worldwide dopamine addiction is driven by apps and
           | content created for engagement, whose entire purpose is ad
           | revenue.
        
         | rsedgwick wrote:
         | "Bad actors love the dependency addiction of modern developers"
         | 
         | Brings a new meaning to dependency injection.
        
           | rapind wrote:
           | I mean, as far as patterns go, dependency injection is also
           | quite bad.
        
             | rjbwork wrote:
             | Elaborate on this please. It seems a great boon in having
             | pushed the OO world towards more functional principles, but
             | I'm willing to hear dissent.
        
               | layer8 wrote:
               | How is dependency injection more functional?
               | 
               | My personal beef is that most of the time it acts like
               | hidden global dependencies, and the configuration of
               | those dependencies, along with their lifetimes, becomes
               | harder to understand by not being traceable in the source
               | code.
        
               | kortilla wrote:
               | Because you're passing functions to call.
        
               | layer8 wrote:
               | ??? What functions?
               | 
               | To me it's rather anti-functional. Normally, when you
               | instantiate a class, the resulting object's behavior only
               | depends on the constructor arguments you pass it (= the
               | behavior is purely a function of the arguments). With
               | dependency injection, the object's behavior may depend on
               | some hidden configuration, and not even inspecting the
               | class' source code will be able to tell you the source of
               | that behavior, because there's only an _@Inject_
               | annotation without any further information.
               | 
               | Conversely, when you modify the configuration of which
               | implementation gets injected for which interface type,
               | you potentially modify the behavior of many places in the
               | code (including, potentially, the behavior of
               | dependencies your project may have), without having
               | passed that code any arguments to that effect. A function
               | executing that code suddenly behaves differently, without
               | any indication of that difference at the call site, or
               | traceable from the call site. That's the opposite of the
               | functional paradigm.
        
               | squeaky-clean wrote:
               | > because there's only an @Inject annotation without any
               | further information
               | 
               | It sounds like you have a gripe with a particular DI
               | framework and not the idea of Dependency Injection.
               | Because
               | 
               | > Normally, when you instantiate a class, the resulting
               | object's behavior only depends on the constructor
               | arguments you pass it (= the behavior is purely a
               | function of the arguments)
               | 
               | With Dependency Injection this is generally still true,
               | even more so than normal because you're making the
               | constructor's dependencies explicit in the arguments. If
               | you have a class CriticalErrorLogger(), you can't
               | directly tell where it logs to, is it using a flat file
               | or stdout or a network logger? If you instead have a
               | class CriticalErrorLogger(logger *io.writer), then when
               | you create it you know exactly what it's using to log
               | because you had to instantiate it and pass it in.
               | 
               | Or like Kortilla said, instead of passing in a class or
               | struct you can pass in a function, so using the same
               | example, something like CriticalErrorLogger(fn write)
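               | 
               | In Python terms, a minimal sketch of that constructor-
               | injection idea (the names are just for illustration,
               | echoing the example above):
               | 
               |     import sys
               |     from typing import Protocol
               | 
               |     class Writer(Protocol):
               |         def write(self, message: str) -> None: ...
               | 
               |     class CriticalErrorLogger:
               |         # The sink is injected; the logger doesn't know
               |         # whether that's stdout, a file or a socket.
               |         def __init__(self, writer: Writer) -> None:
               |             self.writer = writer
               | 
               |         def log(self, message: str) -> None:
               |             self.writer.write(f"CRITICAL: {message}\n")
               | 
               |     # The dependency is visible right at the call site:
               |     stdout_logger = CriticalErrorLogger(sys.stdout)
               |     file_logger = CriticalErrorLogger(open("err.log", "a"))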
        
               | layer8 wrote:
               | I don't quite understand your example, but I don't think
               | the particulars make much of a difference. We can go with
               | the most general description: With dependency injection,
               | you define points in your code where dependencies are
               | injected. The injection point is usually a variable (this
               | includes the case of constructor parameters), whose value
               | (the dependency) will be set by the dependency injection
               | framework. The behavior of the code that reads the
               | variable and hence the injected value will then depend on
               | the specific value that was injected.
               | 
               | My issue with that is this: From the point of view of the
               | code accessing the injected value (and from the point of
               | view of that code's callers), the value appears like out
               | of thin air. There is no way to trace back from that code
               | where the value came from. Similarly, when defining which
               | value will be injected, it can be difficult to trace all
               | the places where it will be injected.
               | 
               | In addition, there are often lifetime issues involved,
               | when the injected value is itself a stateful object, or
               | may indirectly depend on mutable, cached, or lazy-
               | initialized, possibly external state. The time when the
               | value's internal state is initialized or modified, or
               | whether or not it is shared between separate injection
               | points, is something that can't be deduced from the
               | source code containing the injection points, but is often
               | relevant for behavior, error handling, and general
               | reasoning about the code.
               | 
               | All of this makes it more difficult to reason about the
               | injected values, and about the code whose behavior will
               | depend on those values, from looking at the source code.
        
               | squeaky-clean wrote:
               | > whose value (the dependency) will be set by the
               | dependency injection framework
               | 
               | I agree with your definition except for this part, you
               | don't need any framework to do dependency injection. It's
               | simply the idea that instead of having an abstract base
               | class CriticalErrorLogger, with the concrete
               | implementations of StdOutCriticalErrorLogger,
               | FileCriticalErrorLogger, AwsCloudwatchCriticalErrorLogger
               | which bake their dependency into the class design; you
               | instead have a concrete class CriticalErrorLogger(dep
               | *dependency) and create dependency objects externally
               | that implement identical interfaces in different ways.
               | You do text formatting, generating a traceback, etc, and
               | then call dep.write(myFormattedLogString), and the
               | dependency handles whatever that means.
               | 
               | I agree with you that most DI frameworks are too clever
               | and hide too much, and some forms of DI like setter
               | injection and reflection based injection are instant
               | spaghetti code generators. But things like Constructor
               | Injection or Method Injection are so simple they often
               | feel obvious and not like Dependency Injection even
               | though they are. I love DI, but I hate DI frameworks;
               | I've never seen a benefit except for retrofitting legacy
               | code with DI.
               | 
               | And yeah, it does add the issue of lifetime management.
               | That's an easy place to F things up in your code using DI
               | and requires careful thought in some circumstances. I
               | can't argue against that.
               | 
               | But DI doesn't need frameworks or magic methods or
               | attributes to work. And there's a lot of situations where
               | DI reduces code duplication, makes refactoring and
               | testing easier, and actually makes code feel less magical
               | than using internal dependencies.
               | 
               | The basic principle is much simpler than most DI
               | frameworks make it seem. Instead of initializing a
               | dependency internally, receive the dependency in some
               | way. It can be through overly abstracted layers or magic
               | methods, but it can also be as simple as adding an
               | argument to the constructor or a given method that takes
               | a reference to the dependency and uses that.
               | 
               | edit: made some examples less ambiguous
        
               | layer8 wrote:
               | The pattern you are describing is what I know as the
               | Strategy pattern [0]. See the example there with the
               | _Car_ class that takes a _BrakeBehavior_ as a constructor
               | parameter [1]. I have no issue with that and use it
               | regularly. The Strategy pattern precedes the notion of
               | dependency injection by around ten years.
               | 
               | The term Dependency Injection was coined by Martin Fowler
               | with this article:
               | https://martinfowler.com/articles/injection.html. See how
               | it presents the examples in terms of wiring up components
               | from a configuration, and how it concludes with stressing
               | the importance of "the principle of separating service
               | configuration from the use of services within an
               | application". The article also presents constructor
               | injection as only one of several forms of dependency
               | injection.
               | 
               | That is how everyone understood dependency injection when
               | it became popular 10-20 years ago: A way to customize
               | behavior at the top application/deployment level by
               | configuration, without having to pass arguments around
               | throughout half the code base to the final object that
               | uses them.
               | 
               | Apparently there has been a divergence of how the term is
               | being understood.
               | 
               | [0] https://en.wikipedia.org/wiki/Strategy_pattern
               | 
               | [1] The fact that _Car_ is abstract in the example is
               | immaterial to the pattern, and a bit unfortunate in the
               | Wikipedia article, from a didactic point of view.
        
               | squeaky-clean wrote:
               | They're not really exclusive ideas. The Constructor
               | Injection section in Fowler's article is exactly the same
               | as the Strategy pattern. But no one talks about the
               | Strategy pattern anymore, it's all wrapped into the idea
               | of DI and that's what caught on.
        
               | morsecodist wrote:
               | It was interesting reading this exchange. I have a
               | similar understanding of DI to you. I have never even
               | heard of a DI framework and I have trouble picturing what
               | it would look like. It was interesting to watch you two
               | converge on where the disconnect was.
        
               | rjbwork wrote:
               | Usually when people refer to "DI Frameworks" they're
               | referring to Inversion of Control (IoC) containers.
        
               | layer8 wrote:
               | I'm curious, which language/dev communities did you pick
               | this up from? Because I don't think it's universal,
               | certainly not in the Java world.
               | 
               | DI in Java is almost completely disconnected from what
               | the Strategy pattern is, so it doesn't make sense to use
               | one to refer to the other there.
        
               | naasking wrote:
               | How is the configuration hidden? Presumably you
               | configured the DI container.
        
               | rjbwork wrote:
               | Dependency injection is just passing your dependencies in
               | as constructor arguments rather than as hidden
               | dependencies that the class itself creates and manages.
               | 
               | It's equivalent to partial application.
               | 
               | An uninstantiated class that follows the dependency
               | injection pattern is equivalent to a family of functions
               | with N+Mk arguments, where Mk is the number of parameters
               | in method k.
               | 
               | Upon instantiation by passing constructor arguments,
               | you've created a family of functions each with a distinct
               | sets of Mk parameters, and N arguments in common.
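               | 
               | A small Python sketch of that equivalence (toy names,
               | just to make the N and Mk concrete):
               | 
               |     from functools import partial
               | 
               |     # DI view: N = 1 constructor argument,
               |     # Mk = 1 method argument.
               |     class Greeter:
               |         def __init__(self, greeting: str):
               |             self.greeting = greeting
               |         def greet(self, name: str) -> str:
               |             return f"{self.greeting}, {name}!"
               | 
               |     # Plain-function view of the same thing.
               |     def greet(greeting: str, name: str) -> str:
               |         return f"{greeting}, {name}!"
               | 
               |     g = Greeter("Hello")         # inject up front
               |     p = partial(greet, "Hello")  # partially apply
               |     assert g.greet("world") == p("world")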
        
               | theteapot wrote:
               | > Dependency injection is just passing your dependencies
               | in as constructor arguments rather than as hidden
               | dependencies that the class itself creates and manages.
               | 
               | That's the best way to think of it fundamentally. But the
               | main implication of that is that at some point _something_
               | has to know how to resolve those dependencies - i.e. they
               | can't just be constructed and then injected from magic
               | land. So global
               | cradles/resolvers/containers/injectors/providers
               | (depending on your language and framework) are also
               | typically part and parcel of DI, and that can have some
               | big implications on the structure of your code that some
               | people don't like. Also you can inject functions and
               | methods not just constructors.
        
               | rjbwork wrote:
               | That's because those containers are convenient to use. If
               | you don't like using them, you can configure the entire
               | application statically from your program's entry point if
               | you prefer.
        
               | layer8 wrote:
               | I don't understand what you're describing has to do with
               | dependency injection. See
               | https://news.ycombinator.com/item?id=43740196.
        
               | KronisLV wrote:
               | > Dependency injection is just passing your dependencies
               | in as constructor arguments rather than as hidden
               | dependencies that the class itself creates and manages.
               | 
               | This is all well and good, but you also need a bunch of
               | code that handles resolving those dependencies, which
               | oftentimes ends up being complex and hard to debug and
               | will also cause runtime errors instead of compile time
               | errors, which I find to be more or less unacceptable.
               | 
               | Edit: to elaborate on this, I've seen DI frameworks _not_
               | be used in "enterprise" projects a grand total of _zero_
               | times. I've done DI directly in personal projects and it
               | was fine, but in most cases you don't get to make that
               | choice.
               | 
               | Just last week, when working on a Java project that's
               | been around for a decade or so, there were issues after
               | migrating it from Spring to Spring Boot - when compiled
               | through the IDE and with the configuration to allow lazy
               | dependency resolution it would work (too many circular
               | dependencies to change the code instead), but when built
               | within a container by Maven that same exact code and
               | configuration would no longer work and injection would
               | fail.
               | 
               | I'm hoping it's not one of those weird JDK platform bugs
               | but rather an issue with how the codebase is compiled
               | during the container image build, but the issue is mind
               | boggling. More fun, if you take the .jar that's built in
               | the IDE and put it in the container, then everything
               | works, otherwise it doesn't. No compilation warnings,
               | most of the startup is fine, but if you build it in the
               | container, you get a DI runtime error about no lazy
               | resolution being enabled even if you hardcode the setting
               | to be on in Java code: https://docs.spring.io/spring-
               | boot/api/kotlin/spring-boot-pr...
               | 
               | I've also seen similar issues before containers, where
               | locally it would run on Jetty and use Tomcat on server
               | environments, leading to everything compiling and working
               | locally but throwing injection errors on the server.
               | 
               | What's more, it's not like you can (easily) put a
               | breakpoint on whatever is trying to inject the
               | dependencies - after years of Java and Spring I grow more
               | and more convinced that anything that doesn't generate
               | code that you can inspect directly (e.g. how you can look
               | at a generated MapStruct mapper implementation) is
               | somewhat user hostile and will complicate things. At
               | least modern Spring Boot is good in that more of the
               | configuration is just code, because otherwise good luck
               | debugging why some XML configuration is acting weird.
               | 
               | In other words, DI can make things more messy due to a
               | bunch of technical factors around how it's implemented
               | (also good luck reading those stack traces), albeit even
               | in the case of Java something like Dagger feels more sane
               | https://dagger.dev/ despite never really catching on.
               | 
               | Of course, one could say that circular dependencies or
               | configuration issues are project specific, but given
               | enough time and projects you will almost inevitably get
               | those sorts of headaches. So while the theory of DI is
               | nice, you can't just have the theory without practice.
        
               | vbezhenar wrote:
               | Dependency injection is not hidden. It's quite the
               | opposite: dependency injection lists explicitly all the
               | dependencies in a well defined place.
               | 
               | Hidden dependencies are: untyped context variable; global
               | "service registry", etc. Those are hidden, the only way
               | to find out which dependencies a given module has is to
               | carefully read its code and code of all called functions.
        
               | hliyan wrote:
               | Inclined to agree. Consider that a singleton dependency
               | is essentially a global, and differs from a traditional
               | global, only in that the reference is kept in a container
               | and supplied magically via a constructor variable. Also
               | consider that constructor calls are now outside the
               | application layer frames of the callstack, in case you
               | want to trace execution.
        
               | rapind wrote:
               | It starts off feeling like a superpower, allowing you to
               | change a system's behaviour without changing its code
               | directly. It quickly devolves into a maintenance
               | nightmare though, every time I've encountered it.
               | 
               | I'm talking more specifically about Aspect Oriented
               | Programming though and DI containers in OOP, which seemed
               | pretty clever in theory, but have a lot of issues in
               | reality.
               | 
               | I take no issues with currying in functional programming.
        
               | rjbwork wrote:
               | In terms of aspects I try to keep it limited to already
               | existing framework touch points for things like logging,
               | authentication and configuration loading. I find that
               | writing middleware that you control with declarative
               | attributes can be good for those use cases.
               | 
               | There are other good uses of it but it absolutely can get
               | out of control, especially if implemented by someone
               | who's just discovered it and wants to use it for
               | everything.
        
             | ironSkillet wrote:
             | I have found that the dependency injection pattern makes it
             | far easier to write clean tests for my code.
        
         | ryandrake wrote:
         | I'm constantly amazed at how careless developers are with
         | pulling 3rd party libraries into their code. Have you audited
         | this code? Do you know everything it does? Do you know what
         | security vulnerabilities exist in it? On what basis do you
         | trust it to do what it says it is doing and nothing else?
         | 
         | But nobody seems to do this diligence. It's just "we are in a
         | rush. we need X. dependency does X. let's use X." and that's
         | it!
        
           | ClumsyPilot wrote:
           | > Have you audited this code?
           | 
           | Wrong question. "Are you paid to audit this code?" And "if
           | you fail to audit this code, who'se problem is it?"
        
             | ryandrake wrote:
             | I think developers are paid to competently deliver software
             | to their employer, and part of that competence is properly
             | vetting the code you are delivering. If I wrote code that
             | ended up having serious bugs like crashing, I'd expect to
             | have at least a minimum consequence, like root causing it
             | and/or writing a postmortem to help avoid it in the future.
             | Same as I'd expect if I pulled in a bad dependency.
        
               | baumy wrote:
               | Your expectations do not match the employment market as I
               | have ever experienced it.
               | 
               | Have you ever worked anywhere that said "go ahead and
               | slow down on delivering product features that drive
               | business value so you can audit the code of your
               | dependencies, that's fine, we'll wait"?
               | 
               | I haven't.
        
               | ryandrake wrote:
               | Yea, and that's the problem. If such absolute rock bottom
               | minimal expectations (know what the code does) are seen
               | as too slow and onerous, the industry is cooked!
        
               | ClumsyPilot wrote:
               | Yeah, about that, businesses are pushing and introducing
               | code written by AI/LLM now, so now you won't even know
               | what your own code does.
        
               | djeastm wrote:
               | Due diligence is a sliding scale. Work at a webdev agency
               | is "get it done as fast as possible for this MVP we
               | need". Work at NASA or a biomedical device company? Every
               | line of code is triple-checked. It's entirely dependent
               | on the cost/benefit analysis.
        
             | Funes- wrote:
             | "who'se" is wild.
        
             | SoftTalker wrote:
             | If a car manufacturer sources a part from a third party,
             | and that part has a serious safety problem, who will the
             | customer blame? And who will be responsible for the recall
             | and the repairs?
        
               | ClumsyPilot wrote:
               | But we aren't in the car business, we are in the joker
               | business.
               | 
               | When was the last time the producer of an app was held
               | legally accountable for negligence, had to pay
               | compensation and damages, etc.?
        
         | vinnymac wrote:
         | This is especially true for script kiddies, which is why I am
         | so thankful for https://e18e.dev/
         | 
         | AI is making this worse than ever though, I am constantly
         | having to tell devs that their work is failing to meet
         | requirements, because AI is just as bad as a junior dev when it
         | comes to reaching for a dependency. It's like we need training
         | wheels for the prompts juniors are allowed to write.
        
         | zzo38computer wrote:
         | I agree that there are things with too many dependencies and I
         | try to avoid that. I think it is a good idea to minimize how
         | many dependencies are needed (even indirect dependencies;
         | however, in some cases a dependency is not a specific
         | implementation, and in that case indirect dependencies are less
         | of a problem, although having a good implementation with less
         | indirect dependencies is still beneficial). I may write my own,
         | in many cases. However, another reason for writing my own is
         | because of other kind of problems in the existing programs. Not
         | all problems are malicious; many are just that they do not do
         | what I need, or do too much more than what I need, or both.
         | (However, most of my stuff is C rather than JavaScript; the
         | problem seems to be more severe with JavaScript, but I do not
         | use that much.)
        
         | bloppe wrote:
         | These are kind of separate issues. Apps using Infatica _know_
         | that they're selling access to their users' bandwidth. It's
         | intentional.
        
       | jonplackett wrote:
       | How is this not just illegal? Surely there's something in GDPR
       | that makes this not allowed.
        
         | Retr0id wrote:
         | iiuc, they do actually ask the user for permission
        
           | fc417fc802 wrote:
           | Which is ironic considering that I strongly disagree with one
           | of the primary walled garden justifications, used
           | particularly in the case of Apple, which amounts to "the end
           | user is too stupid to decide on his own". Unfortunately, even
           | if I disagree with it as a guiding principle sometimes that
           | statement proves true.
        
             | klabb3 wrote:
             | It's not about stupidity, but practicality. People can't
             | give informed consent for 100 ToS for different companies,
             | and keep those up to date. That's why there are laws.
        
           | SoftTalker wrote:
           | No doubt in a dense wall of text that the user must accept to
           | use the application, or worse is deemed to have accepted by
           | using the application at all.
        
       | zahlman wrote:
       | > I am now of the opinion that every form of web-scraping should
       | be considered abusive behaviour and web servers should block all
       | of them. If you think your web-scraping is acceptable behaviour,
       | you can thank these shady companies and the "AI" hype for moving
       | you to the bad corner.
       | 
       | I imagine that e.g. Youtube would be happy to agree with this.
       | Not that it would turn them against AI generally.
        
         | BlueTemplar wrote:
         | Yeah, also this means the death of archival efforts like the
         | Internet Archive.
        
           | jeroenhd wrote:
           | Welcome scrapers (IA, maybe Google and Bing) can publish
           | their IP addresses and get whitelisted. Websites that want to
           | prevent being on the Internet Archive can pretty much just
           | ask for their website to be excluded (even retroactively).
           | 
           | [Cloudflare](https://developers.cloudflare.com/cache/troubles
           | hooting/alwa...) tags the internet archive as operating from
           | 207.241.224.0/20 and 208.70.24.0/21 so disabling the bot-
           | prevention framework on connections from there should be
           | enough.
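           | 
           | A minimal sketch of that whitelist check (Python; the two
           | CIDR ranges are the ones Cloudflare attributes to the
           | Internet Archive, the rest is illustrative):
           | 
           |     from ipaddress import ip_address, ip_network
           | 
           |     # Ranges Cloudflare lists for the Internet Archive.
           |     ARCHIVE_NETS = [ip_network("207.241.224.0/20"),
           |                     ip_network("208.70.24.0/21")]
           | 
           |     def skip_bot_checks(remote_addr):
           |         # True for requests from a whitelisted scraper.
           |         addr = ip_address(remote_addr)
           |         return any(addr in net for net in ARCHIVE_NETS)
           | 
           |     print(skip_bot_checks("207.241.229.10"))  # True
           |     print(skip_bot_checks("203.0.113.7"))     # False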
        
             | trinsic2 wrote:
             | This sounds like it would be a good idea. Create a
             | whitelist of IPs and block the rest.
        
             | realusername wrote:
             | That's basically asking to close the market in favor of the
             | current actors.
             | 
             | New actors have the right to emerge.
        
               | 0dayz wrote:
               | No they don't.
               | 
               | There's no rule that you have to let anyone in who claims
               | to be a web crawler.
        
               | areyourllySorry wrote:
               | which is why they will stop claiming to be one.
        
               | chii wrote:
               | so what happened to competition fostering a better
               | outcome for all then?
        
               | realusername wrote:
               | So who decides that you can be one? Right now it's
               | Cloudflare, a literal monopoly...
               | 
               | The truth is that I sympathize with the people trying to
               | use mobile connections to bypass such a cartel.
               | 
               | What Cloudflare is doing now is worse than the web
               | crawlers themselves and the legality of blocking crawlers
               | with a monopoly is dubious at best.
        
               | jeroenhd wrote:
               | They have the right to try to convince me to let them
               | scrape me. Most of the time they're thinly veiled data
               | traders. I haven't seen any new company try to scrape my
               | stuff since maybe Kagi.
               | 
               | Kagi is welcome to scrape from their IP addresses. Other
               | bots that behave are fine too (Huawei and various other
               | Chinese bots don't and I've had to put an IP block on
               | those).
        
             | areyourllySorry wrote:
             | A large chunk of the Internet Archive's snapshots are from
             | ArchiveTeam, where "warriors" bring their own IPs (and they
             | crawl respectfully!). Save Page Now is important too, but
             | you don't realise what is useful until you lose it.
        
         | Centigonal wrote:
         | yeah, but you can't, that's the problem. Plenty of service
         | operators would like to block every scraper that doesn't obey
         | their robots.txt, but there's no good way to do that without
         | blocking human traffic too (Anubis et al are okay, but they are
         | half-measures).
         | 
         | On a separate note, I believe open web scraping has been a
         | massive benefit to the internet on net, and almost entirely
         | positive pre-2021. Web scraping & crawling enables search
         | engines, services like Internet Archive, walled-garden-busting
         | (like Invidious, yt-dlp, and Nitter), mashups (Spotube, IFTTT,
         | and Plaid would have been impossible to bootstrap without web
         | scraping), and all kinds of interesting data science projects
         | (e.g. scraping COVID-19 stats from local health departments to
         | patch together a picture of viral spread for epidemiologists).
        
           | udev4096 wrote:
           | We should have a way to verify the user-agents of valid and
           | useful scrapers such as the Internet Archive. Having some
           | kind of cryptographic signature of their user-agents, which
           | any reverse proxy can validate, seems like a good start.
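           | 
           | One way that could look (a sketch using Ed25519 via the
           | Python cryptography package; the scheme and the key
           | distribution are assumptions, not an existing standard):
           | 
           |     import base64
           |     from cryptography.exceptions import InvalidSignature
           |     from cryptography.hazmat.primitives.asymmetric import ed25519
           | 
           |     def verify_scraper_ua(pubkey_b64, user_agent, sig_b64):
           |         # pubkey_b64 is published out of band by the
           |         # scraper operator (on their site, or in DNS).
           |         try:
           |             key = ed25519.Ed25519PublicKey.from_public_bytes(
           |                 base64.b64decode(pubkey_b64))
           |             key.verify(base64.b64decode(sig_b64),
           |                        user_agent.encode())
           |             return True
           |         except (InvalidSignature, ValueError):
           |             return False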
        
             | nottorp wrote:
             | Self signed, I hope.
             | 
             | Or do you want a central authority that decides who can do
             | new search engines?
        
               | udev4096 wrote:
               | Using DANE is probably the best idea even though it's
               | still not mainstream
        
           | lelanthran wrote:
           | > Plenty of service operators would like to block every
           | scraper that doesn't obey their robots.txt, but there's no
           | good way to do that without blocking human traffic too
           | (Anubis et al are okay, but they are half-measures)
           | 
           | Why are Anubis-type mitigations a half-measure?
        
             | Centigonal wrote:
             | Anubis, go-away, etc are great, don't get me wrong -- but
             | what Anubis does is impose a cost on every query. The
             | website operator is hoping that the compute will have a
             | rate-limiting effect on scrapers while minimally impacting
             | the user experience. It's almost like chemotherapy, in that
             | you're poisoning everyone in the hope that the aggressive
             | bad actors will be more severely affected than the less
             | aggressive good actors. Even the Anubis readme calls it a
             | nuclear option. In practice it appears to work pretty well,
             | which is great!
             | 
             | It's a half-measure because:
             | 
             | 1. You're slowing down scrapers, not blocking them. They
             | will still scrape your site content in violation of
             | robots.txt.
             | 
             | 2. Scrapers with more compute than IP proxies will not be
             | significantly bottlenecked by this.
             | 
             | 3. This may lead to an arms race where AI companies respond
             | by beefing up their scraping infrastructure, necessitating
             | more difficult PoW challenges, and so on. The end result of
             | this hypothetical would be a more inconvenient and
             | inefficient internet for everyone, including human users.
             | 
             | To be clear: I think Anubis is a great tool for website
             | operators, and one of the best self-hostable options
             | available today. However, it's a workaround for the core
             | problem that we can't reliably distinguish traffic from
             | badly behaving AI scrapers from legitimate user traffic.
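             | 
             | For reference, the cost being imposed is roughly this (a
             | toy hashcash-style sketch in Python, not Anubis's actual
             | implementation):
             | 
             |     import hashlib
             | 
             |     def pow_value(challenge, nonce):
             |         h = hashlib.sha256(f"{challenge}{nonce}".encode())
             |         return int.from_bytes(h.digest(), "big")
             | 
             |     def solve(challenge, bits):
             |         # Client burns CPU until the hash has `bits`
             |         # leading zero bits (~2**bits attempts on average).
             |         target = 1 << (256 - bits)
             |         nonce = 0
             |         while pow_value(challenge, nonce) >= target:
             |             nonce += 1
             |         return nonce
             | 
             |     def verify(challenge, nonce, bits):
             |         # Server checks the claim with a single hash.
             |         target = 1 << (256 - bits)
             |         return pow_value(challenge, nonce) < target
             | 
             |     n = solve("visitor-challenge", 20)  # roughly 1M hashes
             |     assert verify("visitor-challenge", n, 20)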
        
       | pton_xd wrote:
       | I thought the closed-garden app stores were supposed to protect
       | us from this sort of thing?
        
         | whstl wrote:
         | Once again this demonstrates that closed gardens only benefit
         | the owners of the garden, and not the users.
         | 
         | What good is all the app vetting and sandbox protection in iOS
         | (dunno about Android) if it doesn't really protect me from
         | those crappy apps...
        
           | 20after4 wrote:
           | At the very least, Apple should require conspicuous
           | disclosure of this kind of behavior that isn't just hidden in
           | the TOS.
        
           | BlueTemplar wrote:
           | Also my reaction when the call is for Google, Apple, or
           | Microsoft to fix this: DDoS being illegal, shouldn't the
           | first reaction instead be to contact law enforcement?
           | 
           | If you treat platforms like they are all-powerful, then
           | that's what they are likely to become...
        
           | musicale wrote:
           | Sandboxing means you can limit network access. For example,
           | on Android you can disallow wi-fi and cellular access (not
           | sure about bluetooth) on a per-app basis.
           | 
           | Network access settings should really be more granular for
           | apps that have a legitimate need.
           | 
           | App store disclosure labels should also add network usage
           | disclosure.
        
         | 20after4 wrote:
         | That's what they want you to think.
        
         | kibwen wrote:
         | If you find yourself in a walled garden, understand that you're
         | the crop being grown and harvested.
        
       | jt2190 wrote:
       | I'm really struggling to understand how this is different than
       | malware we've had forever. Can someone explain what's novel about
       | this?
        
         | desertmonad wrote:
         | That it's _not_ being treated like malware.
        
           | jt2190 wrote:
           | In the sense that people are voluntarily installing and
           | running this malware on their computers, rather than being
           | _tricked_ into running it? Is that the only difference?
        
             | int_19h wrote:
             | They are still tricked into running it, since it's normally
             | not an advertised "feature" of any app that uses such SDKs.
        
         | downrightmike wrote:
         | I think it is funny that the mobile OS is trying to be as
         | secure as possible, but then they allow this to run on top of
         | it.
        
       | rsedgwick wrote:
       | I think tech can still be beautiful in a less grandiose and
       | "omniparadisical" way than people used to dream of. "A wide open
       | internet, free as in speech this, free as in beer that, open
       | source wonders, open gardens..." Well, there are a lot of
       | incentives that fight that, and game theory wins. Maybe we
       | download software dependencies from our friends, the ones we
       | actually trust. Maybe we write more code ourselves--more
       | homesteading families that raise their own chickens, jar their
       | own pickled carrots, and code their own networking utilities.
       | Maybe we operate on servers we own, or our friends own, and we
       | don't get blindsided by news that the platforms are selling our
       | data and scraping it for training.
       | 
       | Maybe it's less convenient and more expensive and onerous. Do
       | good things require hard work? Or did we expect everyone to
       | ignore incentives forever while the trillion-dollar hyperscalers
       | fought for an open and noble internet and then wrapped it in
       | affordable consumer products to our delight?
       | 
       | It reminds me of the post here a few weeks ago about how Netflix
       | used to be good and "maybe I want a faster horse" - we want
       | things to be built for us, easily, cheaply, conveniently, by
       | companies, and we want those companies not to succumb to
       | enshittification - but somehow when the companies just follow the
       | game theory and turn everything into a TikToky neural-networks-
       | maximizing-engagement-infinite-scroll-experience, it's their
       | fault, and not ours for going with the easy path while hoping the
       | corporations would not take the easy path.
        
       | reconnecting wrote:
       | Residential IP proxies have some weaknesses. One is that they
       | often change IP addresses during a single web session. Second,
       | if IPs come from the same proxy provider, they are often
       | concentrated within a single ASN, making them easier to detect.
       | 
       | We are working on an open-source fraud prevention platform [1],
       | and detecting fake users coming from residential proxies is one
       | of its use cases.
       | 
       | [1] https://www.github.com/tirrenotechnologies/tirreno
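       | 
       | A toy sketch of that first signal (Python, IPv4 only; flagging
       | sessions whose requests bounce back and forth between networks,
       | while real scoring would use ASN data and more context):
       | 
       |     def networks(session_ips):
       |         # Collapse request IPs to /24s, keeping the hop order.
       |         hops = []
       |         for ip in session_ips:
       |             net = ".".join(ip.split(".")[:3]) + ".0/24"
       |             if not hops or hops[-1] != net:
       |                 hops.append(net)
       |         return hops
       | 
       |     def looks_like_rotating_proxy(session_ips):
       |         # A human session rarely hops between networks and back;
       |         # rotating residential proxies do it constantly.
       |         hops = networks(session_ips)
       |         return len(hops) > 2 and len(set(hops)) < len(hops)
       | 
       |     ips = ["203.0.113.5"] * 10 + ["198.51.100.7", "203.0.113.9"]
       |     print(looks_like_rotating_proxy(ips))  # True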
        
         | gbcfghhjj wrote:
         | At least here in the US most residential ISPs have long leases
         | and change infrequently, weeks or months.
         | 
         | Trying to understand your product, where is it intended to sit
         | in a network? Is it a standalone tool that you use to identify
         | these IPs and feed into something else for blockage or is it
         | intended to be integrated into your existing site or is it
         | supposed to proxy all your web traffic? The reason I ask is it
         | has fairly heavyweight install requirements and Apache and PHP
         | are kind of old school at this point, especially for new
         | projects and companies. It's not what they would commonly be
         | using for their site.
        
           | reconnecting wrote:
           | Indeed, if it's a real user from a residential IP address, in
           | most cases it will be the same network. However, if it's a
           | proxy from residential IPs, there could be 10 requests from
           | one network, the 11th request from a second network, and the
           | 12th request back from the first network again. This is a
           | red flag.
           | 
            | Thank you for your question. tirreno is a standalone app that
            | needs to receive API events from your main web application.
            | It can work perfectly well with 512GB of RAM for Postgres or
            | even less; however, in most cases we're talking about
            | millions of events, which do require resources.
           | 
           | It's much easier to write a stable application without
           | dependencies based on mature technologies. tirreno is fairly
           | 'boring software'.
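            | 
            | As a rough illustration (a toy sketch, not tirreno's actual
            | logic), the kind of check described above could look like
            | this, assuming each session's request IPs are already
            | collected:
            | 
            |     from ipaddress import ip_network
            | 
            |     def network_of(ip, prefix=24):
            |         # Collapse an IPv4 address to its /24 as a rough
            |         # "same network" approximation.
            |         return str(ip_network(f"{ip}/{prefix}", strict=False))
            | 
            |     def network_switches(session_ips):
            |         # Count how often consecutive requests in one
            |         # session change networks.
            |         switches, prev = 0, None
            |         for ip in session_ips:
            |             net = network_of(ip)
            |             if prev is not None and net != prev:
            |                 switches += 1
            |             prev = net
            |         return switches
            | 
            |     # 10 requests from one network, the 11th from another,
            |     # the 12th back again -> 2 switches in a single session.
            |     ips = ["203.0.113.10"] * 10 + ["198.51.100.7",
            |                                    "203.0.113.22"]
            |     print(network_switches(ips))  # 2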
        
             | sroussey wrote:
             | My phone will be on the home network until I walk out of
             | the house and then it will change networks. This should not
             | be a red flag.
        
               | reconnecting wrote:
               | Effective fraud prevention relies on both the full user
               | context and the behavioral patterns of known online
               | fraudsters. The key idea is that an IP address cannot be
               | used as a red flag on its own without considering the
               | broader context of the account. However, if we know that
                | the fraudsters we're dealing with are using mobile
                | network proxies and are randomly switching between two
                | mobile operators, that is certainly a strong risk signal.
        
               | JimDabell wrote:
               | An awful lot of free Wi-Fi networks you find in malls are
               | operated by different providers. Walking from one side of
               | a mall to the other while my phone connects to all the
               | Wi-Fi networks I've used previously would have you flag
               | me as a fraudster if I understand your approach
               | correctly.
        
               | reconnecting wrote:
               | We are discussing user behavior in the context of a web
               | system. The fact that your device has connected to
               | different Wi-Fi networks doesn't necessarily mean that
               | all of them were used to access the web application.
               | 
               | Finally, as mentioned earlier, there is no silver bullet
               | that works for every type of online fraudster. For
               | example, in some applications, a TOR connection might be
               | considered a red flag. However, if we are talking about
               | hn visitors, many of them use TOR on a daily basis.
        
         | andelink wrote:
         | The first blog post in this series[1], linked to at the top of
         | TFA, offers an analysis on the potential of using ASNs to
         | detect such traffic. Their conclusion was that ASNs are not
         | helpful for this use case, showing that across the 50k IPs
         | they've blocked, there are fewer than 4 IP addresses per ASN,
         | on average.
         | 
         | [1] https://jan.wildeboer.net/2025/02/Blocking-Stealthy-
         | Botnets/
        
           | reconnecting wrote:
           | What was done manually in the first blog is exactly what
            | tirreno helps to achieve by analyzing traffic; here is a
            | live example [1]. Blocking an entire ASN should not be
            | considered a strategy when real users are involved.
           | 
           | Regarding the first post, it's rare to see both datacenter
           | network IPs and mobile proxy IP addresses used
           | simultaneously. This suggests the involvement of more than
           | one botnet. The main idea is to avoid using IP addresses as
           | the sole risk factor. Instead, they should be considered as
           | just one part of the broader picture of user behavior.
           | 
           | [1] https://play.tirreno.com
        
         | gruez wrote:
         | >One is that they often change IP addresses during a single
         | web session. Second, if the IPs come from the same proxy
         | provider, they are often concentrated within a single ASN,
         | making them easier to detect.
         | 
         | Both are pretty easy to mitigate with a geoip database and some
         | smart routing. One "residential proxy" vendor even has session
         | tokens so your source IP doesn't randomly jump between each
         | request.
        
           | reconnecting wrote:
           | And this is the exact reason why IP addresses cannot be
           | considered as the one and only signal for fraud prevention.
        
       | at0mic22 wrote:
       | Strange that HolaVPN, i.e. Brightdata, is not mentioned. They've
       | been using user hosts for those purposes for decades, and also
       | selling proxies en masse. Fun fact they don't have any servers
       | for the VPN. All the VPN traffic is routed through ... other
       | users!
        
         | arewethereyeta wrote:
         | They are even the first to do it and the most litigious of all.
         | Trying to push patents on everything possible, even on water if
         | they can.
        
         | Klonoar wrote:
         | Is it really strange if the logo is right there in the article?
        
         | andelink wrote:
         | Hola is mentioned in the author's prior post on this topic,
         | linked to at the top of TFA:
         | https://jan.wildeboer.net/2025/02/Blocking-Stealthy-Botnets/
        
       | armchairhacker wrote:
       | > I am now of the opinion that every form of web-scraping should
       | be considered abusive behaviour and web servers should block all
       | of them. If you think your web-scraping is acceptable behaviour,
       | you can thank these shady companies and the "AI" hype for moving
       | you to the bad corner.
       | 
       | Why jump to that conclusion?
       | 
       | If a scraper clearly advertises itself, follows robots.txt, and
       | has reasonable backoff, it's not abusive. You can easily block
       | such a scraper, but then you're encouraging stealth scrapers
       | because they're still getting your data.
       | 
       | I'd block the scrapers that try to hide and waste compute, but
       | deliberately allow those that don't. And maybe provide a sitemap
       | and API (which besides being easier to scrape, can be faster to
       | handle).
        
       | panstromek wrote:
       | I'd expect this to be against App Store and Google Play rules;
       | they are very picky.
        
       | Pesthuf wrote:
       | We need a list of apps that include these libraries, and every
       | malware scanner - including Windows Defender, Play Protect and
       | whatever Apple calls theirs - needs to put infected applications
       | into quarantine immediately. Just because it's not _directly_
       | causing damage to the device the malware is running on, that
       | doesn't mean it's not malware.
        
         | philippta wrote:
          | Apps should be required to ask for permission to access
          | specific domains, similar to the tracking protection Apple
          | introduced a while ago.
         | 
         | Not sure how this could work for browsers, but the other 99% of
         | apps I have on my phone should work fine with just a single
         | permitted domain.
        
           | jay_kyburz wrote:
           | Oh, that's an interesting idea. A local DNS where I have to
           | add every entry. A white list rather than Australia's
           | national blacklist.
        
           | snackernews wrote:
           | My iPhone occasionally displays an interrupt screen to remind
           | me that my weather app has been accessing my location in the
           | background and to confirm continued access.
           | 
           | It should also do something similar for apps making chatty
           | background requests to domains not specified at app review
           | time. The legitimate use cases for that behaviour are few.
        
           | zzo38computer wrote:
           | I think capability based security with proxy capabilities is
           | the way to do it, and this would make it possible for the
           | proxy capability to intercept the request and ask permission,
           | or to do whatever else you want it to do (e.g. redirections,
           | log any accesses, automatically allow or disallow based on a
           | file, use or ignore the DNS cache, etc).
           | 
           | The system may have some such functions built in, and asking
           | permission might be a reasonable thing to include by default.
        
             | XorNot wrote:
             | Try actually using a system like this. OpenSnitch and
             | LittleSnitch do it for Linux and MacOS respectively. Fedora
             | has a pretty good interface for SELinux denials.
             | 
             | I've used all of them, and it's a deluge: it is too much
             | information to reasonably react to.
             | 
              | Your broad options are either deny or accept, but there's
              | no sane way to reliably know which you should choose.
             | 
             | This is not and cannot be an individual problem: the easy
             | part is building high fidelity access control, the hard
             | part is making useful policy for it.
        
               | zzo38computer wrote:
                | I suggested proxy capabilities so that this can easily
                | be reprogrammed and reconfigured; if you want to disable
               | this feature then you can do that too. It is not only
               | allow or deny; other things are also possible (e.g.
               | simulate various error conditions, artificially slow down
               | the connection, go through a proxy server, etc). (This
               | proxy capability system would be useful for stuff other
               | than network connections too.)
               | 
               | > it is too much information to reasonably react to.
               | 
                | Even if it asks, that does not necessarily mean it has
                | to ask every time, if the user lets it keep the answer
                | (either for the current session or until the user
                | deliberately deletes this data). Also, if it asks too
                | much because it tries to access too many remote servers,
                | then it might be spyware, malware, etc. anyway, and is
                | worth investigating in case that is what it is.
               | 
               | > the hard part is making useful policy for it.
               | 
               | What the default settings should be is a significant
                | issue. However, changing the policies in individual cases
                | for different uses is also something that a user might
               | do, since the default settings will not always be
               | suitable.
               | 
               | If whoever manages the package repository, app store, etc
               | is able to check for malware, then this is a good thing
               | to do (although it should not prohibit the user from
               | installing their own software and modifying the existing
               | software), but security on the computer is also helpful,
                | and neither of these is a substitute for the other; they
                | work together.
        
           | tzury wrote:
            | The vast majority of revenue in the mobile app ecosystem is
            | ads, which are by design pulled from 3rd parties (and are
            | part of the broader problem discussed in this post).
           | 
           | I am waiting for Apple to enable /etc/hosts or something
           | similar on iOS devices.
        
           | klabb3 wrote:
           | On the one hand, yes this could work for many cases. On the
           | other hand, good bye p2p. Not every app is a passive client-
           | server request-response. One needs to be really careful with
           | designing permission systems. Apple has already killed many
           | markets before they had a chance to even exist, such as
           | companion apps for watches and other peripherals.
        
             | kmeisthax wrote:
             | P2P was practically dead on iPhone even back in 2010. The
             | whole "don't burn the user's battery" thing precludes
             | mobile phones doing anything with P2P other than leeching
             | off of it. The only exceptions are things like AirDrop;
             | i.e. locally peer-to-peer things that are only active when
             | in use and don't try to form an overlay or mesh network
             | that would require the phone to become a router.
             | 
             | And, AFAIK, you already need special permission for
             | anything other than HTTPS to specific domains on the public
             | Internet. That's why apps ping you about permissions to
             | access "local devices".
        
               | zzo38computer wrote:
               | > other than HTTPS to specific domains on the public
               | Internet
               | 
               | They should need special permission for that too.
        
             | Pesthuf wrote:
             | Maybe there could be a special entitlement that Apple's
             | reviewers would only grant to applications that have a
             | legitimate reason to require such connections. Then only
             | applications granted that permission would be able to make
             | requests to arbitrary domains / IP addresses.
             | 
             | That's how it works with other permissions most
             | applications should not have access to, like accessing user
             | locations. (And private entitlements third party
             | applications can't have are one way Apple makes sure nobody
             | can compete with their apps, but that's a separate issue.)
        
             | nottorp wrote:
             | > On the other hand, good bye p2p.
             | 
             | You mean, good bye using my bandwidth without my
             | permission? That's good. And if I install a bittorrent
             | client on my phone, I'll know to give it permission.
             | 
             | > such as companion apps for watches and other peripherals
             | 
             | That's just apple abusing their market position in phones
             | to push their watch. What does it have to do with p2p?
        
               | klabb3 wrote:
               | > using my bandwidth without my permission
               | 
               | What are you talking about?
               | 
               | > What does it have to do with p2p?
               | 
               | It's an example of when you design sandboxes/firewalls
               | it's very easy to assume all apps are one big homogenous
               | blob doing rest calls and everything else is malicious or
               | suspicious. You often need strange permissions to do
               | interesting things. Apple gives themselves these perms
               | all the time.
        
               | nottorp wrote:
               | Wait, why should applications be allowed to do rest calls
               | by default?
               | 
               | > What are you talking about?
               | 
                | That's the main use case for p2p in an application, isn't
                | it? Reducing the vendor's bandwidth bill...
        
           | vbezhenar wrote:
            | Do you suggest outright forbidding TCP connections for user
            | software? Because you can compile OpenSSL or any other TLS
            | library and make a TCP connection to port 443 which will be
            | opaque to the operating system. They can do wild things like
            | kernel-level DPI on outgoing connections to find out the
            | host, but that quickly turns into a ridiculous competition.
        
             | internetter wrote:
             | > but that quickly turns into ridiculous competition.
             | 
             | Except the platform providers hold the trump card. Fuck
             | around, if they figure it out you'll be finding out.
        
           | udev4096 wrote:
           | Android is so fucking anti-privacy that they still don't have
           | an INTERNET access revoke toggle. The one they have currently
           | is broken and can easily be bypassed with google play
           | services (another highly privileged process running for no
           | reason other than to sell your soul to google). GrapheneOS
           | has this toggle luckily. Whenever you install an app, you can
           | revoke the INTERNET access at the install screen and there is
           | no way that app can bypass it
        
             | mjmas wrote:
             | Asus added this to their phones which is nice.
        
       | proxy_err wrote:
       | It's a fair point but a very dynamic thing to sort out. This
       | needs a full research team to figure out. Or, you know... all of
       | us combined! It is definitely a problem.
       | 
       | TINFOIL: I've sometimes wondered if Azure or AWS used bots to
       | push site traffic hits to generate money... they know you are
       | hosted with them.. They have your info.. Send out bots to drive
       | micro accumulation. Slow boil..
        
         | luckylion wrote:
         | I think that's mostly that they don't care about having
         | malicious bots on their networks as long as they pay.
         | 
         | GCE is rare in my experience. Most bots I see are on AWS. The
         | DDOS-adjacent hyper aggressive bots that try random URLs and
         | scan for exploits tend to be on Azure or use VPNs.
         | 
         | AWS is bad when you report malicious traffic. Azure has been
         | completely unresponsive and didn't react, even for C&C servers.
        
       | aucisson_masque wrote:
       | It's interesting but so far there is no definitive proof it's
       | happening.
       | 
       | People are jumping to conclusions a bit fast over here, yes
       | technically it's possible but this kind of behavior would be
       | relatively easy to spot because the app would have to make direct
       | connections to the website it wants to scrape.
       | 
       | Your calculator app for instance connecting to CNN.com ...
       | 
       | iOS has an app privacy report where one can check what
       | connections are made by an app, how often, the last one, etc.
       | 
       | Android by Google doesn't have such a useful feature, of course,
       | but you can run a third party firewall like PCAPdroid, which I
       | highly recommend.
       | 
       | macOS (Little Snitch).
       | 
       | Windows (fort firewall).
       | 
       | Not everyone runs these apps, obviously, only the most nerdy like
       | myself, but we're also the kind of people who would report an app
       | using our device to build what is, in fact, a zombie or bot
       | network.
       | 
       | I'm not saying it's necessarily false but imo it remains a theory
       | until proven otherwise.
        
         | CharlesW wrote:
         | Botnets as a Service are absolutely happening, but as you
         | allude to, the scope of the abuse is very different on iOS
         | than, say, Windows.
        
         | abaymado wrote:
         | > iOS have app privacy report where one can check what
         | connections are made by app, how often, last one, etc.
         | 
         | How often is the average calculator app user checking their
         | Privacy Report? My guess: not many!
        
           | gruez wrote:
           | All it takes is one person to find out and raise the alarm.
           | The average user doesn't read the source code behind openssl
            | or whatever either; that doesn't mean there are no gains in
            | open sourcing it.
        
             | dewey wrote:
             | The average user is also not reading these raised "alarms".
             | And if an app has a bad name, another one will show up with
             | a different name on the same day.
        
               | aucisson_masque wrote:
               | You're on a tech forum, you must have seen one of the
               | many post about app, either on Android or iPhone, that
               | acts like spyware.
               | 
                | They happen from time to time; the last one was not more
                | than two weeks ago, when it was shown that many apps
                | were able to read the list of all other apps installed
                | on an Android device and that Google refused to fix that.
                | 
                | Do you really believe that an app used to make your
                | device part of a bot network wouldn't be posted over
                | here?
        
               | dewey wrote:
               | "You're on a tech forum", that's exactly the point. The
               | "average user" is not on a tech forum though, the average
               | user opens the app store of their platform, types
               | "calculator" and installs the first one that's free.
        
             | nottorp wrote:
             | The real solution is to add a permission for network
             | access, with the default set to deny.
        
         | throwaway519 wrote:
         | Given this is a thing even in browser plugins, and that so very
         | few people analyse their firewalls, I'd not discount it at all.
         | Most of the world's users have no clue, and app stores are
         | notoriously bad at reacting even to publicised malware, e.g.
         | 'free' VPNs in the iOS App Store.
        
         | andelink wrote:
         | This is a hilariously optimistic, naive, disconnected from
         | reality take. What sort of "proof" would be sufficient for you?
         | TFA of course includes data from the author's own server logs^,
         | but it also references real SDKs and businesses selling this
         | exact product. You can view the pricing page yourself, right
         | next to stats on how many IPs are available for you to exploit.
         | What else do you need to see?
         | 
         | ^ edit: my mistake, the server logs I mentioned were from the
         | author's prior blog post on this topic, linked to at the top of
         | TFA: https://jan.wildeboer.net/2025/02/Blocking-Stealthy-
         | Botnets/
        
         | jshier wrote:
         | > iOS have app privacy report where one can check what
         | connections are made by app, how often, last one, etc.
         | 
         | Privacy reports do not include that information. They include
         | broad areas of information the app claims to gather. There is
         | zero connection between those claimed areas and what the app
         | actually does unless app review notices something that doesn't
         | match up. But none of that information is updated dynamically,
         | and it has never actually included the domains the app connects
         | to. You may be confusing it with the old domain declarations
         | for less secure HTTP connections. Once the connections met the
         | system standards you no longer needed to declare it.
        
           | zargon wrote:
           | I wasn't aware of this feature. But apparently it does
           | include that information. I just enabled it and can see the
           | domains that apps connect to. https://support.apple.com/en-
           | us/102188
        
             | hoc wrote:
              | Pretty neat, actually. Thanks for looking up that link.
        
         | Galanwe wrote:
         | There is already a lot of proof. Just ask for a sales pitch
          | from companies selling this data and they will gladly explain
         | everything to you.
         | 
         | Go to a data conference like Neudata and you will see. You can
         | have scraped data from user devices, real-time locations,
         | credit card, Google analytics, etc.
        
       | badmonster wrote:
       | do you think there's a realistic path forward for better
       | transparency or detection--maybe at the OS level or through
       | network-level anomaly detection?
        
       | yungporko wrote:
       | it's funny, i've never heard of or thought about the possibility
       | of this happening but actually in hindsight it seems almost too
       | obvious to not be a thing.
        
       | jeroenhd wrote:
       | > So there is a (IMHO) shady market out there that gives app
       | developers on iOS, Android, MacOS and Windows money for including
       | a library into their apps that sells users network bandwidth
       | 
       | AKA "why do Cloudflare and Google make me fill out these CAPTCHAs
       | all day"
       | 
       | I don't know why Play Protect/MS Defender/whatever Apple has for
       | antivirus don't classify apps that embed such malware as such.
       | It's ridiculous that this is allowed to go on when detection is
       | so easy. I don't know a more obvious example of a trojan than an
       | SDK library making a user's device part of a botnet.
        
         | dx4100 wrote:
         | Cloudflare and Google use CAPTCHAs to sell web scrapers? I
         | don't get your point. I was under the impression the data is
         | used to train models.
        
           | cuu508 wrote:
           | Trojans in your mobile apps ruin your IP's reputation which
           | comes back to you in the form of frequent, annoying CAPTCHAs.
        
           | aloha2436 wrote:
           | The implication is that the users that are being constantly
           | presented with CAPTCHAs are experiencing that because they
           | are unwittingly proxying scrapers through their devices via
           | malicious apps they've installed.
        
             | pentae wrote:
             | .. or that other people on their network/Shared public IP
             | have installed
        
               | evgpbfhnr wrote:
                | or just that they don't run Windows/macOS with Chrome
                | like everyone else and it's "suspicious". I get
                | Cloudflare captchas all the time with Firefox on Linux...
               | (and I'm pretty sure there's no such app in my home
               | network!)
        
           | jeroenhd wrote:
           | When a random device on your network gets infected with crap
           | like this, your network becomes a bot egress point, and anti
           | bot networks respond appropriately. Cloudflare, Akamai, even
           | Google will start showing CAPTCHAs for every website they
           | protect when your network starts hitting random servers with
           | scrapers or DDoS attacks.
           | 
           | This is even worse with CG-NAT if you don't have IPv6 to
           | solve the CG-NAT problem.
           | 
           | I don't think the data they collect is used to train anything
           | these days. Cloudflare is using AI generated images for
           | CAPTCHAs and Google's actual CAPTCHAs are easier for bots
           | than humans at this point (it's the passive monitoring that
           | makes it still work a little bit).
        
         | areyourllySorry wrote:
         | it's not technically malware, you agreed to it when you
         | accepted the terms of service :^)
        
           | L-four wrote:
            | It's malware; it does something malicious.
        
       | panny wrote:
       | >Apple, Microsoft and Google should act.
       | 
       | Do nothing, win.
       | 
       | They are the primary beneficiaries buying this data, since they
       | are the largest AI players.
        
       | neilv wrote:
       | Couldn't Apple and Google (and, to a lesser extent, Microsoft)
       | pretty easily shut down almost all the apps that steal bandwidth?
        
       | greesil wrote:
       | How would I know if an app on my device was doing this?
        
         | wyck wrote:
         | Install a network monitor or go even deeper and sniff packets.
        
           | greesil wrote:
           | I feel like this could be automated. Spin up a virtual device
            | on a monitored network. Install one app, click on some stuff
            | for a while, uninstall, and move on to the next. If the app
            | reaches out to a lot of random sites, then flag it.
           | 
           | Google could do this. I'm sure Apple could as well. Third
            | parties could for a small set of apps.
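            | 
            | Reduced to just the analysis step, a toy sketch (thresholds
            | and host names invented) of what the flagging could look
            | like:
            | 
            |     from urllib.parse import urlsplit
            | 
            |     # Hosts a calculator app could plausibly need (made up).
            |     EXPECTED = {"calcapp.example", "crashlytics.com",
            |                 "googleapis.com"}
            |     MAX_UNEXPECTED_HOSTS = 5  # arbitrary threshold
            | 
            |     def flag_app(contacted_urls):
            |         hosts = {urlsplit(u).hostname for u in contacted_urls}
            |         unexpected = {h for h in hosts if h and not any(
            |             h == s or h.endswith("." + s) for s in EXPECTED)}
            |         return len(unexpected) > MAX_UNEXPECTED_HOSTS
            | 
            |     # A "calculator" that touched 40 unrelated sites during
            |     # the test session is suspicious.
            |     urls = [f"https://site{i}.example.org/" for i in range(40)]
            |     print(flag_app(urls))  # True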
        
             | jeroenhd wrote:
              | This is being done by a couple of SDKs, so it'd be much
              | easier to just find and flag those SDK files. Finding apps
              | becomes a matter of a single-pass scan over the application
             | contents rather than attempting to bypass the VM detection
             | methods malware is packed full of.
        
       | matheusmoreira wrote:
       | "Peer-to-business network"! Amazing. uBlock Origin gets rid of
       | this, right?
        
       | __MatrixMan__ wrote:
       | The broken thing about the web is that in order for data to
       | remain readable, a unique sysadmin somewhere has to keep a server
       | running in the face of an increasingly hostile environment.
       | 
       | If instead we had a content addressed model, we could drop the
       | uniqueness constraint. Then these AI scrapers could be gossiping
       | the data to one another (and incidentally serving it to the rest
       | of us) without placing any burden on the original source.
       | 
       | Having other parties interested in your data should make your
       | life easier (because other parties will host it for you), not
       | harder (because now you need to work extra hard to host it for
       | them).
        
         | Timwi wrote:
         | Are there any systems like that, even if experimental?
        
           | jevogel wrote:
           | IPFS
        
             | alakra wrote:
             | I had high hopes for IPFS, but even it has vectors for
             | abuse.
             | 
             | See https://arxiv.org/abs/1905.11880 [Hydras and IPFS: A
             | Decentralised Playground for Malware]
        
               | __MatrixMan__ wrote:
               | Can you point me at what you mean? I'm not immediately
               | finding something that indicates that it is not fit for
               | this use case. The fact that bad actors use it to resist
               | those who want to shut them down is, if anything, an
               | endorsement of its durability. There's a bit of overlap
               | between resisting the AI scrapers and resisting the FBI.
               | You can either have a single point of control and a
               | single point of failure, or you can have neither. If
               | you're after something that's both reliable and reliably
               | censorable--I don't think that's in the cards.
               | 
               | That's not to say that it _is_ a ready replacement for
               | the web as we know it. If you have hash-linked everything
               | then you wind up with problems trying to link things
                | together, for instance. Once two pages exist, you can't
               | after-the-fact create a link between them because if you
               | update them to contain that link then their hashes change
               | so now you have to propagate the new hash to people. This
               | makes it difficult to do things like have a comments
               | section at the bottom of a blog post. So you've got to
               | handle metadata like that in some kind of extra layer--a
               | layer which isn't hash linked and which might be
               | susceptible to all the same problems that our current web
               | is--and then the browser can build the page from
               | immutable pieces, but the assembly itself ends up being
               | dynamic (and likely sensitive to the users preference,
               | e.g. dark mode as a browser thing not a page thing).
               | 
               | But I still think you could move maybe 95% of the data
               | into an immutable hash-linked world (think of these as
               | nodes in a graph), the remaining 5% just being tuples of
                | hashes and public keys indicating which pages are trusted
               | by which users, which ought to be linked to which others,
               | which are known to be the inputs and output of various
               | functions, and you know... structure stuff (these are our
               | graph's edges).
               | 
               | The edges, being smaller, might be subject to different
               | constraints than the web as we know it. I wouldn't
               | propose that we go all the way to a blockchain where
               | every device caches every edge, but it might be feasible
               | for my devices to store all of the edges for the 5% of
               | the web I care about, and your devices to store the edges
               | for the 5% that you care about... the nodes only being
               | summoned when we actually want to view them. The edges
               | can be updated when our devices contact other devices
               | (based on trust, like you know that device's owner
               | personally) and ask "hey, what's new?"
               | 
               | I've sort of been freestyling on this idea in isolation,
               | probably there's already some projects that scratch this
               | itch. A while back I made a note to check out
               | https://ceramic.network/ in this capacity, but I haven't
               | gotten down to trying it out yet.
        
         | XorNot wrote:
         | Except no one wants content addressed data - because if you
         | knew what it was you wanted, then you would already have stored
         | it. The web as we know it is an index - it's a way to discover
         | that data is available and specifically we usually want the
          | _latest_ data that's available.
         | 
         | AI scrapers aren't trying to find things they already know
         | exist, they're trying to discover what they didn't know
         | existed.
        
           | akoboldfrying wrote:
           | > because if you knew what it was you wanted, then you would
           | already have stored it.
           | 
           | "Content-addressable" has a broader meaning than what you
           | seem to be thinking of -- roughly speaking, it applies if
           | _any function of_ the data is used as the  "address". E.g.,
           | git commits are content-addressable by their SHA1 hashes.
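            | 
            | A toy example of that broader sense (mine, not from the
            | thread): the "address" is just a digest of the bytes, so any
            | peer holding the data can re-derive and verify it.
            | 
            |     import hashlib
            | 
            |     def content_address(data: bytes) -> str:
            |         # The address is derived from the content itself,
            |         # much like a git blob's hash.
            |         return hashlib.sha256(data).hexdigest()
            | 
            |     page = b"<html>hello</html>"
            |     addr = content_address(page)
            |     # Data fetched from any untrusted peer can be checked
            |     # against the address before use.
            |     assert content_address(page) == addr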
        
             | __MatrixMan__ wrote:
             | But when you do a "git pull" you're not pulling from
             | someplace identified by a hash, but rather a hostname. The
             | learning-about-new-hashes part has to be handled
             | differently.
             | 
             | It's a legit limitation on what content addressing can do,
             | but it's one we can overcome by just not having
             | _everything_ be content addressed. The web we have now is
             | like if you did a `git pull` every time you opened a file.
             | 
             | The web I'm proposing is like how we actually use git--
             | periodically pulling new hashes as a separate action, but
             | spending most of our time browsing content that we already
             | have hashes for.
        
           | __MatrixMan__ wrote:
           | Yes, for the reasons you describe, you can't be both a useful
           | web-like protocol and also 100% immutable/hash-linked.
           | 
            | But there's a lot of middle ground to explore here. Loading
            | a modern web page involves making dozens of requests to a
           | variety of different servers, evaluating some javascript, and
           | then doing it again a few times, potentially moving several
            | MB of data. The part people want, the thing you don't already
            | know exists, is hidden behind that rather heavy door. It
           | doesn't have to be that way.
           | 
           | If you already know about one thing (by its cryptographic
           | hash, say) and you want to find out which other hashes it's
           | now associated with--associations that might not have existed
           | yesterday--that's much easier than we've made it. It can be
           | done:
           | 
            | - by moving kB not MB; we're just talking about a tuple of
           | hashes here, maybe a public key and a signature
           | 
           | - without placing additional burden on whoever authored the
           | first thing, they don't even have to be the ones who
           | published the pair of hashes that your scraper is interested
           | in
           | 
           | Once you have the second hash, you can then reenter
           | immutable-space to get whatever it references. I'm not sure
           | if there's already a protocol for such things, but if not
           | then we can surely make one that's more efficient and durable
           | than what we're doing now.
        
             | XorNot wrote:
             | But we already have HEAD requests and etags.
             | 
             | It is entirely possible to serve a fully cached response
             | that says "you already have this". The problem is...people
             | don't implement this well.
        
               | __MatrixMan__ wrote:
               | People don't implement them well because they're
               | overburdened by all of the different expectations we put
               | on them. It's a problem with how DNS forces us to
               | allocate expertise. As it is, you need some kind of write
               | access on the server whose name shows up in the URL if
               | you want to contribute to it. This is how globally unique
               | names create fragility.
               | 
               | If content were handled independently of server names,
               | anyone who cares to distribute metadata for content they
               | care about can do so. One doesn't need write access, or
               | even to be on the same network partition. You could just
               | publish a link between content A and content B because
               | you know their hashes. Assembling all of this can happen
               | in the browser, subject to the user's configs re: who
               | they trust.
        
         | akoboldfrying wrote:
         | Assuming the right incentives can be found to prevent
         | widespread leeching, a distributed content-addressed model
         | indeed solves this problem, but introduces the problem of how
         | to control your own content over time. How do you get rid of a
         | piece of content? How do you modify the content at a given URL?
         | 
         | I know, as far as possible it's a good idea to have content-
         | immutable URLs. But at some point, I need to make
         | www.myexamplebusiness.com show new content. How would that
         | work?
        
           | __MatrixMan__ wrote:
           | As for how to get rid of a piece of content... I think that
           | one's a lost cause. If the goal is to prevent things that
            | make content unavailable (e.g. AI scrapers), then you end up
            | with a design that prevents things that make content
            | unavailable (e.g. legitimate deletions). The whole point is
           | that you're not the only one participating in propagating the
           | content, and that comes with trade-offs.
           | 
           | But as for updating, you just format your URLs like so: {my-
           | public-key}/foo/bar
           | 
           | And then you alter the protocol so that the {my-public-key}
           | part resolves to the merkle-root of whatever you most
           | recently published. So people who are interested in your
           | latest content end up with a whole new set of hashes whenever
           | you make an update. In this way, it's not 100% immutable, but
           | the mutable payload stays small (it's just a bunch of hashes)
           | and since it can be verified (presumably there's a signature
           | somewhere) it can be gossiped around and remain available
           | even if your device is not.
           | 
           | You can soft-delete something just by updating whatever
           | pointed to it to not point to it anymore. Eventually most
           | nodes will forget it. But you can't really prevent a node
           | from hanging on to an old copy if they want to. But then
            | again, could you ever do that? Deleting something on the
           | web has always been a bit of a fiction.
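            | 
            | A rough sketch of that mutable-pointer idea, assuming an
            | Ed25519 keypair (via the third-party 'cryptography' package);
            | the only mutable piece is a small signed record mapping the
            | public key to the latest root hash, which peers can verify
            | and gossip around on their own:
            | 
            |     import json, time, hashlib
            |     from cryptography.hazmat.primitives.asymmetric.ed25519 \
            |         import Ed25519PrivateKey
            | 
            |     key = Ed25519PrivateKey.generate()
            |     pub = key.public_key()
            | 
            |     def publish(root_hash):
            |         # Sign a tiny record pointing at the newest root.
            |         record = {"root": root_hash, "at": int(time.time())}
            |         payload = json.dumps(record, sort_keys=True).encode()
            |         return {"record": record,
            |                 "sig": key.sign(payload).hex()}
            | 
            |     def verify(pointer):
            |         payload = json.dumps(pointer["record"],
            |                              sort_keys=True).encode()
            |         try:
            |             pub.verify(bytes.fromhex(pointer["sig"]), payload)
            |             return True
            |         except Exception:
            |             return False
            | 
            |     root = hashlib.sha256(b"latest published tree").hexdigest()
            |     latest = publish(root)
            |     assert verify(latest)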
        
             | akoboldfrying wrote:
             | > But then again, could you ever do that?
             | 
             | True in the absolute sense, but the effect size is much
             | worse under the kind of content-addressable model you're
             | proposing. Currently, if I download something from you and
             | you later delete that thing, I can still keep my downloaded
             | copy; under your model, if _anyone ever_ downloads that
             | thing from you and you later delete that thing, with high
             | probability I can still acquire it at any later point.
             | 
             | As you say, this is by design, and there are cases where
             | this design makes sense. I think it mostly doesn't for what
             | we currently use the web for.
        
         | areyourllySorry wrote:
         | there is no incentive for different companies to share data
         | with each other, or with anyone really (facebook leeching
         | books?)
        
           | __MatrixMan__ wrote:
           | I figure we'd create that incentive by configuring our
           | devices to only talk to devices controlled by people we
           | trust. If they want the data at all, they have to gain our
           | trust, and if they want that, they have to seed the data. Or
           | you know, whatever else the agreement ends up being. Maybe we
           | make them pay us.
        
       | theteapot wrote:
       | Are ad blockers like AdBlock, uBlock effective against these?
        
         | areyourllySorry wrote:
         | i don't believe extensions can modify other extensions
        
       | 156287745637 wrote:
       | AI scrapers and "sneaker bots" are just the tip of the iceberg.
       | Why are all these entities concentrated and metastasizing from
       | just a few superhubs? Why do they look, smell and behave like
       | state-level machinery? If you've researched you'll know exactly
       | what I'm talking about.
       | 
       | Unless complicit, tech leaders (Apple Google Microsoft) have a
       | duty to respond swiftly and decisively. This has been going on
       | far too long.
        
       | _ink_ wrote:
       | How can I detect such behaviour on my devices / in my home
       | network?
        
       | gpi wrote:
       | "Infatica is partnered with Bitdefender, a global leader in
       | cybersecurity, to protect our SDK users from malicious web
       | traffic and content, including infected URLs, untrusted web
       | pages, fraudulent and phishing links, and more."
       | 
       | That's not good.
        
       | Quarrel wrote:
       | FWIW, Trend Micro wrote up a decent piece on this space in 2023.
       | 
       | It is still a pretty good lay-of-the-land.
       | 
       | https://www.trendmicro.com/vinfo/us/security/news/vulnerabil...
        
       | hinkley wrote:
       | When the enshittification initially hit the fan, I had little
       | flashbacks of Phil Zimmermann talking about the Web of Trust and
       | amusing myself thinking maybe we need humans proving they're
       | humans to other humans so we know we aren't arguing with LLMs on
       | the internet or letting them scan our websites.
       | 
       | But it just doesn't scale to internet size so I'm fucked if I
       | know how we should fix it. We all have that cousin or dude in our
       | highschool class who would do anything for a bit of money and
       | introducing his 'friend' Paul who is in fact a bot whose owner
       | paid for the lie. And not like enough money to make it a moral
       | dilemma, just drinking money or enough for a new video game. So
       | once you get past about 10,000 people you're pretty much back
       | where we are right now.
        
         | akoboldfrying wrote:
         | I think it should be possible to build something that
         | generalises the idea of Web of Trust so that it's more
         | flexible, and less prone to catastrophic breakdown past some
         | scaling limit.
         | 
         | Binary "X trusts Y" statements, plus transitive closure, can
         | lead to long trust paths that we probably shouldn't actually
         | trust the endpoints of. Could we not instead assign
         | probabilities like "X trusts Y 95%", multiply probabilities
         | along paths starting from our own identity, and take the max at
         | each vertex? We could then decide whether to finally trust some
         | Z if its percentage is more than some threshold T%. (Other ways
         | of combining in-edges may be more suitable than max(); it's
         | just a simple and conservative choice.)
         | 
         | Perhaps a variant of backprop could be used to automatically
         | update either (a) all or (b) just our own weights, given new
         | information ("V has been discovered to be fraudulent").
        
           | hinkley wrote:
            | True. Perhaps a collective vote past 2 degrees of separation,
            | where multiple parties need to vouch for the same person
           | before you believe they aren't a bot. Then you're using the
           | exponential number of people to provide diminishing weight
           | instead of increasing likelihood of malfeasance.
        
             | nottorp wrote:
             | But do we need an infinite and global web of trust?
             | 
             | How about restricting them to everyone-knows-everyone sized
             | groups, of like a couple hundred people?
             | 
             | One can be a member of multiple groups so you're not
             | actually limited. But the groups will be small enough to
             | self regulate.
        
               | hinkley wrote:
               | What's that going to do about all of the top search
               | results and a good percentage of social media traffic
               | being generated by SEO bots? Nothing.
               | 
               | You want to chat with a Dunbar number of people get
               | yourself a private discord or slack channel.
        
               | nottorp wrote:
               | The Dunbar number of people could vouch for small web
               | sites they come across. Or even for FB accounts if they
               | choose to.
        
               | hinkley wrote:
               | I suspect a lot of people here are the ones in their
               | circle who bring in a lot of the cool info that their
               | friends missed out on. This still sounds like Slack.
        
               | nottorp wrote:
               | We're talking about webs of trust aren't we? Not about
               | chat rooms.
               | 
               | I'm hypothesising that any such large scale structure
               | will be perverted by commercial interests, while having
               | multiple Dunbar sized such structures will have a chance
               | to be useful.
        
         | sfink wrote:
         | Isn't the point of the web of trust that you can do something
         | about the cousins/dudes out there? Once you discover that they
         | sold out, even once, you sever them from the web. It doesn't
         | matter if they took 20 years to succumb to the temptation, you
         | can cut them off tomorrow. And that cuts off everyone they
         | vouched for, recursively, unless there's a still-trusted vouch
         | chain to someone.
         | 
         | At least, that's the way I've always imagined it working. Maybe
         | I need to read up.
        
       | hubraumhugo wrote:
       | We all agree that AI crawlers are a big issue as they don't
       | respect any established best practices, but we rarely talk about
       | the path forward. Scraping has been around for as long as the
       | internet, and it was mostly fine. There are many very legitimate
       | use cases for browser automation and data extraction (I work in
       | this space).
       | 
       | So what are potential solutions? We're somehow still stuck with
       | CAPTCHAs, a 25-year-old concept that wastes millions of human
       | hours and billions in infra costs [0].
       | 
       | How can we enable beneficial automation while protecting against
       | abusive AI crawlers?
       | 
       | [0] https://arxiv.org/abs/2311.10911
        
         | udev4096 wrote:
         | Blame the "AI" companies for that. I am glad the small web is
         | pushing hard against these scrapers, with the rise of Anubis as
         | a starting point
        
           | lelanthran wrote:
           | > Blame the "AI" companies for that. I am glad the small web
           | is pushing hard towards these scrapers, with the rise of
           | Anubis as a starting point
           | 
           | Did you mean "against"?
        
             | udev4096 wrote:
             | Corrected, thanks
        
         | eastbound wrote:
          | But people don't interact with your website anymore; they ask
          | an AI. So the AI crawler is a real user.
         | 
         | I say we ask Google Analytics to count an AI crawler as a real
         | view. Let's see who's most popular.
        
         | CalRobert wrote:
         | I hate this but I suspect a login-only deanonymised web (made
         | simple with chrome and WEI!) is the future. Firefox users can
         | go to hell.
        
           | ArinaS wrote:
           | We won't.
        
         | CaptainFever wrote:
         | My pet peeve is that using the term "AI crawler" for this
         | conflates things unnecessarily. There's some people who are
         | angry at it due to anti-AI bias and not wishing to share
         | information, while there are others who are more concerned
         | about it due to the large amount of bandwidth and server
         | overloading.
         | 
         | Not to mention that it's unknown if these are actually from AI
         | companies, or from people pretending to be AI companies. You
         | can set anything as your user agent.
         | 
          | It's more appropriate to mention the specific issue one has
          | with the crawlers, like "they request things too quickly" or
         | "they're overloading my server". Then from there, it is easier
         | to come to a solution than just "I hate AI". For example, one
         | would realize that things like Anubis have existed forever,
         | they are just called DDoS protection, specifically those using
         | proof-of-work schemes (e.g. https://github.com/RuiSiang/PoW-
         | Shield).
         | 
         | This also shifts the discussion away from something that adds
         | to the discrimination against scraping in general, and more
         | towards what is actually the issue: overloading servers, or in
         | other words, DDoS.
        
           | johnnyanmac wrote:
           | It's become unbearable in the "AI era". So it's appropriate
            | to blame AI for it, in my eyes. Especially since so much
            | defense is based around training LLMs.
           | 
            | It's just like how not all DDoSes are actually hackers or
           | bots. Sometimes a server just can't take the traffic of a
           | large site flooding in. But the result is the same until
           | something is investigated.
        
           | queenkjuul wrote:
           | It's not a coincidence that this wasn't a major problem until
           | everybody and their dog started trying to build the next
           | great LLM.
        
         | jeroenhd wrote:
         | The best solution I've seen is to hit everyone with a proof of
         | work wall and whitelist the scrapers that are welcome (search
         | engines and such).
         | 
          | Running SHA hash calculations for a second or so once every
          | week is not bad for users, but with scrapers constantly
          | starting new sessions, they end up spending most of their time
          | running useless JavaScript, slowing them down significantly.
         | 
         | The most effective alternative to proof of work calculations
         | seems to be remote attestation. The downside is that you're
         | getting captchas if you're one of the 0.1% who disable secure
         | boot and run Linux, but the vast majority of web users will
         | live a captcha free life. This same mechanism could in theory
         | also be used to authenticate welcome scrapers rather than
         | relying on pure IP whitelists.
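          | 
          | The scheme behind most of these walls is roughly: the server
          | hands out a random challenge, the client must find a counter
          | whose SHA-256 hash has N leading zero bits, and verification
          | is a single hash. A generic sketch (not Anubis's actual
          | protocol):
          | 
          |     import hashlib, itertools, secrets
          | 
          |     DIFFICULTY_BITS = 20  # ~1M hashes on average; tune it
          | 
          |     def leading_zero_bits(digest):
          |         bits = 0
          |         for byte in digest:
          |             if byte == 0:
          |                 bits += 8
          |                 continue
          |             bits += 8 - byte.bit_length()
          |             break
          |         return bits
          | 
          |     def solve(challenge):
          |         # Client side: brute force until the target is met.
          |         for counter in itertools.count():
          |             d = hashlib.sha256(challenge +
          |                                str(counter).encode()).digest()
          |             if leading_zero_bits(d) >= DIFFICULTY_BITS:
          |                 return counter
          | 
          |     def verify(challenge, counter):
          |         # Server side: a single hash to check the answer.
          |         d = hashlib.sha256(challenge +
          |                            str(counter).encode()).digest()
          |         return leading_zero_bits(d) >= DIFFICULTY_BITS
          | 
          |     c = secrets.token_bytes(16)
          |     answer = solve(c)          # slow for the client
          |     assert verify(c, answer)   # cheap for the server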
        
         | 0manrho wrote:
         | > So what are potential solutions?
         | 
         | It won't fully solve the problem, but with the problem
         | relatively identified, you must then ask why people are
         | engaging in this behavior. Answer: money, for the most part.
         | Therefore, follow the money and identify the financial
         | incentives driving this behavior. This leads you pretty quickly
         | to a solution most people would reject out-of-hand: turn off
         | the financial incentive that is driving the enshittification of
         | the web. Which is to say, kill the ad-economy.
         | 
         | Or at least better regulate it while also levying punitive
          | damages that are significant enough to both dissuade bad actors
         | and encourage entities to view data-breaches (or the potential
         | therein) and "leakage[0]" as something that should actually be
          | effectively secured against. After all, there are some upsides
         | to the ad-economy that, without it, would present some hard
         | challenges (eg, how many people are willing to pay for search?
         | what happens to the vibrant sphere of creators of all stripes
         | that are incentivized by the ad-economy? etc).
         | 
         | Personally, I can't imagine this would actually happen.
         | Pushback from monied interests aside, most people have given up
         | on the idea of data-privacy or personal-ownership of their
          | data, if they ever even cared in the first place. So, in the
          | absence of willingness to do something about the incentive for
          | this malign behavior, we're left with few good options.
         | 
         | 0: https://news.ycombinator.com/item?id=43716704 (see comments
         | on all the various ways people's data is being
         | leaked/leached/tracked/etc)
        
         | mjaseem wrote:
         | I wrote an article about a possible proof of personhood
         | solution idea: https://mjaseem.github.io/tech/2025/04/12/proof-
         | of-humanity.....
         | 
         | The broad idea is to use zero knowledge proofs with
         | certification. It sort of flips the public key certification
         | system and adds some privacy.
         | 
          | To get this into place, the powers in charge need to be swayed.
        
         | marginalia_nu wrote:
         | Proof-of-work works in terms of preventing large-scale
         | automation.
         | 
         | As for letting well behaved crawlers in, I've had an idea for
         | something like DKIM for crawlers. Should be possible to set up
         | a fairly cheap cryptographic solution that gives crawlers a
         | persistent identity that can't be forged.
         | 
         | Basically put a header containing first a string including
         | today's date, the crawler's IP, and a domain name, then a
         | cryptographic signature of the string. The domain has a TXT
         | record with a public key for verifying the identity. It's cheap
         | because you really only need to verify the string once on
         | the server side, and the crawler only needs to regenerate it
         | once per day.
         | 
         | With that in place, crawlers can crawl with their reputation at
         | stake. The big problem with these rogue scrapers is that
         | they're basically impossible to identify or block, which means
         | they don't have any incentives to behave well.
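         | 
         | A sketch of what that could look like (Ed25519, the header name
         | and the encoding here are assumptions on my part, not part of
         | the proposal):
         | 
         |   import base64
         |   import datetime
         |   from cryptography.hazmat.primitives.asymmetric import ed25519
         | 
         |   # Crawler side: regenerate once per day.
         |   private_key = ed25519.Ed25519PrivateKey.generate()
         |   claim = f"{datetime.date.today()}|203.0.113.7|crawler.example.com"
         |   signature = private_key.sign(claim.encode())
         |   header = f"{claim};{base64.b64encode(signature).decode()}"
         |   # Sent with every request, e.g.  Crawler-Identity: <header>
         | 
         |   # Server side: fetch the public key from a TXT record on
         |   # crawler.example.com (DNS lookup omitted), verify once, cache.
         |   public_key = private_key.public_key()  # stand-in for key from DNS
         |   claim_part, sig_part = header.rsplit(";", 1)
         |   public_key.verify(base64.b64decode(sig_part), claim_part.encode())
         |   # verify() raises InvalidSignature if the header was forged.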
        
         | caelinsutch wrote:
         | CAPTCHAs are also quickly becoming irrelevant / not enough.
         | Fingerprint-based approaches seem to be the only realistic way
         | forward in the cat-and-mouse game.
        
       | y42 wrote:
       | Let me get this straight: we want computers knowing everything,
       | to solve current and future problems, but we don't want to give
       | them access to our knowledge?
        
         | chairmansteve wrote:
         | Not sure we do.
        
         | 3np wrote:
         | I don't want your computer to know everything about me, in
         | fact.
        
         | drawfloat wrote:
         | Most people don't want computers to know everything - ask the
         | average person if they want more or less of their lives
         | recorded and stored.
        
         | lelanthran wrote:
         | > Let me get this straight: we want computers knowing
         | everything, to solve current and future problems, but we don't
         | want to give them access to our knowledge?
         | 
         | Who said that?
         | 
         | There are basically two extremes:
         | 
         | 1. We want access to all of human knowledge, now and forever,
         | in order to monetise it and make more money for us, and us
         | alone.
         | 
         | and
         | 
         | 2. We don't want our freely available knowledge sold back to
         | us, with no credits to the original authors.
        
         | jeroenhd wrote:
         | I don't want computers to know everything. Most knowledge on
         | the internet is false and entirely useless.
         | 
         | The companies selling us computers that supposedly know
         | everything should pay for their database, or they should give
         | away the knowledge they gained for free. Right now, the
         | scraping and copying is free and the knowledge is behind a
         | subscription to access a proprietary model that forms the basis
         | of their business.
         | 
         | Humanity doesn't benefit, the snake oil salesmen do.
        
       | areyourllySorry wrote:
       | further reading
       | 
       | https://krebsonsecurity.com/?s=infatica
       | 
       | https://krebsonsecurity.com/tag/residential-proxies/
       | 
       | https://spur.us/blog/
       | 
       | https://bright-sdk.com/ <- way bigger than infatica
        
       | dspillett wrote:
       | _> So there is a (IMHO) shady market out there that gives app
       | developers on iOS, Android, MacOS and Windows money for including
       | a library into their apps that sells users network bandwidth._
       | 
       | This is yet another reason why we need to be wary of popular
       | apps, add-ons, extensions, and so forth changing hands, by
       | legitimate sale or more nefarious methods. Initially innocent
       | utilities can quickly be co-opted into being part of this sort of
       | scheme.
        
       | aorth wrote:
       | In the last week I've had to deal with two large-scale influxes
       | of traffic on one particular web server in our organization.
       | 
       | The first involved requests from 300,000 unique IPs in a span of
       | a few hours. I analyzed them and found that ~250,000 were from
       | Brazil. I'm used to using ASNs to block network ranges sending
       | this kind of traffic, but in this case they were spread thinly
       | over 6,000+ ASNs! I ended up blocking all of Brazil (sorry).
       | 
       | A few days later this same web server was on fire again. I
       | performed the same analysis on IPs and found a similar number of
       | unique addresses, but spread across Turkey, Russia, Argentina,
       | Algeria and many more countries. What is going on?! Eventually I
       | _think_ I found a pattern to identify the requests, in that they
       | were using ancient Chrome user agents: Chrome 40, 50, 60 and up
       | to 90, released roughly 4 to 10 years ago. Then, just before I could
       | implement a block based on these user agents, the traffic
       | stopped.
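       | 
       | For what it's worth, a quick way to surface that kind of user
       | agent pattern from an access log (combined log format and the
       | version cutoff below are just illustrative):
       | 
       |   import re
       |   from collections import Counter
       | 
       |   ua_re = re.compile(r'Chrome/(\d+)\.')
       |   old_chrome_ips = Counter()
       | 
       |   with open("access.log") as log:
       |       for line in log:
       |           m = ua_re.search(line)
       |           if m and int(m.group(1)) < 100:   # ancient Chrome majors
       |               old_chrome_ips[line.split()[0]] += 1  # first field = client IP
       | 
       |   for ip, hits in old_chrome_ips.most_common(20):
       |       print(ip, hits)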
       | 
       | In both cases the traffic from datacenter networks was limited
       | because I already rate limit a few dozen of the larger ones.
       | 
       | Sysadmin life...
        
         | rollcat wrote:
         | Try Anubis: <https://anubis.techaro.lol>
         | 
         | It's a reverse proxy that presents a PoW challenge to every new
         | visitor, shifting the initial cost of accessing your server's
         | resources back onto the client. Assuming your uplink can handle
         | 300k clients requesting a single 70kb web page, it should solve
         | most of your problems.
         | 
         | For science, can you estimate your _peak_ QPS?
        
           | marginalia_nu wrote:
           | Anubis is a good choice because it whitelists legitimate and
           | well behaved crawlers based on IP + user-agent. Cloudflare
           | works as well in that regard but then you're MITM:ing all
           | your visitors.
        
           | Imustaskforhelp wrote:
           | Also, I was just watching a Brodie Robertson video about how
           | the United Nations has this random UNESCO search page that
           | actually runs Anubis.
           | 
           | Crazy, since I remember the HN post when Anubis's blog post
           | first went up. I always thought the anime mascot was a bit
           | funny, and it was born out of frustration with (I think AWS?)
           | AI scrapers that wouldn't follow the general rules and kept
           | hammering his git server with requests, to the point that it
           | actually took his git server down, I guess. I didn't expect
           | it to blow up to ... the UN.
        
             | xena wrote:
             | Her*
             | 
             | It was frustration at AWS' Alexa team and their abuse of
             | the commons. Amusingly if they had replied to my email
             | before I wrote my shitpost of an implementation this all
             | could have turned out vastly differently.
        
         | luckylion wrote:
         | I've seen a few attacks where the operators placed malicious
         | code on high-traffic sites (e.g. some government thing, larger
         | newspapers), and then just let browsers load your site as an
         | img. Did you see images, css, js being loaded from these IPs?
         | If the browsers were expecting an image, they wouldn't parse
         | the HTML and so wouldn't load any other resources.
         | 
         | It's a pretty effective attack because you get large numbers of
         | individual browsers to contribute. Hosters don't care, so
         | unless the site owners are technical enough to notice, the
         | injected code can stay online for quite a while.
         | 
         | If they work with Referrer Policy, they should be able to mask
         | themselves fairly well - the ones I saw back then did not.
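         | 
         | One server-side check for that pattern (log format and the
         | extension list below are only illustrative): IPs that fetch
         | HTML but never a single subresource stand out.
         | 
         |   from collections import defaultdict
         | 
         |   paths_by_ip = defaultdict(set)
         |   with open("access.log") as log:
         |       for line in log:
         |           ip = line.split()[0]
         |           request = line.split('"')[1]      # e.g. 'GET /page HTTP/1.1'
         |           paths_by_ip[ip].add(request.split()[1])
         | 
         |   SUBRESOURCES = (".css", ".js", ".png", ".jpg", ".svg", ".woff2")
         |   html_only = [ip for ip, paths in paths_by_ip.items()
         |                if not any(p.endswith(SUBRESOURCES) for p in paths)]
         |   print(len(html_only), "IPs never requested a subresource")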
        
       | jgalt212 wrote:
       | I blame the VCs. They don't stop, and implicitly encourage,
       | website-crushing scrapers among their funded ventures.
       | 
       | It's not a crime if we do it with an app
       | 
       | https://pluralistic.net/2025/01/25/potatotrac/#carbo-loading
        
       | reincoder wrote:
       | I work for IPinfo (a commercial service). We offer a residential
       | proxy detection service, but it costs money.
       | 
       | If you are being bombarded by suspicious IP addresses, please
       | consider using our free service and blocking IP addresses by ASN
       | or country. I think ASN is a useful common denominator for
       | malicious IP addresses. If you do not have time to explore our
       | services/tools
       | (it is mostly just our CLI: https://github.com/ipinfo/cli),
       | simply paste the IP addresses (or logs) in plain text, send it to
       | me and I will let you know the ASNs and corresponding ranges to
       | block.
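       | 
       | If you'd rather script it yourself first, a rough sketch of
       | grouping a list of suspect IPs by ASN via the JSON API (the
       | exact fields, token handling and rate limits are in the docs;
       | "org" carries the AS number):
       | 
       |   import json
       |   import urllib.request
       |   from collections import Counter
       | 
       |   def asn_of(ip: str, token: str) -> str:
       |       url = f"https://ipinfo.io/{ip}/json?token={token}"
       |       with urllib.request.urlopen(url) as resp:
       |           return json.load(resp).get("org", "unknown")  # e.g. "AS15169 Google LLC"
       | 
       |   ips = open("suspect_ips.txt").read().split()
       |   asns = Counter(asn_of(ip, "YOUR_TOKEN") for ip in ips)
       |   for asn, count in asns.most_common():
       |       print(count, asn)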
        
         | throwaway74663 wrote:
         | Blocking countries is such a poorly disguised form of racism.
         | Funny how it's always the brown / yellow people countries that
         | get blocked, and never the US, despite it being one of the
         | leading nations in malicious traffic.
        
       ___________________________________________________________________
       (page generated 2025-04-20 23:01 UTC)