[HN Gopher] The Web Is Broken - Botnet Part 2
___________________________________________________________________
The Web Is Broken - Botnet Part 2
Author : todsacerdoti
Score : 387 points
Date : 2025-04-19 18:59 UTC (1 day ago)
(HTM) web link (jan.wildeboer.net)
(TXT) w3m dump (jan.wildeboer.net)
| api wrote:
| This is nasty in other ways too. What happens when someone uses
| these B2P residential proxies to commit crimes that get traced
| back to you?
|
| Anything incorporating anything like this is malware.
| reconnecting wrote:
| Many years ago, cybercriminals hacked computers to use them as
| residential proxies; now they simply purchase them online as a
| service.
|
| In most cases they are used for conducting real financial
| crimes, but police investigators are also aware that there
| is a very low chance that sophisticated fraud is committed
| directly from the perpetrator's own residential IP address.
| kastden wrote:
| Are there any lists with known c&c servers for these services
| that can be added to Pihole/etc?
| udev4096 wrote:
| You can use one of the lists from here:
| https://github.com/hagezi/dns-blocklists
| Liftyee wrote:
| I don't know if I should be surprised about what's described in
| this article, given the current state of the world. Certainly I
| didn't know about it before, and I agree with the article's
| conclusion.
|
| Personally, I think the "network sharing" software bundled with
| apps should fall into the category of potentially unwanted
| applications along with adware and spyware. All of the above "tag
| along" with something the user DID want to install, and quietly
| misuse the user's resources. Proxies like this definitely have an
| impact on metered/slow connections - I'm tempted to start
| Wireshark'ing my devices now to look for suspicious activity.
|
| There should be a public repository of apps known to have these
| shady behaviours. Having done some light web scraping for
| archival/automation before, it's a pity that it'll become
| collateral damage in the anti-AI-botfarm fight.
| zzo38computer wrote:
| I agree, this should be called spyware, and malware. There are
| many other kinds of software that should also be, but netcat and
| ncat (probably) aren't malware.
| akoboldfrying wrote:
| I agree, but the harm done to the users is only one part of the
| total harm. I think it's quite plausible that many users
| wouldn't mind some small amount of their bandwidth being used,
| if it meant being able to use a handy browser extension that
| they would otherwise have to pay actual dollars for -- but the
| harm done to those running the servers remains.
| arewethereyeta wrote:
| I have had some success catching most of them at
| https://visitorquery.com
| lq9AJ8yrfs wrote:
| I went to your website.
|
| Is the premise that users should not be allowed to use vpns in
| order to participate in ecommerce?
| arewethereyeta wrote:
| Nobody said that; it's your choice to take whatever action
| fits your scenario. I have clients where VPNs are blocked,
| yes; it depends on the industry, fraud rate, chargeback rates,
| etc.
| ivas wrote:
| Checked my connection via VPN by Google/Cloudflare WARP:
| "Proxy/VPN not detected"
| arewethereyeta wrote:
| Could be, I don't claim 100% success rate. I'll have a look
| at one of those and see why I missed it. Thank you for
| letting me know.
| nickphx wrote:
| measuring latency between different endpoints? I see the
| webrtc turn relay request..
| karmanGO wrote:
| Has anyone tried to compile a list of software that uses these
| libraries? It would be great to know what apps to avoid
| arewethereyeta wrote:
| No, but here's the thing: having been in the industry for many
| years, I know they are required to mention it in the TOS when
| using the SDKs. A crawler pulling app TOSs and parsing them
| could be a thing. List or not, it won't be too useful outside
| this tech community.
| mzajc wrote:
| In the case of Android, exodus has one[1], though I couldn't
| find the malware library listed in TFA. Aurora Store[2], a FOSS
| Google Play Store client, also integrates it.
|
| [1] https://reports.exodus-privacy.eu.org/en/trackers/ [2]
| https://f-droid.org/packages/com.aurora.store/
| takluyver wrote:
| That seems to be looking at tracking and data collection
| libraries, though, for things like advertising and crash
| reporting. I don't see any mention of the kind of 'network
| sharing' libraries that this article is about. Have I missed
| it?
| lelanthran wrote:
| > Has anyone tried to compile a list of software that uses
| these libraries? It would be great to know what apps to avoid
|
| I wouldn't mind reading a comprehensive report on SOTA with
| regard to bot-blocking.
|
| Sure, there's Anubis (although someone elsethread called it a
| half-measure, and I'd like to know why), there's captchas,
| there's relying on a monopoly (cloudflare, etc) who probably
| also wants to run their own bots at some point, but what else
| is there?
| il-b wrote:
| A good portion of free VPN apps sell their traffic. This was
| a thing even before the AI bot explosion.
| amiga-workbench wrote:
| What is the point of app stores holding up releases for review if
| they don't even catch obvious malware like this?
| SoftTalker wrote:
| Money
| _Algernon_ wrote:
| They pretend to do a review to justify their 30% cartel tax.
| klabb3 wrote:
| Oh no, they review thoroughly, to make sure you don't try to
| avoid the tax.
| politelemon wrote:
| Their marketing tells you it's for protection. What they omit
| is that it's for _their_ revenue protection - observe that
| as long as you do not threaten their revenue models, or the
| revenue models of their partners, you are allowed through. It
| has never been about the users or developers.
| charcircuit wrote:
| The definition of malware is fuzzy.
| wyck wrote:
| This isn't obvious, 99% of apps make multiple calls to multiple
| services, and these SDKs are embedded into the app. How can
| you tell what's legit outbound/inbound? Doing a fingerprint
| search for the worst culprits might help catch some, but it
| would likely be a game of cat and mouse.
| nottorp wrote:
| > How can you tell what's legit outbound/inbound?
|
| If the app isn't a web browser, none are legit?
| vlan121 wrote:
| when the shit hits the fan, this seems like the product.
| ChrisMarshallNY wrote:
| _> So if you as an app developer include such a 3rd party SDK in
| your app to make some money -- you are part of the problem and I
| think you should be held responsible for delivering malware to
| your users, making them botnet members._
|
| I suspect that this goes for _many_ different SDKs. Personally, I
| am really, _really_ sick of hearing "That's a _solved_ problem!",
| whenever I mention that I tend to "roll my own," as opposed to
| including some dependency, recommended by some jargon-addled
| dependency addict.
|
| Bad actors _love_ the dependency addiction of modern developers,
| and have learned to set some pretty clever traps.
| duskwuff wrote:
| That may be true but I think you're missing the point here.
|
| The "network sharing" behavior in these SDKs is the sole
| purpose of the SDK. It isn't being included as a surprise along
| with some other desirable behavior. What needs to stop is
| developers including these SDKs as a secondary revenue source
| in free or ad-supported apps.
| ChrisMarshallNY wrote:
| _> I think you're missing the point here_
|
| Doubt it. This is just one -of many- carrots that are used to
| entice developers to include dodgy software into their apps.
|
| The problem is a _lot_ bigger than these libraries. It's an
| endemic cultural issue. Much more difficult to quantify or
| fix.
| sixtyj wrote:
| Malware, botnets... it is very similar. And people, including
| developers, are - in 80 percent of cases - eager to make money,
| because... Is greed good? No, it isn't. It is a plague.
| II2II wrote:
| You're a developer who devoted time to develop a piece of
| software. You discover that you are not generating any income
| from it: few people can even find it in the sea of similar
| apps, few of those are willing to pay for it, and those who
| are willing to pay for it are not willing to pay much. To
| make matters worse, you're going to lose a cut of what is
| paid to the middlemen who facilitate the transaction.
|
| Is that greed?
|
| I can find many reasons to be critical of that developer,
| things like creating a product for a market segment that is
| saturated, and likely doing so because it is low hanging
| fruit (both conceptually and in terms of complexity). I can
| be critical of their moral judgement for how they decided to
| generate income from their poor business judgment. But I
| don't think it's right to automatically label them as
| greedy. They _may_ be greedy, but they may also be trying to
| generate income from their work.
| andelink wrote:
| > Is that greed?
|
| Umm, yes? You are not owed anything in this life, certainly
| not income for your choice to spend your time on building a
| software product no one asked for. Not making money on it
| is a perfectly fine outcome. If you desperately need
| guaranteed money, don't build an app expecting it to sell;
| get a job.
| klabb3 wrote:
| > If you desperately need guaranteed money, don't build
| an app expecting it to sell; get a job.
|
| Technically true but a bit of perspective might help. The
| consumer market is distorted by free (as in beer) apps
| that do a bunch of shitty things that should in many
| cases be illegal or require much more informed consent
| than today, like tracking everything they can. Then you
| have VC funded "free" as well, where the end game is to
| raise prices slowly to boil the frog. Then you have loss
| leaders from megacorps, and a general anti-competitive
| business culture.
|
| Plus, this is not just in the Wild West shady places,
| like the old piratebay ads. The top result for "timer" on
| the App Store (for me) is indeed a timer app, but with
| IAP of $800/y subscription... facilitated by Apple Inc,
| who gets 15-30% of the bounty.
|
| Look, the point is it's almost impossible to break into
| consumer markets because everyone else is a predator.
| It's a race to the bottom, ripping off clueless
| customers. Everyone would benefit from a fairer market.
| Especially honest developers.
| what wrote:
| >$800/year IAP
|
| That's got to be money laundering or something else
| illicit? No one is actually paying that for a timer app?
| klabb3 wrote:
| No I think it's designed to catch misclicks and children
| operating the phone and such, sold as $17/week possibly
| masquerading as one-time payment. They pay for App Store
| ads for it too.
| econ wrote:
| I prefer to focus on the technical shortcomings.
|
| We could have people ask for software in a more
| convenient way.
|
| Not making money could be an indication the software
| isn't useful, but what if it is? What can the collective
| do in that zone?
|
| I imagine one could ask and pay for unwritten software
| then get a refund if it doesn't materialize before your
| deadline.
|
| Why is discovery (of many creations) willingly handed over
| to a handful of mega corps?? They seem to think I want
| to watch and read about Trump and Elon every day.
|
| Promoting something because it is good is a great example
| of a good thing that shouldn't pay.
| hliyan wrote:
| There was an earlier discussion on HN about whether
| advertising should be more heavily regulated (or even banned
| outright). I'm starting to wonder whether most of the
| problems on the Web are negative side effects of the
| incentives created by ads (including all botnets, except
| those that enable ransomware and espionage). Even the
| current worldwide dopamine addiction is driven by apps and
| content created for engagement, whose entire purpose is ad
| revenue.
| rsedgwick wrote:
| "Bad actors love the dependency addiction of modern developers"
|
| Brings a new meaning to dependency injection.
| rapind wrote:
| I mean, as far as patterns go, dependency injection is also
| quite bad.
| rjbwork wrote:
| Elaborate on this please. It seems a great boon in having
| pushed the OO world towards more functional principles, but
| I'm willing to hear dissent.
| layer8 wrote:
| How is dependency injection more functional?
|
| My personal beef is that most of the time it acts like
| hidden global dependencies, and the configuration of
| those dependencies, along with their lifetimes, becomes
| harder to understand by not being traceable in the source
| code.
| kortilla wrote:
| Because you're passing functions to call.
| layer8 wrote:
| ??? What functions?
|
| To me it's rather anti-functional. Normally, when you
| instantiate a class, the resulting object's behavior only
| depends on the constructor arguments you pass it (= the
| behavior is purely a function of the arguments). With
| dependency injection, the object's behavior may depend on
| some hidden configuration, and not even inspecting the
| class' source code will be able to tell you the source of
| that behavior, because there's only an _@Inject_
| annotation without any further information.
|
| Conversely, when you modify the configuration of which
| implementation gets injected for which interface type,
| you potentially modify the behavior of many places in the
| code (including, potentially, the behavior of
| dependencies your project may have), without having
| passed that code any arguments to that effect. A function
| executing that code suddenly behaves differently, without
| any indication of that difference at the call site, or
| traceable from the call site. That's the opposite of the
| functional paradigm.
| squeaky-clean wrote:
| > because there's only an @Inject annotation without any
| further information
|
| It sounds like you have a gripe with a particular DI
| framework and not the idea of Dependency Injection.
| Because
|
| > Normally, when you instantiate a class, the resulting
| object's behavior only depends on the constructor
| arguments you pass it (= the behavior is purely a
| function of the arguments)
|
| With Dependency Injection this is generally still true,
| even more so than normal because you're making the
| constructor's dependencies explicit in the arguments. If
| you have a class CriticalErrorLogger(), you can't
| directly tell where it logs to, is it using a flat file
| or stdout or a network logger? If you instead have a
| class CriticalErrorLogger(logger *io.writer), then when
| you create it you know exactly what it's using to log
| because you had to instantiate it and pass it in.
|
| Or like Kortilla said, instead of passing in a class or
| struct you can pass in a function, so using the same
| example, something like CriticalErrorLogger(fn write)
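|
| To make that concrete, a minimal sketch of constructor injection
| in Go (the type and names are invented for illustration, not
| taken from any real codebase):
|
|     package main
|
|     import (
|         "fmt"
|         "io"
|         "os"
|     )
|
|     // CriticalErrorLogger writes to whatever io.Writer the caller
|     // supplies; the destination is not baked into the type.
|     type CriticalErrorLogger struct {
|         out io.Writer
|     }
|
|     // NewCriticalErrorLogger is constructor injection: the
|     // dependency is an explicit argument, so the call site tells
|     // you exactly what the logger uses.
|     func NewCriticalErrorLogger(out io.Writer) *CriticalErrorLogger {
|         return &CriticalErrorLogger{out: out}
|     }
|
|     func (l *CriticalErrorLogger) Log(msg string) {
|         fmt.Fprintf(l.out, "CRITICAL: %s\n", msg)
|     }
|
|     func main() {
|         // Swap os.Stdout for a file or network writer without
|         // touching the logger itself.
|         logger := NewCriticalErrorLogger(os.Stdout)
|         logger.Log("disk is on fire")
|     }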
| layer8 wrote:
| I don't quite understand your example, but I don't think
| the particulars make much of a difference. We can go with
| the most general description: With dependency injection,
| you define points in your code where dependencies are
| injected. The injection point is usually a variable (this
| includes the case of constructor parameters), whose value
| (the dependency) will be set by the dependency injection
| framework. The behavior of the code that reads the
| variable and hence the injected value will then depend on
| the specific value that was injected.
|
| My issue with that is this: From the point of view of the
| code accessing the injected value (and from the point of
| view of that code's callers), the value appears like out
| of thin air. There is no way to trace back from that code
| where the value came from. Similarly, when defining which
| value will be injected, it can be difficult to trace all
| the places where it will be injected.
|
| In addition, there are often lifetime issues involved,
| when the injected value is itself a stateful object, or
| may indirectly depend on mutable, cached, or lazy-
| initialized, possibly external state. The time when the
| value's internal state is initialized or modified, or
| whether or not it is shared between separate injection
| points, is something that can't be deduced from the
| source code containing the injection points, but is often
| relevant for behavior, error handling, and general
| reasoning about the code.
|
| All of this makes it more difficult to reason about the
| injected values, and about the code whose behavior will
| depend on those values, from looking at the source code.
| squeaky-clean wrote:
| > whose value (the dependency) will be set by the
| dependency injection framework
|
| I agree with your definition except for this part, you
| don't need any framework to do dependency injection. It's
| simply the idea that instead of having an abstract base
| class CriticalErrorLogger, with the concrete
| implementations of StdOutCriticalErrorLogger,
| FileCriticalErrorLogger, AwsCloudwatchCriticalErrorLogger
| which bake their dependency into the class design; you
| instead have a concrete class CriticalErrorLogger(dep
| *dependency) and create dependency objects externally
| that implement identical interfaces in different ways.
| You do text formatting, generating a traceback, etc, and
| then call dep.write(myFormattedLogString), and the
| dependency handles whatever that means.
|
| I agree with you that most DI frameworks are too clever
| and hide too much, and some forms of DI like setter
| injection and reflection based injection are instant
| spaghetti code generators. But things like Constructor
| Injection or Method Injection are so simple they often
| feel obvious and not like Dependency Injection even
| though they are. I love DI, but I hate DI frameworks;
| I've never seen a benefit except for retrofitting legacy
| code with DI.
|
| And yeah it does add the issue or lifetime management.
| That's an easy place to F things up in your code using DI
| and requires careful thought in some circumstances. I
| can't argue against that.
|
| But DI doesn't need frameworks or magic methods or
| attributes to work. And there's a lot of situations where
| DI reduces code duplication, makes refactoring and
| testing easier, and actually makes code feel less magical
| than using internal dependencies.
|
| The basic principle is much simpler than most DI
| frameworks make it seem. Instead of initializing a
| dependency internally, receive the dependency in some
| way. It can be through overly abstracted layers or magic
| methods, but it can also be as simple as adding an
| argument to the constructor or a given method that takes
| a reference to the dependency and uses that.
|
| edit: made some examples less ambiguous
| layer8 wrote:
| The pattern you are describing is what I know as the
| Strategy pattern [0]. See the example there with the
| _Car_ class that takes a _BrakeBehavior_ as a constructor
| parameter [1]. I have no issue with that and use it
| regularly. The Strategy pattern precedes the notion of
| dependency injection by around ten years.
|
| The term Dependency Injection was coined by Martin Fowler
| with this article:
| https://martinfowler.com/articles/injection.html. See how
| it presents the examples in terms of wiring up components
| from a configuration, and how it concludes with stressing
| the importance of "the principle of separating service
| configuration from the use of services within an
| application". The article also presents constructor
| injection as only one of several forms of dependency
| injection.
|
| That is how everyone understood dependency injection when
| it became popular 10-20 years ago: A way to customize
| behavior at the top application/deployment level by
| configuration, without having to pass arguments around
| throughout half the code base to the final object that
| uses them.
|
| Apparently there has been a divergence of how the term is
| being understood.
|
| [0] https://en.wikipedia.org/wiki/Strategy_pattern
|
| [1] The fact that _Car_ is abstract in the example is
| immaterial to the pattern, and a bit unfortunate in the
| Wikipedia article, from a didactic point of view.
| squeaky-clean wrote:
| They're not really exclusive ideas. The Constructor
| Injection section in Fowler's article is exactly the same
| as the Strategy pattern. But no one talks about the
| Strategy pattern anymore, it's all wrapped into the idea
| of DI and that's what caught on.
| morsecodist wrote:
| It was interesting reading this exchange. I have a
| similar understanding of DI to you. I have never even
| heard of a DI framework and I have trouble picturing what
| it would look like. It was interesting to watch you two
| converge on where the disconnect was.
| rjbwork wrote:
| Usually when people refer to "DI Frameworks" they're
| referring to Inversion of Control (IoC) containers.
| layer8 wrote:
| I'm curious, which language/dev communities did you pick
| this up from? Because I don't think it's universal,
| certainly not in the Java world.
|
| DI in Java is almost completely disconnected from what
| the Strategy pattern is, so it doesn't make sense to use
| one to refer to the other there.
| naasking wrote:
| How is the configuration hidden? Presumably you
| configured the DI container.
| rjbwork wrote:
| Dependency injection is just passing your dependencies in
| as constructor arguments rather than as hidden
| dependencies that the class itself creates and manages.
|
| It's equivalent to partial application.
|
| An uninstantiated class that follows the dependency
| injection pattern is equivalent to a family of functions
| with N+Mk arguments, where Mk is the number of parameters
| in method k.
|
| Upon instantiation by passing constructor arguments,
| you've created a family of functions, each with a distinct
| set of Mk parameters and N arguments in common.
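|
| A rough Go illustration of that partial-application view (names
| are made up): the constructor closes over the shared N
| dependencies and returns functions that only need their own Mk
| arguments.
|
|     package main
|
|     import "fmt"
|
|     // Mailer is the shared dependency (one of the N arguments).
|     type Mailer interface {
|         Send(to, body string) error
|     }
|
|     type ConsoleMailer struct{}
|
|     func (ConsoleMailer) Send(to, body string) error {
|         fmt.Printf("to %s: %s\n", to, body)
|         return nil
|     }
|
|     // NotifyFunc is one member of the resulting "family of
|     // functions", with only its Mk parameters left.
|     type NotifyFunc func(user string) error
|
|     // NewNotifier partially applies the Mailer: pass it once,
|     // get back a function that only takes per-call arguments.
|     func NewNotifier(m Mailer) NotifyFunc {
|         return func(user string) error {
|             return m.Send(user, "your report is ready")
|         }
|     }
|
|     func main() {
|         notify := NewNotifier(ConsoleMailer{}) // N args, supplied once
|         _ = notify("alice@example.com")        // Mk args, per call
|     }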
| theteapot wrote:
| > Dependency injection is just passing your dependencies
| in as constructor arguments rather than as hidden
| dependencies that the class itself creates and manages.
|
| That's the best way to think of it fundamentally. But the
| main implication of that is that at some point
| _something_ has to know how to resolve those dependencies
| - i.e. they can't just be constructed and then injected
| from magic land. So global
| cradles/resolvers/containers/injectors/providers
| (depending on your language and framework) are also
| typically part and parcel of DI, and that can have some
| big implications on the structure of your code that some
| people don't like. Also you can inject functions and
| methods not just constructors.
| rjbwork wrote:
| That's because those containers are convenient to use. If
| you don't like using them, you can configure the entire
| application statically from your program's entry point if
| you prefer.
| layer8 wrote:
| I don't understand what you're describing has to do with
| dependency injection. See
| https://news.ycombinator.com/item?id=43740196.
| KronisLV wrote:
| > Dependency injection is just passing your dependencies
| in as constructor arguments rather than as hidden
| dependencies that the class itself creates and manages.
|
| This is all well and good, but you also need a bunch of
| code that handles resolving those dependencies, which
| oftentimes ends up being complex and hard to debug and
| will also cause runtime errors instead of compile time
| errors, which I find to be more or less unacceptable.
|
| Edit: to elaborate on this, I've seen DI frameworks _not_
| be used in "enterprise" projects a grand total of _zero_
| times. I've done DI directly in personal projects and it
| was fine, but in most cases you don't get to make that
| choice.
|
| Just last week, when working on a Java project that's
| been around for a decade or so, there were issues after
| migrating it from Spring to Spring Boot - when compiled
| through the IDE and with the configuration to allow lazy
| dependency resolution it would work (too many circular
| dependencies to change the code instead), but when built
| within a container by Maven that same exact code and
| configuration would no longer work and injection would
| fail.
|
| I'm hoping it's not one of those weird JDK platform bugs
| but rather an issue with how the codebase is compiled
| during the container image build, but the issue is mind
| boggling. More fun, if you take the .jar that's built in
| the IDE and put it in the container, then everything
| works, otherwise it doesn't. No compilation warnings,
| most of the startup is fine, but if you build it in the
| container, you get a DI runtime error about no lazy
| resolution being enabled even if you hardcode the setting
| to be on in Java code: https://docs.spring.io/spring-
| boot/api/kotlin/spring-boot-pr...
|
| I've also seen similar issues before containers, where
| locally it would run on Jetty and use Tomcat on server
| environments, leading to everything compiling and working
| locally but throwing injection errors on the server.
|
| What's more, it's not like you can (easily) put a
| breakpoint on whatever is trying to inject the
| dependencies - after years of Java and Spring I grow more
| and more convinced that anything that doesn't generate
| code that you can inspect directly (e.g. how you can look
| at a generated MapStruct mapper implementation) is
| somewhat user hostile and will complicate things. At
| least modern Spring Boot is good in that more of the
| configuration is just code, because otherwise good luck
| debugging why some XML configuration is acting weird.
|
| In other words, DI can make things more messy due to a
| bunch of technical factors around how it's implemented
| (also good luck reading those stack traces), albeit even
| in the case of Java something like Dagger feels more sane
| https://dagger.dev/ despite never really catching on.
|
| Of course, one could say that circular dependencies or
| configuration issues are project specific, but given
| enough time and projects you will almost inevitably get
| those sorts of headaches. So while the theory of DI is
| nice, you can't just have the theory without practice.
| vbezhenar wrote:
| Dependency injection is not hidden. It's quite the
| opposite: dependency injection lists explicitly all the
| dependencies in a well defined place.
|
| Hidden dependencies are: an untyped context variable, a global
| "service registry", etc. Those are hidden; the only way
| to find out which dependencies a given module has is to
| carefully read its code and the code of all called functions.
| hliyan wrote:
| Inclined to agree. Consider that a singleton dependency
| is essentially a global, and differs from a traditional
| global only in that the reference is kept in a container
| and supplied magically via a constructor variable. Also
| consider that constructor calls are now outside the
| application layer frames of the callstack, in case you
| want to trace execution.
| rapind wrote:
| It starts off feeling like a superpower, allowing you to
| change a system's behaviour without changing its code
| directly. It quickly devolves into a maintenance
| nightmare though every time I've encountered it.
|
| I'm talking more specifically about Aspect Oriented
| Programming though and DI containers in OOP, which seemed
| pretty clever in theory, but have a lot of issues in
| reality.
|
| I take no issues with currying in functional programming.
| rjbwork wrote:
| In terms of aspects I try to keep it limited to already
| existing framework touch points for things like logging,
| authentication and configuration loading. I find that
| writing middleware that you control with declarative
| attributes can be good for those use cases.
|
| There are other good uses of it but it absolutely can get
| out of control, especially if implemented by someone
| who's just discovered it and wants to use it for
| everything.
| ironSkillet wrote:
| I have found that the dependency injection pattern makes it
| far easier to write clean tests for my code.
| ryandrake wrote:
| I'm constantly amazed at how careless developers are with
| pulling 3rd party libraries into their code. Have you audited
| this code? Do you know everything it does? Do you know what
| security vulnerabilities exist in it? On what basis do you
| trust it to do what it says it is doing and nothing else?
|
| But nobody seems to do this diligence. It's just "we are in a
| rush. we need X. dependency does X. let's use X." and that's
| it!
| ClumsyPilot wrote:
| > Have you audited this code?
|
| Wrong question. "Are you paid to audit this code?" And "if
| you fail to audit this code, who'se problem is it?"
| ryandrake wrote:
| I think developers are paid to competently deliver software
| to their employer, and part of that competence is properly
| vetting the code you are delivering. If I wrote code that
| ended up having serious bugs like crashing, I'd expect to
| have at least a minimum consequence, like root causing it
| and/or writing a postmortem to help avoid it in the future.
| Same as I'd expect if I pulled in a bad dependency.
| baumy wrote:
| Your expectations do not match the employment market as I
| have ever experienced it.
|
| Have you ever worked anywhere that said "go ahead and
| slow down on delivering product features that drive
| business value so you can audit the code of your
| dependencies, that's fine, we'll wait"?
|
| I haven't.
| ryandrake wrote:
| Yea, and that's the problem. If such absolute rock bottom
| minimal expectations (know what the code does) are seen
| as too slow and onerous, the industry is cooked!
| ClumsyPilot wrote:
| Yeah, about that, businesses are pushing and introducing
| code written by AI/LLM now, so now you won't even know
| what your own code does.
| djeastm wrote:
| Due diligence is a sliding scale. Work at a webdev agency
| is "get it done as fast as possible for this MVP we
| need". Work at NASA or a biomedical device company? Every
| line of code is triple-checked. It's entirely dependent
| on the cost/benefit analysis.
| Funes- wrote:
| "who'se" is wild.
| SoftTalker wrote:
| If a car manufacturer sources a part from a third party,
| and that part has a serious safety problem, who will the
| customer blame? And who will be responsible for the recall
| and the repairs?
| ClumsyPilot wrote:
| But we aren't in the car business, we are in the joker business.
|
| When was the last time producer of an app was held
| legally accountable for negligence, had to pay
| compensation and damages, etc?
| vinnymac wrote:
| This is especially true for script kiddies, which is why I am
| so thankful for https://e18e.dev/
|
| AI is making this worse than ever though, I am constantly
| having to tell devs that their work is failing to meet
| requirements, because AI is just as bad as a junior dev when it
| comes to reaching for a dependency. It's like we need training
| wheels for the prompts juniors are allowed to write.
| zzo38computer wrote:
| I agree that there are things with too many dependencies and I
| try to avoid that. I think it is a good idea to minimize how
| many dependencies are needed (even indirect dependencies;
| however, in some cases a dependency is not a specific
| implementation, and in that case indirect dependencies are less
| of a problem, although having a good implementation with less
| indirect dependencies is still beneficial). I may write my own,
| in many cases. However, another reason for writing my own is
| because of other kind of problems in the existing programs. Not
| all problems are malicious; many are just that they do not do
| what I need, or do too much more than what I need, or both.
| (However, most of my stuff is C rather than JavaScript; the
| problem seems to be more severe with JavaScript, but I do not
| use that much.)
| bloppe wrote:
| These are kind of separate issues. Apps using Infatica _know_
| that they're selling access to their users' bandwidth. It's
| intentional.
| jonplackett wrote:
| How is this not just illegal? Surely there's something in GDPR
| that makes this not allowed.
| Retr0id wrote:
| iiuc, they do actually ask the user for permission
| fc417fc802 wrote:
| Which is ironic considering that I strongly disagree with one
| of the primary walled garden justifications, used
| particularly in the case of Apple, which amounts to "the end
| user is too stupid to decide on his own". Unfortunately, even
| if I disagree with it as a guiding principle sometimes that
| statement proves true.
| klabb3 wrote:
| It's not about stupidity, but practicality. People can't
| give informed consent for 100 ToS for different companies,
| and keep those up to date. That's why there are laws.
| SoftTalker wrote:
| No doubt in a dense wall of text that the user must accept to
| use the application, or, worse, is deemed to have accepted by
| using the application at all.
| zahlman wrote:
| > I am now of the opinion that every form of web-scraping should
| be considered abusive behaviour and web servers should block all
| of them. If you think your web-scraping is acceptable behaviour,
| you can thank these shady companies and the "AI" hype for moving
| you to the bad corner.
|
| I imagine that e.g. Youtube would be happy to agree with this.
| Not that it would turn them against AI generally.
| BlueTemplar wrote:
| Yeah, also this means the death of archival efforts like the
| Internet Archive.
| jeroenhd wrote:
| Welcome scrapers (IA, maybe Google and Bing) can publish
| their IP addresses and get whitelisted. Websites that want to
| prevent being on the Internet Archive can pretty much just
| ask for their website to be excluded (even retroactively).
|
| [Cloudflare](https://developers.cloudflare.com/cache/troubles
| hooting/alwa...) tags the internet archive as operating from
| 207.241.224.0/20 and 208.70.24.0/21 so disabling the bot-
| prevention framework on connections from there should be
| enough.
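|
| A rough sketch of that kind of allowlist check in Go, using the
| two ranges quoted above (middleware wiring left out; the ranges
| would need to be kept in sync with Cloudflare's published list):
|
|     package main
|
|     import (
|         "fmt"
|         "net/netip"
|     )
|
|     // Prefixes Cloudflare attributes to the Internet Archive
|     // crawler, per the comment above.
|     var archiveRanges = []netip.Prefix{
|         netip.MustParsePrefix("207.241.224.0/20"),
|         netip.MustParsePrefix("208.70.24.0/21"),
|     }
|
|     // isArchiveBot reports whether a client IP falls inside one
|     // of the published ranges, so bot mitigation can be skipped.
|     func isArchiveBot(remote string) bool {
|         addr, err := netip.ParseAddr(remote)
|         if err != nil {
|             return false
|         }
|         for _, p := range archiveRanges {
|             if p.Contains(addr) {
|                 return true
|             }
|         }
|         return false
|     }
|
|     func main() {
|         fmt.Println(isArchiveBot("207.241.229.10")) // true
|         fmt.Println(isArchiveBot("203.0.113.7"))    // false
|     }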
| trinsic2 wrote:
| This sounds like it would be a good idea. Create a
| whitelist of IPs and block the rest.
| realusername wrote:
| That's basically asking to close the market in favor of the
| current actors.
|
| New actors have the right to emerge.
| 0dayz wrote:
| No they don't.
|
| There's no rule that you have to let anyone in who claims
| to be a web crawler.
| areyourllySorry wrote:
| which is why they will stop claiming to be one.
| chii wrote:
| so what happened to competition fostering a better
| outcome for all then?
| realusername wrote:
| So who decides that you can be one? Right now it's
| Cloudflare, a literal monopoly...
|
| The truth is that I sympathize with the people trying to
| use mobile connections to bypass such a cartel.
|
| What Cloudflare is doing now is worse than the web
| crawlers themselves and the legality of blocking crawlers
| with a monopoly is dubious at best.
| jeroenhd wrote:
| They have the right to try to convince me to let them
| scrape me. Most of the time they're thinly veiled data
| traders. I haven't seen any new company try to scrape my
| stuff since maybe Kagi.
|
| Kagi is welcome to scrape from their IP addresses. Other
| bots that behave are fine too (Huawei and various other
| Chinese bots don't and I've had to put an IP block on
| those).
| areyourllySorry wrote:
| a large chunk of internet archive's snapshots are from
| archiveteam, where "warriors" bring their own ips (and they
| crawl respectfully!). save page now is important too, but
| you don't realise what is useful until you lose it.
| Centigonal wrote:
| yeah, but you can't, that's the problem. Plenty of service
| operators would like to block every scraper that doesn't obey
| their robots.txt, but there's no good way to do that without
| blocking human traffic too (Anubis et al are okay, but they are
| half-measures).
|
| On a separate note, I believe open web scraping has been a
| massive benefit to the internet on net, and almost entirely
| positive pre-2021. Web scraping & crawling enables search
| engines, services like Internet Archive, walled-garden-busting
| (like Invidious, yt-dlp, and Nitter), mashups (Spotube, IFTT,
| and Plaid would have been impossible to bootstrap without web
| scraping), and all kinds of interesting data science projects
| (e.g. scraping COVID-19 stats from local health departments to
| patch together a picture of viral spread for epidemiologists).
| udev4096 wrote:
| We should have a way to verify the user-agents of the valid
| and useful scrapers such as the Internet Archive. Some kind
| of cryptographic signature of their user-agents that any
| reverse proxy can validate seems like a good start.
| nottorp wrote:
| Self signed, I hope.
|
| Or do you want a central authority that decides who can do
| new search engines?
| udev4096 wrote:
| Using DANE is probably the best idea even though it's
| still not mainstream
| lelanthran wrote:
| > Plenty of service operators would like to block every
| scraper that doesn't obey their robots.txt, but there's no
| good way to do that without blocking human traffic too
| (Anubis et al are okay, but they are half-measures)
|
| Why are Anubis-type mitigations a half-measure?
| Centigonal wrote:
| Anubis, go-away, etc are great, don't get me wrong -- but
| what Anubis does is impose a cost on every query. The
| website operator is hoping that the compute will have a
| rate-limiting effect on scrapers while minimally impacting
| the user experience. It's almost like chemotherapy, in that
| you're poisoning everyone in the hope that the aggressive
| bad actors will be more severely affected than the less
| aggressive good actors. Even the Anubis readme calls it a
| nuclear option. In practice it appears to work pretty well,
| which is great!
|
| It's a half-measure because:
|
| 1. You're slowing down scrapers, not blocking them. They
| will still scrape your site content in violation of
| robots.txt.
|
| 2. Scrapers with more compute than IP proxies will not be
| significantly bottlenecked by this.
|
| 3. This may lead to an arms race where AI companies respond
| by beefing up their scraping infrastructure, necessitating
| more difficult PoW challenges, and so on. The end result of
| this hypothetical would be a more inconvenient and
| inefficient internet for everyone, including human users.
|
| To be clear: I think Anubis is a great tool for website
| operators, and one of the best self-hostable options
| available today. However, it's a workaround for the core
| problem that we can't reliably distinguish traffic from
| badly behaving AI scrapers from legitimate user traffic.
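|
| For readers unfamiliar with the mechanism: these challenges boil
| down to a hashcash-style proof of work. A toy Go sketch of the
| general idea (an illustration, not Anubis's actual scheme):
|
|     package main
|
|     import (
|         "crypto/sha256"
|         "encoding/hex"
|         "fmt"
|         "strconv"
|         "strings"
|     )
|
|     // validProof checks that sha256(challenge+nonce) starts with
|     // `difficulty` hex zeros. Verification is a single hash, but
|     // finding a nonce costs the client ~16^difficulty hashes.
|     func validProof(challenge, nonce string, difficulty int) bool {
|         sum := sha256.Sum256([]byte(challenge + nonce))
|         return strings.HasPrefix(hex.EncodeToString(sum[:]),
|             strings.Repeat("0", difficulty))
|     }
|
|     // solve is what the client-side JS effectively does:
|     // brute-force a nonce that satisfies the check.
|     func solve(challenge string, difficulty int) string {
|         for i := 0; ; i++ {
|             nonce := strconv.Itoa(i)
|             if validProof(challenge, nonce, difficulty) {
|                 return nonce
|             }
|         }
|     }
|
|     func main() {
|         challenge := "per-session-random-value"
|         nonce := solve(challenge, 4) // ~65k hashes on average
|         fmt.Println(nonce, validProof(challenge, nonce, 4))
|     }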
| pton_xd wrote:
| I thought the closed-garden app stores were supposed to protect
| us from this sort of thing?
| whstl wrote:
| Once again this demonstrates that closed gardens only benefit
| the owners of the garden, and not the users.
|
| What good is all the app vetting and sandbox protection in iOS
| (dunno about Android) if it doesn't really protect me from
| those crappy apps...
| 20after4 wrote:
| At the very least, Apple should require conspicuous
| disclosure of this kind of behavior that isn't just hidden in
| the TOS.
| BlueTemplar wrote:
| Also my reaction when the call is for Google, Apple,
| Microsoft to fix this: DDoS being illegal, shouldn't the
| first reaction instead be to contact law enforcement?
|
| If you treat platforms like they are all-powerful, then
| that's what they are likely to become...
| musicale wrote:
| Sandboxing means you can limit network access. For example,
| on Android you can disallow wi-fi and cellular access (not
| sure about bluetooth) on a per-app basis.
|
| Network access settings should really be more granular for
| apps that have a legitimate need.
|
| App store disclosure labels should also add network usage
| disclosure.
| 20after4 wrote:
| That's what they want you to think.
| kibwen wrote:
| If you find yourself in a walled garden, understand that you're
| the crop being grown and harvested.
| jt2190 wrote:
| I'm really struggling to understand how this is different than
| malware we've had forever. Can someone explain what's novel about
| this?
| desertmonad wrote:
| That it's _not_ being treated like malware.
| jt2190 wrote:
| In the sense that people are voluntarily installing and
| running this malware on their computers, rather than being
| _tricked_ into running it? Is that the only difference?
| int_19h wrote:
| They are still tricked into running it, since it's normally
| not an advertised "feature" of any app that uses such SDKs.
| downrightmike wrote:
| I think it is funny that the mobile OS is trying to be as
| secure as possible, but then they allow this to run on top
| rsedgwick wrote:
| I think tech can still be beautiful in a less grandiose and
| "omniparadisical" way than people used to dream of. "A wide open
| internet, free as in speech this, free as in beer that, open
| source wonders, open gardens..." Well, there are a lot of
| incentives that fight that, and game theory wins. Maybe we
| download software dependencies from our friends, the ones we
| actually trust. Maybe we write more code ourselves--more
| homesteading families that raise their own chickens, jar their
| own pickled carrots, and code their own networking utilities.
| Maybe we operate on servers we own, or our friends own, and we
| don't get blindsided by news that the platforms are selling our
| data and scraping it for training.
|
| Maybe it's less convenient and more expensive and onerous. Do
| good things require hard work? Or did we expect everyone to
| ignore incentives forever while the trillion-dollar hyperscalers
| fought for an open and noble internet and then wrapped it in
| affordable consumer products to our delight?
|
| It reminds me of the post here a few weeks ago about how Netflix
| used to be good and "maybe I want a faster horse" - we want
| things to be built for us, easily, cheaply, conveniently, by
| companies, and we want those companies not to succumb to
| enshittification - but somehow when the companies just follow the
| game theory and turn everything into a TikToky neural-networks-
| maximizing-engagement-infinite-scroll-experience, it's their
| fault, and not ours for going with the easy path while hoping the
| corporations would not take the easy path.
| reconnecting wrote:
| Residential IP proxies have some weaknesses. One is that they
| often change IP addresses during a single web session. Second, if
| IPs come from the same proxy provider, they are often
| concentrated within a single ASN, making them easier to detect.
|
| We are working on an open-source fraud prevention platform [1],
| and detecting fake users coming from residential proxies is one
| of its use cases.
|
| [1] https://www.github.com/tirrenotechnologies/tirreno
| gbcfghhjj wrote:
| At least here in the US most residential ISPs have long leases
| and change infrequently, weeks or months.
|
| Trying to understand your product, where is it intended to sit
| in a network? Is it a standalone tool that you use to identify
| these IPs and feed into something else for blockage, or is it
| intended to be integrated into your existing site, or is it
| supposed to proxy all your web traffic? The reason I ask is it
| has fairly heavyweight install requirements and Apache and PHP
| are kind of old school at this point, especially for new
| projects and companies. It's not what they would commonly be
| using for their site.
| reconnecting wrote:
| Indeed, if it's a real user from a residential IP address, in
| most cases it will be the same network. However, if it's a
| proxy from residential IPs, there could be 10 requests from
| one network, the 11th request from a second network, and the
| 12th request back from the same network. This is a red flag.
|
| Thank you for your question. tirreno is a standalone app that
| needs to receive API events from your main web application.
| It can work perfectly with 512GB of Postgres RAM or even lower;
| however, in most cases we're talking about millions of events
| that request resources.
|
| It's much easier to write a stable application without
| dependencies based on mature technologies. tirreno is fairly
| 'boring software'.
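|
| A toy sketch of the red-flag check described above, with a /24
| prefix standing in for a proper IP-to-ASN lookup (and, as noted,
| this would only be one signal among many):
|
|     package main
|
|     import (
|         "fmt"
|         "net/netip"
|     )
|
|     // sessionNets tracks which networks each session has used.
|     // A real system would map IP to ASN; /24 is a crude stand-in.
|     var sessionNets = map[string]map[netip.Prefix]bool{}
|
|     // suspicious records the request and reports whether the
|     // session has now hopped across more than `limit` networks.
|     func suspicious(sessionID, ip string, limit int) bool {
|         addr, err := netip.ParseAddr(ip)
|         if err != nil {
|             return true // unparsable IPs are their own red flag
|         }
|         network, _ := addr.Prefix(24)
|         if sessionNets[sessionID] == nil {
|             sessionNets[sessionID] = map[netip.Prefix]bool{}
|         }
|         sessionNets[sessionID][network] = true
|         return len(sessionNets[sessionID]) > limit
|     }
|
|     func main() {
|         ips := []string{"198.51.100.10", "198.51.100.11",
|             "203.0.113.5", "198.51.100.12"}
|         for _, ip := range ips {
|             fmt.Println(ip, suspicious("sess-1", ip, 1))
|         }
|     }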
| sroussey wrote:
| My phone will be on the home network until I walk out of
| the house and then it will change networks. This should not
| be a red flag.
| reconnecting wrote:
| Effective fraud prevention relies on both the full user
| context and the behavioral patterns of known online
| fraudsters. The key idea is that an IP address cannot be
| used as a red flag on its own without considering the
| broader context of the account. However, if we know that
| the fraudsters we're dealing with are using mobile
| network proxies and are randomly switching between two
| mobile operators, that is certainly a strong risk signal.
| JimDabell wrote:
| An awful lot of free Wi-Fi networks you find in malls are
| operated by different providers. Walking from one side of
| a mall to the other while my phone connects to all the
| Wi-Fi networks I've used previously would have you flag
| me as a fraudster if I understand your approach
| correctly.
| reconnecting wrote:
| We are discussing user behavior in the context of a web
| system. The fact that your device has connected to
| different Wi-Fi networks doesn't necessarily mean that
| all of them were used to access the web application.
|
| Finally, as mentioned earlier, there is no silver bullet
| that works for every type of online fraudster. For
| example, in some applications, a TOR connection might be
| considered a red flag. However, if we are talking about
| hn visitors, many of them use TOR on a daily basis.
| andelink wrote:
| The first blog post in this series[1], linked to at the top of
| TFA, offers an analysis on the potential of using ASNs to
| detect such traffic. Their conclusion was that ASNs are not
| helpful for this use-case, showing that across the 50k IPs
| they've blocked, there are fewer than 4 IP addresses per ASN, on
| average.
|
| [1] https://jan.wildeboer.net/2025/02/Blocking-Stealthy-
| Botnets/
| reconnecting wrote:
| What was done manually in the first blog is exactly what
| tirreno helps to achieve by analyzing traffic; here is a live
| example [1]. Blocking an entire ASN should not be considered
| a strategy when real users are involved.
|
| Regarding the first post, it's rare to see both datacenter
| network IPs and mobile proxy IP addresses used
| simultaneously. This suggests the involvement of more than
| one botnet. The main idea is to avoid using IP addresses as
| the sole risk factor. Instead, they should be considered as
| just one part of the broader picture of user behavior.
|
| [1] https://play.tirreno.com
| gruez wrote:
| >One is that they often change IP addresses during a single web
| session. Second, if IPs come from the same proxy provider,
| they are often concentrated within a single ASN, making them
| easier to detect.
|
| Both are pretty easy to mitigate with a geoip database and some
| smart routing. One "residential proxy" vendor even has session
| tokens so your source IP doesn't randomly jump between each
| request.
| reconnecting wrote:
| And this is the exact reason why IP addresses cannot be
| considered as the one and only signal for fraud prevention.
| at0mic22 wrote:
| Strange that HolaVPN, i.e. Brightdata, is not mentioned. They've
| been using user hosts for those purposes for decades, and also
| selling proxies en masse. Fun fact: they don't have any servers
| for the VPN. All the VPN traffic is routed through ... other
| users!
| arewethereyeta wrote:
| They were even the first to do it, and the most litigious of all,
| trying to push patents on everything possible, even on water if
| they can.
| Klonoar wrote:
| Is it really strange if the logo is right there in the article?
| andelink wrote:
| Hola is mentioned in the author's prior post on this topic,
| linked to at the top of TFA:
| https://jan.wildeboer.net/2025/02/Blocking-Stealthy-Botnets/
| armchairhacker wrote:
| > I am now of the opinion that every form of web-scraping should
| be considered abusive behaviour and web servers should block all
| of them. If you think your web-scraping is acceptable behaviour,
| you can thank these shady companies and the "AI" hype for moving
| you to the bad corner.
|
| Why jump to that conclusion?
|
| If a scraper clearly advertises itself, follows robots.txt, and
| has reasonable backoff, it's not abusive. You can easily block
| such a scraper, but then you're encouraging stealth scrapers
| because they're still getting your data.
|
| I'd block the scrapers that try to hide and waste compute, but
| deliberately allow those that don't. And maybe provide a sitemap
| and API (which besides being easier to scrape, can be faster to
| handle).
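|
| For contrast, a well-behaved scraper along those lines is not much
| code. A rough Go sketch (the robots.txt check is deliberately
| crude, a real crawler should use a proper parser, and the bot
| name/URL are placeholders):
|
|     package main
|
|     import (
|         "fmt"
|         "io"
|         "net/http"
|         "strings"
|         "time"
|     )
|
|     const userAgent = "ExampleResearchBot/1.0 (+https://example.org/bot)"
|
|     // blockedByRobots only looks for a blanket "Disallow: /"
|     // under "User-agent: *"; real crawlers need a full parser.
|     func blockedByRobots(site string) bool {
|         resp, err := http.Get(site + "/robots.txt")
|         if err != nil {
|             return false
|         }
|         defer resp.Body.Close()
|         if resp.StatusCode != 200 {
|             return false
|         }
|         body, _ := io.ReadAll(resp.Body)
|         txt := strings.ToLower(string(body))
|         return strings.Contains(txt, "user-agent: *") &&
|             strings.Contains(txt, "disallow: /")
|     }
|
|     // fetch identifies itself honestly and backs off
|     // exponentially on errors and 429/5xx responses.
|     func fetch(url string) (*http.Response, error) {
|         delay := time.Second
|         for attempt := 0; attempt < 5; attempt++ {
|             req, _ := http.NewRequest("GET", url, nil)
|             req.Header.Set("User-Agent", userAgent)
|             resp, err := http.DefaultClient.Do(req)
|             if err == nil && resp.StatusCode < 429 {
|                 return resp, nil
|             }
|             if resp != nil {
|                 resp.Body.Close()
|             }
|             time.Sleep(delay)
|             delay *= 2
|         }
|         return nil, fmt.Errorf("giving up on %s", url)
|     }
|
|     func main() {
|         site := "https://example.org"
|         if blockedByRobots(site) {
|             fmt.Println("robots.txt says no; stopping")
|             return
|         }
|         if resp, err := fetch(site + "/"); err == nil {
|             fmt.Println("fetched with status", resp.Status)
|             resp.Body.Close()
|         }
|     }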
| panstromek wrote:
| I'd expect this to be against App Store and Google Play rules;
| they are very picky.
| Pesthuf wrote:
| We need a list of apps that include these libraries, and any
| malware scanner - including Windows Defender, Play Protect and
| whatever Apple calls theirs - needs to put infected applications
| into quarantine immediately. Just because it's not _directly_
| causing damage to the device the malware is running on,
| that doesn't mean it's not malware.
| philippta wrote:
| Apps should be required to ask for permission to access
| specific domains. Similar to the tracking protection, Apple
| introduced a while ago.
|
| Not sure how this could work for browsers, but the other 99% of
| apps I have on my phone should work fine with just a single
| permitted domain.
| jay_kyburz wrote:
| Oh, that's an interesting idea. A local DNS where I have to
| add every entry. A white list rather than Australia's
| national blacklist.
| snackernews wrote:
| My iPhone occasionally displays an interrupt screen to remind
| me that my weather app has been accessing my location in the
| background and to confirm continued access.
|
| It should also do something similar for apps making chatty
| background requests to domains not specified at app review
| time. The legitimate use cases for that behaviour are few.
| zzo38computer wrote:
| I think capability based security with proxy capabilities is
| the way to do it, and this would make it possible for the
| proxy capability to intercept the request and ask permission,
| or to do whatever else you want it to do (e.g. redirections,
| log any accesses, automatically allow or disallow based on a
| file, use or ignore the DNS cache, etc).
|
| The system may have some such functions built in, and asking
| permission might be a reasonable thing to include by default.
| XorNot wrote:
| Try actually using a system like this. OpenSnitch and
| LittleSnitch do it for Linux and MacOS respectively. Fedora
| has a pretty good interface for SELinux denials.
|
| I've used all of them, and it's a deluge: it is too much
| information to reasonably react to.
|
| Your broad options are either deny or accept, but there's no sane way
| to reliably know what you should do.
|
| This is not and cannot be an individual problem: the easy
| part is building high fidelity access control, the hard
| part is making useful policy for it.
| zzo38computer wrote:
| I suggested proxy capabilities, that it can easily be
| reprogrammed and reconfigured; if you want to disable
| this feature then you can do that too. It is not only
| allow or deny; other things are also possible (e.g.
| simulate various error conditions, artificially slow down
| the connection, go through a proxy server, etc). (This
| proxy capability system would be useful for stuff other
| than network connections too.)
|
| > it is too much information to reasonably react to.
|
| Even if it asks, that does not necessarily mean it has to ask
| every time, if the user lets it keep the answer (either
| for the current session or until the user deliberately
| deletes this data). Also, if it asks too much because it
| tries to access too many remote servers, then might be
| spyware, malware, etc anyways, and is worth investigating
| in case that is what it is.
|
| > the hard part is making useful policy for it.
|
| What the default settings should be is a significant
| issue. However, changing the policies in individual cases
| for different uses, is also something that a user might
| do, since the default settings will not always be
| suitable.
|
| If whoever manages the package repository, app store, etc
| is able to check for malware, then this is a good thing
| to do (although it should not prohibit the user from
| installing their own software and modifying the existing
| software), but security on the computer is also helpful,
| and neither of these is the substitute for the other;
| they are together.
| tzury wrote:
| The vast majority of revenue in the mobile app ecosystem is
| ads, which by design are pulled from 3rd parties (and are part of
| the broader problem discussed in this post).
|
| I am waiting for Apple to enable /etc/hosts or something
| similar on iOS devices.
| klabb3 wrote:
| On the one hand, yes this could work for many cases. On the
| other hand, good bye p2p. Not every app is a passive client-
| server request-response. One needs to be really careful with
| designing permission systems. Apple has already killed many
| markets before they had a chance to even exist, such as
| companion apps for watches and other peripherals.
| kmeisthax wrote:
| P2P was practically dead on iPhone even back in 2010. The
| whole "don't burn the user's battery" thing precludes
| mobile phones doing anything with P2P other than leeching
| off of it. The only exceptions are things like AirDrop;
| i.e. locally peer-to-peer things that are only active when
| in use and don't try to form an overlay or mesh network
| that would require the phone to become a router.
|
| And, AFAIK, you already need special permission for
| anything other than HTTPS to specific domains on the public
| Internet. That's why apps ping you about permissions to
| access "local devices".
| zzo38computer wrote:
| > other than HTTPS to specific domains on the public
| Internet
|
| They should need special permission for that too.
| Pesthuf wrote:
| Maybe there could be a special entitlement that Apple's
| reviewers would only grant to applications that have a
| legitimate reason to require such connections. Then only
| applications granted that permission would be able to make
| requests to arbitrary domains / IP addresses.
|
| That's how it works with other permissions most
| applications should not have access to, like accessing user
| locations. (And private entitlements third party
| applications can't have are one way Apple makes sure nobody
| can compete with their apps, but that's a separate issue.)
| nottorp wrote:
| > On the other hand, good bye p2p.
|
| You mean, good bye using my bandwidth without my
| permission? That's good. And if I install a bittorrent
| client on my phone, I'll know to give it permission.
|
| > such as companion apps for watches and other peripherals
|
| That's just apple abusing their market position in phones
| to push their watch. What does it have to do with p2p?
| klabb3 wrote:
| > using my bandwidth without my permission
|
| What are you talking about?
|
| > What does it have to do with p2p?
|
| It's an example of when you design sandboxes/firewalls
| it's very easy to assume all apps are one big homogenous
| blob doing rest calls and everything else is malicious or
| suspicious. You often need strange permissions to do
| interesting things. Apple gives themselves these perms
| all the time.
| nottorp wrote:
| Wait, why should applications be allowed to do rest calls
| by default?
|
| > What are you talking about?
|
| That's the main use case for p2p in an application isn't
| it? Reducing the vendors bandwidth bill...
| vbezhenar wrote:
| Do you suggest outright forbidding TCP connections for user
| software? Because you can compile OpenSSL or any other TLS
| library and make a TCP connection to port 443 which will be
| opaque to the operating system. They can do wild things like
| kernel-level DPI for outgoing connections to find out the host,
| but that quickly turns into a ridiculous competition.
| internetter wrote:
| > but that quickly turns into ridiculous competition.
|
| Except the platform providers hold the trump card. Fuck
| around, if they figure it out you'll be finding out.
| udev4096 wrote:
| Android is so fucking anti-privacy that they still don't have
| an INTERNET access revoke toggle. The one they have currently
| is broken and can easily be bypassed with google play
| services (another highly privileged process running for no
| reason other than to sell your soul to google). GrapheneOS
| has this toggle luckily. Whenever you install an app, you can
| revoke the INTERNET access at the install screen and there is
| no way that app can bypass it
| mjmas wrote:
| Asus added this to their phones which is nice.
| proxy_err wrote:
| It's a fair point but very hard to sort out. This needs a full
| research team to figure out. Or, you know... all of us combined!!
| It is definitely a problem.
|
| TINFOIL: I've sometimes wondered if Azure or AWS used bots to
| push site traffic hits to generate money... they know you are
| hosted with them... they have your info... send out bots to drive
| micro accumulation. Slow boil...
| luckylion wrote:
| I think that's mostly that they don't care about having
| malicious bots on their networks as long as they pay.
|
| GCE is rare in my experience. Most bots I see are on AWS. The
| DDOS-adjacent hyper aggressive bots that try random URLs and
| scan for exploits tend to be on Azure or use VPNs.
|
| AWS is bad when you report malicious traffic. Azure has been
| completely unresponsive and didn't react, even for C&C servers.
| aucisson_masque wrote:
| It's interesting but so far there is no definitive proof it's
| happening.
|
| People are jumping to conclusions a bit fast over here; yes,
| technically it's possible, but this kind of behavior would be
| relatively easy to spot because the app would have to make direct
| connections to the websites it wants to scrape.
|
| Your calculator app for instance connecting to CNN.com ...
|
| iOS has an App Privacy Report where one can check what connections
| are made by each app, how often, the last one, etc.
|
| Android by Google doesn't have such a useful feature, of course,
| but you can run a third-party firewall like pcapdroid, which I
| highly recommend.
|
| macOS (Little Snitch).
|
| Windows (Fort Firewall).
|
| Not everyone runs these apps obviously, only the most nerdy like
| myself, but we're also the kind of people who would report an app
| using our device as part of what is, in fact, a zombie or bot
| network.
|
| I'm not saying it's necessarily false but imo it remains a theory
| until proven otherwise.
| CharlesW wrote:
| Botnets as a Service are absolutely happening, but as you
| allude to, the scope of the abuse is very different on iOS
| than, say, Windows.
| abaymado wrote:
| > iOS have app privacy report where one can check what
| connections are made by app, how often, last one, etc.
|
| How often is the average calculator app user checking their
| Privacy Report? My guess: not many!
| gruez wrote:
| All it takes is one person to find out and raise the alarm.
| The average user doesn't read the source code behind openssl
| or whatever either, but that doesn't mean there are no gains in
| open sourcing it.
| dewey wrote:
| The average user is also not reading these raised "alarms".
| And if an app has a bad name, another one will show up with
| a different name on the same day.
| aucisson_masque wrote:
| You're on a tech forum; you must have seen one of the
| many posts about apps, either on Android or iPhone, that
| act like spyware.
|
| They happen from time to time. The last one was not more
| than two weeks ago, when it was shown that many apps
| were able to read the list of all other apps installed
| on an Android device and that Google refused to fix that.
|
| Do you really believe that an app used to make your
| device part of a bot network wouldn't be posted over
| here?
| dewey wrote:
| "You're on a tech forum", that's exactly the point. The
| "average user" is not on a tech forum though, the average
| user opens the app store of their platform, types
| "calculator" and installs the first one that's free.
| nottorp wrote:
| The real solution is to add a permission for network
| access, with the default set to deny.
| throwaway519 wrote:
| Given this is a thing even in browser plugins, and that so very
| few people analyse their firewalls, I'd not discount it at all.
| Much of the world's users have no clue, and app stores are
| notoriously bad at reacting even to publicised malware, e.g.
| 'free' VPNs in the iOS App Store.
| andelink wrote:
| This is a hilariously optimistic, naive, disconnected from
| reality take. What sort of "proof" would be sufficient for you?
| TFA of course includes data from the author's own server logs^,
| but it also references real SDKs and businesses selling this
| exact product. You can view the pricing page yourself, right
| next to stats on how many IPs are available for you to exploit.
| What else do you need to see?
|
| ^ edit: my mistake, the server logs I mentioned were from the
| author's prior blog post on this topic, linked to at the top of
| TFA: https://jan.wildeboer.net/2025/02/Blocking-Stealthy-Botnets/
| jshier wrote:
| > iOS have app privacy report where one can check what
| connections are made by app, how often, last one, etc.
|
| Privacy reports do not include that information. They include
| broad areas of information the app claims to gather. There is
| zero connection between those claimed areas and what the app
| actually does unless app review notices something that doesn't
| match up. But none of that information is updated dynamically,
| and it has never actually included the domains the app connects
| to. You may be confusing it with the old domain declarations
| for less secure HTTP connections. Once the connections met the
| system standards you no longer needed to declare it.
| zargon wrote:
| I wasn't aware of this feature. But apparently it does
| include that information. I just enabled it and can see the
| domains that apps connect to.
| https://support.apple.com/en-us/102188
| hoc wrote:
| Pretty neat, actually. Thanks for looking up that link.
| Galanwe wrote:
| There is already a lot of proof. Just ask for a sales pitch
| from companies selling these data and they will gladly explain
| everything to you.
|
| Go to a data conference like Neudata and you will see. You can
| have scraped data from user devices, real-time locations,
| credit card, Google analytics, etc.
| badmonster wrote:
| do you think there's a realistic path forward for better
| transparency or detection--maybe at the OS level or through
| network-level anomaly detection?
| yungporko wrote:
| it's funny, i've never heard of or thought about the possibility
| of this happening but actually in hindsight it seems almost too
| obvious to not be a thing.
| jeroenhd wrote:
| > So there is a (IMHO) shady market out there that gives app
| developers on iOS, Android, MacOS and Windows money for including
| a library into their apps that sells users network bandwidth
|
| AKA "why do Cloudflare and Google make me fill out these CAPTCHAs
| all day"
|
| I don't know why Play Protect/MS Defender/whatever Apple has for
| antivirus don't classify apps that embed such malware as such.
| It's ridiculous that this is allowed to go on when detection is
| so easy. I don't know a more obvious example of a trojan than an
| SDK library making a user's device part of a botnet.
| dx4100 wrote:
| Cloudflare and Google use CAPTCHAs to sell web scrapers? I
| don't get your point. I was under the impression the data is
| used to train models.
| cuu508 wrote:
| Trojans in your mobile apps ruin your IP's reputation which
| comes back to you in the form of frequent, annoying CAPTCHAs.
| aloha2436 wrote:
| The implication is that the users that are being constantly
| presented with CAPTCHAs are experiencing that because they
| are unwittingly proxying scrapers through their devices via
| malicious apps they've installed.
| pentae wrote:
| .. or that other people on their network/Shared public IP
| have installed
| evgpbfhnr wrote:
| or just that they don't run Windows/macOS with Chrome
| like everyone else and it's "suspicious". I get
| Cloudflare captchas all the time with Firefox on Linux...
| (and I'm pretty sure there's no such app in my home
| network!)
| jeroenhd wrote:
| When a random device on your network gets infected with crap
| like this, your network becomes a bot egress point, and anti
| bot networks respond appropriately. Cloudflare, Akamai, even
| Google will start showing CAPTCHAs for every website they
| protect when your network starts hitting random servers with
| scrapers or DDoS attacks.
|
| This is even worse with CG-NAT if you don't have IPv6 to
| solve the CG-NAT problem.
|
| I don't think the data they collect is used to train anything
| these days. Cloudflare is using AI generated images for
| CAPTCHAs and Google's actual CAPTCHAs are easier for bots
| than humans at this point (it's the passive monitoring that
| makes it still work a little bit).
| areyourllySorry wrote:
| it's not technically malware, you agreed to it when you
| accepted the terms of service :^)
| L-four wrote:
| It's malware; it does something malicious.
| panny wrote:
| >Apple, Microsoft and Google should act.
|
| Do nothing, win.
|
| They are the primary beneficiaries, buying this data since they
| are the largest AI players.
| neilv wrote:
| Couldn't Apple and Google (and, to a lesser extent, Microsoft)
| pretty easily shut down almost all the apps that steal bandwidth?
| greesil wrote:
| How would I know if an app on my device was doing this?
| wyck wrote:
| Install a network monitor or go even deeper and sniff packets.
| greesil wrote:
| I feel like this could be automated. Spin up a virtual device
| on a monitored network. Install one app, click on some stuff
| for a while, uninstall and move on to the next. If the app
| reaches out to a lot of random sites, then flag it.
|
| Google could do this. I'm sure Apple could as well. Third
| parties could, for a small set of apps.
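|
| A rough sketch of the flagging step, assuming per-app domain
| lists have already been captured from the monitored network
| (app names and domains below are made up):
|
|     # Distinct domains each app contacted during the test run
|     observed = {
|         "com.example.calculator": {
|             "api.example-analytics.com", "cnn.com",
|             "shop.example.org", "news.example.net",
|         },
|         "com.example.notes": {"sync.example-notes.com"},
|     }
|
|     THRESHOLD = 3  # how many distinct domains counts as "normal"
|
|     def flag_suspicious(observed):
|         # An embedded proxy/scraper SDK shows up as traffic
|         # fanning out to many unrelated domains.
|         return [app for app, domains in observed.items()
|                 if len(domains) > THRESHOLD]
|
|     print(flag_suspicious(observed))  # ['com.example.calculator']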
| jeroenhd wrote:
| This is being done by a couple of SDKs, it'd be much easier
| to just find and flag those SDK files. Finding apps becomes
| a matter of a single pass scan over the application
| contents rather than attempting to bypass the VM detection
| methods malware is packed full of.
| matheusmoreira wrote:
| "Peer-to-business network"! Amazing. uBlock Origin gets rid of
| this, right?
| __MatrixMan__ wrote:
| The broken thing about the web is that in order for data to
| remain readable, a unique sysadmin somewhere has to keep a server
| running in the face of an increasingly hostile environment.
|
| If instead we had a content addressed model, we could drop the
| uniqueness constraint. Then these AI scrapers could be gossiping
| the data to one another (and incidentally serving it to the rest
| of us) without placing any burden on the original source.
|
| Having other parties interested in your data should make your
| life easier (because other parties will host it for you), not
| harder (because now you need to work extra hard to host it for
| them).
| Timwi wrote:
| Are there any systems like that, even if experimental?
| jevogel wrote:
| IPFS
| alakra wrote:
| I had high hopes for IPFS, but even it has vectors for
| abuse.
|
| See https://arxiv.org/abs/1905.11880 [Hydras and IPFS: A
| Decentralised Playground for Malware]
| __MatrixMan__ wrote:
| Can you point me at what you mean? I'm not immediately
| finding something that indicates that it is not fit for
| this use case. The fact that bad actors use it to resist
| those who want to shut them down is, if anything, an
| endorsement of its durability. There's a bit of overlap
| between resisting the AI scrapers and resisting the FBI.
| You can either have a single point of control and a
| single point of failure, or you can have neither. If
| you're after something that's both reliable and reliably
| censorable--I don't think that's in the cards.
|
| That's not to say that it _is_ a ready replacement for
| the web as we know it. If you have hash-linked everything
| then you wind up with problems trying to link things
| together, for instance. Once two pages exist, you can't
| after-the-fact create a link between them because if you
| update them to contain that link then their hashes change
| so now you have to propagate the new hash to people. This
| makes it difficult to do things like have a comments
| section at the bottom of a blog post. So you've got to
| handle metadata like that in some kind of extra layer--a
| layer which isn't hash linked and which might be
| susceptible to all the same problems that our current web
| is--and then the browser can build the page from
| immutable pieces, but the assembly itself ends up being
| dynamic (and likely sensitive to the user's preference,
| e.g. dark mode as a browser thing not a page thing).
|
| But I still think you could move maybe 95% of the data
| into an immutable hash-linked world (think of these as
| nodes in a graph), the remaining 5% just being tuples of
| hashes and public keys indicating which pages are trusted
| by which users, which ought to be linked to which others,
| which are known to be the inputs and output of various
| functions, and you know... structure stuff (these are our
| graph's edges).
|
| The edges, being smaller, might be subject to different
| constraints than the web as we know it. I wouldn't
| propose that we go all the way to a blockchain where
| every device caches every edge, but it might be feasible
| for my devices to store all of the edges for the 5% of
| the web I care about, and your devices to store the edges
| for the 5% that you care about... the nodes only being
| summoned when we actually want to view them. The edges
| can be updated when our devices contact other devices
| (based on trust, like you know that device's owner
| personally) and ask "hey, what's new?"
|
| I've sort of been freestyling on this idea in isolation,
| probably there's already some projects that scratch this
| itch. A while back I made a note to check out
| https://ceramic.network/ in this capacity, but I haven't
| gotten down to trying it out yet.
| XorNot wrote:
| Except no one wants content addressed data - because if you
| knew what it was you wanted, then you would already have stored
| it. The web as we know it is an index - it's a way to discover
| that data is available and specifically we usually want the
| _latest_ data that's available.
|
| AI scrapers aren't trying to find things they already know
| exist, they're trying to discover what they didn't know
| existed.
| akoboldfrying wrote:
| > because if you knew what it was you wanted, then you would
| already have stored it.
|
| "Content-addressable" has a broader meaning than what you
| seem to be thinking of -- roughly speaking, it applies if
| _any function of_ the data is used as the "address". E.g.,
| git commits are content-addressable by their SHA1 hashes.
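|
| A minimal sketch of that scheme in Python (the digest should
| match `git hash-object` for the same bytes):
|
|     import hashlib
|
|     def git_blob_address(content: bytes) -> str:
|         # Git hashes "blob <size>\0" + the raw bytes; the
|         # resulting digest is the object's address.
|         header = b"blob " + str(len(content)).encode() + b"\0"
|         return hashlib.sha1(header + content).hexdigest()
|
|     print(git_blob_address(b"hello world\n"))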
| __MatrixMan__ wrote:
| But when you do a "git pull" you're not pulling from
| someplace identified by a hash, but rather a hostname. The
| learning-about-new-hashes part has to be handled
| differently.
|
| It's a legit limitation on what content addressing can do,
| but it's one we can overcome by just not having
| _everything_ be content addressed. The web we have now is
| like if you did a `git pull` every time you opened a file.
|
| The web I'm proposing is like how we actually use git--
| periodically pulling new hashes as a separate action, but
| spending most of our time browsing content that we already
| have hashes for.
| __MatrixMan__ wrote:
| Yes, for the reasons you describe, you can't be both a useful
| web-like protocol and also 100% immutable/hash-linked.
|
| But there's a lot of middle ground to explore here. Loading a
| modern web page involves making dozens of requests to a
| variety of different servers, evaluating some javascript, and
| then doing it again a few times, potentially moving several
| Mb of data. The part people want, the thing you don't already
| know exist, it's hidden behind that rather heavy door. It
| doesn't have to be that way.
|
| If you already know about one thing (by its cryptographic
| hash, say) and you want to find out which other hashes it's
| now associated with--associations that might not have existed
| yesterday--that's much easier than we've made it. It can be
| done:
|
| - by moving KB not MB, we're just talking about a tuple of
| hashes here, maybe a public key and a signature
|
| - without placing additional burden on whoever authored the
| first thing, they don't even have to be the ones who
| published the pair of hashes that your scraper is interested
| in
|
| Once you have the second hash, you can then reenter
| immutable-space to get whatever it references. I'm not sure
| if there's already a protocol for such things, but if not
| then we can surely make one that's more efficient and durable
| than what we're doing now.
| XorNot wrote:
| But we already have HEAD requests and etags.
|
| It is entirely possible to serve a fully cached response
| that says "you already have this". The problem is...people
| don't implement this well.
| __MatrixMan__ wrote:
| People don't implement them well because they're
| overburdened by all of the different expectations we put
| on them. It's a problem with how DNS forces us to
| allocate expertise. As it is, you need some kind of write
| access on the server whose name shows up in the URL if
| you want to contribute to it. This is how globally unique
| names create fragility.
|
| If content were handled independently of server names,
| anyone who cares to distribute metadata for content they
| care about can do so. One doesn't need write access, or
| even to be on the same network partition. You could just
| publish a link between content A and content B because
| you know their hashes. Assembling all of this can happen
| in the browser, subject to the user's configs re: who
| they trust.
| akoboldfrying wrote:
| Assuming the right incentives can be found to prevent
| widespread leeching, a distributed content-addressed model
| indeed solves this problem, but introduces the problem of how
| to control your own content over time. How do you get rid of a
| piece of content? How do you modify the content at a given URL?
|
| I know, as far as possible it's a good idea to have content-
| immutable URLs. But at some point, I need to make
| www.myexamplebusiness.com show new content. How would that
| work?
| __MatrixMan__ wrote:
| As for how to get rid of a piece of content... I think that
| one's a lost cause. If the goal is to prevent things that
| make content unavailable (e.g. AI scrapers) then you end up
| with a design that prevents things that makes content
| unavailable (e.g. legitimate deletions). The whole point is
| that you're not the only one participating in propagating the
| content, and that comes with trade-offs.
|
| But as for updating, you just format your URLs like so: {my-
| public-key}/foo/bar
|
| And then you alter the protocol so that the {my-public-key}
| part resolves to the merkle-root of whatever you most
| recently published. So people who are interested in your
| latest content end up with a whole new set of hashes whenever
| you make an update. In this way, it's not 100% immutable, but
| the mutable payload stays small (it's just a bunch of hashes)
| and since it can be verified (presumably there's a signature
| somewhere) it can be gossiped around and remain available
| even if your device is not.
|
| You can soft-delete something just by updating whatever
| pointed to it to not point to it anymore. Eventually most
| nodes will forget it. But you can't really prevent a node
| from hanging on to an old copy if they want to. But then
| again, could you ever do that? Deleting something on the
| web has always been a bit of a fiction.
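|
| A minimal sketch of that mutable pointer, assuming Ed25519
| signatures via PyNaCl (the record layout is invented for
| illustration):
|
|     import hashlib, json
|     from nacl.signing import SigningKey, VerifyKey
|
|     publisher = SigningKey.generate()  # the site owner's key
|     root = hashlib.sha256(b"new site contents").hexdigest()
|
|     # Tiny signed record: "this key's latest content root is X"
|     record = json.dumps({"seq": 42, "root": root}).encode()
|     signed = publisher.sign(record)  # gossip this around
|
|     # Anyone holding the verify key can check the pointer
|     # before caching/serving what it references.
|     vk = VerifyKey(publisher.verify_key.encode())
|     vk.verify(signed)  # raises BadSignatureError if forged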
| akoboldfrying wrote:
| > But then again, could you ever do that?
|
| True in the absolute sense, but the effect size is much
| worse under the kind of content-addressable model you're
| proposing. Currently, if I download something from you and
| you later delete that thing, I can still keep my downloaded
| copy; under your model, if _anyone ever_ downloads that
| thing from you and you later delete that thing, with high
| probability I can still acquire it at any later point.
|
| As you say, this is by design, and there are cases where
| this design makes sense. I think it mostly doesn't for what
| we currently use the web for.
| areyourllySorry wrote:
| there is no incentive for different companies to share data
| with each other, or with anyone really (facebook leeching
| books?)
| __MatrixMan__ wrote:
| I figure we'd create that incentive by configuring our
| devices to only talk to devices controlled by people we
| trust. If they want the data at all, they have to gain our
| trust, and if they want that, they have to seed the data. Or
| you know, whatever else the agreement ends up being. Maybe we
| make them pay us.
| theteapot wrote:
| Are ad blockers like AdBlock, uBlock effective against these?
| areyourllySorry wrote:
| i don't believe extensions can modify other extensions
| 156287745637 wrote:
| AI scrapers and "sneaker bots" are just the tip of the iceberg.
| Why are all these entities concentrated and metastasizing from
| just a few superhubs? Why do they look, smell and behave like
| state-level machinery? If you've researched it, you'll know
| exactly what I'm talking about.
|
| Unless complicit, tech leaders (Apple, Google, Microsoft) have a
| duty to respond swiftly and decisively. This has been going on
| far too long.
| _ink_ wrote:
| How can I detect such behaviour on my devices / in my home
| network?
| gpi wrote:
| "Infatica is partnered with Bitdefender, a global leader in
| cybersecurity, to protect our SDK users from malicious web
| traffic and content, including infected URLs, untrusted web
| pages, fraudulent and phishing links, and more."
|
| That's not good.
| Quarrel wrote:
| FWIW, Trend Micro wrote up a decent piece on this space in 2023.
|
| It is still a pretty good lay-of-the-land.
|
| https://www.trendmicro.com/vinfo/us/security/news/vulnerabil...
| hinkley wrote:
| When the enshittification initially hit the fan, I had little
| flashbacks of Phil Zimmermann talking about Web of Trust and
| amusing myself thinking maybe we need humans proving they're
| humans to other humans so we know we aren't arguing with LLMs on
| the internet or letting them scan our websites.
|
| But it just doesn't scale to internet size so I'm fucked if I
| know how we should fix it. We all have that cousin or dude in our
| high school class who would do anything for a bit of money and
| introducing his 'friend' Paul who is in fact a bot whose owner
| paid for the lie. And not like enough money to make it a moral
| dilemma, just drinking money or enough for a new video game. So
| once you get past about 10,000 people you're pretty much back
| where we are right now.
| akoboldfrying wrote:
| I think it should be possible to build something that
| generalises the idea of Web of Trust so that it's more
| flexible, and less prone to catastrophic breakdown past some
| scaling limit.
|
| Binary "X trusts Y" statements, plus transitive closure, can
| lead to long trust paths that we probably shouldn't actually
| trust the endpoints of. Could we not instead assign
| probabilities like "X trusts Y 95%", multiply probabilities
| along paths starting from our own identity, and take the max at
| each vertex? We could then decide whether to finally trust some
| Z if its percentage is more than some threshold T%. (Other ways
| of combining in-edges may be more suitable than max(); it's
| just a simple and conservative choice.)
|
| Perhaps a variant of backprop could be used to automatically
| update either (a) all or (b) just our own weights, given new
| information ("V has been discovered to be fraudulent").
| hinkley wrote:
| True. Perhaps a collective vote past 2 degrees of separation,
| where multiple parties need to vouch for the same person
| before you believe they aren't a bot. Then you're using the
| exponential number of people to provide diminishing weight
| instead of increasing likelihood of malfeasance.
| nottorp wrote:
| But do we need an infinite and global web of trust?
|
| How about restricting them to everyone-knows-everyone sized
| groups, of like a couple hundred people?
|
| One can be a member of multiple groups so you're not
| actually limited. But the groups will be small enough to
| self regulate.
| hinkley wrote:
| What's that going to do about all of the top search
| results and a good percentage of social media traffic
| being generated by SEO bots? Nothing.
|
| If you want to chat with a Dunbar number of people, get
| yourself a private Discord or Slack channel.
| nottorp wrote:
| The Dunbar number of people could vouch for small web
| sites they come across. Or even for FB accounts if they
| choose to.
| hinkley wrote:
| I suspect a lot of people here are the ones in their
| circle who bring in a lot of the cool info that their
| friends missed out on. This still sounds like Slack.
| nottorp wrote:
| We're talking about webs of trust aren't we? Not about
| chat rooms.
|
| I'm hypothesising that any such large scale structure
| will be perverted by commercial interests, while having
| multiple Dunbar sized such structures will have a chance
| to be useful.
| sfink wrote:
| Isn't the point of the web of trust that you can do something
| about the cousins/dudes out there? Once you discover that they
| sold out, even once, you sever them from the web. It doesn't
| matter if they took 20 years to succumb to the temptation, you
| can cut them off tomorrow. And that cuts off everyone they
| vouched for, recursively, unless there's a still-trusted vouch
| chain to someone.
|
| At least, that's the way I've always imagined it working. Maybe
| I need to read up.
| hubraumhugo wrote:
| We all agree that AI crawlers are a big issue as they don't
| respect any established best practices, but we rarely talk about
| the path forward. Scraping has been around for as long as the
| internet, and it was mostly fine. There are many very legitimate
| use cases for browser automation and data extraction (I work in
| this space).
|
| So what are potential solutions? We're somehow still stuck with
| CAPTCHAs, a 25-year-old concept that wastes millions of human
| hours and billions in infra costs [0].
|
| How can we enable beneficial automation while protecting against
| abusive AI crawlers?
|
| [0] https://arxiv.org/abs/2311.10911
| udev4096 wrote:
| Blame the "AI" companies for that. I am glad the small web is
| pushing hard against these scrapers, with the rise of Anubis as
| a starting point
| lelanthran wrote:
| > Blame the "AI" companies for that. I am glad the small web
| is pushing hard towards these scrapers, with the rise of
| Anubis as a starting point
|
| Did you mean "against"?
| udev4096 wrote:
| Corrected, thanks
| eastbound wrote:
| But people don't interact with your website anymore; they ask an
| AI. So the AI crawler is a real user.
|
| I say we ask Google Analytics to count an AI crawler as a real
| view. Let's see who's most popular.
| CalRobert wrote:
| I hate this but I suspect a login-only deanonymised web (made
| simple with chrome and WEI!) is the future. Firefox users can
| go to hell.
| ArinaS wrote:
| We won't.
| CaptainFever wrote:
| My pet peeve is that using the term "AI crawler" for this
| conflates things unnecessarily. There are some people who are
| angry at it due to anti-AI bias and not wishing to share
| information, while there are others who are more concerned
| about it due to the large amount of bandwidth and server
| overloading.
|
| Not to mention that it's unknown if these are actually from AI
| companies, or from people pretending to be AI companies. You
| can set anything as your user agent.
|
| It's more appropriate to mention the specific issue one has
| about the crawlers, like "they request things too quickly" or
| "they're overloading my server". Then from there, it is easier
| to come to a solution than just "I hate AI". For example, one
| would realize that things like Anubis have existed forever,
| they are just called DDoS protection, specifically those using
| proof-of-work schemes (e.g.
| https://github.com/RuiSiang/PoW-Shield).
|
| This also shifts the discussion away from something that adds
| to the discrimination against scraping in general, and more
| towards what is actually the issue: overloading servers, or in
| other words, DDoS.
| johnnyanmac wrote:
| It's become unbearable in the "AI era". So it's appropriate
| to blame AI for it, in my eyes. Especially since so much
| defense is based around training LLMs.
|
| It's just like how not all DDoSes are actually hackers or
| bots. Sometimes a server just can't take the traffic of a
| large site flooding in. But the result is the same until
| something is investigated.
| queenkjuul wrote:
| It's not a coincidence that this wasn't a major problem until
| everybody and their dog started trying to build the next
| great LLM.
| jeroenhd wrote:
| The best solution I've seen is to hit everyone with a proof of
| work wall and whitelist the scrapers that are welcome (search
| engines and such).
|
| Running SHA hash calculations for a second or so once every
| week is not bad for users, but with scrapers constantly
| starting new sessions they end up spending most of their time
| running useless JavaScript, slowing them down significantly.
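|
| A minimal sketch of such a proof-of-work gate (not Anubis's
| actual scheme, just the general shape):
|
|     import hashlib, itertools, os
|
|     BITS = 20  # ~1M hashes on average; tune to taste
|
|     def work(challenge, nonce):
|         d = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
|         return int.from_bytes(d, "big")
|
|     def solve(challenge, bits=BITS):
|         # Client: find a nonce whose hash has `bits` leading
|         # zero bits. Costs the client ~2**bits hashes.
|         for nonce in itertools.count():
|             if work(challenge, nonce) < (1 << (256 - bits)):
|                 return nonce
|
|     def verify(challenge, nonce, bits=BITS):
|         # Server: one hash to check the client's work.
|         return work(challenge, nonce) < (1 << (256 - bits))
|
|     c = os.urandom(16).hex()  # per-session challenge
|     assert verify(c, solve(c))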
|
| The most effective alternative to proof of work calculations
| seems to be remote attestation. The downside is that you're
| getting captchas if you're one of the 0.1% who disable secure
| boot and run Linux, but the vast majority of web users will
| live a captcha free life. This same mechanism could in theory
| also be used to authenticate welcome scrapers rather than
| relying on pure IP whitelists.
| 0manrho wrote:
| > So what are potential solutions?
|
| It won't fully solve the problem, but with the problem
| relatively identified, you must then ask why people are
| engaging in this behavior. Answer: money, for the most part.
| Therefore, follow the money and identify the financial
| incentives driving this behavior. This leads you pretty quickly
| to a solution most people would reject out-of-hand: turn off
| the financial incentive that is driving the enshittification of
| the web. Which is to say, kill the ad-economy.
|
| Or at least better regulate it while also levying punitive
| damages that are significant enough to both dissuade bad actors
| and encourage entities to view data-breaches (or the potential
| therein) and "leakage[0]" as something that should actually be
| effectively secured against. After all, there are some upsides
| to the ad-economy that, without it, would present some hard
| challenges (eg, how many people are willing to pay for search?
| what happens to the vibrant sphere of creators of all stripes
| that are incentivized by the ad-economy? etc).
|
| Personally, I can't imagine this would actually happen.
| Pushback from monied interests aside, most people have given up
| on the idea of data-privacy or personal-ownership of their
| data, if they ever even cared in the first place. So, in the
| absence of willingness to do something about the incentive for
| this maligned behavior, we're left with few good options.
|
| 0: https://news.ycombinator.com/item?id=43716704 (see comments
| on all the various ways people's data is being
| leaked/leached/tracked/etc)
| mjaseem wrote:
| I wrote an article about a possible proof of personhood
| solution idea: https://mjaseem.github.io/tech/2025/04/12/proof-
| of-humanity.....
|
| The broad idea is to use zero knowledge proofs with
| certification. It sort of flips the public key certification
| system and adds some privacy.
|
| To get this into place, the powers in charge need to be swayed.
| marginalia_nu wrote:
| Proof-of-work works in terms of preventing large-scale
| automation.
|
| As for letting well behaved crawlers in, I've had an idea for
| something like DKIM for crawlers. Should be possible to set up
| a fairly cheap cryptographic solution that enables crawlers a
| persistent identity that can't be forged.
|
| Basically put a header containing first a string including
| today's date, the crawler's IP, and a domain name, then a
| cryptographic signature of the string. The domain has a TXT
| record with a public key for verifying the identity. It's cheap
| because you really only need to verify the string once on
| the server side, and the crawler only needs to regenerate it
| once per day.
|
| With that in place, crawlers can crawl with their reputation at
| stake. The big problem with these rogue scrapers is that
| they're basically impossible to identify or block, which means
| they don't have any incentives to behave well.
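|
| A minimal sketch of that handshake, assuming Ed25519 via
| PyNaCl (the header name and layout here are invented):
|
|     import datetime
|     from nacl.signing import SigningKey, VerifyKey
|
|     crawler_key = SigningKey.generate()  # crawler operator's key
|
|     def build_header(ip, domain):
|         # Regenerated by the crawler once per day.
|         today = datetime.date.today().isoformat()
|         claim = f"{today}|{ip}|{domain}"
|         sig = crawler_key.sign(claim.encode()).signature.hex()
|         return f"X-Crawler-Identity: {claim}|{sig}"
|
|     def verify_header(header, pubkey_hex, remote_ip):
|         # Server side: pubkey_hex comes from a TXT record on
|         # the claimed domain, so reputation sticks to it.
|         claim, sig = header.split(": ", 1)[1].rsplit("|", 1)
|         date, ip, _domain = claim.split("|")
|         VerifyKey(bytes.fromhex(pubkey_hex)).verify(
|             claim.encode(), bytes.fromhex(sig))  # raises if forged
|         today = datetime.date.today().isoformat()
|         return ip == remote_ip and date == today
|
|     hdr = build_header("203.0.113.7", "crawler.example.com")
|     pub = crawler_key.verify_key.encode().hex()
|     print(verify_header(hdr, pub, "203.0.113.7"))  # True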
| caelinsutch wrote:
| CAPTCHAS are also quickly becoming irrelevant / not enough.
| Fingerprint based approaches seem to be the only realistic way
| forward in the cat / mouse game
| y42 wrote:
| Let me get this straight: we want computers knowing everything,
| to solve current and future problems, but we don't want to give
| them access to our knowledge?
| chairmansteve wrote:
| Not sure we do.
| 3np wrote:
| I don't want your computer to know everything about me, in
| fact.
| drawfloat wrote:
| Most people don't want computers to know everything - ask the
| average person if they want more or less of their lives
| recorded and stored.
| lelanthran wrote:
| > Let me get this straight: we want computers knowing
| everything, to solve current and future problems, but we don't
| want to give them access to our knowledge?
|
| Who said that?
|
| There's basically two extremes:
|
| 1. We want access to all of human knowledge, now and forever,
| in order to monetise it and make more money for us, and us
| alone.
|
| and
|
| 2. We don't want our freely available knowledge sold back to
| us, with no credits to the original authors.
| jeroenhd wrote:
| I don't want computers to know everything. Most knowledge on
| the internet is false and entirely useless.
|
| The companies selling us computers that supposedly know
| everything should pay for their database, or they should give
| away the knowledge they gained for free. Right now, the
| scraping and copying is free and the knowledge is behind a
| subscription to access a proprietary model that forms the basis
| of their business.
|
| Humanity doesn't benefit, the snake oil salesmen do.
| areyourllySorry wrote:
| further reading
|
| https://krebsonsecurity.com/?s=infatica
|
| https://krebsonsecurity.com/tag/residential-proxies/
|
| https://spur.us/blog/
|
| https://bright-sdk.com/ <- way bigger than infatica
| dspillett wrote:
| _> So there is a (IMHO) shady market out there that gives app
| developers on iOS, Android, MacOS and Windows money for including
| a library into their apps that sells users network bandwidth._
|
| This is yet another reason why we need to be wary of popular
| apps, add-ons, extensions, and so forth changing hands, by
| legitimate sale or more nefarious methods. Initially innocent
| utilities can be quickly coopted into being parts of this sort of
| scheme.
| aorth wrote:
| In the last week I've had to deal with two large-scale influxes
| of traffic on one particular web server in our organization.
|
| The first involved requests from 300,000 unique IPs in a span of
| a few hours. I analyzed them and found that ~250,000 were from
| Brazil. I'm used to using ASNs to block network ranges sending
| this kind of traffic, but in this case they were spread thinly
| over 6,000+ ASNs! I ended up blocking all of Brazil (sorry).
|
| A few days later this same web server was on fire again. I
| performed the same analysis on IPs and found a similar number of
| unique addresses, but spread across Turkey, Russia, Argentina,
| Algeria and many more countries. What is going on?! Eventually I
| _think_ I found a pattern to identify the requests, in that they
| were using ancient Chrome user agents. Chrome 40, 50, 60 and up
| to 90, all released 5 to 15 years ago. Then, just before I could
| implement a block based on these user agents, the traffic
| stopped.
|
| In both cases the traffic from datacenter networks was limited
| because I already rate limit a few dozen of the larger ones.
|
| Sysadmin life...
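|
| For what it's worth, here's the sort of quick scan I'd use to
| find those ancient-Chrome requests (log path and combined log
| format are assumptions):
|
|     import re
|     from collections import Counter
|
|     UA_RE = re.compile(r"Chrome/(\d+)\.")
|     ANCIENT = range(40, 91)  # Chrome 40-90, as seen above
|
|     ips = Counter()
|     with open("/var/log/nginx/access.log") as log:
|         for line in log:
|             m = UA_RE.search(line)
|             if m and int(m.group(1)) in ANCIENT:
|                 ips[line.split()[0]] += 1  # client IP field
|
|     for ip, hits in ips.most_common(20):
|         print(ip, hits)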
| rollcat wrote:
| Try Anubis: <https://anubis.techaro.lol>
|
| It's a reverse proxy that presents a PoW challenge to every new
| visitor. It shifts the initial cost of accessing your server's
| resources back onto the client. Assuming your uplink can handle
| 300k clients requesting a single 70kb web page, it should solve
| most of your problems.
|
| For science, can you estimate your _peak_ QPS?
| marginalia_nu wrote:
| Anubis is a good choice because it whitelists legitimate and
| well behaved crawlers based on IP + user-agent. Cloudflare
| works as well in that regard but then you're MITM:ing all
| your visitors.
| Imustaskforhelp wrote:
| Also, I was just watching a Brodie Robertson video about how
| the United Nations has this random UNESCO search page which
| actually uses Anubis.
|
| Crazy how I remember the HN post where Anubis's blog post
| first appeared. Though I always thought it was a bit funny
| with the anime, and it was made out of frustration with (I
| think AWS?) AI scrapers who won't follow general rules and
| were constantly hammering his git server, actually taking it
| down, I guess?? I didn't expect it to blow up to ... the UN.
| xena wrote:
| Her*
|
| It was frustration at AWS' Alexa team and their abuse of
| the commons. Amusingly if they had replied to my email
| before I wrote my shitpost of an implementation this all
| could have turned out vastly differently.
| luckylion wrote:
| I've seen a few attacks where the operators placed malicious
| code on high-traffic sites (e.g. some government thing, larger
| newspapers), and then just let browsers load your site as an
| img. Did you see images, css, js being loaded from these IPs?
| If they were expecting images, they wouldn't parse the HTML or
| load other resources.
|
| It's a pretty effective attack because you get large numbers of
| individual browsers to contribute. Hosters don't care, so
| unless the site owners are technical enough, they can stay
| online quite a bit.
|
| If they work with Referrer Policy, they should be able to mask
| themselves fairly well - the ones I saw back then did not.
| jgalt212 wrote:
| I blame the VCs. They don't stop, and implicitly encourage,
| website-crushing scrapers among their funded ventures.
|
| It's not a crime if we do it with an app
|
| https://pluralistic.net/2025/01/25/potatotrac/#carbo-loading
| reincoder wrote:
| I work for IPinfo (a commercial service). We offer a residential
| proxy detection service, but it costs money.
|
| If you are being bombarded by suspicious IP addresses, please
| consider using our free service and blocking IP addresses by ASN
| or Country. I think ASN is a common way to group malicious IP
| addresses. If you do not have time to explore our services/tools
| (it is mostly just our CLI: https://github.com/ipinfo/cli),
| simply paste the IP addresses (or logs) in plain text, send it to
| me and I will let you know the ASNs and corresponding ranges to
| block.
| throwaway74663 wrote:
| Blocking countries is such a poorly disguised form of racism.
| Funny how it's always the brown / yellow people countries that
| get blocked, and never the US, despite it being one of the
| leading nations in malicious traffic.
___________________________________________________________________
(page generated 2025-04-20 23:01 UTC)